Status of JUNOSW + Opticks : GPU ray trace accelerated optical photon simulation

Open source, https://github.com/simoncblyth/opticks

Simon C Blyth, IHEP, CAS — JUNO Collaboration Meeting, Wuhan — 22 January 2026

Outline

Introduction
- Optical Photon Simulation Problem
- Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow
- OJ : Opticks+JUNOSW Automated Gitlab-CI/CD Releases

Opticks enhancements
- GPU Hit Merging, overall speedup > 200x
- Giga-photon events, eg: ~8 billion photons (> 0x1<<32)
- Server-Client impl

Recent Geometry Additions + Opticks Translations
- WaterDistributor from Yu Peidong
- MPMT 8-inch implementation from Yu Peidong
- EMF Coils implementation from Chen Jing

(JUNO) Optical Photon Simulation Problem...

Opticks solves this using GPU ray tracing via NVIDIA OptiX

Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow

Opticks enables Geant4 based simulation to offload optical photon simulation to the GPU

NVIDIA GPU ray tracing of billions[1] of rays per second applied to optical simulation

[1] Actual performance depends on geometry and its modelling, JUNO optical simulation speedups > 1000x Geant4 have been measured

OJ : Opticks+JUNOSW Automated Gitlab-CI/CD Releases

OJ scripts work just like J, except that you source the OJ envset.sh and must run on a GPU node or cluster:

source /cvmfs/opticks.ihep.ac.cn/oj/releases/J25.7.2_Opticks-v0.5.8/el9_amd64_gcc11/2026_01_18/envset.sh

Every day + when JUNOSW/Simulation changes => automated OJ build + test + release
For reproducibility, use dated reference releases eg 2026_01_18
tut_detsim.py --opticks-mode 1 is default (optical simulation on GPU)
prior releases were mis-named J25.4.0... despite actually being the latest J25.7.1... (Thanks Jiabin)

Many optional envvars can control Opticks, eg:

CUDA_VISIBLE_DEVICES: which device to use

EVT=muon_cxs cxr_min.sh #14 (snapshot from animation of photon positions)

Large photon counts motivate merging hits with the same (pmtid,timebucket) : sum the counts and keep the earliest time

Opticks Enhancements : directed by OJ muon production experience

Thanks to Zhenning for production test iterations

Add Opticks "lite" photons : used with JUNOSW "Muon" hits (--pmt-hit-type 2)

Opticks sphotonlite (16 bytes), 4x smaller than sphoton (64 bytes)
reduce overheads for big events, photons per launch ~250M -> ~500M [32 GB VRAM]
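
To make the size comparison concrete, here is a minimal sketch of the buffer arithmetic. The two struct layouts and all field names are hypothetical stand-ins; only the total sizes (64 and 16 bytes) and the ~250M photon count come from the slide. The real definitions are in sysrap.

```cpp
#include <cstdint>

// Hypothetical layouts: only the total sizes (64 and 16 bytes) come from the slide.
struct photon_full                 // stand-in for sphoton
{
    float    pos[3];  float    time;
    float    mom[3];  uint32_t iindex;
    float    pol[3];  float    wavelength;
    uint32_t flags[4];
};

struct photon_lite                 // stand-in for sphotonlite
{
    uint32_t identity;             // assumed field, eg: PMT id
    float    time;                 // hit time
    uint32_t hitcount;             // assumed field
    uint32_t flagmask;             // assumed field, history flags
};

static_assert(sizeof(photon_full) == 64, "full photon record is 64 bytes");
static_assert(sizeof(photon_lite) == 16, "lite photon record is 16 bytes");

// Bytes needed to hold num_photons records of the given size
constexpr uint64_t buffer_bytes(uint64_t num_photons, uint64_t bytes_per_photon)
{
    return num_photons * bytes_per_photon;
}

// ~250M photons : 16 GB as full records, 4 GB as lite records
constexpr uint64_t full_bytes = buffer_bytes(250000000ULL, sizeof(photon_full));
constexpr uint64_t lite_bytes = buffer_bytes(250000000ULL, sizeof(photon_lite));
static_assert(full_bytes == 4 * lite_bytes, "lite records are 4x smaller");
```

The photon buffer is only one of several GPU allocations, so this arithmetic alone does not predict the photons-per-launch figure quoted above.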

Removed 32-bit max photon limits -> simulation of giga optical photon events

handles monster double+ muon events, tested with > 8 billion photon events
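
As an illustration of why 32-bit counts fail for such events, the sketch below shows the overflow and a 64-bit split of one event into several launches. The helper names and the splitting scheme are assumptions made for illustration, not the Opticks implementation; the ~8 billion and ~500M figures are the ones quoted above.

```cpp
#include <cstdint>

// Number of launches needed for `total` photons with at most `max_per_launch`
// photons per launch (hypothetical helper, ceiling division in 64 bits).
constexpr uint64_t num_launches(uint64_t total, uint64_t max_per_launch)
{
    return (total + max_per_launch - 1) / max_per_launch;
}

// Photon index range [begin, end) handled by launch `idx`, 64-bit offsets throughout.
struct launch_range { uint64_t begin; uint64_t end; };

constexpr launch_range range_for_launch(uint64_t total, uint64_t max_per_launch, uint64_t idx)
{
    return { idx * max_per_launch,
             (idx + 1) * max_per_launch < total ? (idx + 1) * max_per_launch : total };
}

constexpr uint64_t total_photons = 8000000000ULL;      // ~8 billion photons in one event
constexpr uint64_t limit32       = uint64_t(1) << 32;  // 4,294,967,296

static_assert(total_photons > limit32, "count does not fit in 32 bits");
static_assert(uint32_t(total_photons) != total_photons, "a 32-bit counter would wrap");
```

With these numbers, `num_launches(total_photons, 500000000ULL)` gives 16 launches, and the final launch covers photon indices from 7.5 billion up to 8 billion.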

Add CUDA implementation of hit merging (thrust::sort_by_key,reduce_by_key)

merge hits onto same PMT within time buckets (eg: 1 ns)
hits selected + merged on GPU
benefit twice : reduce download time + avoid slow CPU merge

GPU Hit Merging : High Level Parallelization with CUDA Thrust

struct key_functor {   // pack (pmtid,timebucket) into one 64-bit key with bitwise-OR
  float    timewindow; // width of the time bucket, eg: 1 ns
  uint64_t operator()(const sphotonlite& p) const // 16 bits identity + 48 bits timebucket = 64
  {
     return (uint64_t(p.identity()) << 48) | uint64_t(p.time/timewindow);
  }
};
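
The bit layout used by key_functor can be checked with a small host-side round trip. This is a sketch under assumptions: `pack_key`, `key_pmtid` and `key_bucket` are hypothetical helpers, and the identity is assumed to fit in 16 bits, as the `<< 48` shift implies.

```cpp
#include <cstdint>

// Pack a PMT identity (assumed to fit in 16 bits) and a time bucket index
// into one 64-bit key, mirroring the bit layout of key_functor.
constexpr uint64_t pack_key(uint64_t pmtid, float time, float timewindow)
{
    return (pmtid << 48) | uint64_t(time / timewindow);
}

// Recover the two fields from a packed key
constexpr uint64_t key_pmtid(uint64_t key)  { return key >> 48; }
constexpr uint64_t key_bucket(uint64_t key) { return key & ((uint64_t(1) << 48) - 1); }

// Two hits on the same PMT within the same 1 ns bucket share a key ...
static_assert(pack_key(5, 12.1f, 1.0f) == pack_key(5, 12.9f, 1.0f), "same bucket");
// ... and a later bucket on the same PMT sorts after an earlier one.
static_assert(pack_key(5, 13.0f, 1.0f) > pack_key(5, 12.9f, 1.0f), "ordered by bucket");
```

Because the identity occupies the high bits, sorting by key groups hits by PMT first and by time bucket second, which is what makes equal keys contiguous for the later reduction.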

Opticks/sysrap SPM::merge_partial_select using CUDA Thrust (higher level C++ way to use CUDA)

Thrust method	Action	Note
`copy_if`	photon -> hit	using flagmask
`transform`	hit -> key	bitwise-OR (pmtid, timebucket)
`sort_by_key`	hit, key -> hit	hit ordered with same (pmtid,timebucket) contiguous
`reduce_by_key`	hit, key -> hitmerged	merge two hits : keep earlier time, sum hitcount
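
The four steps in the table can be followed on the CPU with standard algorithms. The sketch below is a host-side analogue, not the code in SPM.cu: the `Hit` struct, its field names, and the choice to select on any overlap with the hit mask are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical lite photon/hit record; field names are illustrative only.
struct Hit { uint32_t pmtid; float time; uint32_t hitcount; uint32_t flagmask; };

inline uint64_t make_key(const Hit& h, float timewindow)
{
    return (uint64_t(h.pmtid) << 48) | uint64_t(h.time / timewindow);
}

// CPU analogue of copy_if -> transform -> sort_by_key -> reduce_by_key
inline std::vector<Hit> merge_hits(const std::vector<Hit>& photons,
                                   uint32_t hitmask, float timewindow)
{
    assert(timewindow > 0.f);
    typedef std::pair<uint64_t, Hit> KeyedHit;

    // copy_if + transform : keep photons overlapping hitmask (assumed criterion), attach key
    std::vector<KeyedHit> keyed;
    for (const Hit& p : photons)
        if (p.flagmask & hitmask) keyed.push_back({make_key(p, timewindow), p});

    // sort_by_key : hits with the same (pmtid,timebucket) become contiguous
    std::stable_sort(keyed.begin(), keyed.end(),
        [](const KeyedHit& a, const KeyedHit& b) { return a.first < b.first; });

    // reduce_by_key : merge runs of equal keys, keeping earlier time and summing hitcount
    std::vector<Hit> merged;
    uint64_t prev_key = 0;
    for (const KeyedHit& kh : keyed)
    {
        if (!merged.empty() && kh.first == prev_key)
        {
            Hit& m = merged.back();
            m.time = std::min(m.time, kh.second.time);
            m.hitcount += kh.second.hitcount;
        }
        else
        {
            merged.push_back(kh.second);
            prev_key = kh.first;
        }
    }
    return merged;
}
```

On the GPU each of these loops becomes one Thrust call, so the photon array is selected, sorted and reduced in device memory and only the merged hits are downloaded.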

https://github.com/simoncblyth/opticks/blob/master/sysrap/SPM.cu

https://github.com/simoncblyth/opticks/blob/master/sysrap/sphotonlite.h

GPU Hit Merging : Avoids hiding Opticks performance

Detsim timings for one double muon event : ~150M photons, 28M hits, 6.4M merged hits, 1 ns bucket merge

JUNOSW J_Std1,2	--pmt-hit-type 1	--pmt-hit-type 2	Total processing time excluding Initialize
--opticks-mode 0	7112 s (118min)	6904 s (115min)	]junoSD_PMT_v2::Initialize → [junoSD_PMT_v2::EndOfEvent

Opticks+J [1]	(CPUMerge)+Coll.[s]	Kernel [s] PREL→POST	(GPUMerge)+Download POST→DOWN [s]	Total (no-init) [s] HEAD→RESET	Speedup vs J_Std1,2
hit	190.445	22.996	1.949	215.560	x32
hitmerged	6.712	22.988	0.543	30.400	x233
hitlite	146.471	23.108	0.484	170.226	x31
hitlitemerged	0.403	23.097	0.181	23.835	x221

huge speedup of hit collection [eg: 146 -> 0.4 seconds] from GPU pre-merging
GPU hit merging time more than compensated by reduced download time

Opticks+J : overall speedup > x200 [~2 hrs → ~30 s]
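
As a worked check of the speedup column for the first two rows, the sketch below divides the CPU-only baseline by the Opticks+J total. Pairing these rows with the --pmt-hit-type 1 baseline of 7112 s is an assumption; the other values are copied from the table.

```cpp
// Speedup = baseline CPU-only time / Opticks+J total time (both in seconds)
constexpr double speedup(double baseline_s, double total_s)
{
    return baseline_s / total_s;
}

constexpr double j_std1_s    = 7112.0;   // --opticks-mode 0, --pmt-hit-type 1 (assumed baseline)
constexpr double hit_s       = 215.560;  // "hit" row, Total (no-init)
constexpr double hitmerged_s = 30.400;   // "hitmerged" row, Total (no-init)

// 7112 / 215.560 ~ 32.99 and 7112 / 30.400 ~ 233.9, truncated as in the table
static_assert(int(speedup(j_std1_s, hit_s)) == 32, "x32");
static_assert(int(speedup(j_std1_s, hitmerged_s)) == 233, "x233");
```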

[1] : Workstation (Dell Precision 7960), NVIDIA RTX 5000 Ada, 3rd gen. RT cores, 32 GB

https://code.ihep.ac.cn/blyth/j/-/blob/main/zhenning_double_muon/detsim.sh

Split Workflow : Share GPUs between OpticksClients

OpticksClient : Detector Simulation Framework (Geant4 etc..), no GPU required

U4.h : collect gensteps
NP_CURL.h : HTTP POST (libcurl)
- request : genstep array
- response : hits array

OpticksService : CSGOptiX + NVIDIA OptiX + GPU

FastAPI : ASGI python web framework (alt: sanic, aiohttp) grandmetric.com/python-rest-frameworks-performance-comparison/
nanobind : python <=> C++ (alt: pybind11), uv : pip
CSGFoundry.h : load persisted geometry
CSGOptiXService.h : simulate

Prototype clients + service under development

scale up to MC production ? ~100/1000 clients ?
use multi-GPU to serve more clients ?

OR C++ web framework eg:

binding, less standard

Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow 4x4 ?

Production running via "Monolithic" scaling : inefficient use of scarce GPU resources ?

OpticksClients + OpticksService : Share GPUs

How many clients ? Depends on server, event photon count, network, ... (Experimentation needed)

Recent Geometry Additions + Opticks Translations

Water Distributor (WD) implementation from Peidong
MPMT 8-inch PMT implementation from Peidong
EMF Coils implementation from Chen Jing, SJTU

Water Distributor (WD) implementation from Peidong

Complex shape and position

many R+T transforms, multi-union, long way from origin
tricky position between Water Pool and CD, crossing Tyvek

Multiple JUNOSW issues found and now fixed:

overlap between WD and Hama PMTs
pipe poking through Tyvek
pipe segments mis-placed
maximally coincident pipe cut

Branches :

yupd-water-distributor

yupd_bottompipe_adjust

yupd_waterdistributor_heightfix

WDP 89middle

Lower and Upper WD

WDP 3 : Uppermost Water Distributor in Water Pool

WDP 4 : Water Distributor at top of CD

WDP 5 : mid-CD

WDP 6 : Water Distributor at bottom of CD

WDP 7 : Water Distributor at bottom of pool

Water Distributor Issues (FIXED)

3D Raytrace directly reveals (MR1062):

clear overlap with PMTs
mid view : pipe segments stick thru Tyvek
lower view : again upper segments thru Tyvek

LOROS : simtrace boundary wide view at top

LOROS : simtrace wide bottom view 20251218_100415

LOROS : simtrace primtab : snake thru the dead zone with normals

LWDS : Simtrace Split Snake

cxt_min.sh : Simtrace Split Snake

LWDS : xz simtrace showing the mis-placed gap

MOI=264.136,-566.442,-20475,1500 # XY:middle-of-pipe Z:middle-of-outer-tyvek

WaterDistributor Overlap Fixed

cxt_min.sh : pipes now segmented avoiding overlap

MPMT 8-inch implementation from Peidong

Working with Peidong on MR 1063

https://code.ihep.ac.cn/JUNO/offline/junosw/-/merge_requests/1063

branch : yupd-8inch-pmt

details in talk by Peidong
changed tubs-minus-torus -> polycone
torus is computationally terrible, especially in CSG combination
- imprecise intersects even in double precision with Geant4
added PMT info serialization for Opticks

MPMT 19,20,21,construction

MPMT (branch yupd-8inch-pmt)

tubs-subtract-torus neck : abomination
polycone neck
construction of shape

MPMT : 16

MPMT in context

MPMT : 18

wide view showing added PMTs

EMF Coils implementation from Chen Jing, SJTU

Working with Chen Jing on MR 1086

https://code.ihep.ac.cn/JUNO/offline/junosw/-/merge_requests/1086

branch : Chenjing-EMFcoilsgeometry

Currently using G4Polycone with phi ranges

details in talk by Chen Jing
Number of solids : +642 (JUNO Total: 344 -> 986)
Number of volumes : +1096 (quarter of former impl)
analytic phicut not yet supported by Opticks
- triangulated geometry works OK
- Many solids => added flexible name matching to Opticks eg: ^s_EMF
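
A name selection such as ^s_EMF can be pictured as a match anchored at the start of the name. The sketch below uses std::regex for this. It assumes that `^` means "starts with", the solid names are invented, and Opticks' own matching code may work differently.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Return the indices of the solid names that match `pattern`.
// Hypothetical helper: uses ECMAScript regex search, so "^s_EMF" selects
// names beginning with "s_EMF".
inline std::vector<int> select_solids(const std::vector<std::string>& names,
                                      const std::string& pattern)
{
    assert(!pattern.empty());
    const std::regex re(pattern);
    std::vector<int> selected;
    for (int i = 0; i < int(names.size()); i++)
        if (std::regex_search(names[i], re)) selected.push_back(i);
    return selected;
}
```

For a list such as {"s_EMFcoil_0", "sWorld", "s_EMFsupport_1", "upper_s_EMF"} the pattern "^s_EMF" selects indices 0 and 2 only, because the last name contains the text but does not start with it.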

EMF2 2

ELV=^s_EMF MOI=/tmp/emf2.npy EYE=0,0,-2 UP=0,1,0 cxr_min.sh ## raytrace

ELV : select solids

MOI : pick frame

View along EMF axis : (theta,phi) (56,-54) degrees

EMF2 3 View from bottom of pool, with very large volumes excluded

EMF X306

Former EMFcoils impl:

many small pieces
unexplained breaks
+4000 volumes
Converted analytically to Opticks
- as no phicuts

Plans

Get Remaining Geometry Changes Merged + Validated

WaterDistributor:DONE, EMFcoils:TODO, 8-inch MPMT:DONE
Validations following all the recent geometry additions

Automated Validation/Performance Monitoring

integrate tests using GPU with gitlab CI/CD

Production Optimization

smarter CPU hit global->local transform, batched rather than 1-by-1 [reduce from 6.7 sec per 28M hits]
continue iterative productions (primary progress driver)
improve Release build, remove all Debug functionality
profile with NVIDIA tools, try Shader Execution Reordering
test JUNOSW+Opticks under DCI (Dirac)

Improve Opticks User Experience

examples, documentation, training, streamline repository

CI/CD : Continuous Integration/Continuous Deployment

Summary and Links

Opticks : state-of-the-art GPU ray traced optical simulation integrated with Geant4, with automated geometry translation into GPU optimized form. When hit merging is used, doing it on the GPU drastically increases the overall speedup.

Overall JUNOSW+Opticks performance with GPU hit merging > 200x [3rd gen RTX]
NVIDIA Ray Trace Performance (2x each gen., every ~2 yrs)

https://github.com/simoncblyth/opticks	day-to-day code repository
https://simoncblyth.github.io	presentations and videos
https://groups.io/g/opticks	forum/mailing list archive
email: `opticks+subscribe@groups.io`	subscribe to mailing list
`simon.c.blyth@gmail.com`	any questions