Status of JUNOSW + Opticks : GPU ray trace accelerated optical photon simulation


Open source, https://github.com/simoncblyth/opticks

Simon C Blyth, IHEP, CAS — JUNO Collaboration Meeting, Wuhan — 22 January 2026


Outline


(JUNO) Optical Photon Simulation Problem...

Opticks solves this using GPU ray tracing via NVIDIA OptiX


Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow

Opticks enables Geant4 based simulation to offload optical photon simulation to the GPU

NVIDIA GPU ray tracing of billions[1] of rays per second applied to optical simulation

[1] Actual performance depends on the geometry and how it is modelled; JUNO optical simulation speedups of more than 1000x relative to Geant4 have been measured


OJ : Opticks+JUNOSW Automated Gitlab-CI/CD Releases

OJ scripts work just like J scripts, except that they source the OJ envset.sh and must run on a GPU node or cluster:

source /cvmfs/opticks.ihep.ac.cn/oj/releases/J25.7.2_Opticks-v0.5.8/el9_amd64_gcc11/2026_01_18/envset.sh

Many optional environment variables control Opticks, e.g.:

CUDA_VISIBLE_DEVICES : which GPU device to use
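CUDA_VISIBLE_DEVICES is read when the CUDA runtime initializes, so it has to be in the environment before the job starts. A minimal Python sketch of preparing such an environment for a launched job follows; the helper name `gpu_env` is hypothetical and not part of Opticks or JUNOSW.

```python
import os


def gpu_env(device_index, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES pinned to one GPU.

    `gpu_env` is a hypothetical helper for illustration only.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)  # e.g. "0" or "1"
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run`, so that the launched job only sees the selected device.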

GEOM_J25_4_0_opticks_Debug_cxr_min_muon_cxs_20250707_112244.png

EVT=muon_cxs cxr_min.sh #14   (snapshot from animation of photon positions)

Large hit counts motivate merging hits with the same (pmtid,timebucket) : summing the counts and keeping the earliest time


Opticks Enhancements : directed by OJ muon production experience

Added Opticks "lite" photons : used with JUNOSW "Muon" hits (--pmt-hit-type 2)

Removed 32-bit max photon limits -> enables simulation of giga optical photon events

Added CUDA implementation of hit merging (thrust::sort_by_key, thrust::reduce_by_key)


GPU Hit Merging : High Level Parallelization with CUDA Thrust

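// Merge key : (pmtid,timebucket) packed into 64 bits with bitwise-OR
struct key_functor {
  float    timewindow;   // width of one time bucket, same units as p.time
  uint64_t operator()(const sphotonlite& p) const // 16+48 = 64
  {
     // pmtid in the top 16 bits, timebucket index in the low 48 bits
     return (uint64_t(p.identity()) << 48) | uint64_t(p.time/timewindow);
  }
};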
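The key layout can be checked on the host with a few lines of Python. This is a sketch of the same 16+48 bit packing, not the Opticks code: the names `merge_key` and `unpack_key` and the explicit range checks are assumptions added for illustration.

```python
PMTID_BITS = 16
BUCKET_BITS = 48
BUCKET_MASK = (1 << BUCKET_BITS) - 1


def merge_key(pmtid, time, timewindow):
    """Pack (pmtid, timebucket) into one 64-bit integer, pmtid in the top 16 bits."""
    bucket = int(time / timewindow)  # truncation, like the uint64_t cast
    # Range checks are an addition for this sketch, they are not in the functor
    if not 0 <= pmtid < (1 << PMTID_BITS):
        raise ValueError("pmtid does not fit in 16 bits")
    if not 0 <= bucket <= BUCKET_MASK:
        raise ValueError("timebucket does not fit in 48 bits")
    return (pmtid << BUCKET_BITS) | bucket


def unpack_key(key):
    """Recover (pmtid, timebucket) from a packed key."""
    return key >> BUCKET_BITS, key & BUCKET_MASK
```

Because pmtid occupies the high bits, sorting by this key orders hits by pmtid first and by timebucket second, so all hits that should merge end up adjacent.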

Opticks/sysrap SPM::merge_partial_select using CUDA Thrust (higher level C++ way to use CUDA)

Thrust method    Action                   Note
copy_if          photon -> hit            using flagmask
transform        hit -> key               bitwise-OR (pmtid, timebucket)
sort_by_key      hit, key -> hit          hits ordered so that those with the same (pmtid,timebucket) are contiguous
reduce_by_key    hit, key -> hitmerged    merge two hits : earlier time, sum hitcount

https://github.com/simoncblyth/opticks/blob/master/sysrap/SPM.cu

https://github.com/simoncblyth/opticks/blob/master/sysrap/sphotonlite.h
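The four Thrust steps can be mimicked on the CPU to see what the merge produces. The following is a plain Python sketch, not the SPM.cu implementation: the `Hit` record, the selection predicate `(flagmask & hitmask) == hitmask` and the OR-combination of flagmasks on merge are assumptions made for illustration.

```python
from itertools import groupby
from typing import NamedTuple


class Hit(NamedTuple):
    pmtid: int
    time: float
    hitcount: int
    flagmask: int


def merge_hits(photons, hitmask, timewindow):
    """CPU emulation of copy_if -> transform -> sort_by_key -> reduce_by_key."""
    # copy_if : select photons whose flagmask contains all bits of hitmask (assumed predicate)
    hits = [p for p in photons if (p.flagmask & hitmask) == hitmask]

    # transform : one 64-bit key per hit, pmtid in the top 16 bits
    def key(h):
        return (h.pmtid << 48) | int(h.time / timewindow)

    # sort_by_key : hits with equal keys become contiguous
    hits.sort(key=key)

    # reduce_by_key : fold each run of equal keys into a single merged hit
    merged = []
    for _, group in groupby(hits, key=key):
        group = list(group)
        flags = 0
        for h in group:
            flags |= h.flagmask  # assumption: flags are OR-ed on merge
        merged.append(Hit(group[0].pmtid,
                          min(h.time for h in group),       # keep the earliest time
                          sum(h.hitcount for h in group),   # add the counts
                          flags))
    return merged
```

With a 1 ns window, two hits on the same PMT at 10.2 ns and 10.9 ns collapse into one hit at 10.2 ns with hitcount 2, while a hit at 11.1 ns stays separate.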


GPU Hit Merging : Avoids hiding Opticks performance

Detsim timings for one double muon event : ~150M photons, 28M hits, 6.4M merged hits, merged with 1 ns time buckets

JUNOSW J_Std1,2    --pmt-hit-type 1    --pmt-hit-type 2    Total processing time excluding Initialize
--opticks-mode 0   7112 s (118 min)    6904 s (115 min)    ]junoSD_PMT_v2::Initialize → [junoSD_PMT_v2::EndOfEvent

Opticks+J [1]    (CPUMerge)+Coll. [s]    Kernel [s]    (GPUMerge)+Download [s]    Total (no-init) [s]    Speedup vs
                                         PREL→POST     POST→DOWN                  HEAD→RESET             J_Std1,2
hit              190.445                 22.996        1.949                      215.560                x32
hitmerged          6.712                 22.988        0.543                       30.400                x233
hitlite          146.471                 23.108        0.484                      170.226                x31
hitlitemerged      0.403                 23.097        0.181                       23.835                x221

Opticks+J : overall speedup > x200 [~2 hrs → ~30 s]
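The speedup column is a ratio of the CPU-only time to the Opticks+J total. A short Python sketch for the two full-photon rows follows, assuming they are compared against the --pmt-hit-type 1 baseline of 7112 s and that the quoted factors are truncated rather than rounded; the lite rows are left out of this sketch.

```python
BASELINE_S = 7112.0  # --opticks-mode 0 with --pmt-hit-type 1 (assumed baseline for these rows)

TOTALS_S = {
    "hit": 215.560,       # Total (no-init) for unmerged hits
    "hitmerged": 30.400,  # Total (no-init) with GPU hit merging
}


def speedup(baseline_s, total_s):
    """Ratio of CPU-only processing time to Opticks+J processing time."""
    return baseline_s / total_s


for name, total in TOTALS_S.items():
    print(f"{name:10s} x{int(speedup(BASELINE_S, total))}")
```

In the hit row the collection and CPU merge step (190.445 s) dominates the total, while with GPU merging the kernel time (22.988 s) becomes the largest contribution.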

[1] : Workstation (Dell Precision 7960), NVIDIA RTX 5000 Ada, 3rd gen. RT cores, 32 GB

https://code.ihep.ac.cn/blyth/j/-/blob/main/zhenning_double_muon/detsim.sh


Split Workflow : Share GPUs between OpticksClients

OpticksClient : Detector Simulation Framework (Geant4 etc..), no GPU
  • U4.h : collect gensteps
  • NP_CURL.h : HTTP POST (libcurl)
    • request : genstep array
    • response : hits array
OpticksService : CSGOptiX + NVIDIA OptiX + GPU
  • FastAPI : ASGI python web framework (alt: sanic, aiohttp), see grandmetric.com/python-rest-frameworks-performance-comparison/
  • nanobind : python <=> C++ binding (alt: pybind11)
  • uv : python package installer (alt: pip)
  • CSGFoundry.h : load persisted geometry
  • CSGOptiXService.h : simulate
Prototype clients + service under development
  • scale up to MC production ? ~100/1000 clients ?
  • use multi-GPU to serve more clients ?

OR C++ web framework eg:
  • binding, less standard
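The request/response pattern of the split workflow can be tried without a GPU. The sketch below uses only the Python standard library (http.server and urllib) in place of FastAPI and libcurl; the `/simulate` path, the `simulate` stand-in and the helper names are hypothetical, and the payloads are opaque bytes rather than real genstep and hit arrays.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def simulate(genstep_bytes):
    """Stand-in for the GPU simulation: returns a tagged copy, not real hits."""
    return b"hits-for:" + genstep_bytes


class ServiceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # request body : serialized genstep array
        length = int(self.headers.get("Content-Length", 0))
        genstep = self.rfile.read(length)
        hits = simulate(genstep)
        # response body : serialized hits array
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(hits)))
        self.end_headers()
        self.wfile.write(hits)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def start_service():
    """Serve on a free localhost port from a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ServiceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_gensteps(port, genstep_bytes):
    """Client side : HTTP POST the gensteps, return the response body."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/simulate",
        data=genstep_bytes,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxies
    with opener.open(req, timeout=10) as resp:
        return resp.read()
```

Several clients can post to the same service concurrently, which is the property the split workflow relies on to share one GPU between many CPU-side simulation jobs.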

Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow 4x4 ?


Production running via "Monolithic" scaling : inefficient use of scarce GPU resources ?


OpticksClients + OpticksService : Share GPUs



How many clients ? Depends on server, event photon count, network, ... (Experimentation needed)


Recent Geometry Additions + Opticks Translations


Water Distributor (WD) implementation from Peidong

Complex shape and position

Multiple JUNOSW issues found and now fixed:

Branches :

yupd-water-distributor

yupd_bottompipe_adjust

yupd_waterdistributor_heightfix


WDP 89middle

Lower and Upper WD


WDP 3 : Uppermost Water Distributor in Water Pool


WDP 4 : Water Distributor at top of CD


WDP 5 : Water Distributor at mid-CD


WDP 6 : Water Distributor at bottom of CD


WDP 7 : Water Distributor at bottom of pool


Water Distributor Issues (FIXED)

3D Raytrace directly reveals (MR1062):


LOROS : simtrace boundary wide view at top



LOROS : simtrace wide bottom view 20251218_100415



LOROS : simtrace primtab : snake thru the dead zone with normals



LWDS : Simtrace Split Snake

cxt_min.sh : Simtrace Split Snake


LWDS : XZ plane simtrace showing the misplaced gap

MOI=264.136,-566.442,-20475,1500  # XY:middle-of-pipe Z:middle-of-outer-tyvek


WaterDistributor Overlap Fixed

cxt_min.sh : pipes now segmented avoiding overlap


MPMT 8-inch implementation from Peidong

Working with Peidong on MR 1063


MPMT 19,20,21,construction

MPMT (branch yupd-8inch-pmt)


MPMT : 16

MPMT in context


MPMT : 18

wide view showing added PMTs


EMF Coils implementation from Chen Jing, SJTU

Working with Chen Jing on MR 1086

Currently using G4Polycone with phi ranges


EMF2 2

ELV=^s_EMF MOI=/tmp/emf2.npy EYE=0,0,-2 UP=0,1,0 cxr_min.sh  ## raytrace

ELV : select solids

MOI : pick frame

View along EMF axis : (theta,phi) = (56,-54) degrees
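For orientation, the quoted angles can be turned into a unit direction vector. The Python sketch below assumes the usual physics convention (theta measured from +Z, phi measured from +X in the XY plane); whether the frame in /tmp/emf2.npy was built with this convention is not stated, and the function name is hypothetical.

```python
import math


def axis_from_angles(theta_deg, phi_deg):
    """Unit vector for polar angle theta (from +Z) and azimuth phi (from +X), in degrees."""
    t = math.radians(theta_deg)
    p = math.radians(phi_deg)
    return (math.sin(t) * math.cos(p),
            math.sin(t) * math.sin(p),
            math.cos(t))


axis = axis_from_angles(56, -54)
print("axis direction :", tuple(round(c, 4) for c in axis))
```

Under this convention a negative phi gives a negative Y component, and theta below 90 degrees gives a positive Z component.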


EMF2 3 : View from bottom of pool, with very large volumes excluded


EMF X306

Former EMFcoils impl:


Plans

Get Remaining Geometry Changes Merged + Validated

Automated Validation/Performance Monitoring

Production Optimization

Improve Opticks User Experience

CI/CD : Continuous Integration/Continuous Deployment