Opticks + JUNO : Deploy GPU accelerated MC production ?
Can Dirac(X) help ?

Open source, https://github.com/simoncblyth/opticks

Simon C Blyth, IHEP, CAS — The 11th DIRAC Users Workshop, IHEP — (19 September 2025)


Outline

newtons-opticks.png
 


(JUNO) Optical Photon Simulation Problem...


Optical Photon Simulation ≈ Ray Traced Image Rendering

simulation → photon parameters at sensors (PMTs)
rendering  → pixel values at image plane

Much in common : geometry, light sources, optical physics

Many Applications of ray tracing :


NVIDIA RTX Generations 1=>4

ray trace performance : ~2x every ~2 years
Opticks optical speed directly scales with RT speed

AB_Substamp_ALL_Etime_vs_Photon_rtx_gen1_gen3.png

Event time (s) vs photons (M) : G1 = 1st gen RTX, G3 = 3rd gen RTX (Ada)

 PH(M)   G1(s)   G3(s)   G1/G3
     1    0.47    0.14    3.28
    10    0.44    0.13    3.48
    20    4.39    1.10    3.99
    30    8.87    2.26    3.93
    40   13.29    3.38    3.93
    50   18.13    4.49    4.03
    60   22.64    5.70    3.97
    70   27.31    6.78    4.03
    80   32.24    7.99    4.03
    90   37.92    9.33    4.06
   100   41.93   10.42    4.03

Optical simulation ~4x faster from 1st to 3rd gen RTX (3rd gen, Ada : 100M photons simulated in ~10 seconds) [TMM PMT model]
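The speedup and throughput figures quoted above follow directly from the table; a minimal sketch reproducing them from the measured timings:

```python
# Measured event times (s) from the table above: G1 = 1st gen RTX, G3 = 3rd gen (Ada),
# for photon counts in millions.
photons_M = [1, 20, 50, 100]
g1_s = [0.47, 4.39, 18.13, 41.93]
g3_s = [0.14, 1.10, 4.49, 10.42]

for n, t1, t3 in zip(photons_M, g1_s, g3_s):
    speedup = t1 / t3                   # G1/G3 ratio, ~4x at high photon counts
    throughput_M_per_s = n / t3         # G3 photons simulated per second (M)
    print(f"{n:>4} M photons : G1/G3 = {speedup:.2f}, G3 throughput = {throughput_M_per_s:.1f} M/s")
```

At 100M photons this gives a ~4x generation-to-generation speedup and ~10 M photons/s on Ada, matching the headline claim.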

        
        

NVIDIA® OptiX™ Ray Tracing Engine -- Accessible GPU Ray Tracing

OptiX makes GPU ray tracing accessible

OptiX features

User provides (Green):

Latest Release : NVIDIA® OptiX™ 9.0.0 (Feb 2025)


GEOM_J25_4_0_opticks_Debug_cxr_min_muon_cxs_20250707_112242.png

EVT=muon_cxs cxr_min.sh #12 : photons from muon crossing JUNO Scintillator


GEOM_J25_4_0_opticks_Debug_cxr_min_muon_cxs_20250707_112243.png

EVT=muon_cxs cxr_min.sh #13


GEOM_J25_4_0_opticks_Debug_cxr_min_muon_cxs_20250707_112244.png

EVT=muon_cxs cxr_min.sh #14


amdahl_p_sensitive.png

parallel/amdahl.png

Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow

https://bitbucket.org/simoncblyth/opticks

Opticks API : split according to dependency -- Optical photons are GPU "resident", only hits need to be copied to CPU memory
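The genstep→hit split can be illustrated with a toy model. Everything below is illustrative, not the Opticks API: gensteps go up to the GPU, the expanded photons stay GPU resident, and only the small hit subset is copied back.

```python
import random

def simulate_gensteps(gensteps, hit_efficiency=0.1, seed=42):
    """Toy stand-in for the GPU side: expand gensteps into photons,
    keep photons 'resident', and return only the hits."""
    rng = random.Random(seed)
    hits = []
    for gs in gensteps:
        for i in range(gs["num_photons"]):       # photons never leave the 'GPU'
            if rng.random() < hit_efficiency:    # only PMT hits cross back to CPU
                hits.append({"genstep": gs["id"], "photon": i})
    return hits

gensteps = [{"id": 0, "num_photons": 1000}, {"id": 1, "num_photons": 2000}]
hits = simulate_gensteps(gensteps)
print(len(hits), "hits copied back out of 3000 photons")
```

The point of the split : the CPU↔GPU traffic scales with the hit count, not the (much larger) photon count.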


Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow 2x2 ?


Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow 4x4 ?

"Monolithic" scaling : very inefficient use of scarce GPU resources


OpticksClients + OpticksService : Share GPUs


Client.png


Split Workflow : Share GPUs between OpticksClients

OpticksClient : Detector Simulation Framework (Geant4 etc..), no GPU
  • U4.h : collect gensteps
  • NP_CURL.h : HTTP POST (libcurl)
    • request : genstep array
    • response : hits array
OpticksService : CSGOptiX + NVIDIA OptiX + GPU
  • FastAPI : ASGI python web framework (alt: sanic, aiohttp) grandmetric.com/python-rest-frameworks-performance-comparison/
  • nanobind : python <=> C++ bindings (alt: pybind11) ; uv : fast pip alternative
  • CSGFoundry.h : load persisted geometry
  • CSGOptiXService.h : simulate
Prototype clients + service under development
  • scale up to MC production ? ~100/1000 clients ?
  • use multi-GPU to serve more clients ?

OR a C++ web framework (e.g. Drogon, Crow) :
  • avoids the python binding layer, but less standard
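The client/service round trip described above (genstep array POSTed, hits array returned) can be sketched end-to-end. This is a stand-in using only the python standard library, not the FastAPI service or the NP_CURL.h client: JSON replaces the NPY array payload, and the "simulation" is a placeholder.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class OpticksServiceStub(BaseHTTPRequestHandler):
    """Stand-in for the OpticksService endpoint: accepts a POSTed
    genstep array and replies with a hits array."""
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        gensteps = json.loads(self.rfile.read(length))
        # placeholder 'simulation' : one fake hit per genstep
        hits = [{"genstep": gs["id"], "pmt": 0, "time_ns": 1.0} for gs in gensteps]
        body = json.dumps(hits).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):    # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), OpticksServiceStub)
threading.Thread(target=server.serve_forever, daemon=True).start()

# client side : the role played by NP_CURL.h / libcurl, here with urllib
url = f"http://127.0.0.1:{server.server_port}/simulate"
req = urllib.request.Request(url,
                             data=json.dumps([{"id": 0}, {"id": 1}]).encode(),
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    hits = json.loads(resp.read())
print(hits)
server.shutdown()
```

A GPU-less client only needs an HTTP library, which is what makes the ~100/1000-client scale-up question one of service throughput rather than client complexity.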

Opticks MC production monolithic deployment with Dirac(X) ? Requires:

NVIDIA GPU resources : expensive, high demand, difficult to fully utilize

Dirac(X) matching job to resources : Are Dirac "tags" expressive enough ?
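For context, DIRAC job descriptions can declare Tags that sites advertise for matching. An illustrative JDL fragment (the executable name and tag values are assumptions, not a tested site configuration); whether tags can also express GPU model, VRAM size, or CUDA capability is exactly the open question:

```
Executable = "run_opticks_service.sh";
Tags = {"GPU"};
```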

GPU workloads becoming ubiquitous, others will have similar needs


Opticks MC production Server/Client deployment with Dirac(X) ?

GPU-less OpticksClients : Geant4 + JUNOSW + libcurl : HTTP POST to OpticksService

Restrictions/Quotas ?

Scale Opticks GPU optical photon simulation to large MC productions, with efficient GPU use
  • benefit from open source packages/examples with similar requirements