Opticks + JUNO : Deploy GPU accelerated MC simulation ?
Opticks + JUNO : Deploy GPU accelerated MC production ?
Can Dirac(X) help ?
Open source, https://github.com/simoncblyth/opticks
Simon C Blyth, IHEP, CAS — The 11th DIRAC Users Workshop, IHEP — (19 September 2025)
Outline
- Optical Photon Simulation : Context and Problem
- (JUNO) Optical Photon Simulation Problem...
- Optical Photon Simulation ≈ Ray Traced Image Rendering
- NVIDIA RTX Generations 1=>4: RT performance : ~2x every ~2 years
- Opticks Optical simulation 4x faster 1st->3rd gen RTX
- NVIDIA OptiX : Ray Tracing Engine
- Photons from muon crossing JUNO Scintillator
- Parallelized speedup
- Scaling Opticks Solution to Optical Photon Simulation Problem
- Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow
- Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow 2x2
- Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow 4x4
- OpticksClients + OpticksService : Split the Monolith
- Split Workflow : Share GPUs between OpticksClients
- Opticks MC production monolithic deployment with Dirac(X) ?
- Opticks MC production server/client deployment with Dirac(X) ?
- Summary + Links
(JUNO) Optical Photon Simulation Problem...
Optical Photon Simulation ≈ Ray Traced Image Rendering
- simulation
- photon parameters at sensors (PMTs)
- rendering
- pixel values at image plane
Much in common : geometry, light sources, optical physics
- both limited by ray geometry intersection, aka ray tracing
Many Applications of ray tracing :
- advertising, design, architecture, films, games,...
- hence huge efforts to improve hardware + software over 30 years
NVIDIA RTX Generations 1=>4
- RT Cores : dedicated ray tracing hardware within the GPU
- Each generation roughly doubles ray tracing speed:
- Blackwell (2025) ~2x ray trace over Ada
- Ada (2022) ~2x ray trace over Ampere
- Ampere (2020) ~2x ray trace over Turing (2018)
- NVIDIA Blackwell 4th Gen RTX : released 2025/01
- ray trace performance : ~2x every ~2 years
- Opticks optical speed directly scales with RT speed
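As a rough illustration of the doubling rule above, the expected ray tracing gain between two RTX generations can be sketched as a power of two. This is a back-of-envelope model, not an NVIDIA figure: the function name and the assumption of exactly 2x per generation are illustrative only.

```python
def projected_rt_speedup(from_gen: int, to_gen: int, per_gen_factor: float = 2.0) -> float:
    """Projected ray trace speedup going from one RTX generation to another.

    Assumes a constant per-generation factor (2x by default), which is only
    an approximation of the "~2x every ~2 years" trend quoted in the slides.
    """
    return per_gen_factor ** (to_gen - from_gen)


gen1_to_gen3 = projected_rt_speedup(1, 3)  # two doublings
gen1_to_gen4 = projected_rt_speedup(1, 4)  # three doublings
```

Under this assumption 1st -> 3rd generation gives 4x, in line with the ~4x Opticks measurement quoted in these slides, and 1st -> 4th generation would give 8x.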
AB_Substamp_ALL_Etime_vs_Photon_rtx_gen1_gen3.png
Event Time(s) vs PH(M) : G1 = 1st gen RTX, G3 = 3rd gen RTX
PH(M) | G1 (s) | G3 (s) | G1/G3
1 | 0.47 | 0.14 | 3.28
10 | 0.44 | 0.13 | 3.48
20 | 4.39 | 1.10 | 3.99
30 | 8.87 | 2.26 | 3.93
40 | 13.29 | 3.38 | 3.93
50 | 18.13 | 4.49 | 4.03
60 | 22.64 | 5.70 | 3.97
70 | 27.31 | 6.78 | 4.03
80 | 32.24 | 7.99 | 4.03
90 | 37.92 | 9.33 | 4.06
100 | 41.93 | 10.42 | 4.03
Optical simulation 4x faster 1st->3rd gen RTX, (3rd gen, Ada : 100M photons simulated in 10 seconds) [TMM PMT model]
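The 100M photon measurements above can be turned into photon throughput with simple arithmetic. The sketch below uses only the two event times quoted for 100M photons (41.93 s on 1st gen, 10.42 s on 3rd gen); the helper name is hypothetical.

```python
def photons_per_second(photons_millions: float, event_time_s: float) -> float:
    """Convert a photon count in millions and an event time in seconds to photons/s."""
    return photons_millions * 1e6 / event_time_s


# Event times for 100M photons, taken from the measurements quoted above.
g1_rate = photons_per_second(100, 41.93)  # 1st gen RTX, about 2.4M photons/s
g3_rate = photons_per_second(100, 10.42)  # 3rd gen RTX, about 9.6M photons/s
speedup = g3_rate / g1_rate               # close to 4x
```

Small differences from the tabulated G1/G3 ratio are expected because the quoted times are rounded.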
NVIDIA® OptiX™ Ray Tracing Engine -- Accessible GPU Ray Tracing
OptiX makes GPU ray tracing accessible
- Programmable GPU-accelerated Ray-Tracing Pipeline
- Single-ray shader programming model using CUDA
- ray tracing acceleration using RT Cores (RTX GPUs)
- "...free to use within any application..."
OptiX features
- acceleration structure creation + traversal (eg BVH)
- instanced sharing of geometry + acceleration structures
- compiler optimized for GPU ray tracing
User provides (Green):
- ray generation
- geometry bounding boxes
- intersect functions
- instance transforms
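To make the user-provided pieces listed above concrete, here is a minimal sketch of the math behind a bounding box and an intersect function for a sphere. Real OptiX programs are written in CUDA against the OptiX API; this Python version only illustrates the geometry, and the function names are hypothetical.

```python
import math


def sphere_bounds(center, radius):
    """Axis-aligned bounding box (lo, hi) enclosing a sphere."""
    lo = tuple(c - radius for c in center)
    hi = tuple(c + radius for c in center)
    return lo, hi


def intersect_sphere(origin, direction, center, radius, t_min=0.0):
    """Return the nearest ray parameter t > t_min where the ray hits the sphere, else None.

    Solves |origin + t*direction - center|^2 = radius^2, a quadratic in t.
    """
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > t_min:
            return t  # nearest valid intersection along the ray
    return None
```

In a ray tracing engine the bounding boxes feed the acceleration structure, so the intersect function only runs for rays that reach a primitive's box.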
Latest Release : NVIDIA® OptiX™ 9.0.0 (Feb 2025)
- NVIDIA R570 driver or newer
- each OptiX release requires a newer minimum driver
GEOM_J25_4_0_opticks_Debug_cxr_min_muon_cxs_20250707_112242.png
EVT=muon_cxs cxr_min.sh #12 : photons from muon crossing JUNO Scintillator
GEOM_J25_4_0_opticks_Debug_cxr_min_muon_cxs_20250707_112243.png
EVT=muon_cxs cxr_min.sh #13
GEOM_J25_4_0_opticks_Debug_cxr_min_muon_cxs_20250707_112244.png
EVT=muon_cxs cxr_min.sh #14
amdahl_p_sensitive.png
Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow
https://bitbucket.org/simoncblyth/opticks
Opticks API : split according to dependency -- Optical photons are GPU "resident", only hits need to be copied to CPU memory
Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow 2x2 ?
Geant4 + Opticks + NVIDIA OptiX : Hybrid Workflow 4x4 ?
"Monolithic" scaling : very inefficient use of scarce GPU resources
OpticksClients + OpticksService : Share GPUs
Client.png
Split Workflow : Share GPUs between OpticksClients
- OpticksClient : Detector Simulation Framework (Geant4 etc.), no GPU required
- U4.h : collect gensteps
- NP_CURL.h : HTTP POST (libcurl)
- request : genstep array
- response : hits array
- OpticksService : CSGOptiX + NVIDIA OptiX + GPU
- FastAPI : ASGI python web framework (alt: sanic, aiohttp), comparison at grandmetric.com/python-rest-frameworks-performance-comparison/
- nanobind : python <=> C++ binding (alt: pybind11), uv : python package installer (alt: pip)
- CSGFoundry.h : load persisted geometry
- CSGOptiXService.h : simulate
- Prototype clients + service under development
- scale up to MC production ? ~100/1000 clients ?
- use multi-GPU to serve more clients ?
- OR C++ web framework eg:
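The request/response exchange described above (POST a genstep array, receive a hits array) can be sketched end to end in Python. The slides name libcurl (NP_CURL.h) on the client and FastAPI on the service; this sketch swaps in the standard library `urllib` and `http.server` purely so that it runs without extra packages. The `/simulate` path, the raw float32 payload, and the `fake_simulate` placeholder are assumptions for illustration, not the Opticks wire format.

```python
import array
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

FLOATS_PER_GENSTEP = 6 * 4  # genstep item shape (6,4), as quoted in the slides
FLOATS_PER_HIT = 4 * 4      # hit item shape (4,4), as quoted in the slides


def fake_simulate(gensteps: array.array) -> array.array:
    """Placeholder for the GPU simulation: one zeroed hit per genstep."""
    n = len(gensteps) // FLOATS_PER_GENSTEP
    return array.array("f", [0.0] * (n * FLOATS_PER_HIT))


class StandInService(BaseHTTPRequestHandler):
    """Stand-in for the service: reads gensteps from a POST, replies with hits."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        gensteps = array.array("f")
        gensteps.frombytes(self.rfile.read(length))
        body = fake_simulate(gensteps).tobytes()
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def post_gensteps(url: str, gensteps: array.array) -> array.array:
    """Client side: HTTP POST the genstep bytes and decode the returned hits."""
    request = urllib.request.Request(
        url,
        data=gensteps.tobytes(),
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
    # Bypass any proxy settings from the environment for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        hits = array.array("f")
        hits.frombytes(response.read())
    return hits


def demo(num_gensteps: int = 10) -> int:
    """Start the stand-in service on a free local port, post gensteps, count hits."""
    server = HTTPServer(("127.0.0.1", 0), StandInService)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/simulate"
        gensteps = array.array("f", [0.0] * (num_gensteps * FLOATS_PER_GENSTEP))
        return len(post_gensteps(url, gensteps)) // FLOATS_PER_HIT
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo(10)` posts ten dummy gensteps and returns the number of hits decoded from the response. A real deployment would add authentication, error handling, and a proper array serialization with shape metadata.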
Opticks MC production monolithic deployment with Dirac(X) ?
Requirements:
- NVIDIA GPU (5~10x performance benefit with RTX GPUs)
- docker container support : https://hub.docker.com/r/simoncblyth/cuda/tags
- NVIDIA Container Toolkit
- Minimum NVIDIA Driver version requirement of NVIDIA OptiX
- NVIDIA OptiX : implementation is within Driver
NVIDIA GPU resources : expensive, high demand, difficult to fully utilize
Dirac(X) matching job to resources : Are Dirac "tags" expressive enough ?
- requirement for NVIDIA GPU
- mandatory NVIDIA Driver version >= ?
- GPU type order of preference (eg RTX generation "0",1,2,3,..) ?
- GPU VRAM requirement, min/max ?
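The kind of matching asked for above can be sketched as a small filter-then-rank function. This is not the DIRAC or DiracX API: every class, field, and function name below is hypothetical, and the sketch only shows what information a job description would need to carry.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class GpuSlot:
    """Hypothetical description of a GPU offered by a site."""
    driver_major: int    # NVIDIA driver branch, e.g. 570 for R570
    rtx_generation: int  # 0 = NVIDIA GPU without RT cores, 1..4 = RTX generation
    vram_gb: float


@dataclass(frozen=True)
class GpuRequirement:
    """Hypothetical GPU requirement attached to a job."""
    min_driver_major: int
    min_vram_gb: float
    preferred_generations: Tuple[int, ...]  # acceptable generations, most preferred first


def pick_slot(req: GpuRequirement, slots: Sequence[GpuSlot]) -> Optional[GpuSlot]:
    """Return the most preferred slot that satisfies the hard requirements, or None."""
    eligible = [
        s for s in slots
        if s.driver_major >= req.min_driver_major
        and s.vram_gb >= req.min_vram_gb
        and s.rtx_generation in req.preferred_generations
    ]
    if not eligible:
        return None
    # Rank by position in the preference list (lower index = more preferred).
    return min(eligible, key=lambda s: req.preferred_generations.index(s.rtx_generation))
```

The driver version and VRAM act as hard constraints, while the RTX generation is an ordered preference; plain boolean tags can express the first kind but not obviously the second.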
GPU workloads becoming ubiquitous, others will have similar needs
Opticks MC production Server/Client deployment with Dirac(X) ?
GPU-less OpticksClients : Geant4 + JUNOSW + libcurl : HTTP POST to OpticksService
Restrictions/Quotas ?
- HTTP POST request/response jobs<->server
- ~100 MB of network traffic per simulated event
- gensteps (~10k,6,4) float [~1 MB]
- hits (~1M,4,4) float [~64 MB]
- proxy to avoid blocks ? (libcurl very mature)
- complications : authentication, authorization, accounting
- Scale Opticks GPU optical photon simulation to large MC productions, with efficient GPU use
- benefit from open source packages/examples with similar requirements
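The per-event traffic figures quoted under Restrictions/Quotas follow from the array shapes and 4-byte floats. A minimal check, with a hypothetical helper name and the approximate shapes from the slides:

```python
def array_megabytes(shape, bytes_per_element=4):
    """Size in MB (10^6 bytes) of a dense array with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element / 1e6


genstep_mb = array_megabytes((10_000, 6, 4))   # request payload, just under 1 MB
hits_mb = array_megabytes((1_000_000, 4, 4))   # response payload, 64 MB
total_mb = genstep_mb + hits_mb                # per-event traffic for these shapes
```

For these shapes the total is about 65 MB, dominated by the hits response, so the ~100 MB per event figure presumably leaves headroom for larger events and protocol overhead.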
Summary and Links
Opticks : state-of-the-art GPU ray traced optical simulation integrated with Geant4,
with automated geometry translation.
GPU-less OpticksClient + OpticksService in development,
bringing Opticks everywhere + improving GPU utilization.
- NVIDIA Ray Trace Performance continues rapid progress (2x each gen., every ~2 yrs)
- any simulation limited by optical photons can benefit from Opticks
- more photon limited -> more overall speedup (99% -> ~90x)
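The "99% -> ~90x" figure above can be reproduced with Amdahl's law, where p is the fraction of run time spent on optical photons and s is the speedup of that fraction. The slides do not state s; the value of 1000 used below is an assumption chosen only because it reproduces roughly 90x.

```python
def overall_speedup(p: float, s: float) -> float:
    """Amdahl's law: overall speedup when a fraction p of the run time is sped up by s."""
    return 1.0 / ((1.0 - p) + p / s)


photon_limited = overall_speedup(0.99, 1000.0)     # roughly 91x with the assumed s
upper_bound = overall_speedup(0.99, float("inf"))  # limit of 100x as s grows without bound
less_limited = overall_speedup(0.90, 1000.0)       # just under 10x when only 90% is optical
```

The remaining non-optical fraction (1 - p) sets the ceiling, which is why simulations that are more photon limited see a larger overall gain.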