Opticks solves this using GPU ray tracing via NVIDIA OptiX
Huge CPU Memory+Time Expense
Not a Photo, a Calculation
Much in common : geometry, light sources, optical physics
Many Applications of ray tracing :
Blackwell (4th gen RTX) vs Ada (3rd gen RTX):
Server Edition
| Gen | Model | Year | VRAM (GB) | VRAM bandwidth (GB/s) | CUDA Cores | RT Cores | RT TFLOPS | Rise |
|---|---|---|---|---|---|---|---|---|
| Turing | Quadro RTX 6000 | 2018 | 24 | 672 | 4,608 | 72 | ~34 | [1] |
| Ampere | RTX A6000 | 2020 | 48 | 768 | 10,752 | 84 | ~76 | 2.2x |
| Ada | RTX 6000 Ada | 2023 | 48 | 960 | 18,176 | 142 | ~211 | 2.7x |
| Blackwell | RTX PRO 6000 | 2025 | 96 | 1792 | 24,064 | 188 | ~380 | 1.8x |
NVIDIA RT TFLOPS: a synthetic ray-tracing metric; Turing -> Blackwell: 11x
RT TFLOPS = (equivalent FLOPs per ray intersection) x (intersections per clock) x (core clock) x (number of RT cores)
[1] baseline : 2018 "World's First Ray-Tracing GPU -- 10 Gigarays/sec"
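The Rise column is the ratio of consecutive RT TFLOPS values. A minimal sketch of that arithmetic, assuming the approximate (~) TFLOPS figures listed: truncating (not rounding) to one decimal place gives 2.2x, 2.7x and 1.8x, and truncating Blackwell/Turing to an integer gives the quoted 11x. The function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Ratio of approximate RT TFLOPS between two generations,
// truncated (not rounded) to one decimal place.
inline double rise_1dp(double newer_tflops, double older_tflops) {
    return std::floor(newer_tflops / older_tflops * 10.0) / 10.0;
}

// Overall rise across several generations, truncated to an integer.
inline double rise_overall(double newest_tflops, double baseline_tflops) {
    return std::floor(newest_tflops / baseline_tflops);
}
```

For example, `rise_1dp(211, 76)` is 211/76 = 2.78 truncated to 2.7; rounding would instead give 2.8.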
| Photons (M) | G1: 1st gen RTX [s] | G3: 3rd gen RTX [s] | G1/G3 |
|---|---|---|---|
| 1 | 0.47 | 0.14 | 3.28 |
| 10 | 0.44 | 0.13 | 3.48 |
| 20 | 4.39 | 1.10 | 3.99 |
| 30 | 8.87 | 2.26 | 3.93 |
| 40 | 13.29 | 3.38 | 3.93 |
| 50 | 18.13 | 4.49 | 4.03 |
| 60 | 22.64 | 5.70 | 3.97 |
| 70 | 27.31 | 6.78 | 4.03 |
| 80 | 32.24 | 7.99 | 4.03 |
| 90 | 37.92 | 9.33 | 4.06 |
| 100 | 41.93 | 10.42 | 4.03 |
Optical simulation ~4x faster from 1st to 3rd gen RTX
3rd gen (Ada): 100M photons simulated in ~10 s [TMM PMT model, Custom CSG]
Opticks optical simulation speed directly scales with ray tracing speed.
TMM : Transfer-Matrix Method
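The headline numbers follow directly from the table rows. A small sketch of the arithmetic, assuming photon counts in millions and times in seconds as in the table: the 100M row gives roughly 9.6M photons/s on 3rd gen RTX and a G1/G3 ratio of about 4.0. Recomputing from the rounded table entries can differ in the last digit from the G1/G3 column. Function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Photon throughput in photons per second, with the photon count given in millions.
inline double throughput(double photons_millions, double seconds) {
    return photons_millions * 1e6 / seconds;
}

// Speedup of the faster configuration (seconds_fast) over the slower one.
inline double speedup(double seconds_slow, double seconds_fast) {
    return seconds_slow / seconds_fast;
}
```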
Flexible Ray Tracing Pipeline
Green: User Programs, Grey: Fixed function/HW
Analogous to OpenGL rasterization pipeline
OptiX makes GPU ray tracing accessible
OptiX features
User provides (Green):
Latest Release : NVIDIA® OptiX™ 9.1.0 (Dec 2025)
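To make the green/grey split concrete, here is a CPU-only analogy, not the OptiX API: the user supplies intersection, closest-hit and miss callbacks (green), while a fixed traversal loop (grey) finds the nearest accepted intersection. On the GPU, traversal uses a BVH accelerated by RT cores; the linear loop here is only a stand-in, and the code calling `trace` plays the ray-generation role. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <optional>
#include <vector>

struct Vec3 { float x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { Vec3 origin, dir; };                 // dir assumed normalized
struct Sphere { Vec3 centre; float radius; int id; };

// "Green" user programs, supplied as callbacks (names are illustrative only)
struct UserPrograms {
    std::function<std::optional<float>(const Ray&, const Sphere&)> intersect;  // custom primitive
    std::function<int(const Ray&, const Sphere&, float)> closest_hit;          // nearest hit
    std::function<int(const Ray&)> miss;                                       // nothing hit
};

// "Grey" fixed-function stand-in: keep the nearest accepted intersection
int trace(const Ray& ray, const std::vector<Sphere>& scene, const UserPrograms& p) {
    const Sphere* nearest = nullptr;
    float tmin = std::numeric_limits<float>::infinity();
    for (const Sphere& s : scene) {
        if (auto t = p.intersect(ray, s); t && *t < tmin) { tmin = *t; nearest = &s; }
    }
    return nearest ? p.closest_hit(ray, *nearest, tmin) : p.miss(ray);
}

// Example user intersection program: ray-sphere, nearest positive distance
std::optional<float> sphere_intersect(const Ray& r, const Sphere& s) {
    const Vec3 oc = r.origin - s.centre;
    const float b = dot(oc, r.dir);
    const float c = dot(oc, oc) - s.radius * s.radius;
    const float disc = b * b - c;
    if (disc < 0.0f) return std::nullopt;
    const float sq = std::sqrt(disc);
    float t = -b - sq;
    if (t <= 0.0f) t = -b + sq;                   // ray origin inside the sphere
    if (t <= 0.0f) return std::nullopt;
    return t;
}
```

Opticks relies on the same flexibility: its custom CSG intersection plays the role of the user intersection program, while traversal stays in hardware.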
Opticks enables Geant4 based simulation to offload optical photon simulation to the GPU
NVIDIA GPU ray tracing of billions[1] of rays per second applied to optical simulation
[1] Actual performance depends on geometry and its modelling; JUNO optical simulation speedups > 1000x over Geant4 have been measured
EVT=muon_cxs cxr_min.sh #12 : photons from muon crossing JUNO Scintillator
EVT=muon_cxs cxr_min.sh #13
EVT=muon_cxs cxr_min.sh #14
Add Opticks "lite" photons : used with JUNOSW "Muon" hits (--pmt-hit-type 2)
Removed 32-bit maximum photon limits -> enables simulation of billion-photon events
Add CUDA implementation of hit merging (thrust::sort_by_key, thrust::reduce_by_key)
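#include <cstdint>

struct key_functor {  // key = bitwise-OR of (pmtid << 48) and timebucket
    float timewindow;  // merge time-bucket width, eg 1 ns
    __host__ __device__  // needed for thrust::transform on device vectors
    uint64_t operator()(const sphotonlite& p) const  // 16 + 48 = 64 bits
    {
        // pmtid in the top 16 bits, time-bucket index in the low 48 bits;
        // assumes identity() < 2^16, p.time >= 0 and p.time/timewindow < 2^48
        return (uint64_t(p.identity()) << 48) | uint64_t(p.time/timewindow);
    }
};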
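The key layout above means sorting by key orders hits by PMT first and by time bucket second, while hits on the same PMT in the same bucket get identical keys. A minimal host-side sketch of packing and unpacking that layout, assuming the same 16 + 48 bit split; the helper names are hypothetical and not part of Opticks. The mask is an extra safety guard that the functor above does not apply.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kBucketBits = 48;
constexpr uint64_t kBucketMask = (uint64_t(1) << kBucketBits) - 1;

// Pack (pmtid, time bucket) into one 64-bit key; assumes pmtid < 2^16, time >= 0
inline uint64_t pack_key(uint32_t pmtid, float time, float timewindow) {
    const uint64_t bucket = uint64_t(time / timewindow) & kBucketMask;
    return (uint64_t(pmtid) << kBucketBits) | bucket;
}

inline uint32_t key_pmtid(uint64_t key) { return uint32_t(key >> kBucketBits); }
inline uint64_t key_bucket(uint64_t key) { return key & kBucketMask; }
```

With a 1 ns window, hits at 3.2 ns and 3.9 ns on the same PMT share a key and will be merged, while a hit at 4.1 ns falls into the next bucket.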
Opticks/sysrap SPM::merge_partial_select uses CUDA Thrust (a higher-level C++ interface to CUDA)
| Thrust method | Action | Note |
|---|---|---|
| copy_if | photon -> hit | select hits using flagmask |
| transform | hit -> key | bitwise-OR of (pmtid, timebucket) |
| sort_by_key | hit, key -> hit | hits with the same (pmtid, timebucket) become contiguous |
| reduce_by_key | hit, key -> hitmerged | merge hits: keep earliest time, sum hitcount |
https://github.com/simoncblyth/opticks/blob/master/sysrap/SPM.cu
https://github.com/simoncblyth/opticks/blob/master/sysrap/sphotonlite.h
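The four Thrust steps in the table can be mirrored on the CPU with standard algorithms, which makes the data flow easy to test. This is a sketch under stated assumptions, not the SPM.cu implementation: the `Hit` struct is a hypothetical simplification (not the real sphotonlite layout), and since the C++ standard library has no `reduce_by_key`, step 4 is a hand-written loop over runs of equal keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical simplified record; NOT the real sphotonlite layout.
struct Hit {
    uint32_t pmtid;
    uint32_t flagmask;
    uint32_t hitcount;
    float time;  // ns
};

// CPU analogue of copy_if -> transform -> sort_by_key -> reduce_by_key
std::vector<Hit> merge_hits(const std::vector<Hit>& photons, uint32_t hitmask, float timewindow) {
    // 1. copy_if: keep photons whose flagmask contains every bit of hitmask
    std::vector<Hit> hits;
    std::copy_if(photons.begin(), photons.end(), std::back_inserter(hits),
                 [=](const Hit& p) { return (p.flagmask & hitmask) == hitmask; });

    // 2. transform: pair each hit with its (pmtid, timebucket) key
    std::vector<std::pair<uint64_t, Hit>> keyed(hits.size());
    std::transform(hits.begin(), hits.end(), keyed.begin(), [=](const Hit& h) {
        const uint64_t key = (uint64_t(h.pmtid) << 48) | uint64_t(h.time / timewindow);
        return std::make_pair(key, h);
    });

    // 3. sort_by_key: hits sharing a key become contiguous
    std::stable_sort(keyed.begin(), keyed.end(),
                     [](const auto& a, const auto& b) { return a.first < b.first; });

    // 4. reduce_by_key: combine each run of equal keys into one merged hit
    std::vector<Hit> merged;
    for (std::size_t i = 0; i < keyed.size();) {
        Hit acc = keyed[i].second;
        std::size_t j = i + 1;
        for (; j < keyed.size() && keyed[j].first == keyed[i].first; ++j) {
            const Hit& h = keyed[j].second;
            acc.time = std::min(acc.time, h.time);  // earliest time
            acc.hitcount += h.hitcount;             // summed hit count
            acc.flagmask |= h.flagmask;             // combined flags
        }
        merged.push_back(acc);
        i = j;
    }
    return merged;
}
```

On the GPU each step is a single Thrust call over device vectors, so the photons never need to be copied back to the host before merging; only the much smaller merged hit array is returned.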
Simulation times (excluding initialization) for one double-muon event: ~150M photons, 28M hits, 6.4M merged hits, 1 ns time-bucket merge
| JUNOSW Standard | full PMT hit | summary "muon" hit |
|---|---|---|
| Simulation time | 7112 s (118 min) | 6904 s (115 min) |
| Opticks+JUNOSW hit_mode | merge | ph-lite | Simulate Kernel [s] | | | Total [s] | Speedup vs std |
|---|---|---|---|---|---|---|---|
| hit | CPU | ✘ | 22.996 | 1.949 | 190.445 | 215.560 | x32 |
| hitlite | CPU | ✔ | 23.108 | 0.484 | 146.471 | 170.226 | x31 |
| hitmerged | GPU | ✘ | 22.988 | 0.543 | 6.712 | 30.400 | x233 |
| hitlitemerged | GPU | ✔ | 23.097 | 0.181 | 0.403 | 23.835 | x221 |
Opticks+JUNOSW: overall speedup > x200 [~2 hours → ~30 s]
Geant4 + Opticks + NVIDIA OptiX : Production Scaling ?
=> Client-Server architecture
Geant4 + Opticks + NVIDIA OptiX : Monolith x4 ?
Geant4 + Opticks + NVIDIA OptiX : Monolith x16 ?
"Monolithic" scaling : very inefficient use of scarce GPU resources
| Package | Role | Client | Server |
|---|---|---|---|
| SysRap | Geometry and event types, array NP.hh | ✔ | ✔ |
| CSG | CPU/GPU geometry model | ✔ | ✔ |
| QUDARap | CUDA optical simulation | ✘ | ✔ |
| CSGOptiX | OptiX 7+ geometry, GPU ray trace | ✘ | ✔ |
| U4 | Geometry convert, collect gensteps, return hits | ✔ | ✘ |
| G4CX | Top level interface, acts via SSimulator | ✔ | ✘ |
| | Client | Server |
|---|---|---|
| depends: | Geant4, U4, G4CX | NVIDIA GPU + CUDA + OptiX |
| depends: | libcurl 7.76.1+, NP_CURL.h | python, FastAPI, nanobind |
| SSimulator: | SOpticksClientSimulator | CSGOptiX |
Client built from the common Opticks codebase with OPTICKS_CONFIG=Client
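The table shows the top-level interface acting through an abstract simulator that is CSGOptiX on the server and SOpticksClientSimulator on the client. A minimal sketch of that pattern, assuming a simplified interface: `Simulator`, `ClientSimulator` and `run_event` are hypothetical names, not the real SSimulator API, and the transport is injected as a callback so the sketch needs no network code.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the role SSimulator plays (not its real API):
// gensteps in, hits out, both flattened to float arrays for brevity.
struct Simulator {
    virtual ~Simulator() = default;
    virtual std::vector<float> simulate(const std::vector<float>& gensteps, int eventID) = 0;
};

// Client-side implementation: forwards gensteps to a remote service.
// The transport (eg an HTTP POST) is injected rather than implemented here.
struct ClientSimulator : Simulator {
    using Transport = std::function<std::vector<float>(const std::vector<float>&, int)>;
    explicit ClientSimulator(Transport t) : send(std::move(t)) {}
    std::vector<float> simulate(const std::vector<float>& gensteps, int eventID) override {
        return send(gensteps, eventID);
    }
    Transport send;
};

// Top-level code (the G4CX role) sees only the interface, so the same source
// works whether the build provides a local GPU simulator or a client.
std::vector<float> run_event(Simulator& sim, const std::vector<float>& gensteps, int eventID) {
    return sim.simulate(gensteps, eventID);
}
```

Keeping the choice behind one interface is what lets a single codebase produce both builds: only the concrete simulator and its dependencies change.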
Basis for the Opticks Client: libcurl 7.76.1+ (2021), the default in many Linux distributions
NP.hh: C++ array with NumPy serialization (Opticks numerical base); NP_CURL.h HTTP headers:
| HTTP metadata headers | Note |
|---|---|
| x-opticks-shape | array shape, eg "(10,6,4)" for 10 gensteps |
| x-opticks-dtype | eg "float32" |
| x-opticks-index | eg eventID, controlling random-number stream offsets |
| x-opticks-count | eg number of photons in the gensteps, ie the cost of the request |
| x-opticks-meta | general metadata, eg geometry root-node digest to assert client and server use the same geometry |
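A sketch of how the header lines in the table might be assembled, assuming a NumPy-style shape string without spaces as in the "(10,6,4)" example, and a trailing comma for 1-d shapes as Python writes them. `shape_string` and `opticks_headers` are hypothetical helpers, not the NP_CURL.h API. In a libcurl client each resulting line could be appended with `curl_slist_append` and passed via `CURLOPT_HTTPHEADER`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// NumPy-style tuple string, eg "(10,6,4)"; a 1-d shape keeps the trailing comma: "(10,)"
std::string shape_string(const std::vector<int64_t>& shape) {
    std::string s = "(";
    for (std::size_t i = 0; i < shape.size(); ++i) {
        if (i > 0) s += ",";
        s += std::to_string(shape[i]);
    }
    if (shape.size() == 1) s += ",";
    s += ")";
    return s;
}

// Hypothetical helper: metadata header lines for one genstep upload
std::vector<std::string> opticks_headers(const std::vector<int64_t>& shape,
                                         const std::string& dtype, int64_t index,
                                         int64_t count, const std::string& meta) {
    return {
        "x-opticks-shape: " + shape_string(shape),
        "x-opticks-dtype: " + dtype,
        "x-opticks-index: " + std::to_string(index),
        "x-opticks-count: " + std::to_string(count),
        "x-opticks-meta: " + meta,
    };
}
```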
HTTP 429 Too Many Requests, 503 Service Unavailable (temporary downtime)
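Both status codes signal a temporary condition, so a client would typically retry later rather than fail the event. A minimal sketch of one possible policy, assuming capped exponential backoff; the policy and its parameters are illustrative assumptions, not documented Opticks client behaviour. The status would come from libcurl via `CURLINFO_RESPONSE_CODE`, which reports it as a `long`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using ms = std::chrono::milliseconds;

// 429 Too Many Requests and 503 Service Unavailable are treated as temporary
inline bool is_retryable(long http_status) {
    return http_status == 429 || http_status == 503;
}

// Capped exponential backoff: base * 2^attempt, never more than cap
inline ms backoff_delay(int attempt, ms base = ms(100), ms cap = ms(10000)) {
    if (attempt < 0) attempt = 0;
    const int shift = std::min(attempt, 20);   // bound the shift to avoid overflow
    const ms::rep factor = ms::rep(1) << shift;
    const ms delay(base.count() * factor);
    return std::min(delay, cap);
}
```

Backing off spreads retries out when the service is saturated, instead of adding more load at the moment it is already rejecting requests.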
https://github.com/simoncblyth/np/blob/master/NP_CURL.h -- https://github.com/simoncblyth/np/blob/master/NP.hh
CSGOptiX/tests/CSGOptiXService_FastAPI_test/CSGOptiXService_FastAPI_test.sh
Prototype client + service operational
"Roll your own" Prototype Server : Educational, BUT:
Potential basis for Opticks Server
Triton[1] : open-source, designed to accelerate AI deployment at scale
Wrap Opticks as Custom C++ Triton Backend "model(s)" ?
Make the request load resemble inference: smaller, more uniform requests
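One way to make requests smaller and more uniform is to group consecutive gensteps into requests whose total photon count stays under a budget, the same quantity carried by x-opticks-count. A minimal greedy sketch, assuming per-genstep photon counts are known on the client; `chunk_gensteps` is hypothetical, and a single genstep larger than the budget simply becomes its own request (splitting one genstep is out of scope here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Greedily group consecutive genstep indices so each group's photon total
// stays at or below max_photons (an oversized single genstep forms its own group).
std::vector<std::vector<std::size_t>> chunk_gensteps(
        const std::vector<uint64_t>& photons_per_genstep, uint64_t max_photons) {
    std::vector<std::vector<std::size_t>> chunks;
    std::vector<std::size_t> current;
    uint64_t total = 0;
    for (std::size_t i = 0; i < photons_per_genstep.size(); ++i) {
        const uint64_t n = photons_per_genstep[i];
        if (!current.empty() && total + n > max_photons) {
            chunks.push_back(std::move(current));
            current.clear();
            total = 0;
        }
        current.push_back(i);
        total += n;
    }
    if (!current.empty()) chunks.push_back(std::move(current));
    return chunks;
}
```

For counts {400, 300, 500, 2000, 100} with a budget of 1000, this yields four requests: {0,1}, {2}, {3}, {4}.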
[1] https://developer.nvidia.com/dynamo-triton
[2] cudaMallocFromPoolAsync
Extra Benefits of Adopting Opticks
=> using Opticks improves CPU simulation too!
Opticks : state-of-the-art GPU ray traced optical simulation integrated with Geant4, with automated geometry translation.
GPU-less OpticksClient + OpticksService in development: brings Opticks to machines without GPUs and improves GPU utilization.
| Link | Purpose |
|---|---|
| https://github.com/simoncblyth/opticks | day-to-day code repository |
| https://simoncblyth.github.io | presentations and videos |
| https://groups.io/g/opticks | forum/mailing list archive |
| email: opticks+subscribe@groups.io | subscribe to mailing list |
| simon.c.blyth@gmail.com | any questions |