Huge CPU Memory+Time Expense
EXPT | Reactor neutrino |
Daya Bay | neutrino oscillations |
JUNO | mass hierarchy + oscillations => NVIDIA CN Contacts |
Long baseline neutrino beam | |
DUNE | Fermilab->Sanford, LAr TPC => Assistance from Fermilab Geant4 Group |
Neutrinoless double beta decay, dark matter, other search | |
LZ | LUX-ZEPLIN dark matter experiment, Sanford => NVIDIA US Contacts |
LEGEND | Large Enriched Germanium Experiment, Gran Sasso/SNOLAB |
SABRE | dark matter direct-detection, Australia |
AMoRE | Mo-based Rare process Experiment, S.Korea |
nEXO | next Enriched Xenon Observatory, LLNL |
NEXT-CRAB0 | High Pressure Gaseous Xenon TPC with a Direct VUV Camera Based Readout |
Neutrino telescope | |
KM3NeT | Cubic Kilometre Neutrino Telescope, Mediterranean |
IceCube | IceCube Neutrino Observatory, South Pole |
Air shower : gamma-ray and cosmic-ray observatory | |
LHAASO | Large High Altitude Air Shower Observatory, Sichuan |
Accelerator | |
LHCb-RICH | LHCb ring imaging Cherenkov sub-detector, CERN => NVIDIA EU Contacts |
Not a Photo, a Calculation
Much in common : geometry, light sources, optical physics
Many Applications of ray tracing :
ray trace performance : ~2x every ~2 years
Flexible Ray Tracing Pipeline
Green: User Programs, Grey: Fixed function/HW
Analogous to OpenGL rasterization pipeline
OptiX makes GPU ray tracing accessible
OptiX features
User provides (Green):
Latest Release : NVIDIA® OptiX™ 8.0.0 (Aug 2023) NEW:
https://bitbucket.org/simoncblyth/opticks |
Opticks API : split according to dependency -- Optical photons are GPU "resident", only hits need to be copied to CPU memory
CSGFoundry Model
Geant4 Geometry Model (JUNO: 400k PV, deep hierarchy)
PV | G4VPhysicalVolume | placed, refs LV |
LV | G4LogicalVolume | unplaced, refs SO |
SO | G4VSolid,G4BooleanSolid | binary tree of SO "nodes" |
Opticks CSGFoundry Geometry Model (index references)
struct | Notes | Geant4 Equivalent |
---|---|---|
CSGFoundry | vectors of the below, easily serialized + uploaded + used on GPU | None |
qat4 | 4x4 transform refs CSGSolid using "spare" 4th column (becomes IAS) | Transforms ref from PV |
CSGSolid | refs sequence of CSGPrim | Grouped Vols + Remainder |
CSGPrim | bbox, refs sequence of CSGNode, root of CSG Tree of nodes | root G4VSolid |
CSGNode | CSG node parameters (JUNO: ~23k CSGNode) | node G4VSolid |
NVIDIA OptiX 7/8 Geometry Acceleration Structures (JUNO: 1 IAS + 10 GAS, 2-level hierarchy)
IAS | Instance Acceleration Structures | JUNO: 1 IAS created from vector of ~50k qat4 |
GAS | Geometry Acceleration Structures | JUNO: 10 GAS created from 10 CSGSolid (which refs CSGPrim,CSGNode ) |
JUNO : Geant4 ~400k volumes "factorized" into 1 OptiX IAS referencing ~10 GAS
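The index-reference idea behind the CSGFoundry model can be sketched in a few lines. This is a minimal illustration only: the class and field names (`Foundry`, `node_offset`, `solid_index`, ...) are hypothetical stand-ins and are not taken from the Opticks sources. The point it tries to show is that flat vectors plus integer offsets, rather than pointers, allow one solid to be referenced by many instance transforms.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical, simplified stand-ins for the CSGFoundry structs.
# All names and fields are illustrative, not the Opticks definitions.

@dataclass
class Node:                 # cf. CSGNode: one CSG node
    typecode: str

@dataclass
class Prim:                 # cf. CSGPrim: refs a contiguous run of nodes
    node_offset: int
    num_node: int

@dataclass
class Solid:                # cf. CSGSolid: refs a contiguous run of prims
    prim_offset: int
    num_prim: int

@dataclass
class Instance:             # cf. qat4: transform plus index of the solid it places
    translation: Tuple[float, float, float]
    solid_index: int

@dataclass
class Foundry:              # cf. CSGFoundry: flat vectors of the above
    node: List[Node] = field(default_factory=list)
    prim: List[Prim] = field(default_factory=list)
    solid: List[Solid] = field(default_factory=list)
    inst: List[Instance] = field(default_factory=list)

    def add_solid(self, prims):
        """prims: list of lists of node typecodes, one inner list per prim."""
        prim_offset = len(self.prim)
        for typecodes in prims:
            self.prim.append(Prim(len(self.node), len(typecodes)))
            self.node.extend(Node(t) for t in typecodes)
        self.solid.append(Solid(prim_offset, len(prims)))
        return len(self.solid) - 1

    def nodes_of_instance(self, i):
        """Follow the index references: instance -> solid -> prims -> nodes."""
        so = self.solid[self.inst[i].solid_index]
        out = []
        for pr in self.prim[so.prim_offset:so.prim_offset + so.num_prim]:
            run = self.node[pr.node_offset:pr.node_offset + pr.num_node]
            out.extend(n.typecode for n in run)
        return out

fd = Foundry()
pmt = fd.add_solid([["union", "sphere", "cylinder"], ["sphere"]])
# One solid, placed three times: only the instance vector grows.
fd.inst = [Instance((float(x), 0.0, 0.0), pmt) for x in range(3)]
print(fd.nodes_of_instance(2))
```

Because every reference is an integer into a flat vector, such a model has no pointers to fix up when it is serialized or copied to another memory space.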
Full JUNO, Opticks, OptiX 7.5/8.0
raytrace 2M pixels | |
---|---|
TITAN RTX (1st) | 0.0118s (85 fps) |
RTX 5000 Ada (3rd) | 0.0031s (323 fps) |
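The frame rates in the table follow directly from the render times. The short check below recomputes them and the generation-to-generation speedup; the rays-per-second figure assumes exactly one primary ray per pixel, which is an assumption made here for illustration.

```python
# Render times for the 2M pixel ray trace, taken from the table above.
frame_time_s = {"TITAN RTX (1st)": 0.0118, "RTX 5000 Ada (3rd)": 0.0031}
pixels = 2_000_000

fps = {gpu: 1.0 / t for gpu, t in frame_time_s.items()}
# Assumption: one primary ray per pixel.
rays_per_s = {gpu: pixels / t for gpu, t in frame_time_s.items()}
speedup = frame_time_s["TITAN RTX (1st)"] / frame_time_s["RTX 5000 Ada (3rd)"]

for gpu in frame_time_s:
    print(f"{gpu}: {fps[gpu]:.0f} fps, {rays_per_s[gpu] / 1e6:.0f}M rays/s")
print(f"1st -> 3rd gen render speedup: {speedup:.1f}x")
```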
Interactive ray traced visualization via OpenGL/OptiX interop
initial viewpoint, geometry exclusions via envvars
WASDQE+mouse 3D navigation
Render on NVIDIA RTX 5000 Ada Generation in 0.0060 s (not 0.0200 s)
Intersect with torus expensive on GPU
Triangulation using G4Polyhedron
G4Poly..::SetNumberOfRotationSteps
NumberOfRotationSteps | |
---|---|
HepPolyhedron Default | 24 |
Top Right | 48 |
Bottom Right | 480 |
Adjustable: precision of intersect, number of triangles
GPUs evolved for triangles => fast even with many
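The trade-off between triangle count and intersect precision can be estimated with simple geometry. The sketch below assumes that the number of rotation steps equals the number of sides of the regular polygon that replaces each circular cross-section; under that assumption the largest gap between the true circle and the polygon is r(1 - cos(pi/n)).

```python
import math

def max_radial_error(radius, num_steps):
    """Largest gap between a circle and the inscribed regular polygon
    with num_steps sides (the sagitta of one polygon edge)."""
    return radius * (1.0 - math.cos(math.pi / num_steps))

# Rotation step counts from the table above, unit radius.
errors = {n: max_radial_error(1.0, n) for n in (24, 48, 480)}
for n, err in errors.items():
    print(f"steps={n:4d}  max radial error / radius = {err:.2e}")
```

For small angles the error scales roughly as 1/n^2, so doubling the step count reduces the deviation by about a factor of four while only doubling the number of triangles around the axis.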
With list-node : shrink CSG tree
+------------------------------+
|        U                     |
|       / \                    |
|      /   \                   |
|     S     U[A,B,C,D,E,F,G,H] |
|    / \                       |
|   I   J                      |
+------------------------------+
Problematic deep CSG tree without list-node
+------------------------------------------+
|                                          |
|                                          |
|                              U           |
|                             / \          |
|                            /   \         |
|                           /     S        |
|                          U     / \       |
|                         / \   I   J      |
|                        U   H             |
|                       / \                |
|                      U   G               |
|                     / \                  |
|                    U   F                 |
|                   / \                    |
|                  U   E                   |
|                 / \                      |
|                U   D                     |
|               / \                        |
|              U   C                       |
|             / \                          |
|            A   B                         |
|                                          |
+------------------------------------------+
U : Union
S : Subtraction
A-J : Tubs (cylinder) primitive
Simple G4MultiUnion is translated to Opticks list-node
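The benefit of the list-node can be quantified by comparing tree heights. The sketch below builds both trees from the diagrams as nested tuples and counts the slots a complete binary tree of that height would need; storing the tree as a complete binary tree is an assumption made here for illustration, and the helper names are hypothetical.

```python
from functools import reduce

def leaf(name):
    return (name,)

def binop(op, left, right):
    return (op, left, right)

def height(tree):
    """Height in edges: a single leaf has height 0."""
    return 0 if len(tree) == 1 else 1 + max(height(tree[1]), height(tree[2]))

def complete_tree_slots(h):
    """Slots needed to hold a tree of height h as a complete binary tree."""
    return 2 ** (h + 1) - 1

tubs = [leaf(c) for c in "ABCDEFGH"]
sub = binop("S", leaf("I"), leaf("J"))

# Without list-node: left-deep chain of binary unions over A..H.
chain = reduce(lambda acc, t: binop("U", acc, t), tubs[1:], tubs[0])
deep = binop("U", chain, sub)

# With list-node: the eight tubs collapse into a single leaf.
shallow = binop("U", sub, leaf("U[A,B,C,D,E,F,G,H]"))

for label, tree in (("deep", deep), ("list-node", shallow)):
    h = height(tree)
    print(f"{label}: height {h}, complete-tree slots {complete_tree_slots(h)}")
```

Under that storage assumption the slot count grows exponentially with height, which is why flattening a long union chain into one list-node matters far more than the modest reduction in node count would suggest.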
TEST=medium_scan ~/opticks/cxs_min.sh
Generate optical only events with 1M->100M photons starting from CD center, gather and save only Hits.
OPTICKS_RUNNING_MODE=SRM_TORCH   ## "Torch" running enables num_photon scan
OPTICKS_NUM_PHOTON=M1,10,20,30,40,50,60,70,80,90,100
OPTICKS_NUM_EVENT=11
OPTICKS_EVENT_MODE=Hit
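The scan settings can be read as one photon count per event. The sketch below is a hypothetical parser, not the Opticks implementation: it assumes the leading `M` scales every entry by one million, which is consistent with the stated 1M->100M range, and checks that the list length matches the event count.

```python
def parse_num_photon(spec):
    """Hypothetical parser: a leading 'M' scales all entries by 1e6."""
    scale = 1
    if spec[:1] == "M":
        scale, spec = 1_000_000, spec[1:]
    return [int(tok) * scale for tok in spec.split(",")]

# Settings copied from the scan configuration above.
env = {
    "OPTICKS_RUNNING_MODE": "SRM_TORCH",
    "OPTICKS_NUM_PHOTON": "M1,10,20,30,40,50,60,70,80,90,100",
    "OPTICKS_NUM_EVENT": "11",
    "OPTICKS_EVENT_MODE": "Hit",
}

counts = parse_num_photon(env["OPTICKS_NUM_PHOTON"])
num_event = int(env["OPTICKS_NUM_EVENT"])
print(f"{num_event} events, photons per event: {counts}")
```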
Compare simulation scans on two Dell Precision Workstations:
GPU (VRAM) | Arch | GPU Release | CUDA(RT) Cores | RTX Gen | Driver | CUDA | OptiX |
---|---|---|---|---|---|---|---|
NVIDIA TITAN RTX(24G) | Turing | Dec 2018 | 4,608(72) | 1st | 515.43 | 11.7 | 7.5 |
NVIDIA RTX 5000(32G) | Ada | Aug 2023 | 12,800(100) | 3rd | 550.76 | 12.4 | 8.0 |
Photons (millions) | G1: TITAN RTX, 1st gen (s) | G3: RTX 5000 Ada, 3rd gen (s) | Speedup G1/G3 |
---|---|---|---|
1 | 0.47 | 0.14 | 3.28 |
10 | 0.44 | 0.13 | 3.48 |
20 | 4.39 | 1.10 | 3.99 |
30 | 8.87 | 2.26 | 3.93 |
40 | 13.29 | 3.38 | 3.93 |
50 | 18.13 | 4.49 | 4.03 |
60 | 22.64 | 5.70 | 3.97 |
70 | 27.31 | 6.78 | 4.03 |
80 | 32.24 | 7.99 | 4.03 |
90 | 37.92 | 9.33 | 4.06 |
100 | 41.93 | 10.42 | 4.03 |
Optical simulation ~4x faster going from 1st to 3rd gen RTX (3rd gen, Ada: 100M photons simulated in ~10 seconds) [TMM PMT model]
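The speedup and throughput figures can be recomputed from the scan table. The sketch below uses the tabulated times as printed; because those times are rounded to two decimals, the recomputed ratios can differ slightly from the G1/G3 column.

```python
# (photons in millions, 1st gen time in s, 3rd gen time in s), from the table above
rows = [
    (1, 0.47, 0.14), (10, 0.44, 0.13), (20, 4.39, 1.10), (30, 8.87, 2.26),
    (40, 13.29, 3.38), (50, 18.13, 4.49), (60, 22.64, 5.70), (70, 27.31, 6.78),
    (80, 32.24, 7.99), (90, 37.92, 9.33), (100, 41.93, 10.42),
]

ratios = {ph: g1 / g3 for ph, g1, g3 in rows}
# Throughput on the 3rd gen GPU, in millions of photons per second.
rate_g3 = {ph: ph / g3 for ph, g1, g3 in rows}

for ph, g1, g3 in rows:
    print(f"{ph:4d}M  G1/G3 = {ratios[ph]:.2f}  G3 rate = {rate_g3[ph]:.1f}M photons/s")
```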
Amdahl's "Law" : Expected Speedup
Overall speed limited by serial portion
optical photon simulation, P ~ 99% of CPU time
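Amdahl's law gives the expected overall speedup as S(n) = 1 / ((1 - P) + P/n), where P is the parallelizable fraction and n is the speedup of that fraction. A minimal sketch with P = 0.99, the value quoted above for the optical photon simulation:

```python
def amdahl_speedup(p, n):
    """Overall speedup when a fraction p of the work is accelerated n times."""
    return 1.0 / ((1.0 - p) + p / n)

P = 0.99  # optical photon fraction of CPU time, from the slide
for n in (10, 100, 1000, float("inf")):
    print(f"n = {n:>6}: overall speedup = {amdahl_speedup(P, n):.1f}")
```

However large the optical speedup n becomes, the overall speedup is bounded by 1/(1 - P), which is 100x for P = 0.99; beyond that point the remaining serial 1% dominates.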
Traditional simulation use:
Extra Benefits of Adopting Opticks
=> using Opticks improves CPU simulation too !!
Opticks : state-of-the-art GPU ray traced optical simulation integrated with Geant4, with automated geometry translation into GPU optimized form.
https://bitbucket.org/simoncblyth/opticks | day-to-day code repository |
https://simoncblyth.bitbucket.io | presentations and videos |
https://groups.io/g/opticks | forum/mailing list archive |
email: opticks+subscribe@groups.io | subscribe to mailing list |
simon.c.blyth@gmail.com | any questions |
New active bug reporting Opticks user : Ilker Parmaksiz
Chroma : S.Seibert, A.LaTorre
Chroma tracks photons thru triangle-mesh geometry, using BVH acceleration structure, authors claim:
With a CUDA GPU Chroma has propagated 2.5M photons per second in a detector with 29k PMTs. This is 200x faster than GEANT4.
Issues:
Made many efficiency fixes to work on mobile GPU:
https://bitbucket.org/simoncblyth/chroma/
CAVEAT : I LAST USED CHROMA IN 2015
Chroma : Disadvantages
Chroma : Fundamental Problem, triangles only
(g4daeview.py) Chroma raycast of Daya Bay geometry (3x3 CUDA kernel launches, 1.8s for 1.23M pixels, GeForce 750M GPU)
(g4daeview.py) Chroma raycast render of triangulated geometry
(g4daeview.py) OpenGL rasterized render of triangulated geometry
G4DAE Geometry Exporter
Exports Geant4 geometry into standard 3D files:
Many apps/libs can view/edit DAE/COLLADA files
Triangle Visualization Advantage
CUDA/OpenGL interoperation
(g4daeview.py) Chroma GPU photon propagation at 12 nanoseconds. The photons are generated by Geant4 simulation of a 100 GeV muon travelling from right to left. Photon colors indicate reemission (green), absorption(red), specular reflection (magenta), scattering(blue), no history (white).
(g4daeview.py) Chroma GPU photon propagation at 14 nanoseconds. The interface provides interactive control of the propagation time allowing any stage of the propagation to be viewed by scrubbing time backwards/forwards. The speed of this visualization is achieved by interoperation of CUDA kernels and OpenGL shaders accessing the same GPU resident photon propagation data.
(g4daeview.py) Initial photon positions of a Geant4 simulated muon that crosses between the Daya Bay Near hall ADs. Colors represent photon wavelengths. Optical photons: collected in G4 StackAction, serialized, sent over ZeroMQ, deserialized, presented using OpenGL GLSL shaders.
GGeoview OptiX raycast
https://bitbucket.org/simoncblyth/env/src/tip/graphics/ggeoview/
Why switch to NVIDIA OptiX ?
DBNS geometry raycast comparison using mobile GPU
Performance improvement ~50x
"Opticks" started as synthesis:
Package name "Opticks", taken from world-changing publication: