Huge CPU Memory+Time Expense
Not a Photo, a Calculation
Much in common : geometry, light sources, optical physics
Many Applications of ray tracing :
Monte Carlo Path Tracing
Path Tracing is built on top of Ray Tracing
Movies ≈ Monte Carlo optical photon simulations
Sum over increasing "bounces"
Top row : individual terms, Bottom row : cumulative
improving realism as more bounces included
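The sum over bounces is conventionally written as a Neumann series of the rendering equation. The symbols here are notation assumed for illustration, not taken from the slides: $L_e$ is emitted radiance, $T$ is the transport operator for one bounce, and $L^{(N)}$ is the cumulative result including up to $N$ bounces.

```latex
L = \sum_{k=0}^{\infty} T^{k} L_e = L_e + T L_e + T^{2} L_e + \cdots ,
\qquad
L^{(N)} = \sum_{k=0}^{N} T^{k} L_e
```

In this notation the top row of images corresponds to the individual terms $T^{k} L_e$ and the bottom row to the partial sums $L^{(N)}$.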
Monte Carlo Path Tracing
=> Limited by ray tracing performance
CG Rendering "Simulation" | Particle Physics Simulation |
---|---|
simulates: image formation, vision | simulates photons: generation, propagation, detection |
(red, green, blue) | wavelength range eg 400-700 nm |
ignore polarization | polarization vector propagated throughout |
participating media: clouds, fog, fire [1] | bulk scattering: Rayleigh, Mie |
human exposure times | nanosecond time scales |
equilibrium assumption | transient phenomena |
ignores light speed, time | arrival time crucial, speed of light : 30 cm/ns |
Despite differences many techniques+hardware+software directly applicable to physics eg:
Potentially Useful CG techniques for "billion photon simulations"
[1] search for: "Volumetric Rendering Equation"
10 Giga Rays/s
2018 : Leap in Ray Tracing speed
NVIDIA Quadro RTX GPU
if: 10 rays per photon => 1 billion photons/sec
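The estimate above is a single division. A minimal sketch, assuming a fixed average number of rays per photon (the function name is hypothetical):

```cpp
#include <cassert>

// Photon throughput implied by a ray tracing rate, assuming every photon
// costs the same average number of rays (intersection queries).
inline double photons_per_second(double rays_per_second, double rays_per_photon)
{
    assert(rays_per_photon > 0.0);
    return rays_per_second / rays_per_photon;
}
```

With `photons_per_second(10e9, 10.0)` this reproduces the 1 billion photons/sec figure quoted on the slide.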
Offload Ray Trace to Dedicated HW
SM : Streaming Multiprocessor
BVH : Bounding Volume Hierarchy
2018: NVIDIA RTX 2080Ti (Turing)
"Project Sol" : NVIDIA RTX Demo
real-time cinematic raytracing on single GPU
GTC 2020, NVIDIA Marbles at Night RTX Demo
Realtime RTX render, 1 Ampere GPU
https://www.youtube.com/watch?v=NgcYLIvlp_k
NVIDIA GeForce RTX 3090 [USD 1499]
2560*1440 = 3.7M pixels -> x30 frames/s -> 110M pixels/s
DLSS : Deep Learning Super Sampling
Interfaces over NVIDIA Driver
Driver Updates : Independent of Application
Three Similar Interfaces over same RTX tech:
NVIDIA OptiX (Linux, Windows) [2009]
Vulkan RT (Linux, Windows) [final spec 2020]
Microsoft DXR : DirectX 12 Ray Tracing (Windows) [2018]
Metal Ray Tracing API (macOS) [introduced 2020[1]]
[1] https://developer.apple.com/videos/play/wwdc2020/10012/
Tree of Bounding Boxes (bbox)
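Traversing a tree of bounding boxes relies on a cheap ray/box test at every node. The sketch below shows the widely used "slab" test for one axis-aligned box. It is illustrative only: the `Ray` and `AABB` types are hypothetical, and this is not the code that runs inside RT Cores or OptiX.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

struct Ray  { float origin[3]; float inv_dir[3]; };  // inv_dir = 1/direction, precomputed
struct AABB { float min[3];    float max[3];     };

// Slab test: intersect the ray's parameter interval [t_min, t_max] with the
// interval spent between each pair of axis-aligned planes. The box is hit
// only if the interval is still non-empty after all three axes.
// Note: a zero direction component gives an infinite inv_dir; this sketch
// does not treat the NaN case of an origin lying exactly on a slab plane.
inline bool hits_bbox(const Ray& r, const AABB& b, float t_min, float t_max)
{
    assert(t_min <= t_max);
    for (int axis = 0; axis < 3; ++axis)
    {
        float t0 = (b.min[axis] - r.origin[axis]) * r.inv_dir[axis];
        float t1 = (b.max[axis] - r.origin[axis]) * r.inv_dir[axis];
        if (t0 > t1) std::swap(t0, t1);   // ray travels in the negative direction
        t_min = std::max(t_min, t0);
        t_max = std::min(t_max, t1);
        if (t_max < t_min) return false;  // intervals do not overlap: miss
    }
    return true;
}
```

A BVH traversal applies this test at each node and descends only into children whose boxes are hit, so most primitives are never tested against the ray.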
OptiX Raytracing Pipeline
Analogous to OpenGL rasterization pipeline:
OptiX makes GPU ray tracing accessible
NVIDIA expertise:
https://developer.nvidia.com/rtx
User provides (Yellow):
[1] Turing+ GPUs eg NVIDIA TITAN RTX
GPU Ray Tracing APIs Converged
NVIDIA OptiX 6->7 : drastically slimmed down
More control/flexibility over everything.
Demands much more developer effort than OptiX 6
LATEST: Opticks transition from 6->7 is ongoing
Integrate NVIDIA OptiX with Geant4
GPU Resident Photons
Thrust: high level C++ access to CUDA
OptiX : single-ray programming model -> line-by-line translation
Outside/Inside Unions
dot(normal,rayDir) -> Enter/Exit
Complete Binary Tree, pick between pairs of nearest intersects:
UNION tA < tB | Enter B | Exit B | Miss B |
---|---|---|---|
Enter A | ReturnA | LoopA | ReturnA |
Exit A | ReturnA | ReturnB | ReturnA |
Miss A | ReturnB | ReturnB | ReturnMiss |
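The UNION table above can be encoded directly as a lookup array. This is a minimal sketch, not the Opticks implementation: the enum and function names are hypothetical, normals are assumed to point outward, and the grazing case `dot == 0` is arbitrarily grouped with Exit.

```cpp
#include <cassert>

enum State  { Enter = 0, Exit = 1, Miss = 2 };
enum Action { ReturnA, ReturnB, LoopA, ReturnMiss };

// Classify an intersect from the sign of dot(normal, rayDir), assuming
// outward-pointing normals: negative means the ray is entering the shape.
inline State classify(bool has_hit, float normal_dot_dir)
{
    if (!has_hit) return Miss;
    return normal_dot_dir < 0.f ? Enter : Exit;
}

// UNION lookup for the case tA < tB (A is the nearer intersect).
// Rows: state of A (Enter, Exit, Miss). Columns: state of B.
inline Action union_action_a_closer(State a, State b)
{
    static const Action table[3][3] = {
        /* Enter A */ { ReturnA, LoopA,   ReturnA    },
        /* Exit A  */ { ReturnA, ReturnB, ReturnA    },
        /* Miss A  */ { ReturnB, ReturnB, ReturnMiss },
    };
    return table[a][b];
}
```

`LoopA` means the intersect with A is advanced past the current hit and the lookup is repeated. A complete implementation also needs the mirrored table for tB < tA and the tables for intersection and difference, none of which appear in this sketch.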
Materials/Surfaces -> GPU Texture
Material/Surface/Scintillator properties
Material/surface boundary : 4 indices
Primitives labelled with unique boundary index
simple/fast properties + reemission wavelength
G4 Structure Tree -> Instance+Global Arrays -> OptiX
Group structure into repeated instances + global remainder:
instancing -> huge memory savings for JUNO PMTs
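One way to picture the split into repeated instances plus a global remainder is to label every volume with a digest of its shape subtree and count how often each digest occurs. The sketch below assumes exactly that. The digest strings, the `min_repeats` threshold and all names are hypothetical, and the actual criteria used by GInstancer are not shown in these slides.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Grouping
{
    std::map<std::string, std::vector<int> > repeats;  // digest -> volume indices
    std::vector<int> remainder;                        // volumes kept in the global geometry
};

// Volumes whose digest occurs at least min_repeats times become instances
// of one shared geometry; everything else stays in the remainder.
inline Grouping group_by_digest(const std::vector<std::string>& digest, int min_repeats)
{
    assert(min_repeats >= 1);
    std::map<std::string, int> count;
    for (std::size_t i = 0; i < digest.size(); ++i) ++count[digest[i]];

    Grouping g;
    for (std::size_t i = 0; i < digest.size(); ++i)
    {
        if (count[digest[i]] >= min_repeats) g.repeats[digest[i]].push_back(static_cast<int>(i));
        else                                 g.remainder.push_back(static_cast<int>(i));
    }
    return g;
}
```

Each repeated group then needs only one copy of its geometry plus one transform per placement, which is where the memory saving comes from.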
Geant4 -> Opticks/GGeo -> OptiX
Multi-stage translation
Structural volumes : G4PVPlacement ->
Solid shapes : G4VSolid ->
Material/surface properties as function of wavelength
Translation steered by X4 package
https://bitbucket.org/simoncblyth/opticks/src/master/extg4/X4PhysicalVolume.hh
Form of GPU Detector Geometry
JUNO: ~300,000 GVolume -> ~10 GMergedMesh
JUNO: ~300,000 GVolume : mostly small repeated groups (PMTs)
GGeo/GInstancer
For each repeat+remainder create GMergedMesh:
GMergedMesh -> IAS+GAS
https://bitbucket.org/simoncblyth/opticks/src/master/ggeo/GInstancer.hh
IAS < Inst < Solid < Prim < Node
    struct CSGFoundry
    {
        void upload();   // to GPU
        ...
        std::vector<CSGSolid>  solid ;   // compounds (eg PMT)
        std::vector<CSGPrim>   prim ;
        std::vector<CSGNode>   node ;    // shapes, operators
        std::vector<float4>    plan ;    // planes
        std::vector<qat4>      tran ;    // CSG transforms
        std::vector<qat4>      itra ;    // inverse CSG transforms
        std::vector<qat4>      inst ;    // instance transforms

        // entire geometry in four GPU allocations
        CSGPrim*    d_prim ;
        CSGNode*    d_node ;
        float4*     d_plan ;
        qat4*       d_itra ;
    };
referencing by offset, count
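Referencing by (offset, count) keeps every level in one flat array, so the whole geometry uploads as a few contiguous buffers with no pointers to patch. The following sketch uses deliberately simplified, hypothetical structs (`Solid`, `Prim`, `Node`, `Foundry`). They are not the real `CSGSolid`/`CSGPrim`/`CSGNode` layouts, which the slides do not show.

```cpp
#include <cassert>
#include <vector>

struct Node  { int shape_type; };                 // hypothetical payload
struct Prim  { int node_offset; int num_node; };  // range within the node array
struct Solid { int prim_offset; int num_prim; };  // range within the prim array

struct Foundry
{
    std::vector<Solid> solid;
    std::vector<Prim>  prim;
    std::vector<Node>  node;

    // Append one solid given the node list of each of its prims;
    // returns the index of the new solid.
    int add_solid(const std::vector<std::vector<Node> >& prims)
    {
        Solid so = { static_cast<int>(prim.size()), static_cast<int>(prims.size()) };
        for (const std::vector<Node>& nodes : prims)
        {
            Prim pr = { static_cast<int>(node.size()), static_cast<int>(nodes.size()) };
            prim.push_back(pr);
            node.insert(node.end(), nodes.begin(), nodes.end());
        }
        solid.push_back(so);
        return static_cast<int>(solid.size()) - 1;
    }

    // Resolve solid -> prim -> node purely by index arithmetic.
    const Node& node_at(int solid_idx, int prim_in_solid, int node_in_prim) const
    {
        const Solid& so = solid[solid_idx];
        assert(prim_in_solid < so.num_prim);
        const Prim& pr = prim[so.prim_offset + prim_in_solid];
        assert(node_in_prim < pr.num_node);
        return node[pr.node_offset + node_in_prim];
    }
};
```

Because only integer offsets are stored, the same lookup works unchanged on the device once the arrays have been copied there.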
GAS : Geometry Acceleration Structure
IAS : Instance Acceleration Structure
CSG : Constructive Solid Geometry
OptiX supports multiple instance levels : IAS->IAS->GAS
BUT: Simple two-level is faster : works in hardware RT Cores
SBT : Shader Binding Table
Flexibly binds together:
Hidden in OptiX 1-6 APIs
General Geometry Translation
BUT: every new geometry likely to have problems
surface based intersection in float precision
JUNO Opticks OptiX 7 Ray-trace
"CSGFoundry" CPU/GPU Geometry
-e t0, : NOT 0 : 3084:sWorld : exclude global remainder volumes
Same viewpoint, vary GPU geometry
Recent Geometry Fixes
Substantial speedup after fixing geometry issues
ALL PMTs : 0.0097 -> 0.0061 (x1.6 faster)
ALL : 0.6240 -> 0.0054 (x115 faster)
idx | -e | time(s) | relative | enabled geometry description 3dbec4dc |
---|---|---|---|---|
0 | 5, | 0.0004 | 0.0643 | ONLY: 1:sStrutBallhead |
1 | 9, | 0.0004 | 0.0658 | ONLY: 130:sPanel |
2 | 7, | 0.0005 | 0.0782 | ONLY: 1:base_steel |
3 | 8, | 0.0006 | 0.0966 | ONLY: 1:uni_acrylic1 |
4 | 6, | 0.0006 | 0.1009 | ONLY: 1:uni1 |
5 | 1, | 0.0009 | 0.1476 | ONLY: 5:PMT_3inch_pmt_solid FAST cf 20in |
6 | 4, | 0.0015 | 0.2386 | ONLY: 4:mask_PMT_20inch_vetosMask |
7 | 3, | 0.0033 | 0.5373 | ONLY: 5:HamamatsuR12860sMask SLOW cf 3in |
8 | 0, | 0.0040 | 0.6556 | ONLY: 3084:sWorld |
9 | 2, | 0.0040 | 0.6627 | ONLY: 5:NNVTMCPPMTsMask SLOW cf 3in |
10 | t4, | 0.0050 | 0.8307 | EXCL: 4:mask_PMT_20inch_vetosMask |
11 | t2, | 0.0051 | 0.8391 | EXCL: 5:NNVTMCPPMTsMask |
12 | t3, | 0.0052 | 0.8514 | EXCL: 5:HamamatsuR12860sMask |
13 | t6, | 0.0053 | 0.8799 | EXCL: 1:uni1 |
14 | t7, | 0.0054 | 0.8809 | EXCL: 1:base_steel |
15 | t0 | 0.0054 | 0.8843 | ALL |
16 | t5, | 0.0054 | 0.8843 | EXCL: 1:sStrutBallhead |
17 | t9, | 0.0054 | 0.8855 | EXCL: 130:sPanel |
18 | t1, | 0.0054 | 0.8860 | EXCL: 5:PMT_3inch_pmt_solid |
19 | t8, | 0.0055 | 0.9013 | EXCL: 1:uni_acrylic1 |
20 | t0, | 0.0059 | 0.9753 | EXCL: 3084:sWorld |
21 | 1,2,3,4 | 0.0061 | 1.0000 | ONLY PMT |
22 | t8,0 | 0.0062 | 1.0217 | EXCL: 1:uni_acrylic1 3084:sWorld |
Random Aligned Bi-Simulation
Same inputs to Opticks and Geant4:
Common recording into OpticksEvents:
Aligned random consumption, direct comparison:
Bi-simulations of all JUNO solids, with millions of photons
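The idea behind random alignment can be shown with a toy model: two implementations draw from the same pre-generated sequence in the same order, so their histories can be compared step by step. Everything in this sketch is hypothetical (the `RandomStream` type, the absorb-only "physics", the choice of float versus double). It is not how Opticks or Geant4 consume randoms internally.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A fixed sequence of uniform randoms for one photon, consumed via a cursor.
// In practice the values would come from a generator seeded per photon.
struct RandomStream
{
    std::vector<double> u;
    std::size_t cursor;

    explicit RandomStream(std::vector<double> values) : u(std::move(values)), cursor(0) {}

    double next() { assert(cursor < u.size()); return u[cursor++]; }
};

// Toy propagation: at each step the photon is absorbed with probability
// absorb_prob. Real selects the arithmetic precision of the implementation.
// Returns the number of steps taken.
template <typename Real>
int steps_until_absorbed(RandomStream& rs, Real absorb_prob)
{
    int steps = 0;
    while (rs.cursor < rs.u.size())
    {
        ++steps;
        if (static_cast<Real>(rs.next()) < absorb_prob) break;
    }
    return steps;
}
```

When both implementations consume the same number of randoms and record the same history, any remaining difference in positions or times can be attributed to geometry or precision rather than to statistics.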
Primary sources of problems
Primary cause : float vs double
Geant4 uses double precision everywhere; Opticks uses it only sparingly (double observed to cost a 10x slowdown with RTX)
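The size of the float versus double gap can be made concrete by looking at the spacing between adjacent representable values (one ULP) at a typical coordinate magnitude. This is a minimal sketch with hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Distance from x to the next representable value above it (one ULP).
inline float ulp_float(float x)
{
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

inline double ulp_double(double x)
{
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}
```

At a coordinate of 20000 (an illustrative value, for example millimetres, not taken from the slides) `ulp_float` gives 2^-9, about 0.002, while `ulp_double` gives 2^-38, about 3.6e-12. Intersections computed in float can therefore land on different sides of a thin or coincident surface than the same calculation in double.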
Conclude
Test Hardware + Software
Workstation
Software
IHEP GPU Cluster
Full JUNO Analytic Geometry j1808v5
Production Mode : does the minimum
Multi-Event Running, Measure:
Photon Launch Size : VRAM Limited
NVIDIA Quadro RTX 8000 (48 GB)
400M photons x 112 bytes ~ 45 GB
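The launch-size limit is simple arithmetic on the per-photon footprint. A minimal sketch with hypothetical function names, which ignores the VRAM also needed for geometry and other buffers:

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to hold num_photons photons on the GPU.
inline std::uint64_t launch_bytes(std::uint64_t num_photons, std::uint64_t bytes_per_photon)
{
    return num_photons * bytes_per_photon;
}

// Largest photon count that fits in a given VRAM budget.
inline std::uint64_t max_photons(std::uint64_t vram_bytes, std::uint64_t bytes_per_photon)
{
    assert(bytes_per_photon > 0);
    return vram_bytes / bytes_per_photon;
}
```

`launch_bytes(400000000, 112)` is 44.8e9 bytes, consistent with the roughly 45 GB quoted for the 48 GB card.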
JUNO analytic, 400M photons from center | Time | Speedup |
---|---|---|
Opticks RTX ON (i) | 58 s | 1650x |
Opticks RTX OFF (i) | 275 s | 350x |
Geant4 Extrap. | 95,600 s (26 hrs) | |
100M photon RTX times, avg of 10
Launch times for various geometries

Geometry | Launch (s) | Giga Rays/s | Relative to ana |
---|---|---|---|
JUNO ana | 13.2 | 0.07 | |
JUNO tri.sw | 6.9 | 0.14 | 1.9x |
JUNO tri.hw | 2.2 | 0.45 | 6.0x |
Boxtest ana | 0.59 | 1.7 | |
Boxtest tri.sw | 0.62 | 1.6 | |
Boxtest tri.hw | 0.30 | 3.3 | 1.9x |
JUNO 15k triangles, 132M without instancing
Simple Boxtest geometry gets into ballpark
OptiX Performance Tools and Tricks, David Hart, NVIDIA https://developer.nvidia.com/siggraph/2019/video/sig915-vid
Highlights
Opticks : state-of-the-art GPU ray tracing applied to optical photon simulation and integrated with Geant4, eliminating memory and time bottlenecks.
- ~30yrs/Giga-$ of HW+SW optimization applied to optical simulation
- Drastic speedup -> better detector understanding -> greater precision
- any simulation limited by optical photons can benefit
- more photon limited -> more overall speedup (99% -> 100x)
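The "99% -> 100x" remark is Amdahl's law: if a fraction of the runtime is optical photon simulation and only that part is accelerated, the overall gain is capped by the remainder. A minimal sketch with hypothetical names:

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction of the original runtime
// (optical_fraction, between 0 and 1) is accelerated by optical_speedup
// and the rest of the simulation is unchanged.
inline double overall_speedup(double optical_fraction, double optical_speedup)
{
    assert(optical_fraction >= 0.0 && optical_fraction <= 1.0);
    assert(optical_speedup > 0.0);
    return 1.0 / ((1.0 - optical_fraction) + optical_fraction / optical_speedup);
}
```

For a simulation that is 99% photon limited the result approaches, but never reaches, 100x as the optical speedup grows. With the 1650x figure quoted earlier, `overall_speedup(0.99, 1650.0)` is about 94x.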
https://bitbucket.org/simoncblyth/opticks | code repository |
https://simoncblyth.bitbucket.io | presentations and videos |
https://groups.io/g/opticks | forum/mailing list archive |
email:opticks+subscribe@groups.io | subscribe to mailing list |