http://simoncblyth.bitbucket.io/env/presentation/optical_photon_simulation_with_nvidia_optix.html (July 2015) http://simoncblyth.bitbucket.io/env/presentation/gpu_accelerated_geant4_simulation.html (Jan 2015)
OptiX Ray Tracing
Extreme speed ~200M ray intersections/second/GPU, regular releases, performance scales with CUDA cores across multiple GPUs.
Lack of multi-GPU support
Production running demands efficient use of multiple GPUs. Lack of this difficult to implement feature is a show stopper for Chroma use in production.
Chroma Features
My additions to Chroma
Missing Features
https://bitbucket.org/chroma/chroma
https://bitbucket.org/simoncblyth/chroma (my fork)
OptiX Tutorial App
Image pixels calculated by recursively bouncing rays around geometry doing shadow, reflection, refraction calculations. Runs at interactive speeds with GeForce GT 750M.
OptiX provides: CUDA compiler optimized for Ray Tracing
NVIDIA expertise on efficient GPU/multi-GPU usage
https://developer.nvidia.com/optix
https://research.nvidia.com/publication/optix-general-purpose-ray-tracing-engine
OptiX Glass Sample App
Realistic image creation uses physically based techniques and material definitions. Obvious parallels:
Same rate determining step: geometry intersection
Applying techniques/hardware developed for fast ray tracing can be hugely beneficial to optical photon simulation.
Render Split into 3x3 CUDA kernel launches, 1 thread per pixel, ~1.8s for 1.23M pixels, 2.4M tris (with [1])
[1] | MacBook Pro (2013), NVIDIA GeForce GT 750M 2048 MB (384 cores); Workstation GPU performance expected to scale by core count |
GGeoview OptiX raycast
https://bitbucket.org/simoncblyth/env/src/tip/graphics/ggeoview/
DBNS geometry raycast comparison using mobile GPU
Performance improvement ~50x
Performance Linearity with CUDA cores
OptiX sample rendering with 2 GPU IHEP workstation,
Performance linear with GPU cores, compared to laptop:
Future scaling possibilities, with VCA
OptiX apps can connect to remote Visual Computing Appliances
Clusters of ~10 VCAs are in use by design/advertising companies for interactive product rendering.
http://www.nvidia.com/object/visual-computing-appliance.html (8 Maxwell GPUs)
OptiX Control Flow
OptiX provides a ray tracing pipeline analogous to OpenGL rasterization pipeline.
Higher level API than pure CUDA, eg:
Optical Photon Simulation port currently using:
https://research.nvidia.com/sites/default/files/publications/Parker10Optix_1.pdf
Defer to NVIDIA
Adoption of OptiX is compelling
Costs of adoption
~10 Packages Developed
Organized by dependencies
Interop between OpenGL/OptiX/Thrust/CUDA
Externals:
See backup for details, source links
Basis packages
Geometry packages
GPU library interface packages
Main package
https://bitbucket.org/simoncblyth/env/src/tip/graphics/ggeoview/
Some details of GPU developments described over the next pages
Optical Physics
Supplying the OptiX Programs
Handling Outputs
Synthesis of sources
rtTrace OptiX fast geometry intersection
Photon p ; State s ; PerRayData prd ; while(bounce < bounce_max) // PSEUDO-CODE { bounce++ ray = optix::make_Ray(p.pos, p.dir,...) rtTrace(geom, ray, prd) if(!prd.boundary) break // MISS cmd = propagate_to_boundary(p, s) if(cmd == BREAK) break // ABSORB if(cmd == CONTINUE) continue // REEMIT, SCATTER // survivors pass to boundary if(s.surface_index) { cmd = propagate_at_surface(p, s, g) if(cmd == BREAK) break // SURFACE_ABSORB, SURFACE_DETECT if(cmd == CONTINUE) continue // REFLECT_DIFFUSE, REFLECT_SPECULAR } else { propagate_at_boundary(p, s) // BOUNDARY_REFLECT BOUNDARY_TRANSMIT } }
https://bitbucket.org/simoncblyth/env/src/tip/graphics/ggeoview/cu/generate.cu
Approach
Texture lookups of material/surface properties and reemission wavelengths keeps kernels simple
https://bitbucket.org/simoncblyth/env/src/tip/graphics/ggeoview/cu/rayleigh.h
https://bitbucket.org/simoncblyth/env/src/tip/graphics/ggeoview/cu/propagate.h
cuRAND library from CUDA toolkit features:
cuRAND Initialization demands large stack size
Stack sizes 10x typical for OptiX programs were needed, resulting in slow OptiX running.
Workaround:
Packaged solution into CUDAWrap
Fast GPU texture lookup
GPUs contain hardware dedicated to fast texture lookup and interpolation. Using texture lookup for all properties and reemission wavelengths keeps OptiX programs simple.
AssimpWrap creates GGeo boundary instances and labels triangles with boundary indices, boundaries contain:
Properties are interpolated onto a common wavelength domain
Interleaved properties used to create single boundary texture 2d (wavelength, qty line) containing ~50 boundaries, 4 float4 each. CUDA tex2d property lookup:
float nmi = (nm - wavelength_domain.x)/wavelength_domain.z + 0.5f ; float4 material1 = tex2D(wavelength_texture, nmi, line + 0.5f ); float refractive_index = material1.x ; float absorption_length = material1.y ; float scattering_length = material1.z ;
https://github.com/simoncblyth/assimp (my fork of Assimp)
Inverting Reemission CDF allows using texture lookup to obtain reemission wavelength from uniform random throws. Using 4096 probability bins.
128 bit compressed record
Compression necessary to work with ~30M records (30M * 128bit = ~500 MB)
GPU memory 2 GB
Up to 10 steps of the photon propagation are recorded.
Photon buffer : 4 * float4 = 512 bits/photon
Record buffer : 2 * short4 = 2*16*4 = 128 bits/record
Compression uses known domains of position (geometry center, extent), time (0:200ns), wavelength, polarization.
Union trickery allows recording ints into floats
Scintillation Photons colored by material
Visualization of 30M scintillation photon records from an 100 GeV muon crossing Dayabay AD. Primaries are simulated by Geant4, Scintillation "steps" of the primaries are transferred to the GPU, where photons are generated, propagated using NVIDIA OptiX and visualized using OpenGL. The dots represent propagation step positions with colors corresponding to materials.
Selecting photons by flag/material sequences, requires indexing integer sequences.
Indexing history/material sequences for 3M photons:
Packaged indexing into ThrustRap ThrustIdx
Selection by flag sequence
Selection of scintillation photons by flag sequence (all boundary transmit) from a 100 GeV muon crossing Dayabay AD. Primaries are simulated by Geant4, Scintillation "steps" of the primaries are transferred to the GPU. The dots represent OptiX calculated photon steps with colors corresponding to materials.
Distributed with CUDA
GPU performance without developing CUDA kernels
Avoid CPU for performance
Once debugged can skip:
Possible approach:
max_record:10 max_bounce:9 Cerenkov Ck*4.59 Scintillation photons 0.61M 2.8M 2.8M --(bytes)-------------------------------------- genstep size 736K 1.3M photons size 37M 172M records size 97M 430M --(seconds)------------------------------------ createOpenGLCtx 0.692 - 0.599 loadGeometry 1.570 - 1.302 interpGeometry 0.211 - 0.190 initOptiX 4.216 - 6.521 loadGenstep 0.011 - 0.014 hostEvtAllocation ** 3.540 16.275 16.179 uploadEvt 0.232 1.066 0.552 generatePropagate ++ 1.404 6.453 7.907 evtDownload ** 0.348 1.602 1.780 evtSave ** 0.437 2.008 2.006 sequenceIndex 0.134 0.614 0.359 ----------------------------------------------- ** scales by photon count
OptiX Julia Set Sample
Ray tracing with purely analytic geometry, ie no triangles. Application to PMTs may allow drastic reduction in memory usage and access costs.
The large number of PMTs may require a more memory efficient geometry representation using OptiX features:
Memory access (not calculation) typically limits GPU performance, improving memory efficiency expected to improve performance.
Test New Framework with IHEP 4-GPU workstation (together with Tao Lin)
Optical Photon Simulation
G4DAE Geometry Exporter
On the following pages:
GGeo/OptiX using inverted CDF reemission wavelength lookups (4096 bins)
Geant4/DetSim wavelength distribution has a blip at 200nm, corresponding to edge of water refractive index properties.
Cerenkov Photon Steps
Cerenkov photons steps from a 100 GeV muon crossing Dayabay AD. Primaries are simulated by Geant4, Cerenkov "steps" of the primaries are transferred to the GPU. The dots represent OptiX calculated photon steps with colors corresponding to materials.
Replacing Python, NumPy, PyZMQ
Boost Libraries (filesystem, thread, program_options, logging, regex, ptree, Asio) and Asio-ZMQ, ZMQ used to replace python packages.
NPY format convenient for C++/Python interop:
a = np.load("photons.npy")
Array persistency/manipulations inspired by NumPy, using NPY serialization format
Asynchronous IO of Geant4 Steps, Photons, Hits. Communicates with remote G4DAEOpticks process, receiving steps and replying with hits.
cuRAND init and persist curandState (pure CUDA)
https://bitbucket.org/simoncblyth/env/src/tip/numerics/npy/
https://bitbucket.org/simoncblyth/env/src/tip/boost/basio/numpyserver/
https://bitbucket.org/simoncblyth/env/src/tip/cuda/cudawrap/
Replacing Python packages
Many C++ classes required to replace:
Migration allows use of modern OpenGL 4.1:
GPU Geometry representation, NPY persistency
G4DAE -> GGeo geometry
GGeo -> OptiX geometry, OptiX launch control
OpenGL shader based 3D visualization
https://bitbucket.org/simoncblyth/env/src/tip/optix/ggeo/
https://bitbucket.org/simoncblyth/env/src/tip/graphics/assimpwrap/
https://bitbucket.org/simoncblyth/env/src/tip/graphics/oglrap/
https://bitbucket.org/simoncblyth/env/src/tip/graphics/optixrap/
__device__ int propagate_to_boundary( Photon& p, State& s, curandState &rng) { float absorption_distance = -s.material1.y*logf(curand_uniform(&rng)); // .y:absorption_length float scattering_distance = -s.material1.z*logf(curand_uniform(&rng)); // .z:scattering_length if (absorption_distance <= scattering_distance) { if (absorption_distance <= s.distance_to_boundary) { p.time += absorption_distance/(SPEED_OF_LIGHT/s.material1.x); // .x:refractive_index p.position += absorption_distance*p.direction; if (curand_uniform(&rng) < s.material1.w) // .w:reemission_prob { // non-scintillators have zero reemission_prob p.wavelength = reemission_lookup(curand_uniform(&rng)); p.direction = uniform_sphere(&rng); p.polarization = normalize(cross(uniform_sphere(&rng), p.direction)); s.flag = BULK_REEMIT ; return CONTINUE; } else { s.flag = BULK_ABSORB ; return BREAK; } } // otherwise sail to boundary } else // scattering ..
Position, direction, polarization XYZ + time, wavelength, weight
Position, direction, polarization XYZ + time, wavelength, weight
GGeoView
Cerenkov photons from an 100 GeV muon travelling from right to left across Dayabay AD. Primaries are simulated by Geant4, Cerenkov "steps" of the primaries are transferred to the GPU. The dots represent OptiX calculated first intersections of GPU generated photons with colors corresponding to material boundaries: (red) GdDopedLS:Acrylic, (green) LiquidScintillator:Acrylic, (blue) Acrylic:LiquidScintillator, (white) IwsWater:UnstStainlessSteel, (grey) others. The red lines represent the positions and directions of the "steps" with an arbitrary scaling for visibility.