JUNO+Opticks : Progress Towards Production

Open source, https://bitbucket.org/simoncblyth/opticks

Opticks replaces Geant4 optical photon simulation with an equivalent implementation that benefits from state-of-the-art GPU ray tracing from NVIDIA OptiX, which can yield performance >1000x Geant4.

All optically relevant aspects of Geant4 context are translated+copied to GPU:

geometry : solids, structure, material+surface properties

generation : Cerenkov + Scintillation (using Gensteps from Geant4)

propagation : Rayleigh scattering, absorption, reemission, boundary

COMPLETED : Full Simulation re-implementation for OptiX 7 API

NEXT : A-B Validation iteration, geometry issue fixes

Simon C Blyth, IHEP, CAS — 18 July 2022

Geant4OpticksWorkflow

Geant4OpticksWorkflow 2

Validation : Opticks-Geant4 A-B testing

: a.genstep :        (1, 6, 4) : generation param
: a.seed    :         (10000,) : photon -> genstep
: a.inphoton:    (10000, 4, 4) : input photon
: a.photon  :    (10000, 4, 4) : final photon param
: a.hit     :     (9633, 4, 4) : hitmasked photon
: a.seq     :       (10000, 2) : history sequence
: a.record  :(10000, 10, 4, 4) : steppoint record
: a.rec     :(10000, 10, 2, 4) : compressed record
: a.domain  :        (2, 4, 4) : used for compression
: a.sframe  :        (4, 4, 4) : used for targetting
: a.flat    :      (10000, 64) : random numbers tagged
: a.tag     :       (10000, 4) : compressed tag enum
: a.prd     :(10000, 10, 2, 4) : per-ray-data isect

Every random consumed by every photon step

A : Opticks : GPU geometry + optical sim.

full step point recording (QEvent, SEvt)

B : Geant4 simulation

full step point recording (U4Recorder, SEvt)

A-B : comparison via python/NumPy analysis

Random Aligned: direct comparison of photon steppoint record and history arrays
Non-aligned: statistical comparison of history fractions

July => ??? : A-B validation iteration : random alignment, deviation checks

March : Shifted focus, geom issues => optical sim, July : A-B iteration starting

/env/presentation/notes/progress/2022_jul.png

A : Opticks GPU sim, B : Geant4 CPU sim (both fully instrumented: point recording, histories, tagged random consumption)

Opticks Packages : Many Removed, Many Added

pkg	hh/cc/cu/py	Old CUDA Simulation and Visualization
cudarap	16/8/3/0	old CUDA interface
thrustrap	20/2/6/1	old CUDA thrust interface : photon seeding, indexing
optixrap	43/30/3/2	old optical photon simulation implemented in old OptiX API
okop	14/10/1/0	high level pkg on top of optixrap
oglrap	38/29/0/0	old OpenGL based visualization of opticksgeo geometry
opticksgl	9/5/0/0	integrating OpenGL with okop OptiX

pkg	hh/cc/cu/py	New Geometry model and CUDA simulation
CSG_GGeo	3/2/0/0	GGeo to CSG geometry translation
GeoChain	3/2/0/0	geometry translation testing
CSG	37/18/0/7	New geometry model
CSGOptiX	20/14/4/2	CSG intersection with OptiX 7
qudarap	43/25/13/1	CUDA optical photon simulation, CUDA upload, download, textures
u4	32/10/0/2	New Geant4 interface, genstep collection, U4Recorder
gdxml	6/4/0/0	GDML loaded as XML for fixups
g4cx	3/2/0/0	New top level package integrating Geant4 and CSGOptiX

green : active development, blue : plan to remove, red : removed

Simulation : Why so much had to be re-implemented ?

Huge change unavoidable from new OptiX API --> So profit from rethink of simulation code --> 2nd impl advantage

Old simulation (OptiXRap)	New simulation (QUDARap/qsim.h + CSGOptiX)
implemented on top of old OptiX API	pure CUDA implementation OptiX use kept separate, just for intersection
rather monolithic .cu GPU only implementation deep stack of support code	many small headers many GPU+CPU headers shallow stack : QUDARap depends only on SYSRap
most code in GPU only context, even when not needing OptiX or CUDA	strict code segregation code not needing GPU in SYSRap not QUDARap
testing : GPU only, coarse	testing : CPU+GPU , fine-grained curand mocking on CPU
limited code sharing between A and B	maximal sharing : SEvt.hh, sphoton.h, ...
timeconsuming manual random alignment conducted via debugger	new systematic approach to random alignment

Goals of re-implementation : flexible, modular GPU simulation, easily testable, less code

code reduction, sharing as much as possible between A and B
fine grained testing on both CPU and GPU, with GPU curand mocking
profit from several years of CUDA experience, eg QSim.hh/qsim.h host/device counterpart pattern:
- hostside initializes and uploads device side counterpart --> device side hits ground running

cx/CSGOptiX7.cu:simulate : High Level, details in qudarap+sysrap

void simulate(const uint3& idx, const uint3& dim)
{
    // skip some setup

    curandState rng = sim->rngstate[idx] ;
    sim->generate_photon(ctx.p, rng, gs, idx, gs_idx );

    int command = START ;
    int bounce = 0 ;
    ctx.point(bounce);

    while( bounce < evt->max_bounce )
    {
        trace( params.handle, ctx.p.pos, ctx.p.mom,
              params.tmin, params.tmax, prd);
        if( prd->boundary() == 0xffffu ) break ;
        ctx.trace(bounce); // record PerRayData intersect

        command = sim->propagate(bounce, rng, ctx);

        bounce++;
        ctx.point(bounce) ;
        if(command == BREAK) break ;
    }
    ctx.end();  // write seq, tag, flat 
    evt->photon[idx] = ctx.p ;  // final photon 
}

https://bitbucket.org/simoncblyth/opticks/src/master/CSGOptiX/CSGOptiX7.cu

Opticks Packages : Many more to be removed

pkg	hh/cc/cu/py	Base
okconf	3/2/0/0	config version detection
sysrap	150/70/1/8	basis types, new array NP.hh
boostrap	46/42/0/0	boost tools
npy	181/165/0/6	geo primitives, old array NPY.hpp
optickscore	70/62/0/1	old core, argument parsing

pkg	hh/cc/cu/py	Geometry
ggeo	68/65/0/2	complete geometry model : no Geant4 dependency
extg4	62/53/0/0	Geant4 geometry translation into GGeo model
opticksgeo	10/6/0/0	high level pkg on top of ggeo
cfg4	126/117/0/3	Old Geant4 comparison machinery, eg CRecorder

green : active development, blue : plan to remove, red : removed

Systematic Approach to Random Alignment of Two Simulations : A side

 enum {
     stag_undef      =    0,
     stag_to_sci     =    1,
     stag_to_bnd     =    2,
     stag_to_sca     =    3,
     stag_to_abs     =    4,
     stag_at_burn_sf_sd = 5,
     stag_at_ref     =    6,
     stag_sf_burn    =    7,
     stag_sc         =    8,
     stag_to_ree     =    9,
     stag_re_wl      =   10,
     stag_re_mom_ph  =   11,
     stag_re_mom_ct  =   12,
     stag_re_pol_ph  =   13,
     stag_re_pol_ct  =   14,
     stag_hp_ph      =   15
     //stag_hp_ct      =   16
  };   // 4 bit tags pack better

run simulation with all random consumption tagged

qudarap/qsim.h:

 459 inline QSIM_METHOD int qsim::propagate_to_boundary(unsigned& flag,
          curandStateXORWOW& rng, sctx& ctx)
 460 {
 462     const sstate& s = ctx.s ;
 464     const float& absorption_length = s.material1.y ;
 465     const float& scattering_length = s.material1.z ;
 471 #ifdef DEBUG_TAG
 472     float u_to_sci = curand_uniform(&rng) ; // burn to align 
 473     float u_to_bnd = curand_uniform(&rng) ; // burn to align 
 474 #endif
 475     float u_scattering = curand_uniform(&rng) ;
 476     float u_absorption = curand_uniform(&rng) ;
 478 #ifdef DEBUG_TAG
 479     stagr& tagr = ctx.tagr ;
 480     tagr.add( stag_to_sci, u_to_sci);
 481     tagr.add( stag_to_bnd, u_to_bnd);
 482     tagr.add( stag_to_sca, u_scattering);
 483     tagr.add( stag_to_abs, u_absorption);
 491 #endif
 489     float scattering_distance = -scattering_length*logf(u_scattering);
 490     float absorption_distance = -absorption_length*logf(u_absorption);
 ...

sysrap/stag.h/stagr collects into tag.npy flat.npy : A-side straightforward as already control all code

Systematic Approach to Random Alignment of Two Simulations : B side

enum {
U4Stack_Unclassified            = 0,
U4Stack_RestDiscreteReset       = 1,
U4Stack_DiscreteReset           = 2,
U4Stack_ScintDiscreteReset      = 3,
U4Stack_BoundaryDiscreteReset   = 4,
U4Stack_RayleighDiscreteReset   = 5,
U4Stack_AbsorptionDiscreteReset = 6,
U4Stack_BoundaryBurn_SurfaceReflectTransmitAbsorb = 7,
U4Stack_BoundaryDiDiTransCoeff  = 8,
U4Stack_AbsorptionEffDetect     = 9,
U4Stack_RayleighScatter         = 10,
U4Stack_BoundaryDiMeReflectivity = 11,
U4Stack_ChooseReflection         = 12,
U4Stack_RandomDirection          = 13,
U4Stack_LambertianRand           = 14,
U4Stack_Reemission               = 15 // 4-bit better
};

u4/InstrumentedG4OpBoundaryProcess.hh
u4/InstrumentedG4OpBoundaryProcess.cc
u4/tests/ShimG4OpRayleigh.hh
u4/tests/ShimG4OpRayleigh.cc
u4/tests/ShimG4OpAbsorption.hh
u4/tests/ShimG4OpAbsorption.cc
u4/tests/DsG4Scintillation.h
u4/tests/DsG4Scintillation.cc

A : collect curand_uniform streams eg:(1M,256)
u4/U4Random : get G4UniformRand to supply precooked u
manual tagging -> override all u consuming Geant4 code

u4/tests/ShimG4OpAbsorption.cc:

 32  void ShimG4OpAbsorption::ResetNumberOfInteractionLengthLeft()
 33 {
 35     G4double u = G4UniformRand() ;
 36     SEvt::AddTag( U4Stack_AbsorptionDiscreteReset, u );
 45     theNumberOfInteractionLengthLeft = -1.*G4Log(u) ;
 49 }

 882 void SEvt::addTag(unsigned tag, float flat){
 883 {
 885     stagr& tagr = current_ctx.tagr ;
 899     tagr.add(tag,flat)  ;
 929 }

sysrap/SEvt.cc:SEvt::AddTag using same stagr struct as A

u4/U4Process::ClearNumberOfInteractionLengthLeft

makes Geant4 consumption more regular -> easier to align
used from U4Recorder::UserSteppingAction_Optical

A - B : Compare tag arrays with NumPy to check alignment

epsilon:~ blyth$ gx
/Users/blyth/opticks/g4cx
epsilon:g4cx blyth$ ./ab.sh

In [1]: AB(54)
Out[1]:
A(54) : TO BT BR BR BT SA                     B(54) : TO BT BR BR BT SA
       A.t : (1000, 64)                              B.t : (1000, 64)
      A.t2 : (1000, 64)                             B.t2 : (1000, 64)

 0 :     0.7082 :  1 :     to_sci              0 :     0.7082 :  3 : ScintDiscreteReset :
 1 :     0.0797 :  2 :     to_bnd              1 :     0.0797 :  4 : BoundaryDiscreteReset :
 2 :     0.1970 :  3 :     to_sca              2 :     0.1970 :  5 : RayleighDiscreteReset :
 3 :     0.4009 :  4 :     to_abs              3 :     0.4009 :  6 : AbsorptionDiscreteReset :
 4 :     0.3778 :  5 : at_burn_sf_sd           4 :     0.3778 :  7 : BoundaryBurn_SurfaceReflectTransmitAbsorb :
 5 :     0.7441 :  6 :     at_ref              5 :     0.7441 :  8 : BoundaryDiDiTransCoeff :
...
32 :     0.2452 :  1 :     to_sci             32 :     0.2452 :  3 : ScintDiscreteReset :
33 :     0.5043 :  2 :     to_bnd             33 :     0.5043 :  4 : BoundaryDiscreteReset :
34 :     0.1788 :  3 :     to_sca             34 :     0.1788 :  5 : RayleighDiscreteReset :
35 :     0.8004 :  4 :     to_abs             35 :     0.8004 :  6 : AbsorptionDiscreteReset :
36 :     0.3335 :  5 : at_burn_sf_sd          36 :     0.3335 :  7 : BoundaryBurn_SurfaceReflectTransmitAbsorb :
37 :     0.7170 :  7 :    sf_burn             37 :     0.7170 :  9 : AbsorptionEffDetect :
38 :     0.0000 :  0 :      undef             38 :     0.0000 :  0 : Unclassified :

A : /tmp/blyth/opticks/RaindropRockAirWaterSmall/G4CXSimulateTest/ALL
B : /tmp/blyth/opticks/RaindropRockAirWaterSmall/U4RecorderTest/ShimG4OpAbsorption_FLOAT_ShimG4OpRayleigh_FLOAT/ALL
In [2]:

sm = np.all( a.seq[:,0] == b.seq[:,0] )            : True
ww = np.where( a.seq[:,0] != b.seq[:,0] )[0]       : []          History alignment : can be accidental 
we = np.where( A.t.view('|S64') != B.t2.view('|S64') )[0] : []   Tag enum alignment : much more stringent 
 match sequences of 64 tags by viewing as 64 char string

`RaindropRockAirWaterSmall_UpXZ1000_chk.png`

A - B : 1M : RaindropRockAirWaterSmall : simple geometry machinery test

a.record shape (1000000, 10, 4, 4) : photon pos/mom/pol/flags/.. at 10 step points
A-B deviation from direct subtraction eg: w = np.where(np.abs(a.record-b.record) > 0.01)

epsilon:~ blyth$ ~/opticks/g4cx/ab.sh

sm = np.all( a.seq[:,0] == b.seq[:,0] )            : True       ## all 1M histories match 
we = np.where( A.t.view('|S64') != B.t2.view('|S64') )[0] : [ 41595 114799 125032 158475 174993 243023 ... ]

In [2]: seqhis_(a.seq[we,0])      ## 22/1M not enum aligned : all with truncated record 
Out[2]: ['TO BT SC BR BR BR BR BR BR BR', 'TO BT SC BR BR BR BR BR BR BR', ...

In [4]: wm = np.where( A.t.view('|S64') == B.t2.view('|S64') )[0] ; wm, len(wm)     ## select the A-B enum aligned 
Out[4]: array([     0,      1,      2,      3,  ...,  999995, 999996, 999997, 999998, 999999]), 999978

In [29]: np.abs(a.record[wm] - b.record[wm]).max()      ## max deviation : 0.018 mm  
Out[29]: 0.018722534

In [30]: np.where( np.abs(a.record[wm] - b.record[wm]) > 0.01 )    ## select photon step points with deviation > 0.01 
Out[30]:
(array([ 18157, 125121, 467717, 499537, 624529, 695184, 759091, 779861, 938053]),    ## photon index 
 array([1, 1, 3, 1, 1, 2, 4, 1, 1]),             ## step point index 
 array([0, 0, 0, 0, 0, 0, 0, 0, 0]),             ## first photon row is position, time 
 array([2, 2, 2, 2, 2, 0, 2, 2, 2]))             ## 2:Z 0:X coordinate 


In [31]: dv = wm[np.unique(np.where( np.abs(a.record[wm] - b.record[wm]) > 0.01 )[0])]   ## deviant indices  

In [36]: seqhis_(a.seq[dv,0])   ## deviant step point mostly AB/SC position 
Out[36]: ['TO AB','TO AB','TO BT BT AB','TO AB','TO AB','TO BR AB','TO SC BT BT SA','TO AB','TO AB']

22/1M not tag aligned, 9(466)/1M with deviation > 0.01(0.001)

22/1M mis-aligned : tag arrays show alignment lost at 1st BR after SC

Select non-aligned : we = np.where( A.t.view('|S64') != B.t2.view('|S64') )[0]

In [4]: AB(we[0])     ## all 22/1M loose alignment at 1st BR after SC 
Out[4]: A(41595) : TO BT SC BR BR BR BR BR BR BR          B(41595) : TO BT SC BR BR BR BR BR BR BR

 0 :     0.6518 :  1 :     to_sci                  0 :     0.6518 :  3 : ScintDiscreteReset :
 1 :     0.3244 :  2 :     to_bnd                  1 :     0.3244 :  4 : BoundaryDiscreteReset :
 2 :     0.2309 :  3 :     to_sca                  2 :     0.2309 :  5 : RayleighDiscreteReset :
 3 :     0.7327 :  4 :     to_abs                  3 :     0.7327 :  6 : AbsorptionDiscreteReset :
 4 :     0.1133 :  5 : at_burn_sf_sd               4 :     0.1133 :  7 : BoundaryBurn_SurfaceReflectTransmitAbsorb :
 5 :     0.6275 :  6 :     at_ref                  5 :     0.6275 :  8 : BoundaryDiDiTransCoeff :

 6 :     0.4789 :  1 :     to_sci                  6 :     0.4789 :  3 : ScintDiscreteReset :
 7 :     0.6990 :  2 :     to_bnd                  7 :     0.6990 :  4 : BoundaryDiscreteReset :
 8 :     0.9998 :  3 :     to_sca                  8 :     0.9998 :  5 : RayleighDiscreteReset :
 9 :     0.0067 :  4 :     to_abs                  9 :     0.0067 :  6 : AbsorptionDiscreteReset :
10 :     0.1797 :  8 :         sc                 10 :     0.1797 : 10 : RayleighScatter :
11 :     0.0088 :  8 :         sc                 11 :     0.0088 : 10 : RayleighScatter :
12 :     0.5316 :  8 :         sc                 12 :     0.5316 : 10 : RayleighScatter :
13 :     0.8436 :  8 :         sc                 13 :     0.8436 : 10 : RayleighScatter :
14 :     0.4477 :  8 :         sc                 14 :     0.4477 : 10 : RayleighScatter :

15 :     0.4004 :  1 :     to_sci                 15 :     0.4004 :  3 : ScintDiscreteReset :
16 :     0.8328 :  2 :     to_bnd                 16 :     0.8328 :  4 : BoundaryDiscreteReset :
17 :     0.0016 :  3 :     to_sca                 17 :     0.0016 :  5 : RayleighDiscreteReset :
18 :     0.2059 :  4 :     to_abs                 18 :     0.2059 :  6 : AbsorptionDiscreteReset :
19 :     0.5832 :  5 : at_burn_sf_sd              19 :     0.5832 :  7 : BoundaryBurn_SurfaceReflectTransmitAbsorb :
20 :     0.7585 :  6 :     at_ref   <----- ALIGNMENT LOST HERE : DO NOT GET EXPECTED TAG : BoundaryDiDiTransCoeff 
                                                  20 :     0.7585 :  3 : ScintDiscreteReset :
21 :     0.6396 :  1 :     to_sci                 21 :     0.6396 :  4 : BoundaryDiscreteReset :
22 :     0.9837 :  2 :     to_bnd                 22 :     0.9837 :  5 : RayleighDiscreteReset :

TODO: run the non-aligned indices in debugger to understand unexpected G4 random consumption

A - B : Deviant Scattering SC and Absorption AB Position : __logf(u)

"Raindrop" geometry : 50mm radius sphere of water in air

near perfect alignment : 22/1M not enum aligned
largest deviation ~0.02mm from SC/AB position

Previously assumed float/double difference between Opticks/Geant4

BUT: changing Geant4 to use float -> almost no difference

sysrap/tests/logTest.cu : CUDA comparison

logf(u)
__logf(u) (used with -use_fast_math)

Deviations small,

BUT: relevant ~0.5e-7 for u > 0.998

Using KLUDGE_FASTMATH_LOGF reduces SC/AB position deviation

##define KLUDGE_FASTMATH_LOGF(u) (u < 0.998f ? __logf(u) : __logf(u) - 0.46735790f*1e-7f )

Started : A - B aligned iteration with full JUNO GDML geometry

Simple geometry : ~perfect A-B aligned match
Full geometry : JUST STARTED, CROP OF ISSUES

Configure geometry and input photons:

geom_
oip

Run A and B simulations and "T" simtrace running:

cd ~/opticks/g4cx
./gxs.sh       ## A
./gxt.sh       ## T  collect "simtrace" arrays of intersect

cd ~/opticks/u4
./u4s.sh       ## B

gxt.sh uses "simtrace" arrays to make 2D slice plots (pyvista, matplotlib)

cd ~/opticks/g4cx
./gxt.sh ana
./ab.sh           # compare A and B with NumPy

`J000_DownXZ1000_PMT_ellipsoid_transform_again.png`

"simtrace" : scatter plot of intersects, looks like missed ellipsoid transform

`J000_DownXZ1000_virtual_too.png`

gx ; EYE=0,1000,200 LOOK=0,0,200 ./ab.sh recplot

A:red, B:blue : virtual Water/Water "hatbox" skipped in A: prevents alignment

`hama_body_log_DownXZ1000_no_issue_with_hama_body_log.png`

single PMT test : no-such missed ellipsoid transform issue

Next steps in detail

Aligned A-B validation iteration

Very powerful tool for finding geometry issues

complete random aligned iteration
find solutions for bookkeeping + geometry issues eg:
- B: Pyrex/Pyrex "microstep" from PMT virtuals : will need stream rewind to stay aligned

Non-aligned A-B iteration : statistical comparison

less resource intensive than aligned running : no large tag+flat arrays
handle ordinary genstep running, not restricted to input photons
statistical comparison can be used with much larger samples than aligned

Update JUNO-Opticks integration to use new Opticks

integration designed to be simpler
additional identity info for multifilm optical model
- check performance with very large textures

Overview

JUNO Simulation -> Opticks -> GPU
Benefits:	drastically faster GPU sim. in production improved CPU sim. speed + correctness
Status:	DONE : OptiX 7 sim. re-implementation Started : A-B validation iteration

Changed gears:

sim. development -> geometry debug

Many Thanks to : Yuxiang Hu

More pioneer users very welcome

https://bitbucket.org/simoncblyth/opticks	code repository
https://simoncblyth.bitbucket.io	presentations and videos