/cvmfs/opticks.ihep.ac.cn/ok/releases/Opticks-v0.2.1/x86_64-CentOS7-gcc1120-geant4_10_04_p02-dbg
| Package | Notes |
|---|---|
| Geant4 10.4.2 | (Opticks with newer Geant4 already in use elsewhere) |
| Custom4 0.1.8 | Small package, but deeply coupled with Geant4 + JUNOSW + Opticks |
| OptiX 7.5 CUDA 11.7 | Straightforward update; new features not yet exploited |
| Opticks-v0.2.1 | Latest release https://github.com/simoncblyth/opticks/releases/tag/v0.2.1 |
Updated environment setup + distribution scripts:
Testing distribution using ctests:
cd $OPTICKS_PREFIX/tests
ctest -N                                                    # list tests
ctest --output-on-failure                                   # run all
ctest -R CSGFoundry_CreateFromSimTest --output-on-failure   # run selected
EYE=0,1.5,0 TMIN=1.3 ZOOM=4 ~/opticks/cxr_min.sh ## CSGOptiXRMTest
Using GEOM J23_1_0_rc3_ok0
Deferred geometry, switched off by tut_detsim.py options.
| Option | Deferred geometry | Notes |
|---|---|---|
| --no-guide_tube | | OptiX 7.1 has curves: thought they might enable G4Torus translation, but the docs show they are one-sided, so instead triangulate the torus [T]? |
| --debug-disable-xj | XJfixture XJanchor | Deep CSG trees: development needed to see if "listnode" (similar to G4MultiUnion) can provide a solution |
| --debug-disable-sj | SJCLSanchor SJFixture SJReceiver SJFixture | |
| --debug-disable-fa | FastenerAcrylic | |
Virtual surface shifts used to avoid degeneracy, together with defaults:
export Tub3inchPMTV3Manager__VIRTUAL_DELTA_MM=0.10            ## 1.e-3
export HamamatsuMaskManager__MAGIC_virtual_thickness_MM=0.10  ## 0.05
export NNVTMaskManager__MAGIC_virtual_thickness_MM=0.10       ## 0.05
sigma_alpha/polish ground surface handling ?
[T] The analytic quartic solution for the torus is painful: instead simply use an appropriate triangulated approximation, more precise than the analytic solution and with much less pain
| idx | control script | initialization time (seconds) | Notes | 
|---|---|---|---|
| [1] | ~/j/okjob.sh | 149 | JUNOSW+Opticks (tut_detsim.py "main") | 
| [2] | ~/opticks/g4cx/tests/G4CXTest_GEOM.sh | 127 | InputPhoton, TorchGenstep, NOT YET InputGenstep | 
| [3] | ~/opticks/CSGOptiX/cxs_min.sh | <2 | InputPhoton, TorchGenstep, InputGenstep | 
G4CXTest.cc using G4CXApp.h
#include "OPTICKS_LOG.hh"
#include "G4CXApp.h"
int main(int argc, char** argv)
{
    OPTICKS_LOG(argc, argv);
    return G4CXApp::Main();
}
Enables pure optical simulation comparison
| Test | Status | 
|---|---|
| InputPhotons targeting PMTs | chi2 matched, no known issues | 
| TorchGenstep from CD center | chi2 marginal : chimney issue ? | 
QCF qcf :
a.q 1000000 b.q 1000000
c2sum : 567.1130 c2n : 506.0000 c2per: 1.1208 C2CUT: 200
CHI2 ISSUE WITH TORCH RUNNING
c2sum/c2n:c2per(C2CUT) 567.11/506:1.121 (200) pv[0.031,< 0.05 : NOT:null-hyp ]
INPUT PHOTONS TARGETTING PMT CHI2 OK
np.c_[siq,_quo,siq,sabo2,sc2,sabo1][0:40]  ## A-B history frequency chi2 comparison
[[' 0' 'TO AB ' ' 0' '126549 126732' ' 0.1322' ' 2 5']
 [' 1' 'TO BT BT BT BT BT BT SD ' ' 1' ' 70494 70173' ' 0.7325' ' 18 2']
 [' 2' 'TO BT BT BT BT BT BT SA ' ' 2' ' 57103 56944' ' 0.2217' ' 5 25']
 [' 3' 'TO SC AB ' ' 3' ' 51434 51739' ' 0.9016' ' 4 9']
 [' 4' 'TO SC BT BT BT BT BT BT SD ' ' 4' ' 35878 36119' ' 0.8067' ' 58 45']
 [' 5' 'TO SC BT BT BT BT BT BT SA ' ' 5' ' 29676 30164' ' 3.9797' ' 124 4']
 [' 6' 'TO SC SC AB ' ' 6' ' 19993 19499' ' 6.1794' ' 137 124']
 [' 7' 'TO BT BT SA ' ' 7' ' 18932 18837' ' 0.2390' ' 71 14']
 [' 8' 'TO RE AB ' ' 8' ' 18319 18272' ' 0.0604' ' 9 64']
 [' 9' 'TO SC SC BT BT BT BT BT BT SD ' ' 9' ' 15454 15701' ' 1.9582' ' 19 85']
 ['10' 'TO SC SC BT BT BT BT BT BT SA ' '10' ' 12785 12696' ' 0.3109' ' 24 3']
 ['11' 'TO BT BT AB ' '11' ' 10993 11100' ' 0.5182' ' 72 188']
 ['12' 'TO BT AB ' '12' ' 9250 9727' '11.9897' ' 36 96']   ## ABSLEN ACRYLIC ?
 ['13' 'TO BT BT BT BT BT BT BT SA ' '13' ' 7476 7627' ' 1.5097' ' 176 162']
 ['14' 'TO SC SC SC AB ' '14' ' 7544 7545' ' 0.0001' ' 90 84']
 ['15' 'TO RE BT BT BT BT BT BT SD ' '15' ' 7419 7364' ' 0.2046' ' 197 6']
 ['16' 'TO SC RE AB ' '16' ' 7137 7191' ' 0.2035' ' 110 93']
 ['17' 'TO RE BT BT BT BT BT BT SA ' '17' ' 7126 7104' ' 0.0340' ' 48 181']
 ['18' 'TO SC BT BT AB ' '18' ' 6419 6527' ' 0.9010' ' 153 89']
 ['19' 'TO BT BT BT BT BT BT BT SR SA ' '19' ' 6385 6367' ' 0.0254' ' 16 139']
 ['20' 'TO BT BT BT BT SD ' '20' ' 6146 6190' ' 0.1569' ' 13 99']
 ['21' 'TO SC SC SC BT BT BT BT BT BT SD ' '21' ' 6148 6175' ' 0.0592' ' 145 194']
 ['22' 'TO SC BT BT SA ' '22' ' 6087 6170' ' 0.5620' ' 120 185']
 ['23' 'TO SC BT AB ' '23' ' 5589 5782' ' 3.2758' ' 8 17']
 ['24' 'TO BT BT DR BT SA ' '24' ' 5449 5543' ' 0.8039' ' 600 246']
 ['25' 'TO RE RE AB ' '25' ' 5538 5420' ' 1.2707' ' 267 125']
 ['26' 'TO BT BT BT SA ' '26' ' 5532 5259' ' 6.9066' ' 745 7']
 ['27' 'TO SC SC SC BT BT BT BT BT BT SA ' '27' ' 5084 4974' ' 1.2030' ' 23 31']
 ['28' 'TO SC BT BT BT BT BT BT BT SA ' '28' ' 4609 4610' ' 0.0001' ' 20 63']
 ['29' 'TO BT BT BT BT BT BT BR BT BT BT BT BT BT BT BT SD ' '29' ' 3809 3813' ' 0.0021' ' 362 812']
 ['30' 'TO RE SC AB ' '30' ' 3660 3565' ' 1.2491' ' 54 30']
 ['31' 'TO SC RE BT BT BT BT BT BT SD ' '31' ' 3192 3134' ' 0.5318' ' 292 136']
 ['32' 'TO SC BT BT BT BT BT BT BT SR SA ' '32' ' 3145 3173' ' 0.1241' ' 243 419']
 ['33' 'TO BT BT BT BT BT BT BT SD ' '33' ' 3168 3138' ' 0.1427' ' 181 424']
 ['34' 'TO BT BT BT BT BT BT BR BT BT BT BT BT BT BT BT SA ' '34' ' 3142 3163' ' 0.0699' ' 22 257']
 ['35' 'TO BT BT BT BT BT BT BT SR SR SA ' '35' ' 3043 3096' ' 0.4576' ' 286 1591']
 ['36' 'TO SC SC BT BT AB ' '36' ' 2878 2987' ' 2.0257' ' 636 252']
 ['37' 'TO SC RE BT BT BT BT BT BT SA ' '37' ' 2877 2960' ' 1.1802' ' 151 301']
 ['38' 'TO BT BT BT BT AB ' '38' ' 2857 2834' ' 0.0930' ' 225 228']
 ['39' 'TO SC BT BT BT BT SD ' '39' ' 2841 2800' ' 0.2980' ' 224 323']]
 np.c_[siq,_quo,siq,sabo2,sc2,sabo1][bzero]  ## in A but not B
 [['1107' 'TO BT BT BT BT BT BT BT BT SD                                                                  ' '1107' '    41      0' ' 0.0000' ' 11355     -1']
  ['1305' 'TO BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT' '1305' '    33      0' ' 0.0000' ' 11040     -1']
  ['1623' 'TO BT BT DR BT BT BT SD                                                                        ' '1623' '    26      0' ' 0.0000' '  1930     -1']
  ['2375' 'TO BT BT BT BT BT BT BR BT BT BT BT BT BT BT BT BT SD                                          ' '2375' '    17      0' ' 0.0000' ' 10972     -1']
  ['3264' 'TO SC BT BT BT BT BT BT BR BT BT BT BT BT BT BT BT BT SD                                       ' '3264' '    12      0' ' 0.0000' ' 22140     -1']]
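The per-history chi2 column above is consistent with the usual two-sample contribution (a-b)^2/(a+b): row 12 gives (9250-9727)^2/(9250+9727) ≈ 11.99. A minimal NumPy sketch of the comparison, assuming (not taken from the qcf code) that only histories with a+b above C2CUT contribute, which matches the 0.0000 shown for the low-count "in A but not B" rows. The p-value uses the Wilson-Hilferty normal approximation rather than an exact chi2 tail:

```python
import math
import numpy as np

def history_chi2(a, b, c2cut=200):
    """Chi2 comparison of A and B history frequency counts.

    Assumption: histories with a+b <= c2cut are excluded (contribute 0),
    mirroring the role C2CUT appears to play in the qcf output.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    tot = a + b
    use = tot > c2cut                        # only bins with enough statistics
    c2 = np.zeros_like(a)
    c2[use] = (a[use] - b[use]) ** 2 / tot[use]
    c2sum = float(c2.sum())
    c2n = int(use.sum())                     # contributing bins, used as degrees of freedom
    c2per = c2sum / c2n if c2n > 0 else 0.0
    return c2, c2sum, c2n, c2per

def chi2_pvalue_approx(x, k):
    """Upper-tail chi2 p-value for x with k dof (Wilson-Hilferty approximation)."""
    z = ((x / k) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * k))) / math.sqrt(2.0 / (9.0 * k))
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# rows 0 and 12 above, plus a low-count "in A but not B" history (41 vs 0)
c2, c2sum, c2n, c2per = history_chi2([126549, 9250, 41], [126732, 9727, 0])
print(np.round(c2, 4))                       # approx [0.1322, 11.9897, 0.0]
print(round(chi2_pvalue_approx(567.113, 506), 3))   # approx 0.031
```

With c2sum 567.11 over 506 contributing histories the approximate p-value is about 0.03, below 0.05, in line with the "NOT:null-hyp" verdict for the torch running.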
 In [1]: w = a.q_startswith("TO BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT") ; w
 Out[1]:
 array([ 11040,  15219, 118322, 152607, 165838, 215978, 299136, 374379, 395244, 422394, 427598, 434101, 443666, 445392, 479186, 531698, 549984, 592656, 604821, 637582, 656052, 736283, 777988, 789501,
        821402, 837105, 853410, 898084, 903045, 923645, 927731, 974750, 989689])
 In [2]: a.f.record[w[0],:,0]    ## PHOTON STEP POINT POSITIONS : ALL SIMILAR : GOING UP THE CHIMNEY 
 Out[2]:
 array([[   -1.594,     0.835,    99.984,     0.   ],        ##  photon step point (x,y,z,t)  
        [ -284.142,   148.854, 17823.998,    81.302],
        [ -315.513,   165.289, 20000.   ,    88.563],
        [ -332.369,   174.119, 21750.   ,    94.401],
        [ -332.383,   174.127, 21752.   ,    94.407],
        [ -344.817,   180.641, 23500.   ,   103.396],
        [ -368.795,   193.202, 25752.   ,   110.911],
        [ -409.762,   214.663, 29599.7  ,   123.75 ],
        [ -409.764,   214.664, 29599.85 ,   123.75 ],
        ...
        [ -412.414,   216.053, 29848.799,   124.581],
        [ -412.424,   216.058, 29849.7  ,   124.584],
        [ -412.426,   216.059, 29849.85 ,   124.585],
        [ -412.532,   216.114, 29859.85 ,   124.618],
        [ -412.534,   216.115, 29860.   ,   124.618],
        [ -412.543,   216.12 , 29860.9  ,   124.621],
        [ -412.55 ,   216.124, 29861.5  ,   124.623],
        [ -412.56 ,   216.129, 29862.5  ,   124.627]], dtype=float32)
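A minimal sketch of prefix selection like the `a.q_startswith(...)` call above, assuming the per-photon history labels are available as a NumPy string array (a hypothetical stand-in; the real `a.q` layout may differ). Trailing blank padding, as in the listings above, does not affect a prefix match:

```python
import numpy as np

def q_startswith(q, prefix):
    """Return indices of photons whose history label starts with prefix.

    q : array of per-photon history strings, e.g. "TO BT BT SD"
        (hypothetical layout, for illustration only)
    """
    q = np.asarray(q, dtype=str)
    return np.flatnonzero(np.char.startswith(q, prefix))

q = np.array(["TO BT BT SD", "TO AB", "TO BT BT SA  ", "TO SC AB"])
w = q_startswith(q, "TO BT BT")
print(w)   # [0 2]
```

The returned indices can then be used to index per-photon arrays, in the same way `w[0]` selects a photon record above.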
| [2] G4CXTest_GEOM.sh | |
|---|---|
| 1M photon Torch Genstep at CD center | |
| Red | End points | 
| Green | Step points | 
| Cyan | Hit points | 
TEST=large_scan ~/opticks/cxs_min.sh
Generate 20 optical-only events with 0.1M to 100M photons starting from the CD center; gather and save only hits.
OPTICKS_RUNNING_MODE=SRM_TORCH OPTICKS_NUM_PHOTON=H1:10,M2,3,5,7,10,20,40,60,80,100 OPTICKS_NUM_EVENT=20 OPTICKS_EVENT_MODE=Hit
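One reading of the OPTICKS_NUM_PHOTON list, consistent with H1 meaning 100K and M1 meaning 1M elsewhere in these notes: H1:10 expands to 0.1M..1.0M in 0.1M steps and M2,3,...,100 to 2M..100M, giving the 20 events requested by OPTICKS_NUM_EVENT=20. The parser below is a hypothetical sketch of that interpretation, not the Opticks implementation; the scale carry-over and inclusive ranges are assumptions:

```python
SCALE = {"H": 100_000, "M": 1_000_000}   # assumed prefixes: hundred-thousand, million

def parse_num_photon(spec):
    """Expand a spec such as 'H1:10,M2,3,5' into a list of photon counts.

    Assumptions (illustrative only):
      - a leading H or M sets the scale, which carries over to later items
      - 'a:b' is an inclusive integer range
    """
    counts = []
    scale = 1
    for item in spec.split(","):
        item = item.strip()
        if item and item[0] in SCALE:
            scale = SCALE[item[0]]
            item = item[1:]
        if ":" in item:
            lo, hi = (int(x) for x in item.split(":"))
            counts.extend(n * scale for n in range(lo, hi + 1))
        else:
            counts.append(int(item) * scale)
    return counts

counts = parse_num_photon("H1:10,M2,3,5,7,10,20,40,60,80,100")
print(len(counts), counts[0], counts[-1])   # 20 100000 100000000
```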
| Test Hardware | Notes | 
|---|---|
| DELL Precision Workstation with NVIDIA TITAN RTX(24G) | Primary test hardware | 
| DELL Precision Workstation with NVIDIA TITAN V(12G) | VRAM limited | 
| DELL Precision Workstation with NVIDIA Quadro RTX 8000 (48G) | TODO : try for 400M photons | 
| GPU cluster nodes with NVIDIA V100 (32GB) | Basic function tests only so far | 
~/o/cxs_min.sh ## 2.2M hits from 10M photon TorchGenstep, 3.1 seconds
Release preprocessor macros: adds PRODUCTION; removes DEBUG_TAG, DEBUG_PIDX, ...
Examine the flattened kernel source of CSGOptiX/CSGOptiX7.cu (103k lines), with all includes expanded inline
~/opticks/preprocessor.sh > /tmp/out.cc ## using gcc -E -C -P
Grepping kernel PTX (Parallel Thread Execution, ~assembly code)
Grep the PTX for doubles and printf, then remove them from the source, e.g. with the opticks-ptx bash function:
grep \\.f64 $OPTICKS_PREFIX/ptx/CSGOptiX_generated_CSGOptiX7.cu.ptx
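The same check can be scripted. A minimal Python sketch (hypothetical helpers, not the opticks-ptx function) that lists PTX lines using double precision (`.f64`) or device printf. It relies on CUDA lowering device-side printf to a call to `vprintf` in PTX:

```python
import os
import re

F64 = re.compile(r"\.f64\b")          # double-precision types/instructions, e.g. add.f64, cvt.f64.f32
VPRINTF = re.compile(r"\bvprintf\b")  # device printf appears in PTX as a vprintf call

def scan_ptx(text):
    """Return (f64_lines, printf_lines) as lists of (line_number, line)."""
    f64_lines, printf_lines = [], []
    for num, line in enumerate(text.splitlines(), start=1):
        if F64.search(line):
            f64_lines.append((num, line.strip()))
        if VPRINTF.search(line):
            printf_lines.append((num, line.strip()))
    return f64_lines, printf_lines

def scan_ptx_file(path):
    """Scan a PTX file; environment variables such as $OPTICKS_PREFIX are expanded."""
    with open(os.path.expandvars(path)) as f:
        return scan_ptx(f.read())

# Usage, with the PTX path from the grep above:
# f64, prints = scan_ptx_file("$OPTICKS_PREFIX/ptx/CSGOptiX_generated_CSGOptiX7.cu.ptx")
sample = "add.f32 %f1, %f2, %f3;\ncvt.f64.f32 %fd1, %f1;\ncall.uni (retval0), vprintf, (param0, param1);\n"
f64, prints = scan_ptx(sample)
```

Both lists should be empty for a production kernel, since doubles and printf are expensive on the GPU.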
Opticks SEvt Metadata
Opticks Event => folders of NumPy .npy (NPFold.h/NP.hh)
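A minimal sketch of reading such an event folder back in Python, assuming only that it is a directory tree of `.npy` files. `load_fold` is a hypothetical helper, not the NPFold.h/NP.hh API or the Opticks Python loaders:

```python
import os
import numpy as np

def load_fold(base):
    """Load every .npy file under base into a nested dict keyed by name.

    Sub-folders (e.g. one per event) become nested dicts; other files are ignored.
    """
    fold = {}
    for name in sorted(os.listdir(base)):
        path = os.path.join(base, name)
        if os.path.isdir(path):
            fold[name] = load_fold(path)       # recurse into sub-folder
        elif name.endswith(".npy"):
            fold[name[:-4]] = np.load(path)    # key without the .npy extension
    return fold

# Usage (placeholder path):
# f = load_fold("/path/to/event/folder")
# f["hit"].shape
```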
sreport executable:
Usage on workstation/GPU job and laptop:
~/o/cxs_min.sh ## create SEvt
Laptop, rsync small metadata summary from remote:
JOB=N7 ~/o/sreport.sh grab
JOB=N7 PLOT=Substamp_ALL_Etime_vs_Photon ~/o/sreport.sh
Effective automated reporting+plotting are essential for optimization
Debug : 0.341 seconds per million photons
Release : 0.314 seconds per million photons
   ...
   A018_QSim__simulate_PREL :   1701933491020126,19102924,1300668    2023-12-07T15:18:11.020126  92,039,765  92,038,118       2,598
   A018_QSim__simulate_POST :   1701933526625966,19102924,1300668    2023-12-07T15:18:46.625966 127,645,605 127,643,958  35,605,840
        SEvt__endIndex_A018 :   1701933526626230,19102924,1300668    2023-12-07T15:18:46.626230 127,645,869 127,644,222         264
 SEvt__endOfEvent_LAST_EGPU :   1701933531837026,19102924,1300668    2023-12-07T15:18:51.837026 132,856,665 132,855,018   5,210,796
             SEvt__EndOfRun :   1701933531837143,19102924,1300668    2023-12-07T15:18:51.837143 132,856,782 132,855,135         117
   A018_QSim__simulate_TAIL :   1701933531837486,19102924,1300668    2023-12-07T15:18:51.837486 132,857,125 132,855,478         343
CSGOptiX__SimulateMain_TAIL :   1701933531837541,19102924,1300668    2023-12-07T15:18:51.837541 132,857,180 132,855,533          55
 juncture:4 [SEvt__Init_RUN_META,SEvt__BeginOfRun,SEvt__EndOfRun,SEvt__Init_RUN_META] time ranges between junctures
            SEvt__Init_RUN_META :           -1                        :            0 : 2023-12-07T15:16:38.980361 JUNCTURE
               SEvt__BeginOfRun :   22,181,663                        :   22,181,663 : 2023-12-07T15:17:01.162024 JUNCTURE
                 SEvt__EndOfRun :  110,675,119                        :  132,856,782 : 2023-12-07T15:18:51.837143 JUNCTURE
            SEvt__Init_RUN_META : -132,856,782                        :            0 : 2023-12-07T15:16:38.980361 JUNCTURE
 ranges:6 time ranges between pairs of stamps
             SEvt__Init_RUN_META ==>           CSGFoundry__Load_HEAD                 1,774    ## init
           CSGFoundry__Load_HEAD ==>           CSGFoundry__Load_TAIL             1,325,321    ## load_geom
           CSGOptiX__Create_HEAD ==>           CSGOptiX__Create_TAIL            20,854,325    ## upload_geom
        A000_QSim__simulate_HEAD ==>        A000_QSim__simulate_PREL                19,450    ## upload_genstep
        A000_QSim__simulate_PREL ==>        A000_QSim__simulate_POST                55,697    ## simulate
        A000_QSim__simulate_POST ==>        A000_QSim__simulate_TAIL                 7,686    ## download
        A001_QSim__simulate_HEAD ==>        A001_QSim__simulate_PREL                 1,037    ## upload_genstep
        A001_QSim__simulate_PREL ==>        A001_QSim__simulate_POST               103,109    ## simulate
        A001_QSim__simulate_POST ==>        A001_QSim__simulate_TAIL                11,304    ## download
        A002_QSim__simulate_HEAD ==>        A002_QSim__simulate_PREL                 1,022    ## upload_genstep
        A002_QSim__simulate_PREL ==>        A002_QSim__simulate_POST               112,313    ## simulate
        A002_QSim__simulate_POST ==>        A002_QSim__simulate_TAIL                16,068    ## download
        A003_QSim__simulate_HEAD ==>        A003_QSim__simulate_PREL                   988    ## upload_genstep
        ...
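In the stamp lines, the final column is the microsecond gap since the previous stamp: 1701933526625966 - 1701933491020126 = 35,605,840 us, i.e. about 35.6 s of A018 simulation. The labelled ranges can likewise be totalled per stage. A minimal sketch of both, assuming only the text layout shown above (16-digit epoch-microsecond stamps, `HEAD ==> TAIL  us  ## label` ranges); these helpers are hypothetical and not part of sreport:

```python
import re
from collections import defaultdict

STAMP = re.compile(r"(\d{16}),")   # epoch microseconds leading 'us,...' stamp fields

def parse_stamps(text):
    """Parse 'NAME : t_us,...' stamp lines into a list of (name, t_us)."""
    stamps = []
    for line in text.splitlines():
        if " : " not in line:
            continue                               # skip '...' and range lines
        name, rest = line.split(" : ", 1)
        fields = rest.split()
        m = STAMP.match(fields[0]) if fields else None
        if m:                                      # juncture lines (e.g. '22,181,663') do not match
            stamps.append((name.strip(), int(m.group(1))))
    return stamps

def stamp_deltas(stamps):
    """Microseconds elapsed between each stamp and the one before it."""
    return [(name, t1 - t0) for (_, t0), (name, t1) in zip(stamps, stamps[1:])]

def sum_ranges(text):
    """Total the 'HEAD ==> TAIL  microseconds  ## label' range lines per label."""
    totals = defaultdict(int)
    for line in text.splitlines():
        if "==>" not in line or "##" not in line:
            continue
        body, label = line.split("##", 1)
        totals[label.strip()] += int(body.split()[-1].replace(",", ""))
    return dict(totals)
```

Summing per label gives a quick view of whether simulate, download or upload dominates across events.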
"Debug" : rather slow hit downloads ?
Unclear why "Release" downloads so much faster than "Debug"
Now back to [2] G4CXTest_GEOM.sh optical-only comparison
Only reached 80M photons, due to the U4Recorder memory leak
"Release" benefits B:U4Recorder more than A:CSGOptiX
U4Recorder leaking badly! [Geant4 propagation recorded into Opticks SEvt]
([3] cxs_min.sh) Pure Opticks (no Geant4 or U4Recorder) : no leak
B:U4Recorder / A:CSGOptiX : ratio only ~190 !
Absolute comparison with ancient Opticks measurements? [Below: presented at CHEP 2019] 58 s for 400M photons
| JUNO analytic, 400M photons from center | Time | Speedup | Notes |
|---|---|---|---|
| Geant4 Extrap. | 95,600 s (26 hrs) | | Ancient (2019) |
| Opticks RTX ON (i) | 58 s | 1650x | Ancient (2019) |
| Current Opticks | 124 s (~2x slower) | "770x" | extrapolated from 31 s for 100M |
Practically everything differs between these measurements; nevertheless, it is natural to compare them
~300 ns photon lifetime limit ?
OPTICKS_MAX_BOUNCE=32   ## curr.
OPTICKS_MAX_NS=300      ## IDEA
Expected Primary Cause of 2x slowdown : "bouncy" POM
Use cxs_min_scan.sh to vary OPTICKS_MAX_BOUNCE from 0->32
Slow hit increase above MAX_BOUNCE 20
Using ~/o/cxs_min.sh script with:
OPTICKS_RUNNING_MODE : SRM_TORCH
OPTICKS_EVENT_MODE   : HitPhoton (picked with VERSION 3)
OPTICKS_NUM_PHOTON   : H1 (100K)
OPTICKS_MAX_PHOTON   : M1
Workstation:
~/o/cxs_min_scan.sh ## o is symbolic link to opticks
Laptop:
~/o/cxs_min.sh grab
PLOT=Substamp_ONE_maxb_scan PICK=A ~/o/sreport.sh
PLOT=Substamp_ONE_maxb_scan PICK=A ~/o/sreport.sh mpcap
PLOT=Substamp_ONE_maxb_scan PICK=A PUB=expensive_tail ~/o/sreport.sh mppub
vi ~/opticks/notes/issues/OPTICKS_MAX_BOUNCE_scanning.rst   ## notes
TODO: check performance with MAX_TIME = 200,300,400 ns
Small truncation bump at 32
sequence nibbles
sseq.h and seq.npy sequence array:
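If each step flag occupies one 4-bit nibble, 32 steps fill 128 bits (two 64-bit words), which would explain why histories truncate at 32. A minimal sketch of such packing; the abbreviation-to-nibble table, the two-word layout and the first-step-in-lowest-nibble order are illustrative assumptions, not the definitions in sseq.h:

```python
# Hypothetical abbreviation -> nibble table, for illustration only
NIBBLE = {"TO": 0xd, "BT": 0xc, "BR": 0xb, "SR": 0xa, "DR": 0x9,
          "SA": 0x8, "SD": 0x7, "SC": 0x6, "RE": 0x5, "AB": 0x4}
ABBREV = {v: k for k, v in NIBBLE.items()}
NSEQ = 2             # assumed number of 64-bit words
NSTEP = 16 * NSEQ    # 16 nibbles per 64-bit word -> 32 steps

def _nib(words, slot):
    """Nibble stored at a given step slot (0 means empty)."""
    return (words[slot // 16] >> (4 * (slot % 16))) & 0xf

def encode(history):
    """Pack a space-separated history into NSEQ words, truncating beyond NSTEP steps."""
    words = [0] * NSEQ
    for slot, abbrev in enumerate(history.split()[:NSTEP]):
        words[slot // 16] |= NIBBLE[abbrev] << (4 * (slot % 16))
    return words

def decode(words):
    """Unpack words back into a history string, stopping at the first empty slot."""
    steps = []
    for slot in range(NSTEP):
        nib = _nib(words, slot)
        if nib == 0:
            break
        steps.append(ABBREV[nib])
    return " ".join(steps)

def seqnib(words):
    """Number of occupied nibbles, i.e. the recorded history length."""
    return sum(1 for slot in range(NSTEP) if _nib(words, slot))

print(decode(encode("TO BT BT SD")), seqnib(encode("TO BT BT SD")))   # TO BT BT SD 4
```

A histogram of `seqnib` over all photons would then show a pile-up at 32 for histories cut off by the fixed number of slots.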
Using ~/o/cxs_min.sh script with:
OPTICKS_RUNNING_MODE : SRM_TORCH
OPTICKS_EVENT_MODE   : HitPhotonSeq
OPTICKS_NUM_PHOTON   : M1
OPTICKS_MAX_PHOTON   : M1
Workstation:
VERSION=4 ~/o/cxs_min.sh
Laptop:
VERSION=4 ~/o/cxs_min.sh grab
VERSION=4 MODE=2 PLOT=seqnib ~/o/cxs_min.sh ana
VERSION=4 MODE=2 PLOT=seqnib ~/o/cxs_min.sh mpcap
VERSION=4 MODE=2 PLOT=seqnib PUB=small_truncation_bump ~/o/cxs_min.sh mppub
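S(n) Expected Speedup (Amdahl's law)
$$ S(n) = \frac{1}{(1-P) + \frac{P}{n}} $$
P : parallel fraction; for optical photon simulation, P ~ 99% of CPU time
n : speedup of the parallelized portion
Must consider the processing "big picture"
Very dependent on the parallel fraction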
Theoretical overall speedup for various parallel fractions and parallelized speedups:
| Parallel Fraction | Overall Speedup (100x parallelized) | Overall Speedup (1000x parallelized) | Notes |
|---|---|---|---|
| 95% | 17x | 20x | Little benefit beyond ~100x parallelized speedup |
| 96% | 20x | 24x | |
| 97% | 25x | 32x | |
| 98% | 34x | 48x | Substantial benefit from more parallelized speedup |
| 99% | 50x | 91x | |
In [4]: Amdahl.Overall_Speedup(np.array([100,1000]),0.95)
Out[4]: array([16.807, 19.627])

In [5]: Amdahl.Overall_Speedup(np.array([100,1000]),0.99)
Out[5]: array([50.251, 90.992])
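The Amdahl.Overall_Speedup values above follow from S(n) = 1/((1-P) + P/n). A minimal sketch reproducing them; this `Amdahl` class is a hypothetical reimplementation, not the one used in the session:

```python
import numpy as np

class Amdahl:
    """Amdahl's law: overall speedup when a fraction P of the work is sped up by n."""

    @staticmethod
    def Overall_Speedup(n, P):
        n = np.asarray(n, dtype=np.float64)
        return 1.0 / ((1.0 - P) + P / n)

print(np.round(Amdahl.Overall_Speedup(np.array([100, 1000]), 0.95), 3))  # [16.807 19.627]
print(np.round(Amdahl.Overall_Speedup(np.array([100, 1000]), 0.99), 3))  # [50.251 90.992]
```

As n grows, S(n) approaches 1/(1-P): 20x for P = 95% and 100x for P = 99%. This is why the table saturates, and why the serial remainder outside the GPU propagation matters as much as the parallelized speedup.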
Try OptiX-IR (Intermediate Representation) as an alternative to PTX (new in OptiX 7.1)