source /cvmfs/juno.ihep.ac.cn/centos7_amd64_gcc1120_opticks/Pre-Release/J23.1.0-rc6/setup.sh
OptiX 7.5 | Chosen to match NVIDIA CUDA 11.7 + Driver Version: 515.65.01 on IHEP GPU cluster |
Geant4 10.4.2 | (Opticks with Geant4 11 already in use by Fermilab Geant4 group, others) |
Custom4 0.1.8 | small package : but deeply coupled with : Geant4 + JUNOSW + Opticks |
Opticks-v0.2.4 | December 18 release https://github.com/simoncblyth/opticks/releases/tag/v0.2.4 |
Pre-Release usable only on IHEP GPU cluster (?)
Example test scripts in j repository:
Example job script for developers building JUNOSW+Opticks with bash junoenv opticks,bash junoenv offline
export OPTICKS_INPUT_GENSTEP=$BASE/jok-tds/ALL0/A%0.3d/genstep.npy ## sequence of genstep arrays to load across multiple SEvt folders
EYE=0,1.5,0 TMIN=1.3 ZOOM=4 ~/opticks/cxr_min.sh ## CSGOptiXRMTest
Using GEOM J23_1_0_rc3_ok0
Deferred geometry, switched off by tut_detsim.py options.
--no-guide_tube | OptiX 7.1 has curves : thought they might enable G4Torus translation, but docs show they are one-sided : so instead triangulate torus[T] ? | |
--debug-disable-xj | XJfixture XJanchor | Deep CSG trees require dev. to see if "listnode" (similar to G4MultiUnion) can provide solution |
--debug-disable-sj | SJCLSanchor SJFixture SJReceiver SJFixture | |
--debug-disable-fa | FastenerAcrylic |
Virtual surface shifts used to avoid degeneracy, together with defaults (shifts avoid chi2 discrepancies from degenerate surfaces):
export Tub3inchPMTV3Manager__VIRTUAL_DELTA_MM=0.10 ## 1.e-3
export HamamatsuMaskManager__MAGIC_virtual_thickness_MM=0.10 ## 0.05
export NNVTMaskManager__MAGIC_virtual_thickness_MM=0.10 ## 0.05
Completing these three : will match GPU and CPU geometry
idx | control script | initialization time (seconds) | Notes |
---|---|---|---|
[1] | ~/j/okjob.sh | 149 | JUNOSW+Opticks (tut_detsim.py "main") |
[2] | ~/opticks/g4cx/tests/G4CXTest_GEOM.sh | 127 | InputPhoton, TorchGenstep, NOT YET InputGenstep |
[3] | ~/opticks/CSGOptiX/cxs_min.sh | <2 | InputPhoton, TorchGenstep, InputGenstep |
~/o/G4CXTest_GEOM.sh ana ## python history comparison
~/o/sysrap/tests/sseq_index_test.sh ## C++ history comparison
a_path $AFOLD/seq.npy /data/blyth/opticks/GEOM/J23_1_0_rc3_ok0/G4CXTest/ALL98/A000/seq.npy a_seq (1000000, 2, 2, )
b_path $BFOLD/seq.npy /data/blyth/opticks/GEOM/J23_1_0_rc3_ok0/G4CXTest/ALL98/B000/seq.npy b_seq (1000000, 2, 2, )
AB
[sseq_index_ab::desc u.size 152520 opt BRIEF mode 6
sseq_index_ab_chi2::desc sum 565.3332 ndf 504.0000 sum/ndf 1.1217 sseq_index_ab_chi2_ABSUM_MIN:200.0000
TO AB : 126549 126745 : 0.1517 : Y : 2 7
TO BT BT BT BT BT BT SD : 70494 70397 : 0.0668 : Y : 18 2
TO BT BT BT BT BT BT SA : 57103 57388 : 0.7094 : Y : 5 1
TO SC AB : 51434 51094 : 1.1275 : Y : 4 48
TO SC BT BT BT BT BT BT SD : 35878 35913 : 0.0171 : Y : 58 56
TO SC BT BT BT BT BT BT SA : 29676 30061 : 2.4813 : Y : 124 85
TO SC SC AB : 19993 19869 : 0.3857 : Y : 137 24
TO BT BT SA : 18932 18869 : 0.1050 : Y : 71 148
TO RE AB : 18319 18090 : 1.4403 : Y : 9 50
TO SC SC BT BT BT BT BT BT SD : 15454 15326 : 0.5323 : Y : 19 8
TO SC SC BT BT BT BT BT BT SA : 12785 12833 : 0.0899 : Y : 24 138
TO BT BT AB : 10993 10949 : 0.0882 : Y : 72 26
TO BT AB : 9250 9279 : 0.0454 : Y : 36 13
TO BT BT BT BT BT BT BT SA : 7476 7577 : 0.6777 : Y : 176 634
TO SC SC SC AB : 7544 7418 : 1.0611 : Y : 90 82
TO RE BT BT BT BT BT BT SD : 7419 7272 : 1.4709 : Y : 197 73
TO SC RE AB : 7137 7049 : 0.5459 : Y : 110 11
...
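The per-history chi2 values printed above are consistent with the usual two-sample form (a - b)^2 / (a + b) applied to the A and B counts. A minimal Python sketch, using only the first four rows of the output: the function name `chi2_term` and the strict `>` comparison against the minimum sum are assumptions for illustration, not taken from the sseq_index_test code.

```python
# Counts (history, A count, B count) copied from the first rows of the output above.
rows = [
    ("TO AB", 126549, 126745),
    ("TO BT BT BT BT BT BT SD", 70494, 70397),
    ("TO BT BT BT BT BT BT SA", 57103, 57388),
    ("TO SC AB", 51434, 51094),
]

# Minimum A+B for a history to contribute, from "sseq_index_ab_chi2_ABSUM_MIN:200.0000".
# Whether the real cut is ">" or ">=" is an assumption here.
ABSUM_MIN = 200


def chi2_term(a, b):
    """Contribution of one history to the chi2 sum (hypothetical helper)."""
    return (a - b) ** 2 / (a + b)


terms = {h: chi2_term(a, b) for h, a, b in rows if a + b > ABSUM_MIN}

for history, value in terms.items():
    print(f"{history:30s} {value:8.4f}")
```

The sum 565.3332 and ndf 504 quoted in the output cover all histories passing the cut, not just the rows listed, so they cannot be reproduced from this excerpt.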
Test | Status |
---|---|
InputPhotons targeting PMTs | chi2 matched, no known issues |
TorchGenstep from CD center | chi2 marginal : chimney issue ? Probably some coincident surfaces to fix |
np.c_[siq,_quo,siq,sabo2,sc2,sabo1][bzero] ## history seq in A but not B : usually from degeneracy
[['1107' 'TO BT BT BT BT BT BT BT BT SD ' '1107' ' 41 0' ' 0.0000' ' 11355 -1']
 ['1305' 'TO BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT' '1305' ' 33 0' ' 0.0000' ' 11040 -1']
 ['1623' 'TO BT BT DR BT BT BT SD ' '1623' ' 26 0' ' 0.0000' ' 1930 -1']
 ['2375' 'TO BT BT BT BT BT BT BR BT BT BT BT BT BT BT BT BT SD ' '2375' ' 17 0' ' 0.0000' ' 10972 -1']
 ['3264' 'TO SC BT BT BT BT BT BT BR BT BT BT BT BT BT BT BT BT SD ' '3264' ' 12 0' ' 0.0000' ' 22140 -1']]
In [1]: w = a.q_startswith("TO BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT BT") ; w
Out[1]:
array([ 11040, 15219, 118322, 152607, 165838, 215978, 299136, 374379, 395244, 422394, 427598,
       434101, 443666, 445392, 479186, 531698, 549984, 592656, 604821, 637582, 656052, 736283,
       777988, 789501, 821402, 837105, 853410, 898084, 903045, 923645, 927731, 974750, 989689])
In [2]: a.f.record[w[0],:,0] ## PHOTON STEP POINT POSITIONS : ALL SIMILAR : GOING UP THE CHIMNEY
Out[2]:
array([[ -1.594, 0.835, 99.984, 0. ], ## photon step point (x,y,z,t) mm,ns
       [ -284.142, 148.854, 17823.998, 81.302],
       [ -315.513, 165.289, 20000. , 88.563],
       [ -332.369, 174.119, 21750. , 94.401],
       [ -332.383, 174.127, 21752. , 94.407],
       [ -344.817, 180.641, 23500. , 103.396],
       [ -368.795, 193.202, 25752. , 110.911],
       [ -409.762, 214.663, 29599.7 , 123.75 ],
       [ -409.764, 214.664, 29599.85 , 123.75 ],
       ...
       [ -412.414, 216.053, 29848.799, 124.581],
       [ -412.424, 216.058, 29849.7 , 124.584],
       [ -412.426, 216.059, 29849.85 , 124.585],
       [ -412.532, 216.114, 29859.85 , 124.618],
       [ -412.534, 216.115, 29860. , 124.618],
       [ -412.543, 216.12 , 29860.9 , 124.621],
       [ -412.55 , 216.124, 29861.5 , 124.623],
       [ -412.56 , 216.129, 29862.5 , 124.627]], dtype=float32)
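One way to start examining such a photon is to check that z rises monotonically (consistent with travel up the chimney) and to flag very short steps, which may indicate closely spaced surfaces. The sketch below is plain Python over the first nine step points from Out[2]; the 5 mm threshold and the variable names are illustrative assumptions, not part of the Opticks analysis tools.

```python
import math

# First nine step points (x, y, z in mm, t in ns) copied from Out[2] above,
# i.e. the points listed before the elided "..." section.
points = [
    (-1.594, 0.835, 99.984, 0.0),
    (-284.142, 148.854, 17823.998, 81.302),
    (-315.513, 165.289, 20000.0, 88.563),
    (-332.369, 174.119, 21750.0, 94.401),
    (-332.383, 174.127, 21752.0, 94.407),
    (-344.817, 180.641, 23500.0, 103.396),
    (-368.795, 193.202, 25752.0, 110.911),
    (-409.762, 214.663, 29599.7, 123.75),
    (-409.764, 214.664, 29599.85, 123.75),
]

SHORT_STEP_MM = 5.0  # illustrative threshold (assumption)

# True when every step moves to larger z.
z_increasing = all(b[2] > a[2] for a, b in zip(points, points[1:]))

# Straight-line distance between consecutive step points.
step_lengths = [math.dist(a[:3], b[:3]) for a, b in zip(points, points[1:])]

# Indices of the starting point of each step shorter than the threshold.
short_steps = [i for i, d in enumerate(step_lengths) if d < SHORT_STEP_MM]

print("z increasing:", z_increasing)
for i in short_steps:
    print(f"short step {i}->{i + 1}: {step_lengths[i]:.3f} mm at z = {points[i][2]} mm")
```

For these points the short steps start at z = 21750 mm and z = 29599.7 mm, which are candidate places to look for thin or near-coincident volumes in the geometry.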
Investigating a discrepant Chimney photon : more of this + geometry examination needed to find cause of difference
TEST=large_scan ~/opticks/cxs_min.sh
Generate 20 optical only events with 0.1M->100M photons starting from CD center, gather and save only Hits.
OPTICKS_RUNNING_MODE=SRM_TORCH ## "Torch" running enables num_photon scan
OPTICKS_NUM_PHOTON=H1:10,M2,3,5,7,10,20,40,60,80,100
OPTICKS_NUM_EVENT=20
OPTICKS_EVENT_MODE=Hit
Test Hardware | Notes |
---|---|
DELL Precision Workstation with NVIDIA TITAN RTX(24G) | Primary test hardware |
DELL Precision Workstation with NVIDIA TITAN V(12G) | VRAM limited |
DELL Precision Workstation with NVIDIA Quadro RTX 8000 (48G) | TODO : push to memory limit ~400M photons |
GPU cluster nodes with NVIDIA V100 (32GB) | TODO: Production Config Testing, expect ~250M photon per launch limit |
~/o/cxs_min.sh ## 2.2M hits from 10M photon TorchGenstep, 3.1 seconds
Release preprocessor macros : adds: PRODUCTION , removes: DEBUG_TAG, DEBUG_PIDX,...
Examine flattened kernel source CSGOptiX/CSGOptiX7.cu (103k lines) : all includes included
~/opticks/preprocessor.sh > /tmp/out.cc ## using gcc -E -C -P
Grepping Kernel PTX : Parallel Thread Execution ~Assembly code
Grepping PTX for doubles and printf, and then removing from source : opticks-ptx bash function eg:
grep \\.f64 $OPTICKS_PREFIX/ptx/CSGOptiX_generated_CSGOptiX7.cu.ptx
Debug : 0.341 seconds per million photons
Release : 0.314 seconds per million photons
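The two per-million-photon timings can be turned into a relative gain and a projected launch time. This is a minimal sketch that assumes time scales linearly with photon count; the helper name `seconds_for` is hypothetical.

```python
# Timings quoted above, in seconds per million photons.
DEBUG_S_PER_M = 0.341
RELEASE_S_PER_M = 0.314


def seconds_for(photons_millions, seconds_per_million):
    """Projected time assuming linear scaling with photon count (assumption)."""
    return photons_millions * seconds_per_million


# Fraction of the Debug time saved by the Release build.
release_gain = 1.0 - RELEASE_S_PER_M / DEBUG_S_PER_M

# Projected Release time for a 10M photon launch.
t_10m = seconds_for(10, RELEASE_S_PER_M)

print(f"Release saves {100 * release_gain:.1f}% relative to Debug")
print(f"10M photons at the Release rate: {t_10m:.2f} s")
```

The projected 10M photon time is close to the 3.1 seconds quoted for the 10M photon TorchGenstep run.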
Absolute Comparison with ancient Opticks Measurements.. ? [Below presented at CHEP 2019] 58s / 400M photons
JUNO analytic, 400M photons from center | Time | Speedup | Notes |
---|---|---|---|
Geant4 Extrap. | 95,600 s (26 hrs) | | Ancient (2019) |
Opticks RTX ON (i) | 58 s | 1650x | Ancient (2019) |
JUNOSW+Opticks 1st | 124 s (~2x slower) | "770x" | extrapolated from 31s for 100M |
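The figures in the table can be cross-checked with a few lines of arithmetic. This is a minimal sketch assuming the 100M photon time scales linearly to 400M photons; the function and variable names are illustrative.

```python
# Values taken from the table above.
GEANT4_EXTRAP_S = 95_600.0   # Geant4 extrapolated time for 400M photons
OPTICKS_2019_S = 58.0        # Opticks RTX ON, 400M photons (2019)
MEASURED_S = 31.0            # JUNOSW+Opticks time measured for 100M photons
MEASURED_PHOTONS_M = 100
TARGET_PHOTONS_M = 400


def extrapolate_linear(seconds, photons_m, target_m):
    """Scale a measured time to another photon count, assuming linearity."""
    return seconds * target_m / photons_m


junosw_opticks_s = extrapolate_linear(MEASURED_S, MEASURED_PHOTONS_M, TARGET_PHOTONS_M)
speedup_2019 = GEANT4_EXTRAP_S / OPTICKS_2019_S
speedup_now = GEANT4_EXTRAP_S / junosw_opticks_s
slowdown = junosw_opticks_s / OPTICKS_2019_S

print(f"extrapolated time : {junosw_opticks_s:.0f} s")
print(f"2019 speedup      : {speedup_2019:.0f}x")
print(f"current speedup   : {speedup_now:.0f}x")
print(f"slowdown vs 2019  : {slowdown:.2f}x")
```

The results agree with the rounded table entries (124 s, about 1650x, about 770x, about 2x slower).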
Practically everything is different between these measurements : nevertheless, it's natural to compare
~300 ns photon lifetime limit ?
OPTICKS_MAX_BOUNCE=32 ## curr.
OPTICKS_MAX_NS=300 ## IDEA
Expected Primary Cause of 2x slowdown : "bouncy" POM
Use cxs_min_scan.sh to vary OPTICKS_MAX_BOUNCE from 0->32
Slow hit increase above MAX_BOUNCE 20
Yuxiang Hu : Gamma Event at CD center : Comparison of JUNOSW with JUNOSW+Opticks
Hit position, wavelength and time comparison
Overall speedup [JSW/(JSW+Opticks)] | ~60X | UN-OPTIMIZED + PRELIM |
[Calculation: same TMM header as JUNOSW, Lookup: using uploaded "ART" texture (Gigabytes)]
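S(n) = 1 / ((1 - P) + P/n) : expected overall speedup (Amdahl's law) when a fraction P of the processing is sped up by a factor n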
optical photon simulation, P ~ 99% of CPU time
Must consider processing "big picture"
Very dependent on the parallel fraction
Theoretical overall speedup for various parallel fractions (rows) and parallelized speedups (columns)
Parallel Fraction | 100x | 1000x | limit | Notes |
---|---|---|---|---|
95% | 17x | 20x | 20x | Little benefit beyond ~100x parallelized speedup |
96% | 20x | 24x | 25x | |
97% | 25x | 32x | 33.3x | |
98% | 34x | 48x | 50x | Substantial benefit from more parallelized speedup |
99% | 50x | 91x | 100x | |
In [1]: run ~/opticks/ana/amdahl.py
In [2]: Amdahl.Overall_Speedup(np.array([100,1000,np.inf]),0.95)
Out[2]: array([16.807, 19.627, 20. ])
In [3]: Amdahl.Overall_Speedup(np.array([100,1000,np.inf]),0.99)
Out[3]: array([ 50.251, 90.992, 100. ])
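The table entries follow from Amdahl's law, S = 1 / ((1 - P) + P/n), where P is the parallel fraction and n the parallelized speedup. The sketch below is a standalone pure-Python version written for illustration; it is not the contents of ana/amdahl.py, and the function name `overall_speedup` is an assumption.

```python
import math


def overall_speedup(parallel_speedup, parallel_fraction):
    """Amdahl's law: overall speedup when only a fraction of the work is sped up."""
    serial_fraction = 1.0 - parallel_fraction
    # parallel_fraction / inf evaluates to 0.0, giving the limit 1 / serial_fraction.
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


speedups = (100, 1000, math.inf)
table = {
    p: [overall_speedup(n, p) for n in speedups]
    for p in (0.95, 0.96, 0.97, 0.98, 0.99)
}

for p, row in table.items():
    cells = " | ".join(f"{s:7.3f}" for s in row)
    print(f"{p:.0%} | {cells}")
```

With 99% of CPU time in optical photon simulation, the attainable overall speedup is capped at 100x however fast the GPU part becomes, which is why the serial remainder of the processing has to be considered.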
https://bitbucket.org/simoncblyth/opticks | code repository (day-to-day) |
https://github.com/simoncblyth/opticks | code repository (~month-to-month), releases |
https://simoncblyth.bitbucket.io https://simoncblyth.github.io https://juno.ihep.ac.cn/~blyth/ | publications, presentations, videos |
https://groups.io/g/opticks | forum/mailing list archive |
email:opticks+subscribe@groups.io | subscribe to mailing list |
JUNOSW+Opticks
DocDB-10968 | 2023/12/19 | Using first JUNOSW+Opticks Pre-Release at IHEP GPU cluster |
DocDB-10929 | 2023/12/11 | JUNOSW + Opticks : Profiling and Status |