Perfect normal incidence resulted in a NAN_ABORT, observed with a vertical photon incident on the horizontal AD table on which the ADs sit. Worked around by special-casing normal incidence.
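One plausible NaN source at normal incidence (an illustrative sketch, not the actual Chroma code): the incidence-plane/polarization basis is commonly built from cross(direction, normal), which is the zero vector when the photon arrives exactly along the normal, so normalizing it produces NaNs. The guard below special-cases that geometry:

```python
import numpy as np

def incident_plane_normal(direction, surface_normal, eps=1e-6):
    # cross(direction, normal) vanishes at perfectly normal incidence;
    # normalizing that zero vector yields NaNs. Guard by picking any
    # vector perpendicular to the surface normal instead.
    v = np.cross(direction, surface_normal)
    norm = np.linalg.norm(v)
    if norm < eps:  # (near-)normal incidence: special case
        axis = np.array([0.0, 1.0, 0.0]) if abs(surface_normal[0]) > 0.9 \
               else np.array([1.0, 0.0, 0.0])
        v = np.cross(axis, surface_normal)
        norm = np.linalg.norm(v)
    return v / norm

# vertical photon onto the horizontal table: the naive math gives NaNs
d = np.array([0.0, 0.0, -1.0])
n = np.array([0.0, 0.0, 1.0])
with np.errstate(invalid="ignore"):
    z = np.cross(d, n)
    naive = z / np.linalg.norm(z)      # all-NaN vector
guarded = incident_plane_normal(d, n)  # finite unit vector, perpendicular to n
```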
Need to log in to the GPU machine in ">console"-only mode in order to use the debugger. Most convenient to do that over ssh, which allows multiple connections and copy-and-paste.
(chroma_env)delta:g4daeview blyth$ g4daechroma.sh --cuda-gdb
Suspect a handful of stuck threads:
(cuda-gdb) info threads
4 Thread 0x261b of process 44071 0x00007fff9292664a in -[PAStackshot initWithGlobalTrace].tracebuf () from /usr/lib/system/libsystem_kernel.dylib
3 Thread 0x240b of process 44071 0x00007fff9292664a in -[PAStackshot initWithGlobalTrace].tracebuf () from /usr/lib/system/libsystem_kernel.dylib
2 Thread 0x23f7 of process 44071 0x00007fff92921a1a in -[PAStackshot initWithGlobalTrace].tracebuf () from /usr/lib/system/libsystem_kernel.dylib
* 1 Thread 0x2203 of process 44071 0x00007fff89fc1b2d in pthread_threadid_np () from /usr/lib/system/libsystem_pthread.dylib
(cuda-gdb) kill
Kill the program being debugged? (y or n) y
[Termination of CUDA Kernel 9 (init_rng<<<(1025,1,1),(64,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 8 (reduce_kernel_stage2<<<(1,1,1),(512,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 7 (reduce_kernel_stage1<<<(3,1,1),(512,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 5 (init_rng<<<(1025,1,1),(64,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 4 (write_size<<<(1,1,1),(1,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 3 (write_size<<<(1,1,1),(1,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 2 (write_size<<<(1,1,1),(1,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 1 (write_size<<<(1,1,1),(1,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 0 (write_size<<<(1,1,1),(1,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 10 (propagate_hit<<<(2,1,1),(64,1,1)>>>) on Device 0]
[Termination of CUDA Kernel 6 (propagate_hit<<<(66,1,1),(64,1,1)>>>) on Device 0]
(cuda-gdb)
The suspicion turned out to be correct: axis-aligned (actually z-aligned, vertical) photons were not reaching intersection. They just kept testing against boxes, because infinities meant the x,y information was effectively unused. The workaround was to special-case axis-aligned photons.
# vbo propagation
# photon_id / slot / history / pmtid
In [23]: h[:,3,:4].view(np.int32)
Out[23]:
array([[ 62, 22, 68, 16844311],
[ 74, 22, 84, 16844311],
[ 82, 22, 68, 16844810],
[ 93, 22, 580, 16843276],
[ 164, 22, 68, 16843021],
[ 170, 11, 68, 16844291],
[ 194, 5, 516, 16844033],
[ 248, 6, 516, 16843800],
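The .view(np.int32) trick works because the VBO stores each photon as four float4 rows, with the integer fields bit-packed into row 3; view reinterprets the same bytes rather than converting values. A minimal sketch, with field values taken from the dump above:

```python
import numpy as np

# one photon record shaped like the VBO dumps: (nphotons, 4, 4) float32
h = np.zeros((1, 4, 4), dtype=np.float32)

# pack photon_id / slot / history / pmtid into row 3 by aliasing the
# bytes as int32 -- no float conversion takes place
fields = np.array([62, 22, 68, 16844311], dtype=np.int32)
h[0, 3, :4].view(np.int32)[:] = fields

# reading back through the same reinterpretation recovers the ints
ids = h[:, 3, :4].view(np.int32)
```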
Some photon lists, such as mock001, propagate successfully; others, such as mock007, cause timeouts.
pycuda errors that manifest as timeouts can be caused by the GPU equivalent of a segfault, which kills the context; the timeout follows because the host no longer has a context to talk to on the device.
Are certain photon parameters causing "segfaults" on the GPU?
File "/usr/local/env/chroma_env/lib/python2.7/site-packages/env/geant4/geometry/collada/g4daeview/daephotons.py", line 222, in propagate
self.propagator.interop_propagate( vbo, max_steps=max_steps, max_slots=max_slots )
File "/usr/local/env/chroma_env/lib/python2.7/site-packages/env/geant4/geometry/collada/g4daeview/daephotonspropagator.py", line 192, in interop_propagate
self.propagate( vbo_dev_ptr, max_steps=max_steps, max_slots=max_slots )
File "/usr/local/env/chroma_env/lib/python2.7/site-packages/env/geant4/geometry/collada/g4daeview/daephotonspropagator.py", line 160, in propagate
t = get_time()
File "/usr/local/env/chroma_env/lib/python2.7/site-packages/pycuda/driver.py", line 453, in get_call_time
end.synchronize()
pycuda._driver.LaunchError: cuEventSynchronize failed: launch timeout
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuEventDestroy failed: launch timeout
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuEventDestroy failed: launch timeout
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuGLUnmapBufferObject failed: launch timeout
(chroma_env)delta:g4daeview blyth$
(chroma_env)delta:g4daeview blyth$
(chroma_env)delta:g4daeview blyth$ g4daeview.sh --load mock007
Using the transform cache, samples of photons were prepared with directions oriented with respect to the PMTs, e.g. bullseye photons.
To visualize the initial photons, load with -P/--nopropagate:
g4daeview.sh --load mock002 --nopropagate --geometry-regexp PmtHemiCathode
//transport->GetPhotons()->Save("mock002"); // ldir +y
//transport->GetPhotons()->Save("mock003"); // ldir +x
//transport->GetPhotons()->Save("mock004"); // ldir +z
//transport->GetPhotons()->Save("mock005"); // lpos (0,0,100) ldir (0,0,-1) try to shoot directly at PMT
//transport->GetPhotons()->Save("mock006"); // lpos (0,0,500) ldir (0,0,-1) try to shoot directly at PMT
//transport->GetPhotons()->Save("mock007"); // lpos (0,0,1500) ldir (0,0,-1) try to shoot directly at PMT
In [36]: h.shape
Out[36]: (146, 4, 4)
In [37]: h = ph("h1")
In [38]: h.shape
Out[38]: (33, 4, 4)
While running:
# non-vbo propagation using propagate_hit.cu gpu/photon_hit.py GPUPhotonsHit
g4daechroma.sh
# vbo propagation with the GUI
g4daeview.sh --live
g4daeview.sh --zmqendpoint=tcp://localhost:5002
# the broker
czmq-;czmq-broker-local
Provoke a propagation with:
mocknuwa.sh 1
g4daeview.sh --load mock002 --nopropagate --geometry-regexp PmtHemiCathode
udp.py --load mock002
udp.py --load mock003
udp.py --propagate
Kernel invoked from interop_propagate daephotons.py:
182 def propagate(self, max_steps=100):
...
216 vbo = self.renderer.pbuffer
217
218 self.propagator.update_constants()
219
220 if not self.config.args.propagate:
221 log.warn("propagation is inhibited by config: -P/--nopropagate ")
222 else:
223 log.warn("propagation proceeding")
224 self.propagator.interop_propagate( vbo, max_steps=max_steps, max_slots=max_slots )
225 pass
226
227 propagated = vbo.read()
kernel call daephotonspropagator.py:
 92 def propagate(self,
93 vbo_dev_ptr,
94 max_steps=100,
95 max_slots=30,
96 use_weights=False,
97 scatter_first=0):
98 """
...
145 grid=(blocks, 1)
146 args = ( np.int32(first_photon),
147 np.int32(photons_this_round),
148 self.input_queue_gpu[1:].gpudata,
149 self.output_queue_gpu.gpudata,
150 self.ctx.rng_states,
151 vbo_dev_ptr,
152 np.int32(nsteps),
153 np.int32(max_slots),
154 np.int32(use_weights),
155 np.int32(scatter_first),
156 self.ctx.gpu_geometry.gpudata)
157
158 get_time = self.kernel.prepared_timed_call( grid, block, *args )
cuda/propagate_vbo.cu:
488 __global__ void
489 propagate_vbo( int first_photon,
490 int nthreads,
491 unsigned int *input_queue,
492 unsigned int *output_queue,
493 curandState *rng_states,
494 float4 *vbo,
495 int max_steps,
496 int max_slots,
497 int use_weights,
498 int scatter_first,
499 Geometry *g)
500 {
Hmm, can I access the maps from the Geometry struct on the GPU side? Nope, not there:
54 struct Geometry
55 {
56 float3 *vertices;
57 uint3 *triangles;
58 unsigned int *material_codes;
59 unsigned int *colors;
60 uint4 *primary_nodes;
61 uint4 *extra_nodes;
62 Material **materials;
63 Surface **surfaces;
64 float3 world_origin;
65 float world_scale;
66 int nprimary_nodes;
67 };
4 struct Detector
5 {
6 // Order in decreasing size to avoid alignment problems
7 int *solid_id_to_channel_index;
Must use GPUDetector (not GPUGeometry) to have the mapping arrays.
gpu/detector.py:
16 class GPUDetector(GPUGeometry):
17 def __init__(self, detector, wavelengths=None, print_usage=False):
18 GPUGeometry.__init__(self, detector, wavelengths=wavelengths, print_usage=False)
19
20 self.solid_id_to_channel_index_gpu = \
21 ga.to_gpu(detector.solid_id_to_channel_index.astype(np.int32))
22 self.solid_id_to_channel_id_gpu = \
23 ga.to_gpu(detector.solid_id_to_channel_id.astype(np.int32))
24
gpu/photon_hit.py:
176 solid_id_map_gpu = gpu_geometry.solid_id_map
177 solid_id_to_channel_id_gpu = gpu_geometry.solid_id_to_channel_id_gpu
178
...
197 grid = (blocks, 1)
198 args = (
199 np.int32(first_photon),
200 np.int32(photons_this_round),
201 self.input_queue_gpu[1:].gpudata,
202 self.output_queue_gpu.gpudata,
203 rng_states,
204 self.pos.gpudata,
205 self.dir.gpudata,
206 self.wavelengths.gpudata,
207 self.pol.gpudata,
208 self.t.gpudata,
209 self.flags.gpudata,
210 self.last_hit_triangles.gpudata,
211 self.weights.gpudata,
212 np.int32(nsteps),
213 np.int32(use_weights),
214 np.int32(scatter_first),
215 gpu_geometry.gpudata,
216 solid_id_map_gpu.gpudata,
217 solid_id_to_channel_id_gpu.gpudata,
218 )
219 get_time = self.propagate_hit_kernel.prepared_timed_call( grid, block, *args )
220 t = get_time()
cuda/propagate_hit.cu:
118 // iiPPPPPPPPPPPiiiP
119
120 __global__ void
121 propagate_hit(
122 int first_photon,
123 int nthreads,
124 unsigned int *input_queue,
125 unsigned int *output_queue,
126 curandState *rng_states,
127 float3 *positions,
128 float3 *directions,
129 float *wavelengths,
130 float3 *polarizations,
131 float *times,
132 unsigned int *histories,
133 int *last_hit_triangles,
134 float *weights,
135 int max_steps,
136 int use_weights,
137 int scatter_first,
138 Geometry *g,
139 int* solid_map,
140 int* solid_id_to_channel_id )
141 {
...
233 if ((p.history & SURFACE_DETECT) != 0) {
234
235 //
236 // kludgy mis-use of lht for outputting
237 // various things like
238 // solid_id: index like, zero based
239 // channel_id: the pmtid, encoding site/ad/ring/...
240 //
241 int triangle_id = last_hit_triangles[photon_id];
242 if (triangle_id > -1) {
243 int solid_id = solid_map[triangle_id];
244 int channel_id = solid_id_to_channel_id[solid_id];
245 last_hit_triangles[photon_id] = channel_id ;
246 } else {
247 last_hit_triangles[photon_id] = -2 ;
248 }
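The lookup chain in the kernel above (triangle_id → solid_id → channel_id, written back into last_hit_triangles) can be mirrored host-side with numpy fancy indexing. The mapping tables and flag value below are hypothetical stand-ins for illustration:

```python
import numpy as np

SURFACE_DETECT = 0x1 << 2  # assumed history bit, for illustration only

# hypothetical mapping tables
solid_map = np.array([0, 0, 1, 1, 2, 2], dtype=np.int32)    # triangle_id -> solid_id
solid_id_to_channel_id = np.array(
    [0x1010101, 0x1010102, 0x1010103], dtype=np.int32)      # solid_id -> pmtid

history = np.array([SURFACE_DETECT, 0, SURFACE_DETECT], dtype=np.uint32)
last_hit_triangles = np.array([3, 5, -1], dtype=np.int32)

# same kludgy reuse of last_hit_triangles for output as the kernel
out = last_hit_triangles.copy()
detected = (history & SURFACE_DETECT) != 0
valid = detected & (last_hit_triangles > -1)
out[valid] = solid_id_to_channel_id[solid_map[last_hit_triangles[valid]]]
out[detected & ~valid] = -2  # detected but no triangle recorded
```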
In [7]: h = ph("h1")
In [8]: a = h[:,3,0].view(np.int32)
In [9]: b = h[:,3,1].view(np.int32)
In [10]: c = h[:,3,2].view(np.int32)
In [11]: a[a != 0]
Out[11]:
array([ 750, 276, 816, 342, 486, 702, 1044, 936, 696, 696, 1050,
1194, 372, 390, 756, 1086, 762, 1134, 786, 726, 1026, 408,
912, 48, 102, 78, 756, 942, 954, 1164, 108, 876, 1092,
702, 504, 414, 702, 498, 522, 546, 768, 324, 1086, 1008,
...
In [13]: np.set_printoptions(formatter={'int':hex})
In [14]: b[b != 0]
Out[14]:
array([0x1010516, 0x101020f, 0x1010609, 0x1010302, 0x1010402, 0x101050e,
0x1010717, 0x1010705, 0x101050d, 0x101050d, 0x1010718, 0x1010818,
0x1010307, 0x101030a, 0x1010517, 0x1010806, 0x1010518, 0x101080e,
0x1010604, 0x1010512, 0x1010714, 0x101030d, 0x1010701, 0x1010101,
...
In [16]: np.set_printoptions(formatter={'int':None})
In [17]: c[c != 0]
Out[17]: array([888, 888, 888, ..., 888, 888, 888], dtype=int32)
In [12]: a = ph("1")
In [13]: h = ph("h1")
In [14]: a.shape
Out[14]: (4165, 4, 4)
In [15]: h.shape
Out[15]: (52, 4, 4)
In [16]: np.set_printoptions(formatter={'int':hex})
In [17]: h[:,3,3]
Out[17]:
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
In [18]: h[:,3,3].view(np.int32)
Out[18]:
array([0x1010516, 0x1010302, 0x1010402, 0x1010717, 0x1010718, 0x1010517,
0x1010518, 0x1010701, 0x1010106, 0x1010706, 0x1010708, 0x101010b,
0x101050e, 0x101040c, 0x1010601, 0x1010201, 0x101020d, 0x101020d,
0x1010502, 0x1010209, 0x101070d, 0x1010602, 0x1010715, 0x1010108,
0x1010407, 0x1010418, 0x101040b, 0x101060c, 0x1010709, 0x1010409,
0x101050d, 0x101050d, 0x1010613, 0x1010707, 0x1010516, 0x101020d,
0x1010201, 0x1010308, 0x101040f, 0x101010e, 0x1010109, 0x1010417,
0x101050c, 0x1010309, 0x1010213, 0x101050c, 0x1010402, 0x101040e,
0x1010716, 0x1010315, 0x101010f, 0x1010416], dtype=int32)
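The hex channel_id values above (e.g. 0x1010516) look byte-packed. Assuming one byte per field in site/AD/ring/column order (an inference from the "encoding site/ad/ring/..." comment, not a confirmed layout), a decode sketch:

```python
def decode_pmtid(pmtid):
    # hypothetical byte-wise layout: site | AD | ring | column
    return ((pmtid >> 24) & 0xFF,  # site?
            (pmtid >> 16) & 0xFF,  # AD?
            (pmtid >> 8) & 0xFF,   # ring?
            pmtid & 0xFF)          # column?

site, ad, ring, column = decode_pmtid(0x1010516)
```

Under that assumption, 0x1010516 decodes to site 1, AD 1, ring 5, column 22, which is at least consistent with the repeated 0x0101 prefix across the dump.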
Hmm, for comparison the photon index is needed in the hits array.