Observe that GPU memory usage is high and stays high for minutes even when no applications are actively using the GPU.
Is this due to bad actors not cleaning up their CUDA contexts?
Immediately after a restart, with Finder, Safari and Terminal running:
delta:~ blyth$ cuda_info.sh
timestamp Mon Dec 1 13:25:26 2014
tag default
name GeForce GT 750M
compute capability (3, 0)
memory total 2.1G
memory used 447.8M
memory free 1.7G
delta:~ blyth$
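The cuda_info.sh script itself is not reproduced here; a minimal pycuda-based equivalent producing the same kind of report would look something like the sketch below (a hypothetical stand-in, not the actual script). Note that the underlying cuMemGetInfo call reports device-wide free/total memory, so the "used" figure covers every client of the GPU (WindowServer, browsers, CUDA processes), not just the querying process.

    # hypothetical cuda_info-style report via pycuda, not the actual cuda_info.sh
    import time
    import pycuda.driver as cuda
    import pycuda.autoinit                      # creates a context, registers atexit cleanup

    dev = pycuda.autoinit.device
    free, total = cuda.mem_get_info()           # device-wide bytes free/total
    print("timestamp %s" % time.ctime())
    print("name %s" % dev.name())
    print("compute capability %s" % (dev.compute_capability(),))
    print("memory total %.1fG" % (total/1e9))
    print("memory used %.1fM" % ((total - free)/1e6))
    print("memory free %.1fG" % (free/1e9))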
After ~20 min in Safari, Terminal and System Preferences, a whole gigabyte of GPU memory is gone:
delta:~ blyth$ cuda_info.sh
timestamp Mon Dec 1 13:42:33 2014
tag default
name GeForce GT 750M
compute capability (3, 0)
memory total 2.1G
memory used 1.4G
memory free 738.7M
delta:~ blyth$
After the machine slept over lunch, ~1G of VRAM is freed:
delta:env blyth$ cuda_info.sh
timestamp Mon Dec 1 14:55:09 2014
tag default
name GeForce GT 750M
compute capability (3, 0)
memory total 2.1G
memory used 440.8M
memory free 1.7G
delta:env blyth$
An earlier session (Nov 24) starts with GPU memory almost entirely free:
delta:~ blyth$ cuda_info.sh
timestamp Mon Nov 24 12:57:17 2014
tag default
name GeForce GT 750M
compute capability (3, 0)
memory total 2.1G
memory used 96.0M
memory free 2.1G
delta:~ blyth$
While running g4daechroma.sh:
delta:~ blyth$ cuda_info.sh
timestamp Mon Nov 24 13:01:46 2014
tag default
name GeForce GT 750M
compute capability (3, 0)
memory total 2.1G
memory used 111.2M
memory free 2.0G
Huh, memory usage seems variable; sometimes ~220M is reported as used.
After one mocknuwa run:
delta:~ blyth$ cuda_info.sh
timestamp Mon Nov 24 13:04:28 2014
tag default
name GeForce GT 750M
compute capability (3, 0)
memory total 2.1G
memory used 277.2M
memory free 1.9G
delta:~ blyth$
After a 2nd mocknuwa run, usage is not increasing:
delta:~ blyth$ cuda_info.sh
timestamp Mon Nov 24 13:06:13 2014
tag default
name GeForce GT 750M
compute capability (3, 0)
memory total 2.1G
memory used 277.2M
memory free 1.9G
delta:~ blyth$
ctrl-C interrupting g4daechroma.py cleans up OK:
delta:~ blyth$ cuda_info.sh
timestamp Mon Nov 24 13:07:50 2014
tag default
name GeForce GT 750M
compute capability (3, 0)
memory total 2.1G
memory used 96.0M
memory free 2.1G
delta:~ blyth$
Repeating:
delta:~ blyth$ cuda_info.sh
timestamp Mon Nov 24 13:09:06 2014
tag default
name GeForce GT 750M
compute capability (3, 0)
memory total 2.1G
memory used 111.2M
memory free 2.0G
Memory reporting from inside the process doesn't match the above:
(chroma_env)delta:MockNuWa blyth$ python mocknuwa.py
a_min_free_gpu_mem : 300.00M 300000000
b_node_array_usage : 54.91M 54909600
b_node_itemsize : 16.00M 16
b_split_index : 3.43M 3431850
b_n_extra : 1.00M 1
b_n_nodes : 3.43M 3431850
b_splitting : 0.00M 0
c_triangle_nbytes : 28.83M 28829184
c_triangle_gpu : 1.00M 1
d_vertices_nbytes : 14.60M 14597424
d_triangle_gpu : 1.00M 1
a_gpu_used : 99.57M 99573760
b_gpu_used : 129.72M 129720320
c_gpu_used : 184.64M 184639488
d_gpu_used : 213.48M 213475328
e_gpu_used : 228.16M 228155392
(chroma_env)delta:MockNuWa blyth$
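The per-stage *_gpu_used figures above look like device-wide totals sampled at checkpoints inside the process; a hedged sketch of how such numbers can be collected (names are illustrative, not mocknuwa's actual code):

    # sampling device-wide GPU usage at named checkpoints with pycuda
    import pycuda.driver as cuda
    import pycuda.autoinit

    def gpu_used():
        free, total = cuda.mem_get_info()            # includes allocations by other GPU clients
        return total - free

    checkpoints = []
    checkpoints.append(("a_gpu_used", gpu_used()))   # e.g. before geometry upload
    # ... node, triangle and vertex arrays uploaded to the GPU here ...
    checkpoints.append(("e_gpu_used", gpu_used()))   # after the uploads

    for name, nbytes in checkpoints:
        print("%30s : %10.2fM  %d" % (name, nbytes/1e6, nbytes))

Because mem_get_info is device-wide and other clients (the window server, browsers) allocate and free concurrently, in-process samples and cuda_info.sh reports taken at different moments need not agree.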
Huh, GPUGeometry init only happens when the first event arrives:
2014-11-24 13:22:58,720 INFO env.geant4.geometry.collada.g4daeview.daedirectpropagator:53 DAEDirectPropagator ctrl {u'reset_rng_states': 1, u'nthreads_per_block': 64, u'seed': 0, u'max_blocks': 1024, u'max_steps': 30, u'COLUMNS': u'max_blocks:i,max_steps:i,nthreads_per_block:i,reset_rng_states:i,seed:i'}
2014-11-24 13:22:58,720 WARNING env.geant4.geometry.collada.g4daeview.daedirectpropagator:63 reset_rng_states
2014-11-24 13:22:58,720 INFO env.geant4.geometry.collada.g4daeview.daechromacontext:182 _set_rng_states
2014-11-24 13:22:58,851 INFO chroma.gpu.geometry :19 GPUGeometry.__init__ min_free_gpu_mem 300000000.0
2014-11-24 13:22:59,073 INFO chroma.gpu.geometry :206 Optimization: Sufficient memory to move triangles onto GPU
2014-11-24 13:22:59,085 INFO chroma.gpu.geometry :220 Optimization: Sufficient memory to move vertices onto GPU
2014-11-24 13:22:59,085 INFO chroma.gpu.geometry :248 device usage:
----------
nodes 3.4M 54.9M
total 54.9M
----------
device total 2.1G
device used 228.2M
device free 1.9G
2014-11-24 13:22:59,089 INFO env.geant4.geometry.collada.g4daeview.daechromacontext:177 _get_rng_states
2014-11-24 13:22:59,090 INFO env.geant4.geometry.collada.g4daeview.daechromacontext:132 setup_rng_states using seed 0
2014-11-24 13:22:59,512 INFO chroma.gpu.photon_hit:204 nwork 4165 step 0 max_steps 30 nsteps 30
2014-11-24 13:23:00,157 INFO chroma.gpu.photon_hit:242 step 0 propagate_hit_kernel times [0.6453909912109375]
2014-11-24 13:23:00,319 INFO env.geant4.geometry.collada.g4daeview.daedirectpropagator:86 daedirectpropagator:propagate returning photons_end.as_npl()
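The late GPUGeometry.__init__ is consistent with lazy initialization: the geometry upload is deferred until the first event needs propagating. A generic sketch of that pattern (hypothetical names, not the actual daedirectpropagator code):

    # lazy-initialization sketch: defer the expensive GPU upload to first use
    class LazyPropagator(object):                        # hypothetical, not the real class
        def __init__(self, geometry, make_gpu_geometry):
            self.geometry = geometry
            self.make_gpu_geometry = make_gpu_geometry   # e.g. a GPUGeometry factory
            self.gpu_geometry = None                     # nothing on the GPU yet

        def propagate(self, photons):
            if self.gpu_geometry is None:
                # first event pays the upload cost, hence the late log message
                self.gpu_geometry = self.make_gpu_geometry(self.geometry)
            return self._propagate_on_gpu(self.gpu_geometry, photons)

        def _propagate_on_gpu(self, gpu_geometry, photons):
            raise NotImplementedError                    # placeholder for the kernel launches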
Timings are not stable, even when running in console mode with no contention from other GPU users for memory or compute.
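One way to separate kernel-time jitter from host-side overhead is to bracket the launches with CUDA events; a generic pycuda sketch (not the g4daechroma timing code):

    # generic kernel timing with CUDA events via pycuda
    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as cuda
    import pycuda.gpuarray as gpuarray

    a = gpuarray.to_gpu(np.random.rand(1000000).astype(np.float32))

    start, end = cuda.Event(), cuda.Event()
    start.record()
    b = a * 2.0                     # stand-in for the propagate_hit kernel launch
    end.record()
    end.synchronize()               # block until the GPU work is done
    print("kernel time %.3f ms" % start.time_till(end))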
Killing an old stuck process frees some ~200M of GPU memory, but the question remains how 1.7G is being used when the only visible apps are Finder and Terminal.
(chroma_env)delta:MockNuWa blyth$ cuda_info.py
timestamp Mon Nov 24 12:40:09 2014
tag default
name GeForce GT 750M
compute capability (3, 0)
memory total 2.1G
memory used 1.9G
memory free 232.1M
(chroma_env)delta:MockNuWa blyth$ ps aux | grep python
blyth 69938 1.2 0.2 35266100 31340 s000 S+ 3Nov14 126:41.78 python /Users/blyth/env/bin/daedirectpropagator.py mock001
blyth 2313 0.0 0.0 2423368 284 s007 R+ 12:40PM 0:00.00 grep python
(chroma_env)delta:MockNuWa blyth$ kill -9 69938
(chroma_env)delta:MockNuWa blyth$ ps aux | grep python
blyth 2315 0.0 0.0 2423368 240 s007 R+ 12:40PM 0:00.00 grep python
(chroma_env)delta:MockNuWa blyth$ cuda_info.py
timestamp Mon Nov 24 12:40:47 2014
tag default
name GeForce GT 750M
compute capability (3, 0)
memory total 2.1G
memory used 1.7G
memory free 400.0M
From https://devblogs.nvidia.com/parallelforall/pro-tip-clean-up-after-yourself-ensure-correct-profiling/ :
If your application uses the CUDA Runtime API, call cudaDeviceReset() just before exiting, or when the application finishes making CUDA calls and using device data. If your application uses the CUDA Driver API, call cuProfilerStop() on each context to flush the profiling buffers before destroying the context with cuCtxDestroy().
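pycuda.autoinit already registers equivalent cleanup at interpreter exit; for manually created contexts, a hedged sketch along the same lines (mirroring what autoinit does, assumed rather than taken from g4daechroma) is:

    # explicit context cleanup before exit with pycuda, the driver-API analogue
    # of the cudaDeviceReset()/cuCtxDestroy() advice quoted above
    import atexit
    import pycuda.driver as cuda

    cuda.init()
    context = cuda.Device(0).make_context()

    def cleanup():
        global context
        context.pop()                      # deactivate the context on this thread
        context = None                     # drop the last reference so the context can be destroyed
        from pycuda.tools import clear_context_caches
        clear_context_caches()             # clear pycuda's context-dependent caches

    atexit.register(cleanup)

A process that exits without this still has its GPU allocations reclaimed by the driver when it dies (as the kill -9 above shows), so the persistently high usage points at contexts still held by long-running processes rather than leaks that outlive process exit.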