Links

Content Skeleton

This Page

Previous topic

Chroma Use of PyUblas

Next topic

Chroma Materials

Chroma Geometry Source Overview

GPU memory usage of chroma geometry

For simple memory check make sure a non-split load.

2nd gpu_used numbers show change with prior.

(chroma_env)delta:MockNuWa blyth$ python mocknuwa.py
                              a :   start
                     a_gpu_used :      1269.06M
             a_min_free_gpu_mem :       300.00M
                              b :   after materials,surfaces
                     b_gpu_used :      1270.11M           1.05M
                              c :   after colors, idmap
                     c_gpu_used :      1299.48M          29.37M     ## large? for colors and idmap, maybe context setup for kernels ?
                              d :   after nodes
                     d_gpu_used :      1354.40M          54.92M     ## matches expectation for nodes
                 d_nodes_nbytes :        54.91M
               d_nodes_itemsize :   16
                  d_nodes_count :         3.43M      16 * 3431850 = 54909600  (     54.91 M)
                  d_split_index :         3.43M
            d_extra_nodes_count :   1
                    d_splitting :   0
             e_triangles_nbytes :        28.83M
           e_triangles_itemsize :   12
              e_triangles_count :         2.40M      12 * 2402432 = 28829184  (     28.83 M)
                              e :   after triangles
                     e_gpu_used :      1383.23M          28.84M     ## matches expectation for triangles
                e_triangles_gpu :   1
              f_vertices_nbytes :        14.60M
            f_vertices_itemsize :   12
               f_vertices_count :         1.22M      12 * 1216452 = 14597424  (     14.60 M)
                              f :   after vertices
                     f_gpu_used :      1397.91M          14.68M     ## matches expectation for vertices
                 f_vertices_gpu :   1
                              g :   after geometry struct
                     g_gpu_used :      1397.91M     0
(chroma_env)delta:MockNuWa blyth$
In [12]: 1397.91 - 1269.06
Out[12]: 128.85000000000014       # only ~129 MB

where is geometry populated

So where is Geometry populated:

(chroma_env)delta:cuda blyth$ grep geometry_types.h *.*
bvh.cu:#include "geometry_types.h"
geometry.h:#include "geometry_types.h"    # device funcs that query the geometry, accessing nodes/triangles etc..

From python with chroma/gpu/geometry.py:GPUGeometry using pycuda.gpuarray and chroma.gpu.tools.make_gpu_struct

chroma/cuda/geometry_types.h

46 struct Node
47 {
48     float3 lower;
49     float3 upper;
50     unsigned int child;
51     unsigned int nchild;
52 };
53
54 struct Geometry
55 {
56     float3 *vertices;
57     uint3 *triangles;
58     unsigned int *material_codes;
59     unsigned int *colors;
60     uint4 *primary_nodes;
61     uint4 *extra_nodes;
62     Material **materials;
63     Surface **surfaces;
64     float3 world_origin;
65     float world_scale;
66     int nprimary_nodes;
67 };

chroma/gpu/geometry.py

GPUGeometry class that converts into GPU side geometry using CUDA types from geometry_types.h

simon:chroma blyth$ find . -name '*.py' -exec grep -H GPUGeometry {} \;

./camera.py:        self.gpu_geometry = gpu.GPUGeometry(self.geometry)
./camera.py:                gpu_geometry = gpu.GPUGeometry(geometry, print_usage=False)
./camera.py:        gpu_geometry = gpu.GPUGeometry(geometry)

./gpu/detector.py:from chroma.gpu.geometry import GPUGeometry
./gpu/detector.py:class GPUDetector(GPUGeometry):
./gpu/detector.py:        GPUGeometry.__init__(self, detector, wavelengths=wavelengths, print_usage=False)

./gpu/geometry.py:class GPUGeometry(object):

./sim.py:            self.gpu_geometry = gpu.GPUGeometry(detector)


(chroma_env)delta:chroma blyth$ find ../bin -type f -exec grep -H GPUGeometry {} \;

../bin/chroma-bvh:from chroma.gpu.geometry import GPUGeometry
../bin/chroma-bvh:    gpu_geometry = GPUGeometry(geometry)

GPUGeometry

GPU Geometry struct is constructed and populated from the below python using

chroma/gpu/geometry.py:

13 class GPUGeometry(object):
14     def __init__(self, geometry, wavelengths=None, print_usage=False, min_free_gpu_mem=300e6):
15         if wavelengths is None:
16             wavelengths = standard_wavelengths
17
18         try:
19             wavelength_step = np.unique(np.diff(wavelengths)).item()
20         except ValueError:
21             raise ValueError('wavelengths must be equally spaced apart.')
22
23         geometry_source = get_cu_source('geometry_types.h')
24         material_struct_size = characterize.sizeof('Material', geometry_source)
25         surface_struct_size = characterize.sizeof('Surface', geometry_source)
26         geometry_struct_size = characterize.sizeof('Geometry', geometry_source)
27
..

The materials/surfaces/mesh from the python geometry object are transferred over to the GPU and Geometry struct is created to hold the pointers.

Some fiddly splitting of data between GPU and CPU ?:

196 def Mapped(array):
197     '''Analog to pycuda.driver.InOut(), but indicates this array
198     is memory mapped to the device space and should not be copied.
199
200     To simplify coding, Mapped() will pass anything with a gpudata
201     member, like a gpuarray, through unchanged.
202     '''
203     if hasattr(array, 'gpudata'):
204         return array
205     else:
206         return np.intp(array.base.get_device_pointer())

In [91]: np.intp
Out[91]: numpy.int32

Heavy lifting of getting geometry onto GPU:

134         self.vertices = mapped_empty(shape=len(geometry.mesh.vertices),
135                                      dtype=ga.vec.float3,
136                                      write_combined=True)
137         self.triangles = mapped_empty(shape=len(geometry.mesh.triangles),
138                                       dtype=ga.vec.uint3,
139                                       write_combined=True)
140         self.vertices[:] = to_float3(geometry.mesh.vertices)
141         self.triangles[:] = to_uint3(geometry.mesh.triangles)
142

143         self.world_origin = ga.vec.make_float3(*geometry.bvh.world_coords.world_origin)
144         self.world_scale = np.float32(geometry.bvh.world_coords.world_scale)

145
146         material_codes = (((geometry.material1_index & 0xff) << 24) |
147                           ((geometry.material2_index & 0xff) << 16) |
148                           ((geometry.surface_index & 0xff) << 8)).astype(np.uint32)
149         self.material_codes = ga.to_gpu(material_codes)

///   packing 3 single byte indices (0:255) into unint32 with low byte empty

In [110]: "0x%x" %  (((0xaa & 0xff) << 24) | ((0xbb & 0xff) << 16) |  ((0xcc & 0xff) << 8))
Out[110]: '0xaabbcc00'



150         colors = geometry.colors.astype(np.uint32)
151         self.colors = ga.to_gpu(colors)

152         self.solid_id_map = ga.to_gpu(geometry.solid_id.astype(np.uint32))
153


154         # Limit memory usage by splitting BVH into on and off-GPU parts
155         gpu_free, gpu_total = cuda.mem_get_info()
156         node_array_usage = geometry.bvh.nodes.nbytes
157
158         # Figure out how many elements we can fit on the GPU,
159         # but no fewer than 100 elements, and no more than the number of actual nodes
160         n_nodes = len(geometry.bvh.nodes)
161         split_index = min(
162             max(int((gpu_free - min_free_gpu_mem) / geometry.bvh.nodes.itemsize),100),
163             n_nodes
164             )
165
166         self.nodes = ga.to_gpu(geometry.bvh.nodes[:split_index])
167         n_extra = max(1, (n_nodes - split_index)) # forbid zero size
168         self.extra_nodes = mapped_empty(shape=n_extra,
169                                         dtype=geometry.bvh.nodes.dtype,
170                                         write_combined=True)
171         if split_index < n_nodes:
172             logger.info('Splitting BVH between GPU and CPU memory at node %d' % split_index)
173             self.extra_nodes[:] = geometry.bvh.nodes[split_index:]
174
175         # See if there is enough memory to put the and/ortriangles back on the GPU
176         gpu_free, gpu_total = cuda.mem_get_info()
177         if self.triangles.nbytes < (gpu_free - min_free_gpu_mem):
178             self.triangles = ga.to_gpu(self.triangles)
179             logger.info('Optimization: Sufficient memory to move triangles onto GPU')
180
181         gpu_free, gpu_total = cuda.mem_get_info()
182         if self.vertices.nbytes < (gpu_free - min_free_gpu_mem):
183             self.vertices = ga.to_gpu(self.vertices)
184             logger.info('Optimization: Sufficient memory to move vertices onto GPU')
185
186         self.gpudata = make_gpu_struct(geometry_struct_size,
187                                        [Mapped(self.vertices),
188                                         Mapped(self.triangles),
189                                         self.material_codes,
190                                         self.colors, self.nodes,
191                                         Mapped(self.extra_nodes),
192                                         self.material_pointer_array,
193                                         self.surface_pointer_array,
194                                         self.world_origin,
195                                         self.world_scale,
196                                         np.int32(len(self.nodes))])

material and surface indices packed into material_codes

Hmm, there are loadsa materials and surfaces, so whats this material_codes qty ?:

145
146         material_codes = (((geometry.material1_index & 0xff) << 24) |
147                           ((geometry.material2_index & 0xff) << 16) |
148                           ((geometry.surface_index & 0xff) << 8)).astype(np.uint32)
149         self.material_codes = ga.to_gpu(material_codes)

///   packing 3 single byte indices (0:255) into unint32 with low byte empty

In [110]: "0x%x" %  (((0xaa & 0xff) << 24) | ((0xbb & 0xff) << 16) |  ((0xcc & 0xff) << 8))
Out[110]: '0xaabbcc00'

In [130]: (np.array([(1<<8)-1, (1<<8), (1<<8)+1 ]) & 0xff ).astype(np.uint32)
Out[130]: array([255,   0,   1], dtype=uint32)

chroma/geometry.py:

301     def flatten(self):
302         """
303         Create the flat list of triangles (and triangle properties)
304         from the list of solids in this geometry.
305
306         This does not build the BVH!  If you want to use the geometry
307         for rendering or simulation, you should call build() instead.
308         """
309
310         # Don't run this function twice!
311         if hasattr(self, 'mesh'):
312             return
313
314         nv = np.cumsum([0] + [len(solid.mesh.vertices) for solid in self.solids])
315         nt = np.cumsum([0] + [len(solid.mesh.triangles) for solid in self.solids])
316
317         vertices = np.empty((nv[-1],3), dtype=np.float32)
318         triangles = np.empty((nt[-1],3), dtype=np.uint32)
319
320
321         logger.info('Flattening detector mesh...')
322         logger.info('  triangles: %d' % len(triangles))
323         logger.info('  vertices:  %d' % len(vertices))
324
325
326         for i, solid in enumerate(self.solids):
327             vertices[nv[i]:nv[i+1]] = \
328                 np.inner(solid.mesh.vertices, self.solid_rotations[i]) + self.solid_displacements[i]
329             triangles[nt[i]:nt[i+1]] = solid.mesh.triangles + nv[i]
330
331         # Different solids are very unlikely to share vertices, so this goes much faster
332         self.mesh = Mesh(vertices, triangles, remove_duplicate_vertices=False)
333
334         self.colors = np.concatenate([solid.color for solid in self.solids])
335
336         self.solid_id = np.concatenate([filled_array(i, shape=len(solid.mesh.triangles), dtype=np.uint32) for i, solid in enumerate(self.solids)])
337
338         self.unique_materials = list(np.unique(np.concatenate([solid.unique_materials for solid in self.solids])))
339
340         material_lookup = dict(zip(self.unique_materials, range(len(self.unique_materials))))
//
//          map from material to its unique index
//
341
342         self.material1_index = np.concatenate([solid.material1_indices(material_lookup) for solid in self.solids])
343
//          array of material1 indices of every triangle in every solid
//
344         self.material2_index = np.concatenate([solid.material2_indices(material_lookup) for solid in self.solids])
345
346         self.unique_surfaces = list(np.unique(np.concatenate([solid.unique_surfaces for solid in self.solids])))
347
348         surface_lookup = dict(zip(self.unique_surfaces, range(len(self.unique_surfaces))))
349
350         self.surface_index = np.concatenate([solid.surface_indices(surface_lookup) for solid in self.solids])
351
352         try:
353             self.surface_index[self.surface_index == surface_lookup[None]] = -1
354         except KeyError:
355             pass
356

chroma/cuda/photon.h:

79 __device__ void
80 fill_state(State &s, Photon &p, Geometry *g)
81 {
82     p.last_hit_triangle = intersect_mesh(p.position, p.direction, g,
83                                          s.distance_to_boundary,
84                                          p.last_hit_triangle);
85
86     if (p.last_hit_triangle == -1) {
87         p.history |= NO_HIT;
88         return;
89     }
90
91     Triangle t = get_triangle(g, p.last_hit_triangle);
92
93     unsigned int material_code = g->material_codes[p.last_hit_triangle];
94
95     int inner_material_index = convert(0xFF & (material_code >> 24));
96     int outer_material_index = convert(0xFF & (material_code >> 16));
97     s.surface_index = convert(0xFF & (material_code >> 8));


In [135]: "0x%x" % (0xff & (0xaabbcc00 >> 24 ))
Out[135]: '0xaa'

In [136]: "0x%x" % (0xff & (0xaabbcc00 >> 16 ))
Out[136]: '0xbb'

In [137]: "0x%x" % (0xff & (0xaabbcc00 >> 8 ))
Out[137]: '0xcc'

98
99     float3 v01 = t.v1 - t.v0;
00     float3 v12 = t.v2 - t.v1;
01
02     s.surface_normal = normalize(cross(v01, v12));
03
04     Material *material1, *material2;
05     if (dot(s.surface_normal,-p.direction) > 0.0f) {
06         // outside to inside
07         material1 = g->materials[outer_material_index];
08         material2 = g->materials[inner_material_index];
09
10         s.inside_to_outside = false;
11     }
12     else {
13         // inside to outside
14         material1 = g->materials[inner_material_index];
15         material2 = g->materials[outer_material_index];
16         s.surface_normal = -s.surface_normal;
17
18         s.inside_to_outside = true;
19     }
20
21     s.refractive_index1 = interp_property(material1, p.wavelength,
22                                           material1->refractive_index);
23     s.refractive_index2 = interp_property(material2, p.wavelength,
24                                           material2->refractive_index);
25     s.absorption_length = interp_property(material1, p.wavelength,
26                                           material1->absorption_length);
27     s.scattering_length = interp_property(material1, p.wavelength,
28                                           material1->scattering_length);
29     s.reemission_prob = interp_property(material1, p.wavelength,
30                                         material1->reemission_prob);
31
32     s.material1 = material1;
33 } // fill_state

chroma/loader.py

def load_geometry_from_string

28       "filename.stl" or "filename.stl.bz2" - Create a geometry from a
29           3D mesh on disk.  This model will not be cached, but the
30           BVH can be, depending on whether update_bvh_cache is True.

chroma/stl.py

Parse STL files (simple format of vertices and triangles) into Mesh objects.

chroma/geometry.py

  • Is the below wavelength comment outdated ?

My impression was that the wavelengths used are held in the material/surface structs and interpolated as appropriate.:

15 # all material/surface properties are interpolated at these
16 # wavelengths when they are sent to the gpu
17 standard_wavelengths = np.arange(60, 810, 20).astype(np.float32)
18
19 class Mesh(object):
20     "Triangle mesh object."
21     def __init__(self, vertices, triangles, remove_duplicate_vertices=False):
22         vertices = np.asarray(vertices, dtype=np.float32)
23         triangles = np.asarray(triangles, dtype=np.int32)

Python side geometry

  • Geometry, a detector_material and a list of Solids, rotations and displacements
    • flatten method determines global unique_materials, unique_surfaces from those for each solid
  • Solid, attaches materials, surfaces, and colors to each triangle in the Mesh object argument
  • Mesh , comprising arrays of vertices and triangles
  • Material, with name and wavelength dependant property arrays:
    • refractive_index
    • absorption_length
    • scattering_length
    • reemission_prob
    • reemission_cdf
    • density
    • composition
  • Surface, with name and model and wavelength dependant optical property arrays:
    • detect/absort/reemit/reflect_diffuse/reflect_specular/eta/k/reemission_cdf/thickness/transmissive

Q: where all these properties getting set ?

  • not in the STL, thats a very simple list of vertices/triangles

Chroma geometry construction currently done in “ad-hoc” python such as chroma/demo/__init__.py, Not out of some “standard” file format, like G4DAE COLLADA+metadata

Chroma BVH class chroma/bvh/bvh.py

A bounding volume hierarchy for a triangle mesh.

For the purposes of Chroma, a BVH is a tree with the following properties:

  • Each node consists of an axis-aligned bounding box, a child ID number, and a boolean flag indicating whether the node is a leaf. The bounding box is represented as a lower and upper bound for each Cartesian axis.

chroma/cuda/geometry_types.h

46 struct Node
47 {
48     float3 lower;
49     float3 upper;
50     unsigned int child;
51     unsigned int nchild;
52 };
  • All nodes are stored in a 1D array with the root node first.
  • A node with a bounding box that has no surface area (upper and lower bounds equal for all axes) is a dummy node that should be ignored. Dummy nodes are used to pad the tree to satisfy the fixed degree requirement described below, and have no children.
  • If the node is a leaf, then the child ID number refers to the ID number of the triangle this node contains.
  • If the node is not a leaf (an “inner” node), then the child ID number indicates the offset in the node array of the first child. The other children of this node will be stored immediately after the first child.
  • All inner nodes have the same number of children, called the “degree” (technically the “out-degree”) of the tree. This avoid the requirement to save the degree with the node.
  • For simplicity, we also require nodes at the same depth in the tree to be contiguous, and the layers to be in order of increasing depth.
  • All nodes satisfy the bounding volume hierarchy constraint: their bounding boxes contain the bounding boxes of all their children.

For space reasons, the BVH bounds are internally represented using 16-bit unsigned fixed point coordinates. Normally, we would want to hide that from you, but we would like to avoid rounding issues and high memory usage caused by converting back and forth between floating point and fixed point representations. For similar reasons, the node array is stored in a packed record format that can be directly mapped to the GPU. In general, you will not need to manipulate the contents of the BVH node array directly.

chroma/cuda/mesh.h

Stack based recursive tree walk:

36 /* Finds the intersection between a ray and `geometry`. If the ray does
37    intersect the mesh and the index of the intersected triangle is not equal
38    to `last_hit_triangle`, set `min_distance` to the distance from `origin` to
39    the intersection and return the index of the triangle which the ray
40    intersected, else return -1. */
41 __device__ int
42 intersect_mesh(const float3 &origin, const float3& direction, Geometry *g,
43            float &min_distance, int last_hit_triangle = -1)
44 {
45     int triangle_index = -1;
46