Marching Cubes on the GPU

04/202/2021

I recently implemented marching cubes (MC) inside my Caustic renderer using compute shaders.

I started out with a pure CPU version, which is fairly straightforward to implement; see CMeshConstructor::MeshFromDensityFunction(). I then converted this CPU implementation into two different GPU implementations. The C++ side of these GPU versions is handled in CSceneMarchingCubesElem.cpp.

GPU Version #1
The first version is found in MCSinglePass.cs. The name of this file is a bit of a misnomer, since my implementation actually runs this compute shader twice. The first pass only retrieves the total number of vertices that will be generated, so that we can allocate a vertex buffer of the correct size. The second pass emits the vertices into that buffer. This version only outputs a vertex buffer (i.e. no index buffer), which means there is no vertex sharing.
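The count-then-emit pattern described above can be sketched on the CPU as follows. This is only an illustration of the idea, not the code in MCSinglePass.cs: Vertex, VoxelKernel, and CountThenEmit are hypothetical names, and the kernel stands in for whatever the shader does per voxel.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical vertex layout (position only) used for this sketch.
struct Vertex { float x, y, z; };

// Hypothetical stand-in for the per-voxel shader work. It returns how many
// vertices the voxel produces and, when `out` is non-null, also writes them.
using VoxelKernel = std::function<std::size_t(std::size_t voxel, Vertex* out)>;

std::vector<Vertex> CountThenEmit(std::size_t numVoxels, const VoxelKernel& kernel)
{
    // Pass 1: count only, so the output can be allocated at the exact size.
    std::size_t total = 0;
    for (std::size_t v = 0; v < numVoxels; ++v)
        total += kernel(v, nullptr);

    std::vector<Vertex> vertices(total);

    // Pass 2: run the same kernel again, this time writing the vertices.
    std::size_t cursor = 0;
    for (std::size_t v = 0; v < numVoxels; ++v) {
        if (cursor == total)
            break; // nothing left to write
        cursor += kernel(v, vertices.data() + cursor);
    }
    return vertices;
}
```

The cost of this approach is that the per-voxel work is done twice, in exchange for never over-allocating the vertex buffer.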

GPU Version #2
This version enables vertex sharing by outputting both a vertex buffer and an index buffer. It is implemented as 3 separate compute shaders that are run as a pipeline with the following steps: MCCountVerts => MCAllocVerts => MCEmitVerts

MCCountVerts.cs
This pass does two things. First, it counts the total number of vertices that will be emitted by the final stage. Second, for each voxel it flags which vertices are referenced. Each voxel has 8 values, one per corner, from the signed distance function (SDF) provided by the client. These SDF values indicate whether each corner is considered to be inside or outside the surface. Since the SDF values are shared with neighboring voxels, each voxel only stores the SDF value for the 0th vertex (see Figure 1).

The other 7 values are read from neighboring voxels. From these 8 SDF values we get up to 12 potential vertices, one per voxel edge, where the surface crosses that edge. Each such vertex is located at the point along the edge where the SDF value is 0. See Figure 2.
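As a rough CPU sketch of the steps just described: gather the 8 corner values from the voxel and its neighbors, classify which corners are inside, and find where the SDF crosses zero along an edge. SdfGrid, GatherCorners, CaseIndex, and ZeroCrossing are hypothetical names. The corner ordering is an assumption (the commonly used Bourke-style numbering) and may not match Figure 1. Negative values are treated as inside, which matches the sphere function at the end of this post.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One SDF sample per voxel: the value at that voxel's corner 0.
struct SdfGrid {
    std::size_t nx, ny, nz;
    std::vector<float> samples; // nx * ny * nz values, x varies fastest

    float At(std::size_t x, std::size_t y, std::size_t z) const {
        return samples[(z * ny + y) * nx + x];
    }
};

// Assumed corner ordering; the renderer's own numbering may differ.
constexpr std::size_t kCornerOffset[8][3] = {
    {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0},
    {0, 0, 1}, {1, 0, 1}, {1, 1, 1}, {0, 1, 1}};

// Corner 0 comes from the voxel itself, the other 7 from its neighbors.
std::array<float, 8> GatherCorners(const SdfGrid& grid,
                                   std::size_t x, std::size_t y, std::size_t z)
{
    assert(x + 1 < grid.nx && y + 1 < grid.ny && z + 1 < grid.nz);
    std::array<float, 8> corners{};
    for (std::size_t i = 0; i < 8; ++i) {
        corners[i] = grid.At(x + kCornerOffset[i][0],
                             y + kCornerOffset[i][1],
                             z + kCornerOffset[i][2]);
    }
    return corners;
}

// Bit i is set when corner i is inside the surface (negative SDF).
unsigned CaseIndex(const std::array<float, 8>& corners)
{
    unsigned index = 0;
    for (unsigned i = 0; i < 8; ++i) {
        if (corners[i] < 0.0f)
            index |= 1u << i;
    }
    return index;
}

// Fraction in [0, 1] along an edge from corner A to corner B at which the
// linearly interpolated SDF is zero. The two values must differ in sign.
float ZeroCrossing(float sdfA, float sdfB)
{
    assert((sdfA < 0.0f) != (sdfB < 0.0f));
    return sdfA / (sdfA - sdfB);
}
```

The 8-bit case index is what a marching cubes lookup table is normally keyed on; the tables themselves are omitted here.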

Again, since these vertices are shared with neighboring voxels, each voxel only emits vertices 0, 3, and 8 (assuming a polygon references them).
Updating the total vertex count and setting a vertex's reference flag must use InterlockedAdd() and InterlockedOr() respectively, because multiple GPU threads may attempt to update these values at the same time. The Interlocked...() calls ensure that each update is atomic. The second requirement is that the variables we call Interlocked...() on must be declared with the globallycoherent modifier. This ensures that writes to those variables are flushed across all GPU thread groups, not just the group that performed the write.
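For readers more familiar with CPU threading, the closest C++ analogue of these two calls is std::atomic's fetch_add() and fetch_or(). The sketch below is only an analogy, not the shader code: ReserveVertices and FlagVertex are hypothetical names, and C++ has no direct counterpart to globallycoherent.

```cpp
#include <atomic>
#include <cstdint>

// Analogue of InterlockedAdd(): atomically adds `count` to the shared total
// and returns the value the total had before the addition.
std::uint32_t ReserveVertices(std::atomic<std::uint32_t>& total, std::uint32_t count)
{
    return total.fetch_add(count);
}

// Analogue of InterlockedOr(): atomically sets one reference bit in a voxel's
// flag word without disturbing bits set concurrently by other threads.
void FlagVertex(std::atomic<std::uint32_t>& voxelFlags, unsigned bit)
{
    voxelFlags.fetch_or(1u << bit);
}
```

Both functions are safe to call from many threads at once on the same variable. A plain read-modify-write such as total += count on a non-atomic variable would not be, since two threads could read the same old value and one update would be lost.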

MCAllocVerts.cs
After the first stage runs we are able to allocate a vertex and index buffer of the correct size. This stage determines the index where each referenced vertex will reside in the vertex buffer.
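One simple way to turn reference flags into vertex buffer positions is a sequential scan that hands out the next free slot to each flagged vertex. The sketch below assumes three flag bits per voxel, one for each vertex the voxel owns; the bit layout, the names VoxelSlots and AssignVertexSlots, and the scan itself are illustrative assumptions, and the actual shader may allocate slots differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Marker for an owned vertex that no polygon references.
constexpr std::uint32_t kUnused = 0xFFFFFFFFu;

// Vertex buffer slot for each of the three vertices a voxel owns.
struct VoxelSlots { std::uint32_t index[3]; };

// Walks the voxels in order and gives every referenced vertex the next free
// slot. Returns the total number of slots handed out.
std::uint32_t AssignVertexSlots(const std::vector<std::uint32_t>& flags,
                                std::vector<VoxelSlots>& slots)
{
    slots.assign(flags.size(), VoxelSlots{{kUnused, kUnused, kUnused}});
    std::uint32_t next = 0;
    for (std::size_t voxel = 0; voxel < flags.size(); ++voxel) {
        for (unsigned bit = 0; bit < 3; ++bit) {
            if (flags[voxel] & (1u << bit))
                slots[voxel].index[bit] = next++;
        }
    }
    return next;
}
```

The returned count is the number of vertices the vertex buffer has to hold, and the per-voxel slots are what the index buffer entries refer to.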

MCEmitVerts.cs
This stage writes the actual data to the vertex and index buffers. Since we want to bind these output buffers (each an ID3D11Buffer) as the vertex and index buffers for Draw(), we can't declare them as RWStructuredBuffer in the shader. Instead we declare them as RWByteAddressBuffer and write the data at explicit byte offsets.
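Writing to a byte-addressed buffer means computing the byte offset of every element by hand. The CPU sketch below mimics that with a std::vector of bytes. The position-only vertex layout (12 bytes) and 32-bit indices are assumptions made for the example, and Float3, StoreVertex, and the other names are hypothetical. In HLSL the corresponding writes would go through the Store methods of RWByteAddressBuffer, with floats reinterpreted via asuint().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed vertex layout for this sketch: position only.
struct Float3 { float x, y, z; };
static_assert(sizeof(Float3) == 12, "Float3 is expected to be tightly packed");

constexpr std::size_t kVertexStride = sizeof(Float3);
constexpr std::size_t kIndexStride = sizeof(std::uint32_t);

// Writes a vertex at the byte offset that corresponds to its slot.
void StoreVertex(std::vector<std::uint8_t>& raw, std::uint32_t slot, const Float3& p)
{
    const std::size_t offset = slot * kVertexStride;
    assert(offset + kVertexStride <= raw.size());
    std::memcpy(raw.data() + offset, &p, kVertexStride);
}

// Writes a 32-bit index at the byte offset that corresponds to its slot.
void StoreIndex(std::vector<std::uint8_t>& raw, std::uint32_t slot, std::uint32_t value)
{
    const std::size_t offset = slot * kIndexStride;
    assert(offset + kIndexStride <= raw.size());
    std::memcpy(raw.data() + offset, &value, kIndexStride);
}

// Read-back helpers, mainly useful for checking the layout.
Float3 LoadVertex(const std::vector<std::uint8_t>& raw, std::uint32_t slot)
{
    Float3 p{};
    std::memcpy(&p, raw.data() + slot * kVertexStride, kVertexStride);
    return p;
}

std::uint32_t LoadIndex(const std::vector<std::uint8_t>& raw, std::uint32_t slot)
{
    std::uint32_t value = 0;
    std::memcpy(&value, raw.data() + slot * kIndexStride, kIndexStride);
    return value;
}
```

The stride used here has to agree with the stride the vertex buffer is later bound with, otherwise the draw call will read garbage.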

Here is the final wireframe output

using the SDF function:
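// Signed distance to a sphere of radius 0.5 centered in the unit cube:
// negative inside the sphere, zero on its surface, positive outside.
auto spherePDF = [](Vector3& v)->float
{
    static Vector3 center(0.5f, 0.5f, 0.5f);
    return (v - center).Length() - 0.5f;
};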
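To see what this function produces, it can be sampled on a regular grid over the unit cube, which is roughly the kind of input the voxel grid consumes. This is a self-contained sketch: Vec3 is a hypothetical stand-in for Caustic's Vector3, and SphereSdf and SampleUnitCube are illustrative names rather than part of the renderer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Caustic's Vector3.
struct Vec3 { float x, y, z; };

// Same sphere as above: radius 0.5, centered at (0.5, 0.5, 0.5).
inline float SphereSdf(const Vec3& p)
{
    const float dx = p.x - 0.5f;
    const float dy = p.y - 0.5f;
    const float dz = p.z - 0.5f;
    return std::sqrt(dx * dx + dy * dy + dz * dz) - 0.5f;
}

// Samples an SDF at n x n x n evenly spaced points covering [0, 1]^3.
// Samples are stored with x varying fastest, then y, then z.
template <typename Sdf>
std::vector<float> SampleUnitCube(std::size_t n, Sdf sdf)
{
    assert(n >= 2);
    std::vector<float> samples;
    samples.reserve(n * n * n);
    const float step = 1.0f / static_cast<float>(n - 1);
    for (std::size_t z = 0; z < n; ++z)
        for (std::size_t y = 0; y < n; ++y)
            for (std::size_t x = 0; x < n; ++x)
                samples.push_back(sdf(Vec3{x * step, y * step, z * step}));
    return samples;
}
```

With n = 3, the middle sample sits at the sphere's center and is negative, while the cube's corners lie outside the sphere and are positive.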
