CUDA Path Tracer

Sept – Oct 2025
C++ · CUDA · OpenGL
CUDA Path Tracer: toys scene

I built a GPU path tracer in CUDA that computes global illumination and physically based material responses entirely on the GPU. A BVH acceleration structure makes rendering 131x faster on a dense (~50k-triangle) mesh. The sections below cover the materials, lighting, and performance techniques in detail.

Physically-Based Materials

The most interesting challenge with metallic-roughness surfaces is that naive microfacet models lose energy at high roughness, making rough metals go unnaturally dark. I added a multiple-scattering energy compensation step to fix this; the underlying BRDF uses Trowbridge-Reitz GGX with Smith masking-shadowing and Schlick Fresnel. Lambertian handles matte surfaces with cosine-weighted hemisphere sampling. Dielectrics cover glass, water, and clear plastics by splitting energy between reflection and refraction via Fresnel with a configurable IOR.
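The three terms named above can be sketched on the host in plain C++. This is a minimal illustration, not the renderer's code: the function names are hypothetical, the `alpha` parameter is assumed to be the already-remapped GGX roughness, and the height-correlated form of the Smith term is an assumption (the separable form is equally common). The multiple-scattering energy compensation step is not shown.

```cpp
#include <cassert>
#include <cmath>

constexpr float kPi = 3.14159265358979f;

// Trowbridge-Reitz (GGX) normal distribution D for a half vector h.
// alpha is the GGX roughness parameter (assumed already remapped).
inline float ggxD(float nDotH, float alpha) {
    assert(alpha > 0.0f);
    float a2 = alpha * alpha;
    float d = nDotH * nDotH * (a2 - 1.0f) + 1.0f;
    return a2 / (kPi * d * d);
}

// Smith Lambda function for GGX; nDotX is the cosine between the
// surface normal and the view or light direction (must be > 0).
inline float ggxLambda(float nDotX, float alpha) {
    assert(nDotX > 0.0f);
    float cos2 = nDotX * nDotX;
    float tan2 = (1.0f - cos2) / cos2;
    return 0.5f * (std::sqrt(1.0f + alpha * alpha * tan2) - 1.0f);
}

// Height-correlated Smith masking-shadowing term G2 (assumed variant).
inline float smithG2(float nDotV, float nDotL, float alpha) {
    return 1.0f / (1.0f + ggxLambda(nDotV, alpha) + ggxLambda(nDotL, alpha));
}

// Schlick's Fresnel approximation for a scalar reflectance f0 at normal incidence.
inline float schlickF(float f0, float vDotH) {
    float m = 1.0f - vDotH;
    float m2 = m * m;
    return f0 + (1.0f - f0) * m2 * m2 * m;
}

// Single-scattering specular BRDF value: D * G * F / (4 (n.v) (n.l)).
inline float ggxSpecular(float nDotV, float nDotL, float nDotH,
                         float vDotH, float alpha, float f0) {
    return ggxD(nDotH, alpha) * smithG2(nDotV, nDotL, alpha) *
           schlickF(f0, vDotH) / (4.0f * nDotV * nDotL);
}
```

At grazing angles `schlickF` rises to 1 regardless of `f0`, which is why even rough dielectrics show a bright rim.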

PBR material grid

Metallic (top→bottom) × roughness (left→right); rightmost column: dielectrics with varying IOR

PBR Textures

The path tracer loads glTF scenes and reads per-pixel material data from texture maps rather than using a single value for the whole mesh. A normal map stores surface normals encoded in tangent space, so a flat polygon can respond to light as if it had fine geometric detail. Metallic and roughness maps let the same mesh have polished chrome in one area and matte rubber in another. At shading time, the BRDF samples these textures at the ray hit point and uses the result directly.
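To make the normal-map step concrete, here is a small host-side C++ sketch of decoding a tangent-space texel and rotating it into world space. The `Vec3` type and the function name `shadingNormal` are hypothetical, and the sketch assumes the tangent, bitangent, and geometric normal at the hit point are already available and orthonormal.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return {v.x / len, v.y / len, v.z / len};
}

// texel: RGB sample from the normal map, each channel in [0, 1].
// t, b, n: tangent, bitangent, and geometric normal in world space.
inline Vec3 shadingNormal(Vec3 texel, Vec3 t, Vec3 b, Vec3 n) {
    // Remap [0, 1] to [-1, 1] to recover the tangent-space normal.
    Vec3 m{2.0f * texel.x - 1.0f, 2.0f * texel.y - 1.0f, 2.0f * texel.z - 1.0f};
    // Rotate into world space with the TBN basis (columns t, b, n).
    Vec3 w{m.x * t.x + m.y * b.x + m.z * n.x,
           m.x * t.y + m.y * b.y + m.z * n.y,
           m.x * t.z + m.y * b.z + m.z * n.z};
    return normalize(w);
}
```

A texel of (0.5, 0.5, 1.0), the typical "flat" blue of a normal map, decodes to the unperturbed geometric normal.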

Albedo map
Normal map
Metallic map
Roughness map

Image-Based Lighting

Placing lights by hand produces flat, unconvincing illumination. Instead, an environment map supplies radiance along the direction in which each ray leaves the scene. An equirectangular HDRI gives realistic ambient lighting and a natural background with no manual light rigging.
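The lookup itself is a change of coordinates from a direction to a latitude-longitude texture coordinate. The sketch below is one possible convention, not necessarily the one the renderer uses: it assumes +Y is up, that `u` wraps around the horizon, and that `v = 0` is the top of the image. The names `Vec2` and `equirectUV` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

constexpr float kPi = 3.14159265358979f;

struct Vec2 { float u, v; };

// Maps a unit direction (dx, dy, dz) to equirectangular coordinates in [0, 1].
// Assumed convention: +Y up, u measured around the Y axis, v = 0 at the zenith.
inline Vec2 equirectUV(float dx, float dy, float dz) {
    assert(std::fabs(dx * dx + dy * dy + dz * dz - 1.0f) < 1e-3f);
    float u = 0.5f + std::atan2(dz, dx) / (2.0f * kPi);
    // Clamp before acos so rounding error cannot push the argument out of range.
    float cy = std::fmax(-1.0f, std::fmin(1.0f, dy));
    float v = std::acos(cy) / kPi;
    return {u, v};
}
```

A ray that escapes the scene would convert its direction with `equirectUV`, fetch the HDRI texel at that coordinate, and add it to the path's radiance scaled by the path throughput.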

Fireplace HDRI
Studio HDRI

Depth of Field

I modeled the camera as a thin lens to produce physically accurate blur. Each ray samples a random point on a lens disk and aims at a user-set focus distance. Open the aperture for shallow focus; stop down to keep the full frame sharp.
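A thin-lens ray can be generated in a few lines. The sketch below works in camera space with the camera at the origin looking down -Z, which is an assumption, as are the names `Ray` and `thinLensRay` and the simple polar disk sampling (a concentric mapping is a common alternative). `u1` and `u2` stand for uniform random numbers in [0, 1).

```cpp
#include <cassert>
#include <cmath>

constexpr float kPi = 3.14159265358979f;

struct Vec3 { float x, y, z; };
struct Ray { Vec3 origin, dir; };

inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return {v.x / len, v.y / len, v.z / len};
}

// pinholeDir: direction the pixel's ray would take through a pinhole camera.
// lensRadius: aperture radius (0 reproduces the pinhole camera).
// focusDist:  distance along -Z to the plane that stays perfectly sharp.
inline Ray thinLensRay(Vec3 pinholeDir, float lensRadius, float focusDist,
                       float u1, float u2) {
    assert(pinholeDir.z < 0.0f);
    // Point where the pinhole ray crosses the focal plane z = -focusDist.
    float t = focusDist / -pinholeDir.z;
    Vec3 focus{pinholeDir.x * t, pinholeDir.y * t, pinholeDir.z * t};
    // Uniform sample on the lens disk; sqrt keeps the density uniform in area.
    float r = lensRadius * std::sqrt(u1);
    float phi = 2.0f * kPi * u2;
    Vec3 origin{r * std::cos(phi), r * std::sin(phi), 0.0f};
    // Every lens sample aims at the same focus point.
    Vec3 dir = normalize({focus.x - origin.x, focus.y - origin.y, focus.z - origin.z});
    return {origin, dir};
}
```

Because all lens samples for a pixel converge on the focal plane, geometry at that distance stays sharp while nearer and farther geometry is averaged over a disk.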

DOF enabled
DOF disabled

OptiX AI Denoising

At low sample counts (spp, meaning how many light paths are traced per pixel), Monte Carlo renders come out noisy and speckled. I integrated NVIDIA's OptiX AI denoiser to clean those frames. It uses the albedo and surface normals as guide buffers, which lets it preserve color boundaries and surface edges while smoothing out speckle. It's strongest on diffuse surfaces. High-frequency roughness detail can get smudged at low resolutions; rendering at 2K or 4K helps.

Without denoising
OptiX denoised

BVH Acceleration

On a mesh with ~50k triangles, I measured a 131x speedup (roughly a 99.2% reduction in frame time, since 1 - 1/131 ≈ 0.992) by building a Bounding Volume Hierarchy (BVH), a tree of nested bounding boxes that cuts the average per-ray intersection cost from O(n) to roughly O(log n) in the triangle count n. I built the tree on the CPU using the Surface Area Heuristic, which greedily splits each node to minimize expected intersection cost, then uploaded it to the GPU for traversal.
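The Surface Area Heuristic scores a candidate split by weighting each child's triangle count by the chance that a ray hitting the parent also hits that child, which is approximated by the ratio of surface areas. A minimal C++ sketch of that cost follows; the `Aabb` layout, the function names, and the unit traversal and intersection costs are illustrative assumptions rather than the builder's actual values.

```cpp
#include <cassert>

struct Aabb { float min[3], max[3]; };

// Surface area of an axis-aligned bounding box.
inline float surfaceArea(const Aabb& b) {
    float dx = b.max[0] - b.min[0];
    float dy = b.max[1] - b.min[1];
    float dz = b.max[2] - b.min[2];
    assert(dx >= 0.0f && dy >= 0.0f && dz >= 0.0f);
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

// Expected cost of splitting `parent` into `left` and `right`:
//   cTraverse + cIntersect * (SA(L)/SA(P) * nLeft + SA(R)/SA(P) * nRight)
inline float sahSplitCost(const Aabb& parent,
                          const Aabb& left, int nLeft,
                          const Aabb& right, int nRight,
                          float cTraverse = 1.0f, float cIntersect = 1.0f) {
    float parentArea = surfaceArea(parent);
    assert(parentArea > 0.0f);
    float weighted = surfaceArea(left) * nLeft + surfaceArea(right) * nRight;
    return cTraverse + cIntersect * weighted / parentArea;
}

// Cost of leaving the node as a leaf: every primitive is tested.
inline float sahLeafCost(int n, float cIntersect = 1.0f) {
    return cIntersect * n;
}
```

A greedy builder evaluates `sahSplitCost` for each candidate plane, keeps the cheapest, and stops subdividing once no split beats `sahLeafCost`.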

log scale · lower is better · BVH On is 131× faster

Stream Compaction

After each bounce, some paths have escaped to the environment or been killed by Russian roulette (a probabilistic rule that randomly terminates dim paths to save computation). Leaving dead paths in the work queue wastes GPU threads. Compacting the path buffer removes them, pushing all live paths to the front so subsequent kernel launches only process work that matters. The speedup is largest in open scenes where paths terminate early; in closed scenes the compaction overhead can outweigh the savings.
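The two ideas in this section can be illustrated on the CPU with the standard library. This is a stand-in, not the GPU code: on the device the same role is usually played by a parallel primitive such as `thrust::partition` or a scan-based compaction kernel, and the source does not say which was used. The `PathState` fields, the survival-probability floor of 0.05, and the use of a scalar throughput are all assumptions made for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-path record; a real one would carry a ray, RGB throughput, etc.
struct PathState {
    int pixelIndex;
    int remainingBounces;  // 0 means the path is dead
    float throughput;      // scalar stand-in for RGB throughput
};

inline bool isAlive(const PathState& p) { return p.remainingBounces > 0; }

// Russian roulette: survive with probability q, and divide the survivor's
// throughput by q so the estimator stays unbiased. xi is uniform in [0, 1).
inline void russianRoulette(PathState& p, float xi) {
    assert(xi >= 0.0f && xi < 1.0f);
    float q = std::fmin(1.0f, std::fmax(0.05f, p.throughput));
    if (xi >= q) {
        p.remainingBounces = 0;  // terminate the dim path
    } else {
        p.throughput /= q;       // compensate the survivors
    }
}

// Moves live paths to the front (preserving their order) and returns how many
// there are; the next bounce only needs to process that many entries.
inline std::size_t compactPaths(std::vector<PathState>& paths) {
    auto firstDead = std::stable_partition(paths.begin(), paths.end(), isAlive);
    return static_cast<std::size_t>(firstDead - paths.begin());
}
```

The returned count becomes the launch size for the next bounce, which is where the savings come from when many paths die early.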

Frame time (ms) for the open scene and the closed scene · lower is better