RayD is a minimalist differentiable ray tracing package wrapping OptiX ray tracing with Dr.Jit autodiff.
pip install rayd
RayD is not a full renderer. It is a thin wrapper around Dr.Jit and OptiX for building your own renderers and simulators.
The goal is simple: expose differentiable ray-mesh intersection on the GPU without bringing in a full graphics framework.
RayD provides three frontends:
- Dr.Jit (Native): direct Dr.Jit array API, maximum control
- PyTorch: rayd.torch module, CUDA torch.Tensor in/out, integrates with torch.autograd
- Slang: C++ POD/handle bridge for Slang cpp target interop
RayD is for users who want OptiX acceleration and autodiff, but do not want a full renderer.
Why not Mitsuba? Mitsuba is excellent for graphics rendering, but often too high-level for RF, acoustics, sonar, or custom wave simulation. In those settings, direct access to ray-scene queries and geometry gradients is usually more useful than a full material-light-integrator stack.
RayD keeps only the geometric core:
- differentiable ray-mesh intersection
- scene-level GPU acceleration through OptiX
- edge acceleration structures for nearest-edge queries
- primary-edge sampling support for edge-based gradient terms
For intersection workloads, RayD targets Mitsuba-level performance and matching results with a much smaller API surface.
- Mesh: triangle geometry, transforms, UVs, and edge topology
- Scene: a container of meshes plus OptiX acceleration
- scene.intersect(ray): differentiable ray-mesh intersection
- scene.shadow_test(ray): occlusion testing
- scene.nearest_edge(query): nearest-edge queries for points and rays, returning shape_id, mesh-local edge_id, and scene-global global_edge_id
- scene.set_edge_mask(mask) / scene.edge_mask(): scene-global filtering for the secondary-edge BVH used by nearest_edge(...)
- edge acceleration data that is useful for edge sampling and edge diffraction methods
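The relationship between shape_id, mesh-local edge_id, and scene-global global_edge_id can be pictured with a small pure-Python sketch. It assumes that global ids are formed by adding a per-mesh offset (an exclusive prefix sum of edge counts) to the local id. That layout is an assumption for illustration, and the helpers unique_edges, edge_offsets, and to_global_edge_id are hypothetical names, not part of the RayD API.

```python
from itertools import accumulate

def unique_edges(faces):
    """Return sorted unique undirected edges (i, j), i < j, from triangle index triples."""
    edges = set()
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)

def edge_offsets(edge_counts):
    """Exclusive prefix sum: offsets[k] is the first global edge id of mesh k."""
    return [0] + list(accumulate(edge_counts))[:-1]

def to_global_edge_id(offsets, shape_id, edge_id):
    """Hypothetical mapping (assumed layout): global id = mesh offset + local id."""
    return offsets[shape_id] + edge_id

mesh_faces = [
    [(0, 1, 2)],             # mesh 0: one triangle, 3 edges
    [(0, 1, 2), (0, 2, 3)],  # mesh 1: a quad, 5 edges (the diagonal is shared)
]
counts = [len(unique_edges(faces)) for faces in mesh_faces]
offsets = edge_offsets(counts)
print("edge counts:", counts)
print("offsets:", offsets)
print("global id of mesh 1, edge 4:", to_global_edge_id(offsets, 1, 4))
```

Under this assumed layout, an edge mask indexed in scene-global space would have one entry per edge across all meshes, here eight entries.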
The chart below was generated on March 25, 2026 on an NVIDIA GeForce RTX 5080 and AMD Ryzen 7 9800X3D, comparing RayD (0.1.2) against Mitsuba 3.8.0 with the cuda_ad_rgb variant.
Raw benchmark data is stored in docs/performance_benchmark.json.
- RayD is consistently faster on static forward and static gradient workloads across all three scene sizes.
- Dynamic reduced forward reaches parity or better from the medium scene onward, and dynamic full is effectively tied on the largest case.
- On the largest 192x192 mesh / 384x384 ray benchmark, RayD vs Mitsuba average latency in milliseconds is: static full 0.162 vs 0.190, static reduced 0.124 vs 0.224, dynamic full 0.741 vs 0.740, dynamic reduced 0.689 vs 0.714, gradient static 0.411 vs 0.757, gradient dynamic 1.324 vs 1.413.
- Correctness stayed aligned throughout the sweep: forward mismatch counts remained 0, and the largest static gradient discrepancy was 9.54e-7.
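To read the latency pairs as relative speedups, the short script below divides the Mitsuba latency by the RayD latency for each workload. The numbers are copied from the list above; the speedup helper is a hypothetical name introduced only for this illustration.

```python
# Average latencies in milliseconds from the largest benchmark: (RayD, Mitsuba).
latency_ms = {
    "static full": (0.162, 0.190),
    "static reduced": (0.124, 0.224),
    "dynamic full": (0.741, 0.740),
    "dynamic reduced": (0.689, 0.714),
    "gradient static": (0.411, 0.757),
    "gradient dynamic": (1.324, 1.413),
}

def speedup(rayd_ms, mitsuba_ms):
    """Mitsuba latency divided by RayD latency; values above 1 mean RayD is faster."""
    return mitsuba_ms / rayd_ms

speedups = {name: speedup(*pair) for name, pair in latency_ms.items()}

for name, ratio in speedups.items():
    print(f"{name:17s} {ratio:.2f}x")
```

On these numbers the static reduced and gradient static workloads come out at roughly 1.8x, while dynamic full sits just below 1.0x, consistent with the "effectively tied" description.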
If you only want to see the package in action, start here:
- examples/basics/ray_mesh_intersection.py: custom rays against a mesh
- examples/basics/nearest_edge_query.py: nearest-edge queries
- examples/basics/camera_edge_sampling_gradient.py: camera-driven edge-sampling gradients
- docs/slang_interop.md: Slang cpp target interop for host-side RayD scene queries
GPU path tracing + interior AD + edge sampling (Li et al.) in ~180 lines (examples/renderer/cornell_box.py):
(Mini-Differentiable-RF-Digital-Twin):
The example below traces a single ray against one triangle and backpropagates the hit distance to the vertex positions.
import rayd as rd
import drjit as dr
mesh = rd.Mesh(
    dr.cuda.Array3f([0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [0.0, 0.0, 0.0]),
    dr.cuda.Array3i([0], [1], [2]),
)
verts = dr.cuda.ad.Array3f(
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
)
dr.enable_grad(verts)
mesh.vertex_positions = verts
scene = rd.Scene()
scene.add_mesh(mesh)
scene.build()
ray = rd.Ray(
    dr.cuda.ad.Array3f([0.25], [0.25], [-1.0]),
    dr.cuda.ad.Array3f([0.0], [0.0], [1.0]),
)
its = scene.intersect(ray)
loss = dr.sum(its.t)
dr.backward(loss)
print("t =", its.t)
print("grad z =", dr.grad(verts)[2])
This is the core RayD workflow. Replace the single ray with your own batched rays, RF paths, acoustic paths, or edge-based objectives.
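As a CPU-side sanity check of what the example computes, the sketch below reproduces the same ray and triangle in pure Python using the standard Möller-Trumbore intersection test, then estimates the gradient of t with respect to each vertex z coordinate by central finite differences. It does not use RayD or Dr.Jit; the helper names are hypothetical, and agreement with dr.grad(verts)[2] is expected only under the assumption that RayD's gradient matches the analytic derivative.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle_t(origin, direction, v0, v1, v2, eps=1e-12):
    """Möller-Trumbore test: return the hit distance t, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:           # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv          # first barycentric coordinate
    q = cross(s, e1)
    v = dot(direction, q) * inv  # second barycentric coordinate
    t = dot(e2, q) * inv
    if u < 0.0 or v < 0.0 or u + v > 1.0 or t < 0.0:
        return None
    return t

# Same geometry as the RayD example: one triangle in the z = 0 plane.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
origin, direction = (0.25, 0.25, -1.0), (0.0, 0.0, 1.0)

def t_with_z_offset(k, dz):
    """Hit distance after moving vertex k by dz along z."""
    moved = [(x, y, z + dz) if i == k else (x, y, z)
             for i, (x, y, z) in enumerate(verts)]
    return ray_triangle_t(origin, direction, *moved)

t = ray_triangle_t(origin, direction, *verts)
h = 1e-6
grad_z = [(t_with_z_offset(k, h) - t_with_z_offset(k, -h)) / (2.0 * h)
          for k in range(3)]
print("t =", t)
print("finite-difference grad z =", grad_z)
```

Because the ray travels along z, the derivative of t with respect to each vertex z coordinate equals that vertex's barycentric weight at the hit point, so the finite-difference estimates land near 0.5, 0.25, and 0.25.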
rayd.torch is an optional Python-level wrapper that mirrors the native API using CUDA torch.Tensor inputs and outputs. AD mode is inferred automatically from requires_grad.
import torch
import rayd.torch as rt
# faces, origins, directions, and target are user-supplied CUDA tensors (not defined here)
verts = torch.tensor([...], device="cuda", requires_grad=True)
mesh = rt.Mesh(verts, faces)
scene = rt.Scene()
scene.add_mesh(mesh)
scene.build()
its = scene.intersect(rt.Ray(origins, directions))
loss = (its.t - target).pow(2).mean()
loss.backward()  # gradients flow to verts
Key conventions:
- vectors use shape (N, 3) or (N, 2); (3,) and (2,) are accepted as batch size 1
- index tensors use shape (F, 3); images use shape (H, W); transforms use shape (4, 4)
- CPU tensors are rejected; rayd.torch does not do implicit device transfers
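The vector shape convention above can be expressed as a small check on shape tuples. This is an illustrative sketch only: normalize_vector_shape is a hypothetical helper, not a function in rayd.torch, and it works on plain tuples so it runs without PyTorch or a GPU.

```python
def normalize_vector_shape(shape):
    """Map a vector shape to (N, C) with C in {2, 3}, or raise ValueError.

    (3,) and (2,) are treated as a batch of size 1, mirroring the convention
    described above. Hypothetical helper for illustration only.
    """
    shape = tuple(shape)
    if len(shape) == 1 and shape[0] in (2, 3):
        return (1, shape[0])   # unbatched vector becomes batch size 1
    if len(shape) == 2 and shape[1] in (2, 3):
        return shape           # already batched as (N, 2) or (N, 3)
    raise ValueError(f"expected (N, 3), (N, 2), (3,), or (2,), got {shape}")

print(normalize_vector_shape((3,)))
print(normalize_vector_shape((128, 2)))
```

With a real tensor, the same check could be applied to tuple(tensor.shape) before handing the data to the scene.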
The native Dr.Jit API remains unchanged and does not depend on PyTorch.
RayD follows Dr.Jit's current-thread CUDA device selection. If you need to choose a GPU explicitly, do it before constructing any RayD resources:
import rayd as rd
rd.set_device(0)  # also initializes OptiX on that device by default
rd.set_device() / rayd.torch.set_device() are intended for selecting the device up front. Existing RayD scenes, OptiX pipelines, and BVHs should not be reused across device switches in the same process.
RayD ships a Slang interop layer for Slang's cpp target. Slang code can import rayd_slang; and call RayD scene queries directly.
NearestPointEdge and NearestRayEdge returned through Slang include global_edge_id in the same scene-global index space as scene.edge_info().global_edge_id.
The Slang bridge also exposes scene edge-mask helpers for host code: sceneEdgeCount(scene), sceneEdgeMaskValue(scene, index), and sceneSetEdgeMask(scene, maskPtr, count).
import rayd_slang;
export float traceRayT(uint64_t sceneHandle,
float ox, float oy, float oz,
float dx, float dy, float dz)
{
SceneHandle scene = makeSceneHandle(sceneHandle);
Ray ray = makeRay(float3(ox, oy, oz), float3(dx, dy, dz));
Intersection hit = sceneIntersect(scene, ray);
return itsT(hit); // use accessor, not hit.t
}
Load and call from Python:
import rayd as rd
import rayd.slang as rs
m = rs.load_module("my_shader.slang") # use rayd.slang.load_module, not slangtorch.loadModule
scene = rd.Scene()
scene.add_mesh(mesh)
scene.build()
t = m.traceRayT(scene.slang_handle, 0.25, 0.25, -1.0, 0.0, 0.0, 1.0)
sceneIntersectAD returns an IntersectionAD with analytic gradients dt_do (∂t/∂origin) and dt_dd (∂t/∂direction):
import rayd_slang;
export IntersectionAD traceAD(uint64_t sceneHandle,
float ox, float oy, float oz,
float dx, float dy, float dz)
{
SceneHandle scene = makeSceneHandle(sceneHandle);
Ray ray = makeRay(float3(ox, oy, oz), float3(dx, dy, dz));
return sceneIntersectAD(scene, ray);
}
Use it from Python with torch.autograd:
import torch
import rayd as rd
import rayd.slang as rs
m = rs.load_module("my_shader.slang")
scene = rd.Scene()
scene.add_mesh(mesh)
scene.build()
class DiffTrace(torch.autograd.Function):
    @staticmethod
    def forward(ctx, oz):
        ctx.save_for_backward(oz)
        hit = m.traceAD(scene.slang_handle, 0.25, 0.25, oz.item(), 0, 0, 1)
        return torch.tensor(hit.t, device=oz.device)
    @staticmethod
    def backward(ctx, g):
        oz, = ctx.saved_tensors
        hit = m.traceAD(scene.slang_handle, 0.25, 0.25, oz.item(), 0, 0, 1)
        return torch.tensor(hit.dt_do.z * g.item(), device=oz.device)
oz = torch.tensor(-1.0, device="cuda", requires_grad=True)
t = DiffTrace.apply(oz)
t.backward()
print(f"t={t.item()}, dt/doz={oz.grad.item()}")  # t=1.0, dt/doz=-1.0
load_module() runs slangc -target cpp, auto-generates pybind11 bindings, and links against rayd_core. See docs/slang_interop.md for the full compilation pipeline, API reference, and known workarounds.
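For intuition about where values like dt/doz = -1.0 come from, the sketch below evaluates the textbook ray-plane derivatives in pure Python. It assumes the hit triangle's plane (point p0, normal n) is held fixed and the direction is not renormalized, which gives t = n·(p0 - o) / (n·d), ∂t/∂o = -n / (n·d), and ∂t/∂d = -t n / (n·d). Whether RayD computes dt_do and dt_dd in exactly this form is an assumption; ray_plane_t_and_grads is a hypothetical helper.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def ray_plane_t_and_grads(o, d, p0, n):
    """Return (t, dt/do, dt/dd) for a ray hitting the fixed plane through p0 with normal n."""
    denom = dot(n, d)                          # n . d, must be nonzero
    t = dot(n, sub(p0, o)) / denom             # hit distance along d
    dt_do = tuple(-c / denom for c in n)       # derivative w.r.t. the ray origin
    dt_dd = tuple(-t * c / denom for c in n)   # derivative w.r.t. the ray direction
    return t, dt_do, dt_dd

# Same ray as the Slang example, against the z = 0 plane.
t, dt_do, dt_dd = ray_plane_t_and_grads(
    o=(0.25, 0.25, -1.0), d=(0.0, 0.0, 1.0),
    p0=(0.0, 0.0, 0.0), n=(0.0, 0.0, 1.0),
)
print("t =", t)
print("dt/do =", dt_do)
print("dt/dd =", dt_dd)
```

The z component of dt/do is -1: moving the origin toward the plane along the ray shortens the hit distance by the same amount.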
RayD also provides a scene-level edge acceleration structure.
This is useful for:
- edge sampling
- nearest-edge queries
- visibility-boundary terms
- geometric edge diffraction models
Scene.set_edge_mask(mask) filters this secondary-edge BVH in scene-global edge index space. It does not modify scene.edge_info(), scene.edge_topology(), scene.mesh_edge_offsets(), or primary-edge camera sampling.
In other words, RayD is not limited to triangle hits. It also gives you direct access to edge-level geometry queries, which are important in many non-graphics simulators.
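To make the nearest-edge query and edge mask concrete, here is a brute-force CPU reference in pure Python. It is a sketch of the query semantics only: RayD uses a BVH on the GPU, the function names below are hypothetical, and the mask convention (True keeps an edge active) is an assumption.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def point_segment_distance(p, a, b):
    """Distance from point p to the segment with endpoints a and b."""
    ab, ap = sub(b, a), sub(p, a)
    denom = dot(ab, ab)
    # Clamp the projection parameter to [0, 1] so the closest point stays on the segment.
    s = 0.0 if denom == 0.0 else max(0.0, min(1.0, dot(ap, ab) / denom))
    closest = (a[0] + s * ab[0], a[1] + s * ab[1], a[2] + s * ab[2])
    return math.dist(p, closest)

def nearest_edge(p, edges, mask=None):
    """Return (edge_index, distance) over active edges, or (-1, inf) if none are active."""
    best = (-1, math.inf)
    for i, (a, b) in enumerate(edges):
        if mask is not None and not mask[i]:
            continue  # assumed convention: False disables the edge
        dist = point_segment_distance(p, a, b)
        if dist < best[1]:
            best = (i, dist)
    return best

# The three edges of the triangle used in the examples above.
edges = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
    ((0.0, 1.0, 0.0), (0.0, 0.0, 0.0)),
]
query = (0.5, -0.2, 0.0)
print(nearest_edge(query, edges))
print(nearest_edge(query, edges, mask=[False, True, True]))
```

Without a mask the query resolves to edge 0 at distance 0.2; with edge 0 masked out it falls back to the next closest edge, which is how a scene-global mask can restrict queries to a subset of edges.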
RayD is a Python package with a C++/CUDA extension.
You need Python >=3.10, CUDA Toolkit >=11.0, CMake, a C++17 compiler, drjit>=1.2.0, nanobind==2.9.2, and scikit-build-core.
On Windows, use Visual Studio 2022 with Desktop C++ tools. On Linux, use GCC or Clang with C++17 support.
conda create -n myenv python=3.10 -y
conda activate myenv
python -m pip install -U pip setuptools wheel
python -m pip install cmake scikit-build-core nanobind==2.9.2
python -m pip install "drjit>=1.2.0"
conda activate myenv
python -m pip install .
RayD depends on:
- Python 3.10+
- Dr.Jit 1.2.0+
- OptiX 8+
RayD does not include:
- BSDFs
- emitters
- integrators
- scene loaders
- image I/O
- path tracing infrastructure
That is by design.
- rayd/: Python package (flat layout)
- rayd/torch/: PyTorch frontend
- rayd/slang/: Slang / slangtorch interop utilities
- include/rayd/: public C++ headers
- src/: C++ and CUDA implementation
- src/rayd.cpp: Python bindings
- include/rayd/slang/interop.h: C++ POD/handle bridge for Slang
- include/rayd/slang/rayd.slang: Slang declarations for the C++ interop layer
- examples/: basic, renderer, and Slang examples
- tests/drjit/: Dr.Jit native geometry tests
- tests/torch/: PyTorch frontend tests
- tests/slang/: Slang interop and gradient tests
- docs/api_reference.md: Python API reference
- docs/slang_interop.md: Slang interop notes and examples
python -m unittest tests.drjit.test_geometry -v
Optional PyTorch wrapper tests:
python -m unittest tests.torch.test_geometry -v
Optional Slang interop and gradient tests (requires slangtorch):
python -m unittest tests.slang.test_slang -v
RayD is developed with reference to:
@inproceedings{chen2026rfdt,
title = {Physically Accurate Differentiable Inverse Rendering
for Radio Frequency Digital Twin},
author = {Chen, Xingyu and Zhang, Xinyu and Zheng, Kai and
Fang, Xinmin and Li, Tzu-Mao and Lu, Chris Xiaoxuan
and Li, Zhengxiong},
booktitle = {Proceedings of the 32nd Annual International Conference
on Mobile Computing and Networking (MobiCom)},
year = {2026},
doi = {10.1145/3795866.3796686},
publisher = {ACM},
address = {Austin, TX, USA},
}
BSD 3-Clause. See LICENSE.


