Yohan Poirier-Ginter, Jean-François Lalonde, George Drettakis
Website | Paper | Video | NERPHYS | Pretrained Models
GRay is a fast ray tracer for 3D Gaussians that can be used as a ray-tracing-based alternative to 3DGS, much like 3DGRT. By leveraging dense initialization and other techniques, including methods developed in our previous project, GRay optimizes nearly 10× faster than 3DGRT on an RTX 4090.
Using the uv package manager (installable with curl -LsSf https://astral.sh/uv/install.sh | sh), run
git submodule update --init --recursive # pull submodules
bash install.sh # create environment & install dependencies
source .venv/bin/activate # activate environment
bash make.sh # compile the CUDA raytracer into `build/`
This codebase requires a graphics card supporting OptiX 8 and a local CUDA 12 toolkit installation exposing nvcc.
For Windows, please refer to windows/WINDOWS_README.md.
The pretrained models are available online and can be downloaded in batch with bash scripts/download_all_pretrained_scenes.sh. You can open them in the interactive viewer with
MODEL_DIR=output/pretrained/bicycle
python view.py -m $MODEL_DIR
This section explains how to easily reproduce the results from our paper.
First, run
bash scripts/full_dataset_preparation.sh
to download, resize, and preprocess all 13 scenes used for evaluation and place them in data/.
Then run
bash scripts/run_all_scenes.sh output/
to train and evaluate all scenes and put them into output/.
You can then run
python collect_results.py output/
to collect all metrics in a table.
Here are the expected results from the latest version of the code:
| PSNR | SSIM | LPIPS | Time | FPS |
|---|---|---|---|---|
| 26.45 | 0.818 | 0.238 | 05:30 | 250 |
This section explains how to run scenes step-by-step. You can skip it if you followed the automated reproduction steps above.
You can download the MipNeRF 360, Tanks and Temples, and Deep Blending scenes used for benchmarking with
bash scripts/download_all_scenes.sh
This will place them in data/; for example, data/360_v2/bicycle will contain the files for the MipNeRF 360 bicycle scene.
You can also use any COLMAP scene or create your own with the provided convert.py utility. Its usage is explained in the 3DGS repository.
This codebase expects your images to already be at the target resolution and stored as .png files. Resizing can be done with the preprocessing script
SCENE_DIR=data/360_v2/bicycle
python resize.py -s $SCENE_DIR -y
which will downsize your images by factors of 2, 4, and 8 while also limiting their size to max 1600 pixels like 3DGS does. You can resize all benchmarking scenes with
bash scripts/resize_all_scenes.sh
which will produce the subdirectories images_1 (original size clamped to max 1600 pixels), images_2 (half resolution), etc.
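The resizing scheme described above can be sketched as follows. This is only an illustration under stated assumptions: that the 1600-pixel limit applies to the image width (as in the 3DGS loader), that the limit is applied before the downscale factors, and that sizes round down. The helpers clamped_size and pyramid_sizes are hypothetical names and are not taken from resize.py.

```python
def clamped_size(width, height, max_width=1600):
    """Scale (width, height) so that width <= max_width, keeping the aspect ratio.

    Assumption: the limit applies to the width only.
    """
    if width <= max_width:
        return width, height
    scale = max_width / width
    return max_width, round(height * scale)


def pyramid_sizes(width, height, factors=(1, 2, 4, 8), max_width=1600):
    """Map each output subdirectory name to its assumed target size.

    Assumption: the clamp happens first, then each factor divides the
    clamped size with integer (floor) division.
    """
    base_w, base_h = clamped_size(width, height, max_width)
    return {
        f"images_{f}": (max(1, base_w // f), max(1, base_h // f))
        for f in factors
    }


if __name__ == "__main__":
    # A 3200x2400 input is clamped to 1600x1200 before the pyramid is built.
    for name, size in pyramid_sizes(3200, 2400).items():
        print(name, size)
```

Under these assumptions, an image that is already narrower than 1600 pixels keeps its original size in images_1.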
This project uses dense initialization for its initial point cloud. You can create these point clouds for any scene with:
SCENE_DIR=data/360_v2/bicycle
INDOORS_OR_OUTDOORS=indoors
python third_party/edgs.py -s $SCENE_DIR --roma_model $INDOORS_OR_OUTDOORS
Here you must select which type of scene you are dealing with (indoors or outdoors) to choose the correct RoMa network used for dense matching. The point cloud will be saved to $SCENE_DIR/point_cloud.safetensors.
You can create point clouds for all benchmarking scenes with
bash scripts/create_all_dense_point_clouds.sh
The configuration is mostly unchanged from 3DGS, with some minor differences. Run a full training and evaluation pass with:
SCENE_DIR=data/360_v2/bicycle
DOWNSAMPLING_LEVEL=4
OUTPUT_DIR=out/bicycle
python train.py -m $OUTPUT_DIR -s $SCENE_DIR -r $DOWNSAMPLING_LEVEL
python render.py -m $OUTPUT_DIR
python metrics.py -m $OUTPUT_DIR
python measure_fps.py -m $OUTPUT_DIR
The run.sh utility chains all steps and takes the output directory as its first argument, e.g.
bash run.sh $OUTPUT_DIR -s $SCENE_DIR -r $DOWNSAMPLING_LEVEL
The viewer can also be enabled during training with the --viewer flag.
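For readers who prefer to drive the four steps from Python, the chain of commands listed above can be sketched as below. This is not the content of run.sh, which is not reproduced here; pipeline_commands and run_pipeline are hypothetical helpers that simply rebuild the train, render, metrics, and FPS commands and stop at the first failure.

```python
import shlex
import subprocess


def pipeline_commands(output_dir, scene_dir, downsampling_level):
    """Build the four commands of a full training and evaluation pass."""
    return [
        ["python", "train.py", "-m", output_dir, "-s", scene_dir,
         "-r", str(downsampling_level)],
        ["python", "render.py", "-m", output_dir],
        ["python", "metrics.py", "-m", output_dir],
        ["python", "measure_fps.py", "-m", output_dir],
    ]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order; check=True aborts on the first failure."""
    for command in commands:
        runner(command, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in pipeline_commands("out/bicycle", "data/360_v2/bicycle", 4):
        print(shlex.join(command))
```

Calling run_pipeline(pipeline_commands(...)) from the repository root, with the environment activated, would execute the steps one after another.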
This section clarifies technical details and additional features.
You can use your own COLMAP files under the standard layout expected by 3DGS.
We also provide a script for running COLMAP from pycolmap-cuda12 (already installed). First, place your images under data/$SCENE/input and then run
python run_colmap.py -s data/$SCENE
GPU support is available only on Linux; on Windows, you can either install the CPU-only version of pycolmap or use the script from the 3DGS codebase.
Once COLMAP has run successfully, you will need to resize the images and run dense initialization as explained earlier.
We use per-pixel linked lists to store intersected Gaussians and data for the backward pass. You can control their size with the flags --ppll_forward_size and --ppll_backward_size. You might need to increase the defaults for your own scenes, or you might be able to reduce them. Running the standard scenes with the current settings requires 24 GB of VRAM.
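To illustrate why these sizes matter, here is a minimal CPU sketch of the general per-pixel linked list data structure: one head index per pixel and a fixed-capacity node pool shared by all pixels. It is a conceptual illustration only and not the CUDA implementation. The class name, the shared-pool layout, and the drop-on-overflow behavior are assumptions, with capacity playing a role loosely analogous to the size flags.

```python
class PerPixelLinkedLists:
    """Fixed-capacity node pool with one singly linked list per pixel."""

    def __init__(self, num_pixels, capacity):
        self.head = [-1] * num_pixels      # index of the newest node per pixel
        self.next = [-1] * capacity        # link to the previous node
        self.payload = [None] * capacity   # e.g. a Gaussian id and hit data
        self.capacity = capacity
        self.count = 0                     # number of nodes allocated so far

    def push(self, pixel, item):
        """Prepend item to the pixel's list; return False if the pool is full."""
        if self.count >= self.capacity:
            return False                   # assumption: overflowing entries are dropped
        node = self.count
        self.count += 1
        self.payload[node] = item
        self.next[node] = self.head[pixel]
        self.head[pixel] = node
        return True

    def items(self, pixel):
        """Yield the pixel's entries from newest to oldest."""
        node = self.head[pixel]
        while node != -1:
            yield self.payload[node]
            node = self.next[node]


if __name__ == "__main__":
    lists = PerPixelLinkedLists(num_pixels=2, capacity=3)
    for pixel, gaussian_id in [(0, 10), (1, 11), (0, 12), (1, 13)]:
        stored = lists.push(pixel, gaussian_id)
        print(pixel, gaussian_id, "stored" if stored else "dropped")
```

In a structure like this, a pool that is too small silently loses intersections, while a pool that is too large wastes memory, which is the trade-off the two flags expose.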
While most code is CUDA-side, including the loss computation and optimizer step, nearly all memory is allocated in tensors and exposed to PyTorch via pybind. As such, most configuration can be adjusted via the command line without recompiling, and many intermediate results can be inspected in Python for debugging.
The main ray tracer's CUDA module exposes objects that group relevant data tensors. For instance, the camera can be inspected with
camera = raytracer.cuda_module.get_camera()
and data is provided to the CUDA module by modifying its values in-place.
Note that the backward pass relies on the data from the forward pass staying unmodified (camera, framebuffer, etc.).
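The difference between in-place modification and rebinding a name can be shown with a small stand-in. FakeModule below is hypothetical and uses a plain Python list instead of a tensor; the fields of the real camera object are not documented here, so none are assumed. With PyTorch tensors, the analogous in-place writes would be tensor.copy_(values) or tensor[:] = values.

```python
class FakeModule:
    """Stand-in for a native module that keeps a reference to a shared buffer."""

    def __init__(self):
        self._position = [0.0, 0.0, 0.0]

    def get_position(self):
        # Returns the same underlying object on every call.
        return self._position

    def read_back(self):
        # What the module would see when it next uses the buffer.
        return list(self._position)


module = FakeModule()

# In-place write: the module's buffer is updated.
position = module.get_position()
position[:] = [1.0, 2.0, 3.0]
print(module.read_back())

# Rebinding: only the local name changes, the module's buffer does not.
position = [9.0, 9.0, 9.0]
print(module.read_back())
```

Both print statements show the values written in place, since the rebinding on the last assignment never reaches the module.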
Preset configurations are available: adding the flag -c configs/lq.json selects a lower quality level, and the flag -c configs/hq.json selects a higher quality level. The default quality level is mq (medium quality). The hyperparameters used are detailed in the paper.
The Gaussians produced by this method are incompatible with 3DGS; in theory, the differences (different kernel, different sorting, and perspective accuracy) could be resolved by modifying both methods (refer to the paper for a short discussion on page 14), but this has not been done in practice. Rendering differences with 3DGRT are minute (hybrid transparency).
Implementation-wise, the file format was changed to .safetensors which is simpler and faster. Scenes can be converted to and from the INRIA 3DGS and NVIDIA 3DGRT formats with the scripts in convert/ (to_3dgs.py/from_3dgs.py and to_3dgrt.py/from_3dgrt.py). The parameter conversion is lossless (a GRay → 3DGS → GRay round-trip reproduces identical metrics) but 3DGS-based viewers will produce blurrier images with some differences. 3DGRT renders should look identical. GRay scenes converted into these formats also render faster than each method's own models:
| Renderer | FPS (their scenes) | FPS (our scenes) | Speedup |
|---|---|---|---|
| 3DGS | 253 | 432 | 1.7× |
| 3DGRT | 68 | 190 | 2.8× |
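The speedup column is the ratio of the two FPS columns, which can be checked with a few lines of arithmetic. The snippet also converts FPS to per-frame time, a derived quantity that is not part of the table; speedup and frame_time_ms are hypothetical helper names.

```python
def speedup(fps_ours, fps_theirs):
    """Ratio of frame rates: how many times faster our scenes render."""
    return fps_ours / fps_theirs


def frame_time_ms(fps):
    """Average time per frame in milliseconds."""
    return 1000.0 / fps


# (FPS on their scenes, FPS on our scenes), copied from the table above.
rows = {"3DGS": (253, 432), "3DGRT": (68, 190)}

if __name__ == "__main__":
    for renderer, (theirs, ours) in rows.items():
        print(
            f"{renderer}: {speedup(ours, theirs):.1f}x "
            f"({frame_time_ms(theirs):.2f} ms -> {frame_time_ms(ours):.2f} ms per frame)"
        )
```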
Metric computation was moved to the PIQ library since the LPIPS metric was incorrect in the original 3DGS codebase. PSNRs and SSIM scores were verified to match.
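For reference, PSNR follows the standard definition 10 · log10(MAX² / MSE). The sketch below implements that formula in plain Python on flat lists of pixel values in [0, 1]; it is not the PIQ implementation used by the codebase, and psnr is a hypothetical helper written only to make the definition concrete.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Every pixel is off by 0.1, so the MSE is 0.01.
    print(psnr([0.0, 1.0], [0.1, 0.9]))
```

Because the scale is logarithmic, a difference of 0.1 dB corresponds to a change in mean squared error of roughly 2%.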
This codebase also features MLP support, although we did not use MLPs in the paper.
Two types of MLPs are supported:
- Pre-processing MLPs (pre_mlp), which transform features into per-Gaussian channels before they are rendered into pixels.
- Post-processing MLPs (post_mlp), which transform per-pixel channels into a final color.
If you wish to use tinycudann, you can optionally install it with uv sync --extra tcnn and enable it with --tcnn.
The viewer can be used remotely, in which case a server renders the images and delivers them to the client via Websocket. Launch the server with
python view.py --server -m $MODEL_DIR
On the client, you can install the minimal required dependencies with uv venv && source .venv/*/activate && uv pip install -r viewer/requirements.txt and then run
python view.py --client $SERVER_IP
The client does not require a GPU and all platforms are supported (Linux/Windows/Mac).
Besides the Python viewer, which is designed for ease of development, a faster C++ viewer is also provided. It can be included during the build with
bash make.sh -DGRAY_BUILD_FAST_VIEWER=ON
and launched with
MODEL_DIR=output/pretrained/bicycle
build/fast_viewer -m $MODEL_DIR
The fast viewer does not include a UI and provides only minimal keyboard and mouse camera controls. Its performance can also be measured using the --benchmark and --benchmark-test-cameras flags.
You can render depth maps with --render_depth.
We fixed a minor bug in how the bin size was computed for initialization binning. As such, the default value for init_bin_size differs from the value reported in the paper and quantitative results may differ by negligible amounts (< 0.1 dB).
Please report any problems you encounter with installation in the GitHub issues.
If your scene is very large, you might get better results by disabling initialization binning with --no_init_binning.
This code was designed for scenes with around 200-300 images and pinhole cameras; we are working on support for larger scenes. Alternative camera models are not currently provided but should be straightforward to implement.
You will likely encounter floaters, which are a known limitation of dense initialization.
The original code in this repository is licensed under the MIT License.
Some files are derived from third-party sources and remain under their original licenses. Those files include license notices in their headers.
This includes, but is not limited to:
- The GraphDeco viewer which is under Apache 2.0.
- The dense initialization script third_party/edgs.py, which is under the copyright license of its original authors.
@article{poirierginter2026gray,
author = {Poirier-Ginter, Yohan and Lalonde, Jean-Fran\c{c}ois and Drettakis, George},
title = {GRay: Ray Tracing 3D Gaussians Near the Speed of Splats},
year = {2026},
issue_date = {May 2026},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {9},
number = {1},
url = {https://doi.org/10.1145/3804496},
doi = {10.1145/3804496},
journal = {Proc. ACM Comput. Graph. Interact. Tech.},
month = may,
articleno = {14},
numpages = {19}
}
Thanks to Jeffrey Hu for helping with the code and pointing us towards dense initialization.
Thanks to Ishaan Shah for the Gaussian Viewer.
Thanks to Simon Lucas for help on the Windows configuration.
This research was co-funded by the European Union (EU) ERC Advanced Grant NERPHYS No 101141721. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the EU or the European Research Council. Neither the EU nor the granting authority can be held responsible for them. Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations. This research was also supported by NSERC grant RGPIN-2020-04799 and the Digital Research Alliance Canada. The authors are grateful to Adobe and NVIDIA for generous donations.