Nicolas Violante¹, George Kopanas², Linus Franke¹, Julien Philip³, George Drettakis¹
¹Inria, Université Côte d'Azur, ²Google DeepMind, ³Eyeline Labs
3D Gaussian Splatting (3DGS) is widely used to capture and render real scenes. Compositing objects from one capture into another has applications in many domains, such as VFX, architecture and interior design, or marketing. However, extracting an object from a source scene and naively pasting it into a target scene will fail to produce realistic results due to the different lighting conditions between the two scenes. To address this problem, we introduce a diffusion model that harmonizes naively composited images with inconsistent lighting. The model is trained with a heterogeneous dataset of image pairs (inconsistent composite input, consistent output), combining synthetic, generated and real data. Our complete 3D solution allows a user to extract an object from the source scene and composite it into the target scene. From this, the (inconsistent) views of the target scene with the composite object are rendered. Our diffusion model harmonizes each one of these views, which are finally consolidated in a 3DGS representation with a post-optimization step. Our method provides visually compelling results, making object transfer between 3DGS easy to use and significantly improving quality compared to previous methods.
@article{violante2026dot3d,
author = {Violante, Nicolás and Kopanas, George and Franke, Linus and Philip, Julien and Drettakis, George},
title = {Lighting-Consistent Object Transfer Across Radiance Fields},
journal = {Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering)},
year = {2026},
volume = 45,
number = 4
}

- June 18, 2026: Paper and code released.

- Installation
- Weights and Data
- Quick Start
- Gradio Demo
- Training
- 3D Workflow
- Evaluation
- Acknowledgments
git clone https://github.com/nviolante25/dot3d.git
cd dot3d

Create a conda environment and activate it:

conda create -n dot3d python=3.10 -y
conda activate dot3d

Run the install script, setting CUDA_VERSION to match your system (cu118, cu121, cu124):

export CUDA_VERSION=cu121 # change this to match your system
bash install.sh

This will:
- Install PyTorch with the correct CUDA wheels
- Install the DOT3D package and all dependencies (pip install -e .)
- Build the 3DGS CUDA extensions (diff-gaussian-rasterization, simple-knn, fused-ssim) with --no-build-isolation
- Install the graphdecoviewer
- Install SAM2 (required for the Compose & Harmonize tab in the Gradio demo)
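
If you are unsure which value to export, the tag is simply the CUDA major and minor version with the dot removed (you can read the version from `nvidia-smi` or `nvcc --version`). The helper below is a hypothetical convenience, not part of install.sh: it converts a version string into a tag and rejects anything outside the three tags listed above. The function name `cuda_tag` is an assumption made for illustration.

```python
import re

# Tags listed in the install instructions above.
SUPPORTED_TAGS = ("cu118", "cu121", "cu124")


def cuda_tag(version: str) -> str:
    """Turn a CUDA version such as "12.1" into a wheel tag such as "cu121"."""
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.\d+)?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized CUDA version: {version!r}")
    tag = f"cu{match.group(1)}{match.group(2)}"
    if tag not in SUPPORTED_TAGS:
        raise ValueError(f"{tag} is not one of {SUPPORTED_TAGS}")
    return tag


if __name__ == "__main__":
    print(cuda_tag("12.1"))  # cu121
```

Failing early on an unsupported tag is a deliberate choice here: it is cheaper to find out before the CUDA extensions start compiling.
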
| Harmonization Model | Description | Download |
|---|---|---|
| Image-Mask | Inputs: composite image + binary mask | image-mask |
| Image-Background | Inputs: composite image + background | image-background |
huggingface-cli download nviolante/dot3d --include "image-mask/*" --local-dir checkpoints
huggingface-cli download nviolante/dot3d --include "image-background/*" --local-dir checkpoints

| Dataset | Description | Download |
|---|---|---|
| 2D Data | Training & evaluation image pairs | data_2d.tar |
| 3D Scenes | Background + object captures | data_3d |
2D data:
huggingface-cli download nviolante/dot3d --repo-type dataset --include "data_2d.tar" --local-dir .
tar -xf data_2d.tar

3D scenes:

huggingface-cli download nviolante/dot3d --repo-type dataset --include "data_3d/*" --local-dir .

from PIL import Image
from huggingface_hub import snapshot_download
from wrappers import DOT3DHarmonizationWrapper
# Download weights (cached under checkpoints/ after first run)
snapshot_download(repo_id="nviolante/dot3d", allow_patterns="image-mask/*", local_dir="checkpoints")
wrapper = DOT3DHarmonizationWrapper("checkpoints/image-mask")
# Image-Mask: composite image + binary mask of the inserted object
image = Image.open("composite.png").convert("RGB")
mask = Image.open("mask.png").convert("RGB")
result = wrapper.predict_image(image, mask, num_inference_steps=4)
result["prediction"].save("harmonized.png")

For the Image-Background variant, pass the original background instead of a mask:

snapshot_download(repo_id="nviolante/dot3d", allow_patterns="image-background/*", local_dir="checkpoints")
wrapper = DOT3DHarmonizationWrapper("checkpoints/image-background")
background = Image.open("background.png").convert("RGB")
result = wrapper.predict_image(image, background, num_inference_steps=4)
result["prediction"].save("harmonized.png")

Launch the interactive demo to run the Harmonization model on your own images:

python demo/gradio_demo.py

The demo offers three tabs:
- Image-Mask — takes a composite image (object pasted into a scene) and a binary mask of the inserted object. The Harmonization model corrects the lighting of the object to match the surrounding scene.
- Image-Background — takes the same composite image paired with the original background before insertion, without requiring a mask. The model infers the object region from the difference between the two images.
- Compose & Harmonize — an end-to-end tab that lets you segment an object from a foreground image using SAM2, place it on a background with position and scale sliders, and then harmonize the composite in one click. SAM2 is loaded on first use (~900 MB download).
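
To make the relationship between the two input types concrete: when the composite and the clean background are both available, the pasted region is wherever the two images disagree. The sketch below thresholds the per-pixel difference to recover a binary mask. It is only an illustration of that idea, not how the Harmonization model processes its inputs internally; the function name `difference_mask`, the nested-list image representation, and the threshold of 10 are assumptions chosen to keep the example dependency-free.

```python
Pixel = tuple[int, int, int]
ImageRows = list[list[Pixel]]


def difference_mask(composite: ImageRows, background: ImageRows, threshold: int = 10) -> list[list[int]]:
    """Return 1 where any channel differs by more than `threshold`, else 0."""
    if len(composite) != len(background):
        raise ValueError("images must have the same height")
    mask = []
    for comp_row, bg_row in zip(composite, background):
        if len(comp_row) != len(bg_row):
            raise ValueError("images must have the same width")
        mask.append([
            # Largest absolute channel difference decides the pixel.
            int(max(abs(c - b) for c, b in zip(comp_px, bg_px)) > threshold)
            for comp_px, bg_px in zip(comp_row, bg_row)
        ])
    return mask
```

In practice a hard threshold like this misses soft shadows and reflections cast by the object, which is one reason to hand the model the full background rather than a precomputed mask.
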
All training config and defaults live in train/config_flux.py.
Download the 2D data from Hugging Face (see Weights and Data), which will produce the following structure at the repo root:
data_2d/
├── blender/
│   ├── train/
│   └── test/
├── flux/
│   ├── train/
│   └── test/
└── orida/
    ├── train/              # download from https://hello-jinwoo.github.io/orida/
    ├── validation/         # download from https://hello-jinwoo.github.io/orida/
    ├── train_relit/        # provided in our dataset release (see Weights and Data)
    └── validation_relit/   # provided in our dataset release (see Weights and Data)
Trained on 4× H100 (96 GB VRAM).
To train the Image-Mask Harmonization model:
accelerate launch --config_file train/multi_gpu.yaml train/train_flux_schnell.py --output_dir=<path/to/output>

To train the Image-Background variant, override conditional_keys:

accelerate launch --config_file train/multi_gpu.yaml train/train_flux_schnell.py \
--output_dir=<path/to/output> \
  --conditional_keys bg rgb

Any field in train/config_flux.py can be overridden from the command line, e.g.:

accelerate launch --config_file train/multi_gpu.yaml train/train_flux_schnell.py \
--output_dir="$OUTPUT_DIR" \
--learning_rate=1e-5 \
  --train_batch_size=2

All 3DGS scripts live in gaussian_splatting/. Run all commands below from that directory:

cd gaussian_splatting

The full pipeline goes through four stages.
Each scene must be a COLMAP capture. The background and object scenes share the same structure, except that the object scene also requires a masks/ folder with a binary mask per frame:
<background_scene>/
├── images/
└── sparse/
    └── 0/
        ├── cameras.bin
        ├── images.bin
        └── points3D.bin

<object_scene>/
├── images/
├── masks/              # Binary PNG masks, one per frame
└── sparse/
    └── 0/
        ├── cameras.bin
        ├── images.bin
        └── points3D.bin
Mask filenames must match the corresponding image filename with a .png extension (e.g. frame_001.jpg → masks/frame_001.png).
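
Before training, it can save time to verify that a capture follows this layout. The following is a minimal sketch using only the standard library; `check_scene` is a hypothetical helper, not a script shipped in gaussian_splatting/, and it only tests that files exist, not that their contents are valid.

```python
from pathlib import Path

# Files expected under sparse/0/ according to the layout above.
COLMAP_FILES = ("cameras.bin", "images.bin", "points3D.bin")


def check_scene(scene_dir: str, require_masks: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the layout looks fine."""
    scene = Path(scene_dir)
    problems = []
    for name in COLMAP_FILES:
        if not (scene / "sparse" / "0" / name).is_file():
            problems.append(f"missing sparse/0/{name}")
    images_dir = scene / "images"
    if not images_dir.is_dir():
        problems.append("missing images/")
        return problems
    if require_masks:
        # Each image needs masks/<stem>.png, e.g. frame_001.jpg -> masks/frame_001.png.
        for image in sorted(p for p in images_dir.iterdir() if p.is_file()):
            mask = scene / "masks" / f"{image.stem}.png"
            if not mask.is_file():
                problems.append(f"missing masks/{mask.name} for images/{image.name}")
    return problems
```

Call it with `require_masks=True` for the object scene and with the default for the background scene.
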
Train a 3DGS model for the background scene and separately for the object:
python train.py \
--optimizer_type sparse_adam \
--checkpoint_iterations 30000 \
-s <path/to/background_data> \
-m <path/to/background_output>
python train.py \
--optimizer_type sparse_adam \
--checkpoint_iterations 30000 \
-s <path/to/object_data> \
  -m <path/to/object_output>

Fit per-Gaussian segmentation features on top of the object reconstruction:

python train_features.py \
-s <path/to/object_data> \
-m <path/to/object_segmentation_output> \
--start_checkpoint <path/to/object_output>/chkpnt30000.pth \
  --iterations 40000

Launch the interactive viewer to position the object and save the insertion checkpoint (chkpnt_insertion.pth):

python insert_object_in_background.py \
<path/to/object_output>/chkpnt30000.pth \
<path/to/background_output>/chkpnt30000.pth \
  <path/to/background_data>

Use the viewer controls to translate, rotate, and scale the object, then click Save to write chkpnt_insertion.pth to the background data directory.
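
Geometrically, translating, rotating, and scaling an object amounts to applying a similarity transform p' = s·R·p + t to its points. The sketch below applies such a transform to a list of Gaussian centers, under simplifying assumptions: rotation is restricted to a single angle about the z axis, and only centers are handled, whereas a full 3DGS object also carries per-Gaussian orientations and scales that would need updating. The function `place_centers` and its parameters are hypothetical and do not describe the viewer's actual code.

```python
import math

Vec3 = tuple[float, float, float]


def place_centers(centers: list[Vec3], scale: float, yaw_deg: float, translation: Vec3) -> list[Vec3]:
    """Apply p' = scale * Rz(yaw) @ p + translation to each center."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    tx, ty, tz = translation
    placed = []
    for x, y, z in centers:
        # Rotate about the z axis, then scale uniformly, then translate.
        rx, ry = c * x - s * y, s * x + c * y
        placed.append((scale * rx + tx, scale * ry + ty, scale * z + tz))
    return placed
```

The order matters: scaling and rotating before translating keeps the object pivoting around its own origin rather than around the scene origin.
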
Run on 1× H100 (96 GB VRAM).
Run train_update.py to apply the Harmonization model to each view and consolidate them via 3DGS post-optimization, then render:
python train_update.py \
-s <path/to/background_data> \
-m <path/to/final_output> \
--optimizer_type sparse_adam \
--start_checkpoint <path/to/background_data>/chkpnt_insertion.pth \
--iterations 10000
python render.py -m <path/to/final_output>

The output directory has the following structure:

<final_output>/
├── dataset_update/              # Harmonized views used during optimization
│   └── iter_000/
│       └── <frame>.png
├── point_cloud/
│   └── iteration_10000/
│       └── point_cloud.ply      # Final 3DGS model
├── train/                       # Rendered train views (always present)
│   └── ours_10000/
│       ├── renders/
│       └── gt/                  # Composite input frames (before harmonization)
└── test/                        # Rendered test views (if a test split exists)
    └── ours_10000/
        ├── renders/
        └── gt/
Pre-computed results: results_3d
All evaluation scripts live in metrics/.
Runs inference on the test split of one or more datasets and computes PSNR, SSIM, LPIPS, FID, and KID.
python metrics/eval_2d.py \
--ckpt_dir <path/to/checkpoint> \
--datasets blender flux orida \
--split test \
  --output_dir eval_output

Results are saved to eval_output/summary.json.
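
As a reminder of what the simplest of these metrics measures, PSNR is defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between prediction and reference and MAX is the largest possible pixel value. The sketch below is a standard-library illustration of that formula, not the implementation in metrics/eval_2d.py; the function name `psnr`, the flat-sequence input, and the default [0, 1] pixel range are assumptions.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_value: float = 1.0) -> float:
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher is better. Because PSNR is purely pixel-wise, it is usually read alongside perceptual metrics such as LPIPS rather than on its own.
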
Computes metrics on already-rendered 3DGS outputs against ground-truth synthetic scenes.
python metrics/eval_3d.py \
--gt_dir <path/to/gt_synth> \
--pred_dir <path/to/method/outputs> \
--output_dir metrics_output \
  --experiment_name <run_name>

Results are saved to metrics_output/<run_name>/metrics.json.
The script expects each scene's renders at <pred_dir>/<scene>/train/ours_10000/renders/.
Evaluates one rotation variant (0, 1, or 2) across the three rotation scenes (bathtub_57e, sink, white_truck).
Download: data_rotation | results_rotation
python metrics/eval_rotation.py \
--gt_dir <path/to/data_rotation> \
--pred_dir <path/to/results_rotation> \
--rotation 2 \
  --experiment_name rotation_rot2

Results are saved to metrics_output/<experiment_name>/metrics.json.
The script expects renders at <pred_dir>/<scene>/test/ours_10000/renders/.
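
Since the evaluation scripts depend on a fixed folder layout, a quick pre-flight check can catch a missing or empty renders folder before metrics are computed. The following sketch assumes the `<pred_dir>/<scene>/<split>/ours_10000/renders/` pattern described above; `missing_renders` is a hypothetical helper and is not part of metrics/.

```python
from pathlib import Path


def missing_renders(pred_dir: str, scenes: list[str], split: str = "test", iteration: int = 10000) -> list[str]:
    """Return the scenes whose renders folder is absent or empty."""
    missing = []
    for scene in scenes:
        renders = Path(pred_dir) / scene / split / f"ours_{iteration}" / "renders"
        if not renders.is_dir() or not any(renders.iterdir()):
            missing.append(scene)
    return missing
```

For the rotation benchmark this could be called with `["bathtub_57e", "sink", "white_truck"]` and `split="test"`; for the 3D evaluation, pass `split="train"` instead.
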
This work was funded by the European Union, European Research Council (ERC) Advanced Grants NERPHYS (101141721) and EXPLORER (101097259). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them. Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations. The authors would also like to thank Adobe and NVIDIA for software and hardware donations.

