TL;DR: A unified framework for generalized unconstrained urban 3D occupancy prediction.
*(Teaser video: `teaser.mp4`)*
OccAny provides pretrained checkpoints, inference scripts, training/evaluation pipelines, data preparation, and visualization tools for urban 3D occupancy under unconstrained camera inputs. This repo includes two model variants:
- OccAny, based on Must3R + SAM2
- OccAny+, based on Depth Anything 3 + SAM3
The repository also includes sample RGB scenes in `demo_data/input`, pretrained weights in `checkpoints/`, and viewers for both point-cloud and voxel-grid outputs.
## 📰 News

- 01/04/2026: Added depth and reconstruction evaluation (`extract_recon.py` + `compute_recon_metrics.py`) comparing OccAny+ recon 1.1B against plain DA3 on KITTI and nuScenes; see Depth and Point-Cloud Reconstruction.
- 29/03/2026: Added a bonus training recipe for the OccAny+ recon 1.1B model, fine-tuned from DA3 1.1B; see 🏋️ Training and 📦 Checkpoints. Also added ego trajectory evaluation on nuScenes; see 🚗 Ego Trajectory Evaluation.
- 📰 News
- 🔧 Installation
- 📦 Checkpoints
- 🚀 Quick Start
- ⚙️ Key Inference Flags
- 👁️ Visualization
- 📊 Evaluation
- 📊 Additional Evaluation
- 🏋️ Training
- 📄 License
- 🙏 Acknowledgments
- 📝 Citation
## 🔧 Installation

The commands below create the environment and keep all required third-party dependencies local to this repository.
```bash
git clone https://github.com/valeoai/OccAny.git
cd OccAny

conda create -n occany python=3.12 -y
conda activate occany

python -m pip install --upgrade pip setuptools wheel ninja
conda install -c nvidia cuda-toolkit=12.6

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install xformers==0.0.29.post2
pip install -r requirements.txt

export CUDA_HOME=$CONDA_PREFIX
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib:$LD_LIBRARY_PATH
pip install torch-scatter --no-cache-dir --no-build-isolation
```

OccAny relies on the vendored copies bundled in `third_party/`:
- `third_party/croco` for `croco`
- `third_party/dust3r` for `dust3r`
- `third_party/Grounded-SAM-2` for Grounded-SAM-2, `sam2`, and `groundingdino`
- `third_party/sam3` for SAM3
- `third_party/Depth-Anything-3` for Depth Anything 3
inference.py already prepends these paths automatically at runtime. If you want to import the vendored packages in a shell, notebook, or standalone sanity check, export them explicitly:
```bash
export PYTHONPATH="$PWD/third_party:$PWD/third_party/dust3r:$PWD/third_party/croco/models/curope:$PWD/third_party/Grounded-SAM-2:$PWD/third_party/Grounded-SAM-2/grounding_dino:$PWD/third_party/sam3:$PWD/third_party/Depth-Anything-3/src:$PYTHONPATH"
```

Build the `curope` CUDA extension:

```bash
export CUDA_HOME=$CONDA_PREFIX
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib:$LD_LIBRARY_PATH

cd third_party/croco/models/curope
python setup.py install
cd ../../../..
```

This builds a `curope*.so` file next to the sources. The `PYTHONPATH` export above includes that directory so `models.curope` can resolve it at runtime.
The vendored `third_party/croco/models/curope/setup.py` currently targets SM 70, 80, and 90. If your GPU uses a different compute capability, update `all_cuda_archs` there before rebuilding.
Sanity-check the environment and the vendored imports:

```bash
python - <<'PY'
import sys
from pathlib import Path

repo_root = Path.cwd()
for path in reversed([
    repo_root / "third_party",
    repo_root / "third_party" / "dust3r",
    repo_root / "third_party" / "croco" / "models" / "curope",
    repo_root / "third_party" / "Grounded-SAM-2",
    repo_root / "third_party" / "Grounded-SAM-2" / "grounding_dino",
    repo_root / "third_party" / "sam3",
    repo_root / "third_party" / "Depth-Anything-3" / "src",
]):
    path_str = str(path)
    if path.exists() and path_str not in sys.path:
        sys.path.insert(0, path_str)

import torch
import sam2
import sam3
import groundingdino
import depth_anything_3
import dust3r.utils.path_to_croco  # noqa: F401
from croco.models.pos_embed import RoPE1D

print("torch:", torch.__version__)
print("cuda:", torch.version.cuda)
print("RoPE1D backend:", RoPE1D.__name__)
print("third-party imports: ok")
PY
```

## 📦 Checkpoints

Model checkpoints are hosted on Hugging Face:
Download checkpoints with:

```bash
cd OccAny
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='anhquancao/OccAny', repo_type='model', local_dir='.', allow_patterns='checkpoints/*')"
```

Expected files under `checkpoints/`:

- `occany_plus_gen.pth`
- `occany_plus_recon.pth`
- `occany.pth`
- `occany_recon.pth`
- `occany_plus_recon_1B.pth`
- `groundingdino_swinb_cogcoor.pth`
- `sam2.1_hiera_large.pt`
## 🚀 Quick Start

After installation and checkpoint download, you can run the demo commands below from the repo root as-is. By default:

- RGB inputs are read from `./demo_data/input`
- Outputs are written to `./demo_data/output`
- The repo already includes sample scenes such as `kitti_08_1390` and `nuscenes_scene-0039`
The following presets reproduce the default demo pipeline for each released model variant.
**OccAny+ (Depth Anything 3 + SAM3):**

```bash
python inference.py \
    --batch_gen_view 2 \
    --view_batch_size 2 \
    --semantic distill@SAM3 \
    --compute_segmentation_masks \
    --gen \
    -rot 30 \
    -vpi 2 \
    -fwd 5 \
    --seed_translation_distance 2 \
    --recon_conf_thres 2.0 \
    --gen_conf_thres 6.0 \
    --apply_majority_pooling \
    --model occany_da3
```

**OccAny (Must3R + SAM2):**

```bash
python inference.py \
    --batch_gen_view 2 \
    --view_batch_size 2 \
    --semantic distill@SAM2_large \
    --compute_segmentation_masks \
    --gen \
    -rot 30 \
    -vpi 2 \
    -fwd 5 \
    --seed_translation_distance 2 \
    --recon_conf_thres 2.0 \
    --gen_conf_thres 2.0 \
    --apply_majority_pooling \
    --model occany_must3r
```

Each processed scene is written under `./demo_data/output/<frame_id>_<model>/`. Common artifacts include:
- `pts3d_render.npy` for reconstruction-view point clouds and metadata
- `pts3d_render_gen.npy` for rendered-view point clouds and metadata when `--gen` is enabled
- `pts3d_render_recon_gen.npy` for the merged point-cloud bundle
- `voxel_predictions.pkl` for voxelized occupancy predictions, camera metadata, and visualization inputs
The `pts3d_*.npy` files are what `vis_viser.py` reads, while `voxel_predictions.pkl` is what `vis_voxel.py` and `compute_metrics_from_saved_voxels.py` consume.
`inference.py` currently uses an urban voxel grid selected for the included demo scenes:

```python
voxel_size = 0.4
occ_size = [200, 200, 24]
voxel_origin = [-40.0, -40.0, -3.6]
```
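As an illustration of how these constants define the grid, here is a minimal sketch (not the repo's actual voxelization code) that maps 3D points in meters to integer voxel indices:

```python
import numpy as np

# Demo-grid constants from above.
voxel_size = 0.4
occ_size = np.array([200, 200, 24])
voxel_origin = np.array([-40.0, -40.0, -3.6])

def points_to_voxels(points):
    """Map (N, 3) points in meters to integer voxel indices,
    dropping points that fall outside the grid."""
    idx = np.floor((points - voxel_origin) / voxel_size).astype(np.int64)
    inside = np.all((idx >= 0) & (idx < occ_size), axis=1)
    return idx[inside]

pts = np.array([[0.2, 0.2, 0.2],     # near the ego vehicle
                [39.9, -39.9, 5.9],  # near a grid corner
                [100.0, 0.0, 0.0]])  # outside the 80 m x 80 m extent, dropped
print(points_to_voxels(pts))
```

With `voxel_size = 0.4` the grid spans 80 m × 80 m × 9.6 m around `voxel_origin`.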
This 200 x 200 x 24 grid is only the default for demo inference. The evaluation pipeline uses its own dataset-specific layouts, so only edit these constants when you want the standalone demo outputs to follow another convention. Two common evaluation presets are:
**KITTI**

```python
voxel_size = 0.2
occ_size = [256, 256, 32]
voxel_origin = np.array([0.0, -25.6, -2.0], dtype=np.float32)
```

**nuScenes**

```python
voxel_size = 0.4
occ_size = [200, 200, 16]
voxel_origin = np.array([-40.0, -40.0, -1.0], dtype=np.float32)
```

## ⚙️ Key Inference Flags

The most commonly adjusted flags fall into three groups: common flags, semantic flags, and render-specific flags. If you only want reconstruction output, omit `--gen` and any flag whose scope below is Render or Render + semantic.
List of inference flags
| Flag | Scope | Description |
|---|---|---|
| `--model` | Common | Select the inference backbone: `occany_da3` or `occany_must3r` |
| `--input_dir` | Common | Directory containing RGB demo scene folders |
| `--output_dir` | Common | Directory where outputs are written |
| `--gen` | Common toggle | Enable novel-view rendering before voxel fusion |
| `-vpi`, `--views_per_interval` | Render | Number of rendered views sampled per reconstruction view |
| `-fwd`, `--gen_forward_novel_poses_dist` | Render | Forward offset for rendered views, in meters |
| `-rot`, `--gen_rotate_novel_poses_angle` | Render | Left/right yaw rotation applied to rendered views, in degrees |
| `--num_seed_rotations` | Render | Number of additional seed rotations used when initializing rendered poses |
| `--seed_rotation_angle` | Render | Angular spacing between seed rotations, in degrees |
| `--seed_translation_distance` | Render | Lateral translation paired with each seed rotation, in meters |
| `--batch_gen_view` | Render | Number of rendered views processed in parallel |
| `--semantic` | Semantic | Enable semantic inference with a SAM2 or SAM3 variant |
| `--compute_segmentation_masks` | Semantic | Save segmentation masks during semantic inference |
| `--view_batch_size` | Semantic | Number of views processed together during semantic inference |
| `--recon_conf_thres` | Reconstruction | Confidence threshold used when voxelizing reconstructed points |
| `--gen_conf_thres` | Render | Confidence threshold used when voxelizing rendered points |
| `--no_semantic_from_rotated_views` | Render + semantic | Ignore semantics from rotated rendered views |
| `--only_semantic_from_recon_view` | Render + semantic | Use semantics only from reconstruction views, even when rendered views are present |
| `--gen_semantic_from_distill_sam3` | Render + semantic | For `pretrained@SAM3`, infer rendered-view semantics from distilled SAM3 features when available |
| `--apply_majority_pooling` | Post-processing | Apply 3x3x3 majority pooling to the fused voxel grid |
## 👁️ Visualization

Use `vis_viser.py` to inspect the saved `pts3d_*.npy` point-cloud outputs interactively:

```bash
python vis_viser.py --input_folder ./demo_data/output
```

You can point `--input_folder` either to the output root or directly to a single scene folder. In the viewer, the common dropdown options are:

- `render` for reconstruction output
- `render_gen` for rendered-view output
- `render_recon_gen` for the combined output
`vis_voxel.py` renders voxel predictions to image files. Install `mayavi` separately if you want to use this path:

```bash
pip install mayavi
python vis_voxel.py --input_root ./demo_data/output --dataset nuscenes
```

Helpful notes:

- The script writes rendered images to `./demo_data/output_vis` by default
- If the requested `--prediction_key` is missing, it automatically falls back to the best available `render*` grid
- Use `--dataset kitti` for KITTI-style scenes and `--dataset nuscenes` for nuScenes-style surround-view scenes
- Add `--save_input_images` if you also want stacked input RGB images next to the voxel render
## 📊 Evaluation

This section covers the end-to-end evaluation workflow for KITTI and nuScenes using the provided shell and SLURM wrappers.
Download the following assets:
- The Semantic Scene Completion dataset v1.1 (SemanticKITTI voxel data, 700 MB) from the SemanticKITTI website
- The KITTI Odometry Benchmark calibration data (calibration files, 1 MB) and RGB images (color, 65 GB) from the KITTI Odometry website
The dataset folder at `/path/to/kitti` should have the following structure:

```
└── /path/to/kitti/
    └── dataset
        ├── poses
        └── sequences
```
- Download nuScenes with the following script. By default, it downloads to `$PROJECT/data/nuscenes`.

  ```bash
  export PROJECT=$PWD
  mkdir -p $PROJECT/data/nuscenes
  python dataset_setup/nuscenes/download.py --download_dir $PROJECT/data/nuscenes --output_dir $PROJECT/data/nuscenes --download_workers 16
  ```

- Install the `nuscenes-devkit`:

  ```bash
  pip install nuscenes-devkit --no-cache-dir
  ```

- Download the voxel ground truth from Occ3D-nuScenes, including the following files:
  - `annotations.json`
  - `gts.tar.gz`
  - `imgs.tar.gz`

- Extract them under `$PROJECT/data/nuscenes`. You should then have the following structure:

  ```
  $PROJECT/data/nuscenes/
  ├── annotations.json
  ├── can_bus/
  ├── gts/
  ├── imgs/
  ├── maps/
  ├── samples/
  ├── sweeps/
  ├── v1.0-test/
  └── v1.0-trainval/
  ```
- Set `PROJECT` and `SCRATCH`, then create the evaluation directories:

  ```bash
  export PROJECT=$PWD
  export SCRATCH=$PWD/eval_output
  mkdir -p \
      "$SCRATCH/ssc_voxel_pred" \
      "$SCRATCH/ssc_output" \
      "$SCRATCH/data/kitti_processed" \
      "$SCRATCH/data/nuscenes_processed"
  ```

- If your datasets are not under `$PROJECT/data/kitti` and `$PROJECT/data/nuscenes`, override the roots:

  ```bash
  export KITTI_ROOT=/path/to/kitti
  export NUSCENES_ROOT=/path/to/occ3d_nuscenes
  ```

- Build the vendored Grounded-SAM-2 / GroundingDINO extension:

  ```bash
  pip install -e third_party/Grounded-SAM-2  # requires GCC > 9
  pip install --no-build-isolation -e third_party/Grounded-SAM-2/grounding_dino
  ```

  If `groundingdino/_C` fails to load (for example, `NameError: name '_C' is not defined` in `ms_deform_attn.py`), rerun this step in your `occany` environment.

- Pre-extract the GroundingDINO boxes once:

  ```bash
  python extract_gdino_boxes_kitti.py --image_size 1216 --box_threshold 0.05 --text_threshold 0.05
  python extract_gdino_boxes_nuscenes.py --image_size 1328 --box_threshold 0.05 --text_threshold 0.05
  ```

  Cached boxes are written to:

  ```
  $SCRATCH/data/kitti_processed/resized_1216_box5_text5_DINOB/<sequence>_<frame_id>/boxes.npz
  $SCRATCH/data/nuscenes_processed/resized_1328_box5_text5_DINOB/<scene_name>/<frame_token>_<camera_name>/boxes.npz
  ```

  `sh/eval_occany.sh` already uses these cache folders, so later evaluation runs can reuse the detections.
`sh/eval_occany.sh` writes voxel predictions under `$SCRATCH/ssc_voxel_pred/<preset-output-dir>/...` and sampled visualization artifacts under `$SCRATCH/ssc_output/<preset-output-dir>/...`.
> [!CAUTION]
> Evaluation can take a very long time in a single process because some extraction presets render up to 180 novel views. We therefore provide SLURM commands in the SLURM Workflow section, which run 20 processes in parallel for occupancy extraction. We have only tested the SLURM path, but the local shell scripts should produce the same results.
> [!NOTE]
> To maximize performance, some presets sample novel views densely, rendering roughly 150–180 views. You can reduce runtime by lowering `-vpi` (views per reconstruction view). In general, the total number of novel views is `n_recon × vpi × (3 if rot > 0 else 1)`.
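The view-count formula can be sanity-checked with a tiny helper (`total_novel_views` is an illustrative name, not a function in the repo):

```python
def total_novel_views(n_recon, vpi, rot):
    """Total rendered views: each reconstruction view spawns `vpi` novel
    views, tripled (left / center / right yaw) when rotation is enabled."""
    return n_recon * vpi * (3 if rot > 0 else 1)

# A dense preset: 5 reconstruction views, 12 views each, with rotation.
print(total_novel_views(5, 12, 30))  # 5 x 12 x 3 = 180
# The demo preset (-vpi 2, -rot 30), assuming 5 reconstruction views.
print(total_novel_views(5, 2, 30))   # 5 x 2 x 3 = 30
```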
Evaluation is a two-step workflow:

1. Run `extract_output_occany.py` through `sh/eval_occany.sh` (or `slurm/eval_occany.slurm`) to save voxel predictions.
2. Run `compute_metrics_from_saved_voxels.py` through `sh/compute_metric.sh` (or `slurm/compute_metric.slurm`) to compute SSC metrics from the saved `voxel_predictions.pkl` files.

Run both commands from the repo root. Each block below mirrors the corresponding SLURM example without the sbatch wrapper.

Command template:

```bash
EXP_LIST=<exp_list> EXP_ID=<id> bash sh/eval_occany.sh
USE_MAJORITY_POOLING=1 POOLING_MODE=<mode> EXP_LIST=<metric_exp_list> EXP_ID=<id> bash sh/compute_metric.sh
```

Use `EXP_LIST=occany` and metric `EXP_LIST=metric_occany`.
| Preset | EXP_ID | POOLING_MODE |
|---|---|---|
| KITTI 5-frame geometry | 0 | `separate` |
| KITTI 1-frame geometry | 1 | `separate` |
| nuScenes 5-frame geometry | 2 | `separate` |
| nuScenes surround geometry | 3 | `separate` |
| KITTI 5-frame distill semantic | 4 | `separate` |
| nuScenes surround distill semantic | 5 | `separate` |
| KITTI 5-frame pretrained semantic | 6 | `separate` |
| nuScenes surround pretrained semantic | 7 | `separate` |
Use `EXP_LIST=occany_plus` and metric `EXP_LIST=metric_occany_plus`.

| Preset | EXP_ID | POOLING_MODE |
|---|---|---|
| KITTI 5-frame geometry | 0 | `separate` |
| nuScenes surround geometry | 1 | `separate` |
| KITTI 5-frame distill semantic | 2 | `unified` |
| nuScenes surround distill semantic | 3 | `unified` |
| KITTI 5-frame pretrained semantic | 4 | `unified` |
| nuScenes surround pretrained semantic | 5 | `unified` |
Some extraction presets can render up to 180 views, so extraction can be slow. The provided `slurm/eval_occany.slurm` script runs a 20-task array in parallel by default (`#SBATCH --array=0-19` with `WORLD=20`).
Each example below submits the extraction job first and then chains the metric job with --dependency=afterany:$(...), so the metric job waits until the full extraction array finishes. The public SLURM wrappers keep the Karolina-HPC defaults (-A eu-25-92, --partition=qgpu, --hint=nomultithread, --cpus-per-task=16, and conda activate occany); update those settings to match your cluster.
Command template:

```bash
sbatch --dependency=afterany:$(sbatch --parsable --export=EXP_LIST=<exp_list>,EXP_ID=<id>,WORLD=20 slurm/eval_occany.slurm) --export=EXP_LIST=<metric_exp_list>,EXP_ID=<id>,USE_MAJORITY_POOLING=1,POOLING_MODE=<mode> slurm/compute_metric.slurm
```

Use `EXP_LIST=occany`, metric `EXP_LIST=metric_occany`, and `POOLING_MODE=separate`.
| Preset | Paper | EXP_ID | Expected Metrics | Notes |
|---|---|---|---|---|
| KITTI 5-frame geometry | Tab. 1 | 0 | P 36.79 · R 46.70 · IoU 25.91 | |
| KITTI 1-frame geometry | Tab. 2 | 1 | P 45.64 · R 33.66 · IoU 24.03 | |
| nuScenes 5-frame geometry | Tab. 1 | 2 | P 36.09 · R 40.39 · IoU 23.55 | |
| nuScenes surround geometry | Tab. 3 | 3 | P 45.04 · R 58.54 · IoU 34.15 | |
| KITTI 5-frame distill sem. | Tab. 5, 6 | 4 | mIoU 7.30 · mIoUˢᶜ 13.54 | ≈ paper 7.28 / 13.53 |
| nuScenes surround distill sem. | Tab. 5, 6 | 5 | mIoU 6.65 · mIoUˢᶜ 10.31 | ≈ paper 6.66 / 10.32 |
| KITTI 5-frame pretrained sem. | Tab. 7 | 6 | mIoU 7.62 · mIoUˢᶜ 13.75 | ≈ paper 7.67 / 13.75 |
| nuScenes surround pretrained sem. | Tab. 7 | 7 | mIoU 7.42 · mIoUˢᶜ 10.78 | |
For OccAny+, geometry metrics use `separate` pooling while semantic metrics use `unified` pooling. Use `EXP_LIST=occany_plus` and metric `EXP_LIST=metric_occany_plus`.

| Preset | Paper | EXP_ID | Expected Metrics | POOLING_MODE |
|---|---|---|---|---|
| KITTI 5-frame geometry | Tab. 5 | 0 | P 38.11 · R 49.13 · IoU 27.33 | `separate` |
| nuScenes surround geometry | Tab. 5 | 1 | P 46.37 · R 54.67 · IoU 33.49 | `separate` |
| KITTI 5-frame distill sem. | Tab. 7 | 2 | mIoU 6.49 · mIoUˢᶜ 13.31 | `unified` |
| nuScenes surround distill sem. | Tab. 7 | 3 | mIoU 7.20 · mIoUˢᶜ 11.51 | `unified` |
| KITTI 5-frame pretrained sem. | Tab. 7 | 4 | mIoU 8.03 · mIoUˢᶜ 13.17 | `unified` |
| nuScenes surround pretrained sem. | Tab. 7 | 5 | mIoU 9.45 · mIoUˢᶜ 12.22 | `unified` |
## 📊 Additional Evaluation

This section evaluates out-of-domain 3D reconstruction, depth estimation, and ego trajectory estimation on KITTI and nuScenes. Both datasets are unseen during training. All metrics in this section are reported in physical metric space.
### Depth and Point-Cloud Reconstruction

Reconstruction accuracy is evaluated in two steps:

- `extract_recon.py` — runs model inference and saves per-sample point clouds and depths as `.npz` files. Supports `--world`/`--pid` for SLURM array parallelism.
- `compute_recon_metrics.py` — loads the saved `.npz` files, computes 3D reconstruction metrics (accuracy, completeness, precision, recall, F-score) and depth metrics, then writes aggregated results to `recon_metrics.json`.
Reconstruction distances are reported in meters, and precision, recall, and F-score use a 0.5 m correspondence threshold.
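For reference, here is a minimal sketch of these point-cloud metrics (illustrative only, not the repo's exact implementation; it uses brute-force distances with no subsampling):

```python
import numpy as np

def fscore(pred, gt, thresh=0.5):
    """Point-cloud precision / recall / F-score at a distance threshold.
    pred and gt are (N, 3) and (M, 3) arrays in meters."""
    # Pairwise distances; fine for small illustrative clouds.
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    acc = d.min(axis=1)    # pred -> gt distances (accuracy side)
    comp = d.min(axis=0)   # gt -> pred distances (completeness side)
    precision = (acc < thresh).mean()
    recall = (comp < thresh).mean()
    f = 2 * precision * recall / (precision + recall + 1e-8)
    return precision, recall, f

pred = np.array([[0.0, 0, 0], [1.0, 0, 0], [5.0, 0, 0]])  # one outlier point
gt = np.array([[0.1, 0, 0], [1.2, 0, 0]])
p, r, f = fscore(pred, gt)
print(round(p, 3), round(r, 3), round(f, 3))
```

Accuracy and completeness in the table are the mean of those same pred→gt and gt→pred distances, rather than thresholded rates.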
| Setting | Model | acc ↓ | comp ↓ | overall ↓ | prec ↑ | recall ↑ | F-score ↑ |
|---|---|---|---|---|---|---|---|
| KITTI 5frames | DA3 1.1B + DA3 metric 0.35B | 1.43 | 0.78 | 1.11 | 42.11 | 63.57 | 50.48 |
| KITTI 5frames | OccAny+ recon 1.1B | 1.06 | 1.23 | 1.15 | 59.09 | 79.86 | 67.52 |
| nuScenes surround | DA3 1.1B + DA3 metric 0.35B | 7.31 | 1.61 | 4.46 | 32.92 | 42.35 | 36.68 |
| nuScenes surround | OccAny+ recon 1.1B | 1.79 | 0.97 | 1.38 | 38.71 | 64.75 | 48.03 |
Depth is capped at 80 m.
| Setting | Model | abs_rel (%) ↓ | sq_rel ↓ | rmse ↓ | log_rmse ↓ | δ<1.25 ↑ | δ<1.25² ↑ | δ<1.25³ ↑ |
|---|---|---|---|---|---|---|---|---|
| KITTI 5frames | DA3 1.1B + DA3 metric 0.35B | 33.28 | 2.21 | 6.27 | 0.32 | 33.02 | 92.39 | 98.11 |
| KITTI 5frames | OccAny+ recon 1.1B | 9.58 | 0.62 | 4.35 | 0.19 | 90.39 | 96.15 | 98.15 |
| nuScenes surround | DA3 1.1B + DA3 metric 0.35B | 45.38 | 10.16 | 9.70 | 0.43 | 49.47 | 76.97 | 91.63 |
| nuScenes surround | OccAny+ recon 1.1B | 24.43 | 2.16 | 6.71 | 0.34 | 68.37 | 88.00 | 94.13 |
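As a reference for how such numbers are computed, here is a sketch of the standard monocular depth metrics (`depth_metrics` is illustrative, not the repo's implementation; masking and cap handling may differ):

```python
import numpy as np

def depth_metrics(pred, gt, cap=80.0):
    """abs_rel, rmse, and the delta < 1.25 accuracy over valid pixels.
    Pixels with gt outside (0, cap] meters are ignored."""
    mask = (gt > 0) & (gt <= cap)
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)
    return abs_rel, rmse, delta1

gt = np.array([10.0, 20.0, 40.0])
pred = np.array([11.0, 18.0, 90.0])  # last prediction is badly wrong
print(depth_metrics(pred, gt))
```

The δ columns in the table extend the same idea to the 1.25² and 1.25³ thresholds, reported as percentages.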
The extraction step is parallelized across SLURM array tasks. Each task processes a shard of the dataset. After all extraction tasks finish, a single metric computation job aggregates the results.
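The sharding itself can be as simple as a strided split; an illustrative sketch (the repo's actual `--world`/`--pid` logic may differ):

```python
def shard(samples, world, pid):
    """Give array task `pid` every `world`-th sample, so the `world`
    shards partition the dataset with no overlap."""
    return samples[pid::world]

samples = list(range(100))
shards = [shard(samples, 20, pid) for pid in range(20)]
assert sum(len(s) for s in shards) == len(samples)  # nothing lost
print(shards[0])  # [0, 20, 40, 60, 80]
```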
Step 1 — Extract (submit one array job per experiment config):
```bash
# EXP_ID=0: KITTI OccAny+ 1B, EXP_ID=1: nuScenes OccAny+ 1B,
# EXP_ID=2: KITTI DA3, EXP_ID=3: nuScenes DA3
EXP_LIST=recon EXP_ID=0 sbatch slurm/extract_recon.slurm
```

The default SLURM wrapper uses 20 array tasks (`--array=0-19`). Override with:

```bash
EXP_LIST=recon EXP_ID=0 WORLD=20 sbatch --array=0-19 slurm/extract_recon.slurm
```

Step 2 — Compute metrics (after all extraction tasks finish):

```bash
EXP_LIST=metric_recon EXP_ID=0 sbatch slurm/compute_recon_metric.slurm
```

| EXP_ID | Model | Dataset | Setting | Checkpoint |
|---|---|---|---|---|
| 0 | `occany_da3` | KITTI | 5frames | `occany_plus_recon_1B.pth` |
| 1 | `occany_da3` | nuScenes | surround | `occany_plus_recon_1B.pth` |
| 2 | `da3` | KITTI | 5frames | (plain DA3 Giant, no checkpoint) |
| 3 | `da3` | nuScenes | surround | (plain DA3 Giant, no checkpoint) |
Run the same evaluations locally without SLURM:
```bash
# 1) KITTI 5frames — OccAny+ 1B
python extract_recon.py \
    --model occany_da3 \
    --dataset kitti \
    --setting 5frames \
    --exp_name occany_plus_recon_1B \
    --occany_recon_ckpt ./checkpoints/occany_plus_recon_1B.pth
python compute_recon_metrics.py \
    --exp_dir ./outputs/occany_plus_recon_1B_occany_da3_kitti_5frames_img512

# 2) nuScenes surround — OccAny+ 1B
python extract_recon.py \
    --model occany_da3 \
    --dataset nuscenes \
    --setting surround \
    --exp_name occany_plus_recon_1B \
    --occany_recon_ckpt ./checkpoints/occany_plus_recon_1B.pth
python compute_recon_metrics.py \
    --exp_dir ./outputs/occany_plus_recon_1B_occany_da3_nuscenes_surround_img512

# 3) KITTI 5frames — plain DA3 Giant
python extract_recon.py \
    --model da3 \
    --dataset kitti \
    --setting 5frames \
    --exp_name da3_recon
python compute_recon_metrics.py \
    --exp_dir ./outputs/da3_recon_da3_kitti_5frames_img512

# 4) nuScenes surround — plain DA3 Giant
python extract_recon.py \
    --model da3 \
    --dataset nuscenes \
    --setting surround \
    --exp_name da3_recon
python compute_recon_metrics.py \
    --exp_dir ./outputs/da3_recon_da3_nuscenes_surround_img512
```

Extraction outputs (`.npz` files) and metric results (`recon_metrics.json`) are written to `./outputs/<exp_name>_<model>_<dataset>_<setting>_img512/`.
### 🚗 Ego Trajectory Evaluation

This section covers ego-trajectory evaluation on nuScenes Vista validation sequences using `infer_trajectory.py` and the provided shell and SLURM wrappers. The reported metric is ADE (Average Displacement Error).
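ADE can be sketched in a few lines (illustrative, not the repo's implementation):

```python
import numpy as np

def ade(pred, gt):
    """Average Displacement Error: mean Euclidean distance between
    predicted and ground-truth trajectory points of shape (T, 2)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

pred = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])  # straight line
gt = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])    # gentle turn
print(ade(pred, gt))  # mean of [0, 1, 2] = 1.0
```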
Submit the provided array job to evaluate both released model sizes:

```bash
sbatch slurm/eval_trajectory.slurm
```

The bundled wrapper maps the array tasks as follows:

- `SLURM_ARRAY_TASK_ID=0` → `./checkpoints/occany_plus_recon.pth` (OccAny 0.35B)
- `SLURM_ARRAY_TASK_ID=1` → `./checkpoints/occany_plus_recon_1B.pth` (OccAny 1.1B)
Run the same evaluation locally without SLURM by setting the checkpoint explicitly.
```bash
OCCANY_PLUS_RECON_CKPT=./checkpoints/occany_plus_recon.pth \
    bash sh/infer_occany_nuscenes_traj.sh

OCCANY_PLUS_RECON_CKPT=./checkpoints/occany_plus_recon_1B.pth \
    bash sh/infer_occany_nuscenes_traj.sh
```

Outputs are written to `./outputs/<ckpt_name>_nuscenes_traj/`, including `ade_metrics.json` and trajectory plots.
| Method | ADE (m) ↓ |
|---|---|
| GEM pseudo traj | 1.63 |
| DA3 0.35B + DA3 metric 0.35B | 2.44 |
| DA3 1.1B + DA3 metric 0.35B | 1.12 |
| OccAny recon 0.35B | 1.86 |
| OccAny+ recon 1.1B | 0.90 |
## 🏋️ Training

This repository ships public SLURM wrappers for the two-stage training pipelines:

- `slurm/train_occany.slurm` for OccAny (Must3R + SAM2)
- `slurm/train_occany_plus.slurm` for OccAny+ (Depth Anything 3 + SAM3)
All training SLURM scripts in this repository have been tested on 16 A100 40G GPUs across 2 nodes.
Both wrappers use the same array-task mapping:
- `0` = reconstruction stage
- `1` = render stage
The shell entrypoints under `sh/` expect the processed training datasets referenced in those files to already exist under `$SCRATCH/data/...`. If your processed dataset roots live elsewhere, update the paths in the corresponding `sh/train_*.sh` file before launching training.
All commands in this section assume your current working directory is the repository root. From there, define the output roots used by the training wrappers:
```bash
export PROJECT=$PWD
export SCRATCH=${SCRATCH:-$PROJECT}
```

The SLURM wrappers already activate the `occany` conda environment and write scheduler logs to `slurm/output/`.

If you want to launch training directly from an interactive shell without SLURM, activate the environment yourself and force single-node / single-GPU mode so the helper scripts use plain `python` instead of `srun python`:

```bash
conda activate occany
export NUM_NODE=1
export NUM_GPU_PER_NODE=1
```

You can also override script defaults inline, for example:

```bash
BATCH_SIZE=1 N_WORKERS=4 bash sh/train_occany_recon.sh
```

Training recipe note: the original OccAny training was run in three stages: sequence-only reconstruction, sequence-only rendering, and sequence + surround reconstruction. In this public codebase, we simplify the recipe to two stages: sequence + surround reconstruction followed by sequence + surround rendering. This simplified two-stage recipe is also the one used for OccAny+.
For a compact overview of the scripts under dataset_setup/, see dataset_setup/README.md. Dataset-specific caveats for DDAD and PandaSet live in dataset_setup/ddad/README.md and dataset_setup/pandaset/README.md.
The training shell entrypoints expect the processed datasets below to exist under $SCRATCH/data/:
```
$SCRATCH/data/
├── ddad_processed/
├── once_processed/
├── pandaset_processed/
├── vkitti_processed/
└── waymo_processed/
```
The commands below keep the raw archives under $PROJECT/data/raw/ and write the preprocessed training samples to $SCRATCH/data/, which matches the default roots used by sh/train_occany*.sh.
```bash
export PROJECT=$PWD
export SCRATCH=${SCRATCH:-$PROJECT}
mkdir -p \
    "$PROJECT/data/raw" \
    "$SCRATCH/data/waymo_processed" \
    "$SCRATCH/data/vkitti_processed" \
    "$SCRATCH/data/ddad_processed" \
    "$SCRATCH/data/pandaset_processed" \
    "$SCRATCH/data/once_processed"
```

If you prefer to keep the raw datasets elsewhere, pass the explicit raw root to each preprocessing command instead of relying on the defaults.
- Accept the Waymo Open Dataset license, then download the Perception v1.4.2 `training/*.tfrecord` files into `$PROJECT/data/raw/waymo/training/`. You can use `dataset_setup/waymo/download_waymo.sh` as a starting point.

- Install the extra dependency required by `dataset_setup/waymo/preprocess_waymo.py`:

  ```bash
  pip install gcsfs waymo-open-dataset-tf-2-12-0==1.6.4 --no-cache-dir
  ```

- Preprocess the dataset:

  ```bash
  python dataset_setup/waymo/preprocess_waymo.py \
      --waymo_dir "$PROJECT/data/raw/waymo/training" \
      --output_dir "$SCRATCH/data/waymo_processed" \
      --workers 16
  ```
- Download and extract Virtual KITTI 2 so that the raw root looks like:

  ```
  $PROJECT/data/raw/vkitti/VirtualKitti2/
  ├── Scene01/
  ├── Scene02/
  └── ...
  ```

- Preprocess the dataset:

  ```bash
  python dataset_setup/vkitti/preprocess_vkitti.py \
      --vkitti_dir "$PROJECT/data/raw/vkitti/VirtualKitti2" \
      --output_dir "$SCRATCH/data/vkitti_processed" \
      --workers 16
  ```
- Download and extract DDAD to a raw root such as `$PROJECT/data/raw/DDAD/`.

- Install TRI-ML's `dgp` package and the protobuf version expected by `dataset_setup/ddad/preprocess.py`. See `dataset_setup/ddad/README.md` for environment-specific installation notes and the protobuf pin used here.

- Preprocess the dataset:

  ```bash
  python dataset_setup/ddad/preprocess.py \
      --ddad_root "$PROJECT/data/raw/DDAD" \
      --preprocessed_root "$SCRATCH/data/ddad_processed" \
      --n_workers 16
  ```
- Download the ONCE archives from the official source and place the tar files under `$PROJECT/data/raw/once_archives/`.

- Extract them so that the raw dataset root becomes:

  ```
  $PROJECT/data/raw/ONCE/
  └── data/
      ├── <sequence_id>/
      ├── train_split.txt
      ├── val_split.txt
      └── ...
  ```

  The helper `dataset_setup/once/extract.sh` shows one parallel extraction approach, but it contains site-specific paths, so update its `SOURCE`/`DEST` variables before using it.

- Preprocess the dataset:

  ```bash
  python dataset_setup/once/preprocess.py \
      --root "$PROJECT/data/raw/ONCE" \
      --preprocessed_root "$SCRATCH/data/once_processed" \
      --n_workers 16
  ```
- Download and extract PandaSet to a raw root such as `$PROJECT/data/raw/PandaSet/`.

- Install the `pandaset-devkit` dependency. `dataset_setup/pandaset/README.md` includes a concrete environment example and notes about the optional pair-generation helper.

- Preprocess the dataset:

  ```bash
  python dataset_setup/pandaset/preprocess.py \
      --root "$PROJECT/data/raw/PandaSet" \
      --save_dir "$SCRATCH/data/pandaset_processed"
  ```

The training scripts expect the processed output under `$SCRATCH/data/pandaset_processed/`.

`dataset_setup/pandaset/make_pairs.py` is optional and only applies if you maintain a JPEG-exported processed tree. The current `preprocess.py` writes `.npz` samples.
After preprocessing, generate the training sequence pickle files consumed by `WaymoSeqMultiView`, `VKittiSeqMultiView`, `DDADSeqMultiView`, `PandasetSeqMultiView`, and `OnceSeqMultiView`.
The intended batch entrypoint is:

```bash
sbatch slurm/make_seqs.slurm
```

Important notes:

- `slurm/make_seqs.slurm` launches temporal sequence generation for `waymo`, `once`, `ddad`, `pandaset`, `vkitti`, and `kitti`, plus surround sequence generation for the multi-camera datasets `waymo`, `once`, `ddad`, and `pandaset`.
- `sh/make_seqs.sh` calls the bundled `dataset_setup/base_make_seq.py`.
- With the scripts as shipped, the expected sequence filenames are:
  - `seq_exact_len_sub5_stride9_all.pkl` for Waymo, DDAD, PandaSet, and ONCE temporal training
  - `seq_exact_len_sub5_stride9.pkl` for VKITTI and KITTI temporal runs
  - `seq_surround_all.pkl` for Waymo, DDAD, PandaSet, and ONCE surround training
- Single-camera datasets (`kitti`, `vkitti`) skip surround mode by design.
Once the processed roots and sequence pickle files are in place, the default training wrappers can read them directly from $SCRATCH/data/... without any further path edits.
OccAny reconstruction and rendering both rely on the Must3R base weights referenced by `sh/train_occany_recon.sh` and `sh/train_occany_gen.sh`:

```bash
mkdir -p checkpoints
curl -L https://download.europe.naverlabs.com/ComputerVision/MUSt3R/MUSt3R_512.pth \
    -o checkpoints/MUSt3R_512.pth
```

`TRAIN_TASK_ID=0` dispatches `slurm/train_occany.slurm` to `sh/train_occany_recon.sh`:

```bash
sbatch --array=0 slurm/train_occany.slurm
```

Without SLURM, run the same stage directly with:

```bash
bash sh/train_occany_recon.sh
```

With the default script values, checkpoints and TensorBoard logs are written to `$PROJECT/tb_log_occany/occany_recon`.

For OccAny, the final checkpoint for both reconstruction and rendering is the last checkpoint, i.e. `checkpoint-last.pth`.
Before launching rendering, point the helper script at the reconstruction checkpoint you just trained. By default it uses `checkpoints/occany_recon.pth`; in practice, this should be a copy or symlink of that last reconstruction checkpoint.

If you want a different path, override `OCCANY_RECON_CKPT` inline:

```bash
OCCANY_RECON_CKPT=/path/to/occany_recon.pth bash sh/train_occany_gen.sh
```

Keep the Must3R base checkpoint available at `checkpoints/MUSt3R_512.pth`, or override `MUST3R_PRETRAINED_CKPT`; the render stage still loads the base Must3R checkpoint in addition to `--pretrained_occany`.

Then launch the render stage with:

```bash
sbatch --array=1 slurm/train_occany.slurm
```

Without SLURM, run the same stage directly with:

```bash
BATCH_SIZE=2 bash sh/train_occany_gen.sh
```

With the default script values, render outputs are written to `$PROJECT/tb_log_occany/occany_gen`.

`TRAIN_TASK_ID=0` dispatches `slurm/train_occany_plus.slurm` to `sh/train_occany_plus_recon.sh`:

```bash
sbatch --array=0 slurm/train_occany_plus.slurm
```

Without SLURM, run the same stage directly with:

```bash
bash sh/train_occany_plus_recon.sh
```

With the default script values, checkpoints and TensorBoard logs are written to `$PROJECT/tb_log_occany/occany_plus_recon`.

For OccAny+, we use the epoch-50 checkpoint as the final checkpoint for both the reconstruction and render stages, for comparison convenience: all OccAny+ experiments run past 50 epochs within about 2 days on 16 A100 40GB GPUs.
The public repository also ships sh/train_occany_plus_recon_1B.sh, which switches the reconstruction stage to the DA3-Giant 1.1B backbone.
For SLURM, use the dedicated wrapper:

```bash
sbatch slurm/train_occany_plus_recon_1B.slurm
```

Without SLURM, run the 1.1B reconstruction stage directly with:

```bash
bash sh/train_occany_plus_recon_1B.sh
```

With the default script values, checkpoints and TensorBoard logs are written to `$PROJECT/tb_log_occany/occany_plus_recon_1B`.

Before launching rendering, point the helper script at the reconstruction checkpoint you just trained. By default it uses `checkpoints/occany_plus_recon.pth`; in practice, this should usually be a copy or symlink of `checkpoint-50.pth`.

If you want a different path, override `OCCANY_PLUS_RECON_CKPT` inline:

```bash
OCCANY_PLUS_RECON_CKPT=/path/to/occany_plus_recon.pth bash sh/train_occany_plus_gen.sh
```

Then launch the render stage with:

```bash
sbatch --array=1 slurm/train_occany_plus.slurm
```

Without SLURM, run the same stage directly with:

```bash
bash sh/train_occany_plus_gen.sh
```

With the default script values, render outputs are written to `$PROJECT/tb_log_occany/occany_plus_gen`.

For both training backends, `--output_dir` is the canonical experiment directory. It stores:
- TensorBoard event files
- `log.txt`
- `checkpoint-last.pth`
- `checkpoint-final.pth`
- periodic `checkpoint-<epoch>.pth` snapshots
To inspect an experiment with TensorBoard, point `--logdir` at the same `--output_dir` used for training. For example:

```bash
tensorboard --logdir "$PROJECT/tb_log_occany/occany_recon"
```

## 📄 License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.
## 🙏 Acknowledgments

We thank the authors of these excellent open-source projects:
Dust3r · Must3r · Depth-Anything-3 · SAM2 · SAM3 · viser
## 📝 Citation

If you find this work or code useful, please cite the paper and consider starring the repository:
```bibtex
@inproceedings{cao2026occany,
  title={OccAny: Generalized Unconstrained Urban 3D Occupancy},
  author={Anh-Quan Cao and Tuan-Hung Vu},
  booktitle={CVPR},
  year={2026}
}
```