- [2026-3] Evaluation code, model checkpoints, and detailed installation instructions have been released
- [2026-3] Training code has been released
- [2026-1] SceneCOT has been accepted to ICLR 2026
- [2025-6] We released the project webpage of SceneCOT
- Clone the repository.

```bash
git clone https://github.com/SceneCOT/scenecot
cd scenecot
```

- Create a Python environment and install dependencies.
```bash
conda create -n scenecot python=3.9
conda activate scenecot

# PyTorch (example tested version)
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=11.8 -c pytorch -c nvidia

# project dependencies
pip install -r requirements.txt
```

- Install point-cloud third-party modules.
```bash
pip install spconv-cu118
cd model/pointnetpp
python setup.py install
cd ../..

# sanity check
python -c 'from model.pointnetpp.pointnetpp import PointNetPP'
```

If the PointNext build/import fails, either disable PointNext usage or place the compiled file from LEO_data under `model/pointnext/cpp/pointnet2_batch/`.
The configs were updated to avoid machine-specific absolute paths. We recommend setting the following environment variables:
| Variable | Purpose | Default | Download / Source Link |
|---|---|---|---|
| `SCENECOT_EXP_ROOT` | experiment output root (`cfg.base_dir`) | `./outputs` | - |
| `SCENECOT_DATA_ROOT` | root directory for dataset/assets used by `configs/data/default.yaml` | `./data_assets` | SceneCOT dataset |
| `SCENECOT_COT_DATA_ROOT` | root directory for released COT annotations (`MSQA/`, `GQA3D/`) | `${SCENECOT_DATA_ROOT}/scenecot_cot_data` | SceneCOT dataset / `scenecot_cot_data` |
| `SCENECOT_MSR3D_ANNO_DIR` | MSQA annotation directory (contains `situated_qa_{train,val,test}_pure_txt.json`) | `${SCENECOT_COT_DATA_ROOT}/MSQA` | MSQA |
| `SCENECOT_GQA3D_ANNO_DIR` | GQA3D annotation directory (contains `gqa3d_{train,val,test}.json`) | `${SCENECOT_COT_DATA_ROOT}/GQA3D` | GQA3D |
| `HF_HOME` | Hugging Face cache root (`cfg.hf_home`) | `./.cache/huggingface` | Hugging Face Hub |
| `SCENECOT_MODEL_ROOT` | unified root directory for default model/checkpoint paths | `./model_assets` | SceneCOT models |
| `SCENECOT_LLM_PATH` | LLaVA model path (override) | `${SCENECOT_MODEL_ROOT}/llava-v1.5-7b` | LLaVA-1.5-7B |
| `SCENECOT_VISION_TOWER_PATH` | CLIP vision tower path (override) | `${SCENECOT_MODEL_ROOT}/clip-vit-large-patch14-336` | CLIP ViT-L/14-336 |
| `SCENECOT_PQ3D_TOKENIZER_PATH` | PQ3D text tokenizer path (override, `data.pq3d_tokenizer_path`) | `${SCENECOT_MODEL_ROOT}/clip-vit-large-patch14` | SceneCOT models |
| `SCENECOT_POINTNET_TOKENIZER_PATH` | PQ3D PointNet++ tokenizer checkpoint (override) | `${SCENECOT_MODEL_ROOT}/pointnet_tokenizer.pth` | SceneCOT models |
| `SCENECOT_QUERY3D_PRETRAIN_PATH` | PQ3D/SceneVerse pretrain checkpoint (override) | `${SCENECOT_MODEL_ROOT}/query3d_pretrain.bin` | SceneCOT models |
| `SCENECOT_EXPERT1_PATH` | MOE expert-1 checkpoint directory (override) | `${SCENECOT_MODEL_ROOT}/expert1_checkpoint0` | SceneCOT model repo (checkpoint dirs) |
| `SCENECOT_EXPERT2_PATH` | MOE expert-2 checkpoint directory (override) | `${SCENECOT_MODEL_ROOT}/expert2_best.pth` | SceneCOT model repo (checkpoint dirs) |
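The fallback behavior in the table can be made concrete with a small sketch. This is a hypothetical illustration of the resolution logic (not the project's actual config code): an explicit override wins, otherwise each path falls back to its relative default, with model paths chaining off `SCENECOT_MODEL_ROOT`.

```python
import os

# Hypothetical sketch of the override-then-default resolution described
# in the table above; not the project's actual config code.
def resolve(var: str, default: str) -> str:
    """Return the environment override if set, else the documented default."""
    return os.environ.get(var, default)

model_root = resolve("SCENECOT_MODEL_ROOT", "./model_assets")
# Derived defaults chain off the resolved root:
llm_path = resolve("SCENECOT_LLM_PATH", os.path.join(model_root, "llava-v1.5-7b"))
print(llm_path)
```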
Example:

```bash
export SCENECOT_EXP_ROOT=/path/to/experiments
export SCENECOT_DATA_ROOT=/path/to/data_assets
export SCENECOT_COT_DATA_ROOT=/path/to/data_assets/scenecot_cot_data
export SCENECOT_MSR3D_ANNO_DIR=/path/to/data_assets/scenecot_cot_data/MSQA
export SCENECOT_GQA3D_ANNO_DIR=/path/to/data_assets/scenecot_cot_data/GQA3D
export HF_HOME=/path/to/hf_cache
export SCENECOT_MODEL_ROOT=/path/to/model_assets

# Optional explicit overrides when using non-default file names/locations
# export SCENECOT_LLM_PATH=/path/to/model_assets/llava-v1.5-7b
# export SCENECOT_VISION_TOWER_PATH=/path/to/model_assets/clip-vit-large-patch14-336
# export SCENECOT_PQ3D_TOKENIZER_PATH=/path/to/model_assets/clip-vit-large-patch14
# export SCENECOT_EXPERT1_PATH=/path/to/model_assets/expert1_checkpoint0
# export SCENECOT_EXPERT2_PATH=/path/to/model_assets/expert2_best.pth
```

To reproduce paper-level performance, the following checkpoints are needed:
- SceneCOT experts (released): SceneCOT model repo
- PQ3D PointNet++ tokenizer (`pointnet_tokenizer.pth`) → set `SCENECOT_POINTNET_TOKENIZER_PATH`
- Query3D/SceneVerse pretrain (`pytorch_model.bin`) → set `SCENECOT_QUERY3D_PRETRAIN_PATH`
For MOE evaluation, expert checkpoints are expected as directories under `SCENECOT_MODEL_ROOT`:

```
${SCENECOT_MODEL_ROOT}/
├── expert1_checkpoint0/
│   └── pytorch_model.bin   (or model.safetensors)
└── expert2_best.pth/
    └── pytorch_model.bin   (or model.safetensors)
```
These map to:
- `moe.expert1_path` → `${SCENECOT_MODEL_ROOT}/expert1_checkpoint0` (or `SCENECOT_EXPERT1_PATH`)
- `moe.expert2_path` → `${SCENECOT_MODEL_ROOT}/expert2_best.pth` (or `SCENECOT_EXPERT2_PATH`)
By default, checkpoints 2 and 3 above are resolved under `SCENECOT_MODEL_ROOT`. If the files are absent, the related modules are initialized without those pretrained weights, which may significantly affect final metrics.
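Because the fallback to random initialization is silent, a pre-flight check can surface missing checkpoints before a long run starts. A hypothetical helper (file names taken from the list above, not part of the repo):

```python
import os

# Hypothetical pre-flight check: report which optional pretrained
# checkpoints are missing under the model root, so the silent fallback
# to random initialization is visible up front.
def missing_checkpoints(model_root: str) -> list:
    expected = ["pointnet_tokenizer.pth", "query3d_pretrain.bin"]
    return [f for f in expected
            if not os.path.exists(os.path.join(model_root, f))]

for name in missing_checkpoints(os.environ.get("SCENECOT_MODEL_ROOT", "./model_assets")):
    print("warning: %s not found; the related module will use random init" % name)
```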
Weights & Biases tracking is enabled by default. For evaluation-only/offline runs without a login:

```bash
export WANDB_MODE=disabled
```

If direct access to huggingface.co is restricted, set a mirror endpoint and keep a local cache:
```bash
export HF_ENDPOINT=https://your-hf-mirror
export HF_HOME=/path/to/hf_cache
```

- Download released dataset assets from SceneCOT dataset.
- Place all downloaded data under one root directory, for example `/path/to/data_assets`.
- Set:

```bash
export SCENECOT_DATA_ROOT=/path/to/data_assets
```

`configs/data/default.yaml` resolves paths from `SCENECOT_DATA_ROOT` as follows:

- `${SCENECOT_DATA_ROOT}/SceneVerse` → `data.sceneverse_base`
- `${SCENECOT_DATA_ROOT}/leo2-cot` → `data.cot_annotation_base`
- `${SCENECOT_DATA_ROOT}/scan_family` → `data.scan_family_base`
- `${SCENECOT_DATA_ROOT}/LEO-2_feature/ScanNet` → `data.obj_feat_2d_base.ScanNet`
- `${SCENECOT_DATA_ROOT}/scene-verse-pred-all/ScanNet` → `data.obj_feat_base.ScanNet`
- `${SCENECOT_DATA_ROOT}/scenecot_imgs/imgs/scannet` → `data.obj_img_base.ScanNet`
- COT annotation paths are resolved as:
  - `${SCENECOT_COT_DATA_ROOT}/MSQA` (or `SCENECOT_MSR3D_ANNO_DIR`) → `data.msr3d_anno_dir`, `data.cotqa.msr3d.anno_dir`
  - `${SCENECOT_COT_DATA_ROOT}/GQA3D` (or `SCENECOT_GQA3D_ANNO_DIR`) → `data.gqa3d_anno_dir`, `data.cotqa.gqa3d.anno_dir`
Expected folder layout:
```
${SCENECOT_COT_DATA_ROOT}/
├── MSQA/
│   ├── situated_qa_train_pure_txt.json
│   ├── situated_qa_val_pure_txt.json
│   └── situated_qa_test_pure_txt.json
└── GQA3D/
    ├── gqa3d_train.json
    ├── gqa3d_val.json
    └── gqa3d_test.json
```
- Download released checkpoints from SceneCOT models, and set the optional PQ3D checkpoint environment variables if available.
Training:

```bash
sh scripts/train/full_training_msqa_gqa3d.sh
```

Evaluation (MOE test script):

```bash
sh scripts/test/full_training_msqa_beacon3d_test_moe.sh
```

- Download `evaluation_assets` from HF evaluation assets.
- Set optional variables:

```bash
export SCENECOT_EVAL_ASSETS=/path/to/evaluation_assets
export SCENECOT_EVAL_ROOT=/path/to/experiments
```

- Run:

```bash
python evaluator/msqa_evaluator_offline.py
```

Expected prediction files are read from:
```
{result_dir}/{model_name}/eval_results/{dataset_name}/results.json   (or results.pt)
```

where `result_dir` defaults to `SCENECOT_EVAL_ROOT`.
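The lookup rule above can be sketched as follows; `find_results` is a hypothetical helper (not part of the evaluator), shown only to make the path template concrete, with `results.json` preferred over `results.pt`.

```python
import os

# Hypothetical helper mirroring the path template above:
# {result_dir}/{model_name}/eval_results/{dataset_name}/results.json (or .pt)
def find_results(result_dir, model_name, dataset_name):
    base = os.path.join(result_dir, model_name, "eval_results", dataset_name)
    for fname in ("results.json", "results.pt"):  # json preferred, pt fallback
        path = os.path.join(base, fname)
        if os.path.isfile(path):
            return path
    return None  # no predictions found for this model/dataset pair
```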
- Arxiv paper
- Evaluation code
- Training code
- Model weights
- SceneCOT-185K dataset
If you find our work helpful, please consider citing us:
```bibtex
@inproceedings{linghu2026scenecot,
  title={SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes},
  author={Linghu, Xiongkun and Huang, Jiangyong and Zhu, Ziyu and Jia, Baoxiong and Huang, Siyuan},
  booktitle={Proceedings of the International Conference on Learning Representations (ICLR)},
  year={2026}
}
```