SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes

ICLR 2026

SceneCOT Teaser
SceneCOT: We propose a grounded Chain-of-Thought (CoT) reasoning method for 3D scenes, which decouples a complex reasoning task into simpler, manageable sub-problems and builds the corresponding visual clues with multimodal expert modules. To our knowledge, this is the first successful application of the CoT technique to human-like, step-by-step reasoning for 3D scene understanding, and it shows strong potential to extend to a wider range of 3D scene understanding scenarios.

SceneCOT Framework

SceneCOT Results
SceneCOT achieves strong performance on MSQA and Beacon3D, demonstrating the effectiveness of our reasoning framework. In particular, our method significantly improves performance on counting, the most challenging task in MSQA, and outperforms previous methods by a large margin on Beacon3D.

πŸ”₯ News

  • [2026-3] Evaluation code, model checkpoints, detailed installation instruction have been released
  • [2026-3] We release training code
  • [2026-1] SceneCOT is accepted by ICLR 2026
  • [2025-6] We released the webpage of SceneCOT

πŸš€ Get Started

  1. Clone the repository.
git clone https://github.com/SceneCOT/scenecot
cd scenecot
  2. Create a Python environment and install dependencies.
conda create -n scenecot python=3.9
conda activate scenecot

# PyTorch (example tested version)
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=11.8 -c pytorch -c nvidia

# project dependencies
pip install -r requirements.txt
  3. Install point-cloud third-party modules.
pip install spconv-cu118

cd model/pointnetpp
python setup.py install
cd ../..

# sanity check
python -c 'from model.pointnetpp.pointnetpp import PointNetPP'

If PointNext build/import fails, either disable PointNext usage or place the compiled file from LEO_data under model/pointnext/cpp/pointnet2_batch/.
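Before moving on, it can save time to verify that the key dependencies are at least importable. The sketch below is a hypothetical pre-flight check, not part of the repo; it probes module availability without actually importing heavy packages like torch.

```python
# Quick environment sanity check (a sketch; assumes the conda env above is active).
import importlib.util

def check_modules(names):
    """Return a dict mapping each module name to whether it can be found on the path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# Module names below follow the dependencies installed above.
status = check_modules(["torch", "torchvision", "numpy"])
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

Using `find_spec` instead of a bare `import` keeps the check fast and avoids triggering CUDA initialization.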

πŸ”§ Reproducibility configuration

The configs were updated to avoid machine-specific absolute paths. We recommend setting the following environment variables:

| Variable | Purpose | Default | Download / Source Link |
|---|---|---|---|
| SCENECOT_EXP_ROOT | experiment output root (cfg.base_dir) | ./outputs | - |
| SCENECOT_DATA_ROOT | root directory for dataset/assets used by configs/data/default.yaml | ./data_assets | SceneCOT dataset |
| SCENECOT_COT_DATA_ROOT | root directory for released COT annotations (MSQA/, GQA3D/) | ${SCENECOT_DATA_ROOT}/scenecot_cot_data | SceneCOT dataset / scenecot_cot_data |
| SCENECOT_MSR3D_ANNO_DIR | MSQA annotation directory (contains situated_qa_{train,val,test}_pure_txt.json) | ${SCENECOT_COT_DATA_ROOT}/MSQA | MSQA |
| SCENECOT_GQA3D_ANNO_DIR | GQA3D annotation directory (contains gqa3d_{train,val,test}.json) | ${SCENECOT_COT_DATA_ROOT}/GQA3D | GQA3D |
| HF_HOME | Hugging Face cache root (cfg.hf_home) | ./.cache/huggingface | Hugging Face Hub |
| SCENECOT_MODEL_ROOT | unified root directory for default model/checkpoint paths | ./model_assets | SceneCOT models |
| SCENECOT_LLM_PATH | LLaVA model path (override) | ${SCENECOT_MODEL_ROOT}/llava-v1.5-7b | LLaVA-1.5-7B |
| SCENECOT_VISION_TOWER_PATH | CLIP vision tower path (override) | ${SCENECOT_MODEL_ROOT}/clip-vit-large-patch14-336 | CLIP ViT-L/14-336 |
| SCENECOT_PQ3D_TOKENIZER_PATH | PQ3D text tokenizer path (override, data.pq3d_tokenizer_path) | ${SCENECOT_MODEL_ROOT}/clip-vit-large-patch14 | SceneCOT models |
| SCENECOT_POINTNET_TOKENIZER_PATH | PQ3D PointNet++ tokenizer checkpoint (override) | ${SCENECOT_MODEL_ROOT}/pointnet_tokenizer.pth | SceneCOT models |
| SCENECOT_QUERY3D_PRETRAIN_PATH | PQ3D/SceneVerse pretrain checkpoint (override) | ${SCENECOT_MODEL_ROOT}/query3d_pretrain.bin | SceneCOT models |
| SCENECOT_EXPERT1_PATH | MOE expert-1 checkpoint directory (override) | ${SCENECOT_MODEL_ROOT}/expert1_checkpoint0 | SceneCOT model repo (checkpoint dirs) |
| SCENECOT_EXPERT2_PATH | MOE expert-2 checkpoint directory (override) | ${SCENECOT_MODEL_ROOT}/expert2_best.pth | SceneCOT model repo (checkpoint dirs) |

Example:

export SCENECOT_EXP_ROOT=/path/to/experiments
export SCENECOT_DATA_ROOT=/path/to/data_assets
export SCENECOT_COT_DATA_ROOT=/path/to/data_assets/scenecot_cot_data
export SCENECOT_MSR3D_ANNO_DIR=/path/to/data_assets/scenecot_cot_data/MSQA
export SCENECOT_GQA3D_ANNO_DIR=/path/to/data_assets/scenecot_cot_data/GQA3D
export HF_HOME=/path/to/hf_cache
export SCENECOT_MODEL_ROOT=/path/to/model_assets

# Optional explicit overrides when using non-default file names/locations
# export SCENECOT_LLM_PATH=/path/to/model_assets/llava-v1.5-7b
# export SCENECOT_VISION_TOWER_PATH=/path/to/model_assets/clip-vit-large-patch14-336
# export SCENECOT_PQ3D_TOKENIZER_PATH=/path/to/model_assets/clip-vit-large-patch14
# export SCENECOT_EXPERT1_PATH=/path/to/model_assets/expert1_checkpoint0
# export SCENECOT_EXPERT2_PATH=/path/to/model_assets/expert2_best.pth
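The override scheme above presumably reduces to an environment-variable lookup with a fallback default. The sketch below illustrates that pattern; `resolve_path` and the sample paths are illustrative, not the repo's actual config helper.

```python
import os

def resolve_path(var, default):
    """Resolve a config path from an environment variable, falling back to a default.
    Variable names follow the table in this README; the helper itself is illustrative."""
    return os.environ.get(var, default)

# Derived defaults nest under SCENECOT_MODEL_ROOT, matching the table above.
model_root = resolve_path("SCENECOT_MODEL_ROOT", "./model_assets")
llm_path = resolve_path("SCENECOT_LLM_PATH", os.path.join(model_root, "llava-v1.5-7b"))
print(llm_path)
```

Setting only SCENECOT_MODEL_ROOT therefore moves every derived default at once, while the per-model variables remain available for non-standard layouts.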

πŸ“¦ Pretrained weights

To reproduce paper-level performance, the following checkpoints are needed:

  1. SceneCOT experts (released): SceneCOT model repo
  2. PQ3D PointNet++ tokenizer (pointnet_tokenizer.pth) β†’ set SCENECOT_POINTNET_TOKENIZER_PATH
  3. Query3D/SceneVerse pretrain (pytorch_model.bin) β†’ set SCENECOT_QUERY3D_PRETRAIN_PATH

For MOE evaluation, expert checkpoints are expected as directories under SCENECOT_MODEL_ROOT:

${SCENECOT_MODEL_ROOT}/
β”œβ”€β”€ expert1_checkpoint0/
β”‚   └── pytorch_model.bin (or model.safetensors)
└── expert2_best.pth/
    └── pytorch_model.bin (or model.safetensors)

These map to:

  • moe.expert1_path β†’ ${SCENECOT_MODEL_ROOT}/expert1_checkpoint0 (or SCENECOT_EXPERT1_PATH)
  • moe.expert2_path β†’ ${SCENECOT_MODEL_ROOT}/expert2_best.pth (or SCENECOT_EXPERT2_PATH)

By default, checkpoints 2 and 3 above are resolved under SCENECOT_MODEL_ROOT. If these files are absent, the related modules are initialized without pretrained weights, which may significantly affect final metrics.
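Because missing checkpoints silently degrade metrics rather than fail loudly, a quick pre-flight check can catch them before a long run. This is a hypothetical sketch, not a script shipped with the repo; adjust the expected names if you override SCENECOT_EXPERT1_PATH or SCENECOT_EXPERT2_PATH.

```python
import os

def missing_checkpoints(model_root, expected):
    """Return the subset of expected relative paths that do not exist under model_root."""
    return [rel for rel in expected if not os.path.exists(os.path.join(model_root, rel))]

# Names follow the checkpoint list and tree above.
expected = [
    "pointnet_tokenizer.pth",
    "query3d_pretrain.bin",
    "expert1_checkpoint0",
    "expert2_best.pth",
]
model_root = os.environ.get("SCENECOT_MODEL_ROOT", "./model_assets")
for rel in missing_checkpoints(model_root, expected):
    print(f"WARNING: {rel} not found under {model_root}")
```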

🌐 External services

Weights & Biases

Tracking is enabled by default. For evaluation-only/offline runs without login:

export WANDB_MODE=disabled

Hugging Face access

If direct access to huggingface.co is restricted, set a mirror endpoint and keep a local cache:

export HF_ENDPOINT=https://your-hf-mirror
export HF_HOME=/path/to/hf_cache

πŸ“ Data preparation

  1. Download released dataset assets from SceneCOT dataset.
  2. Place all downloaded data under one root directory, for example:

/path/to/data_assets

  3. Set:
export SCENECOT_DATA_ROOT=/path/to/data_assets
  4. configs/data/default.yaml resolves paths from SCENECOT_DATA_ROOT as:
  • ${SCENECOT_DATA_ROOT}/SceneVerse β†’ data.sceneverse_base
  • ${SCENECOT_DATA_ROOT}/leo2-cot β†’ data.cot_annotation_base
  • ${SCENECOT_DATA_ROOT}/scan_family β†’ data.scan_family_base
  • ${SCENECOT_DATA_ROOT}/LEO-2_feature/ScanNet β†’ data.obj_feat_2d_base.ScanNet
  • ${SCENECOT_DATA_ROOT}/scene-verse-pred-all/ScanNet β†’ data.obj_feat_base.ScanNet
  • ${SCENECOT_DATA_ROOT}/scenecot_imgs/imgs/scannet β†’ data.obj_img_base.ScanNet
  5. COT annotation paths are resolved as:
  • ${SCENECOT_COT_DATA_ROOT}/MSQA (or SCENECOT_MSR3D_ANNO_DIR) β†’ data.msr3d_anno_dir, data.cotqa.msr3d.anno_dir
  • ${SCENECOT_COT_DATA_ROOT}/GQA3D (or SCENECOT_GQA3D_ANNO_DIR) β†’ data.gqa3d_anno_dir, data.cotqa.gqa3d.anno_dir

Expected folder layout:

${SCENECOT_COT_DATA_ROOT}/
β”œβ”€β”€ MSQA/
β”‚   β”œβ”€β”€ situated_qa_train_pure_txt.json
β”‚   β”œβ”€β”€ situated_qa_val_pure_txt.json
β”‚   └── situated_qa_test_pure_txt.json
└── GQA3D/
    β”œβ”€β”€ gqa3d_train.json
    β”œβ”€β”€ gqa3d_val.json
    └── gqa3d_test.json
  6. Download released checkpoints from SceneCOT models, and set the optional PQ3D checkpoint envs if available.
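The annotation layout above can be validated with a short script before training. This is an illustrative sketch based on the expected tree, not a utility from the repo.

```python
import os

# Expected COT annotation files, mirroring the layout tree above.
EXPECTED_FILES = {
    "MSQA": [f"situated_qa_{split}_pure_txt.json" for split in ("train", "val", "test")],
    "GQA3D": [f"gqa3d_{split}.json" for split in ("train", "val", "test")],
}

def layout_problems(cot_root):
    """Return relative paths of expected annotation files missing under cot_root."""
    problems = []
    for subdir, files in EXPECTED_FILES.items():
        for fname in files:
            if not os.path.isfile(os.path.join(cot_root, subdir, fname)):
                problems.append(os.path.join(subdir, fname))
    return problems

root = os.environ.get("SCENECOT_COT_DATA_ROOT", "./data_assets/scenecot_cot_data")
for p in layout_problems(root):
    print("missing:", p)
```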

πŸ•Ή Training and evaluation

Training:

sh scripts/train/full_training_msqa_gqa3d.sh

Evaluation (MOE test script):

sh scripts/test/full_training_msqa_beacon3d_test_moe.sh

πŸ“Š Offline evaluation

  1. Download evaluation_assets from HF evaluation assets.
  2. Set optional variables:
export SCENECOT_EVAL_ASSETS=/path/to/evaluation_assets
export SCENECOT_EVAL_ROOT=/path/to/experiments
  3. Run:
python evaluator/msqa_evaluator_offline.py

Expected prediction files are read from:

{result_dir}/{model_name}/eval_results/{dataset_name}/results.json (or results.pt)

where result_dir defaults to SCENECOT_EVAL_ROOT.
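The path template above can be made concrete with a small helper. The sketch below is illustrative; the model and dataset names shown are hypothetical placeholders, not values the evaluator requires.

```python
import os

def prediction_path(result_dir, model_name, dataset_name, ext="json"):
    """Build a prediction file path following the README template:
    {result_dir}/{model_name}/eval_results/{dataset_name}/results.{ext}"""
    return os.path.join(result_dir, model_name, "eval_results", dataset_name, f"results.{ext}")

# "scenecot_moe" and "msqa" are example names for illustration only.
result_dir = os.environ.get("SCENECOT_EVAL_ROOT", "./experiments")
print(prediction_path(result_dir, "scenecot_moe", "msqa"))
```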

πŸ“ TODO List

  • Arxiv paper
  • Evaluation code
  • Training code
  • Model weights
  • SceneCOT-185K dataset

BibTeX

If you find our work helpful, please consider citing us:

@inproceedings{linghu2026scenecot,
  title={SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes},
  author={Linghu, Xiongkun and Huang, Jiangyong and Zhu, Ziyu and Jia, Baoxiong and Huang, Siyuan},
  booktitle={Proceedings of the International Conference on Learning Representations (ICLR)},
  year={2026}
}
