OVGGT: O(1) Constant-Cost Streaming Visual Geometry Transformer

https://arxiv.org/abs/2603.05959 https://vaisr.github.io/OVGGT

Si-Yu Lu¹, Po-Ting Chen², Hui-Che Hsu², Sin-Ye Jhong², Wen-Huang Cheng¹, Yung-Yao Chen²

¹National Taiwan University    ²National Taiwan University of Science and Technology


TL;DR: OVGGT is a training-free framework enabling streaming 3D reconstruction from arbitrarily long video with constant memory and compute — achieving O(1) per-frame cost while surpassing full-cache baselines in accuracy.

Left: Quantitative comparison on 7-Scenes across 200 frames. Right: Qualitative 3D reconstructions demonstrating OVGGT's stability over long sequences (50–500 frames).

Overview

OVGGT is a training-free framework that enables streaming 3D reconstruction from arbitrarily long video with constant memory and compute. It combines Self-Selective Caching (SSC) for zero-overhead KV cache compression via FFN residual magnitudes, and Dynamic Anchor Protection (DAP) to shield geometrically critical tokens from eviction, suppressing coordinate drift over long sequences. OVGGT is fully compatible with FlashAttention and processes videos within a fixed VRAM envelope while surpassing full-cache baselines in accuracy.
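The idea of a fixed-budget cache with protected tokens can be illustrated with a toy example. The sketch below is not the OVGGT implementation: the `CachedToken` fields, the `prune` and `run_stream` helpers, and the synthetic scores are all hypothetical, and the score simply stands in for whatever importance signal (such as an FFN residual magnitude) the real method uses. It only shows why pruning back to a fixed budget after every frame keeps memory constant, assuming the set of protected tokens stays bounded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedToken:
    frame: int     # index of the frame that produced the token
    score: float   # hypothetical importance score (stand-in for an FFN residual magnitude)
    anchor: bool   # hypothetical flag: True if the token is protected from eviction


def prune(cache: List[CachedToken], budget: int) -> List[CachedToken]:
    """Shrink the cache to at most `budget` tokens, never dropping anchors."""
    anchors = [t for t in cache if t.anchor]
    others = [t for t in cache if not t.anchor]
    if len(anchors) > budget:
        raise ValueError("budget is smaller than the number of protected tokens")
    # Keep the highest-scoring unprotected tokens in the remaining slots.
    others.sort(key=lambda t: t.score, reverse=True)
    kept = anchors + others[: budget - len(anchors)]
    # Restore temporal order so the cache still reads as a stream.
    kept.sort(key=lambda t: t.frame)
    return kept


def run_stream(num_frames: int, tokens_per_frame: int, budget: int) -> List[int]:
    """Feed synthetic frames through the cache and record its size per frame."""
    cache: List[CachedToken] = []
    sizes = []
    for f in range(num_frames):
        for k in range(tokens_per_frame):
            # Synthetic scores; the first token of frame 0 is treated as the only anchor.
            score = ((f * 31 + k * 17) % 97) / 97.0
            cache.append(CachedToken(frame=f, score=score, anchor=(f == 0 and k == 0)))
        cache = prune(cache, budget)
        sizes.append(len(cache))
    return sizes
```

With `run_stream(50, 8, 16)`, the recorded cache size never exceeds 16 however many frames are streamed, which is the constant-memory property in miniature.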

⚙️ Installation

  1. Clone OVGGT
git clone https://github.com/<your-username>/OVGGT.git
cd OVGGT
  2. Create conda environment
conda create -n OVGGT python=3.11 cmake=3.14.0
conda activate OVGGT
  3. Install requirements
pip install -r requirements.txt
conda install 'llvm-openmp<16'

Download Checkpoints

Please download the StreamVGGT checkpoint from Hugging Face or Tsinghua Cloud.

Evaluation

The evaluation code follows MonST3R, CUT3R, TTT3R, StreamVGGT and InfiniteVGGT.

cd src/

Multi-view Reconstruction

bash eval/mv_recon/run.sh 

Results will be saved in eval_results/mv_recon/${model_name}_${ckpt_name}/logs_all.txt.

Video Depth

bash eval/video_depth/run.sh 

Results will be saved in eval_results/video_depth/${data}_${model_name}/result_scale.json.
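To inspect a saved JSON results file without opening it by hand, a small helper like the one below may be convenient. It is only a sketch: the function names `load_results` and `format_results` are hypothetical, and nothing is assumed about which keys the file contains beyond it being a JSON object at the top level.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_results(path: str) -> Dict[str, Any]:
    """Load a JSON results file and return its top-level object."""
    with Path(path).open("r", encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object in {path}")
    return data


def format_results(results: Dict[str, Any]) -> str:
    """Render one 'key: value' line per top-level entry, sorted by key."""
    return "\n".join(f"{key}: {results[key]}" for key in sorted(results))
```

For example, `print(format_results(load_results(path)))`, where `path` points at the `result_scale.json` produced by your run, prints one line per top-level entry.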

Pose Evaluation

bash eval/pose_evaluation/run.sh 

Results will be saved in eval_results/pose_evaluation/${data}_${model_name}/_error_log.txt.

🚀 Quick Start

Viser Demo (Interactive 3D Visualization)

We provide a demo for OVGGT, based on the demo code from InfiniteVGGT. You can follow the instructions below to launch it.

python demo_viser.py \
    --seq_path path/to/nrgbd/image_sequence \
    --frame_interval 10 \
    --gt_path path/to/nrgbd/gt_camera  # optional

Gradio Demo (Web UI)

We provide a demo for OVGGT, based on the demo code from VGGT. You can follow the instructions below to launch it.

pip install -r requirements_demo.txt
python demo_gradio.py

🙏 Acknowledgements

Our code is based on the following brilliant repositories:

DUSt3R MonST3R Spann3R CUT3R VGGT Point3R StreamVGGT TTT3R Evict3R InfiniteVGGT

Many thanks to these authors!

📝 Citation

@article{lu2026ovggt,
  title={OVGGT: O(1) Constant-Cost Streaming Visual Geometry Transformer},
  author={Si-Yu Lu and Po-Ting Chen and Hui-Che Hsu and Sin-Ye Jhong and Wen-Huang Cheng and Yung-Yao Chen},
  journal={arXiv preprint arXiv:2603.05959},
  year={2026}
}
