
Motion Forcing: A Decoupled Framework for Robust Video Generation in Motion Dynamics

arXiv Project Page HuggingFace

Tianshuo Xu1, Zhifei Chen1, Leyi Wu1, Hao Lu1, Ying-cong Chen1,2*

1HKUST (GZ)    2HKUST    * corresponding author

Motion Forcing decouples physical reasoning from visual synthesis via a hierarchical Point → Shape → Appearance paradigm, enabling precise and physically consistent video generation from a single image and user-drawn trajectories. Given sparse motion anchors, the model first generates dynamic depth (Shape), then renders high-fidelity RGB frames (Appearance) — bridging the gap between control signals and complex scene dynamics.
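For intuition, sparse motion anchors of the kind described above can be densified into one point position per frame by simple linear interpolation before conditioning the model. The helper below is a minimal stdlib-only sketch; the function name and the `(frame_index, x, y)` anchor format are illustrative assumptions, not the repository's actual API.

```python
def densify_anchors(anchors, num_frames):
    """Linearly interpolate sparse (frame, x, y) anchors into one
    (x, y) position per frame. Anchors must be sorted by frame index
    and span the clip (first anchor at frame 0, last at the end)."""
    positions = []
    seg = 0  # index of the current anchor segment
    for f in range(num_frames):
        # advance to the segment whose end anchor is at or after frame f
        while seg < len(anchors) - 2 and anchors[seg + 1][0] < f:
            seg += 1
        f0, x0, y0 = anchors[seg]
        f1, x1, y1 = anchors[seg + 1]
        t = (f - f0) / (f1 - f0)
        positions.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return positions

# A point moving from (0, 0) to (10, 20) over a 5-frame clip:
path = densify_anchors([(0, 0.0, 0.0), (4, 10.0, 20.0)], 5)
```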


Visualization

Driving Ego-Action Control

Turn Left Turn Right Speed Up Slow Down

Complex Driving Scenarios

Dangerous Cut-In Double Cut-In Right Cut-In Left Cut-In & Brake

TODO

  • Inference code
  • Gradio demo
  • Pretrained checkpoint
  • Data processing pipeline (coming soon)
  • Training code (coming soon)

Setup

git clone --recurse-submodules https://github.com/Tianshuo-Xu/Motion-Forcing.git
cd Motion-Forcing
pip install -r requirements.txt

Build VGGT:

git clone git@github.com:facebookresearch/vggt.git 
cd vggt
pip install -e .

Download depth estimation weights:

cd Video-Depth-Anything
bash get_weights.sh

Download YOLO segmentation weights into weights/yolo11l-seg.pt (used for interactive object selection in the demo).

The CogVideoX base model and the fine-tuned transformer (TSXu/MotionForcing_driving) are downloaded automatically from HuggingFace on first run.


Run the Demo

python gradio_demo.py

Open http://localhost:7860. Upload an image, click an object to select it, draw its trajectory, then generate.
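A drawn trajectory arrives as an arbitrary-length list of pixel coordinates, so before it can condition a fixed-length clip it must be resampled to one point per frame. The stdlib-only sketch below shows arc-length resampling; the function name and input/output format are assumptions for illustration, not the demo's actual interface.

```python
import math

def resample_path(points, num_frames):
    """Resample a polyline of (x, y) points to num_frames points
    spaced evenly by arc length along the original stroke."""
    # cumulative arc length at each input vertex
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]
    out = []
    j = 0  # index of the current polyline segment
    for i in range(num_frames):
        target = total * i / (num_frames - 1)
        while j < len(dists) - 2 and dists[j + 1] < target:
            j += 1
        span = dists[j + 1] - dists[j]
        t = (target - dists[j]) / span if span > 0 else 0.0
        x = points[j][0] + t * (points[j + 1][0] - points[j][0])
        y = points[j][1] + t * (points[j + 1][1] - points[j][1])
        out.append((x, y))
    return out

# A two-segment stroke resampled to 3 evenly spaced points:
traj = resample_path([(0, 0), (10, 0), (10, 10)], 3)
```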

Acknowledgements

We thank the authors of CogVideoX, Video-Depth-Anything, VGGT, and Ultralytics YOLO for their outstanding open-source contributions.


Citation

@misc{xu2026motion,
      title={Motion Forcing: A Decoupled Framework for Robust Video Generation in Motion Dynamics}, 
      author={Tianshuo Xu and Zhifei Chen and Leyi Wu and Hao Lu and Ying-cong Chen},
      year={2026},
      eprint={2603.10408},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.10408}, 
}

About

Official implementation of Motion Forcing: A Decoupled Framework for Robust Video Generation in Motion Dynamics
