| Video&Mask | Output |
|---|---|
## Create Conda Environment and Install Dependencies
```shell
# create a new conda env
conda create -n undereraser python=3.12 -y
conda activate undereraser

# install python dependencies
pip3 install -r requirements.txt
```

Download the weights from this link and put the two files under the `weight` folder.
We use the pretrained Wan2.1-Fun-V1.1-14B-InP as our base model.
You can download the Wan2.1-Fun-V1.1-14B-InP base model from this link. Put the whole folder under the `models` folder.
The models will be arranged like this:

```
models
└── Wan2.1-Fun-V1.1-14B-InP
    ├── google
    │   └── umt5-xxl
    │       ├── spiece.model
    │       └── ...
    ├── xlm-roberta-large
    │   ├── sentencepiece.bpe.model
    │   └── ...
    ├── config.json
    ├── configuration.json
    ├── diffusion_pytorch_model.safetensors
    ├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
    ├── models_t5_umt5-xxl-enc-bf16.pth
    └── Wan2.1_VAE.pth
```
We provide some examples in the `data` folder.
Run the following command to try it out:

```shell
python infer.py
```

You can also prepare and test your own data following the same format.
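One way to organize your own inputs is to pair each video with a same-named mask file. This is only a sketch: the `video`/`mask` subfolder names and the `.mp4` extension are assumptions, so copy whatever layout the shipped examples in `data` actually use.

```python
from pathlib import Path

def pair_inputs(root="data"):
    """Pair input videos with masks by filename stem (hypothetical layout:
    data/video/*.mp4 and data/mask/*.mp4 -- check the shipped examples)."""
    videos = {p.stem: p for p in Path(root, "video").glob("*.mp4")}
    masks = {p.stem: p for p in Path(root, "mask").glob("*.mp4")}
    # Keep only videos that have a matching mask, in a stable order.
    return [(videos[s], masks[s]) for s in sorted(videos.keys() & masks.keys())]
```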
The test datasets are available at this link, including our constructed Camera-Bench and Scene-Bench.
## TODO

- Release training code.
If you find our repo useful for your research, please consider citing our paper:

```bibtex
@article{liu2026eraser,
  title={From Understanding to Erasing: Towards Complete and Stable Video Object Removal},
  author={Liu, Dingming and Wang, Wenjing and Li, Chen and LYU, Jing},
  journal={arXiv preprint},
  year={2026}
}
```

This code is based on VideoX-Fun and LightX2V. Thanks for their awesome work!