ER-Depth: Enhancing the Robustness of Self-Supervised Monocular Depth Estimation in Challenging Scenes
Ziyang Song*, Ruijie Zhu*, Chuxin Wang, Jiacheng Deng, Jianfeng He, Tianzhu Zhang (*Equal contribution)
University of Science and Technology of China
TOMM 2025
The two-stage training framework of ER-Depth. In the first stage, we train DepthNet and PoseNet with the perturbation-invariant depth consistency loss. In the second stage, we leverage the teacher network to generate pseudo labels and construct a distillation loss to train the student network. Notably, we propose a depth consistency-based filter (DC-Filter) and a geometric consistency-based filter (GC-Filter) to filter out unreliable pseudo labels.
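The consistency and filtering ideas above can be illustrated with a minimal NumPy sketch. This is not the released implementation: the function names (`depth_consistency_loss`, `dc_filter`), the log-depth loss form, and the threshold `tau` are illustrative assumptions; the GC-Filter additionally requires camera poses and reprojection geometry and is omitted here.

```python
import numpy as np

def depth_consistency_loss(depth_clean, depth_perturbed):
    """Hypothetical perturbation-invariant consistency loss: mean absolute
    log-depth difference between predictions on the clean image and on a
    perturbed (e.g. corrupted) version of the same image."""
    return float(np.mean(np.abs(np.log(depth_clean) - np.log(depth_perturbed))))

def dc_filter(depth_clean, depth_perturbed, tau=0.1):
    """Hypothetical DC-Filter: keep only pixels whose relative depth
    difference between the two predictions is below tau; the remaining
    pixels are treated as unreliable pseudo labels."""
    rel = np.abs(depth_clean - depth_perturbed) / depth_clean
    return rel < tau
```

In this sketch, a student network would be supervised only where `dc_filter` returns `True`, so inconsistent teacher predictions never enter the distillation loss.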
- 16 Dec. 2023: The code is now available.
- 28 Nov. 2023: The project website was released.
- 12 Oct. 2023: ER-Depth released on arXiv.
Please refer to dataset_prepare.md for dataset preparation and get_started.md for installation.
We provide example bash commands to run training and testing. Please modify these scripts according to your own configuration before running.
First-stage training:

```bash
bash train_first_stage.sh train first_stage_model 2 4
```

Second-stage training:

```bash
bash train_second_stage.sh train second_stage_model 2 4
```

Evaluate the model on the KITTI dataset:

```bash
bash evaluate_kitti.sh
```

Evaluate the model on the KITTI-C dataset:

```bash
bash evaluate_kittic.sh
```

We provide the official weights of ER-Depth (the first-stage model) and ER-Depth* (the second-stage model) on Google Drive. Their experimental results on KITTI and KITTI-C are listed below.
Results on the KITTI dataset:
| Methods | AbsRel | SqRel | RMSE | RMSE log | a1 | a2 | a3 |
|---|---|---|---|---|---|---|---|
| ER-Depth | 0.100 | 0.708 | 4.367 | 0.175 | 0.896 | 0.966 | 0.984 |
| ER-Depth* | 0.100 | 0.689 | 4.315 | 0.173 | 0.896 | 0.967 | 0.985 |
Results on the KITTI-C dataset:
| Methods | AbsRel | SqRel | RMSE | RMSE log | a1 | a2 | a3 |
|---|---|---|---|---|---|---|---|
| ER-Depth | 0.115 | 0.841 | 4.749 | 0.189 | 0.869 | 0.958 | 0.982 |
| ER-Depth* | 0.111 | 0.807 | 4.651 | 0.185 | 0.874 | 0.960 | 0.983 |
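For reference, the table columns are the standard monocular depth evaluation metrics, where a1, a2, and a3 denote the fraction of pixels whose ratio max(gt/pred, pred/gt) falls below 1.25, 1.25², and 1.25³. A minimal sketch of computing them (the function name and dictionary layout are ours, not the repository's):

```python
import numpy as np

def eval_depth(pred, gt):
    """Compute the standard monocular depth metrics over valid pixels.
    `pred` and `gt` are positive depth arrays of the same shape."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)          # AbsRel
    sq_rel = np.mean(((gt - pred) ** 2) / gt)          # SqRel
    rmse = np.sqrt(np.mean((gt - pred) ** 2))          # RMSE
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))  # RMSE log
    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse,
            "rmse_log": rmse_log, "a1": a1, "a2": a2, "a3": a3}
```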
If you find our work useful in your research, please consider citing:
@article{song2025er,
title={{ER-Depth}: Enhancing the Robustness of Self-Supervised Monocular Depth Estimation in Challenging Scenes},
author={Song, Ziyang and Zhu, Ruijie and Wang, Jing and Wang, Chuxin and He, Jianfeng and Deng, Jiacheng and Yang, Wenfei and Zhang, Tianzhu},
journal={ACM Transactions on Multimedia Computing, Communications and Applications},
volume={21},
number={12},
pages={1--23},
year={2025},
publisher={ACM New York, NY}
}
The code is based on MonoDepth2, MonoViT, and RoboDepth.
