HiDrop: Hierarchical Vision Token Reduction in MLLMs
via Late Injection, Concave Pyramid Pruning, and Early Exit
Hao Wu*,1,2, Yingqi Fan*,1, Jinyang Dai3, Junlong Tong1,2,4, Yunpu Ma5, Xiaoyu Shen†,1,2
1Institute of Digital Twin, Eastern Institute of Technology, Ningbo
2Ningbo Key Laboratory of Spatial Intelligence and Digital Derivative
3University of Science and Technology of China 4Shanghai Jiao Tong University
5Munich Center of Machine Learning, LMU Munich
* Equal Contribution, † Corresponding Author.
If you find this work useful for your research and applications, please consider citing:
@misc{wu2026hidrophierarchicalvisiontoken,
title={HiDrop: Hierarchical Vision Token Reduction in MLLMs via Late Injection, Concave Pyramid Pruning, and Early Exit},
author={Hao Wu and Yingqi Fan and Jinyang Dai and Junlong Tong and Yunpu Ma and Xiaoyu Shen},
year={2026},
eprint={2602.23699},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.23699},
}
- [TODO] Code, checkpoints, and documentation are being prepared and will be released soon.
- [2026.02.27] The preprint is now published!
- News: Latest updates, news, and announcements.
- Highlights: Core insights and key features highlighted in this work.
- License: License information for this repository.
- Acknowledgments: Credits to projects and contributors that inspired or supported this work.
- Contact: Contact information for questions, feedback, or collaboration.
- Related Projects: Research projects from our group (EIT-NLP) related to MLLM compression.
This project is released under the Apache 2.0 license.
- Thanks to the LLaVA, FastV, and PyramidDrop libraries, which helped us quickly implement our ideas.
For questions, suggestions, or collaboration opportunities, feel free to reach out:
- Hao Wu: haowu.ai.research@gmail.com
- Xiaoyu Shen: xyshen@eitech.edu.cn
- Survey
- Vision Encoder
- ImageLLM