cokeshao/Awesome-Multimodal-Token-Compression

Awesome Multimodal Token Compression

License: MIT PRs Welcome arXiv Last Commit

[arXiv] [HuggingFace] [Database]

A Survey of Token Compression for Efficient Multimodal Large Language Models [arXiv]
Kele Shao*,1,2, Keda Tao*,1,2, Kejia Zhang3, Sicheng Feng2,4, Mu Cai5, Yuzhang Shang6, Haoxuan You7, Can Qin8, Yang Sui9, Huan Wang†,2

1Zhejiang University, 2Westlake University, 3Xiamen University, 4National University of Singapore, 5University of Wisconsin-Madison, 6University of Central Florida, 7Columbia University, 8Salesforce AI Research, 9Rice University

* Equal Contribution. † Corresponding Author (wanghuan@westlake.edu.cn).

If you find our paper or this resource helpful, please consider citing:

@article{
  shao2026a,
  title={A Survey of Token Compression for Efficient Multimodal Large Language Models},
  author={Kele Shao and Keda TAO and Kejia Zhang and Sicheng Feng and Mu Cai and Yuzhang Shang and Haoxuan You and Can Qin and Yang Sui and Huan Wang},
  journal={Transactions on Machine Learning Research},
  year={2026},
  }

Important

We welcome your help in improving the repository and paper. Please feel free to submit a pull request or contact us to:

  • Add a relevant paper not yet included.

  • Suggest a more suitable category.

  • Update the information.

  • Ask for clarification about any content.


🔥 News

  • [2026.02.22] ⚠️⚠️⚠️ We are very fortunate that our article was covered by 机器之星!
  • [2026.02.22] Papers accepted to ICLR 2026 can be found here. Contributions are welcome!
  • [2026.01.27] Papers accepted to EMNLP 2025 and ICLR 2026 can be found here.
  • [2026.01.24] Our survey paper has been accepted to TMLR 2026. Congratulations! 🎉🎉🎉
  • [2025.10.11] Papers on MLLM token compression accepted to NeurIPS 2025 have been added here. Congratulations! 🎉🎉🎉
  • [2025.08.14] ❗ Added Recent Papers, Papers Published in Recent Conference/Journal, and a database for quick search.
  • [2025.07.29] The v1 survey is now published! We've also initialized the repository.

🎯 Motivation

Awesome Token Compression

Motivation. Top: Image, video, and audio data can scale along their representation dimensions, and the number of tokens grows correspondingly. Bottom: Top-performing MLLMs struggle to meet real-world demands because the number of tokens for multimodal inputs, especially video, vastly exceeds that of text. Token compression is therefore crucial to address this limitation.
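To make the scaling argument concrete, here is a rough back-of-the-envelope sketch. It assumes a plain ViT-style patchifier that emits one token per non-overlapping patch; the patch size of 14, the 336 and 672 pixel resolutions, the 64-frame clip, and the 25% keep ratio are illustrative assumptions, not figures from the survey. The function names are hypothetical helpers, not part of any library.

```python
def image_token_count(height: int, width: int, patch: int = 14) -> int:
    """Tokens for one image, assuming one token per non-overlapping patch."""
    return (height // patch) * (width // patch)


def video_token_count(frames: int, height: int, width: int, patch: int = 14) -> int:
    """Tokens for a clip, assuming every sampled frame is patchified independently."""
    return frames * image_token_count(height, width, patch)


def tokens_after_compression(tokens: int, keep_ratio: float) -> int:
    """Tokens left if a compression method keeps a fixed fraction (assumed policy)."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    return max(1, int(tokens * keep_ratio))


low_res = image_token_count(336, 336)        # 24 * 24 = 576 tokens
high_res = image_token_count(672, 672)       # 48 * 48 = 2304 tokens
clip = video_token_count(64, 336, 336)       # 64 * 576 = 36864 tokens
compressed = tokens_after_compression(clip, 0.25)  # 9216 tokens
```

Under these assumptions, doubling the resolution quadruples the image token count, and a short clip already produces tens of thousands of tokens, which is the gap that token compression methods aim to close.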

📚 Contents

Please check out all the papers by selecting the sub-area you're interested in. On this main page, only papers released in the past 6 months are shown.


Badge Colors

  • arXiv Badge red for arXiv papers
  • PDF Badge blue for conference/journal papers
  • GitHub Badge white for GitHub repositories
  • Research Areas Badge purple for research areas
  • Categories Badge green for categories
  • Cost Badge yellow for training cost

Recent Papers (Last 6 Months)

Image
Title & Authors Areas Tags Links
Publish Star
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
Hanxun Yu, Wentong Li, Xuan Qu, Song Wang, Junbo Chen, Jianke Zhu
Area Cost Paper
GitHub
Publish Star
FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
Mingyu Ouyang, Kevin Qinghong Lin, Mike Zheng Shou, Hwee Tou Ng
Area Area Type
Cost
Paper
GitHub
Model
Dataset
Publish
Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity
Area Cost Paper
Publish Star
PPE: Positional Preservation Embedding for Token Compression in Multimodal Large Language Models
Mouxiao Huang, Borui Jiang, Dehua Zheng, Hailin Hu, Kai Han, Xinghao Chen
Area Area Cost Paper
GitHub
Arxiv Star
DeepSeek-OCR: Contexts Optical Compression
Haoran Wei, Yaofeng Sun, Yukun Li
Area Type
Cost
Paper
GitHub
Model
Arxiv Star
VisionSelector: End-to-End Learnable Visual Token Compression for Efficient Multimodal LLMs
Jiaying Zhu, Yurui Zhu, Xin Lu, Wenrui Yan, Dong Li, Kunlin Liu, Xueyang Fu, Zheng-Jun Zha
Area Area Cost Paper
GitHub
Model
Arxiv Star
Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods
Chenfei Liao, Wensong Wang, Zichen Wen, Xu Zheng, Yiyu Wang, Haocong He, Yuanhuiyi Lyu, Lutao Jiang, Xin Zou, Yuqian Fu, Bin Ren, Linfeng Zhang, Xuming Hu
Area Area Paper
GitHub
Arxiv
Training-Free Token Pruning via Zeroth-Order Gradient Estimation in Vision-Language Models
Youngeun Kim, Youjia Zhang, Huiling Liu, Aecheon Jung, Sunwoo Lee, Sungeun Hong
Area Cost Paper
Publish Star
AutoPrune: Each Complexity Deserves a Pruning Policy
Hanshi Wang, Yuhao Xu, Zekun Xu, Jin Gao, Yufan Liu, Weiming Hu, Ke Wang, Zhipeng Zhang
Area Cost Paper
GitHub
Arxiv
HIVTP: A Training-Free Method to Improve VLMs Efficiency via Hierarchical Visual Token Pruning Using Middle-Layer-Based Importance Score
Jingqi Xu, Jingxi Lu, Chenghao Li, Sreetama Sarkar, Peter A. Beerel
Area Type
Cost
Paper
Arxiv
Pyramid Token Pruning for High-Resolution Large Vision-Language Models via Region, Token, and Instruction-Guided Importance
Yuxuan Liang, Xu Li, Xiaolei Chen, Yi Zheng, Haotian Chen, Bin Li, Xiangyang Xue
Area Type Type Type
Cost
Paper
Arxiv
EfficientUICoder: Efficient MLLM-based UI Code Generation via Input and Output Token Compression
Jingyu Xiao, Zhongyi Zhang, Yuxuan Wan, Yintong Huo, Yang Liu, Michael R. Lyu
Area Area Type
Cost
Paper
Arxiv
Adaptive Token Merging for Efficient Transformer Semantic Communication at the Edge
Omar Erak, Omar Alhussein, Hatem Abou-Zeid, Mehdi Bennis, Sami Muhaidat
Area Type
Cost
Paper
Arxiv Star
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
InternVL Team
Area Area Type
Cost
Paper
GitHub
Model
Publish
VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs Inference
Pengfei Jiang, Hanjun Li, Linglan Zhao, Fei Chao, Ke Yan, Shouhong Ding, Rongrong Ji
Area Area Type Type
Cost
Paper
Arxiv
Revisiting MLLM Token Technology through the Lens of Classical Visual Coding
Jinming Liu, Junyan Lin, Yuntao Wei, Kele Shao, Keda Tao, Jianguo Huang, Xudong Yang, Zhibo Chen, Huan Wang, Xin Jin
Area Area Paper
Arxiv
EVTP-IVS: Effective Visual Token Pruning For Unifying Instruction Visual Segmentation In Multi-Modal Large Language Models
Wenhui Zhu, Xiwen Chen, Zhipeng Wang, Shao Tang, Sayan Ghosh, Xuanzhao Dong, Rajat Koner, Yalin Wang
Area Area Type
Cost
Paper
Arxiv
CATP: Contextually Adaptive Token Pruning for Efficient and Enhanced Multimodal In-Context Learning
Yanshu Li, Jianjiang Yang, Zhennan Shen, Ligong Han, Haoyan Xu, Ruixiang Tang
Area Type Type
Cost
Paper
Arxiv
AdaptInfer: Adaptive Token Pruning for Vision-Language Model Inference with Dynamical Text Guidance
Weichen Zhang, Zhui Zhu, Ningbo Li, Kebin Liu, Yunhao Liu
Area Type
Cost
Paper
Arxiv
Fourier-VLM: Compressing Vision Tokens in the Frequency Domain for Large Vision-Language Models
Huanyu Wang, Jushi Kai, Haoli Bai, Lu Hou, Bo Jiang, Ziwei He, Zhouhan Lin
Area Cost Paper
Publish Star
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang, Runsen Xu, Chenhang Cui, Tai Wang, Dahua Lin, Jiangmiao Pang
Area Area Type
Cost
Paper
GitHub
Arxiv Star
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models
Quan-Sheng Zeng, Yunheng Li, Qilong Wang, Peng-Tao Jiang, Zuxuan Wu, Ming-Ming Cheng, Qibin Hou
Area Type
Cost
Paper
GitHub
Model
Publish
Mitigating Information Loss under High Pruning Rates for Efficient Large Vision Language Models
Mingyu Fu, Wei Suo, Ji Ma, Lin Yuanbo Wu, Peng Wang, Yanning Zhang
Area Cost Paper
Arxiv
HiPrune: Training-Free Visual Token Pruning via Hierarchical Attention in Vision-Language Models
Jizhihui Liu, Feiyi Du, Guangdao Zhu, Niu Lian, Jun Li, Bin Chen
Area Type
Cost
Paper
Video
Title & Authors Areas Tags Links
Arxiv
EchoingPixels: Cross-Modal Adaptive Token Reduction for Efficient Audio-Visual LLMs
Chao Gong, Depeng Wang, Zhipeng Wei, Ya Guo, Huijia Zhu, Jingjing Chen
Area Area Cost Paper
Publish Star
Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
Yiyu Wang, Xuyang Liu, Xiyan Gui, Xinying Lin, Boxue Yang, Chenfei Liao, Tailai Chen, Linfeng Zhang
Area Area Cost Paper
GitHub
Publish
FLoC: Facility Location-Based Efficient Visual Token Compression for Long Video Understanding
Janghoon Cho, Jungsoo Lee, Munawar Hayat, Kyuwoong Hwang, Fatih Porikli, Sungha Choi
Area Cost Paper
Publish Star
PPE: Positional Preservation Embedding for Token Compression in Multimodal Large Language Models
Mouxiao Huang, Borui Jiang, Dehua Zheng, Hailin Hu, Kai Han, Xinghao Chen
Area Area Cost Paper
GitHub
Publish Star
StreamingTOM: Streaming Token Compression for Efficient Video Understanding
Xueyi Chen, Keda Tao, Kele Shao, Huan Wang
Area Area Type Type
Cost
Paper
GitHub
Publish
Recurrent Attention-based Token Selection for Efficient Streaming Video-LLMs
Vaggelis Dorovatas, Soroush Seifi, Gunshi Gupta, Rahaf Aljundi
Area Type
Cost
Paper
Arxiv Star
VisionSelector: End-to-End Learnable Visual Token Compression for Efficient Multimodal LLMs
Jiaying Zhu, Yurui Zhu, Xin Lu, Wenrui Yan, Dong Li, Kunlin Liu, Xueyang Fu, Zheng-Jun Zha
Area Area Cost Paper
GitHub
Model
Publish
MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding
Peiran Wu, Zhuorui Yu, Yunze Liu, Chi-Hao Wu, Enmin Zhou, Junxiao Shen
Area Cost Paper
Arxiv
PSTTS: A Plug-and-Play Token Selector for Efficient Event-based Spatio-temporal Representation Learning
Xiangmo Zhao, Nan Yang, Yang Wang, Zhanwen Liu
Area Area Type
Cost
Paper
Arxiv
Walk and Read Less: Improving the Efficiency of Vision-and-Language Navigation via Tuning-Free Multimodal Token Pruning
Wenda Qin, Andrea Burns, Bryan A. Plummer, Margrit Betke
Area Area Type
Cost
Paper
Arxiv
The Better You Learn, The Smarter You Prune: Towards Efficient Vision-language-action Models via Differentiable Token Pruning
Titong Jiang, Xuefeng Jiang, Yuan Ma, Xin Wen, Bailin Li, Kun Zhan, Peng Jia, Yahui Liu, Sheng Sun, Xianpeng Lang
Area Area Type
Cost
Paper
Arxiv Star
Focus Through Motion: RGB-Event Collaborative Token Sparsification for Efficient Object Detection
Nan Yang, Yang Wang, Zhanwen Liu, Yuchao Dai, Yang Liu, Xiangmo Zhao
Area Area Cost Paper
GitHub
Publish Star
Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors
Xiangchen Wang, Jinrui Zhang, Teng Wang, Haigang Zhang, Feng Zheng
Area Cost Paper
GitHub
Arxiv Star
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
InternVL Team
Area Area Type
Cost
Paper
GitHub
Model
Publish
VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs Inference
Pengfei Jiang, Hanjun Li, Linglan Zhao, Fei Chao, Ke Yan, Shouhong Ding, Rongrong Ji
Area Area Type Type
Cost
Paper
Publish Star
Language-Guided Temporal Token Pruning for Efficient VideoLLM Processing
Yogesh Kumar
Area Type
Cost
Paper
GitHub
Arxiv Star
SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
Yicheng Ji, Jun Zhang, Heming Xia, Jinpeng Chen, Lidan Shou, Gang Chen, Huan Li
Area Type Paper
GitHub
Arxiv
StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding
Yanlai Yang, Zhuokai Zhao, Satya Narayan Shukla, Aashu Singh, Shlok Kumar Mishra, Lizhu Zhang, Mengye Ren
Area Area Type
Cost
Paper
Arxiv
EVTP-IVS: Effective Visual Token Pruning For Unifying Instruction Visual Segmentation In Multi-Modal Large Language Models
Wenhui Zhu, Xiwen Chen, Zhipeng Wang, Shao Tang, Sayan Ghosh, Xuanzhao Dong, Rajat Koner, Yalin Wang
Area Area Type
Cost
Paper
Publish Star
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang, Runsen Xu, Chenhang Cui, Tai Wang, Dahua Lin, Jiangmiao Pang
Area Area Type
Cost
Paper
GitHub
Audio
Title & Authors Areas Tags Links
Arxiv
EchoingPixels: Cross-Modal Adaptive Token Reduction for Efficient Audio-Visual LLMs
Chao Gong, Depeng Wang, Zhipeng Wei, Ya Guo, Huijia Zhu, Jingjing Chen
Area Area Cost Paper
Omni
Title & Authors Areas Tags Links
Publish Star
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
Keda Tao, Kele Shao, Bohan Yu, Weiqiang Wang, Jian Liu, Huan Wang
Area Type
Cost
Paper
GitHub

Published in Recent Conference/Journal

CVPR 2026
Title & Authors Areas Tags Links
Publish Star
FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
Mingyu Ouyang, Kevin Qinghong Lin, Mike Zheng Shou, Hwee Tou Ng
Area Area Type
Cost
Paper
GitHub
Model
Dataset
Publish Star
Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
Yiyu Wang, Xuyang Liu, Xiyan Gui, Xinying Lin, Boxue Yang, Chenfei Liao, Tailai Chen, Linfeng Zhang
Area Area Cost Paper
GitHub
Publish Star
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
Keda Tao, Kele Shao, Bohan Yu, Weiqiang Wang, Jian Liu, Huan Wang
Area Type
Cost
Paper
GitHub
Publish Star
StreamingTOM: Streaming Token Compression for Efficient Video Understanding
Xueyi Chen, Keda Tao, Kele Shao, Huan Wang
Area Area Type Type
Cost
Paper
GitHub
ICLR 2026
Title & Authors Areas Tags Links
Publish Star
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
Hanxun Yu, Wentong Li, Xuan Qu, Song Wang, Junbo Chen, Jianke Zhu
Area Cost Paper
GitHub
Publish
Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity
Area Cost Paper
Publish
FLoC: Facility Location-Based Efficient Visual Token Compression for Long Video Understanding
Janghoon Cho, Jungsoo Lee, Munawar Hayat, Kyuwoong Hwang, Fatih Porikli, Sungha Choi
Area Cost Paper
Publish Star
PPE: Positional Preservation Embedding for Token Compression in Multimodal Large Language Models
Mouxiao Huang, Borui Jiang, Dehua Zheng, Hailin Hu, Kai Han, Xinghao Chen
Area Area Cost Paper
GitHub
Publish
MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding
Peiran Wu, Zhuorui Yu, Yunze Liu, Chi-Hao Wu, Enmin Zhou, Junxiao Shen
Area Cost Paper
Publish
Task-Related Token Compression in Multimodal Large Language Models from an Explainability Perspective
Area Paper
EMNLP 2025
Title & Authors Areas Tags Links
Publish Star
Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors
Xiangchen Wang, Jinrui Zhang, Teng Wang, Haigang Zhang, Feng Zheng
Area Cost Paper
GitHub
NeurIPS 2025
Title & Authors Areas Tags Links
Publish
Recurrent Attention-based Token Selection for Efficient Streaming Video-LLMs
Vaggelis Dorovatas, Soroush Seifi, Gunshi Gupta, Rahaf Aljundi
Area Type
Cost
Paper
Publish Star
AutoPrune: Each Complexity Deserves a Pruning Policy
Hanshi Wang, Yuhao Xu, Zekun Xu, Jin Gao, Yufan Liu, Weiming Hu, Ke Wang, Zhipeng Zhang
Area Cost Paper
GitHub
Publish Star
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Senqiao Yang, Junyi Li, Xin Lai, Bei Yu, Hengshuang Zhao, Jiaya Jia
Area Type
Cost
Paper
GitHub
Model
Dataset
Publish Star
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
Qizhe Zhang, Mengzhen Liu, Lichen Li, Ming Lu, Yuan Zhang, Junwen Pan, Qi She, Shanghang Zhang
Area Area Type
Cost
Paper
GitHub
Publish Star
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding
Yunzhu Zhang, Yu Lu, Tianyi Wang, Fengyun Rao, Yi Yang, Linchao Zhu
Area Type Type
Cost
Paper
GitHub
Publish
Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization
Kaiyuan Li, Xiaoyue Chen, Chen Gao, Yong Li, Xinlei Chen
Area Type Type
Cost
Paper
Publish Star
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao, Keda Tao, Can Qin, Haoxuan You, Yang Sui, Huan Wang
Area Type Type
Cost
Paper
GitHub
Publish
Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering
Yangfu Li, Hongjian Zhan, Tianyi Chen, Qi Liu, Yue Lu
Area Type
Cost
Paper
Publish Star
VQToken: Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models
Haichao Zhang, Yun Fu
Area Type
Cost
Paper
GitHub
Model
Publish Star
FastVID: Dynamic Density Pruning for Fast Video Large Language Models
Leqi Shen, Guoqiang Gong, Tao He, Yifeng Zhang, Pengzhang Liu, Sicheng Zhao, Guiguang Ding
Area Type Type
Cost
Paper
GitHub
ICCV 2025
Title & Authors Areas Tags Links
Publish Star
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang, Runsen Xu, Chenhang Cui, Tai Wang, Dahua Lin, Jiangmiao Pang
Area Area Type
Cost
Paper
GitHub
Publish Star
Representation Shift: Unifying Token Compression with FlashAttention
Joonmyung Choi, Sanghyeok Lee, Byungoh Ko, Eunseo Kim, Jihyung Kil, Hyunwoo J. Kim
Area Type
Cost
Paper
GitHub
Publish Star
METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models
Yuchen Liu, Yaoming Wang, Bowen Shi, Xiaopeng Zhang, Wenrui Dai, Chenglin Li, Hongkai Xiong, Qi Tian
Area Type Type
Cost
Paper
GitHub
Publish Star
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video-LLMs
Jeongseok Hyun, Sukjun Hwang, Su Ho Han, Taeoh Kim, Inwoong Lee, Dongyoon Wee, Joon-Young Lee, Seon Joo Kim, Minho Shim
Area Type
Cost
Paper
GitHub
Publish
AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding
Weili Xu, Enxin Song, Wenhao Chai, Xuexiang Wen, Tian Ye, Gaoang Wang
Area Type
Cost
Paper
Publish Star
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Weili Zeng, Ziyuan Huang, Kaixiang Ji, Yichao Yan
Area Type
Cost
Paper
GitHub
Publish
Growing a Twig to Accelerate Large Vision-Language Models
Zhenwei Shao, Mingyang Wang, Zhou Yu, Wenwen Pan, Yan Yang, Tao Wei, Hongyuan Zhang, Ning Mao, Wei Chen, Jun Yu
Area Type
Cost
Paper
Publish Star
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Tianyuan Qu, Longxiang Tang, Bohao Peng, Senqiao Yang, Bei Yu, Jiaya Jia
Area Area Paper
GitHub
Dataset
Publish Star
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
Haicheng Wang, Zhemeng Yu, Gabriele Spadaro, Chen Ju, Victor Quétu, Shuai Xiao, Enzo Tartaglione
Area Area Type
Cost
Paper
GitHub
Publish Star
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models
Tianyu Fu, Tengxuan Liu, Qinghao Han, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang
Area Type Type Paper
GitHub
Publish Star
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
Han Wang, Yuxiang Nie, Yongjie Ye, Deng GuanYu, Yanjie Wang, Shuai Li, Haiyang Yu, Jinghui Lu, Can Huang
Area Type Type
Cost
Paper
GitHub
Publish Star
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong, Chengyao Wang, Yuqi Liu, Senqiao Yang, Longxiang Tang, Yuechen Zhang, Jingyao Li, Tianyuan Qu, Yanwei Li, Yukang Chen, Shaozuo Yu, Sitong Wu, Eric Lo, Shu Liu, Jiaya Jia
Area Area Area Type
Cost
Paper
GitHub
Model
Dataset
Publish Star
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
Yiwu Zhong, Zhuoming Liu, Yin Li, Liwei Wang
Area Type Type
Cost
Paper
GitHub
Publish Star
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
Qizhe Zhang, Aosong Cheng, Ming Lu, Renrui Zhang, Zhiyong Zhuo, Jiajun Cao, Shaobo Guo, Qi She, Shanghang Zhang
Area Area Type Type
Cost
Paper
GitHub
Publish
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification
Yefei He, Feng Chen, Jing Liu, Wenqi Shao, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
Area Area Type
Cost
Paper
Publish Star
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen, Hung-Ting Su, Shang-Hong Lai, Winston H. Hsu
Area Type
Cost
Paper
GitHub
Publish Star
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan
Area Area Type Type
Cost
Paper
GitHub
ACL 2025
Title & Authors Areas Tags Links
Publish Star
EffiVLM-Bench: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Visual-Language Models
Zekun Wang, Minghua Ma, Zexin Wang, Rongchuan Mu, Liping Shan, Ming Liu, Bing Qin
Area Area Area Paper
GitHub
Publish Star
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Jeong Hun Yeo, Hyeongseop Rha, Se Jin Park, Yong Man Ro
Area Type
Cost
Paper
GitHub
Publish Star
MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
Zhongwei Wan, Hui Shen, Xin Wang, Che Liu, Zheda Mai, Mi Zhang
Area Area Type Type
Cost
Paper
GitHub
Publish Star
PruneVid: Visual Token Pruning for Efficient Video Large Language Models
Xiaohu Huang, Hao Zhou, Kai Han
Area Type Type
Cost
Paper
GitHub
Publish Star
Prompt Compression for Large Language Models: A Survey
Zongqian Li, Yinhong Liu, Yixuan Su, Nigel Collier
Area Area Paper
GitHub
ICML 2025
Title & Authors Areas Tags Links
Publish Star
CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models
Qinsi Wang, Hancheng Ye, Ming-Yu Chung, Yudong Liu, Yueqian Lin, Martin Kuo, Mingyuan Ma, Jianyi Zhang, Yiran Chen
Area Area Type
Cost
Paper
GitHub
Publish Star
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
Dongchao Yang, Songxiang Liu, Haohan Guo, Jiankun Zhao, Yuanyuan Wang, Helin Wang, Zeqian Ju, Xubo Liu, Xueyuan Chen, Xu Tan, Xixin Wu, Helen Meng
Area Type
Cost
Paper
GitHub
Publish Star
Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation
Chuanqi Cheng, Jian Guan, Wei Wu, Rui Yan
Area Type Type
Cost
Paper
GitHub
Model
Publish Star
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
Xiaoqian Shen, Yunyang Xiong, Changsheng Zhao, Lemeng Wu, Jun Chen, Chenchen Zhu, Zechun Liu, Fanyi Xiao, Balakrishnan Varadarajan, Florian Bordes, Zhuang Liu, Hu Xu, Hyunwoo J. Kim, Bilge Soran, Raghuraman Krishnamoorthi, Mohamed Elhoseiny, Vikas Chandra
Area Type Type
Cost
Paper
GitHub
Model
Publish Star
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
Yuan Zhang, Chun-Kai Fan, Junpeng Ma, Wenzhao Zheng, Tao Huang, Kuan Cheng, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Shanghang Zhang
Area Area Type Type
Cost
Paper
GitHub
ACM MM 2025
Title & Authors Areas Tags Links
Publish
VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs Inference
Pengfei Jiang, Hanjun Li, Linglan Zhao, Fei Chao, Ke Yan, Shouhong Ding, Rongrong Ji
Area Area Type Type
Cost
Paper
Publish
Mitigating Information Loss under High Pruning Rates for Efficient Large Vision Language Models
Mingyu Fu, Wei Suo, Ji Ma, Lin Yuanbo Wu, Peng Wang, Yanning Zhang
Area Cost Paper
Publish Star
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
Linli Yao, Yicheng Li, Yuancheng Wei, Lei Li, Shuhuai Ren, Yuanxin Liu, Kun Ouyang, Lean Wang, Shicheng Li, Sida Li, Lingpeng Kong, Qi Liu, Yuanxing Zhang, Xu Sun
Area Type
Cost
Paper
GitHub
Model
Dataset

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

This repository is inspired by Awesome-Efficient-Reasoning-Models, Awesome-Efficient-LLM, and Awesome-Context-Engineering.

🧑‍💻 Contributors

👏 Thanks to these contributors for this excellent work!

✉️ Contact

For questions, suggestions, or collaboration opportunities, please feel free to reach out:

✉️ Email: shaokele@gmail.com / KD.TAO.CT@outlook.com

✨ Star History

Star History Chart
