This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]
Topics: benchmark, representation-learning, image-retrieval, embedding, vlm, multimodal, rag, video-retrieval, contrastive-learning, mmeb, visual-document-retrieval
Updated Mar 22, 2026 · Python