
This is the official PyTorch implementation of QTrack:

"QTrack: Query-Driven Reasoning for Multi-modal MOT" by Tajamul Ashraf, Tavaheed Tariq, Sonia Yadav, Abrar Ul Riyaz, Wasif Tak, Moloud Abdar, and Janibul Bashir.


NEWS

  • [03/25/2026] 💥 QTrack achieves new state-of-the-art on RMOT26 benchmark with 0.30 MCP and 0.75 MOTP! Check out our project page for demos.
  • [03/18/2026] We released the RMOT26 benchmark and QTrack codebase. See more details in our arXiv paper!
  • [03/10/2026] Dataset and model checkpoints are now publicly available.


QTrack performs query-driven multi-object tracking based on natural language instructions, tracking only the specified targets while maintaining temporal coherence.

🎯 What is QTrack?

Multi-object tracking (MOT) has traditionally focused on estimating trajectories of all objects in a video, without selectively reasoning about user-specified targets under semantic instructions. In this work, we introduce a query-driven tracking paradigm that formulates tracking as a spatiotemporal reasoning problem conditioned on natural language queries. Given a reference frame, a video sequence, and a textual query, the goal is to localize and track only the target(s) specified in the query while maintaining temporal coherence and identity consistency.
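
The formulation above, filter by the language query, then keep identities consistent over time, can be illustrated with a toy sketch. Everything here (the `match_fn` predicate, the greedy IoU association, the 0.3 threshold, the data layout) is a hypothetical stand-in for illustration only; QTrack itself is an end-to-end vision-language model, not this heuristic pipeline.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def track_query_matches(frames, match_fn, iou_thresh=0.3):
    """Toy query-driven tracker: greedy IoU association over the
    detections that satisfy the query.

    frames:   list of per-frame detection lists [(box, caption), ...]
    match_fn: predicate deciding whether a detection satisfies the query
    Returns {track_id: [(frame_idx, box), ...]}.
    """
    tracks: dict[int, list] = {}
    last_box: dict[int, Box] = {}
    next_id = 0
    for t, dets in enumerate(frames):
        for box, caption in dets:
            if not match_fn(caption):
                continue  # the query decides which objects are tracked at all
            # associate with the best-overlapping live track, else start a new one
            best_id, best_iou = None, iou_thresh
            for tid, prev in last_box.items():
                o = iou(prev, box)
                if o > best_iou:
                    best_id, best_iou = tid, o
            if best_id is None:
                best_id = next_id
                next_id += 1
                tracks[best_id] = []
            tracks[best_id].append((t, box))
            last_box[best_id] = box
    return tracks
```

The point of the sketch is the interface, not the method: the output contains trajectories only for query-matched targets, with a stable identity per target, which is exactly what the benchmark evaluates.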

Key Contributions:

  • RMOT26 Benchmark: A large-scale benchmark with grounded queries and sequence-level splits to prevent identity leakage and enable robust generalization evaluation
  • QTrack Model: An end-to-end vision-language model that integrates multi-modal reasoning with tracking-oriented localization
  • Temporal Perception-Aware Policy Optimization (TPA-PO): A structured reward strategy to encourage motion-aware reasoning

🔥 Check out our project website for more overview and demos!


📊 Benchmark Results

QTrack achieves state-of-the-art performance on the RMOT26 benchmark, significantly outperforming both open-source and closed-source models.

Main Results on RMOT26

| Model | Params | MCP↑ | MOTP↑ | CLE (px)↓ | NDE↓ |
|---|---|---|---|---|---|
| GPT-5.2 | - | 0.25 | 0.61 | 94.2 | 0.55 |
| Qwen3-VL-Instruct | 8B | 0.25 | 0.64 | 96.0 | 0.97 |
| Gemma 3 | 27B | 0.24 | 0.56 | 58.4 | 0.88 |
| Gemma 3 | 12B | 0.18 | 0.73 | 172.9 | 0.95 |
| VisionReasoner | 7B | 0.23 | 0.24 | 428.9 | 2.24 |
| Qwen2.5-VL-Instruct | 7B | 0.24 | 0.48 | 289.2 | 2.07 |
| InternVL | 8B | 0.21 | 0.66 | 117.44 | 0.64 |
| gpt-4o-mini | - | 0.20 | 0.57 | 130.48 | 0.67 |
| **QTrack (Ours)** | 3B | 0.30 | 0.75 | 44.61 | 0.39 |

Comparison with Traditional MOT Methods

MOT17 Dataset

| Model | MOTA | MOTP | HOTA | MCP |
|---|---|---|---|---|
| MOTR | 0.61 | 0.81 | 0.22 | 0.44 |
| BoostTrack++ | 0.63 | 0.76 | 0.38 | 0.44 |
| MOTRv2 | - | - | - | - |
| TrackTrack | 0.75 | 0.50 | 0.23 | 0.29 |
| VisionReasoner | 0.64 | 0.86 | 0.60 | 0.21 |
| **QTrack (Ours)** | 0.69 | 0.87 | 0.69 | 0.26 |

DanceTrack Dataset

| Model | MOTA | MOTP | HOTA | MCP |
|---|---|---|---|---|
| MOTR | 0.42 | 0.70 | 0.35 | 0.51 |
| BoostTrack++ | - | - | - | - |
| MOTRv2 | 0.49 | 0.73 | 0.37 | 0.52 |
| TrackTrack | 0.36 | 0.73 | 0.40 | 0.55 |
| VisionReasoner | 0.59 | 0.85 | 0.61 | 0.26 |
| **QTrack (Ours)** | 0.63 | 0.83 | 0.66 | 0.35 |
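
Assuming MOTA follows the standard CLEAR-MOT definition, 1 − (FN + FP + IDSW) / GT, the score is unbounded below, which is why some fine-tuned baselines in the next table report negative MOTA. A minimal sketch of that standard definition (an assumption; the paper's exact evaluation protocol may differ):

```python
def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """CLEAR-MOT MOTA: 1 - (FN + FP + IDSW) / GT.

    fn:     missed ground-truth objects (false negatives)
    fp:     spurious detections (false positives)
    idsw:   identity switches
    num_gt: total ground-truth objects across the sequence
    At most 1.0; negative when the error count exceeds num_gt.
    """
    return 1.0 - (fn + fp + idsw) / num_gt

# e.g. 10 misses, 5 false positives, 1 ID switch over 100 GT objects
# gives MOTA ≈ 0.84
```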

Fine-tuned VLLM Comparison

| Model | Params | MCP↑ | MOTP↑ | MOTA↑ | NDE↓ |
|---|---|---|---|---|---|
| VisionReasoner | 3B | 0.22 | 0.65 | 0.01 | 0.76 |
| Gemma3 | 4B | 0.18 | 0.73 | -0.16 | 0.95 |
| Qwen2.5-VL | 3B | 0.14 | 0.76 | -0.51 | 3.41 |
| **QTrack (Ours)** | 3B | 0.30 | 0.75 | 0.21 | 0.39 |
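
CLE is reported in pixels in the tables above. Assuming the common definition, the Euclidean distance between predicted and ground-truth box centers, it can be computed as below; whether the benchmark averages per frame or per track is not specified here, so treat this as a sketch of the per-box quantity only.

```python
import math

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)

def center(b: Box) -> tuple[float, float]:
    """Center point of an axis-aligned box."""
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)

def center_location_error(pred: Box, gt: Box) -> float:
    """Euclidean distance in pixels between predicted and GT box centers."""
    (px, py), (gx, gy) = center(pred), center(gt)
    return math.hypot(px - gx, py - gy)
```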

🔧 Installation

Requirements

  • Python ≥ 3.12
  • PyTorch ≥ 2.6
  • CUDA ≥ 12.1
  • Transformers ≥ 4.51.3

Setup Environment

```bash
# Create conda environment
conda create -n qtrack python=3.12
conda activate qtrack

# Install PyTorch
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124

# Install QTrack and dependencies
git clone https://github.com/gaash-lab/QTrack.git
cd QTrack
pip install -r requirements.txt
pip install -e .
```

Training

Training details and scripts will be provided soon.


Evaluation

Evaluation settings and benchmarks will be released soon.


Citation

If you find QTrack useful for your research, please cite:

```bibtex
@misc{ashraf2026qtrackquerydrivenreasoningmultimodal,
      title={QTrack: Query-Driven Reasoning for Multi-modal MOT},
      author={Tajamul Ashraf and Tavaheed Tariq and Sonia Yadav and Abrar Ul Riyaz and Wasif Tak and Moloud Abdar and Janibul Bashir},
      year={2026},
      eprint={2603.13759},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.13759},
}
```
