# ROM: Real-time Overthinking Mitigation via Streaming Detection and Intervention

Xinyan Wang¹, Xiaogeng Liu², Chaowei Xiao²

¹University of Wisconsin-Madison  ²Johns Hopkins University
ROM is a lightweight streaming framework that detects and mitigates overthinking in Large Reasoning Models (LRMs) at the token level. It attaches a small detection head (< 0.01% additional parameters) to a frozen LLM backbone and monitors generation in real time, intervening when redundant reasoning is detected.
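As a rough illustration of the idea only (not the actual `rom/models.py` implementation — the class name, the threshold-and-patience stopping rule, and all numeric values below are assumptions), a streaming detection head can be sketched as a tiny logistic probe over per-token hidden states:

```python
import numpy as np

class ToyStreamingHead:
    """Illustrative stand-in for a streaming overthinking detector.

    A linear head maps each token's hidden state to a probability that
    the token belongs to redundant reasoning; generation is cut off once
    that probability stays above `threshold` for `patience` consecutive
    tokens. All hyperparameters here are illustrative.
    """

    def __init__(self, hidden_size, threshold=0.9, patience=4, seed=0):
        rng = np.random.default_rng(seed)
        # Randomly initialized probe weights; a trained head would
        # learn these on top of the frozen backbone.
        self.w = rng.normal(scale=hidden_size ** -0.5, size=hidden_size)
        self.b = 0.0
        self.threshold = threshold
        self.patience = patience
        self._streak = 0

    def score(self, hidden_state):
        # Logistic probability that the current token is redundant.
        return 1.0 / (1.0 + np.exp(-(self.w @ hidden_state + self.b)))

    def should_stop(self, hidden_state):
        # Count consecutive high-probability tokens; intervene after
        # `patience` of them in a row.
        if self.score(hidden_state) > self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        return self._streak >= self.patience
```

In this sketch the head is called once per generated token, so monitoring cost is a single dot product per step on top of the frozen backbone's forward pass.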
## Project Structure

```
ROM/
├── rom/                      # Core package
│   ├── models.py             # StreamingHead, Qwen3WithHead
│   ├── dataset.py            # Dataset loading & embedding cache
│   ├── train.py              # Training pipeline
│   ├── eval.py               # Offline evaluation (vLLM)
│   ├── env.py                # Environment setup
│   └── utils/
│       ├── math.py           # Answer extraction & correctness checking
│       └── eval_helpers.py   # Metrics, probability computation
├── configs/
│   ├── train.yaml            # Training defaults
│   └── eval.yaml             # Evaluation defaults
├── requirements.txt
├── LICENSE
└── README.md
```
## Installation

```bash
pip install -r requirements.txt
```

Requires Python 3.11+, PyTorch >= 2.9.0, and a CUDA-capable GPU.

## Data

Training data is hosted on HuggingFace: `xinyan-wang/ROM`.

Download and place under `data/`:

```bash
# Using huggingface-cli
huggingface-cli download xinyan-wang/ROM --repo-type dataset --local-dir data
```

## Training

All parameters are in `configs/train.yaml`. Run with defaults:

```bash
python -m rom.train
```

Override via CLI:

```bash
python -m rom.train --lr 1e-4 --num_train_epochs 30
```

W&B logging is enabled by default. Disable with `--no_wandb`.
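The override pattern above can be sketched with stdlib `argparse`: YAML defaults seed the parser, and any flag passed on the command line wins (a minimal illustration — `rom.train`'s real plumbing may differ, and `DEFAULTS` here stands in for the values parsed from `configs/train.yaml`):

```python
import argparse

# Illustrative defaults; in the real pipeline these would be loaded
# from configs/train.yaml. The specific values are assumptions.
DEFAULTS = {"lr": 5e-5, "num_train_epochs": 10, "no_wandb": False}

def parse_overrides(argv):
    """Merge CLI flags over config defaults; unset flags keep defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=DEFAULTS["lr"])
    parser.add_argument("--num_train_epochs", type=int,
                        default=DEFAULTS["num_train_epochs"])
    parser.add_argument("--no_wandb", action="store_true",
                        default=DEFAULTS["no_wandb"])
    return vars(parser.parse_args(argv))
```

For example, `parse_overrides(["--lr", "1e-4"])` changes only the learning rate and leaves every other default intact.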
## Evaluation

```bash
python -m rom.eval
```

Override as needed:

```bash
python -m rom.eval --ckpt_path checkpoints/my_model.pt --test_data data/test_data/math500.jsonl
```

## Citation

If you find ROM useful, please cite our paper 📝 and give us a ⭐!
```bibtex
@article{wang2025rom,
  title={ROM: Real-time Overthinking Mitigation via Streaming Detection and Intervention},
  author={Wang, Xinyan and Liu, Xiaogeng and Xiao, Chaowei},
  journal={arXiv preprint arXiv:2603.22016},
  year={2025}
}
```

## License

This project is licensed under the MIT License.
