Zican Hu^{1,2}*, Shilin Zhang^{1,2}*, Yafu Li^{2}†✉, Jianhao Yan^{4,2}, Xuyang Hu^{2}, Leyang Cui^{4}, Xiaoye Qu^{2}, Chunlin Chen^{1}, Yu Cheng^{3}✉, Zhi Wang^{1,2}✉
1Nanjing University 2Shanghai AI Laboratory 3The Chinese University of Hong Kong 4Westlake University
*Equal contributions. Zican Hu and Shilin Zhang are listed alphabetically by last name. †Project lead. ✉ Corresponding authors.
Contact: zicanhu@smail.nju.edu.cn, shilinzhang@smail.nju.edu.cn, yafuly@gmail.com, chengyu@cse.cuhk.edu.hk, zhiwang@nju.edu.cn
We first conduct a preliminary empirical study that reveals a strong positive correlation between global diversity and reasoning capacity, and propose DIVER, a framework that highlights the pivotal role of global sequence-level diversity in incentivizing deep exploration for versatile reasoning. Building on this insight, we introduce global diversity incentives as an intrinsic reward to promote deep exploration in a semantically structured space. To incorporate the intrinsic reward, we develop a potential-based reward shaping mechanism that preserves optimal-policy invariance, and design simple heuristics to mitigate possible reward hacking.
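The shaping mechanism above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `diversity_potential` function is a hypothetical stand-in for the sequence-level diversity measure (here, one minus the maximum cosine similarity between a rollout's embedding and previously seen rollouts), and the shaping term follows the standard potential-based form F = γΦ(s') − Φ(s), which is known to preserve the optimal policy (Ng et al., 1999).

```python
import numpy as np

def diversity_potential(emb, memory):
    """Hypothetical potential: higher when a rollout's embedding is far
    from previously seen rollout embeddings (sequence-level diversity)."""
    if not memory:
        return 1.0
    sims = [float(emb @ m / (np.linalg.norm(emb) * np.linalg.norm(m) + 1e-8))
            for m in memory]
    return 1.0 - max(sims)

def shaped_reward(r_verifier, phi_prev, phi_curr, gamma=1.0):
    """Add the potential-based shaping term F = gamma * Phi(s') - Phi(s)
    to the extrinsic (verifier) reward; this form preserves the
    optimal policy."""
    return r_verifier + gamma * phi_curr - phi_prev
```

Note that when the potential is unchanged between steps, the shaping term vanishes and the verifier reward passes through untouched, which is exactly the invariance property the mechanism relies on.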
- Sequence-Level vs. Token-Level Diversity in RLVR
- Metrics for Quantifying Sequence-Level Diversity
- Promoting Global Diversity for Deep Exploration
- Mitigating Reward Hacking
conda create -n diver python=3.10 -y
conda activate diver
pip install -r requirements.txt
cd dataset
huggingface-cli download --resume-download huzican/DIVER-Training-Openr1-Math-46k --local-dir openr1
huggingface-cli download --resume-download huzican/DIVER-Test --local-dir valid.all
cd model
huggingface-cli download --resume-download huzican/Qwen2.5-Math-7B-16k-think --local-dir Qwen2.5-Math-7B-16k-think
export MODEL_PATH="Qwen2.5-Math-7B-16k-think"
bash scripts/train/train_diver.sh --model $MODEL_PATH
export CHECKPOINT_PATH="checkpoints/diver"
bash scripts/eval/eval_checkpoint.sh --model $CHECKPOINT_PATH
DIVER builds upon veRL and deepscaler, and uses vLLM for inference. We use Math-Verify as the RLVR reward model. We thank the open-source community for the datasets and backbones, including OpenR1-Math-220k, OpenR1-Math-46k, and the Qwen-2.5 and Llama-3.1 models.
If you find our paper useful, please consider starring this repository and citing it:
@article{hu2025diversity,
title={Diversity-Incentivized Exploration for Versatile Reasoning},
author={Zican Hu and Shilin Zhang and Yafu Li and Jianhao Yan and Xuyang Hu and Leyang Cui and Xiaoye Qu and Chunlin Chen and Yu Cheng and Zhi Wang},
journal={arXiv preprint arXiv:2509.26209},
year={2025}
}


