# step-level-evaluation

Here is 1 public repository matching this topic...

waybarrios / crystal-benchmark

CRYSTAL: Beyond Final Answers: Benchmark for Transparent Multimodal Reasoning Evaluation | arXiv 2603.13099

benchmark reinforcement-learning computer-vision deep-learning evaluation vqa dartmouth reasoning multimodal chain-of-thought mllm llm-evaluation vision-language-models multimodal-reasoning grpo process-reward step-level-evaluation

Updated Mar 18, 2026