process-reward

Here are 2 public repositories matching this topic...

liuxiaotong / knowlyr-gym

Gymnasium-style RL framework for LLM agent training — MDP environments, three-layer process reward & SFT/DPO/GRPO policy optimization. CLI + MCP ready.

python cli reinforcement-learning mcp gymnasium trajectory sft dpo llm ai-data-pipeline process-reward knowlyr

Updated Mar 15, 2026
Python

waybarrios / crystal-benchmark

CRYSTAL: Beyond Final Answers: Benchmark for Transparent Multimodal Reasoning Evaluation | arXiv 2603.13099

benchmark reinforcement-learning computer-vision deep-learning evaluation vqa dartmouth reasoning multimodal chain-of-thought mllm llm-evaluation vision-language-models multimodal-reasoning grpo process-reward step-level-evaluation

Updated Mar 18, 2026
