This repository was archived by the owner on Nov 19, 2025. It is now read-only.

WIP: Add GRPO support for RL reasoning tasks#510

Draft
SahilJain314 wants to merge 56 commits into NVIDIA:dev from SahilJain314:grpo

Conversation

SahilJain314 (Contributor) commented Feb 17, 2025

This PR adds support for Group Relative Policy Optimization (GRPO) in NeMo-Aligner. It also introduces a new vLLM-based inference engine backend to fix the existing sampling errors.
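For readers unfamiliar with GRPO, here is a minimal sketch of the group-relative advantage idea the name refers to: several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its own group. This is not taken from the PR's code. The function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper for illustration only; not part of NeMo-Aligner.
    `eps` (an assumed value) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is another option
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored 1.0 (correct) or 0.0 (wrong).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that score above their group's mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to form a baseline.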

ko3n1g and others added 30 commits January 30, 2025 15:32
Signed-off-by: oliver könig <okoenig@nvidia.com>


6 participants