WIP: Add GRPO support for RL reasoning tasks by SahilJain314 · Pull Request #510 · NVIDIA/NeMo-Aligner

SahilJain314 · 2025-02-17T17:28:57Z

This PR adds support for Group Relative Policy Optimization (GRPO) in NeMo-Aligner. It also introduces a new vLLM-based inference engine backend to fix the existing sampling errors.
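For readers new to GRPO, here is a minimal sketch of its central idea: each response's advantage is computed relative to the other responses sampled for the same prompt. This is the standard formulation (reward minus group mean, divided by group standard deviation), not code from this PR. The function name `group_relative_advantages` and the `eps` parameter are assumptions made for illustration, and the PR's actual normalization may differ.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled responses.

    `rewards` holds one scalar reward per response in the group.
    `eps` (hypothetical default) guards against division by zero
    when every response in the group gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Four responses to the same prompt: two correct (1.0), two incorrect (0.0).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct responses get a positive advantage and incorrect ones a negative advantage. A group with identical rewards yields all-zero advantages, so it contributes no gradient signal.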

ko3n1g and others added 30 commits January 30, 2025 15:32

chore: Bump to 0.6.0rc1 (NVIDIA#456)

4ce6ab2

Signed-off-by: oliver könig <okoenig@nvidia.com>

chore: Bump to 0.6.0rc2 (NVIDIA#470)

7af4690

chore: Update package_info.py (NVIDIA#472)

ce2b6df

chore: Update package_info.py (NVIDIA#473)

957340c

docs: Update CHANGELOG.md

02f4808

Basic skeleton of upstreamed GRPO with environments

fdef021

Added dataset utils and improved rollout generation

b9ec7b4

Fleshed out full loop, environments, and datasets

5b298b4

Updated configs

caff357

Config and bug fixes

bd77b5c

Fixed naming bugs. Runs e2e without crashing

75af5f8

Fixed checkpoint loading bug

a8be3b1

Added some documentation

6a1435d

Added better documentation and moved all GRPO experimental files to experimental/grpo

d232e1a

feat: Add inference backend infrastructure to grpo

2a4a0f4

Add missing copyright

d1badce

Added trt logprob checking against nemo

05fbb4e

Added TP llama conversion to hf and checkpointing to ramdisk

a104974

Functional vllm support

4d9747e

fix: Fix checkpoint file read failure

54d8f20

Parth's /dev/shm checkpoint fix

9b3d202

Replaced NCCL barrier with GLOO when training is supposed to be asleep

72a1674

Added missing copyright statements

3a94b8b

Added refitting and job-specific checkpoint dirs

0db11d2

Changed sleep level to 1 for vllm. Seems to fix logprob issues

7f26dcb

functional (slow) 70b w refit and reshard

5a29458

Bugfix for PP reshard

f4ff8fc

Bugfix for PP reshard 2

dfd4a31

parameter offloading

2640107

Fix sleep hang

496b620

SahilJain314 added 8 commits February 12, 2025 13:27

Fix refit reshard context

553bf53

Added CPU OOM protection

b04cf7c

Enabled faster refitting via shared memory

39170e1

Fixed dataloader in the resharded case

f99ebbf

Un-hardcoded sampling params. Specified by NeMo now for vLLM

ddafa64

Bugfix

2f37034

rollback greedy generation -- breaks vllm

558ec67

PP reshard bugfix

b264bb6

github-actions bot added Utils Servers labels Feb 17, 2025

SahilJain314 and others added 18 commits February 18, 2025 04:56

Fixed bug with multiple environments

44cc49d

Add trt-llm pytorch backend to grpo branch

86c0e4e

Use load_shared_memory_state_dict api

97750d0

Added NeMo diff

2c649f7

Minor fixes to remove dead code

f634f9e

Added prototype (default off) importance sampling correction for incorrect samplers

276529b

Added importance flag to config

534d5a3

Add trt-llm pytorch backend to grpo branch

9aadd30

Use load_shared_memory_state_dict api

5427ece

Minor fixes to remove dead code

8251152

Skip optimizer step in case grad_norm is nan

fd1803d

Merge remote-tracking branch 'gh/grpo_importance' into add_importance_sampling

4fdd037

Fix importance sampling computation and mask nan to 0.0

0450160

cp support

cdbad6e

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

fix rebase

9ffcf7a

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

cleanup

6195a46

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

Update grpo_math_llama_8b.sh

2749bf3

Update README.md

fe3314e
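Several commits above mention an importance sampling correction for incorrect samplers, masking NaN to 0.0, and skipping the optimizer step when grad_norm is NaN. The sketch below illustrates those ideas in plain Python. It is not the PR's code: the names `importance_weights` and `should_step` are hypothetical, and the assumption that the correction is a per-token ratio of training-policy probability to sampler probability is mine, not stated in the commits.

```python
import math


def importance_weights(train_logprobs, sampler_logprobs):
    """Per-token ratio p_train / p_sampler, computed from log-probabilities.

    Assumed form of the correction: if the inference backend samples from
    a slightly different distribution than the training policy, the ratio
    reweights each token. NaN ratios are masked to 0.0 so that a single
    bad token cannot poison the loss.
    """
    weights = []
    for lp_train, lp_sampler in zip(train_logprobs, sampler_logprobs):
        w = math.exp(lp_train - lp_sampler)
        weights.append(0.0 if math.isnan(w) else w)
    return weights


def should_step(grad_norm):
    """Return False when the gradient norm is NaN, so the update is skipped."""
    return not math.isnan(grad_norm)


# The second token has a NaN training log-prob and is masked to 0.0.
weights = importance_weights([-1.0, float("nan")], [-1.0, -2.0])
```

In a training loop, `should_step` would gate the optimizer call, so a NaN gradient norm drops that batch's update rather than corrupting the weights.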

SahilJain314 wants to merge 56 commits into NVIDIA:dev from SahilJain314:grpo

6 participants
