Initial version porting the eden configs over to the new evo2 recipe #1502

Open
jstjohn wants to merge 3 commits into main from jstjohn/evo2_llama_configs_and_savanna_convert
Conversation

@jstjohn
Collaborator

@jstjohn jstjohn commented Mar 9, 2026

Description

This PR adds Eden (Llama 3.1) model support, Savanna/Vortex checkpoint converters, and a standardized model naming convention to the Megatron Bridge–based Evo2 recipe (bionemo-recipes/recipes/evo2_megatron/).

Eden (Llama 3.1) model support

  • New eden_provider.py defining EdenModelProvider and size-specific subclasses (eden_7b through eden_35b) that inherit from Llama31ModelProvider.
  • train.py now dispatches to gpt_forward_step for Eden models and automatically disables fp32_residual_connection (incompatible with standard TE LayerNormLinear layers — Hyena handles this via manual dtype casting, but GPT/Llama does not).
  • infer.py now initializes ProcessGroupCollection for non-Hyena providers (required by GPTModelProvider.provide()) and uses StaticInferenceContext instead of HyenaInferenceContext for Eden models. The flash_decode attribute is guarded to Hyena-only.
  • predict.py already worked architecture-agnostically via dynamic model loading; no changes required.
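The dispatch and config guard described above can be sketched as follows. This is an illustrative stand-in, not the actual train.py code: `ProviderConfig`, `select_forward_step`, and `sanitize_config` are hypothetical names, and the real recipe works with Megatron Bridge provider classes rather than a plain dataclass.

```python
from dataclasses import dataclass


@dataclass
class ProviderConfig:
    """Minimal stand-in for a model provider config (hypothetical)."""
    architecture: str                      # e.g. "hyena" or "llama"
    fp32_residual_connection: bool = False


def hyena_forward_step(batch):
    # Placeholder for the Hyena-specific forward step.
    return "hyena"


def gpt_forward_step(batch):
    # Placeholder for the standard GPT forward step used by Eden.
    return "gpt"


def select_forward_step(config: ProviderConfig):
    """Eden (Llama 3.1) providers take the GPT forward-step path."""
    if config.architecture == "hyena":
        return hyena_forward_step
    return gpt_forward_step


def sanitize_config(config: ProviderConfig) -> ProviderConfig:
    # fp32_residual_connection is incompatible with standard TE
    # LayerNormLinear layers; Hyena handles the dtype cast manually,
    # GPT/Llama does not, so force the flag off for non-Hyena providers.
    if config.architecture != "hyena":
        config.fp32_residual_connection = False
    return config
```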

Checkpoint converters

  • savanna_to_mbridge.py — converts ARC Savanna .pt checkpoints (local or downloaded from Hugging Face via hf_hub_download) into MBridge distributed checkpoint format.
  • mbridge_to_vortex.py — exports MBridge checkpoints to ARC's single-file Vortex inference format, handling MLP weight splitting, Hyena filter pole/residue computation, and TE layernorm key remapping.
  • Both are registered as console scripts (evo2_convert_savanna_to_mbridge, evo2_export_mbridge_to_vortex).
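Two of the transformations the exporter performs — TE layernorm key remapping and fused-MLP weight splitting — can be illustrated in miniature. The key names and helper functions below are hypothetical; the real mapping between TE and Vortex state-dict layouts is larger, and the splitting operates on tensors rather than lists.

```python
# Hypothetical suffix mapping from TE-style keys to Vortex-style keys.
TE_TO_VORTEX = {
    "mlp.linear_fc1.layer_norm_weight": "pre_mlp_layernorm.weight",
    "self_attention.linear_qkv.layer_norm_weight": "input_layernorm.weight",
}


def remap_keys(state_dict, remap=TE_TO_VORTEX):
    """Rename keys whose suffix matches the map; pass others through."""
    out = {}
    for key, value in state_dict.items():
        for old, new in remap.items():
            if key.endswith(old):
                key = key[: -len(old)] + new
                break
        out[key] = value
    return out


def split_fused_fc1(rows):
    # A fused [gate; up] projection is split into its two halves
    # (shown here on a plain list standing in for the leading tensor dim).
    half = len(rows) // 2
    return rows[:half], rows[half:]
```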

Model naming convention

The previous model size keys (1b, 7b, 40b, 7b_arc_longcontext, …) were ambiguous — 7b referred to Striped Hyena while 7B referred to Llama. This PR replaces them with explicit, architecture-prefixed keys:

  • evo2_* for models matching public ARC checkpoints (e.g. evo2_1b_base, evo2_7b, evo2_40b_base). The _base suffix denotes the 8K-context checkpoint; keys without it denote the 1M-context version.
  • striped_hyena_*_nv for NVIDIA-modified Hyena variants.
  • eden_* for Llama 3.1 variants.
  • Added evo2_20b config based on arcinstitute/savanna_evo2_20b.
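The convention above is mechanical enough to check with a few lines. This sketch is illustrative only: the recipe keeps its model keys in provider registries, not a regex, and the 8K/1M distinction is stated here for the evo2_* family.

```python
import re

# Hypothetical validator for the architecture-prefixed key format.
MODEL_KEY = re.compile(r"^(evo2|striped_hyena|eden)_\d+b\w*$")


def context_length(model_key: str) -> int:
    """Per the evo2_* convention above: `_base` = 8K context, else 1M."""
    if not MODEL_KEY.match(model_key):
        raise ValueError(f"unrecognized model key: {model_key!r}")
    return 8_192 if model_key.endswith("_base") else 1_048_576
```

For example, `context_length("evo2_1b_base")` yields the 8K context and the old ambiguous key `7b` is rejected outright.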

Documentation updates

  • README.md — added model naming convention tables, Vortex export section with round-trip example, updated all CLI examples to new model keys.
  • checkpoint/README.md — updated --model-size documentation.
  • Both Jupyter notebooks (zeroshot_brca1.ipynb, fine-tuning-tutorial.ipynb) — updated MODEL_SIZE and --model-size references.

Usage

Training an Eden model:

torchrun --nproc-per-node 1 --no-python train_evo2 \
  --model-size eden_7b --num-layers 2 --max-steps 5 \
  --mock-data --seq-length 64 --mixed-precision-recipe bf16_mixed \
  --no-activation-checkpointing

Converting Savanna checkpoint to MBridge:

evo2_convert_savanna_to_mbridge \
  --savanna-ckpt-path arcinstitute/savanna_evo2_1b_base \
  --mbridge-ckpt-dir /tmp/mbridge_1b \
  --model-size evo2_1b_base \
  --tokenizer-path tokenizers/nucleotide_fast_tokenizer_256

Exporting MBridge to Vortex:

evo2_export_mbridge_to_vortex \
  --mbridge-ckpt-dir /tmp/mbridge_1b/iter_0000001 \
  --output-path /tmp/evo2_1b_vortex.pt \
  --model-size evo2_1b_base

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.

  • ciflow:skip - Skip all CI tests for this PR
  • ciflow:notebooks - Run Jupyter notebooks execution tests for bionemo2
  • ciflow:slow - Run slow single GPU integration tests marked as @pytest.mark.slow for bionemo2
  • ciflow:all - Run all tests (unit tests, slow tests, and notebooks) for bionemo2. This label can be used to enforce running tests for all bionemo2.
  • ciflow:all-recipes - Run tests for all recipes (under bionemo-recipes). This label can be used to enforce running tests for all recipes.

Unit tests marked as @pytest.mark.multi_gpu or @pytest.mark.distributed are not run in the PR pipeline.

For more details, see CONTRIBUTING

Note

By default, only basic unit tests are run. Add the appropriate labels to enable additional test coverage.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.

  • If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
    automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
  • If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
    /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.

Triggering Code Rabbit AI Review

To trigger a code review from CodeRabbit, comment on the pull request with one of its review commands. See https://docs.coderabbit.ai/reference/review-commands for the full list of commands.

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

Signed-off-by: John St. John <jstjohn@nvidia.com>
@coderabbitai
Contributor

coderabbitai bot commented Mar 9, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f788ed66-1d69-4368-887a-18890c069374

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

jstjohn added 2 commits March 9, 2026 20:45
Signed-off-by: John St. John <jstjohn@nvidia.com>
…tions

Signed-off-by: John St. John <jstjohn@nvidia.com>