Initial version porting the eden configs over to the new evo2 recipe #1502
Open
Signed-off-by: John St. John <jstjohn@nvidia.com>
Description
This PR adds Eden (Llama 3.1) model support, Savanna/Vortex checkpoint converters, and a standardized model naming convention to the Megatron Bridge–based Evo2 recipe (`bionemo-recipes/recipes/evo2_megatron/`).

Eden (Llama 3.1) model support
- `eden_provider.py` defining `EdenModelProvider` and size-specific subclasses (`eden_7b` through `eden_35b`) that inherit from `Llama31ModelProvider`.
- `train.py` now dispatches to `gpt_forward_step` for Eden models and automatically disables `fp32_residual_connection` (incompatible with standard TE `LayerNormLinear` layers; Hyena handles this via manual dtype casting, but GPT/Llama does not).
- `infer.py` now initializes `ProcessGroupCollection` for non-Hyena providers (required by `GPTModelProvider.provide()`) and uses `StaticInferenceContext` instead of `HyenaInferenceContext` for Eden models. The `flash_decode` attribute is guarded to Hyena-only.
- `predict.py` already worked architecture-agnostically via dynamic model loading; no changes required.

Checkpoint converters
- `savanna_to_mbridge.py`: converts ARC Savanna `.pt` checkpoints (local or downloaded from Hugging Face via `hf_hub_download`) into MBridge distributed checkpoint format.
- `mbridge_to_vortex.py`: exports MBridge checkpoints to ARC's single-file Vortex inference format, handling MLP weight splitting, Hyena filter pole/residue computation, and TE layernorm key remapping.
- (`evo2_convert_savanna_to_mbridge`, `evo2_export_mbridge_to_vortex`).

Model naming convention
The previous model size keys (`1b`, `7b`, `40b`, `7b_arc_longcontext`, …) were ambiguous: `7b` referred to Striped Hyena while `7B` referred to Llama. This PR replaces them with explicit, architecture-prefixed keys:
- `evo2_*` for models matching public ARC checkpoints (e.g. `evo2_1b_base`, `evo2_7b`, `evo2_40b_base`). `_base` = 8K context, without it = 1M context.
- `striped_hyena_*_nv` for NVIDIA-modified Hyena variants.
- `eden_*` for Llama 3.1 variants.
- `evo2_20b` config based on `arcinstitute/savanna_evo2_20b`.

Documentation updates
- `README.md`: added model naming convention tables, Vortex export section with round-trip example, updated all CLI examples to new model keys.
- `checkpoint/README.md`: updated `--model-size` documentation.
- Notebooks (`zeroshot_brca1.ipynb`, `fine-tuning-tutorial.ipynb`): updated `MODEL_SIZE` and `--model-size` references.

Usage
Training an Eden model:
Converting Savanna checkpoint to MBridge:
Exporting MBridge to Vortex:
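
For reviewers checking the renamed keys, the model naming convention described in this PR can be sketched as a small classifier. This is an illustrative sketch only, not code from the PR: the function name `describe_model_key`, the `ModelKeyInfo` type, and the family labels are hypothetical, and it assumes the `_base` = 8K / otherwise 1M rule applies only to `evo2_*` keys.

```python
from typing import Optional, TypedDict


class ModelKeyInfo(TypedDict):
    """Hypothetical summary of what a model key encodes."""

    family: str
    context: Optional[str]  # None when the convention does not state a context length


def describe_model_key(key: str) -> ModelKeyInfo:
    """Classify a model key by the architecture prefixes listed in the PR description.

    Hypothetical helper: the family labels below are made up for illustration.
    """
    if key.startswith("evo2_"):
        # Assumption: the `_base` suffix rule (8K vs. 1M context) applies to evo2_* keys only.
        context = "8K" if key.endswith("_base") else "1M"
        return {"family": "arc_evo2", "context": context}
    if key.startswith("striped_hyena_") and key.endswith("_nv"):
        # NVIDIA-modified Hyena variants; context length is not stated in the description.
        return {"family": "nvidia_hyena", "context": None}
    if key.startswith("eden_"):
        # Llama 3.1 variants; context length is not stated in the description.
        return {"family": "eden_llama31", "context": None}
    # Old-style keys such as "7b" carry no architecture prefix and are rejected.
    raise ValueError(f"unrecognized model key: {key!r}")


if __name__ == "__main__":
    for key in ("evo2_1b_base", "evo2_7b", "eden_7b"):
        print(key, describe_model_key(key))
```

Rejecting unprefixed keys outright, rather than guessing an architecture, mirrors the motivation given above: `7b` and `7B` were ambiguous precisely because the key alone did not name the architecture.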
Type of changes
CI Pipeline Configuration
Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.
Unit tests marked as `@pytest.mark.multi_gpu` or `@pytest.mark.distributed` are not run in the PR pipeline.

For more details, see CONTRIBUTING.
Note
By default, only basic unit tests are run. Add appropriate labels to enable additional test coverage.
Authorizing CI Runs
We use copy-pr-bot to manage authorization of CI runs on NVIDIA's compute resources.
- automatically be copied to a `pull-request/` prefixed branch in the source repository (e.g. `pull-request/123`)
- `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit.

Triggering Code Rabbit AI Review
To trigger a code review from code rabbit, comment on a pull request with one of these commands:
See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.
Pre-submit Checklist