From 374253ef8e4b98d9ff20655a802bf821137bd16f Mon Sep 17 00:00:00 2001
From: Yi-Fu Wu
Date: Thu, 6 Mar 2025 12:05:07 -0800
Subject: [PATCH] Add DeepSeek entry in Changelog and bump Nemo commit

Signed-off-by: Yi-Fu Wu
---
 CHANGELOG.md | 7 +++++++
 Dockerfile   | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index adc9f161f..f516434b1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -62,6 +62,13 @@ for more details on running `prepare_packed_ft_dataset.py` and on running SFT wi
 ```
 - Add code and instructions for replicating Reward Modeling training in HelpSteer2 and HelpSteer2-Preference
 - Implement REINFORCE algorithm.
+- Add support for DeepSeek-V3. Training from a DeepSeek-V3 NeMo 2.0 checkpoint requires passing the following additional parameters to the training script:
+  ```
+  ++model.transformer_engine=True \
+  ++model.dist_ckpt_load_strictness=log_all \
+  ++model.name=decoder_block_gpt \
+  ++model.moe_layer_freq=[0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
+  ```
 
 ### Breaking Changes
 - Upgrade TRTLLM dependency from v0.10.0 to v0.12.0 and migrate from `GPTSession` cpp runtime to `ModelRunner` python runtime. Please use the latest Dockerfile.
diff --git a/Dockerfile b/Dockerfile
index 40ca71d96..dce88d31c 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -13,7 +13,7 @@ ARG MAX_JOBS=8
 # Git refs for dependencies
 ARG TE_TAG=7d576ed25266a17a7b651f2c12e8498f67e0baea
 ARG PYTRITON_VERSION=0.5.10
-ARG NEMO_TAG=633cb602777bffefbe12066b0c915c87e7b469e9 # On: v2.1.0
+ARG NEMO_TAG=6a07e88e1ecabcc10b05f73ae8bdd102fb734f0d # On: main
 ARG MLM_TAG=d15cec53beb283e7127b7d594e1c46b8a0719b6d # On: core_r0.10.0
 ARG ALIGNER_COMMIT=main
 ARG TRTLLM_VERSION=v0.13.0
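
The long `++model.moe_layer_freq` override in the changelog entry marks each decoder layer as dense or MoE, starting with three dense layers followed by MoE layers. Typing it by hand is error-prone, so the hypothetical helper below generates it. This is a minimal sketch, assuming Megatron-Core's list convention (0 = dense layer, 1 = MoE layer) and DeepSeek-V3's published layout of 61 decoder layers with the first 3 dense; the function name `moe_layer_freq_override` is illustrative and not part of NeMo or NeMo-Aligner. Compare the output against the checkpoint's own config before relying on it.

```python
def moe_layer_freq_override(num_layers: int, num_dense_layers: int) -> str:
    """Hypothetical helper: build a ++model.moe_layer_freq=[...] override.

    Assumes 0 marks a dense (non-MoE) decoder layer and 1 marks an MoE layer,
    with all dense layers placed first.
    """
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    if not 0 <= num_dense_layers <= num_layers:
        raise ValueError("num_dense_layers must be between 0 and num_layers")
    pattern = [0] * num_dense_layers + [1] * (num_layers - num_dense_layers)
    return "++model.moe_layer_freq=[" + ",".join(str(x) for x in pattern) + "]"


if __name__ == "__main__":
    # Assumption: 61 decoder layers, first 3 dense (DeepSeek-V3 layout).
    print(moe_layer_freq_override(61, 3))
```

Generating the list from two numbers keeps the override in sync if the layer count or number of leading dense layers differs for another checkpoint.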