NVIDIA · yfw · Mar 6, 2025
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -62,6 +62,13 @@ for more details on running `prepare_packed_ft_dataset.py` and on running SFT wi
     ```
 - Add code and instructions for replicating Reward Modeling training in HelpSteer2 and HelpSteer2-Preference
 - Implement REINFORCE algorithm.
+- Add support for DeepSeek-V3. Training from a DeepSeek-V3 NeMo 2.0 checkpoint requires passing these additional parameters to the training script:
+    ```
+    ++model.transformer_engine=True \
+    ++model.dist_ckpt_load_strictness=log_all \
+    ++model.name=decoder_block_gpt \
+    ++model.moe_layer_freq=[0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
+    ```
 
 ### Breaking Changes
 - Upgrade TRTLLM dependency from v0.10.0 to v0.12.0 and migrate from `GPTSession` cpp runtime to `ModelRunner` python runtime. Please use the latest Dockerfile.
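
Note on the `moe_layer_freq` override in the CHANGELOG entry above: the long literal list is easy to mistype by hand. The sketch below generates an override string of the same shape. It assumes the list encodes one flag per transformer layer, with `0` for a dense layer and `1` for an MoE layer, and that DeepSeek-V3 has 61 layers of which the first 3 are dense. Those counts and the helper name `moe_layer_freq_override` are assumptions for illustration, not part of this change, so compare the output against the list in the entry before relying on it.

```python
def moe_layer_freq_override(num_layers: int = 61, num_dense: int = 3) -> str:
    """Build a `++model.moe_layer_freq=[...]` override string.

    Assumption: 0 marks a dense layer, 1 marks an MoE layer, and the
    dense layers come first. The defaults (61 layers, 3 dense) are
    assumed values for DeepSeek-V3, not read from any checkpoint.
    """
    if not 0 <= num_dense <= num_layers:
        raise ValueError("num_dense must be between 0 and num_layers")
    # Dense layers first, then MoE layers for the rest of the stack.
    pattern = [0] * num_dense + [1] * (num_layers - num_dense)
    # No spaces inside the brackets, so the shell passes it as one token.
    return "++model.moe_layer_freq=[" + ",".join(map(str, pattern)) + "]"


if __name__ == "__main__":
    print(moe_layer_freq_override())
```

Generating the list also makes the layer count explicit, which is harder to check by eye in the literal form.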

diff --git a/Dockerfile b/Dockerfile
@@ -13,7 +13,7 @@ ARG MAX_JOBS=8
 # Git refs for dependencies
 ARG TE_TAG=7d576ed25266a17a7b651f2c12e8498f67e0baea
 ARG PYTRITON_VERSION=0.5.10
-ARG NEMO_TAG=633cb602777bffefbe12066b0c915c87e7b469e9 # On: v2.1.0
+ARG NEMO_TAG=6a07e88e1ecabcc10b05f73ae8bdd102fb734f0d # On: main
 ARG MLM_TAG=d15cec53beb283e7127b7d594e1c46b8a0719b6d  # On: core_r0.10.0
 ARG ALIGNER_COMMIT=main
 ARG TRTLLM_VERSION=v0.13.0