[NVBug: 6038899] Fix MoE export crash on meta tensors with CPU offload#1155
…nsors Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Codecov Report: all modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##             main    #1155    +/-  ##
=======================================
+ Coverage   70.20%   70.21%   +0.01%
=======================================
  Files         230      230
  Lines       26098    26098
=======================================
+ Hits        18322    18325       +3
+ Misses       7776     7773       -3
```
Summary
Fixes
`NotImplementedError` in `sync_moe_gate_up_amax` when quantizing MoE models (e.g. Qwen3-30B-A3B) on a single GPU with insufficient VRAM.

When GPU memory is insufficient, ModelOpt enables CPU offload via accelerate, leaving uncalibrated expert parameters on the `meta` device. During export, `sync_moe_gate_up_amax` calls `torch.equal()` on these meta tensors, which raises `NotImplementedError` because `aten::equal` does not support meta tensors, even though calibration itself completed successfully.

Changes

- Update `sync_moe_gate_up_amax` to skip amax sync for meta tensors (which have no real data to sync) and emit a warning explaining the root cause.

Bug: https://nvbugspro.nvidia.com/bug/6038899
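The failure mode and the skip-and-warn guard can be sketched as follows. This is a minimal illustration, not ModelOpt's actual code: `sync_amax_skipping_meta` is a hypothetical stand-in for the patched helper, and the real `sync_moe_gate_up_amax` operates on quantizer state rather than a plain tensor list.

```python
import warnings

import torch


# Reproduce the crash: aten::equal has no meta-device kernel, so comparing
# two meta tensors (offloaded parameters with no real storage) raises
# NotImplementedError.
a = torch.empty(4, device="meta")
b = torch.empty(4, device="meta")
try:
    torch.equal(a, b)
except NotImplementedError:
    print("torch.equal is unsupported on meta tensors")


def sync_amax_skipping_meta(amax_tensors):
    """Hypothetical guard mirroring the fix: skip meta tensors with a warning,
    then sync (elementwise max) across the remaining real tensors."""
    real = []
    for t in amax_tensors:
        if t.is_meta:
            # Meta tensors hold no data to sync; warn about the root cause
            # (CPU offload left this expert's parameters uncalibrated).
            warnings.warn(
                "Skipping amax sync for a meta tensor: the expert was "
                "CPU-offloaded and holds no real data."
            )
            continue
        real.append(t)
    if not real:
        return None
    return torch.stack(real).max(dim=0).values
```

With this guard, export proceeds for the calibrated experts instead of aborting on the first offloaded one.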
🤖 Generated with Claude Code