
[NVBug: 6038899] Fix MoE export crash on meta tensors with CPU offload#1155

Open
cjluo-nv wants to merge 1 commit into main from chenjiel/fix_6038899

Conversation


@cjluo-nv cjluo-nv commented Apr 1, 2026

Summary

Fixes NotImplementedError in sync_moe_gate_up_amax when quantizing MoE models (e.g. Qwen3-30B-A3B) on a single GPU with insufficient VRAM.

When GPU memory is insufficient, ModelOpt enables CPU offload via accelerate, which leaves uncalibrated expert parameters on the meta device. During export, sync_moe_gate_up_amax calls torch.equal() on these meta tensors, which raises NotImplementedError because aten::equal has no meta-device implementation, even though calibration itself completed successfully.
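The failure mode can be reproduced in isolation, independently of ModelOpt: meta tensors carry only shape and dtype metadata with no storage, so the data-dependent aten::equal kernel cannot run on them (a minimal sketch):

```python
import torch

# Tensors on the "meta" device have shape/dtype but no backing data;
# this is how accelerate represents parameters it has offloaded.
a = torch.empty(4, device="meta")
b = torch.empty(4, device="meta")
assert a.is_meta and b.is_meta

# aten::equal is data-dependent and has no meta-device implementation,
# so comparing two meta tensors raises NotImplementedError.
try:
    torch.equal(a, b)
except NotImplementedError:
    print("torch.equal raised NotImplementedError on meta tensors")
```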

Changes

  • Add a guard in sync_moe_gate_up_amax to skip amax sync for meta tensors (which have no real data to sync) and emit a warning explaining the root cause.
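The guard might look like the following sketch. The function name and loop structure here are illustrative, not the actual code in modelopt/torch/export/layer_utils.py; only the is_meta check and warn-and-skip behavior reflect the change described above:

```python
import warnings

import torch


def sync_gate_up_amax(gate_amax: torch.Tensor, up_amax: torch.Tensor):
    """Illustrative sketch of the guard: skip amax sync for meta tensors."""
    # Meta tensors hold no real data (e.g. experts left on the meta device by
    # accelerate's CPU offload), so there is nothing to synchronize, and
    # data-dependent ops such as torch.equal would raise NotImplementedError.
    if gate_amax.is_meta or up_amax.is_meta:
        warnings.warn(
            "Skipping gate/up amax sync: amax is a meta tensor, likely because "
            "CPU offload left uncalibrated expert weights on the meta device."
        )
        return gate_amax, up_amax

    # Normal path: share the element-wise maximum between the gate and up
    # quantizers so both use the same calibration range.
    synced = torch.maximum(gate_amax, up_amax)
    return synced, synced.clone()
```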

Bug: https://nvbugspro.nvidia.com/bug/6038899

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Added warning messages for unsupported tensor configurations in quantization workflows.
    • Improved edge case detection to gracefully skip processing in incompatible scenarios.

…nsors

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@cjluo-nv cjluo-nv requested a review from a team as a code owner April 1, 2026 16:54
@cjluo-nv cjluo-nv requested a review from jingyu-ml April 1, 2026 16:54
@cjluo-nv cjluo-nv changed the title Fix sync_moe_gate_up_amax crash on meta tensors with CPU offload [NVBug: 6038899] Fix MoE export crash on meta tensors with CPU offload Apr 1, 2026
@cjluo-nv cjluo-nv requested a review from meenchen April 1, 2026 16:56

github-actions bot commented Apr 1, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1155/

Built to branch gh-pages at 2026-04-01 16:58 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@kevalmorabia97 kevalmorabia97 added the cherry-pick After code freeze, cherry-pick into release branch for next rc. Only for bug fixes and doc updates label Apr 1, 2026

coderabbitai bot commented Apr 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5d521494-9dbb-4f17-9bf2-fe40cbf579c8

📥 Commits

Reviewing files that changed from the base of the PR and between 09b3c0b and 7b677c5.

📒 Files selected for processing (1)
  • modelopt/torch/export/layer_utils.py

📝 Walkthrough

Walkthrough

The sync_moe_gate_up_amax function in layer_utils.py now includes detection for meta tensors in gate and up weight quantizer amax attributes. When meta tensors are detected, the function emits a warning and skips that gate/up pair without synchronizing amax values. Existing behavior for non-meta tensors is unchanged.

Changes

Cohort / File(s) Summary
Meta Tensor Detection
modelopt/torch/export/layer_utils.py
Added conditional check to detect meta tensors in gate and up quantizer amax attributes. When detected, function emits warning message and breaks the processing loop for that pair, leaving non-meta tensor behavior unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks (4 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately summarizes the main change: fixing a MoE export crash on meta tensors with CPU offload, which directly corresponds to the changeset's modification to sync_moe_gate_up_amax.
  • Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, above the required threshold of 80.00%.
  • Security Anti-Patterns ✅ Passed: The pull request adds a guard for meta tensor detection in sync_moe_gate_up_amax, with no anti-pattern violations detected.




codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.21%. Comparing base (c37c74f) to head (7b677c5).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1155      +/-   ##
==========================================
+ Coverage   70.20%   70.21%   +0.01%     
==========================================
  Files         230      230              
  Lines       26098    26098              
==========================================
+ Hits        18322    18325       +3     
+ Misses       7776     7773       -3     

☔ View full report in Codecov by Sentry.

@cjluo-nv cjluo-nv enabled auto-merge (squash) April 1, 2026 20:26