
[Bugfix] Fix NVFP4+MTP crash: force unquantized mtp.fc for Qwen3.5#38832

Merged
vadiklyutiy merged 2 commits into vllm-project:main from vadiklyutiy:qwen35-fp4-mtp.fc
Apr 3, 2026

Conversation

@vadiklyutiy
Collaborator

Description

Fix AssertionError when loading nvidia/Qwen3.5-397B-A17B-NVFP4 with method="mtp".

The NVFP4 checkpoint stores the entire MTP branch in BF16, but hf_quant_config.json only excludes mtp.layers.0*, omitting mtp.fc. As a result, the ColumnParallelLinear for mtp.fc is created with NVFP4 quantization (packed uint8, so half the logical input dim), and weight loading then crashes because the BF16 checkpoint weight shape doesn't match.
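A hypothetical sketch of the shape arithmetic behind the crash (the hidden size is an assumed illustrative value, not taken from the model config):

```python
# NVFP4 packs two 4-bit weights into one uint8, so the quantized
# parameter's input dim is half the logical one. The BF16 checkpoint
# weight for mtp.fc keeps the full logical shape, so the shapes can
# never match and the loader's shape assertion fires.
hidden_size = 8192  # illustrative value

quantized_shape = (hidden_size, hidden_size // 2)  # packed uint8 parameter
checkpoint_shape = (hidden_size, hidden_size)      # plain BF16 checkpoint weight

print(quantized_shape == checkpoint_shape)  # → False
```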

Fix: force quant_config=None for mtp.fc when the quantization method is modelopt_fp4, so the layer is built unquantized and matches the BF16 checkpoint weight.

This is a temporary workaround until NVIDIA/Model-Optimizer#1124 is merged and the checkpoint is re-exported with the corrected exclude_modules.

Related:

Test

2x B200, TP=2:

vllm serve nvidia/Qwen3.5-397B-A17B-NVFP4 \
  --tensor-parallel-size 2 \
  --language-model-only \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 1}' \
  --max-model-len 1024

Before: AssertionError at parameter.py:153 during MTP weight loading.

After: Server starts, inference works:

{"prompt": "What is 2+2?", "max_tokens": 32}
-> "The sum of 2 and 2 is 4..."

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
@vadiklyutiy vadiklyutiy requested a review from sighingnow as a code owner April 2, 2026 17:20
@mergify mergify bot added qwen Related to Qwen models bug Something isn't working labels Apr 2, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request implements a workaround for Qwen 3.5 MTP models by forcing the fc layer to remain unquantized when the modelopt_fp4 quantization configuration is used. This addresses an issue where the layer is stored as BF16 in checkpoints but missing from the exclusion list in the quantization configuration. I have no feedback to provide.

@ZJY0516 ZJY0516 added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 2, 2026
@vadiklyutiy vadiklyutiy merged commit 771913e into vllm-project:main Apr 3, 2026
57 checks passed