Hi, I found the root cause of the garbled output issue when quantizing my Qwen3.5-MoE checkpoint with GPTQModel 6.0.3.
The problem is not just quantization quality: the MoE expert weights are not being loaded correctly at all.
In my checkpoint, the expert keys are stored as:
- model.layers.{i}.mlp.experts.gate_up_proj
- model.layers.{i}.mlp.experts.down_proj
But when loading with GPTQModel 6.0.3, the model expects:
- model.language_model.layers.{i}.mlp.experts.gate_up_proj
- model.language_model.layers.{i}.mlp.experts.down_proj
So the load report shows:
- model.layers... flagged as UNEXPECTED
- model.language_model.layers... flagged as MISSING
and the expert weights get re-initialized instead of being loaded from the checkpoint.
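The report above can be sketched as a plain key-set comparison. This is a minimal, hypothetical illustration of how such MISSING/UNEXPECTED buckets arise, not GPTQModel's actual loader code; the key names come from my checkpoint, with layer index 0 standing in for every layer i:

```python
# Keys actually stored in the checkpoint (as found by inspection below).
checkpoint_keys = {
    "model.layers.0.mlp.experts.gate_up_proj",
    "model.layers.0.mlp.experts.down_proj",
}
# Keys the instantiated model asks for in 6.0.3.
expected_keys = {
    "model.language_model.layers.0.mlp.experts.gate_up_proj",
    "model.language_model.layers.0.mlp.experts.down_proj",
}

unexpected = sorted(checkpoint_keys - expected_keys)  # present in file, unknown to model
missing = sorted(expected_keys - checkpoint_keys)     # wanted by model, absent from file

# Every expert tensor falls into one of the two buckets, so nothing is
# loaded and the experts keep their freshly initialized values.
print("UNEXPECTED:", unexpected)
print("MISSING:", missing)
```

Because the two key sets are fully disjoint, the overlap is empty, which is exactly the failure mode where the model runs but produces garbage.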
This explains why:
- the same checkpoint worked normally with GPTQModel 5.8.0 (only the expected accuracy drop from quantization)
- but with 6.0.3 the quantized model's output becomes garbled/nonsensical
I also verified by directly inspecting the checkpoint (under my current transformers environment) that it contains:
model.layers.0.mlp.experts.gate_up_proj
model.layers.0.mlp.experts.down_proj
and does not contain:
model.language_model.layers.0.mlp.experts...
So it looks like there is a path mismatch/regression in the qwen3_5_moe load logic for MoE expert weights in 6.0.3.
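For reference, the direct inspection above can be reproduced with a small helper like the following. This is a hypothetical sketch: `find_expert_keys` is my own name, and it assumes a sharded checkpoint whose model.safetensors.index.json maps tensor names to shard files:

```python
import json
from pathlib import Path

def find_expert_keys(weight_map, prefix):
    """Return MoE expert tensor names in weight_map that start with prefix.

    weight_map is the "weight_map" dict from model.safetensors.index.json,
    mapping each tensor name to the shard file that stores it.
    """
    return sorted(k for k in weight_map
                  if k.startswith(prefix) and ".experts." in k)

def load_weight_map(checkpoint_dir):
    """Read the tensor-name -> shard-file map from a sharded checkpoint."""
    index = Path(checkpoint_dir) / "model.safetensors.index.json"
    return json.loads(index.read_text())["weight_map"]

# Toy weight_map mirroring what I see in my checkpoint:
weight_map = {
    "model.layers.0.mlp.experts.gate_up_proj": "model-00001.safetensors",
    "model.layers.0.mlp.experts.down_proj": "model-00001.safetensors",
}
print(find_expert_keys(weight_map, "model.layers."))          # both expert keys
print(find_expert_keys(weight_map, "model.language_model."))  # empty list
```

On a real checkpoint you would call `find_expert_keys(load_weight_map(ckpt_dir), ...)` with each prefix; only the `model.layers.` prefix returns hits for me.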