
[Compat] Finetuned Qwen3.5-MoE weights deviating from expectation: model.layers vs model.language_model.layers #2694

@caixiaoxx

Description


Hi, I found the root cause of the garbled output issue when quantizing my Qwen3.5-MoE checkpoint with GPTQModel 6.0.3.

The problem is not just quantization quality: the MoE expert weights are not being loaded at all.

In my checkpoint, the expert keys are stored as:

  • model.layers.{i}.mlp.experts.gate_up_proj
  • model.layers.{i}.mlp.experts.down_proj

But during GPTQModel 6.0.3 load, the model expects:

  • model.language_model.layers.{i}.mlp.experts.gate_up_proj
  • model.language_model.layers.{i}.mlp.experts.down_proj

So the load report shows:

  • model.layers... as UNEXPECTED
  • model.language_model.layers... as MISSING

and the expert weights get re-initialized instead of being loaded from the checkpoint.
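The UNEXPECTED/MISSING split in that report boils down to a set difference between the keys stored on disk and the keys the model expects. A minimal illustration using the key names from the report above (this is not GPTQModel's actual loader code):

```python
# Illustration only: how the load report's UNEXPECTED/MISSING split arises.
# Key names are taken from the report above.
checkpoint_keys = {
    "model.layers.0.mlp.experts.gate_up_proj",
    "model.layers.0.mlp.experts.down_proj",
}
expected_keys = {
    "model.language_model.layers.0.mlp.experts.gate_up_proj",
    "model.language_model.layers.0.mlp.experts.down_proj",
}

unexpected = checkpoint_keys - expected_keys  # present on disk, not expected
missing = expected_keys - checkpoint_keys     # expected, not on disk
```

Because the two prefixes never overlap, every expert tensor ends up in one of the two buckets and nothing is actually loaded.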

This explains why:

  • the same checkpoint worked normally with GPTQModel 5.8.0 (with only the expected accuracy drop from quantization)
  • but with 6.0.3 the quantized model's output becomes garbled / nonsensical

I also verified with direct inspection that under my current transformers environment, the checkpoint contains:

  • model.layers.0.mlp.experts.gate_up_proj
  • model.layers.0.mlp.experts.down_proj

and does not contain:

  • model.language_model.layers.0.mlp.experts...
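The inspection itself is just a filter over the checkpoint's tensor names. A minimal sketch (`find_expert_keys` is my own helper name, not a transformers or GPTQModel API; the key list can come from any source, e.g. safetensors' `safe_open(shard).keys()` per shard):

```python
def find_expert_keys(keys, prefix):
    """Return the MoE expert-weight keys that live under `prefix`.

    Illustration only: `keys` is any iterable of tensor names from the
    checkpoint, `prefix` is the layer-path prefix to check for.
    """
    return sorted(
        k for k in keys
        if k.startswith(prefix) and ".mlp.experts." in k
    )
```

Running this over the checkpoint with `prefix="model.layers."` returns the expert keys, while `prefix="model.language_model.layers."` returns nothing, confirming which layout is actually stored.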

So it looks like there is a key-path mismatch/regression in the qwen3_5_moe load logic for MoE expert weights in 6.0.3.
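As a possible workaround until this is fixed, the stored keys could be renamed to the prefix 6.0.3 expects before loading. This is a hypothetical sketch over a plain key → tensor mapping; `remap_expert_keys` is my own helper name, not a GPTQModel API:

```python
import re

def remap_expert_keys(state_dict):
    """Rename 'model.layers.*' keys to 'model.language_model.layers.*'.

    Hypothetical workaround sketch; keys outside 'model.layers.'
    (e.g. lm_head) are left untouched.
    """
    remapped = {}
    for key, value in state_dict.items():
        new_key = re.sub(r"^model\.layers\.",
                         "model.language_model.layers.", key)
        remapped[new_key] = value
    return remapped
```

Whether the remapped dict can then be handed to the loader depends on GPTQModel internals, so this is at best a stopgap; the real fix belongs in the qwen3_5_moe load logic.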
