
[Compat] Finetuned Qwen3.5-MoE weights deviating from expectation: model.layers vs model.language_model.layers #2694

@caixiaoxx

Description


Hi, I found the root cause of the garbled output issue when quantizing my Qwen3.5-MoE checkpoint with GPTQModel 6.0.3.

The problem is not just quantization quality: the MoE expert weights are not being loaded at all.

In my checkpoint, the expert keys are stored as:

  • model.layers.{i}.mlp.experts.gate_up_proj
  • model.layers.{i}.mlp.experts.down_proj

But during GPTQModel 6.0.3 load, the model expects:

  • model.language_model.layers.{i}.mlp.experts.gate_up_proj
  • model.language_model.layers.{i}.mlp.experts.down_proj

So the load report shows:

  • model.layers... as UNEXPECTED
  • model.language_model.layers... as MISSING

and the expert weights get re-initialized instead of being loaded from the checkpoint.
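The UNEXPECTED/MISSING split in that report boils down to a set difference between the keys stored on disk and the keys the model expects. A minimal illustration using the key names from the report above (this is not GPTQModel's actual loader code):

```python
# Illustration only: how the load report's UNEXPECTED/MISSING split arises.
# Key names are taken from the report above.
checkpoint_keys = {
    "model.layers.0.mlp.experts.gate_up_proj",
    "model.layers.0.mlp.experts.down_proj",
}
expected_keys = {
    "model.language_model.layers.0.mlp.experts.gate_up_proj",
    "model.language_model.layers.0.mlp.experts.down_proj",
}

unexpected = checkpoint_keys - expected_keys  # present on disk, not expected
missing = expected_keys - checkpoint_keys     # expected, not on disk
```

Because the two prefixes never overlap, every expert tensor ends up in one of the two buckets and nothing is actually loaded.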

This explains why:

  • the same checkpoint worked normally with GPTQModel 5.8.0 (with only the expected accuracy drop from quantization)
  • but with 6.0.3 the quantized model's output becomes garbled / nonsensical

I also verified with direct inspection that under my current transformers environment, the checkpoint contains:

  • model.layers.0.mlp.experts.gate_up_proj
  • model.layers.0.mlp.experts.down_proj

and does not contain:

  • model.language_model.layers.0.mlp.experts...
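The inspection itself is just a filter over the checkpoint's tensor names. A minimal sketch (`find_expert_keys` is my own helper name, not a transformers or GPTQModel API; the key list can come from any source, e.g. safetensors' `safe_open(shard).keys()` per shard):

```python
def find_expert_keys(keys, prefix):
    """Return the MoE expert-weight keys that live under `prefix`.

    Illustration only: `keys` is any iterable of tensor names from the
    checkpoint, `prefix` is the layer-path prefix to check for.
    """
    return sorted(
        k for k in keys
        if k.startswith(prefix) and ".mlp.experts." in k
    )
```

Running this over the checkpoint with `prefix="model.layers."` returns the expert keys, while `prefix="model.language_model.layers."` returns nothing, confirming which layout is actually stored.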

So it looks like there is a key-path mismatch/regression in the qwen3_5_moe load logic for MoE expert weights in 6.0.3.
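As a possible workaround until this is fixed, the stored keys could be renamed to the prefix 6.0.3 expects before loading. This is a hypothetical sketch over a plain key → tensor mapping; `remap_expert_keys` is my own helper name, not a GPTQModel API:

```python
import re

def remap_expert_keys(state_dict):
    """Rename 'model.layers.*' keys to 'model.language_model.layers.*'.

    Hypothetical workaround sketch; keys outside 'model.layers.'
    (e.g. lm_head) are left untouched.
    """
    remapped = {}
    for key, value in state_dict.items():
        new_key = re.sub(r"^model\.layers\.",
                         "model.language_model.layers.", key)
        remapped[new_key] = value
    return remapped
```

Whether the remapped dict can then be handed to the loader depends on GPTQModel internals, so this is at best a stopgap; the real fix belongs in the qwen3_5_moe load logic.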
