python quantize_autoround.py --model Qwen/Qwen3.5-9B --group-size 128 --iters 100 --scheme W3A16 --output ./Qwen3.5-9B-AutoRound-W3A16-g128-coding --calib coding --quant-lm-head --torch-compile --low-gpu-mem
WOQ[RTN] quantizing missing weights: 52%|██████████████▍ | 190/368 [02:25<00:34, 5.15weight/s]
2026-04-02 18:02:37 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.12.linear_attn.out_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 52%|██████████████▌ | 191/368 [02:26<00:33, 5.25weight/s]
2026-04-02 18:02:37 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.13.linear_attn.in_proj_z.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 52%|██████████████▌ | 192/368 [02:26<00:31, 5.60weight/s]
2026-04-02 18:02:37 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.13.linear_attn.out_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 52%|██████████████▋ | 193/368 [02:26<00:29, 5.86weight/s]
2026-04-02 18:02:37 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.0.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:37 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.0.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 53%|██████████████▊ | 195/368 [02:26<00:19, 8.65weight/s]
2026-04-02 18:02:37 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.1.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:37 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.1.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.10.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 54%|███████████████ | 198/368 [02:26<00:12, 13.32weight/s]
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.10.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.11.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.11.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 55%|███████████████▎ | 201/368 [02:26<00:09, 17.24weight/s]
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.12.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.12.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.13.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 55%|███████████████▌ | 204/368 [02:26<00:08, 20.34weight/s]
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.13.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.14.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.14.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 56%|███████████████▊ | 207/368 [02:26<00:07, 22.37weight/s]
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.15.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.15.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.16.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 57%|███████████████▉ | 210/368 [02:26<00:06, 23.84weight/s]
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.16.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.17.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.17.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 58%|████████████████▏ | 213/368 [02:27<00:06, 25.05weight/s]
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.18.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.18.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.19.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 59%|████████████████▍ | 216/368 [02:27<00:05, 25.86weight/s]
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.19.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.2.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.2.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 60%|████████████████▋ | 219/368 [02:27<00:06, 21.65weight/s]
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.20.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.20.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:38 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.21.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 60%|████████████████▉ | 222/368 [02:27<00:06, 22.19weight/s]
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.21.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.22.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.22.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 61%|█████████████████ | 225/368 [02:27<00:06, 22.61weight/s]
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.23.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.23.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.24.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 62%|█████████████████▎ | 228/368 [02:27<00:05, 23.35weight/s]
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.24.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.25.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.25.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 63%|█████████████████▌ | 231/368 [02:27<00:05, 23.18weight/s]
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.26.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.26.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.3.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 64%|█████████████████▊ | 234/368 [02:27<00:05, 23.62weight/s]
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.3.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.4.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.4.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 64%|██████████████████ | 237/368 [02:28<00:05, 24.20weight/s]
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.5.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.5.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.6.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 65%|██████████████████▎ | 240/368 [02:28<00:05, 25.25weight/s]
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.6.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.7.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.7.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 66%|██████████████████▍ | 243/368 [02:28<00:04, 25.96weight/s]
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.8.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.8.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.9.mlp.linear_fc1.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 67%|██████████████████▋ | 246/368 [02:28<00:04, 26.82weight/s]
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.9.mlp.linear_fc2.weight: The size of tensor a (435) must match the size of tensor b (436) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:39 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.15.self_attn.k_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.15.self_attn.v_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 68%|██████████████████▉ | 249/368 [02:28<00:04, 26.92weight/s]
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.27.self_attn.k_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.27.self_attn.v_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.3.self_attn.k_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 68%|███████████████████▏ | 252/368 [02:28<00:04, 27.18weight/s]
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.3.self_attn.v_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize mtp.layers.0.self_attn.k_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize mtp.layers.0.self_attn.v_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 69%|███████████████████▍ | 255/368 [02:28<00:04, 26.37weight/s]
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.11.self_attn.k_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.11.self_attn.v_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.23.self_attn.k_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.23.self_attn.v_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 70%|███████████████████▋ | 259/368 [02:28<00:03, 29.24weight/s]
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.7.self_attn.k_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.7.self_attn.v_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.31.self_attn.k_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 71%|███████████████████▉ | 262/368 [02:28<00:03, 28.36weight/s]
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.31.self_attn.v_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.19.self_attn.k_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.19.self_attn.v_proj.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 72%|████████████████████▏ | 265/368 [02:29<00:03, 27.56weight/s]
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.0.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.1.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.10.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.11.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 73%|████████████████████▍ | 269/368 [02:29<00:03, 29.70weight/s]
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.12.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.13.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.14.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.15.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 74%|████████████████████▊ | 273/368 [02:29<00:03, 31.03weight/s]
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.16.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.17.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.18.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.19.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 75%|█████████████████████ | 277/368 [02:29<00:02, 31.95weight/s]
2026-04-02 18:02:40 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.2.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.20.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.21.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.22.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 76%|█████████████████████▍ | 281/368 [02:29<00:02, 32.47weight/s]
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.23.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.24.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.25.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.26.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 77%|█████████████████████▋ | 285/368 [02:29<00:02, 33.61weight/s]
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.3.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.4.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.5.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.6.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 79%|█████████████████████▉ | 289/368 [02:29<00:02, 34.44weight/s]
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.7.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.8.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.9.attn.qkv.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.pos_embed.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 80%|██████████████████████▎ | 293/368 [02:29<00:02, 35.77weight/s]
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.0.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.1.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.10.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.11.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.12.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.13.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.14.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.15.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.16.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.17.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 82%|███████████████████████ | 303/368 [02:29<00:01, 52.23weight/s]
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.18.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.19.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.2.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.20.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.21.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.22.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.23.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.24.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.25.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.26.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 85%|███████████████████████▊ | 313/368 [02:30<00:00, 64.08weight/s]2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.3.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.4.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.5.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.6.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.7.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.8.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.visual.blocks.9.attn.proj.weight: The size of tensor a (115) must match the size of tensor b (116) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.14.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.14.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.26.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.26.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.30.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.30.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.10.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.10.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.0.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.0.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.1.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 90%|█████████████████████████▏ | 331/368 [02:30<00:00, 96.56weight/s]2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.1.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.22.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.22.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.8.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.8.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.9.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.9.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.4.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.4.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.16.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.16.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.17.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.17.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.18.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.18.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.2.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.2.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.24.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.24.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.25.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.25.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.5.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.5.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.6.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.6.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.20.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.20.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.21.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.21.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.28.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.28.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.29.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.29.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.12.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 99%|██████████████████████████▊| 365/368 [02:30<00:00, 165.55weight/s]2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.12.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.13.linear_attn.in_proj_b.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
2026-04-02 18:02:41 WARNING missing_tensors.py L702: Failed to quantize model.language_model.layers.13.linear_attn.in_proj_a.weight: The size of tensor a (409) must match the size of tensor b (410) at non-singleton dimension 0, keeping original weight
WOQ[RTN] quantizing missing weights: 100%|████████████████████████████| 368/368 [02:30<00:00, 2.45weight/s]
2026-04-02 18:04:36 INFO missing_tensors.py L370: Successfully wrote 774 missing tensor(s) to 'model_extra_tensors.safetensors' in /var/lib/docker/autoround/Qwen3.5-9B-AutoRound-W3A16-g128-coding.
Problem Description
When quantizing Qwen/Qwen3.5-9B, the output is LARGER than the original model:
Reproduction Steps
python quantize_autoround.py --model Qwen/Qwen3.5-9B --group-size 128 --iters 100 --scheme W3A16 --output ./Qwen3.5-9B-AutoRound-W3A16-g128-coding --calib coding --quant-lm-head --torch-compile --low-gpu-mem
Environment Information
Additional Context
Root cause
Qwen3.5-9B uses a hybrid Gated Delta Network (DeltaNet) architecture:
not divisible by group_size (e.g. 409, 1228)
WARNING missing_tensors.py L702: Failed to quantize
model.language_model.layers.13.linear_attn.in_proj_a.weight:
The size of tensor a (409) must match the size of tensor b (410)
Since DeltaNet layers make up ~75% of the model's weights and are kept
unquantized, the model_extra_tensors.safetensors file is larger than the
quantized transformer layers.
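
A minimal sketch of the divisibility problem described above, using the group size from the reproduction command (128) and the example sizes quoted in this issue (409, 1228). This is not AutoRound's code: the helper names `group_counts` and `is_divisible` are hypothetical, and 4096 is a made-up divisible size added only for contrast. It shows that a non-divisible size gives two group counts that differ by one, depending on whether the code rounds down or up. That is the same kind of off-by-one seen in the "tensor a (409) must match tensor b (410)" warnings.

```python
import math

GROUP_SIZE = 128  # from --group-size 128 in the reproduction command


def group_counts(dim: int, group_size: int = GROUP_SIZE) -> tuple[int, int]:
    """Return (full groups, groups after rounding up) for one weight dimension."""
    return dim // group_size, math.ceil(dim / group_size)


def is_divisible(dim: int, group_size: int = GROUP_SIZE) -> bool:
    """True when the dimension splits into whole groups with nothing left over."""
    return dim % group_size == 0


if __name__ == "__main__":
    # 409 and 1228 are the example sizes quoted in this issue;
    # 4096 is a hypothetical divisible size used only for contrast.
    for dim in (409, 1228, 4096):
        full, rounded_up = group_counts(dim)
        print(f"dim={dim}: divisible={is_divisible(dim)}, "
              f"floor={full}, ceil={rounded_up}")
```

If one code path sizes a buffer with the floor count and another with the ceil count, the two shapes cannot match unless the dimension is padded first or the layer is skipped.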
Related: Issue #1496 mentions similar dimension issues for AWQ export
with workaround
ignore_modules=["in_proj_ba"]
Expected behavior
Either:
DeltaNet layers are not fully supported by AutoRound. These layers
will remain at bf16, resulting in larger output than original."
quantize_autoround.py
Error Logs