Problem Description
When trying to run the flash-moe project with the Qwen3.5-397B-A17B-4bit model, I encountered an issue: the provided `expert_index.json` file only contains expert information for layer 0, but the project requires expert information for all 60 layers.
Steps to Reproduce
- Clone the repository and set up the environment
- Download the Qwen3.5-397B-A17B-4bit model from ModelScope (46 safetensors files, ~224 GB total)
- Run `extract_weights.py` to create `model_weights.bin` and `model_weights.json` (successful)
- Run `repack_experts.py` to create the packed expert files
- Try to run `./metal_infer --full` or `./infer` with the model
Error Message

```
ERROR: Cannot open /path/to/model/packed_experts/layer_01.bin: No such file or directory
```
Root Cause Analysis
- Current `expert_index.json` structure: only contains expert information for layer 0

  ```json
  {
    "model_path": "...",
    "expert_reads": {
      "0": { ... } // Only layer 0!
    }
  }
  ```
- Expected structure: should contain expert information for all 60 layers (0-59)

  ```json
  {
    "model_path": "...",
    "expert_reads": {
      "0": { ... },
      "1": { ... },
      // ... layers 2-58 ...
      "59": { ... }
    }
  }
  ```
- Impact: the `repack_experts.py` script can only process layer 0, leaving layers 1-59 without packed expert files.
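To confirm this diagnosis, a few lines of Python can report which layers are absent from the index. This is a minimal sketch; only the `expert_reads` key is taken from the JSON above, and the layer count is hard-coded to 60 as described in this issue:

```python
import json


def missing_layers(index_path: str, num_layers: int = 60) -> list[int]:
    """List the layer indices that have no entry under expert_reads."""
    with open(index_path) as f:
        index = json.load(f)
    present = {int(k) for k in index.get("expert_reads", {})}
    return [layer for layer in range(num_layers) if layer not in present]
```

Against the shipped index, which only has a `"0"` entry, this should report layers 1-59 as missing.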
Workarounds Attempted
- Modified `expert_index.json`: tried to add the layer information manually, but that requires accurate offsets from all 46 safetensors files
- Partial testing: can only test layer 0 performance, cannot run full 60-layer inference
- Code inspection: found that `repack_experts.py` expects a complete index, but the provided one is incomplete
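The offsets needed for the manual workaround can at least be recovered per shard: the safetensors format stores an 8-byte little-endian header length followed by a JSON header mapping each tensor name to its dtype, shape, and byte offsets. A minimal reader, independent of any flash-moe code:

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Return the tensor table of a .safetensors file:
    name -> {"dtype": ..., "shape": [...], "data_offsets": [begin, end]}.
    Offsets are relative to the start of the data section, which begins
    immediately after the 8-byte length prefix and the JSON header."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # little-endian u64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional metadata block, not a tensor
    return header
```

This only yields per-shard offsets; stitching them into whatever layout `expert_index.json` actually expects is the part that still needs documentation from the author.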
Questions for the Author
- Is there a script to generate the complete `expert_index.json` for all 60 layers from the 46 safetensors files?
- Can you provide the complete `expert_index.json` file for the Qwen3.5-397B-A17B-4bit model?
- What is the intended workflow for users who download the model from ModelScope/Hugging Face?
Environment
- Project: flash-moe (commit 3601d41)
- Model: mlx-community/Qwen3.5-397B-A17B-4bit from ModelScope
- Hardware: MacBook Pro M4 Max, 128GB RAM
- OS: macOS (Darwin Kernel Version 24.6.0)
Additional Context
The project is amazing and the performance claims are impressive! I was able to successfully:
- Download the 224GB model
- Extract non-expert weights (5.5 GB `model_weights.bin`)
- Run single-layer MoE benchmarks (showing ~160 tok/s theoretical throughput)
- Build all binaries successfully
The only blocker is the missing expert information for layers 1-59.
Suggested Solution
- Option A: provide a script that analyzes the 46 safetensors files and generates the complete `expert_index.json`
- Option B: share the complete `expert_index.json` file in the repository
- Option C: document the exact process for users to generate this index themselves
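In case it helps, Option A could look roughly like the sketch below. Note that almost everything here is an assumption: the expert tensor naming pattern (`model.layers.N...expert...`) and the per-tensor record schema are guesses, since the real `expert_reads` entries appear only as `{ ... }` above, and `repack_experts.py` may expect different fields entirely:

```python
import glob
import json
import os
import re
import struct

# Assumed tensor naming; the actual Qwen3.5 checkpoint may differ.
EXPERT_RE = re.compile(r"model\.layers\.(\d+)\..*expert")


def build_expert_index(model_dir: str, out_path: str) -> dict:
    """Scan every *.safetensors shard and group expert tensors by layer.
    The per-tensor record (file/dtype/shape/data_offsets) is a hypothetical
    schema, not necessarily what repack_experts.py consumes."""
    expert_reads: dict = {}
    for shard in sorted(glob.glob(os.path.join(model_dir, "*.safetensors"))):
        with open(shard, "rb") as f:
            header_len = struct.unpack("<Q", f.read(8))[0]
            header = json.loads(f.read(header_len))
        header.pop("__metadata__", None)
        for name, info in header.items():
            match = EXPERT_RE.match(name)
            if not match:
                continue
            layer = match.group(1)
            expert_reads.setdefault(layer, {})[name] = {
                "file": os.path.basename(shard),
                "dtype": info["dtype"],
                "shape": info["shape"],
                "data_offsets": info["data_offsets"],
            }
    index = {"model_path": model_dir, "expert_reads": expert_reads}
    with open(out_path, "w") as f:
        json.dump(index, f, indent=2)
    return index
```

If the author can confirm the expected record schema, I'm happy to test a script like this against the full 46-shard download and contribute it back.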
Thank you for creating this incredible project! Looking forward to running the full 397B model on my MacBook Pro.