OOM on 32GB while running auto-opt Qwen/Qwen2.5-Coder-7B-Instruct #2316
Hi,
I've been playing around with Olive after being frustrated with the limited number of models available for my "AI" PC. Anyhow, I have been trying to run the following on a 32GB Lenovo Yoga Slim 7x as well as on a dedicated cloud server with 32GB:
olive auto-opt \
--model_name_or_path "$CACHE_MODEL_PATH" \
$TRUST_FLAG \
--output_path "$OUTPUT_PATH" \
--device "$DEVICE" \
--provider "$PROVIDER" \
--use_ort_genai \
--precision "$PRECISION" \
--log_level "$LOG_LEVEL" \
    $EXTRA_ARGS

with the following parameters:
==========================================
QNN Model Conversion Container
==========================================
Model: Qwen/Qwen2.5-Coder-7B-Instruct
Output Directory: .
Output Name: qwen-coder-7b
Precision: int4
Device: npu
Provider: QNNExecutionProvider
Cache Directory: model-cache
==========================================

But no matter what, I run into an OOM error:
[ 1299.228531] Out of memory: Killed process 11552 (olive) total-vm:43667312kB, anon-rss:31484340kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:64956kB oom_score_adj:0

Is this expected? If so, is there any way to reduce the memory load while keeping the model "intelligent"? Originally I had planned on converting devstral-23b from Mistral to finally run on my Qualcomm NPU, but as it seems that will continue to be a dream.
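For what it's worth, the numbers in the OOM log line up with a rough estimate: if the 7B-parameter model is materialized as 32-bit floats at some point during conversion (an assumption on my part about what Olive does internally, before the int4 quantization step), the weights alone need roughly:

```shell
# Back-of-envelope: 7e9 parameters * 4 bytes (fp32), in GiB.
# This is a sketch, not a measurement of Olive's actual peak usage.
echo "$(( 7000000000 * 4 / 1024 / 1024 / 1024 )) GiB"
# -> 26 GiB
```

which, plus activations, tokenizer/runtime overhead, and any temporary copies made while exporting, plausibly exceeds the ~31 GB anon-rss the kernel reports killing.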