OOM on 32GB while running auto-opt Qwen/Qwen2.5-Coder-7B-Instruct #2316

@stiller-leser

Description

Hi,

I've been playing around with Olive after being frustrated with the limited number of models available for my "AI" PC. Anyhow, I have been trying to run the following on a 32 GB Lenovo Yoga Slim 7x as well as on a dedicated cloud server with 32 GB:

olive auto-opt \
    --model_name_or_path "$CACHE_MODEL_PATH" \
    $TRUST_FLAG \
    --output_path "$OUTPUT_PATH" \
    --device "$DEVICE" \
    --provider "$PROVIDER" \
    --use_ort_genai \
    --precision "$PRECISION" \
    --log_level "$LOG_LEVEL" \
    $EXTRA_ARGS

with the following parameters:

==========================================
QNN Model Conversion Container
==========================================
Model: Qwen/Qwen2.5-Coder-7B-Instruct
Output Directory: .
Output Name: qwen-coder-7b
Precision: int4
Device: npu
Provider: QNNExecutionProvider
Cache Directory: model-cache
==========================================

But no matter what, I run into an OOM error:

[ 1299.228531] Out of memory: Killed process 11552 (olive) total-vm:43667312kB, anon-rss:31484340kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:64956kB oom_score_adj:0
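For rough scale (a back-of-the-envelope sketch using an assumed 7B parameter count, not Olive's actual memory profile), the raw weights alone are sizeable, and conversion tools commonly hold the source model plus export intermediates in memory at the same time:

```python
# Back-of-the-envelope: memory for the raw weights alone, before any
# conversion overhead. 7B parameters is an approximation for
# Qwen2.5-Coder-7B-Instruct.
params = 7_000_000_000

for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int4", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{name}: ~{gib:.1f} GiB")  # fp32: ~26.1, fp16: ~13.0, int4: ~3.3
```

So an fp32 source model by itself is already around 26 GiB, which lines up with the ~31 GB resident set the OOM killer reports above.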

Is this expected? If so, is there any way to reduce the memory load while keeping the model "intelligent"? Originally I had planned on converting devstral-23b from Mistral to finally run on my Qualcomm NPU, but it seems that will remain a dream.
