OOM on 32GB while running auto-opt Qwen/Qwen2.5-Coder-7B-Instruct #2316

@stiller-leser

Description

Hi,

I've been playing around with Olive after being frustrated with the limited number of models available for my "AI" PC. Anyhow, I have been trying to run the following on a 32 GB Lenovo Yoga Slim 7x as well as on a dedicated cloud server with 32 GB:

olive auto-opt \
    --model_name_or_path "$CACHE_MODEL_PATH" \
    $TRUST_FLAG \
    --output_path "$OUTPUT_PATH" \
    --device "$DEVICE" \
    --provider "$PROVIDER" \
    --use_ort_genai \
    --precision "$PRECISION" \
    --log_level "$LOG_LEVEL" \
    $EXTRA_ARGS

with the following parameters:

==========================================
QNN Model Conversion Container
==========================================
Model: Qwen/Qwen2.5-Coder-7B-Instruct
Output Directory: .
Output Name: qwen-coder-7b
Precision: int4
Device: npu
Provider: QNNExecutionProvider
Cache Directory: model-cache
==========================================

But no matter what, I run into an OOM error:

[ 1299.228531] Out of memory: Killed process 11552 (olive) total-vm:43667312kB, anon-rss:31484340kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:64956kB oom_score_adj:0
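For rough scale (a back-of-the-envelope sketch using an assumed 7B parameter count, not Olive's actual memory profile), the raw weights alone are sizeable, and conversion tools commonly hold the source model plus export intermediates in memory at the same time:

```python
# Back-of-the-envelope: memory for the raw weights alone, before any
# conversion overhead. 7B parameters is an approximation for
# Qwen2.5-Coder-7B-Instruct.
params = 7_000_000_000

for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int4", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{name}: ~{gib:.1f} GiB")  # fp32: ~26.1, fp16: ~13.0, int4: ~3.3
```

So an fp32 source model by itself is already around 26 GiB, which lines up with the ~31 GB resident set the OOM killer reports above.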

Is this expected? If so, is there any way to reduce the memory load while keeping the model "intelligent"? Originally I had planned on converting devstral-23b from Mistral to finally run on my Qualcomm NPU, but it seems that will remain a dream.
