
Fix legacy ONNX export crash with transformers >= 5.0#2381

Open
Lidang-Jiang wants to merge 1 commit into microsoft:main from Lidang-Jiang:fix/auto-opt-transformers5

Conversation

@Lidang-Jiang

Summary

Fix the olive auto-opt crash that occurs when using the default (non-dynamo) ONNX export path with transformers >= 5.0.

Root cause: transformers 5.0 removed DynamicCache.from_legacy_cache() / .to_legacy_cache() and changed the cache API so that past_key_values must be a Cache object rather than a list of tuples. The legacy ONNX export path still passes list-format past_key_values via merge_kv_cache_to_tuple_hook, which raises an AttributeError during model tracing.
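The failure mode can be made concrete with a small compatibility probe. This is an illustrative sketch, not Olive or transformers code: the helper `to_cache` and the two dummy cache classes are hypothetical, and only stand in for code that assumes the removed `from_legacy_cache` helper exists.

```python
def to_cache(cache_cls, legacy_kv):
    """Convert list-of-tuples past_key_values into a Cache object.

    Works only while the cache class still provides the legacy helper,
    which transformers 5.0 removed.
    """
    from_legacy = getattr(cache_cls, "from_legacy_cache", None)
    if from_legacy is None:
        # transformers >= 5.0: the helper is gone, so list-format
        # past_key_values can no longer be converted.
        raise RuntimeError(
            "DynamicCache.from_legacy_cache was removed in transformers 5.0"
        )
    return from_legacy(legacy_kv)


class OldStyleCache:
    """Mimics the transformers < 5.0 API surface (hypothetical stand-in)."""

    @classmethod
    def from_legacy_cache(cls, kv):
        return ("cache", kv)


class NewStyleCache:
    """Mimics transformers >= 5.0: no legacy conversion helper."""
```

Against `OldStyleCache` the conversion succeeds; against `NewStyleCache` it fails exactly the way the export path does, just with a clearer message than the raw `AttributeError` shown below.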

Fix:

  1. Add an early check in _convert_model_on_device: when transformers >= 5.0 and use_dynamo_exporter=False, raise a clear RuntimeError directing users to --use_model_builder --use_ort_genai (recommended) or --use_dynamo_exporter.
  2. Guard _patch_model_if_necessary to skip on transformers >= 5.0 (the from_legacy_cache / to_legacy_cache calls are only valid for 4.45 <= transformers < 5.0).
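Both steps above reduce to a coarse version gate. A minimal, dependency-free sketch follows; the function names, parsing helper, and message wording are illustrative, not the exact Olive implementation.

```python
def _major_minor(version_str: str) -> tuple:
    """Parse 'X.Y[.Z...]' into (X, Y) for coarse version comparison."""
    parts = version_str.split(".")
    return int(parts[0]), int(parts[1]) if len(parts) > 1 else 0


def check_legacy_export_supported(transformers_version: str,
                                  use_dynamo_exporter: bool) -> None:
    """Mirrors fix step 1: reject the legacy (non-dynamo) export path
    on transformers >= 5.0 with an actionable error."""
    if not use_dynamo_exporter and _major_minor(transformers_version) >= (5, 0):
        raise RuntimeError(
            f"Legacy ONNX export (use_dynamo_exporter=False) is not compatible "
            f"with transformers {transformers_version}. Use --use_model_builder "
            f"--use_ort_genai (recommended), --use_dynamo_exporter, or "
            f"downgrade transformers below 5.0."
        )


def should_patch_model(transformers_version: str) -> bool:
    """Mirrors fix step 2: the legacy-cache patch only applies for
    4.45 <= transformers < 5.0."""
    return (4, 45) <= _major_minor(transformers_version) < (5, 0)
```

A tuple comparison on (major, minor) is enough here because both boundaries in the fix are whole minor versions; a full version parser would be needed for pre-releases.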

Fixes #2335

Before (cryptic crash)
$ python -m olive auto-opt \
  --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
  --output_path /tmp/test/qwen-cpu-int4 \
  --device cpu --provider CPUExecutionProvider --precision int4 --log_level 1

Loading HuggingFace model from Qwen/Qwen2.5-0.5B-Instruct
[INFO] Running workflow default_workflow
[INFO] Running pass conversion:onnxconversion

...full traceback...

  File ".../transformers/masking_utils.py", line 846, in _preprocess_mask_arguments
    q_offset = past_key_values.get_seq_length()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'get_seq_length'
After (clear error message)
$ python -m olive auto-opt \
  --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
  --output_path /tmp/test/qwen-cpu-int4 \
  --device cpu --provider CPUExecutionProvider --precision int4 --log_level 1

RuntimeError: Legacy ONNX export (use_dynamo_exporter=False) is not compatible with
transformers 5.5.0. transformers >= 5.0 changed the DynamicCache API, which breaks
the non-dynamo export path for models with KV cache. Please use one of the following options:
  1. Add --use_model_builder --use_ort_genai to use the model builder (recommended)
  2. Add --use_dynamo_exporter to use the dynamo-based ONNX export
  3. Downgrade transformers below 5.0
Example: olive auto-opt --model_name_or_path <model> --use_model_builder --use_ort_genai ...
After (with recommended --use_model_builder flag, succeeds)
$ python -m olive auto-opt \
  --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
  --output_path /tmp/test/qwen-cpu-int4 \
  --device cpu --provider CPUExecutionProvider \
  --use_model_builder --use_ort_genai --precision int4 --log_level 1

Loading HuggingFace model from Qwen/Qwen2.5-0.5B-Instruct
[INFO] Running workflow default_workflow
[INFO] Running pass model_builder:modelbuilder
Saving processing files in .olive-cache/.../models for GenAI
[INFO] Pass model_builder:modelbuilder finished in 32.39 seconds
[INFO] Pass extract_adapters:extractadapters finished in 0.04 seconds
[INFO] Saved output model to /tmp/test/qwen-cpu-int4
Model is saved at /tmp/test/qwen-cpu-int4

Test plan

  • Reproduce original crash with transformers 5.5.0 + torch 2.11.0
  • Verify clear error message is shown on legacy export path
  • Verify --use_model_builder --use_ort_genai workaround succeeds
  • Existing unit tests pass

transformers 5.0 removed DynamicCache.from_legacy_cache() and changed
the cache API so that past_key_values must be a Cache object. The legacy
(non-dynamo) ONNX export path passes list-format past_key_values which
causes AttributeError during model tracing.

Add an early check in _convert_model_on_device to raise a clear error
message directing users to --use_model_builder or --use_dynamo_exporter.
Also guard _patch_model_if_necessary to skip on transformers >= 5.0.

Fixes microsoft#2335

Signed-off-by: Lidang-Jiang <lidangjiang@gmail.com>


Development

Successfully merging this pull request may close these issues.

olive auto-opt for CPU INT4 fails without --use_model_builder
DynamicCache.from_legacy_cache AttributeError with transformers 5.x
