
Fix[bug] ONNX models generated by llm_export.py are missing some i/o #1157

Open
Ratheesh1104 wants to merge 1 commit into NVIDIA:main from Ratheesh1104:fix/onnx-missing-io-1147

Conversation

@Ratheesh1104

@Ratheesh1104 Ratheesh1104 commented Apr 1, 2026

What does this PR do?

Fixes the missing input and output nodes in ONNX models generated by llm_export.py (#1147).
Now the following nodes are properly included:

  • attention_mask
  • position_ids
  • past_key_values*

This ensures ONNX models are fully compatible with downstream TensorRT workflows and standard LLM inference pipelines.

Type of change: Bug fix

Usage

from modelopt.onnx.llm_export import llm_export

# Export HuggingFace model to ONNX with all necessary inputs/outputs
llm_export(
    hf_model_path="meta-llama/Llama-3.1-8B-Instruct",
    dtype="int4_awq",
    output_dir="models/Llama-3.1-8B-Instruct-ONNX-INT4"
)

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

## Bug Fixes

* **LLM ONNX Export**: Enhanced the LLM export example to use concrete dummy tensor inputs with proper attention masks and position IDs instead of empty placeholders. Added dynamic-axes mappings for batch and sequence dimensions to improve ONNX model compatibility.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

@Ratheesh1104 Ratheesh1104 requested a review from a team as a code owner April 1, 2026 17:45
@Ratheesh1104 Ratheesh1104 requested a review from galagam April 1, 2026 17:45
@copy-pr-bot

copy-pr-bot bot commented Apr 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Walkthrough

The main() function in examples/torch_onnx/llm_export.py now populates dummy tensor inputs with concrete values (batch_size=1, seq_len=8) for LLM export, including input_ids, attention_mask, and position_ids. Dynamic axis mappings are also defined for these inputs and the output logits.

Changes

LLM Export Input Setup (examples/torch_onnx/llm_export.py):
Updated main() to construct concrete dummy tensors (input_ids, attention_mask, position_ids) with fixed dimensions and populate extra_dyn_axes with batch/sequence dynamic axis mappings for LLM→ONNX export.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: The title directly addresses the bug fix by referencing missing I/O in ONNX models from llm_export.py, which aligns with the PR's core objective of adding attention_mask, position_ids, and past_key_values to exported ONNX models.
  • Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Security Anti-Patterns ✅ Passed: The pull request introduces only local variable assignments and tensor creation operations without violating any critical security anti-patterns outlined in SECURITY.md.



Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/torch_onnx/llm_export.py`:
- Around line 374-384: extra_inputs currently supplies attention_mask and
position_ids which don't exist in WrapperModelForCausalLM.forward (only
input_ids and past_key_values), causing unexpected kwargs and duplicate
input_ids; either update WrapperModelForCausalLM.forward and the
llm_to_onnx/torch_to_onnx plumbing to accept attention_mask/position_ids, or
minimally fix extra_inputs to only include input_ids (and adjust extra_dyn_axes
to remove attention_mask/position_ids) so the call that expands **extra_inputs
matches the wrapper signature and does not pass input_ids twice; update
references where extra_inputs and extra_dyn_axes are used in the
torch_to_onnx/llm_to_onnx flow accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9361d85b-0639-4da1-a683-67ddefd86246

📥 Commits

Reviewing files that changed from the base of the PR and between 09b3c0b and 99c9912.

📒 Files selected for processing (1)
  • examples/torch_onnx/llm_export.py

Comment on lines +374 to +384
extra_inputs = {
"input_ids": dummy_input_ids,
"attention_mask": dummy_attention_mask,
"position_ids": dummy_position_ids,
}
extra_dyn_axes = {
"input_ids": {0: "batch", 1: "seq_len"},
"attention_mask": {0: "batch", 1: "seq_len"},
"position_ids": {0: "batch", 1: "seq_len"},
"logits": {0: "batch", 1: "seq_len"},
}
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read-only verification of signature/call-path mismatch
rg -n -C3 'def forward\(self, input_ids.*past_key_values' modelopt/onnx/llm_export_utils/export_utils.py
rg -n -C8 'torch_to_onnx\(' modelopt/onnx/llm_export_utils/export_utils.py
rg -n -C8 'extra_inputs = \{|extra_dyn_axes = \{' examples/torch_onnx/llm_export.py

Repository: NVIDIA/Model-Optimizer

Length of output: 2743


extra_inputs breaks the export call contract and causes runtime failure.

The wrapper's forward signature only accepts input_ids and past_key_values, but extra_inputs passes input_ids, attention_mask, and position_ids as kwargs. When expanded via **extra_inputs in the torch_to_onnx call at line 134, the model's forward receives unexpected keyword arguments (attention_mask, position_ids), causing a TypeError. Additionally, input_ids would be passed twice — once positionally and once as a kwarg, which is invalid.
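
A hypothetical minimal reproduction of that mismatch, with a plain function standing in for the wrapper's forward (names mirror the review; the real signature lives in the llm_export utilities):

```python
# Stand-in for WrapperModelForCausalLM.forward: accepts only input_ids
# and past_key_values, like the signature described in the review.
def forward(input_ids, past_key_values=None):
    return input_ids


extra_inputs = {
    "input_ids": [[1] * 8],
    "attention_mask": [[1] * 8],
    "position_ids": [list(range(8))],
}

try:
    # Mirrors forward(dummy_input_ids, **extra_inputs): input_ids is passed
    # both positionally and as a kwarg, plus two unexpected kwargs.
    forward([[1] * 8], **extra_inputs)
except TypeError as exc:
    error = str(exc)

assert "input_ids" in error  # got multiple values for argument 'input_ids'
```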

Suggested minimal fix
-        extra_inputs = {
-            "input_ids": dummy_input_ids,
-            "attention_mask": dummy_attention_mask,
-            "position_ids": dummy_position_ids,
-        }
-        extra_dyn_axes = {
-            "input_ids": {0: "batch", 1: "seq_len"},
-            "attention_mask": {0: "batch", 1: "seq_len"},
-            "position_ids": {0: "batch", 1: "seq_len"},
-            "logits": {0: "batch", 1: "seq_len"},
-        }
+        extra_inputs = {}
+        extra_dyn_axes = {"logits": {0: "batch_size", 1: "seq_len"}}

If attention_mask and position_ids are required as ONNX inputs, coordinate updates in WrapperModelForCausalLM.forward signature and llm_to_onnx plumbing first.
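
An illustrative-only sketch of that coordinated fix, with the class and model reduced to plain-Python stand-ins (the real WrapperModelForCausalLM and its plumbing live in modelopt's llm_export utilities):

```python
# Sketch: extend the wrapper's forward so **extra_inputs matches its signature.
class WrapperModelForCausalLM:
    def __init__(self, model):
        self.model = model

    def forward(self, input_ids, attention_mask=None, position_ids=None, past_key_values=None):
        # Pass the new inputs through to the underlying HF-style model.
        return self.model(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            past_key_values=past_key_values,
        )


# Toy model that just reports which kwargs it actually received.
def toy_model(input_ids, **kwargs):
    return sorted(k for k, v in kwargs.items() if v is not None)


wrapper = WrapperModelForCausalLM(toy_model)
received = wrapper.forward([[1, 2]], attention_mask=[[1, 1]], position_ids=[[0, 1]])
# received == ["attention_mask", "position_ids"]
```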


dummy_attention_mask = torch.ones((batch_size, seq_len), dtype=torch.long)
dummy_position_ids = torch.arange(seq_len).unsqueeze(0).repeat(batch_size, 1)

# Correct assignment — no trailing comma


There is a non-UTF-8 character at this line

