Fix[bug] ONNX models generated by llm_export.py are missing some i/o #1157
Ratheesh1104 wants to merge 1 commit into NVIDIA:main from
Conversation
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/torch_onnx/llm_export.py`:
- Around line 374-384: extra_inputs currently supplies attention_mask and
position_ids which don't exist in WrapperModelForCausalLM.forward (only
input_ids and past_key_values), causing unexpected kwargs and duplicate
input_ids; either update WrapperModelForCausalLM.forward and the
llm_to_onnx/torch_to_onnx plumbing to accept attention_mask/position_ids, or
minimally fix extra_inputs to only include input_ids (and adjust extra_dyn_axes
to remove attention_mask/position_ids) so the call that expands **extra_inputs
matches the wrapper signature and does not pass input_ids twice; update
references where extra_inputs and extra_dyn_axes are used in the
torch_to_onnx/llm_to_onnx flow accordingly.
📒 Files selected for processing (1)
examples/torch_onnx/llm_export.py
```python
extra_inputs = {
    "input_ids": dummy_input_ids,
    "attention_mask": dummy_attention_mask,
    "position_ids": dummy_position_ids,
}
extra_dyn_axes = {
    "input_ids": {0: "batch", 1: "seq_len"},
    "attention_mask": {0: "batch", 1: "seq_len"},
    "position_ids": {0: "batch", 1: "seq_len"},
    "logits": {0: "batch", 1: "seq_len"},
}
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Read-only verification of signature/call-path mismatch
rg -n -C3 'def forward\(self, input_ids.*past_key_values' modelopt/onnx/llm_export_utils/export_utils.py
rg -n -C8 'torch_to_onnx\(' modelopt/onnx/llm_export_utils/export_utils.py
rg -n -C8 'extra_inputs = \{|extra_dyn_axes = \{' examples/torch_onnx/llm_export.py
```

Repository: NVIDIA/Model-Optimizer
Length of output: 2743
extra_inputs breaks the export call contract and causes runtime failure.
The wrapper's forward signature only accepts input_ids and past_key_values, but extra_inputs passes input_ids, attention_mask, and position_ids as kwargs. When expanded via **extra_inputs in the torch_to_onnx call at line 134, the model's forward receives unexpected keyword arguments (attention_mask, position_ids), causing a TypeError. Additionally, input_ids would be passed twice — once positionally and once as a kwarg, which is invalid.
Suggested minimal fix

```diff
-extra_inputs = {
-    "input_ids": dummy_input_ids,
-    "attention_mask": dummy_attention_mask,
-    "position_ids": dummy_position_ids,
-}
-extra_dyn_axes = {
-    "input_ids": {0: "batch", 1: "seq_len"},
-    "attention_mask": {0: "batch", 1: "seq_len"},
-    "position_ids": {0: "batch", 1: "seq_len"},
-    "logits": {0: "batch", 1: "seq_len"},
-}
+extra_inputs = {}
+extra_dyn_axes = {"logits": {0: "batch_size", 1: "seq_len"}}
```

If attention_mask and position_ids are required as ONNX inputs, coordinate updates to the WrapperModelForCausalLM.forward signature and the llm_to_onnx plumbing first.
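A pre-flight check could catch this class of mismatch before the export call runs. The helper below is a hypothetical sketch using the standard-library `inspect` module; it is not part of the modelopt codebase, and the `forward` signature mirrors the one described in the review:

```python
import inspect

def check_extra_inputs(forward_fn, extra_inputs, positional=("input_ids",)):
    """Report kwargs that a wrapper's forward cannot accept.

    Returns (unexpected, duplicated): names absent from the signature, and
    names already bound positionally that **extra_inputs would pass again.
    """
    params = set(inspect.signature(forward_fn).parameters)
    unexpected = sorted(set(extra_inputs) - params)
    duplicated = sorted(set(extra_inputs) & set(positional))
    return unexpected, duplicated

# Signature mirroring the one described in the review.
def forward(self, input_ids, past_key_values=None):
    ...

unexpected, duplicated = check_extra_inputs(
    forward,
    {"input_ids": None, "attention_mask": None, "position_ids": None},
)
print("unexpected:", unexpected)   # ['attention_mask', 'position_ids']
print("duplicated:", duplicated)   # ['input_ids']
```

Running this against the review's `extra_inputs` keys flags both problems before any tensor is built.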
```python
dummy_attention_mask = torch.ones((batch_size, seq_len), dtype=torch.long)
dummy_position_ids = torch.arange(seq_len).unsqueeze(0).repeat(batch_size, 1)

# Correct assignment — no trailing comma
```
There is a non-UTF-8 character at this line
What does this PR do?
Fixes the missing input and output nodes in ONNX models generated by llm_export.py (#1147). Now the following nodes are properly included:
- attention_mask
- position_ids
- past_key_values*

This ensures ONNX models are fully compatible with downstream TensorRT workflows and standard LLM inference pipelines.
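As a downstream sanity check, the exported graph's input names can be matched against the nodes listed above. The sketch below is framework-free and uses illustrative names; with the `onnx` package one would instead read `[i.name for i in onnx.load(path).graph.input]`:

```python
# Check that every node the PR adds appears among the model's input names.
# The past_key_values* entry is a wildcard, so fnmatch handles the per-layer
# key/value naming. The exported_inputs list is hypothetical example data.
from fnmatch import fnmatch

expected = ["input_ids", "attention_mask", "position_ids", "past_key_values*"]

# Hypothetical input names as they might appear after the fix.
exported_inputs = [
    "input_ids", "attention_mask", "position_ids",
    "past_key_values.0.key", "past_key_values.0.value",
]

missing = [p for p in expected
           if not any(fnmatch(name, p) for name in exported_inputs)]
assert not missing, f"exported model is missing inputs: {missing}"
print("all expected inputs present")
```

The same pattern-based check works for output names (e.g. `logits` and `present*` caches), which is how a TensorRT pipeline can fail fast on an incomplete export.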
Type of change: Bug fix
Usage