
fix: TRT-LLM multimodal preprocessor - remove default_multimodal_input_loader from the embedding paths#6924

Merged
moraxu merged 3 commits into ai-dynamo:main from
moraxu:trtllm-multimodal-processor-fix-the-path-for-embeddings
Mar 6, 2026

Conversation

@moraxu
Contributor

@moraxu moraxu commented Mar 5, 2026

Overview:

A follow-up to #6840 that removes the default_multimodal_input_loader calls from the embedding paths and instead passes the text prompt from the Rust frontend (where the chat template has already been applied) to TRT-LLM.

Details:

Where should the reviewer start?

Tested:

  • llava-v1.6-mistral-7b-hf:
    • E/P/D: embeddings & image URL
    • E/PD: image URL only, embeddings don't work
    • Aggregated: embeddings & image URL & text
  • Qwen3-VL-2B-Instruct:
    • E/P/D: image URL

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features

    • Support for pre-computed embeddings in multimodal requests, including explicit handling when a formatted prompt is required.
    • Support for propagating a formatted prompt through multimodal request flow.
  • Refactor

    • Unified multimodal data handling across request types with improved embedding/image loading, stricter validation, and clearer warnings/logging.

@moraxu moraxu requested review from a team as code owners March 5, 2026 04:57
@copy-pr-bot

copy-pr-bot bot commented Mar 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Contributor

github-actions bot commented Mar 5, 2026

👋 Hi moraxu! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added fix external-contribution Pull request is from an external contributor backend::trtllm Relates to the trtllm backend frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` multimodal labels Mar 5, 2026
@moraxu moraxu force-pushed the trtllm-multimodal-processor-fix-the-path-for-embeddings branch from 62d4bbb to 61ff322 Compare March 5, 2026 05:03
@pull-request-size pull-request-size bot added size/M and removed size/L labels Mar 5, 2026
@coderabbitai
Contributor

coderabbitai bot commented Mar 5, 2026

Walkthrough

Adds optional formatted_prompt propagation and explicit embedding handling across Python and Rust preprocessing: Python multimodal paths now load/attach embeddings (structured under multi_modal_embeddings) and require formatted_prompt for embeddings; Rust gather_multi_modal_data gains a formatted_prompt parameter passed through preprocess flows.
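To make the described payload shape concrete, here is a minimal Python sketch of how a worker might pair the chat-templated prompt from extra_args with precomputed embeddings. Only the formatted_prompt field and the multi_modal_embeddings {"image": [...]} grouping come from the summary; the helper name build_embedding_inputs, the request dict layout, and the "prompt" key are hypothetical, and the actual TRT-LLM input schema may differ.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def build_embedding_inputs(
    request: dict[str, Any], embeddings: list[Any]
) -> Optional[dict[str, Any]]:
    """Hypothetical helper: combine templated text with precomputed embeddings.

    Assumes the Rust frontend put the already-templated prompt in
    request["extra_args"]["formatted_prompt"]; this layout is an assumption,
    not the actual Dynamo request schema.
    """
    extra_args = request.get("extra_args") or {}
    formatted_prompt = extra_args.get("formatted_prompt")
    if not formatted_prompt:
        # Embeddings need the templated text (including image placeholders);
        # without it there is no safe input to hand to the engine.
        logger.warning("formatted_prompt missing; cannot use precomputed embeddings")
        return None
    return {
        "prompt": formatted_prompt,
        # Embeddings grouped by modality, mirroring {"image": [...]}.
        "multi_modal_embeddings": {"image": embeddings},
    }
```

Returning None instead of raising follows the "warns/returns None" behavior listed for the PD flow, leaving the caller to decide how to fail the request.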

Changes

  • Multimodal Processor (Python), components/src/dynamo/trtllm/multimodal_processor.py: Removed early token_id extraction in the PD/EPD paths; added handling for extra_args.formatted_prompt; EPD-NIXL embeddings are structured as {"image": [...]}; the PD flow now separates image URLs from embedding files (.pt/.pth/.bin), loads the embedding tensors, attaches formatted_prompt when present, and warns/returns None if a required prompt is missing; added stricter size/access checks and more verbose logging.
  • Preprocessor (Rust), lib/llm/src/preprocessor.rs: Added a formatted_prompt: Option<String> parameter to gather_multi_modal_data and threaded it through gather_tokens, preprocess_request, and related call sites; extra_args may now include formatted_prompt when multimodal data is present; prior behavior is preserved when formatted_prompt is None.
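The PD-flow split between image URLs and embedding files can be sketched as classifying each multimodal item by file extension. The helper name partition_multimodal_items and the rule of matching on the URL path's suffix are assumptions; the real processor also loads the tensors and, per the summary, applies stricter size/access checks, both of which this sketch leaves out.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions the summary lists for embedding files.
EMBEDDING_SUFFIXES = {".pt", ".pth", ".bin"}


def partition_multimodal_items(items: list[str]) -> tuple[list[str], list[str]]:
    """Hypothetical helper: split items into (image_urls, embedding_paths)."""
    image_urls: list[str] = []
    embedding_paths: list[str] = []
    for item in items:
        # Inspect only the path component so a query string such as
        # "?sig=..." cannot hide the real suffix; compare case-insensitively.
        suffix = PurePosixPath(urlparse(item).path).suffix.lower()
        if suffix in EMBEDDING_SUFFIXES:
            embedding_paths.append(item)
        else:
            image_urls.append(item)
    return image_urls, embedding_paths
```

Splitting up front lets each group take its own loading path (image fetching vs. tensor loading).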

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hop through tensors, soft and light,
I tuck prompts where vectors take flight,
From Python burrow to Rusty glen,
Embeddings hum — we sing again,
A tiny rabbit, coding delight.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check: ⚠️ Warning. The PR description lacks required details: the Overview references PR #6840, but the Details section is empty, "Where should the reviewer start" is missing, and specific files to review are not called out. Resolution: complete the Details section with the specific changes made, and add a "Where should the reviewer start" section naming the critical files (components/src/dynamo/trtllm/multimodal_processor.py and lib/llm/src/preprocessor.rs) for focused review.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title "fix: TRT-LLM multimodal preprocessor - remove default_multimodal_input_loader from the embedding paths" is specific and directly reflects the main change: removing default_multimodal_input_loader calls from the embedding paths and passing formatted prompts instead.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


@moraxu moraxu requested review from 2ez4bz and indrajit96 March 5, 2026 05:14
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/src/dynamo/trtllm/multimodal_processor.py`:
- Line 326: The current logging line logs the entire processed_inputs (including
raw prompt text and tensors) which can leak sensitive prompts and create huge
logs; change the logging in the block that emits logging.info(f"Processed
inputs: {processed_inputs}") to instead log a sanitized summary: redact or omit
the prompt text and only log safe metadata such as tensor shapes/dtypes and
masked prompt length (or a boolean indicating presence of prompt), or call a
sanitizer function (e.g., sanitize_processed_inputs) before logging; ensure you
update the logging level to debug if needed and retain enough info for debugging
without printing full prompt content or tensor data.
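One way to follow this logging suggestion is to log a summary instead of the raw dict. This minimal sketch duck-types tensors through their shape/dtype attributes, so it needs no torch import. The name sanitize_processed_inputs comes from the review comment, but its signature, the summary format, and the wrapper log_processed_inputs are assumptions.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def sanitize_processed_inputs(value: Any) -> Any:
    """Return a log-safe summary; prompt text and tensor data are never included."""
    if hasattr(value, "shape") and hasattr(value, "dtype"):
        # Tensor-like object (torch/NumPy): keep only metadata.
        return f"<tensor shape={tuple(value.shape)} dtype={value.dtype}>"
    if isinstance(value, str):
        # Redact free text but keep its length, e.g. to confirm a prompt arrived.
        return f"<str len={len(value)}>"
    if isinstance(value, (bytes, bytearray)):
        return f"<bytes len={len(value)}>"
    if isinstance(value, dict):
        return {k: sanitize_processed_inputs(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Sequences are summarized element-wise (tuples come back as lists).
        return [sanitize_processed_inputs(v) for v in value]
    # Numbers, booleans, and None are safe to log as-is.
    return value


def log_processed_inputs(processed_inputs: Any) -> None:
    # Build the summary only when DEBUG is enabled, and log it at DEBUG level.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("Processed inputs: %s", sanitize_processed_inputs(processed_inputs))
```

Keeping the length of redacted strings still shows whether formatted_prompt reached the worker, which is the detail most relevant to the embedding path.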

In `@lib/llm/src/preprocessor.rs`:
- Around line 1278-1281: Formatting is failing around the call to
self.gather_multi_modal_data in preprocessor.rs; run rustfmt (cargo fmt) to
format the file and commit the updated formatting so the call site and
surrounding block conform to rustfmt rules (e.g., adjust spacing/indentation
around the builder invocation and the await? operator). Ensure the formatted
changes that include the gather_multi_modal_data(&request, &mut builder,
None).await? line are committed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 486b9c66-0b40-4d2e-bca7-fa7976520e2b

📥 Commits

Reviewing files that changed from the base of the PR and between 869e733 and 61ff322.

📒 Files selected for processing (2)
  • components/src/dynamo/trtllm/multimodal_processor.py
  • lib/llm/src/preprocessor.rs

Signed-off-by: Michal Guzek <mguzek@nvidia.com>
@moraxu moraxu force-pushed the trtllm-multimodal-processor-fix-the-path-for-embeddings branch from 61ff322 to 3ec32fa Compare March 5, 2026 17:21
@pull-request-size pull-request-size bot added size/L and removed size/M labels Mar 5, 2026
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Contributor

@indrajit96 indrajit96 left a comment


@rmccorm4 @KrishnanPrash @grahamking for the Rust-side changes.

@moraxu the worker changes LGTM apart from a few minor changes.
Ran my local tests: all pass.

Signed-off-by: Michal Guzek <mguzek@nvidia.com>
@moraxu moraxu requested a review from indrajit96 March 5, 2026 20:50
@moraxu moraxu enabled auto-merge (squash) March 5, 2026 21:45
Contributor

@indrajit96 indrajit96 left a comment


LGTM

@indrajit96
Contributor

/ok to test a8e8d25

@moraxu moraxu merged commit e6ddf0e into ai-dynamo:main Mar 6, 2026
142 of 147 checks passed
indrajit96 pushed a commit that referenced this pull request Mar 6, 2026
…t_loader from the embedding paths (#6924)

Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
saturley-hall pushed a commit that referenced this pull request Mar 7, 2026
…loader (#6924) (#6993)

Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>

Labels

backend::trtllm Relates to the trtllm backend external-contribution Pull request is from an external contributor fix frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` multimodal size/L


3 participants