
Add tp.attention op #709

Draft
yizhuoz004 wants to merge 1 commit into main from tp-attention

Conversation

@yizhuoz004 (Collaborator):

No description provided.

Comment on lines +68 to +70
query: The query tensor with shape ``[batchSize, numHeadsQuery, sequenceLengthQuery, dimHead]``.
key: The key tensor with shape ``[batchSize, numHeadsKeyValue, sequenceLengthKeyValue, dimHead]``.
value: The value tensor with shape ``[batchSize, numHeadsKeyValue, sequenceLengthKeyValue, dimHead]``.
yizhuoz004 (Collaborator, Author):

I need to verify why in TRT we use numHeadsQuery and numHeadsKeyValue separately.
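One likely reason for the separate ``numHeadsQuery`` and ``numHeadsKeyValue`` dimensions is grouped-query attention (GQA), where several query heads share one key/value head. The sketch below (plain numpy, not nvtripy; all names are illustrative assumptions, not the actual op's implementation) shows how the two head counts interact through the five steps the docstring describes:

```python
import numpy as np


def gqa_attention(query, key, value):
    """Hypothetical GQA sketch: num query heads may exceed num KV heads."""
    batch, n_q, seq_q, dim = query.shape
    _, n_kv, seq_kv, _ = key.shape
    assert n_q % n_kv == 0, "query heads must be a multiple of KV heads"
    group = n_q // n_kv
    # Repeat each KV head so it lines up with its group of query heads.
    key = np.repeat(key, group, axis=1)
    value = np.repeat(value, group, axis=1)
    # BMM1 + scaling.
    scores = query @ key.transpose(0, 1, 3, 2) / np.sqrt(dim)
    # Numerically stable softmax over the KV sequence axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # BMM2: output has the query's shape.
    return weights @ value
```

With ``numHeadsQuery == numHeadsKeyValue`` this reduces to standard multi-head attention, so exposing both dimensions costs nothing in the common case.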

@yizhuoz004 changed the title from "Add AttentionOp" to "Add tp.attention op" on Nov 4, 2025
5. Matrix multiplication with value (BMM2)

Args:
query: The query tensor with shape ``[batchSize, numHeadsQuery, sequenceLengthQuery, dimHead]``.
Collaborator:

nit: can we use snake_case to be consistent with the rest of the documentation?



def get_trt_dtype_enum_str(dtype: "nvtripy.dtype") -> str:
Collaborator:

I think we should make this a property of dtype so we don't have to update multiple places when adding new dtypes.
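The suggestion amounts to carrying the TRT enum string on the dtype itself, so adding a new dtype touches one place instead of a free function plus a lookup table elsewhere. A minimal sketch of that shape, assuming a simple dtype class (the class, attribute, and enum strings below are hypothetical, not the actual nvtripy or TensorRT definitions):

```python
class DType:
    """Hypothetical dtype carrying its own TRT enum string."""

    def __init__(self, name, trt_enum_str):
        self.name = name
        self._trt_enum_str = trt_enum_str

    @property
    def trt_dtype_enum_str(self):
        # Each dtype declares its TRT mapping at construction time,
        # so no external mapping needs updating for new dtypes.
        return self._trt_enum_str


# Illustrative registrations; the enum strings are placeholders.
float32 = DType("float32", "DataType::kFLOAT")
float16 = DType("float16", "DataType::kHALF")
```

A caller would then use ``dtype.trt_dtype_enum_str`` instead of ``get_trt_dtype_enum_str(dtype)``.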


assert output.shape == (batch_size, num_heads, seq_len, head_dim)

.. code-block:: python
Collaborator:

Since the inputs to all 3 examples are the same, can we omit the input initialization in the docs so that it is easier to tell what is changing between the samples? Also, can we have the quantization sample omit the mask?

Collaborator:

I'm conflicted on this - on one hand, it will make the examples much cleaner, but on the other, it'll mean that you can't just copy-paste the example code and have it work.

If all the tensors are the same shape, maybe a compromise could be:

query = key = value = tp.iota(...)

although we would need to clarify that it's only being done for the sake of brevity and they don't all need to be the same tensor.
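One caveat worth stating in that clarification: chained assignment binds all three names to the *same object*, which is fine for read-only examples but surprising if a sample later mutates one input. A small plain-numpy illustration of the pattern (numpy stands in for ``tp.iota``; shapes are placeholders):

```python
import numpy as np

# For brevity only: all three inputs get identical contents.
# In real use, query, key, and value would be distinct tensors.
query = key = value = np.arange(24, dtype=np.float32).reshape(1, 2, 3, 4)

# Note: these are the same object, not three equal copies.
# Use .copy() per name if an example needs to modify one of them.
```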



3 participants