Sync with Microsoft ONNX Runtime - 04042026#1024

Open
ai-fw-intg wants to merge 6 commits into ovep-develop from sync_msft_04042026

Conversation

@ai-fw-intg

Automated daily backmerge from ORT main to ovep-develop. No conflicts detected. Do NOT squash or rebase - use merge commit only.

edgchen1 and others added 6 commits April 2, 2026 22:57
…crosoft#27945)

### Description
Add support for specifying dynamic plugin EP configuration via a JSON file path in the ORT_UNIT_TEST_MAIN_DYNAMIC_PLUGIN_EP_CONFIG_JSON_FILE environment variable. This is mutually exclusive with specifying inline JSON via the existing ORT_UNIT_TEST_MAIN_DYNAMIC_PLUGIN_EP_CONFIG_JSON environment variable.

### Motivation and Context

Allow more flexibility in specifying configuration. It may be impractical to put everything in an environment variable.
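The resolution logic described above can be sketched as follows. This is a minimal illustration, not the actual ORT test-main code; only the two environment variable names come from the PR, and `GetEnv`/`ResolveDynamicPluginEpConfigJson` are hypothetical helpers.

```cpp
// Sketch: resolve the dynamic plugin EP config from one of two mutually
// exclusive environment variables (inline JSON vs. a JSON file path).
// Helper names are hypothetical; only the env var names are from the PR.
#include <cstdlib>
#include <fstream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

std::optional<std::string> GetEnv(const char* name) {
  const char* value = std::getenv(name);
  if (value == nullptr || *value == '\0') return std::nullopt;
  return std::string{value};
}

// Returns the config JSON, whether given inline or via a file path.
std::string ResolveDynamicPluginEpConfigJson() {
  auto inline_json = GetEnv("ORT_UNIT_TEST_MAIN_DYNAMIC_PLUGIN_EP_CONFIG_JSON");
  auto json_file = GetEnv("ORT_UNIT_TEST_MAIN_DYNAMIC_PLUGIN_EP_CONFIG_JSON_FILE");
  if (inline_json && json_file) {
    // The two variables are mutually exclusive.
    throw std::runtime_error("Specify only one of the two config env vars.");
  }
  if (inline_json) return *inline_json;
  if (json_file) {
    std::ifstream f{*json_file};
    if (!f) throw std::runtime_error("Cannot open config file: " + *json_file);
    std::ostringstream ss;
    ss << f.rdbuf();
    return ss.str();
  }
  return "{}";  // no configuration provided
}
```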
### Description
Address some leftover review comments from the PR that added EP APIs to
retrieve operator schemas:
microsoft#27713



### Motivation and Context
Clean up as promised
### Description
This pull request adds support for Conv3D operations to the WebGPU
execution provider in ONNX Runtime. The main changes include
implementing a new naive Conv3D shader, updating the convolution logic
to handle 3D convolutions, and enabling relevant tests for Conv3D on
WebGPU. Grouped Conv3D is not yet supported.

**Conv3D WebGPU support:**

* Added a new `Conv3DNaiveProgram` class (`conv3d_naive.h`,
`conv3d_naive.cc`) that implements a per-element Conv3D shader for
WebGPU, supporting both "channels last" and "channels first" layouts,
with optional bias and activation.
* Updated the main convolution logic in `conv.cc` to detect 5D tensors
(Conv3D), construct the appropriate shader program, and pass
spatial/stride/dilation parameters as uniforms. Grouped Conv3D is
explicitly disallowed for now.
* Included the new `conv3d_naive.h` header in the main convolution
implementation.
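
The dispatch decision described above can be sketched as a small predicate. This is an illustrative stand-in, not the actual `conv.cc` code; `ConvKind` and `SelectConvProgram` are hypothetical names, while the rank-5 check and the grouped-Conv3D rejection mirror what the PR describes.

```cpp
// Sketch of the Conv dispatch decision: a 5-D input selects the naive
// Conv3D program, and grouped Conv3D is rejected for now.
// Names are hypothetical, not the actual conv.cc implementation.
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class ConvKind { Conv2D, Conv3DNaive };

ConvKind SelectConvProgram(const std::vector<int64_t>& input_shape, int64_t group) {
  if (input_shape.size() == 5) {  // N + 3 spatial dims + channels => Conv3D
    if (group != 1) {
      throw std::runtime_error("Grouped Conv3D is not supported on WebGPU yet.");
    }
    return ConvKind::Conv3DNaive;
  }
  return ConvKind::Conv2D;
}
```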

**Test coverage:**

* Enabled Conv3D tests for the WebGPU provider by removing it from the
excluded execution providers in several Conv3D test cases
(`conv_op_test.cc`).
* Added a note to the Conv3D fp16 test indicating that enabling it for
WebGPU will require additional infrastructure to conditionally skip
based on device capabilities.

### Motivation and Context

Support additional cases in WebGPU EP Conv kernel.
…ernel::UseSharePrePackedBuffers` (microsoft#27924)

### Description

Consolidate `OpKernel::UseSharedPrePackedBuffers` and
`OpKernel::UseSharedPrePackedBuffers_V2` into a single virtual method,
resolving the TODO in `op_kernel.h`.

#### Background

The `OpKernel` class previously had two virtual methods for consuming
shared pre-packed weight buffers:

- **`UseSharedPrePackedBuffers`** (V1) — 3 params: `prepacked_buffers`,
`input_idx`, `used_shared_buffers`
- **`UseSharedPrePackedBuffers_V2`** — 4 params: added
`prepacked_buffer_sizes` (a `gsl::span<const size_t>`)

V2 was introduced to pass buffer sizes alongside the buffers. Its
default implementation forwarded to V1 for backward compatibility. The
framework (`session_state.cc`) only ever called V2.

#### Changes

Merged both methods into a single `UseSharedPrePackedBuffers` using the
V2 signature:

```cpp
virtual Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& prepacked_buffers,
                                         gsl::span<const size_t> prepacked_buffer_sizes,
                                         int input_idx,
                                         /*out*/ bool& used_shared_buffers);
```

Updated **27 files** across the codebase:

| Category | Files | Change |
|----------|-------|--------|
| Base class | `op_kernel.h` | Removed V1 + V2; single 4-param method |
| Framework | `session_state.cc` | Renamed `_V2` call |
| Plugin EP bridge | `ep_kernel_registration.cc` | Renamed override |
| QMoECPU | `moe_quantization_cpu.h/.cc` | Renamed V2 override + template instantiations |
| CPU provider (8 kernels) | `gemm`, `matmul`, `conv_transpose`, `fp16_conv`, `qlinearconv`, `matmul_integer_base`, `deep_cpu_lstm`, `deep_cpu_gru` | Added `prepacked_buffer_sizes` param |
| ACL provider (2 kernels) | `acl/conv`, `acl/matmul` | Added param |
| Contrib ops (4 kernels) | `matmul_nbits`, `dynamic_quantize_lstm`, `attention_quant`, `bert/attention` | Added param |
| Tests | `session_state_test.cc` | Updated test kernel override |

#### Notes

- Existing V1 overrides add the new `prepacked_buffer_sizes` parameter
as **unnamed/unused** (`/*prepacked_buffer_sizes*/`) — no logic changes
in those kernels.
- The C API (`SetSharedPrePackedWeight` in `onnxruntime_ep_c_api.h`)
already passes buffer sizes, so **no C API changes** were needed.
- Private helper functions (e.g., `UseSharedPrePackedBuffersImpl` in
LSTM/GRU) are not virtual overrides and were **not modified**.

### Motivation and Context

Addresses the TODO at
`include/onnxruntime/core/framework/op_kernel.h:139`:

> TODO: Consolidate UseSharedPrePackedBuffers and UseSharedPrePackedBuffers_V2 into a single function, which will require updating kernel-based provider-bridge EPs (cpu, cuda, webgpu).
### Description
Update the Attention Fusion optimizer to fuse the Attention subgraph
pattern in the MobileClip model. The perf gain from the fusion itself is
modest (mostly from not having to launch many kernels); the real gain
will come after this fusion, i.e., from tuning the performance of the
MHA kernel for the problem shapes seen in this model.

There are 2 Attention blocks found in the model and this update fuses
both of them.



### Motivation and Context
Improve performance of MobileClip model

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>