Sync with Microsoft ONNX Runtime - 04042026#1024

Open
ai-fw-intg wants to merge 6 commits into ovep-develop from sync_msft_04042026

Conversation

@ai-fw-intg

Automated daily backmerge from ORT main to ovep-develop. No conflicts detected. Do NOT squash or rebase - use merge commit only.

edgchen1 and others added 6 commits April 2, 2026 22:57
…crosoft#27945)

### Description
Add support for specifying dynamic plugin EP configuration via a JSON file path in the ORT_UNIT_TEST_MAIN_DYNAMIC_PLUGIN_EP_CONFIG_JSON_FILE environment variable. This is mutually exclusive with specifying inline JSON via the existing ORT_UNIT_TEST_MAIN_DYNAMIC_PLUGIN_EP_CONFIG_JSON environment variable.

### Motivation and Context

Allow more flexibility in specifying configuration. It may be impractical to put everything in an environment variable.
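The resolution logic described above can be sketched as follows. This is a minimal illustration, not the actual ORT test-main code; only the two environment variable names come from the PR, and `GetEnv`/`ResolveDynamicPluginEpConfigJson` are hypothetical helpers.

```cpp
// Sketch: resolve the dynamic plugin EP config from one of two mutually
// exclusive environment variables (inline JSON vs. a JSON file path).
// Helper names are hypothetical; only the env var names are from the PR.
#include <cstdlib>
#include <fstream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

std::optional<std::string> GetEnv(const char* name) {
  const char* value = std::getenv(name);
  if (value == nullptr || *value == '\0') return std::nullopt;
  return std::string{value};
}

// Returns the config JSON, whether given inline or via a file path.
std::string ResolveDynamicPluginEpConfigJson() {
  auto inline_json = GetEnv("ORT_UNIT_TEST_MAIN_DYNAMIC_PLUGIN_EP_CONFIG_JSON");
  auto json_file = GetEnv("ORT_UNIT_TEST_MAIN_DYNAMIC_PLUGIN_EP_CONFIG_JSON_FILE");
  if (inline_json && json_file) {
    // The two variables are mutually exclusive.
    throw std::runtime_error("Specify only one of the two config env vars.");
  }
  if (inline_json) return *inline_json;
  if (json_file) {
    std::ifstream f{*json_file};
    if (!f) throw std::runtime_error("Cannot open config file: " + *json_file);
    std::ostringstream ss;
    ss << f.rdbuf();
    return ss.str();
  }
  return "{}";  // no configuration provided
}
```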
### Description
Address some leftover review comments from the PR that added EP APIs to
retrieve operator schemas:
microsoft#27713



### Motivation and Context
Clean up as promised
### Description
This pull request adds support for Conv3D operations to the WebGPU
execution provider in ONNX Runtime. The main changes include
implementing a new naive Conv3D shader, updating the convolution logic
to handle 3D convolutions, and enabling relevant tests for Conv3D on
WebGPU. Grouped Conv3D is not yet supported.

**Conv3D WebGPU support:**

* Added a new `Conv3DNaiveProgram` class (`conv3d_naive.h`,
`conv3d_naive.cc`) that implements a per-element Conv3D shader for
WebGPU, supporting both "channels last" and "channels first" layouts,
with optional bias and activation.
* Updated the main convolution logic in `conv.cc` to detect 5D tensors
(Conv3D), construct the appropriate shader program, and pass
spatial/stride/dilation parameters as uniforms. Grouped Conv3D is
explicitly disallowed for now.
* Included the new `conv3d_naive.h` header in the main convolution
implementation.
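
The dispatch decision described above can be sketched as a small predicate. This is an illustrative stand-in, not the actual `conv.cc` code; `ConvKind` and `SelectConvProgram` are hypothetical names, while the rank-5 check and the grouped-Conv3D rejection mirror what the PR describes.

```cpp
// Sketch of the Conv dispatch decision: a 5-D input selects the naive
// Conv3D program, and grouped Conv3D is rejected for now.
// Names are hypothetical, not the actual conv.cc implementation.
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class ConvKind { Conv2D, Conv3DNaive };

ConvKind SelectConvProgram(const std::vector<int64_t>& input_shape, int64_t group) {
  if (input_shape.size() == 5) {  // N + 3 spatial dims + channels => Conv3D
    if (group != 1) {
      throw std::runtime_error("Grouped Conv3D is not supported on WebGPU yet.");
    }
    return ConvKind::Conv3DNaive;
  }
  return ConvKind::Conv2D;
}
```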

**Test coverage:**

* Enabled Conv3D tests for the WebGPU provider by removing it from the
excluded execution providers in several Conv3D test cases
(`conv_op_test.cc`).
* Added a note to the Conv3D fp16 test indicating that enabling it for
WebGPU will require additional infrastructure to conditionally skip
based on device capabilities.

### Motivation and Context

Support additional cases in WebGPU EP Conv kernel.
…ernel::UseSharePrePackedBuffers` (microsoft#27924)

### Description

Consolidate `OpKernel::UseSharedPrePackedBuffers` and
`OpKernel::UseSharedPrePackedBuffers_V2` into a single virtual method,
resolving the TODO in `op_kernel.h`.

#### Background

The `OpKernel` class previously had two virtual methods for consuming
shared pre-packed weight buffers:

- **`UseSharedPrePackedBuffers`** (V1) — 3 params: `prepacked_buffers`,
`input_idx`, `used_shared_buffers`
- **`UseSharedPrePackedBuffers_V2`** — 4 params: added
`prepacked_buffer_sizes` (a `gsl::span<const size_t>`)

V2 was introduced to pass buffer sizes alongside the buffers. Its
default implementation forwarded to V1 for backward compatibility. The
framework (`session_state.cc`) only ever called V2.

#### Changes

Merged both methods into a single `UseSharedPrePackedBuffers` using the
V2 signature:

```cpp
virtual Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& prepacked_buffers,
                                         gsl::span<const size_t> prepacked_buffer_sizes,
                                         int input_idx,
                                         /*out*/ bool& used_shared_buffers);
```

Updated **27 files** across the codebase:

| Category | Files | Change |
|----------|-------|--------|
| Base class | `op_kernel.h` | Removed V1 + V2; single 4-param method |
| Framework | `session_state.cc` | Renamed `_V2` call |
| Plugin EP bridge | `ep_kernel_registration.cc` | Renamed override |
| QMoECPU | `moe_quantization_cpu.h/.cc` | Renamed V2 override + template instantiations |
| CPU provider (8 kernels) | `gemm`, `matmul`, `conv_transpose`, `fp16_conv`, `qlinearconv`, `matmul_integer_base`, `deep_cpu_lstm`, `deep_cpu_gru` | Added `prepacked_buffer_sizes` param |
| ACL provider (2 kernels) | `acl/conv`, `acl/matmul` | Added param |
| Contrib ops (4 kernels) | `matmul_nbits`, `dynamic_quantize_lstm`, `attention_quant`, `bert/attention` | Added param |
| Tests | `session_state_test.cc` | Updated test kernel override |

#### Notes

- Existing V1 overrides add the new `prepacked_buffer_sizes` parameter
as **unnamed/unused** (`/*prepacked_buffer_sizes*/`) — no logic changes
in those kernels.
- The C API (`SetSharedPrePackedWeight` in `onnxruntime_ep_c_api.h`)
already passes buffer sizes, so **no C API changes** were needed.
- Private helper functions (e.g., `UseSharedPrePackedBuffersImpl` in
LSTM/GRU) are not virtual overrides and were **not modified**.

### Motivation and Context

Addresses the TODO at
`include/onnxruntime/core/framework/op_kernel.h:139`:

> TODO: Consolidate UseSharedPrePackedBuffers and UseSharedPrePackedBuffers_V2 into a single function, which will require updating kernel-based provider-bridge EPs (cpu, cuda, webgpu).
### Description
Update the Attention Fusion optimizer to fuse the Attention subgraph
pattern in the MobileClip model. The perf gain from the fusion itself is
modest (mostly from not having to launch many kernels); the real gain
will come after this fusion, i.e., from tuning the performance of the
MHA kernel for the problem shapes seen in this model.

There are 2 Attention blocks found in the model and this update fuses
both of them.



### Motivation and Context
Improve performance of MobileClip model

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>