Sync with Microsoft ONNX Runtime - 04042026 #1024
Open

ai-fw-intg wants to merge 6 commits into ovep-develop
Conversation
…crosoft#27945)

### Description
Add support for specifying the dynamic plugin EP configuration via a JSON file path in the `ORT_UNIT_TEST_MAIN_DYNAMIC_PLUGIN_EP_CONFIG_JSON_FILE` environment variable. This is mutually exclusive with specifying inline JSON via the existing `ORT_UNIT_TEST_MAIN_DYNAMIC_PLUGIN_EP_CONFIG_JSON` environment variable.

### Motivation and Context
Allow more flexibility in specifying configuration. It may be impractical to put everything in an environment variable.
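A minimal sketch of the mutual-exclusivity rule described above. Only the two environment-variable names come from the PR; the helper function, its name, and the error messages are hypothetical illustrations, not the actual test-runner code.

```cpp
#include <cstdlib>
#include <fstream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: resolve the dynamic plugin EP config JSON from either
// the inline variable or the file-path variable, but never both.
std::optional<std::string> GetDynamicPluginEpConfigJson() {
  const char* inline_json = std::getenv("ORT_UNIT_TEST_MAIN_DYNAMIC_PLUGIN_EP_CONFIG_JSON");
  const char* json_file = std::getenv("ORT_UNIT_TEST_MAIN_DYNAMIC_PLUGIN_EP_CONFIG_JSON_FILE");
  if (inline_json != nullptr && json_file != nullptr) {
    // The two variables are mutually exclusive.
    throw std::runtime_error("Set either the inline JSON or the JSON file path variable, not both.");
  }
  if (inline_json != nullptr) {
    return std::string{inline_json};
  }
  if (json_file != nullptr) {
    std::ifstream f(json_file);
    if (!f) throw std::runtime_error("Failed to open dynamic plugin EP config file.");
    std::ostringstream contents;
    contents << f.rdbuf();  // read the whole file as the config JSON
    return contents.str();
  }
  return std::nullopt;  // no dynamic plugin EP configuration requested
}
```

Reading the file at startup keeps the rest of the test runner agnostic to which of the two variables supplied the JSON.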
### Description
Address leftover review comments from the PR that added EP APIs to retrieve operator schemas: microsoft#27713.

### Motivation and Context
Clean up as promised.
### Description
This pull request adds support for Conv3D operations to the WebGPU execution provider in ONNX Runtime. The main changes implement a new naive Conv3D shader, update the convolution logic to handle 3D convolutions, and enable the relevant Conv3D tests for WebGPU. Grouped Conv3D is not yet supported.

**Conv3D WebGPU support:**

* Added a new `Conv3DNaiveProgram` class (`conv3d_naive.h`, `conv3d_naive.cc`) that implements a per-element Conv3D shader for WebGPU, supporting both "channels last" and "channels first" layouts, with optional bias and activation.
* Updated the main convolution logic in `conv.cc` to detect 5D tensors (Conv3D), construct the appropriate shader program, and pass spatial/stride/dilation parameters as uniforms. Grouped Conv3D is explicitly disallowed for now.
* Included the new `conv3d_naive.h` header in the main convolution implementation.

**Test coverage:**

* Enabled Conv3D tests for the WebGPU provider by removing it from the excluded execution providers in several Conv3D test cases (`conv_op_test.cc`).
* Added a note to the Conv3D fp16 test indicating that enabling it for WebGPU will require additional infrastructure to conditionally skip based on device capabilities.

### Motivation and Context
Support additional cases in the WebGPU EP Conv kernel.
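As a rough illustration of what a per-element Conv3D program computes, here is a scalar C++ reference for one output element in "channels first" (NCDHW) layout with group count 1. This is not the actual WGSL from `conv3d_naive.cc`; the function name, flat indexing, and the simplification to begin-side pads only are all assumptions for the sketch.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Compute one output element of a Conv3D: sum over input channels and the
// 3D kernel window, mapping output coordinates back to input coordinates
// via strides, dilations, and (begin-side) pads. Out-of-bounds taps read
// as zero padding.
float Conv3dOutputElement(const std::vector<float>& x,       // input  [N, C, D, H, W]
                          const std::vector<float>& w,       // weight [M, C, KD, KH, KW]
                          const std::array<int, 5>& x_dims,
                          const std::array<int, 5>& w_dims,
                          const std::array<int, 3>& strides,
                          const std::array<int, 3>& dilations,
                          const std::array<int, 3>& pads,    // begin pads only, for brevity
                          int n, int m, int od, int oh, int ow) {
  const int C = x_dims[1], D = x_dims[2], H = x_dims[3], W = x_dims[4];
  const int KD = w_dims[2], KH = w_dims[3], KW = w_dims[4];
  float sum = 0.0f;
  for (int c = 0; c < C; ++c) {
    for (int kd = 0; kd < KD; ++kd) {
      for (int kh = 0; kh < KH; ++kh) {
        for (int kw = 0; kw < KW; ++kw) {
          // Map the output coordinate back to an input coordinate.
          const int id = od * strides[0] + kd * dilations[0] - pads[0];
          const int ih = oh * strides[1] + kh * dilations[1] - pads[1];
          const int iw = ow * strides[2] + kw * dilations[2] - pads[2];
          if (id < 0 || id >= D || ih < 0 || ih >= H || iw < 0 || iw >= W) continue;
          sum += x[(((static_cast<size_t>(n) * C + c) * D + id) * H + ih) * W + iw] *
                 w[(((static_cast<size_t>(m) * C + c) * KD + kd) * KH + kh) * KW + kw];
        }
      }
    }
  }
  return sum;  // bias and activation, when present, would be applied to this value
}
```

In the shader, each invocation would evaluate one such output element, with the strides, dilations, and pads supplied as uniforms, as the description above notes.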
…ernel::UseSharePrePackedBuffers` (microsoft#27924)

### Description
Consolidate `OpKernel::UseSharedPrePackedBuffers` and `OpKernel::UseSharedPrePackedBuffers_V2` into a single virtual method, resolving the TODO in `op_kernel.h`.

#### Background
The `OpKernel` class previously had two virtual methods for consuming shared pre-packed weight buffers:

- **`UseSharedPrePackedBuffers`** (V1), with 3 params: `prepacked_buffers`, `input_idx`, `used_shared_buffers`
- **`UseSharedPrePackedBuffers_V2`**, with 4 params: adds `prepacked_buffer_sizes` (a `gsl::span<const size_t>`)

V2 was introduced to pass buffer sizes alongside the buffers. Its default implementation forwarded to V1 for backward compatibility. The framework (`session_state.cc`) only ever called V2.

#### Changes
Merged both methods into a single `UseSharedPrePackedBuffers` using the V2 signature:

```cpp
virtual Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& prepacked_buffers,
                                         gsl::span<const size_t> prepacked_buffer_sizes,
                                         int input_idx,
                                         /*out*/ bool& used_shared_buffers);
```

Updated **27 files** across the codebase:

| Category | Files | Change |
|----------|-------|--------|
| Base class | `op_kernel.h` | Removed V1 + V2; single 4-param method |
| Framework | `session_state.cc` | Renamed `_V2` call |
| Plugin EP bridge | `ep_kernel_registration.cc` | Renamed override |
| QMoECPU | `moe_quantization_cpu.h/.cc` | Renamed V2 override + template instantiations |
| CPU provider (8 kernels) | `gemm`, `matmul`, `conv_transpose`, `fp16_conv`, `qlinearconv`, `matmul_integer_base`, `deep_cpu_lstm`, `deep_cpu_gru` | Added `prepacked_buffer_sizes` param |
| ACL provider (2 kernels) | `acl/conv`, `acl/matmul` | Added param |
| Contrib ops (4 kernels) | `matmul_nbits`, `dynamic_quantize_lstm`, `attention_quant`, `bert/attention` | Added param |
| Tests | `session_state_test.cc` | Updated test kernel override |

#### Notes
- Existing V1 overrides add the new `prepacked_buffer_sizes` parameter as **unnamed/unused** (`/*prepacked_buffer_sizes*/`); there are no logic changes in those kernels.
- The C API (`SetSharedPrePackedWeight` in `onnxruntime_ep_c_api.h`) already passes buffer sizes, so **no C API changes** were needed.
- Private helper functions (e.g., `UseSharedPrePackedBuffersImpl` in LSTM/GRU) are not virtual overrides and were **not modified**.

### Motivation and Context
Addresses the TODO at `include/onnxruntime/core/framework/op_kernel.h:139`:

> TODO: Consolidate UseSharedPrePackedBuffers and UseSharedPrePackedBuffers_V2 into a single function, which will require updating kernel-based provider-bridge EPs (cpu, cuda, webgpu).
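A minimal sketch of the pattern a former V1 kernel follows after the consolidation, under stated assumptions: `Status`, `BufferUniquePtr`, and the kernel class here are simplified stand-ins (the real API uses onnxruntime's types), and `gsl::span<const size_t>` is approximated by a `const std::vector<size_t>&` so the sketch stays self-contained.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Simplified stand-ins for the real onnxruntime types, for illustration only.
struct Status {
  bool ok = true;
  static Status OK() { return {}; }
};
using BufferUniquePtr = std::shared_ptr<void>;  // stand-in for the real unique buffer type

struct OpKernel {
  virtual ~OpKernel() = default;
  // Single consolidated virtual, mirroring the V2 signature quoted above.
  virtual Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& /*prepacked_buffers*/,
                                           const std::vector<size_t>& /*prepacked_buffer_sizes*/,
                                           int /*input_idx*/,
                                           /*out*/ bool& used_shared_buffers) {
    used_shared_buffers = false;  // default: the kernel does not consume shared buffers
    return Status::OK();
  }
};

// A former V1 override after the change: it accepts the new sizes parameter
// but leaves it unnamed/unused, so the kernel's logic is untouched.
struct GemmLikeKernel : OpKernel {
  Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& prepacked_buffers,
                                   const std::vector<size_t>& /*prepacked_buffer_sizes*/,
                                   int /*input_idx*/,
                                   bool& used_shared_buffers) override {
    // ... take ownership of the pre-packed weights exactly as before ...
    used_shared_buffers = !prepacked_buffers.empty();
    return Status::OK();
  }
};
```

Keeping the extra parameter unnamed lets all 27 call sites compile against one signature while only the kernels that actually need the buffer sizes read them.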
### Description
Update the Attention Fusion optimizer to fuse the Attention subgraph pattern in the MobileClip model. The perf gain from the fusion itself is modest (mostly from launching fewer kernels); the real gain will come after this fusion, i.e., from tuning the MHA kernel for the problem shapes seen in this model. There are 2 Attention blocks in the model, and this update fuses both of them.

### Motivation and Context
Improve performance of the MobileClip model.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Automated daily backmerge from ORT main to ovep-develop. No conflicts detected. Do NOT squash or rebase - use merge commit only.