[ROCm] Use BFCAllocator for scratch allocations needed for MIOpen aut… by draganmladjenovic · Pull Request #756 · ROCm/xla

draganmladjenovic · 2026-03-25T12:48:00Z

…otuning

Motivation

Technical Details

…otuning

claude · 2026-03-26T15:38:19Z

xla/backends/gpu/autotuner/miopen.h

 private:
  bool IsSupported(const HloInstruction& instr) override;
  bool do_not_autotune_;
+  stream_executor::DeviceAddressAllocator* allocator_;


nit: The allocator_ member is a raw non-owning DeviceAddressAllocator* that can potentially be null. In GpuCompiler::AddConvAndGemmAutotuningPass (gpu_compiler.cc:3293), options.device_allocator is passed through, and CompileOptions::device_allocator defaults to nullptr (compiler.h:129). The old code was self-contained (created its own StreamExecutorMemoryAllocator locally), so this path was never null-unsafe before.

Consider adding a CHECK(allocator != nullptr) in the MIOpenBackend constructor, or at minimum documenting the non-null precondition.

claude · 2026-03-26T15:38:22Z

xla/backends/gpu/autotuner/factory.h

 struct GetCodegenBackends {
  using Type = std::function<std::vector<std::unique_ptr<CodegenBackend>>(
-      stream_executor::StreamExecutor*, const DebugOptions*, Compiler*,
+      stream_executor::StreamExecutor*,


nit (IWYU): factory.h uses DeviceAddressAllocator* in this signature but doesn't directly include xla/stream_executor/device_address_allocator.h. It relies on a transitive include through compiler.h. If compiler.h ever stops including that header, this file would break. Consider adding a direct include.

claude · 2026-03-26T15:38:25Z

xla/backends/gpu/autotuner/miopen.cc

-  size_t result_buffers_count = instr->shape().tuple_shapes().size();
-  result_buffers.reserve(result_buffers_count);
+  absl::InlinedVector<se::DeviceAddressBase, 1> result_buffers;
+  size_t result_buffers_count = instr->shape().tuple_shapes().size() - 1;


nit: The fix from .size() to .size() - 1 (to exclude the workspace u8[0] element) is correct, but it's a logically independent bug fix from the allocator refactoring. Consider separating this into its own commit for cleaner bisect history.

claude · 2026-03-26T15:38:26Z

xla/backends/gpu/autotuner/factory_cuda.cc


 std::vector<std::unique_ptr<CodegenBackend>> GetCodegenBackendsForCuda(
    stream_executor::StreamExecutor* stream_executor,
+    stream_executor::DeviceAddressAllocator* device_allocator,


nit: device_allocator is unused in the CUDA path. Consider adding [[maybe_unused]] or (void)device_allocator; to suppress potential compiler warnings and make the intent explicit.

claude · 2026-03-26T15:38:28Z

xla/backends/gpu/autotuner/miopen.cc

      GetDNNDataTypeFromPrimitiveType(gpu_conv_config.output_type));
  se::dnn::DnnSupport* dnn = stream_executor->AsDnn();
-  se::StreamExecutorMemoryAllocator allocator(stream_executor);
+  std::unique_ptr<se::Stream> owned_stream;


Note: The stream creation path changed from allocator.GetStream() (which reused a cached stream) to stream_executor->CreateStream() (which creates a fresh stream per call). This is functionally correct and the lifetime is properly managed, but it's a behavioral change worth being aware of — previously streams could be reused across calls.

claude · 2026-03-26T15:38:40Z

Claude Code Review Summary

Overall: Clean refactoring that threads DeviceAddressAllocator through the MIOpen autotuning path to enable BFC allocation for scratch memory. The change is well-structured and the allocator lifetime management is correct.

Key items flagged inline:

Potential null-pointer risk on the new allocator_ member in MIOpenBackend (previously self-contained, now depends on caller-provided allocator which defaults to nullptr)
Missing direct include for DeviceAddressAllocator in factory.h (IWYU)
Independent bug fix (result buffer count) mixed into the refactoring commit
Unused parameter in CUDA factory path
Behavioral change in stream management (creation vs. reuse)

See inline comments for details.

🤖 Generated with Claude Code

[ROCm] Use BFCAllocator for scratch allocations needed for MIOpen aut…

9ffc786

…otuning

draganmladjenovic force-pushed the draganm/rocm-jaxlib-v0.9.1-conv-scratch branch from 0341010 to 9ffc786 Compare March 25, 2026 12:52

i-chaochen self-requested a review March 25, 2026 13:16

i-chaochen added cherry-pick-candidate Mark a PR to be cherry-picked into the next ROCm JAX. Remove IIF the latest upstream contain the PR. rocm-jaxlib-v0.9.1 labels Mar 25, 2026

draganmladjenovic added sanitizers Execute sanitizer pipelines for that PR claude-review Request a Claude AI code review for this PR labels Mar 26, 2026

claude bot reviewed Mar 26, 2026

View reviewed changes

github-actions bot removed the claude-review Request a Claude AI code review for this PR label Mar 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ROCm] Use BFCAllocator for scratch allocations needed for MIOpen aut…#756

[ROCm] Use BFCAllocator for scratch allocations needed for MIOpen aut…#756
draganmladjenovic wants to merge 1 commit intorocm-jaxlib-v0.9.1from
draganm/rocm-jaxlib-v0.9.1-conv-scratch

draganmladjenovic commented Mar 25, 2026 •

edited by i-chaochen

Loading

Uh oh!

claude bot Mar 26, 2026

Uh oh!

claude bot Mar 26, 2026

Uh oh!

claude bot Mar 26, 2026

Uh oh!

claude bot Mar 26, 2026

Uh oh!

claude bot Mar 26, 2026

Uh oh!

claude bot commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

draganmladjenovic commented Mar 25, 2026 • edited by i-chaochen Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Technical Details

Uh oh!

claude bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot commented Mar 26, 2026

Claude Code Review Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

draganmladjenovic commented Mar 25, 2026 •

edited by i-chaochen

Loading