
CUDA kernel registration support (#198)

Merged: jiannanWang merged 30 commits into main from jiannanwang/loadinline on Jan 8, 2026
Conversation

@jiannanWang
Contributor

This pull request introduces the following changes:

  1. Adds load_inline functionality and enables support for CUDA kernel registration.
  2. Introduces a script for simple CUDA kernel creation.
  3. Adds tests for CUDA kernel registration.

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Oct 28, 2025
@jiannanWang jiannanWang marked this pull request as ready for review October 28, 2025 21:17
@jiannanWang jiannanWang marked this pull request as draft October 28, 2025 21:26
@jiannanWang jiannanWang marked this pull request as ready for review October 30, 2025 06:21
cpp_sources=cpp_source,
cuda_sources=cuda_source,
functions=[folder_name],
verbose=True,
Contributor


Check the no_implicit_headers mode; otherwise this function will take ~90 s per call.

Contributor Author


I set no_implicit_headers to True and added the headers to the CUDA files. As a result, the running time for TestDirectoryBackendCUDA decreased from approximately 50 seconds to around 10 seconds.
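The fix described above can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the kernel and function names are assumptions, and the compile step is shown commented out because it needs a CUDA toolchain. With no_implicit_headers=True, torch.utils.cpp_extension.load_inline no longer prepends its default headers, so the CUDA source must include them itself.

```python
# Sketch (assumed names): a CUDA source that carries its own headers, as
# required once no_implicit_headers=True disables the implicit includes.
cuda_source = r"""
#include <torch/extension.h>

__global__ void add_kernel(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

torch::Tensor add(torch::Tensor a, torch::Tensor b) {
    auto out = torch::empty_like(a);
    int n = static_cast<int>(a.numel());
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_kernel<<<blocks, threads>>>(
        a.data_ptr<float>(), b.data_ptr<float>(), out.data_ptr<float>(), n);
    return out;
}
"""

# Compiling requires nvcc, so the call is shown commented out:
# from torch.utils.cpp_extension import load_inline
# module = load_inline(
#     name="add_ext",
#     cpp_sources="torch::Tensor add(torch::Tensor a, torch::Tensor b);",
#     cuda_sources=cuda_source,
#     functions=["add"],
#     no_implicit_headers=True,  # skip implicit headers to cut build time
#     verbose=True,
# )
```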

@jiannanWang
Contributor Author

Update:

  • set no_implicit_headers to True and the running time for TestDirectoryBackendCUDA dropped from 50s to 10s.
  • The CI environment doesn't have CUDA_HOME and thus cannot run TestDirectoryBackendCUDA, so that test class is skipped in CI.
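The skip condition mentioned above might look like the following. This is a hypothetical sketch, not the PR's code: the helper name is an assumption, and it simply checks whether a CUDA toolchain is visible (CUDA_HOME set or nvcc on PATH).

```python
import os
import shutil

# Hypothetical helper (assumed name): the CUDA tests run only when a CUDA
# toolchain is visible, i.e. CUDA_HOME is set or nvcc is discoverable on PATH.
def cuda_toolkit_available() -> bool:
    return bool(os.environ.get("CUDA_HOME")) or shutil.which("nvcc") is not None
```

In a pytest suite, a condition like this would typically feed a skipif marker so TestDirectoryBackendCUDA is skipped rather than failed on runners without CUDA.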

@msaroufim
Contributor

You should be able to set CUDA_HOME on the CI runners though; it's a T4 machine with a GPU.

@jiannanWang
Contributor Author

> You should be able to set CUDA_HOME on the CI runners though; it's a T4 machine with a GPU.

Yeah I found that I can install the CUDA toolkit directly on the CI runner. Now the tests are running successfully:

test/test_directory_backend.py::TestDirectoryBackendCUDA::test_add_operation PASSED [ 18%]
test/test_directory_backend.py::TestDirectoryBackendCUDA::test_backend_loading PASSED [ 19%]
test/test_directory_backend.py::TestDirectoryBackendCUDA::test_kernel_directories_exist PASSED [ 20%]

@msaroufim
Contributor

Is this ready for a review?

@jiannanWang
Contributor Author

> Is this ready for a review?

Yes!

cuda_source = ""

# Read both files if they exist
if os.path.exists(cu_file):
    with open(cu_file) as f:
        cuda_source = f.read()
Contributor


I'm wondering if we can simplify this a bit and only make the LLM spit out the .cu file; the cpp file should typically be simple enough for us to provide. See this as an example: https://github.com/gpu-mode/reference-kernels/blob/main/problems/pmpp/vectoradd_py/solutions/correct/submission_cuda_inline.py#L48

Contributor Author


Solved! Added a new parameter load_cpp_source, which defaults to False. It controls whether to load the cpp source from the .cpp file (load_cpp_source=True) or to generate the cpp source content from the CUDA source (load_cpp_source=False).
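The load_cpp_source=False path described above could be sketched like this. This is an illustrative sketch, not the PR's actual implementation: the helper name make_cpp_source and the regex are assumptions; the idea is to derive forward declarations for cpp_sources from the host functions defined in the CUDA source.

```python
import re

# Hypothetical helper (assumed name): derive the C++ declarations from the
# CUDA source instead of reading a .cpp file. The regex is illustrative and
# only catches simple `torch::Tensor fn(...)` definitions.
def make_cpp_source(cuda_source: str) -> str:
    # Collect host-side definitions like `torch::Tensor add(...) {` and turn
    # each signature into a forward declaration.
    pattern = r"(torch::Tensor\s+\w+\s*\([^)]*\))\s*\{"
    return "\n".join(m.group(1) + ";" for m in re.finditer(pattern, cuda_source))

cuda_source = """
torch::Tensor add(torch::Tensor a, torch::Tensor b) {
    return a + b;  // simplified host function for illustration
}
"""
print(make_cpp_source(cuda_source))
# → torch::Tensor add(torch::Tensor a, torch::Tensor b);
```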

Contributor

@msaroufim left a comment


Mostly looking good; some minor questions.

@jiannanWang jiannanWang merged commit 2a8d7e1 into main Jan 8, 2026
6 checks passed
