
CUDA kernel registration support (#198)

Merged: jiannanWang merged 30 commits into main from jiannanwang/loadinline on Jan 8, 2026
Conversation

@jiannanWang
Contributor

This pull request introduces the following changes:

  1. Adds load_inline functionality and enables support for CUDA kernel registration.
  2. Introduces a script for simple CUDA kernel creation.
  3. Adds tests for CUDA kernel registration.

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Oct 28, 2025
@jiannanWang jiannanWang marked this pull request as ready for review October 28, 2025 21:17
@jiannanWang jiannanWang marked this pull request as draft October 28, 2025 21:26
@jiannanWang jiannanWang marked this pull request as ready for review October 30, 2025 06:21
cpp_sources=cpp_source,
cuda_sources=cuda_source,
functions=[folder_name],
verbose=True,
Contributor


Check the no_implicit_headers mode; otherwise this function will take ~90 s per call.

Contributor Author


I set no_implicit_headers to True and added the headers to the CUDA files. As a result, the running time for TestDirectoryBackendCUDA decreased from approximately 50 seconds to around 10 seconds.
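The fix described above can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the kernel and function names are assumptions, and the compile step is shown commented out because it needs a CUDA toolchain. With no_implicit_headers=True, torch.utils.cpp_extension.load_inline no longer prepends its default headers, so the CUDA source must include them itself.

```python
# Sketch (assumed names): a CUDA source that carries its own headers, as
# required once no_implicit_headers=True disables the implicit includes.
cuda_source = r"""
#include <torch/extension.h>

__global__ void add_kernel(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

torch::Tensor add(torch::Tensor a, torch::Tensor b) {
    auto out = torch::empty_like(a);
    int n = static_cast<int>(a.numel());
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_kernel<<<blocks, threads>>>(
        a.data_ptr<float>(), b.data_ptr<float>(), out.data_ptr<float>(), n);
    return out;
}
"""

# Compiling requires nvcc, so the call is shown commented out:
# from torch.utils.cpp_extension import load_inline
# module = load_inline(
#     name="add_ext",
#     cpp_sources="torch::Tensor add(torch::Tensor a, torch::Tensor b);",
#     cuda_sources=cuda_source,
#     functions=["add"],
#     no_implicit_headers=True,  # skip implicit headers to cut build time
#     verbose=True,
# )
```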

@jiannanWang
Contributor Author

Update:

  • set no_implicit_headers to True and the running time for TestDirectoryBackendCUDA dropped from 50s to 10s.
  • The CI environment doesn't have CUDA_HOME and thus cannot run TestDirectoryBackendCUDA, so that test class is skipped in CI.
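The skip condition mentioned above might look like the following. This is a hypothetical sketch, not the PR's code: the helper name is an assumption, and it simply checks whether a CUDA toolchain is visible (CUDA_HOME set or nvcc on PATH).

```python
import os
import shutil

# Hypothetical helper (assumed name): the CUDA tests run only when a CUDA
# toolchain is visible, i.e. CUDA_HOME is set or nvcc is discoverable on PATH.
def cuda_toolkit_available() -> bool:
    return bool(os.environ.get("CUDA_HOME")) or shutil.which("nvcc") is not None
```

In a pytest suite, a condition like this would typically feed a skipif marker so TestDirectoryBackendCUDA is skipped rather than failed on runners without CUDA.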

@msaroufim
Contributor

You should be able to set CUDA_HOME on the CI runners though; it's a T4 machine with a GPU.

@jiannanWang
Contributor Author

> You should be able to set CUDA_HOME on the CI runners though; it's a T4 machine with a GPU.

Yeah I found that I can install the CUDA toolkit directly on the CI runner. Now the tests are running successfully:

test/test_directory_backend.py::TestDirectoryBackendCUDA::test_add_operation PASSED [ 18%]
test/test_directory_backend.py::TestDirectoryBackendCUDA::test_backend_loading PASSED [ 19%]
test/test_directory_backend.py::TestDirectoryBackendCUDA::test_kernel_directories_exist PASSED [ 20%]

@msaroufim
Contributor

Is this ready for a review?

@jiannanWang
Contributor Author

> Is this ready for a review?

Yes!

cuda_source = ""

# Read both files if they exist
if os.path.exists(cu_file):
    with open(cu_file) as f:
        cuda_source = f.read()
Contributor


I'm wondering if we can simplify this a bit and only make the LLM spit out the .cu file; the cpp file should typically be simple enough for us to provide. See this as an example: https://github.com/gpu-mode/reference-kernels/blob/main/problems/pmpp/vectoradd_py/solutions/correct/submission_cuda_inline.py#L48

Contributor Author


Solved! Added a new parameter load_cpp_source, which defaults to False. It controls whether to load the cpp source from the .cpp file (load_cpp_source=True) or to generate the cpp source content from the CUDA source (load_cpp_source=False).
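The load_cpp_source=False path described above could be sketched like this. This is an illustrative sketch, not the PR's actual implementation: the helper name make_cpp_source and the regex are assumptions; the idea is to derive forward declarations for cpp_sources from the host functions defined in the CUDA source.

```python
import re

# Hypothetical helper (assumed name): derive the C++ declarations from the
# CUDA source instead of reading a .cpp file. The regex is illustrative and
# only catches simple `torch::Tensor fn(...)` definitions.
def make_cpp_source(cuda_source: str) -> str:
    # Collect host-side definitions like `torch::Tensor add(...) {` and turn
    # each signature into a forward declaration.
    pattern = r"(torch::Tensor\s+\w+\s*\([^)]*\))\s*\{"
    return "\n".join(m.group(1) + ";" for m in re.finditer(pattern, cuda_source))

cuda_source = """
torch::Tensor add(torch::Tensor a, torch::Tensor b) {
    return a + b;  // simplified host function for illustration
}
"""
print(make_cpp_source(cuda_source))
# → torch::Tensor add(torch::Tensor a, torch::Tensor b);
```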

Contributor

@msaroufim left a comment


Mostly looking good; some minor questions.

@jiannanWang jiannanWang merged commit 2a8d7e1 into main Jan 8, 2026
6 checks passed
