Tracking issue for targeting AMDGPU devices

This issue tracks the progress and roadmap for what needs to be done to generate code for targets like AMDGPUs. Personally, I am working on AMDGPU codegen as it would be used for HSA; specifically, I am aiming for the `amdgcn-amd-amdhsa-amdgiz` LLVM target. Note that I’m still learning, so this issue will likely change as experience dictates.

Here are the pieces needed to make this work at an MVP level (i.e. without providing access to most GPU-specific features):

- [x] Initialize the LLVM target machine https://github.com/rust-lang/rust/pull/51548
- [ ] Teach the LLVM codegen backend to respect the address spaces imposed by the target machine (PR: https://github.com/rust-lang/rust/pull/51576). E.g. allocas are in address space 5 for the target triple mentioned above.
- [x] Add the `amdgpu-kernel` ABI (PR #52032).
- [ ] Add a mechanism to delegate virtual function calls (meaning call by pointer value) to runtime libraries.
- [ ] Required metadata ??

The address space changes are pretty general. However, to avoid sweeping changes to how Rust is codegen-ed for LLVM, any target must support a flat address space. "Flat" here means an address space that is a superset of all the others.
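To make the flat address space requirement concrete, here is a minimal sketch of the bookkeeping a backend might do. Everything in it is an assumption for illustration: `AddrSpace`, `TargetAddrSpaces`, and `cast_to_flat` are hypothetical names, not `rustc` APIs, and apart from allocas living in address space 5 the numbering is my reading of the LLVM AMDGPU usage guide.

```rust
/// Address spaces for the target triple above. Only "allocas are in 5"
/// comes from this issue; the other numbers are assumed from LLVM's docs.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AddrSpace {
    Flat = 0,
    Global = 1,
    Region = 2,
    Local = 3,
    Constant = 4,
    Private = 5,
}

/// Hypothetical description of what the target machine imposes.
struct TargetAddrSpaces {
    /// The superset address space that ordinary Rust pointers live in.
    flat: AddrSpace,
    /// The address space in which `alloca` results are produced.
    alloca: AddrSpace,
}

impl TargetAddrSpaces {
    fn amdgcn_hsa() -> Self {
        TargetAddrSpaces { flat: AddrSpace::Flat, alloca: AddrSpace::Private }
    }

    /// Returns the (from, to) pair for the address space cast the backend
    /// would have to emit before treating a pointer in `from` as an
    /// ordinary pointer, or `None` if the pointer is already flat.
    fn cast_to_flat(&self, from: AddrSpace) -> Option<(AddrSpace, AddrSpace)> {
        if from == self.flat {
            None
        } else {
            Some((from, self.flat))
        }
    }
}

fn main() {
    let target = TargetAddrSpaces::amdgcn_hsa();
    // A stack slot starts out in the alloca address space and needs a cast.
    println!("{:?}", target.cast_to_flat(target.alloca));
    // A pointer that is already flat needs nothing.
    println!("{:?}", target.cast_to_flat(AddrSpace::Flat));
}
```

The point of the sketch is that, as long as a flat address space exists, the rest of codegen can keep assuming a single kind of pointer and the casts stay local to where non-flat pointers are created.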

`amdgpu-kernel` requires its return type be `void`. There are two ways I see to do this:
* compile-time checks (somewhere in `rustc`), i.e. disallowing any return type except `!` and `()`.
* rewriting returns in an `sret`-like style: promoting the return value to an indirect first argument of the function.

As I recall, Rust inserts wrapper functions for functions declared with `extern "abi"`; the wrapper calls the real Rust ABI function. My current implementation went with the magical rewriting, but I think forcing the user to acknowledge this with an error is better long term.
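For readers unfamiliar with the `sret`-like style, this is roughly what the rewrite amounts to when written out by hand in ordinary Rust. It is only a sketch: `Sum`, `kernel_body`, and `kernel_entry` are made-up names, and the real transformation would happen inside the compiler rather than in user code.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Sum {
    total: u32,
    count: u32,
}

/// What the user writes: a function that returns a value.
fn kernel_body(data: &[u32]) -> Sum {
    Sum {
        total: data.iter().sum(),
        count: data.len() as u32,
    }
}

/// What the rewritten entry point would look like: the return type is `()`
/// and the result is written through an indirect first argument.
fn kernel_entry(out: &mut Sum, data: &[u32]) {
    *out = kernel_body(data);
}

fn main() {
    let mut out = Sum::default();
    kernel_entry(&mut out, &[1, 2, 3]);
    println!("{:?}", out);
}
```

With the compile-time check instead, the user would be made to write something shaped like `kernel_entry` themselves, which makes the extra argument visible at the call site.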

Privately, I've gotten as far as errors stemming from item 4 above (virtual function calls) on general Rust code (i.e. `std`/`core` code). See [this repo/crate](https://github.com/DiamondLovesYou/rust-mir-hsa/tree/master/runtime). In principle, when using HSA, virtual function calls can be supported completely GPU side. `amdgpu-kernel`s have access to two different `hsa_queue_t`s (one for the host and one for the device), set up by the GPU’s hardware command processor. When a virtual call is encountered, the trick is to have the GPU write to its own `hsa_queue_t` and then wait on the completion signal. Foreign functions can be supported in the same way, by writing to the host `hsa_queue_t` instead.
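The enqueue-then-wait pattern can be sketched on the host with plain threads. This is a simulation under loud assumptions, not HSA code: `Packet`, `spawn_processor`, and `call_indirect` are hypothetical, an `mpsc` channel stands in for an `hsa_queue_t`, and an atomic counter that drops from 1 to 0 stands in for the completion signal.

```rust
use std::sync::atomic::{AtomicI64, AtomicU32, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical stand-in for a dispatch packet written to a queue.
struct Packet {
    func: fn(u32) -> u32,
    arg: u32,
    result: Arc<AtomicU32>,
    /// Stand-in for a completion signal: 1 while pending, 0 when done.
    completion: Arc<AtomicI64>,
}

/// Spawns a thread playing the role of whatever services the queue.
fn spawn_processor() -> (mpsc::Sender<Packet>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Packet>();
    let handle = thread::spawn(move || {
        // Runs until every sender has been dropped.
        for packet in rx {
            let value = (packet.func)(packet.arg);
            packet.result.store(value, Ordering::Relaxed);
            // Publish the result, then mark the packet as complete.
            packet.completion.fetch_sub(1, Ordering::Release);
        }
    });
    (tx, handle)
}

/// Delegates a call by pointer: write a packet, then wait on the signal.
fn call_indirect(queue: &mpsc::Sender<Packet>, func: fn(u32) -> u32, arg: u32) -> u32 {
    let result = Arc::new(AtomicU32::new(0));
    let completion = Arc::new(AtomicI64::new(1));
    queue
        .send(Packet {
            func,
            arg,
            result: Arc::clone(&result),
            completion: Arc::clone(&completion),
        })
        .expect("queue processor exited");
    // Busy-wait until the processor signals completion.
    while completion.load(Ordering::Acquire) != 0 {
        std::hint::spin_loop();
    }
    result.load(Ordering::Relaxed)
}

fn double(x: u32) -> u32 {
    x * 2
}

fn main() {
    let (queue, processor) = spawn_processor();
    println!("{}", call_indirect(&queue, double, 21));
    drop(queue); // closing the queue lets the processor thread exit
    processor.join().unwrap();
}
```

Swapping which queue the packet is written to is the only difference between the virtual-call case and the foreign-function case described above; the waiting side looks the same in both.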


#### Post-MVP

TBD(TODO) Discuss?

#### Informational links
- [User Guide for AMDGPU Backend](https://llvm.org/docs/AMDGPUUsage.html)
- [AMD’s HIP Clang branch](https://github.com/RadeonOpenCompute/clang/tree/amd-hip-upstream)



Tracking issue for targeting AMDGPU devices #51575
