[SYCL][Doc] Update --device-compiler option and remove FPGA support from OffloadDesign.md #21037
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -107,7 +107,7 @@ to be taken. | |
| For example, when an embedded device binary is of the `OFK_SYCL` kind and of | ||
| the `spir64_gen` architecture triple, the resulting extracted binary is linked, | ||
| post-link processed and converted to SPIR-V before being passed to `ocloc` to | ||
| generate the final device binary. Options passed via `--gpu-tool-arg=` will | ||
| generate the final device binary. Options passed via `--device-compiler=sycl:spir64_gen-unknown-unknown=<arg>` will | ||
| be applied to the `ocloc` step as well. | ||
|
|
||
| Binaries generated during the offload compilation will be 'bundled' together | ||
|
|
@@ -233,25 +233,32 @@ to create the image. | |
|
|
||
| To support the needed option passing triggered by use of the | ||
| `-Xsycl-target-backend` option and implied options based on the optional | ||
| device behaviors for AOT compilations for GPU new command line interfaces | ||
| device behaviors for AOT compilations for GPU and CPU, new command line interfaces | ||
| are needed to pass along this information. | ||
|
|
||
| | Target | Triple | Offline Tool | Option for Additional Args | | ||
| |--------|---------------|----------------|----------------------------| | ||
| | CPU | spir64_x86_64 | opencl-aot | `--cpu-tool-arg=<arg>` | | ||
| | GPU | spir64_gen | ocloc | `--gpu-tool-arg=<arg>` | | ||
| | FPGA | spir64_fpga | aoc/opencl-aot | `--fpga-tool-arg=<arg>` | | ||
| | Target | Triple | Offline Tool | Option for Additional Args | | ||
| |--------|---------------|----------------|------------------------------------------------------------------| | ||
| | CPU | spir64_x86_64 | opencl-aot | `--device-compiler=sycl:spir64_x86_64-unknown-unknown=<arg>` | | ||
| | GPU | spir64_gen | ocloc | `--device-compiler=sycl:spir64_gen-unknown-unknown=<arg>` | | ||
|
|
||
| *Table: Ahead of Time Info* | ||
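
As a rough illustration of the mapping in the updated table, the sketch below turns a user-level `-Xsycl-target-backend=<target> <opts>` pair into the wrapper option. The function name `to_device_compiler` and the `AOT_TOOLS` dictionary are hypothetical and exist only for this example; the real driver logic may differ.

```python
# Minimal sketch, not the driver's actual code.
# AOT_TOOLS mirrors the "Ahead of Time Info" table; the names are hypothetical.
AOT_TOOLS = {
    "spir64_x86_64": "opencl-aot",  # CPU
    "spir64_gen": "ocloc",          # GPU
}


def to_device_compiler(target: str, opts: str) -> str:
    """Build the --device-compiler option for an AOT target and its options."""
    if target not in AOT_TOOLS:
        # FPGA (spir64_fpga) is no longer listed in the table.
        raise ValueError(f"no AOT tool listed for target {target!r}")
    # Assumption: the full triple is the short name plus '-unknown-unknown',
    # as written in the table's option column.
    return f"--device-compiler=sycl:{target}-unknown-unknown={opts}"


if __name__ == "__main__":
    print(to_device_compiler("spir64_gen", "-device pvc"))
```

The option string is assembled exactly as the table spells it, so the offloading kind is fixed to `sycl` here.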
|
|
||
| #### Format of the --device-compiler Option | ||
| The `--device-compiler` option uses the format `--device-compiler=[<kind>:][<triple>=]<value>` where: | ||
| - `<kind>` : specifies the offloading kind (e.g., sycl, hip, openmp) and is optional. | ||
| - `<triple>` : specifies the target triple (e.g., `spir64_gen-unknown-unknown`, `spir64_x86_64-unknown-unknown`) and is optional. | ||
| - `<value>` : contains the arguments to be passed to the backend compiler. | ||
|
|
||
| In clang-linker-wrapper, the `<kind>` and `<triple>` are matched against the current compilation target. Only arguments that match both the offloading kind and target triple will be passed to the backend compiler. If `<kind>` is not specified, the arguments will match any offloading kind; if `<triple>` is not specified, the arguments will match any target triple; and if neither is specified, the arguments will be applied to all targets. | ||
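
The matching rules above can be sketched as follows. This is a simplified model, not the `clang-linker-wrapper` implementation: the names `DeviceCompilerArg`, `parse_device_compiler`, and `select_backend_args` are hypothetical, the set of kinds is taken from the examples in the text, and the parser assumes that a triple never starts with `-` and contains no spaces.

```python
from typing import List, NamedTuple, Optional

# Kinds named as examples in the text; assumed to be the full set for this sketch.
KNOWN_KINDS = {"sycl", "hip", "openmp"}


class DeviceCompilerArg(NamedTuple):
    kind: Optional[str]    # None means "any offloading kind"
    triple: Optional[str]  # None means "any target triple"
    value: str             # arguments for the backend compiler


def parse_device_compiler(arg: str) -> DeviceCompilerArg:
    """Parse --device-compiler=[<kind>:][<triple>=]<value>."""
    prefix = "--device-compiler="
    if not arg.startswith(prefix):
        raise ValueError(f"not a --device-compiler option: {arg!r}")
    rest = arg[len(prefix):]

    kind = None
    head, sep, tail = rest.partition(":")
    if sep and head in KNOWN_KINDS:
        kind, rest = head, tail

    triple = None
    head, sep, tail = rest.partition("=")
    # Assumption: anything starting with '-' or containing a space is part of
    # the value, not a triple.
    if sep and head and not head.startswith("-") and " " not in head:
        triple, rest = head, tail

    return DeviceCompilerArg(kind, triple, rest)


def select_backend_args(args: List[str], kind: str, triple: str) -> List[str]:
    """Return the values whose kind and triple match the current target."""
    selected = []
    for parsed in map(parse_device_compiler, args):
        kind_ok = parsed.kind is None or parsed.kind == kind
        triple_ok = parsed.triple is None or parsed.triple == triple
        if kind_ok and triple_ok:
            selected.append(parsed.value)
    return selected
```

In this model an option with neither `<kind>` nor `<triple>` is selected for every target, while an option naming `sycl:spir64_gen-unknown-unknown` is selected only when both fields match the current compilation target.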
|
|
||
| #### Other Supported Options | ||
| To complete the support needed for the various targets using the | ||
| `clang-linker-wrapper` as the main interface, a few additional options will | ||
| be needed to communicate from the driver to the tool. Further details of usage | ||
| are given further below. | ||
|
|
||
| | Option Name | Purpose | | ||
| |------------------------------|----------------------------------------------| | ||
| | `--fpga-link-type=<arg>` | Tells the link step to perform 'early' or 'image' processing to create archives for FPGA | | ||
| | `--parallel-link-sycl=<arg>` | Provide the number of parallel jobs that will be used when processing split jobs | | ||
|
|
||
| *Table: Additional Options for clang-linker-wrapper* | ||
|
|
@@ -282,8 +289,8 @@ list to be passed along. | |
|
|
||
| *Example: spir64_gen enabling options* | ||
|
|
||
| > --gpu-tool-arg="-device pvc -options extraopt_pvc" | ||
| --gpu-tool-arg="-options -extraopt_skl" | ||
| > "--device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options extraopt_pvc" | ||
| "--device-compiler=sycl:spir64_gen-unknown-unknown=-options -extraopt_skl" | ||
|
|
||
| *Example: clang-linker-wrapper options* | ||
|
|
||
|
|
@@ -292,7 +299,7 @@ individually wrapped and linked into the final executable. | |
|
|
||
| Additionally, the syntax can be expanded to enable the ability to pass specific | ||
| options to a specific device GPU target for spir64_gen. The syntax will | ||
| resemble `--gpu-tool-arg=<arch> <arg>`. This corresponds to the existing | ||
| resemble `--device-compiler=sycl:spir64_gen-unknown-unknown=<arch> <arg>`. This corresponds to the existing | ||
| option syntax of `-fsycl-targets=intel_gpu_arch` where `arch` can be a fixed | ||
| set of targets. | ||
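
One possible reading of the per-architecture syntax is sketched below. It assumes that a value beginning with `-device <arch>` applies only to that architecture and that a value without a leading `-device` applies to every architecture. That interpretation is an assumption made for illustration; the function name `args_for_arch` is hypothetical, and the review thread on this hunk shows the intended behavior is still under discussion.

```python
import shlex
from typing import List


def args_for_arch(values: List[str], arch: str) -> List[str]:
    """Collect backend arguments that apply to one GPU architecture.

    Assumption: a value starting with '-device <arch>' is arch-specific;
    any other value is shared by all architectures.
    """
    selected: List[str] = []
    for value in values:
        tokens = shlex.split(value)
        if len(tokens) >= 2 and tokens[0] == "-device":
            if tokens[1] != arch:
                continue  # meant for a different architecture
            tokens = tokens[2:]  # drop the '-device <arch>' selector
        selected.extend(tokens)
    return selected


if __name__ == "__main__":
    values = ["-device pvc -options extraopt_pvc", "-options -extraopt_skl"]
    for arch in ("pvc", "skl"):
        print(arch, args_for_arch(values, arch))
```

Under this reading, each architecture gets its own filtered argument list, which fits the model of one `ocloc` call per target architecture.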
|
Contributor
Author
I am not sure if this is still what we want to support, because currently, the backend compiler arguments for all architectures are passed together through a single the clang-linker-wrapper command right now looks like: Then in If we still want to keep the original proposed solution of separating the arguments for different architectures in
Contributor
Interesting, shouldn't we call ocloc specifying both pvc and skl as
Contributor
I think we need to retain this capability to allow for passing along specific values for each potential arch target. Each individual target arch provided performs a separate ocloc call.
Contributor
But does it make sense to you that we are calling ocloc with such options?
Contributor
In other words, it looks like we are calling ocloc to compile for the pvc target, while the initial clang++ command line asks to compile for two targets: pvc and skl.
Contributor
Author
I tried modifying the The ocloc commands that got called are shown below. I think we may still need to implement filtering logic in
Contributor
Yes, as we discussed in the meeting, we also need to run more experiments to better understand the implemented behavior of the old offloading model.
Contributor
Author
I have looked into the behavior of the old offloading model for multiple devices. The arguments passed to the ocloc command differ between the old and new offloading models. For example, we run the following clang command with the old offloading model: The ocloc commands run for the old offloading model are: @YuriPlyakhin @mdtoguchi I don't think the ocloc commands are correct for the old offloading model, because the backend option that was passed for dg1 is also passed to pvc (however, the options passed to ocloc for dg1 are correct).
Contributor
hmm, how is
Contributor
Looking at the behaviors with when mixing Due to the fact that
||
|
|
||
|
|
@@ -418,64 +425,14 @@ lists the accepted values. | |
| | GCN GFX12 (RDNA 4) architecture | gfx1200 | | ||
| | GCN GFX12 (RDNA 4) architecture | gfx1201 | | ||
|
|
||
| #### spir64_fpga support | ||
|
|
||
| Compilation behaviors involving AOT for FPGA involve an additional call to | ||
| either `aoc` (for Hardware/Simulation) or `opencl-aot` (for Emulation). This | ||
| call occurs after the post-link step performed by `sycl-post-link` and the | ||
| SPIR-V translation step performed by `llvm-spirv`. Additional options passed | ||
| by the user via the `-Xsycl-target-backend=spir64_fpga <opts>` command will be | ||
| processed by a new options to the wrapper, | ||
| `--fpga-tool-arg=<arg>` | ||
|
|
||
| The FPGA target also has support for additional generated binaries that | ||
| contain intermediate files specific for FPGA. These binaries (aoco, aocr and | ||
| aocx) can reside in archives and are treated differently than traditional | ||
| device binaries. | ||
|
|
||
| Generation of the AOCR and AOCX type binary is triggered by the command line | ||
| option `-fsycl-link`, where `-fsycl-link=image` creates AOCX archives and | ||
| `-fsycl-link=early` generates AOCR archives. The files generated by these | ||
| options are handled in a specific manner when encountered. | ||
|
|
||
| Any archive with an AOCR type device binary will have the AOCR binary | ||
| extracted and passed to `aoc` to produce an AOCX final image. This final | ||
| image is wrapped and added to the final binary during the host link. The use | ||
| of `-fsycl-link=image` with an AOCR binary will create an AOCX based archive | ||
| instead of completing the host link. Any archive with an AOCX type device | ||
| binary skips the `aoc` step and is wrapped and added to the final binary during | ||
| the host link. Archives with any AOCO device binaries are extracted and passed | ||
| through to `aoc -library-list=<listfile>` | ||
|
|
||
| As the `clang-linker-wrapper` is responsible for understanding the archives | ||
| that are added on the command line, it will need to know when to look for | ||
| these unique device binaries based on the expected compilation targets. The | ||
| behavior of creating the AOCX/AOCR type archive will be triggered via an | ||
| additional command line option specified by the driver when `-fsycl-link` | ||
| options are used. The `--fpga-link=<type>` option will tell the wrapper when | ||
| these handlings need to occur. | ||
|
|
||
| When using the `-fintelfpga` option to enable AOT for FPGA, there are | ||
| additional expectations during the compilation. Use of the option will enable | ||
| debug generation and also generate dependency information. The dependency | ||
| generation should be packaged along with the device binary for use during | ||
| the link phase. It is expected that the full fat object, containing host, | ||
| device and dependency file is generated before being passed to the link phase. | ||
| The dependency information is only used when compiling for hardware. | ||
|
|
||
| The `clang-linker-wrapper` tool will be responsible to determine which FPGA | ||
| tool is being used during the AOT device compilation phase. The use of | ||
| `-simulation` or `-hardware` as passed in by `--fpga-tool-arg` signifies | ||
| which tool is used. | ||
|
|
||
| #### spir64_x86_64 support | ||
|
|
||
| Compilation behaviors involving AOT for CPU involve an additional call to | ||
| `opencl-aot`. This call occurs after the post-link step performed by | ||
| `sycl-post-link` and the SPIR-V translation step performed by `llvm-spirv`. | ||
| Additional options passed by the user via the | ||
| `-Xsycl-target-backend=spir64_x86_64 <opts>` command will be processed by a new | ||
| option to the wrapper, `--cpu-tool-arg=<arg>` | ||
| option to the wrapper, `--device-compiler=sycl:spir64_x86_64-unknown-unknown=<arg>` | ||
|
|
||
| Similar to SYCL offloading to Intel GPUs using `--offload-arch`, SYCL AOT for Intel CPUs | ||
| will also leverage the `--offload-arch` option. | ||
|
|
||