From 108dd3a8502cac9ae90b4b06fdb79d2f7df48bb1 Mon Sep 17 00:00:00 2001 From: Yixing Zhang Date: Mon, 12 Jan 2026 10:00:28 -0800 Subject: [PATCH 1/8] Add initial draft of change --- sycl/doc/design/OffloadDesign.md | 77 ++++++-------------------------- 1 file changed, 13 insertions(+), 64 deletions(-) diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md index 516a2c4b4323c..80fda0e69fbf8 100644 --- a/sycl/doc/design/OffloadDesign.md +++ b/sycl/doc/design/OffloadDesign.md @@ -107,7 +107,7 @@ to be taken. For example, when an embedded device binary is of the `OFK_SYCL` kind and of the `spir64_gen` architecture triple, the resulting extracted binary is linked, post-link processed and converted to SPIR-V before being passed to `ocloc` to -generate the final device binary. Options passed via `--gpu-tool-arg=` will +generate the final device binary. Options passed via `--device-compiler=` will be applied to the `ocloc` step as well. Binaries generated during the offload compilation will be 'bundled' together @@ -238,12 +238,20 @@ are needed to pass along this information. | Target | Triple | Offline Tool | Option for Additional Args | |--------|---------------|----------------|----------------------------| -| CPU | spir64_x86_64 | opencl-aot | `--cpu-tool-arg=` | -| GPU | spir64_gen | ocloc | `--gpu-tool-arg=` | -| FPGA | spir64_fpga | aoc/opencl-aot | `--fpga-tool-arg=` | +| CPU | spir64_x86_64 | opencl-aot | `--device-compiler=sycl:spir64_x86_64-unknown-unknown=` | +| GPU | spir64_gen | ocloc | `--device-compiler=sycl:spir64_gen-unknown-unknown=` | *Table: Ahead of Time Info* +#### Device Compiler Option +The `--device-compiler` option uses the format `--device-compiler=[:][=]` where: + : specifies the offloading kind (e.g., sycl, hip, openmp) and is optional. + : specifies the target triple (e.g., `spir64_gen-unknown-unknown`, `spir64_x86_64-unknown-unknown`) and is optional. + : contains the arguments to be passed to the backend compiler. 
+ +In clang-linker-wrapper, the kind and triple are matched against the current compilation target. Only arguments that match both the offloading kind and target triple will be passed to the appropriate backend compiler (such as ocloc for GPU targets or opencl-aot for CPU targets). If is not specified, the arguments will match any offloading kind; if is not specified, the arguments will match any target triple; and if neither is specified, the arguments will be applied to all targets. + +#### Other Available Options To complete the support needed for the various targets using the `clang-linker-wrapper` as the main interface, a few additional options will be needed to communicate from the driver to the tool. Further details of usage @@ -251,7 +259,6 @@ are given further below. | Option Name | Purpose | |------------------------------|----------------------------------------------| -| `--fpga-link-type=` | Tells the link step to perform 'early' or 'image' processing to create archives for FPGA | | `--parallel-link-sycl=` | Provide the number of parallel jobs that will be used when processing split jobs | *Table: Additional Options for clang-linker-wrapper* @@ -260,7 +267,6 @@ The `clang-linker-wrapper` provides an existing option named `-wrapper-jobs` that may be useful for our usage. #### spir64_gen support - Compilation behaviors involving AOT for GPU involve an additional call to the OpenCL Offline compiler (OCLOC). This call occurs after the post-link step performed by `sycl-post-link` and the SPIR-V translation step which is done @@ -282,20 +288,13 @@ list to be passed along. 
*Example: spir64_gen enabling options* -> --gpu-tool-arg="-device pvc -options extraopt_pvc" ---gpu-tool-arg="-options -extraopt_skl" +> "--device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options -extraopt_pvc -options -extraopt_skl" *Example: clang-linker-wrapper options* Each OCLOC call will be represented as a separate device binary that is individually wrapped and linked into the final executable. -Additionally, the syntax can be expanded to enable the ability to pass specific -options to a specific device GPU target for spir64_gen. The syntax will -resemble `--gpu-tool-arg= `. This corresponds to the existing -option syntax of `-fsycl-targets=intel_gpu_arch` where `arch` can be a fixed -set of targets. - #### --offload-arch For SYCL offloading to Intel GPUs, Intel CPUs, NVidia and AMD GPUs, specify the device architecture using ``--offload-arch`` option. For instance @@ -418,56 +417,6 @@ lists the accepted values. | GCN GFX12 (RDNA 4) architecture | gfx1200 | | GCN GFX12 (RDNA 4) architecture | gfx1201 | -#### spir64_fpga support - -Compilation behaviors involving AOT for FPGA involve an additional call to -either `aoc` (for Hardware/Simulation) or `opencl-aot` (for Emulation). This -call occurs after the post-link step performed by `sycl-post-link` and the -SPIR-V translation step performed by `llvm-spirv`. Additional options passed -by the user via the `-Xsycl-target-backend=spir64_fpga ` command will be -processed by a new options to the wrapper, -`--fpga-tool-arg=` - -The FPGA target also has support for additional generated binaries that -contain intermediate files specific for FPGA. These binaries (aoco, aocr and -aocx) can reside in archives and are treated differently than traditional -device binaries. - -Generation of the AOCR and AOCX type binary is triggered by the command line -option `-fsycl-link`, where `-fsycl-link=image` creates AOCX archives and -`-fsycl-link=early` generates AOCR archives. 
The files generated by these -options are handled in a specific manner when encountered. - -Any archive with an AOCR type device binary will have the AOCR binary -extracted and passed to `aoc` to produce an AOCX final image. This final -image is wrapped and added to the final binary during the host link. The use -of `-fsycl-link=image` with an AOCR binary will create an AOCX based archive -instead of completing the host link. Any archive with an AOCX type device -binary skips the `aoc` step and is wrapped and added to the final binary during -the host link. Archives with any AOCO device binaries are extracted and passed -through to `aoc -library-list=` - -As the `clang-linker-wrapper` is responsible for understanding the archives -that are added on the command line, it will need to know when to look for -these unique device binaries based on the expected compilation targets. The -behavior of creating the AOCX/AOCR type archive will be triggered via an -additional command line option specified by the driver when `-fsycl-link` -options are used. The `--fpga-link=` option will tell the wrapper when -these handlings need to occur. - -When using the `-fintelfpga` option to enable AOT for FPGA, there are -additional expectations during the compilation. Use of the option will enable -debug generation and also generate dependency information. The dependency -generation should be packaged along with the device binary for use during -the link phase. It is expected that the full fat object, containing host, -device and dependency file is generated before being passed to the link phase. -The dependency information is only used when compiling for hardware. - -The `clang-linker-wrapper` tool will be responsible to determine which FPGA -tool is being used during the AOT device compilation phase. The use of -`-simulation` or `-hardware` as passed in by `--fpga-tool-arg` signifies -which tool is used. 
- #### spir64_x86_64 support Compilation behaviors involving AOT for CPU involve an additional call to From 2fb7eec98d4505e29e3cb4f220c9ae980fa56cf2 Mon Sep 17 00:00:00 2001 From: Yixing Zhang Date: Tue, 13 Jan 2026 13:44:55 -0800 Subject: [PATCH 2/8] update the description for AOT compilation --- sycl/doc/design/OffloadDesign.md | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md index 80fda0e69fbf8..d90a91479c042 100644 --- a/sycl/doc/design/OffloadDesign.md +++ b/sycl/doc/design/OffloadDesign.md @@ -233,7 +233,7 @@ to create the image. To support the needed option passing triggered by use of the `-Xsycl-target-backend` option and implied options based on the optional -device behaviors for AOT compilations for GPU new command line interfaces +device behaviors for AOT compilations for GPU and CPU, new command line interfaces are needed to pass along this information. | Target | Triple | Offline Tool | Option for Additional Args | @@ -243,13 +243,13 @@ are needed to pass along this information. *Table: Ahead of Time Info* -#### Device Compiler Option +#### Format of the --device-compiler Option The `--device-compiler` option uses the format `--device-compiler=[:][=]` where: - : specifies the offloading kind (e.g., sycl, hip, openmp) and is optional. - : specifies the target triple (e.g., `spir64_gen-unknown-unknown`, `spir64_x86_64-unknown-unknown`) and is optional. - : contains the arguments to be passed to the backend compiler. +- `` : specifies the offloading kind (e.g., sycl, hip, openmp) and is optional. +- `` : specifies the target triple (e.g., `spir64_gen-unknown-unknown`, `spir64_x86_64-unknown-unknown`) and is optional. +- `` : contains the arguments to be passed to the backend compiler. -In clang-linker-wrapper, the kind and triple are matched against the current compilation target. 
Only arguments that match both the offloading kind and target triple will be passed to the appropriate backend compiler (such as ocloc for GPU targets or opencl-aot for CPU targets). If is not specified, the arguments will match any offloading kind; if is not specified, the arguments will match any target triple; and if neither is specified, the arguments will be applied to all targets. +In clang-linker-wrapper, the `` and `` are matched against the current compilation target. Only arguments that match both the offloading kind and target triple will be passed to the backend compiler. If `` is not specified, the arguments will match any offloading kind; if `` is not specified, the arguments will match any target triple; and if neither is specified, the arguments will be applied to all targets. #### Other Available Options To complete the support needed for the various targets using the @@ -267,6 +267,7 @@ The `clang-linker-wrapper` provides an existing option named `-wrapper-jobs` that may be useful for our usage. #### spir64_gen support + Compilation behaviors involving AOT for GPU involve an additional call to the OpenCL Offline compiler (OCLOC). This call occurs after the post-link step performed by `sycl-post-link` and the SPIR-V translation step which is done @@ -288,13 +289,20 @@ list to be passed along. *Example: spir64_gen enabling options* -> "--device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options -extraopt_pvc -options -extraopt_skl" +> "--device-compiler=sycl:spir64_gen-unknown-unknown= -device pvc -options extraopt_pvc" +"--device-compiler=sycl:spir64_gen-unknown-unknown= -options -extraopt_skl" *Example: clang-linker-wrapper options* Each OCLOC call will be represented as a separate device binary that is individually wrapped and linked into the final executable. +Additionally, the syntax can be expanded to enable the ability to pass specific +options to a specific device GPU target for spir64_gen. 
The syntax will +resemble `--device-compiler=sycl:spir64_gen-unknown-unknown== `. This corresponds to the existing +option syntax of `-fsycl-targets=intel_gpu_arch` where `arch` can be a fixed +set of targets. + #### --offload-arch For SYCL offloading to Intel GPUs, Intel CPUs, NVidia and AMD GPUs, specify the device architecture using ``--offload-arch`` option. For instance @@ -424,7 +432,7 @@ Compilation behaviors involving AOT for CPU involve an additional call to `sycl-post-link` and the SPIR-V translation step performed by `llvm-spirv`. Additional options passed by the user via the `-Xsycl-target-backend=spir64_x86_64 ` command will be processed by a new -option to the wrapper, `--cpu-tool-arg=` +option to the wrapper, `--device-compiler=sycl:spir64_gen-unknown-unknown=` Similar to SYCL offloading to Intel GPUs using `--offload-arch`, SYCL AOT for Intel CPUs will also leverage the `--offload-arch` option. From 79be1c1deaa3aafa7c907b48437cbf44ef9d0d10 Mon Sep 17 00:00:00 2001 From: Yixing Zhang Date: Tue, 13 Jan 2026 14:15:42 -0800 Subject: [PATCH 3/8] code clean up --- sycl/doc/design/OffloadDesign.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md index d90a91479c042..2317cdb3774a9 100644 --- a/sycl/doc/design/OffloadDesign.md +++ b/sycl/doc/design/OffloadDesign.md @@ -251,7 +251,7 @@ The `--device-compiler` option uses the format `--device-compiler=[:][` and `` are matched against the current compilation target. Only arguments that match both the offloading kind and target triple will be passed to the backend compiler. If `` is not specified, the arguments will match any offloading kind; if `` is not specified, the arguments will match any target triple; and if neither is specified, the arguments will be applied to all targets. 
-#### Other Available Options +#### Other Supported Options To complete the support needed for the various targets using the `clang-linker-wrapper` as the main interface, a few additional options will be needed to communicate from the driver to the tool. Further details of usage @@ -289,8 +289,8 @@ list to be passed along. *Example: spir64_gen enabling options* -> "--device-compiler=sycl:spir64_gen-unknown-unknown= -device pvc -options extraopt_pvc" -"--device-compiler=sycl:spir64_gen-unknown-unknown= -options -extraopt_skl" +> "--device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options extraopt_pvc" +"--device-compiler=sycl:spir64_gen-unknown-unknown=-options -extraopt_skl" *Example: clang-linker-wrapper options* @@ -432,7 +432,7 @@ Compilation behaviors involving AOT for CPU involve an additional call to `sycl-post-link` and the SPIR-V translation step performed by `llvm-spirv`. Additional options passed by the user via the `-Xsycl-target-backend=spir64_x86_64 ` command will be processed by a new -option to the wrapper, `--device-compiler=sycl:spir64_gen-unknown-unknown=` +option to the wrapper, `--device-compiler=sycl:spir64_x86_64-unknown-unknown=` Similar to SYCL offloading to Intel GPUs using `--offload-arch`, SYCL AOT for Intel CPUs will also leverage the `--offload-arch` option. From ece55ed64db7791f7608bc4d1a7f58d4a2adcf33 Mon Sep 17 00:00:00 2001 From: Yixing Zhang Date: Tue, 13 Jan 2026 14:50:53 -0800 Subject: [PATCH 4/8] code clean up --- sycl/doc/design/OffloadDesign.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md index 2317cdb3774a9..b84e97158bb54 100644 --- a/sycl/doc/design/OffloadDesign.md +++ b/sycl/doc/design/OffloadDesign.md @@ -236,10 +236,10 @@ To support the needed option passing triggered by use of the device behaviors for AOT compilations for GPU and CPU, new command line interfaces are needed to pass along this information. 
-| Target | Triple | Offline Tool | Option for Additional Args | -|--------|---------------|----------------|----------------------------| +| Target | Triple | Offline Tool | Option for Additional Args | +|--------|---------------|----------------|------------------------------------------------------------------| | CPU | spir64_x86_64 | opencl-aot | `--device-compiler=sycl:spir64_x86_64-unknown-unknown=` | -| GPU | spir64_gen | ocloc | `--device-compiler=sycl:spir64_gen-unknown-unknown=` | +| GPU | spir64_gen | ocloc | `--device-compiler=sycl:spir64_gen-unknown-unknown=` | *Table: Ahead of Time Info* From 9314eb92c702a9fb7956780a966e9f6a3c651800 Mon Sep 17 00:00:00 2001 From: Yixing Zhang Date: Wed, 14 Jan 2026 09:09:22 -0800 Subject: [PATCH 5/8] resolve comment --- sycl/doc/design/OffloadDesign.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md index b84e97158bb54..2f2a2905ce475 100644 --- a/sycl/doc/design/OffloadDesign.md +++ b/sycl/doc/design/OffloadDesign.md @@ -107,7 +107,7 @@ to be taken. For example, when an embedded device binary is of the `OFK_SYCL` kind and of the `spir64_gen` architecture triple, the resulting extracted binary is linked, post-link processed and converted to SPIR-V before being passed to `ocloc` to -generate the final device binary. Options passed via `--device-compiler=` will +generate the final device binary. Options passed via `--device-compiler=sycl:spir64_gen-unknown-unknown=` will be applied to the `ocloc` step as well. 
Binaries generated during the offload compilation will be 'bundled' together From 008d3a2253f32384902e9d76ec7bac8269471d6e Mon Sep 17 00:00:00 2001 From: "yixing.zhang" Date: Wed, 4 Feb 2026 23:41:16 +0100 Subject: [PATCH 6/8] update the description for --device-compiler --- sycl/doc/design/OffloadDesign.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md index 2f2a2905ce475..89ec86dd5f3eb 100644 --- a/sycl/doc/design/OffloadDesign.md +++ b/sycl/doc/design/OffloadDesign.md @@ -107,7 +107,7 @@ to be taken. For example, when an embedded device binary is of the `OFK_SYCL` kind and of the `spir64_gen` architecture triple, the resulting extracted binary is linked, post-link processed and converted to SPIR-V before being passed to `ocloc` to -generate the final device binary. Options passed via `--device-compiler=sycl:spir64_gen-unknown-unknown=` will +generate the final device binary. Options passed via `--device--device ` will be applied to the `ocloc` step as well. Binaries generated during the offload compilation will be 'bundled' together @@ -239,7 +239,7 @@ are needed to pass along this information. | Target | Triple | Offline Tool | Option for Additional Args | |--------|---------------|----------------|------------------------------------------------------------------| | CPU | spir64_x86_64 | opencl-aot | `--device-compiler=sycl:spir64_x86_64-unknown-unknown=` | -| GPU | spir64_gen | ocloc | `--device-compiler=sycl:spir64_gen-unknown-unknown=` | +| GPU | spir64_gen | ocloc | `--device-compiler=sycl:spir64_gen-unknown-unknown=-device ` | *Table: Ahead of Time Info* @@ -289,8 +289,8 @@ list to be passed along. 
*Example: spir64_gen enabling options* -> "--device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options extraopt_pvc" -"--device-compiler=sycl:spir64_gen-unknown-unknown=-options -extraopt_skl" +> "--device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options -extraopt_pvc" +"--device-compiler=sycl:spir64_gen-unknown-unknown=-device skl -options -extraopt_skl" *Example: clang-linker-wrapper options* @@ -299,7 +299,7 @@ individually wrapped and linked into the final executable. Additionally, the syntax can be expanded to enable the ability to pass specific options to a specific device GPU target for spir64_gen. The syntax will -resemble `--device-compiler=sycl:spir64_gen-unknown-unknown== `. This corresponds to the existing +resemble `--device-compiler=sycl:spir64_gen-unknown-unknown==-device `. This corresponds to the existing option syntax of `-fsycl-targets=intel_gpu_arch` where `arch` can be a fixed set of targets. From f27d7c531d7bb0acbaf07fca60a53d8efdd85257 Mon Sep 17 00:00:00 2001 From: "yixing.zhang" Date: Tue, 7 Apr 2026 22:01:02 +0200 Subject: [PATCH 7/8] remove the doc update for passing for multiple device architecture --- sycl/doc/design/OffloadDesign.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md index 89ec86dd5f3eb..986815d13f2f5 100644 --- a/sycl/doc/design/OffloadDesign.md +++ b/sycl/doc/design/OffloadDesign.md @@ -107,7 +107,7 @@ to be taken. For example, when an embedded device binary is of the `OFK_SYCL` kind and of the `spir64_gen` architecture triple, the resulting extracted binary is linked, post-link processed and converted to SPIR-V before being passed to `ocloc` to -generate the final device binary. Options passed via `--device--device ` will +generate the final device binary. Options passed via `--device-compiler=sycl:spir64_gen-unknown-unknown=` will be applied to the `ocloc` step as well. 
Binaries generated during the offload compilation will be 'bundled' together @@ -239,7 +239,7 @@ are needed to pass along this information. | Target | Triple | Offline Tool | Option for Additional Args | |--------|---------------|----------------|------------------------------------------------------------------| | CPU | spir64_x86_64 | opencl-aot | `--device-compiler=sycl:spir64_x86_64-unknown-unknown=` | -| GPU | spir64_gen | ocloc | `--device-compiler=sycl:spir64_gen-unknown-unknown=-device ` | +| GPU | spir64_gen | ocloc | `--device-compiler=sycl:spir64_gen-unknown-unknown=` | *Table: Ahead of Time Info* @@ -289,8 +289,8 @@ list to be passed along. *Example: spir64_gen enabling options* -> "--device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options -extraopt_pvc" -"--device-compiler=sycl:spir64_gen-unknown-unknown=-device skl -options -extraopt_skl" +> "--device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options extraopt_pvc" +"--device-compiler=sycl:spir64_gen-unknown-unknown=-options -extraopt_skl" *Example: clang-linker-wrapper options* From bb81be692cf45c13db0078ec0c51682828e748ce Mon Sep 17 00:00:00 2001 From: "yixing.zhang" Date: Tue, 7 Apr 2026 22:10:43 +0200 Subject: [PATCH 8/8] fix bug --- sycl/doc/design/OffloadDesign.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md index 986815d13f2f5..6d64a53b4e4e6 100644 --- a/sycl/doc/design/OffloadDesign.md +++ b/sycl/doc/design/OffloadDesign.md @@ -299,7 +299,7 @@ individually wrapped and linked into the final executable. Additionally, the syntax can be expanded to enable the ability to pass specific options to a specific device GPU target for spir64_gen. The syntax will -resemble `--device-compiler=sycl:spir64_gen-unknown-unknown==-device `. This corresponds to the existing +resemble `--device-compiler=sycl:spir64_gen-unknown-unknown= `. 
This corresponds to the existing option syntax of `-fsycl-targets=intel_gpu_arch` where `arch` can be a fixed set of targets.
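
Note on the `--device-compiler` matching rule described in this series: the sketch below illustrates the stated semantics and is not the `clang-linker-wrapper` implementation. It assumes the option value has the shape `[<kind>:][<triple>=]<args>`; the placeholder names are a reading of the description, since the diff text only says that a kind, a triple, and the backend arguments are involved. The parsing heuristics (a fixed set of known kinds, and treating a prefix before `=` as a triple only when it has no whitespace and does not start with `-`) are assumptions made for illustration, as are the names `DeviceCompilerSpec`, `parse_device_compiler`, and `args_for_target`. The `--example-*` flags are hypothetical.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional

# Assumption: only the offloading kinds named as examples in the design text.
KNOWN_KINDS = {"sycl", "hip", "openmp"}


@dataclass(frozen=True)
class DeviceCompilerSpec:
    kind: Optional[str]    # None means "matches any offloading kind"
    triple: Optional[str]  # None means "matches any target triple"
    args: str              # raw argument string for the backend compiler


def parse_device_compiler(value: str) -> DeviceCompilerSpec:
    """Split one --device-compiler= value into (kind, triple, args)."""
    kind = None
    head, sep, rest = value.partition(":")
    if sep and head in KNOWN_KINDS:
        kind, value = head, rest

    triple = None
    head, sep, rest = value.partition("=")
    looks_like_triple = (
        bool(sep)
        and bool(head)
        and not head.startswith("-")
        and not any(ch.isspace() for ch in head)
    )
    if looks_like_triple:
        triple, value = head, rest

    return DeviceCompilerSpec(kind, triple, value)


def args_for_target(values: List[str], kind: str, triple: str) -> List[str]:
    """Collect backend arguments whose kind and triple both match the target.

    An unspecified kind or triple acts as a wildcard, so a value with neither
    applies to every target.
    """
    collected: List[str] = []
    for value in values:
        spec = parse_device_compiler(value)
        if spec.kind is not None and spec.kind != kind:
            continue
        if spec.triple is not None and spec.triple != triple:
            continue
        collected.extend(shlex.split(spec.args))
    return collected


if __name__ == "__main__":
    options = [
        "sycl:spir64_gen-unknown-unknown=-device pvc -options -extraopt_pvc",
        "sycl:spir64_x86_64-unknown-unknown=--example-cpu-flag",  # hypothetical flag
        "--example-common-flag",                                  # hypothetical flag
    ]
    print(args_for_target(options, "sycl", "spir64_gen-unknown-unknown"))
    print(args_for_target(options, "sycl", "spir64_x86_64-unknown-unknown"))
```

Under these assumptions, the GPU target receives the `ocloc` arguments plus the unscoped flag, while the CPU target receives only its own scoped flag plus the unscoped one. A real implementation would more likely validate the triple with LLVM's own triple parsing than rely on the string heuristic used here.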