Merged
81 changes: 19 additions & 62 deletions sycl/doc/design/OffloadDesign.md
@@ -107,7 +107,7 @@ to be taken.
For example, when an embedded device binary is of the `OFK_SYCL` kind and of
the `spir64_gen` architecture triple, the resulting extracted binary is linked,
post-link processed and converted to SPIR-V before being passed to `ocloc` to
generate the final device binary. Options passed via `--gpu-tool-arg=` will
generate the final device binary. Options passed via `--device-compiler=sycl:spir64_gen-unknown-unknown=<arg>` will
be applied to the `ocloc` step as well.

Binaries generated during the offload compilation will be 'bundled' together
@@ -233,25 +233,32 @@ to create the image.

To support the needed option passing triggered by use of the
`-Xsycl-target-backend` option and implied options based on the optional
device behaviors for AOT compilations for GPU new command line interfaces
device behaviors for AOT compilations for GPU and CPU, new command line interfaces
are needed to pass along this information.

| Target | Triple | Offline Tool | Option for Additional Args |
|--------|---------------|----------------|----------------------------|
| CPU | spir64_x86_64 | opencl-aot | `--cpu-tool-arg=<arg>` |
| GPU | spir64_gen | ocloc | `--gpu-tool-arg=<arg>` |
| FPGA | spir64_fpga | aoc/opencl-aot | `--fpga-tool-arg=<arg>` |
| Target | Triple | Offline Tool | Option for Additional Args |
|--------|---------------|----------------|------------------------------------------------------------------|
| CPU | spir64_x86_64 | opencl-aot | `--device-compiler=sycl:spir64_x86_64-unknown-unknown=<arg>` |
| GPU | spir64_gen | ocloc | `--device-compiler=sycl:spir64_gen-unknown-unknown=<arg>` |

*Table: Ahead of Time Info*
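
As a minimal sketch of the mapping in the table, the option string for an AOT target could be formed as shown here. The helper name `device_compiler_option` and the `AOT_TOOLS` dictionary are hypothetical, and the fixed `-unknown-unknown` suffix is simply copied from the table rows; this is an illustration, not the driver's implementation.

```python
# Offline tool per AOT triple, copied from the table above.
AOT_TOOLS = {
    "spir64_x86_64": "opencl-aot",
    "spir64_gen": "ocloc",
}


def device_compiler_option(triple, backend_args, kind="sycl"):
    """Build a --device-compiler option for a short AOT triple (hypothetical helper)."""
    if triple not in AOT_TOOLS:
        raise ValueError(f"no AOT tool listed for triple {triple!r}")
    # The table spells the full triple with '-unknown-unknown' appended.
    return f"--device-compiler={kind}:{triple}-unknown-unknown={backend_args}"


if __name__ == "__main__":
    print(device_compiler_option("spir64_gen", "-device pvc"))
```

Because the value usually contains spaces, the whole option would need to be quoted as a single argument when passed on a shell command line.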

#### Format of the --device-compiler Option
The `--device-compiler` option uses the format `--device-compiler=[<kind>:][<triple>=]<value>` where:
- `<kind>` : specifies the offloading kind (e.g., sycl, hip, openmp) and is optional.
- `<triple>` : specifies the target triple (e.g., `spir64_gen-unknown-unknown`, `spir64_x86_64-unknown-unknown`) and is optional.
- `<value>` : contains the arguments to be passed to the backend compiler.

In clang-linker-wrapper, the `<kind>` and `<triple>` are matched against the current compilation target. Only arguments that match both the offloading kind and target triple will be passed to the backend compiler. If `<kind>` is not specified, the arguments will match any offloading kind; if `<triple>` is not specified, the arguments will match any target triple; and if neither is specified, the arguments will be applied to all targets.
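
The matching rule described above can be sketched as follows. This is an illustration only, not the code in `clang-linker-wrapper`: the names `parse_device_compiler` and `select_values` are hypothetical, `KNOWN_KINDS` holds just the kinds the text gives as examples, and the test for "is this prefix a triple?" is a simple heuristic assumed here to tell `<triple>=<value>` apart from a value that itself contains `=`.

```python
from typing import NamedTuple, Optional

# Offloading kinds named as examples in the text; the real set may be larger.
KNOWN_KINDS = {"sycl", "hip", "openmp"}


class DeviceCompilerArg(NamedTuple):
    kind: Optional[str]
    triple: Optional[str]
    value: str


def _looks_like_triple(text):
    # Heuristic (assumption): a triple has no spaces and does not start with '-'.
    return bool(text) and " " not in text and not text.startswith("-")


def parse_device_compiler(arg):
    """Split '[<kind>:][<triple>=]<value>' into its three parts."""
    kind = triple = None
    head, sep, tail = arg.partition(":")
    if sep and head in KNOWN_KINDS:
        kind, arg = head, tail
    head, sep, tail = arg.partition("=")
    if sep and _looks_like_triple(head):
        triple, arg = head, tail
    return DeviceCompilerArg(kind, triple, arg)


def select_values(args, current_kind, current_triple):
    """Return the values whose kind and triple match the current target.

    An omitted kind or triple matches anything, as described above.
    """
    selected = []
    for parsed in map(parse_device_compiler, args):
        if parsed.kind not in (None, current_kind):
            continue
        if parsed.triple not in (None, current_triple):
            continue
        selected.append(parsed.value)
    return selected
```

Calling `select_values` with the current kind `"sycl"` and triple `"spir64_gen-unknown-unknown"` keeps values that name that pair or that omit the kind, the triple, or both, and drops values addressed to another kind or triple.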

#### Other Supported Options
To complete the support needed for the various targets using the
`clang-linker-wrapper` as the main interface, a few additional options will
be needed to communicate from the driver to the tool. Further details of usage
are given further below.

| Option Name | Purpose |
|------------------------------|----------------------------------------------|
| `--fpga-link-type=<arg>` | Tells the link step to perform 'early' or 'image' processing to create archives for FPGA |
| `--parallel-link-sycl=<arg>` | Provide the number of parallel jobs that will be used when processing split jobs |

*Table: Additional Options for clang-linker-wrapper*
@@ -282,8 +289,8 @@ list to be passed along.

*Example: spir64_gen enabling options*

> --gpu-tool-arg="-device pvc -options extraopt_pvc"
--gpu-tool-arg="-options -extraopt_skl"
> "--device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options extraopt_pvc"
"--device-compiler=sycl:spir64_gen-unknown-unknown=-options -extraopt_skl"

*Example: clang-linker-wrapper options*

@@ -292,7 +299,7 @@ individually wrapped and linked into the final executable.

Additionally, the syntax can be expanded to enable the ability to pass specific
options to a specific device GPU target for spir64_gen. The syntax will
resemble `--gpu-tool-arg=<arch> <arg>`. This corresponds to the existing
resemble `--device-compiler=sycl:spir64_gen-unknown-unknown=<arch> <arg>`. This corresponds to the existing
option syntax of `-fsycl-targets=intel_gpu_arch` where `arch` can be a fixed
set of targets.
Contributor Author

@YixingZhang007 YixingZhang007 Jan 13, 2026


I am not sure if this is still what we want to support, because currently, the backend compiler arguments for all architectures are passed together through a single --device-compiler= argument. For the example shown earlier in this file, if we have the following:

clang++ -fsycl -fsycl-targets=intel_gpu_skl,spir64_gen \
  -Xsycl-target-backend=spir64_gen "-device pvc -options -extraopt_pvc" \
  -Xsycl-target-backend=intel_gpu_skl "-options -extraopt_skl"

the clang-linker-wrapper command right now looks like:

clang-linker-wrapper ... \
  "--device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options -extraopt_pvc -options -extraopt_skl" ...

clang-linker-wrapper then executes ocloc with both -device pvc -options -extraopt_pvc and -options -extraopt_skl for both PVC and SKL.

If we still want to keep the original proposed solution of separating the arguments for different architectures in clang-linker-wrapper, this will be something we need to implement next.

Contributor


Interesting, shouldn't we call ocloc specifying both pvc and skl as -device options?
What does the old offloading model do in this scenario?
@mdtoguchi, I believe the original design came from you. Could you please comment?

Contributor


I think we need to retain this capability so that specific values can be passed along for each potential arch target. Each individual target arch provided results in a separate ocloc call.

Contributor


But does it make sense to you that we are calling ocloc with options like -device pvc -options -extraopt_pvc -options -extraopt_skl?
Shouldn't it be something like -device pvc -options -extraopt_pvc -device skl -options -extraopt_skl?
Or maybe two calls to ocloc?

Contributor


In other words, it looks like we are calling ocloc to compile for the pvc target only, while the initial clang++ command line asks to compile for two targets: pvc and skl.

Contributor Author

@YixingZhang007 YixingZhang007 Jan 15, 2026


I tried modifying the clang-linker-wrapper with two separate --device-compiler options, one for each architecture, as shown below (right now the arguments for both architectures are passed through a single --device-compiler option):

clang-linker-wrapper ... \
  "--device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options -extraopt_pvc" \
  "--device-compiler=sycl:spir64_gen-unknown-unknown=-device skl -options -extraopt_skl"

The ocloc commands that were invoked are shown below.

ocloc ... -device skl -device_options pvc -device pvc -options -extraopt_pvc -device skl -options -extraopt_skl ...
ocloc ... -device pvc -device_options pvc -device pvc -options -extraopt_pvc -device skl -options -extraopt_skl ...

I think we may still need to implement filtering logic in clang-linker-wrapper so that each --device-compiler option is applied only to its corresponding architecture. @YuriPlyakhin

Contributor


Yes, as we discussed in the meeting, we also need to do more experiments to better understand the implemented behavior of the old offloading model.

Contributor Author

@YixingZhang007 YixingZhang007 Jan 16, 2026


I have looked into the behavior of the old offloading model for multiple devices. The arguments passed to the ocloc command differ between the old and new offloading models.

For example, we run the following clang command with the old offloading model:

clang++ ... -fsycl-targets=intel_gpu_dg1,spir64_gen -Xsycl-target-backend=spir64_gen "-device pvc -options -extraopt_pvc" -Xsycl-target-backend=intel_gpu_dg1 "-options -extraopt_dg1" ...

The ocloc commands run for the old offloading model are:

ocloc ... -device dg1 -device_options pvc ... -options -extraopt_dg1 ...
ocloc ... -device_options pvc -device pvc ... -options -extraopt_pvc -options -extraopt_dg1 ...

@YuriPlyakhin @mdtoguchi I don't think the ocloc commands are correct for the old offloading model, because the backend option that was passed for dg1 is also passed to pvc (however, the options passed to ocloc for dg1 are correct).

Contributor


Hmm, how is -device_options pvc correct for dg1?
If the old offloading model is broken, I guess we can just make the new offloading model work correctly. And we should not break any old offloading model scenarios. So, could we implement something like what I proposed in #21037 (comment)?
And yes, that solution will need additional filtering in clang-linker-wrapper based on the -device ... value.

Contributor


Looking at the behavior when mixing -fsycl-targets=spir64_gen and -fsycl-targets=intel_gpu_dg1 in your example, the driver doesn't seem to distinguish options that, when assigned to spir64_gen, should go only to explicit spir64_gen targets; instead it applies them to all spir64_gen targets. The underlying triple for intel_gpu_dg1 is spir64_gen, so the driver appears to generalize the options at that point and pass the -Xsycl-target-backend=spir64_gen options to all of the related ocloc calls.

Because spir64_gen is more of a 'generic' value, it's not clear to me whether what we are doing is correct or whether we should be more explicit in how option passing is managed.
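
The per-architecture filtering discussed in this thread could be sketched roughly as follows. This is an assumption-laden illustration, not existing `clang-linker-wrapper` behavior: the function names are hypothetical, the value after `-device` is treated as a single architecture name, and a value with no `-device` is assumed to apply to every architecture, which is a policy the thread has not settled.

```python
import shlex


def device_of(value):
    """Return the name following '-device' in an option string, or None."""
    tokens = shlex.split(value)
    for flag, name in zip(tokens, tokens[1:]):
        if flag == "-device":
            return name
    return None


def values_for_arch(values, arch):
    """Keep values that name `arch` via '-device', plus values naming no device.

    Assumption: device-less values are shared by all architectures.
    """
    return [v for v in values if device_of(v) in (None, arch)]
```

With the two values from the experiment above, `values_for_arch(values, "pvc")` keeps only the pvc string and `values_for_arch(values, "skl")` keeps only the skl string, so each ocloc call would receive the options for its own architecture.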


@@ -418,64 +425,14 @@ lists the accepted values.
| GCN GFX12 (RDNA 4) architecture | gfx1200 |
| GCN GFX12 (RDNA 4) architecture | gfx1201 |

#### spir64_fpga support

Compilation behaviors involving AOT for FPGA involve an additional call to
either `aoc` (for Hardware/Simulation) or `opencl-aot` (for Emulation). This
call occurs after the post-link step performed by `sycl-post-link` and the
SPIR-V translation step performed by `llvm-spirv`. Additional options passed
by the user via the `-Xsycl-target-backend=spir64_fpga <opts>` command will be
processed by a new options to the wrapper,
`--fpga-tool-arg=<arg>`

The FPGA target also has support for additional generated binaries that
contain intermediate files specific for FPGA. These binaries (aoco, aocr and
aocx) can reside in archives and are treated differently than traditional
device binaries.

Generation of the AOCR and AOCX type binary is triggered by the command line
option `-fsycl-link`, where `-fsycl-link=image` creates AOCX archives and
`-fsycl-link=early` generates AOCR archives. The files generated by these
options are handled in a specific manner when encountered.

Any archive with an AOCR type device binary will have the AOCR binary
extracted and passed to `aoc` to produce an AOCX final image. This final
image is wrapped and added to the final binary during the host link. The use
of `-fsycl-link=image` with an AOCR binary will create an AOCX based archive
instead of completing the host link. Any archive with an AOCX type device
binary skips the `aoc` step and is wrapped and added to the final binary during
the host link. Archives with any AOCO device binaries are extracted and passed
through to `aoc -library-list=<listfile>`

As the `clang-linker-wrapper` is responsible for understanding the archives
that are added on the command line, it will need to know when to look for
these unique device binaries based on the expected compilation targets. The
behavior of creating the AOCX/AOCR type archive will be triggered via an
additional command line option specified by the driver when `-fsycl-link`
options are used. The `--fpga-link=<type>` option will tell the wrapper when
these handlings need to occur.

When using the `-fintelfpga` option to enable AOT for FPGA, there are
additional expectations during the compilation. Use of the option will enable
debug generation and also generate dependency information. The dependency
generation should be packaged along with the device binary for use during
the link phase. It is expected that the full fat object, containing host,
device and dependency file is generated before being passed to the link phase.
The dependency information is only used when compiling for hardware.

The `clang-linker-wrapper` tool will be responsible to determine which FPGA
tool is being used during the AOT device compilation phase. The use of
`-simulation` or `-hardware` as passed in by `--fpga-tool-arg` signifies
which tool is used.

#### spir64_x86_64 support

Compilation behaviors involving AOT for CPU involve an additional call to
`opencl-aot`. This call occurs after the post-link step performed by
`sycl-post-link` and the SPIR-V translation step performed by `llvm-spirv`.
Additional options passed by the user via the
`-Xsycl-target-backend=spir64_x86_64 <opts>` command will be processed by a new
option to the wrapper, `--cpu-tool-arg=<arg>`
option to the wrapper, `--device-compiler=sycl:spir64_x86_64-unknown-unknown=<arg>`

Similar to SYCL offloading to Intel GPUs using `--offload-arch`, SYCL AOT for Intel CPUs
will also leverage the `--offload-arch` option.