Skip to content

[NewOffloadModel] Add support for passing backend options for multiple device architectures#21140

Open
YixingZhang007 wants to merge 21 commits intointel:syclfrom
YixingZhang007:modify_device_compiler_opt
Open

[NewOffloadModel] Add support for passing backend options for multiple device architectures#21140
YixingZhang007 wants to merge 21 commits intointel:syclfrom
YixingZhang007:modify_device_compiler_opt

Conversation

@YixingZhang007
Copy link
Copy Markdown
Contributor

@YixingZhang007 YixingZhang007 commented Jan 27, 2026

This patch adds the support of passing different backend options through -Xsycl-target-backend for each of multiple device architectures.
An example is like the following, where we are passing different options for pvc and dg1.

clang++ ... -fsycl-targets=intel_gpu_dg1,spir64_gen \
  -Xsycl-target-backend=spir64_gen "-device pvc -options -extraopt_pvc" \
  -Xsycl-target-backend=intel_gpu_dg1 "-options -extraopt_dg1" ...

The following changes are made to support this feature:

  1. In SYCL.cpp, add a new function extractDeviceFromArg for extracting the device from the backend option when the SYCL target is spir64_gen. This allows finding the device architecture from arguments such as: -Xsycl-target-backend=spir64_gen "-device pvc -options -extraopt_pvc"
  2. In Clang.cpp, we construct a --device-compiler=... argument to pass into clang-linker-wrapper for each SYCL target. The --device-compiler contains a -device <arch> to differentiate the device that the backend option is specified for. When the SYCL target is spir64_gen (not intel_gpu_arch), we call extractDeviceFromArg to extract the device architecture.
  3. clang-linker-wrapper is updated to filter OPT_device_compiler_args_EQ based on the architecture, and specify each architecture when calling sycl-post-link.
  4. sycl-post-link is updated to support multiple architectures being passed in, such that cases like: -Xsycl-target-backend=spir64_gen "-device arch1,arch2"
  5. Added a new test in sycl-offload-new-driver.cpp to verify the changes.
  6. A few SYCL E2E tests are updated because, when compiling AOT with
    -fsycl-targets=spir64_gen and -Xsycl-target-backend=spir64_gen "-device arch",
    the compilation now targets only the specified architecture, rather than GPUs in general.

@YixingZhang007
Copy link
Copy Markdown
Contributor Author

YixingZhang007 commented Feb 17, 2026

The current SYCL E2E test failure in CI (KernelAndProgram/kernel-bundle-merge-options-env.cpp) is expected and will be resolved once PR #21055 is merged.

// since not all the information for all the targets is filled out
// right now, we return true, having the affect that unrecognized
// targets have no filtering applied to them.
if (!is_contained(DeviceConfigFile::TargetTable, Target))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably someone from the language team should review this part

Copy link
Copy Markdown
Contributor Author

@YixingZhang007 YixingZhang007 Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely agree with this. Do you happen to know someone from the language team who would be a good fit to review this part, or do you have any suggestions on who I should reach out to? Thank you :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dm-vodopyanov Are you the best person to review the device config file stuff? Thx

@@ -1,7 +1,7 @@
// REQUIRES: opencl || level_zero, gpu, ocloc
// REQUIRES: opencl || level_zero, gpu, ocloc, gpu-intel-dg2
// UNSUPPORTED: arch-intel_gpu_dg2
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UNSUPPORTED line doesn't run the test on DG2 so I'm not sure this test is running after this change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out! I’ve updated this test to run on PVC instead, by changing the option to -Xsycl-target-backend=spir64_gen "-device pvc".

// RUN: -fsycl-targets=spir64_gen \
// RUN: -Xsycl-target-backend=spir64_gen \
// RUN: "-device dg2" -I %S/Inputs -o %t.out \
// RUN: "-device dg2_g10" -I %S/Inputs -o %t.out \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the dg2->dg2_g10?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was changed from dg2 to dg2_g10 because intel_gpu_dg2 is not a valid target according to the https://github.com/intel/llvm/blob/c99bd6faeaa42ccefeda4a1954231cc9b51d7540/sycl/doc/UsersManual.md. Only intel_gpu_dg2_g12, intel_gpu_dg2_g11, and intel_gpu_dg2_g10 are available. Therefore, since we currently pass the architecture explicitly to sycl-post-link when the user provides -Xsycl-target-backend=spir64_gen "-device arch" (before we only passed if the sycl target is in the form of intel_gpu_*) , we need to ensure that the architecture specified with -device is valid.

Copy link
Copy Markdown
Contributor

@sarnex sarnex Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two separate dimensions here. The SYCL compiler flags definitely do not accept intel_gpu_dg2 based on the manual you linked.
However ocloc does seem to accept dg2:

nsarnie@machine:/$ ocloc -spirv_input -device dg2 -file foo.spv
Compilation from IR - skipping loading of FCL
Build succeeded.

So if this PR means users who used to pass -Xsycl-target-backend=spir64_gen "-device dg2" will now get an error, we probably want to make sure that's okay. If we have to be able to match up the -device argument to ocloc with one of the intel_gpu compiler targets that probably won't work. @mdtoguchi might have an opinion on this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any of the arguments passed via -Xsycl-target-backend=spir64_gen should be scrutinized by ocloc and not be intercepted in the driver or the clang-linker-wrapper unless the tools are doing some kind of target recognition (like what we do for pvc). I don't think we should be doing any matching up of arguments that are passed via -device and intel_gpu_* variants. Passing of -device dg2 is valid as is -device unknown_arch, leaving it up to ocloc to tell the user what they passed is incorrect.

As for how we handle intel_gpu_dg2 vs intel_gpu_dg2_g10 what we have is fine, as we should be passing valid arguments to ocloc.

std::swap(Target, Filename);
Val = {Target.str(), Filename.str()};
SmallVector<StringRef, 8> ArgList;
ArgValue.split(ArgList, ",");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the difference in input to sycl-post-link after this change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are mainly two difference:
(1) Before the change, in the case of clang ... -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device arch", where we are compiling AOT for a specific arch, we did not pass the arch to sycl-post-link. Only in the case of clang ... -fsycl-targets=intel_gpu_arch, we will pass intel_gpu_arch to sycl-post-link. After the change, for both cases, we will pass intel_gpu_arch to sycl-post-link.
(2) We also add support for passing multiple device architectures to sycl-post-link together. There may be cases such as -Xsycl-target-backend=spir64_gen "-device arch1,arch2". Previously, only one architecture could be passed to sycl-post-link. Now, we will pass intel_gpu_arch1,intel_gpu_arch2 to sycl-post-link.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok cool, but what sycl-post-link does with whatever arch(s) it is passed is not related to this PR right? It's just changing when/what arches are passed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right, this patch doesn't change any functionality inside of sycl-post-link :)

Copy link
Copy Markdown
Contributor

@srividya-sundaram srividya-sundaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Driver changes LGTM, just 1 comment left.

@YixingZhang007
Copy link
Copy Markdown
Contributor Author

@intel/dpcpp-tools-reviewers and @intel/llvm-reviewers-runtime Could you please help review this PR? Thank you! :)

@sarnex
Copy link
Copy Markdown
Contributor

sarnex commented Feb 25, 2026

Added some new comments, thanks

Arg.split(Arglist, ' ');
for (size_t i = 0; i + 1 < Arglist.size(); ++i) {
if (Arglist[i] == "-device")
return Arglist[i + 1];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not a common occurance, but if a user says -device pvc -device bdw the ocloc parsing will actually take -device bdw. Grabbing the first instance will cause a mismatch of target expectations.

Copy link
Copy Markdown
Contributor Author

@YixingZhang007 YixingZhang007 Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought if a user wants to specify more than one architecture for spir64_gen, they would need to use a format like -device arch1,arch2,arch3,... rather than repeating -device arch1 -device arch2 .... This is based on some tests I’ve seen; I haven’t come across any examples using the second approach. The code works for the first approach. Please let me know if you’re aware if the second one is being allowed. Thank you!

Copy link
Copy Markdown
Contributor

@YuriPlyakhin YuriPlyakhin Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we cannot rely on tests only to decide what is the correct behavior. We don't know, what our customers are using actually.
And if compiler doesn't issue an error, when -device pvc -device bdw is used, then it is allowed.
I agree with Mike that inconsistency in behavior between SYCL driver and ocloc driver may cause confusion.
I provided detailed comment above on how I think this problem should be handled.

llvm::SmallVector<StringRef, 8> Arglist;
Arg.split(Arglist, ' ');
for (size_t i = 0; i + 1 < Arglist.size(); ++i) {
if (Arglist[i] == "-device")
Copy link
Copy Markdown
Contributor

@YuriPlyakhin YuriPlyakhin Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I searched through SYCL.cpp and it seems we parse -device in several places. Could you please double check. Several things to consider:

  1. Can we avoid parsing -device at all? What are all use cases, where we have to parse -device?
  2. If we have to parse -device, can we limit parsing to simple cases (e.g. 1 device option with one value) and for the rest issue an error, that such case is not supported, and document clearly, which -device syntax is and is not supported by SYCL driver?
  3. If we have to parse multiple cases (or all cases), should we check, what ocloc accepts and parse this option the same way as ocloc?

For #2 and #3 above, should we make 1 single function that parses -device option and we use it everywhere, so that behavior is consistent and not different for different use cases?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for pointing this out! Here are some of my finding for the three question, based on all the places that are parsing -device in SYCL.cpp.

Question 1: (Use case for parsing -device)
Case 1:
getDeviceArg() + hasPVCDevice() , used in AddImpliedTargetArgs. It is to decide whether to inject the default pvc:auto GRF register allocation mode based on where PVC device is specified

if (IsJIT || Device == "pvc" || hasPVCDevice(TargArgs, DevArg)) {

Case 2:
getDeviceArg() , used in addSYCLDeviceSanitizerLibs. It is used to pick the right sanitizer device library (JIT, CPU, DG2, or PVC variant) because on the device specified.

std::string DeviceArg = getDeviceArg(CmdArgs);

Case 3:
selectBfloatLibs parse -device for bfloat16 library selection

size_t DevicesPos = Params.find("-device ");
// "-device xxx" is used to specify AOT target device, so user must apply
// -Xs "-device xxx" or -Xsycl-target-backend=spir64_gen "-device xxx"
if (DevicesPos != std::string::npos) {
UseNative = true;
std::istringstream Devices(Params.substr(DevicesPos + 8));

I don't think we would be able to avoid parsing -device in these case, because these operation are dependent on the device. When the user pass in -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device arch" we have to parse the backend options to find the device name (this is the same case as why we are parsing -device in our PRs #21493 and #21495)

Question 2:
I am a little bit not sure about this question. Do you mean we should only allow for one device passing at a time, so only -device arch but not -device arch1, arch2? Or we should only allow one way to represent device, so only -device pvc, but not -device 12.60.7 nor -device 0x0BD0 ?

Question 3:
Yes, I have checked the bahavior of ocloc for device name being passed in through different form.

Case 1: -device pvc

clang++ -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device pvc -options -extraopt_pvc"  ...

clang-linker-wrapper --device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options -extraopt_pvc  ...

ocloc -device pvc  -options -extraopt_pvc  ...

Case 2: -device 12.60.7 where 12.60.7 is the version number

clang++ -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device 12.60.7 -options -extraopt_pvc" ...

clang-linker-wrapper "--device-compiler=sycl:spir64_gen-unknown-unknown=-device 12.60.7 -options -extraopt_pvc"  ...
 
ocloc -device 12.60.7  -options -extraopt_pvc ...

Case 3: -device 0x0BD0, 0x0BD0 is the hex for PVC

clang++ -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device 0x0BD0 -options -extraopt_pvc" ...

clang-linker-wrapper "--device-compiler=sycl:spir64_gen-unknown-unknown=-device 0x0BD0 -options -extraopt_pvc" ...

ocloc -device 0x0BD0 -options -extraopt_pvc ...

As we can see, the device name being passed into ocloc is the device being passed to -Xsycl-target-backend=spir64_gen "-device arch". If a invalid device name (either version number, hex or something else), error will be raise at ocloc.

For the really last question, yes, there is a function getDeviceArg can be used to parse and find the device name. We can refactor the code to use getDeviceArg().

I have updated the PRs for our new approaches (#21493 and #21495) to use getDeviceArg() to parse the - device, and I have checked they handle the device name passing to ocloc the same way as mentioned in Question 3.

@maksimsab
Copy link
Copy Markdown
Contributor

maksimsab commented Mar 26, 2026

I am a bit lost in added changes.
Could you please summarize the problem that you are solving here?

I don't understand the reason to change sycl-post-link anyhow. Could you please elaborate on that?

UPD: resolved

@YixingZhang007
Copy link
Copy Markdown
Contributor Author

I am a bit lost in added changes. Could you please summarize the problem that you are solving here?

I don't understand the reason to change sycl-post-link anyhow. Could you please elaborate on that?

UPD: resolved

Thank you for the question! After an offline discussion with Maksim, as mentioned sycl-post-link modifications will be dropped from this PR. The new approach can be found in PRs: #21493 and #21495.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new-offload-model Enables testing with NewOffloadModel.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants