[NewOffloadModel] Add support for passing backend options for multiple device architectures by YixingZhang007 · Pull Request #21140 · intel/llvm

YixingZhang007 · 2026-01-27T15:28:22Z

This patch adds the support of passing different backend options through -Xsycl-target-backend for each of multiple device architectures.
An example is like the following, where we are passing different options for pvc and dg1.

clang++ ... -fsycl-targets=intel_gpu_dg1,spir64_gen \
  -Xsycl-target-backend=spir64_gen "-device pvc -options -extraopt_pvc" \
  -Xsycl-target-backend=intel_gpu_dg1 "-options -extraopt_dg1" ...

The following changes are made to support this feature:

In SYCL.cpp, add a new function extractDeviceFromArg for extracting the device from the backend option when the SYCL target is spir64_gen. This allows finding the device architecture from arguments such as: -Xsycl-target-backend=spir64_gen "-device pvc -options -extraopt_pvc"
In Clang.cpp, we construct a --device-compiler=... argument to pass into clang-linker-wrapper for each SYCL target. The --device-compiler contains a -device <arch> to differentiate the device that the backend option is specified for. When the SYCL target is spir64_gen (not intel_gpu_arch), we call extractDeviceFromArg to extract the device architecture.
clang-linker-wrapper is updated to filter OPT_device_compiler_args_EQ based on the architecture, and specify each architecture when calling sycl-post-link.
sycl-post-link is updated to support multiple architectures being passed in, such that cases like: -Xsycl-target-backend=spir64_gen "-device arch1,arch2"
Added a new test in sycl-offload-new-driver.cpp to verify the changes.
A few SYCL E2E tests are updated because, when compiling AOT with
-fsycl-targets=spir64_gen and -Xsycl-target-backend=spir64_gen "-device arch",
the compilation now targets only the specified architecture, rather than GPUs in general.

YixingZhang007 · 2026-02-17T21:03:35Z

The current SYCL E2E test failure in CI (KernelAndProgram/kernel-bundle-merge-options-env.cpp) is expected and will be resolved once PR #21055 is merged.

clang/lib/Driver/ToolChains/Clang.cpp

sarnex · 2026-02-19T20:02:19Z

llvm/tools/sycl-post-link/sycl-post-link.cpp

+    // since not all the information for all the targets is filled out
+    // right now, we return true, having the affect that unrecognized
+    // targets have no filtering applied to them.
+    if (!is_contained(DeviceConfigFile::TargetTable, Target))


Probably someone from the language team should review this part

I definitely agree with this. Do you happen to know someone from the language team who would be a good fit to review this part, or do you have any suggestions on who I should reach out to? Thank you :)

@dm-vodopyanov Are you the best person to review the device config file stuff? Thx

sarnex · 2026-02-19T20:04:22Z

sycl/test-e2e/Basic/build_log.cpp

@@ -1,7 +1,7 @@
-// REQUIRES: opencl || level_zero, gpu, ocloc
+// REQUIRES: opencl || level_zero, gpu, ocloc, gpu-intel-dg2
 // UNSUPPORTED: arch-intel_gpu_dg2


The UNSUPPORTED line doesn't run the test on DG2 so I'm not sure this test is running after this change.

Thanks for pointing this out! I’ve updated this test to run on PVC instead, by changing the option to -Xsycl-target-backend=spir64_gen "-device pvc".

sarnex · 2026-02-19T20:05:31Z

sycl/test-e2e/DeviceCodeSplit/aot-gpu.cpp

 // RUN:   -fsycl-targets=spir64_gen \
 // RUN:   -Xsycl-target-backend=spir64_gen \
-// RUN:   "-device dg2" -I %S/Inputs -o %t.out \
+// RUN:   "-device dg2_g10" -I %S/Inputs -o %t.out \


why the dg2->dg2_g10?

This was changed from dg2 to dg2_g10 because intel_gpu_dg2 is not a valid target according to the https://github.com/intel/llvm/blob/c99bd6faeaa42ccefeda4a1954231cc9b51d7540/sycl/doc/UsersManual.md. Only intel_gpu_dg2_g12, intel_gpu_dg2_g11, and intel_gpu_dg2_g10 are available. Therefore, since we currently pass the architecture explicitly to sycl-post-link when the user provides -Xsycl-target-backend=spir64_gen "-device arch" (before we only passed if the sycl target is in the form of intel_gpu_*) , we need to ensure that the architecture specified with -device is valid.

I think there are two separate dimensions here. The SYCL compiler flags definitely do not accept intel_gpu_dg2 based on the manual you linked.
However ocloc does seem to accept dg2:

nsarnie@machine:/$ ocloc -spirv_input -device dg2 -file foo.spv Compilation from IR - skipping loading of FCL Build succeeded.

So if this PR means users who used to pass -Xsycl-target-backend=spir64_gen "-device dg2" will now get an error, we probably want to make sure that's okay. If we have to be able to match up the -device argument to ocloc with one of the intel_gpu compiler targets that probably won't work. @mdtoguchi might have an opinion on this.

Any of the arguments passed via -Xsycl-target-backend=spir64_gen should be scrutinized by ocloc and not be intercepted in the driver or the clang-linker-wrapper unless the tools are doing some kind of target recognition (like what we do for pvc). I don't think we should be doing any matching up of arguments that are passed via -device and intel_gpu_* variants. Passing of -device dg2 is valid as is -device unknown_arch, leaving it up to ocloc to tell the user what they passed is incorrect.

As for how we handle intel_gpu_dg2 vs intel_gpu_dg2_g10 what we have is fine, as we should be passing valid arguments to ocloc.

sarnex · 2026-02-19T20:06:18Z

llvm/tools/sycl-post-link/sycl-post-link.cpp

-      std::swap(Target, Filename);
-    Val = {Target.str(), Filename.str()};
+    SmallVector<StringRef, 8> ArgList;
+    ArgValue.split(ArgList, ",");


Can you explain the difference in input to sycl-post-link after this change?

There are mainly two difference:
(1) Before the change, in the case of clang ... -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device arch", where we are compiling AOT for a specific arch, we did not pass the arch to sycl-post-link. Only in the case of clang ... -fsycl-targets=intel_gpu_arch, we will pass intel_gpu_arch to sycl-post-link. After the change, for both cases, we will pass intel_gpu_arch to sycl-post-link.
(2) We also add support for passing multiple device architectures to sycl-post-link together. There may be cases such as -Xsycl-target-backend=spir64_gen "-device arch1,arch2". Previously, only one architecture could be passed to sycl-post-link. Now, we will pass intel_gpu_arch1,intel_gpu_arch2 to sycl-post-link.

Ok cool, but what sycl-post-link does with whatever arch(s) it is passed is not related to this PR right? It's just changing when/what arches are passed?

Yes, that's right, this patch doesn't change any functionality inside of sycl-post-link :)

clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp

clang/lib/Driver/ToolChains/Clang.cpp

clang/lib/Driver/ToolChains/SYCL.h

clang/lib/Driver/ToolChains/Clang.cpp

srividya-sundaram

Driver changes LGTM, just 1 comment left.

YixingZhang007 · 2026-02-25T18:17:20Z

@intel/dpcpp-tools-reviewers and @intel/llvm-reviewers-runtime Could you please help review this PR? Thank you! :)

sarnex · 2026-02-25T20:27:25Z

Added some new comments, thanks

mdtoguchi · 2026-02-26T16:34:22Z

clang/lib/Driver/ToolChains/SYCL.cpp

+  Arg.split(Arglist, ' ');
+  for (size_t i = 0; i + 1 < Arglist.size(); ++i) {
+    if (Arglist[i] == "-device")
+      return Arglist[i + 1];


Probably not a common occurance, but if a user says -device pvc -device bdw the ocloc parsing will actually take -device bdw. Grabbing the first instance will cause a mismatch of target expectations.

I thought if a user wants to specify more than one architecture for spir64_gen, they would need to use a format like -device arch1,arch2,arch3,... rather than repeating -device arch1 -device arch2 .... This is based on some tests I’ve seen; I haven’t come across any examples using the second approach. The code works for the first approach. Please let me know if you’re aware if the second one is being allowed. Thank you!

I think we cannot rely on tests only to decide what is the correct behavior. We don't know, what our customers are using actually.
And if compiler doesn't issue an error, when -device pvc -device bdw is used, then it is allowed.
I agree with Mike that inconsistency in behavior between SYCL driver and ocloc driver may cause confusion.
I provided detailed comment above on how I think this problem should be handled.

YuriPlyakhin · 2026-03-26T00:52:23Z

clang/lib/Driver/ToolChains/SYCL.cpp

+  llvm::SmallVector<StringRef, 8> Arglist;
+  Arg.split(Arglist, ' ');
+  for (size_t i = 0; i + 1 < Arglist.size(); ++i) {
+    if (Arglist[i] == "-device")


I searched through SYCL.cpp and it seems we parse -device in several places. Could you please double check. Several things to consider:

Can we avoid parsing -device at all? What are all use cases, where we have to parse -device?

If we have to parse -device, can we limit parsing to simple cases (e.g. 1 device option with one value) and for the rest issue an error, that such case is not supported, and document clearly, which -device syntax is and is not supported by SYCL driver?

If we have to parse multiple cases (or all cases), should we check, what ocloc accepts and parse this option the same way as ocloc?

For #2 and #3 above, should we make 1 single function that parses -device option and we use it everywhere, so that behavior is consistent and not different for different use cases?

Thank for pointing this out! Here are some of my finding for the three question, based on all the places that are parsing -device in SYCL.cpp.

Question 1: (Use case for parsing -device)
Case 1:
getDeviceArg() + hasPVCDevice() , used in AddImpliedTargetArgs. It is to decide whether to inject the default pvc:auto GRF register allocation mode based on where PVC device is specified

llvm/clang/lib/Driver/ToolChains/SYCL.cpp

Line 1702 in 5e9d65d

if (IsJIT || Device == "pvc" || hasPVCDevice(TargArgs, DevArg)) {

Case 2:
getDeviceArg() , used in addSYCLDeviceSanitizerLibs. It is used to pick the right sanitizer device library (JIT, CPU, DG2, or PVC variant) because on the device specified.

llvm/clang/lib/Driver/ToolChains/SYCL.cpp

Line 439 in 5e9d65d

std::string DeviceArg = getDeviceArg(CmdArgs);

Case 3:
selectBfloatLibs parse -device for bfloat16 library selection

llvm/clang/lib/Driver/ToolChains/SYCL.cpp

Lines 296 to 301 in 5e9d65d

size_t DevicesPos = Params.find("-device ");

// "-device xxx" is used to specify AOT target device, so user must apply

// -Xs "-device xxx" or -Xsycl-target-backend=spir64_gen "-device xxx"

if (DevicesPos != std::string::npos) {

UseNative = true;

std::istringstream Devices(Params.substr(DevicesPos + 8));

I don't think we would be able to avoid parsing -device in these case, because these operation are dependent on the device. When the user pass in -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device arch" we have to parse the backend options to find the device name (this is the same case as why we are parsing -device in our PRs #21493 and #21495)

Question 2:
I am a little bit not sure about this question. Do you mean we should only allow for one device passing at a time, so only -device arch but not -device arch1, arch2? Or we should only allow one way to represent device, so only -device pvc, but not -device 12.60.7 nor -device 0x0BD0 ?

Question 3:
Yes, I have checked the bahavior of ocloc for device name being passed in through different form.

Case 1: -device pvc

clang++ -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device pvc -options -extraopt_pvc" ... clang-linker-wrapper --device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options -extraopt_pvc ... ocloc -device pvc -options -extraopt_pvc ...

Case 2: -device 12.60.7 where 12.60.7 is the version number

clang++ -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device 12.60.7 -options -extraopt_pvc" ... clang-linker-wrapper "--device-compiler=sycl:spir64_gen-unknown-unknown=-device 12.60.7 -options -extraopt_pvc" ... ocloc -device 12.60.7 -options -extraopt_pvc ...

Case 3: -device 0x0BD0, 0x0BD0 is the hex for PVC

clang++ -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device 0x0BD0 -options -extraopt_pvc" ... clang-linker-wrapper "--device-compiler=sycl:spir64_gen-unknown-unknown=-device 0x0BD0 -options -extraopt_pvc" ... ocloc -device 0x0BD0 -options -extraopt_pvc ...

As we can see, the device name being passed into ocloc is the device being passed to -Xsycl-target-backend=spir64_gen "-device arch". If a invalid device name (either version number, hex or something else), error will be raise at ocloc.

For the really last question, yes, there is a function getDeviceArg can be used to parse and find the device name. We can refactor the code to use getDeviceArg().

I have updated the PRs for our new approaches (#21493 and #21495) to use getDeviceArg() to parse the - device, and I have checked they handle the device name passing to ocloc the same way as mentioned in Question 3.

maksimsab · 2026-03-26T18:30:44Z

I am a bit lost in added changes.
Could you please summarize the problem that you are solving here?

I don't understand the reason to change sycl-post-link anyhow. Could you please elaborate on that?

UPD: resolved

YixingZhang007 · 2026-03-31T13:35:48Z

I am a bit lost in added changes. Could you please summarize the problem that you are solving here?

I don't understand the reason to change sycl-post-link anyhow. Could you please elaborate on that?

UPD: resolved

Thank you for the question! After an offline discussion with Maksim, as mentioned sycl-post-link modifications will be dropped from this PR. The new approach can be found in PRs: #21493 and #21495.

YixingZhang007 had a problem deploying to WindowsCILock January 27, 2026 15:28 — with GitHub Actions Failure

YixingZhang007 temporarily deployed to WindowsCILock January 27, 2026 16:04 — with GitHub Actions Inactive

YixingZhang007 had a problem deploying to WindowsCILock February 1, 2026 18:07 — with GitHub Actions Failure

YixingZhang007 had a problem deploying to WindowsCILock February 1, 2026 18:40 — with GitHub Actions Failure

YixingZhang007 had a problem deploying to WindowsCILock February 1, 2026 22:26 — with GitHub Actions Failure

YixingZhang007 had a problem deploying to WindowsCILock February 1, 2026 22:55 — with GitHub Actions Failure

YixingZhang007 had a problem deploying to WindowsCILock February 2, 2026 03:59 — with GitHub Actions Failure

YixingZhang007 had a problem deploying to WindowsCILock February 2, 2026 04:33 — with GitHub Actions Failure

YixingZhang007 temporarily deployed to WindowsCILock February 2, 2026 04:33 — with GitHub Actions Inactive

YixingZhang007 had a problem deploying to WindowsCILock February 2, 2026 04:33 — with GitHub Actions Failure

YixingZhang007 had a problem deploying to WindowsCILock February 4, 2026 15:57 — with GitHub Actions Failure

YixingZhang007 had a problem deploying to WindowsCILock February 4, 2026 16:36 — with GitHub Actions Error

YixingZhang007 had a problem deploying to WindowsCILock February 4, 2026 16:44 — with GitHub Actions Failure

YixingZhang007 had a problem deploying to WindowsCILock February 4, 2026 17:22 — with GitHub Actions Failure

YixingZhang007 temporarily deployed to WindowsCILock February 4, 2026 17:22 — with GitHub Actions Inactive

YixingZhang007 had a problem deploying to WindowsCILock February 4, 2026 17:22 — with GitHub Actions Failure

YixingZhang007 had a problem deploying to WindowsCILock February 4, 2026 20:19 — with GitHub Actions Error

YixingZhang007 had a problem deploying to WindowsCILock February 4, 2026 21:16 — with GitHub Actions Error

YixingZhang007 had a problem deploying to WindowsCILock February 4, 2026 21:27 — with GitHub Actions Failure

YixingZhang007 temporarily deployed to WindowsCILock February 4, 2026 22:04 — with GitHub Actions Inactive

YixingZhang007 had a problem deploying to WindowsCILock February 4, 2026 22:04 — with GitHub Actions Failure

yixing.zhang and others added 6 commits February 16, 2026 07:33

fix bug

6fd3da4

sycl-offload-new-driver test is now passing

bb3ccee

resolve test failure 2/3

42ff7d2

code clean up and resolve test failure

1e9e2a5

resolve build failure

2f7bcb2

resolve test failure

f7965cb

sarnex reviewed Feb 19, 2026

View reviewed changes

srividya-sundaram reviewed Feb 19, 2026

View reviewed changes

clang/lib/Driver/ToolChains/Clang.cpp Outdated Show resolved Hide resolved

srividya-sundaram reviewed Feb 19, 2026

View reviewed changes

clang/lib/Driver/ToolChains/Clang.cpp Show resolved Hide resolved

srividya-sundaram reviewed Feb 19, 2026

View reviewed changes

clang/lib/Driver/ToolChains/Clang.cpp Outdated Show resolved Hide resolved

resolve comment 1/4

e185487