[NewOffloadModel] Add support for passing backend options for multiple device architectures#21140
[NewOffloadModel] Add support for passing backend options for multiple device architectures#21140YixingZhang007 wants to merge 21 commits intointel:syclfrom
Conversation
|
The current SYCL E2E test failure in CI ( |
| // since not all the information for all the targets is filled out | ||
| // right now, we return true, having the affect that unrecognized | ||
| // targets have no filtering applied to them. | ||
| if (!is_contained(DeviceConfigFile::TargetTable, Target)) |
There was a problem hiding this comment.
Probably someone from the language team should review this part
There was a problem hiding this comment.
I definitely agree with this. Do you happen to know someone from the language team who would be a good fit to review this part, or do you have any suggestions on who I should reach out to? Thank you :)
There was a problem hiding this comment.
@dm-vodopyanov Are you the best person to review the device config file stuff? Thx
| @@ -1,7 +1,7 @@ | |||
| // REQUIRES: opencl || level_zero, gpu, ocloc | |||
| // REQUIRES: opencl || level_zero, gpu, ocloc, gpu-intel-dg2 | |||
| // UNSUPPORTED: arch-intel_gpu_dg2 | |||
There was a problem hiding this comment.
The UNSUPPORTED line doesn't run the test on DG2 so I'm not sure this test is running after this change.
There was a problem hiding this comment.
Thanks for pointing this out! I’ve updated this test to run on PVC instead, by changing the option to -Xsycl-target-backend=spir64_gen "-device pvc".
| // RUN: -fsycl-targets=spir64_gen \ | ||
| // RUN: -Xsycl-target-backend=spir64_gen \ | ||
| // RUN: "-device dg2" -I %S/Inputs -o %t.out \ | ||
| // RUN: "-device dg2_g10" -I %S/Inputs -o %t.out \ |
There was a problem hiding this comment.
This was changed from dg2 to dg2_g10 because intel_gpu_dg2 is not a valid target according to the https://github.com/intel/llvm/blob/c99bd6faeaa42ccefeda4a1954231cc9b51d7540/sycl/doc/UsersManual.md. Only intel_gpu_dg2_g12, intel_gpu_dg2_g11, and intel_gpu_dg2_g10 are available. Therefore, since we currently pass the architecture explicitly to sycl-post-link when the user provides -Xsycl-target-backend=spir64_gen "-device arch" (before we only passed if the sycl target is in the form of intel_gpu_*) , we need to ensure that the architecture specified with -device is valid.
There was a problem hiding this comment.
I think there are two separate dimensions here. The SYCL compiler flags definitely do not accept intel_gpu_dg2 based on the manual you linked.
However ocloc does seem to accept dg2:
nsarnie@machine:/$ ocloc -spirv_input -device dg2 -file foo.spv
Compilation from IR - skipping loading of FCL
Build succeeded.
So if this PR means users who used to pass -Xsycl-target-backend=spir64_gen "-device dg2" will now get an error, we probably want to make sure that's okay. If we have to be able to match up the -device argument to ocloc with one of the intel_gpu compiler targets that probably won't work. @mdtoguchi might have an opinion on this.
There was a problem hiding this comment.
Any of the arguments passed via -Xsycl-target-backend=spir64_gen should be scrutinized by ocloc and not be intercepted in the driver or the clang-linker-wrapper unless the tools are doing some kind of target recognition (like what we do for pvc). I don't think we should be doing any matching up of arguments that are passed via -device and intel_gpu_* variants. Passing of -device dg2 is valid as is -device unknown_arch, leaving it up to ocloc to tell the user what they passed is incorrect.
As for how we handle intel_gpu_dg2 vs intel_gpu_dg2_g10 what we have is fine, as we should be passing valid arguments to ocloc.
| std::swap(Target, Filename); | ||
| Val = {Target.str(), Filename.str()}; | ||
| SmallVector<StringRef, 8> ArgList; | ||
| ArgValue.split(ArgList, ","); |
There was a problem hiding this comment.
Can you explain the difference in input to sycl-post-link after this change?
There was a problem hiding this comment.
There are mainly two difference:
(1) Before the change, in the case of clang ... -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device arch", where we are compiling AOT for a specific arch, we did not pass the arch to sycl-post-link. Only in the case of clang ... -fsycl-targets=intel_gpu_arch, we will pass intel_gpu_arch to sycl-post-link. After the change, for both cases, we will pass intel_gpu_arch to sycl-post-link.
(2) We also add support for passing multiple device architectures to sycl-post-link together. There may be cases such as -Xsycl-target-backend=spir64_gen "-device arch1,arch2". Previously, only one architecture could be passed to sycl-post-link. Now, we will pass intel_gpu_arch1,intel_gpu_arch2 to sycl-post-link.
There was a problem hiding this comment.
Ok cool, but what sycl-post-link does with whatever arch(s) it is passed is not related to this PR right? It's just changing when/what arches are passed?
There was a problem hiding this comment.
Yes, that's right, this patch doesn't change any functionality inside of sycl-post-link :)
srividya-sundaram
left a comment
There was a problem hiding this comment.
Driver changes LGTM, just 1 comment left.
|
@intel/dpcpp-tools-reviewers and @intel/llvm-reviewers-runtime Could you please help review this PR? Thank you! :) |
|
Added some new comments, thanks |
| Arg.split(Arglist, ' '); | ||
| for (size_t i = 0; i + 1 < Arglist.size(); ++i) { | ||
| if (Arglist[i] == "-device") | ||
| return Arglist[i + 1]; |
There was a problem hiding this comment.
Probably not a common occurance, but if a user says -device pvc -device bdw the ocloc parsing will actually take -device bdw. Grabbing the first instance will cause a mismatch of target expectations.
There was a problem hiding this comment.
I thought if a user wants to specify more than one architecture for spir64_gen, they would need to use a format like -device arch1,arch2,arch3,... rather than repeating -device arch1 -device arch2 .... This is based on some tests I’ve seen; I haven’t come across any examples using the second approach. The code works for the first approach. Please let me know if you’re aware if the second one is being allowed. Thank you!
There was a problem hiding this comment.
I think we cannot rely on tests only to decide what is the correct behavior. We don't know, what our customers are using actually.
And if compiler doesn't issue an error, when -device pvc -device bdw is used, then it is allowed.
I agree with Mike that inconsistency in behavior between SYCL driver and ocloc driver may cause confusion.
I provided detailed comment above on how I think this problem should be handled.
| llvm::SmallVector<StringRef, 8> Arglist; | ||
| Arg.split(Arglist, ' '); | ||
| for (size_t i = 0; i + 1 < Arglist.size(); ++i) { | ||
| if (Arglist[i] == "-device") |
There was a problem hiding this comment.
I searched through SYCL.cpp and it seems we parse -device in several places. Could you please double check. Several things to consider:
- Can we avoid parsing
-deviceat all? What are all use cases, where we have to parse-device? - If we have to parse
-device, can we limit parsing to simple cases (e.g. 1 device option with one value) and for the rest issue an error, that such case is not supported, and document clearly, which -device syntax is and is not supported by SYCL driver? - If we have to parse multiple cases (or all cases), should we check, what
oclocaccepts and parse this option the same way as ocloc?
For #2 and #3 above, should we make 1 single function that parses -device option and we use it everywhere, so that behavior is consistent and not different for different use cases?
There was a problem hiding this comment.
Thank for pointing this out! Here are some of my finding for the three question, based on all the places that are parsing -device in SYCL.cpp.
Question 1: (Use case for parsing -device)
Case 1:
getDeviceArg() + hasPVCDevice() , used in AddImpliedTargetArgs. It is to decide whether to inject the default pvc:auto GRF register allocation mode based on where PVC device is specified
llvm/clang/lib/Driver/ToolChains/SYCL.cpp
Line 1702 in 5e9d65d
Case 2:
getDeviceArg() , used in addSYCLDeviceSanitizerLibs. It is used to pick the right sanitizer device library (JIT, CPU, DG2, or PVC variant) because on the device specified.
llvm/clang/lib/Driver/ToolChains/SYCL.cpp
Line 439 in 5e9d65d
Case 3:
selectBfloatLibs parse -device for bfloat16 library selection
llvm/clang/lib/Driver/ToolChains/SYCL.cpp
Lines 296 to 301 in 5e9d65d
I don't think we would be able to avoid parsing -device in these case, because these operation are dependent on the device. When the user pass in -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device arch" we have to parse the backend options to find the device name (this is the same case as why we are parsing -device in our PRs #21493 and #21495)
Question 2:
I am a little bit not sure about this question. Do you mean we should only allow for one device passing at a time, so only -device arch but not -device arch1, arch2? Or we should only allow one way to represent device, so only -device pvc, but not -device 12.60.7 nor -device 0x0BD0 ?
Question 3:
Yes, I have checked the bahavior of ocloc for device name being passed in through different form.
Case 1: -device pvc
clang++ -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device pvc -options -extraopt_pvc" ...
clang-linker-wrapper --device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options -extraopt_pvc ...
ocloc -device pvc -options -extraopt_pvc ...
Case 2: -device 12.60.7 where 12.60.7 is the version number
clang++ -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device 12.60.7 -options -extraopt_pvc" ...
clang-linker-wrapper "--device-compiler=sycl:spir64_gen-unknown-unknown=-device 12.60.7 -options -extraopt_pvc" ...
ocloc -device 12.60.7 -options -extraopt_pvc ...
Case 3: -device 0x0BD0, 0x0BD0 is the hex for PVC
clang++ -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device 0x0BD0 -options -extraopt_pvc" ...
clang-linker-wrapper "--device-compiler=sycl:spir64_gen-unknown-unknown=-device 0x0BD0 -options -extraopt_pvc" ...
ocloc -device 0x0BD0 -options -extraopt_pvc ...
As we can see, the device name being passed into ocloc is the device being passed to -Xsycl-target-backend=spir64_gen "-device arch". If a invalid device name (either version number, hex or something else), error will be raise at ocloc.
For the really last question, yes, there is a function getDeviceArg can be used to parse and find the device name. We can refactor the code to use getDeviceArg().
I have updated the PRs for our new approaches (#21493 and #21495) to use getDeviceArg() to parse the - device, and I have checked they handle the device name passing to ocloc the same way as mentioned in Question 3.
|
I am a bit lost in added changes. I don't understand the reason to change UPD: resolved |
Thank you for the question! After an offline discussion with Maksim, as mentioned |
This patch adds the support of passing different backend options through
-Xsycl-target-backendfor each of multiple device architectures.An example is like the following, where we are passing different options for
pvcanddg1.The following changes are made to support this feature:
SYCL.cpp, add a new functionextractDeviceFromArgfor extracting the device from the backend option when the SYCL target isspir64_gen. This allows finding the device architecture from arguments such as:-Xsycl-target-backend=spir64_gen "-device pvc -options -extraopt_pvc"Clang.cpp, we construct a--device-compiler=...argument to pass intoclang-linker-wrapperfor each SYCL target. The--device-compilercontains a-device <arch>to differentiate the device that the backend option is specified for. When the SYCL target isspir64_gen(notintel_gpu_arch), we callextractDeviceFromArgto extract the device architecture.clang-linker-wrapperis updated to filterOPT_device_compiler_args_EQbased on the architecture, and specify each architecture when callingsycl-post-link.sycl-post-linkis updated to support multiple architectures being passed in, such that cases like:-Xsycl-target-backend=spir64_gen "-device arch1,arch2"sycl-offload-new-driver.cppto verify the changes.-fsycl-targets=spir64_gen and -Xsycl-target-backend=spir64_gen "-device arch",the compilation now targets only the specified architecture, rather than GPUs in general.