[SYCL] Improve enqueue function host task #21679
KornevNikita wants to merge 1 commit into intel:sycl
Spec: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc
SYCL headers patch: intel#21456
This is the second part of the host_task enqueue function implementation. L0 provides an API to launch host tasks, zeCommandListAppendHostFunction, which is used by the urEnqueueHostTaskExp UR function. This patch switches the enqueue function host_task to use this API when possible.
Pull request overview
This PR updates the SYCL experimental sycl_ext_oneapi_enqueue_functions host_task path to prefer the Unified Runtime urEnqueueHostTaskExp API (backed by Level Zero zeCommandListAppendHostFunction) when supported, and adds an e2e test intended to validate the UR call path via tracing output.
Changes:
- Route enqueue-functions host_task submissions through a new internal handler entry point that tags host tasks as originating from the enqueue-functions API.
- In host-task dispatch, query UR_DEVICE_INFO_ENQUEUE_HOST_TASK_SUPPORT_EXP and call urEnqueueHostTaskExp when available.
- Add an e2e test that checks UR tracing for the device-info query and urEnqueueHostTaskExp.
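The capability-gated dispatch described in the list above can be sketched roughly as follows. This is a minimal illustration, not the patch's code: `MockDevice`, `DispatchPath`, and `dispatchHostTask` are hypothetical names, and the boolean field stands in for the result of the UR_DEVICE_INFO_ENQUEUE_HOST_TASK_SUPPORT_EXP query.

```cpp
#include <cassert>
#include <functional>

// Which route a host task took (hypothetical, for illustration only).
enum class DispatchPath { Native, Fallback };

// Hypothetical stand-in for a device; the flag models the result of the
// device-info query for native host-task support.
struct MockDevice {
  bool NativeHostTaskSupport;
};

// Chooses the native path only for host tasks that came from the
// enqueue-functions API and only when the device reports support.
DispatchPath dispatchHostTask(const MockDevice &Dev, bool FromEnqueueFunctions,
                              const std::function<void()> &Task) {
  assert(Task && "host task must be callable");
  if (FromEnqueueFunctions && Dev.NativeHostTaskSupport) {
    // A real implementation would hand the task to the backend here
    // instead of calling it directly.
    Task();
    return DispatchPath::Native;
  }
  // Existing behavior: run the task on the scheduler's host thread.
  Task();
  return DispatchPath::Fallback;
}
```

In this sketch a core-API host task always takes the fallback path, even on a device that supports native host tasks, which mirrors the tagging described in the first bullet.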
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| sycl/test-e2e/EnqueueFunctions/native_host_task.cpp | Adds an e2e trace-based test for urEnqueueHostTaskExp usage. |
| sycl/source/handler.cpp | Adds SetHostTaskFromExtEnqueueFunctions to tag host tasks created by enqueue-functions. |
| sycl/source/detail/scheduler/commands.cpp | Attempts to enqueue ext host tasks via urEnqueueHostTaskExp when supported. |
| sycl/source/detail/host_task.hpp | Tracks host-task origin (core API vs enqueue-functions API). |
| sycl/include/sycl/handler.hpp | Adds internal plumbing (HandlerAccess) for enqueue-functions host task submission. |
| sycl/include/sycl/ext/oneapi/experimental/enqueue_functions.hpp | Switches enqueue-functions host_task to use the new internal handler entry point. |
    // REQUIRES: level_zero

    // RUN: %{build} -o %t.out
    // RUN: %{run} SYCL_UR_USE_LEVEL_ZERO_V2=1 SYCL_UR_TRACE=2 %t.out | FileCheck %s
The RUN line is passing SYCL_UR_USE_LEVEL_ZERO_V2=1 / SYCL_UR_TRACE=2 as command-line arguments to %{run}, not as environment variables. Other e2e tests set UR tracing via env ... %{run} ... and also redirect stderr to stdout so FileCheck can see trace output. Update this RUN line to use env SYCL_UR_USE_LEVEL_ZERO_V2=1 SYCL_UR_TRACE=2 %{run} %t.out 2>&1 | FileCheck %s (or the repo’s established equivalent).
Suggested change:

    - // RUN: %{run} SYCL_UR_USE_LEVEL_ZERO_V2=1 SYCL_UR_TRACE=2 %t.out | FileCheck %s
    + // RUN: env SYCL_UR_USE_LEVEL_ZERO_V2=1 SYCL_UR_TRACE=2 %{run} %t.out 2>&1 | FileCheck %s
    if (NativeHostTaskSupport)
      Queue->getAdapter().call<UrApiKind::urEnqueueHostTaskExp>(
          Queue->getHandleRef(), NativeHostTask,
          &HostTask.MHostTask->MHostTask, nullptr, 0, nullptr, nullptr);
    else
urEnqueueHostTaskExp is being given &HostTask.MHostTask->MHostTask as the callback data, but HostTask.MHostTask.reset() is executed at the end of DispatchHostTask::operator(), which can destroy that std::function while the backend may still call the host callback asynchronously. This is a likely use-after-free. Ensure the callable’s storage outlives the backend callback (e.g., heap-allocate/capture a shared state and free it after the UR host-task event completes).
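One way to address the lifetime concern above is to give the callback ownership of a heap-allocated copy of the callable. The sketch below is only an illustration under stated assumptions: `mockEnqueue` is a hypothetical stand-in for an asynchronous backend (it is not the UR API), and `HostTaskPayload`, `runAndRelease`, `submitOwned`, and `demoLifetime` are made-up names.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for an asynchronous backend: the callback runs on
// another thread, possibly after the submitting scope has exited.
std::thread mockEnqueue(void (*Callback)(void *), void *Data) {
  return std::thread(Callback, Data);
}

// Heap-allocated state that belongs to the callback until it has run.
struct HostTaskPayload {
  std::function<void()> Func;
};

// Trampoline: takes ownership of the payload, so the callable is destroyed
// only after it has been invoked.
void runAndRelease(void *Data) {
  std::unique_ptr<HostTaskPayload> Payload(
      static_cast<HostTaskPayload *>(Data));
  assert(Payload && Payload->Func);
  Payload->Func();
}

// Submits a copy of Func; the caller may destroy its own copy immediately.
std::thread submitOwned(std::function<void()> Func) {
  auto *Payload = new HostTaskPayload{std::move(Func)};
  return mockEnqueue(&runAndRelease, Payload);
}

// The original std::function goes out of scope before the worker is joined,
// yet the task still runs on a valid callable.
int demoLifetime() {
  int Result = 0;
  std::thread Worker;
  {
    std::function<void()> Task = [&Result] { Result = 42; };
    Worker = submitOwned(Task);
  }
  Worker.join();
  return Result;
}
```

A production version would also need to release the payload if the enqueue call itself fails; the sketch omits that path for brevity.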
      Queue->getAdapter().call<UrApiKind::urEnqueueHostTaskExp>(
          Queue->getHandleRef(), NativeHostTask,
          &HostTask.MHostTask->MHostTask, nullptr, 0, nullptr, nullptr);
    else
      HostTask.MHostTask->call(MThisCmd->MEvent->getHostProfilingInfo());
    } else {
      HostTask.MHostTask->call(MThisCmd->MEvent->getHostProfilingInfo());
    }
This path enqueues urEnqueueHostTaskExp without requesting an output event (phEvent == nullptr) and then immediately proceeds to NotifyHostTaskCompletion later in DispatchHostTask::operator(). If the UR host task executes asynchronously (as implied by UR’s API + conformance tests), the scheduler may treat the host task as complete too early, breaking dependency ordering across queues and making host-task profiling/async-exception behavior diverge. Request a UR event here and delay completion notification until that event signals (either urEventWait in this thread or an event callback-based completion path).
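The ordering requirement above (signal completion only after the backend's event fires) can be illustrated with a small model. This is a sketch, not the scheduler's code: `mockEnqueueWithEvent` is a hypothetical stand-in that returns a `std::future<void>` in place of a UR event, and the `wait()` call plays the role that urEventWait would play in the blocking variant of the suggestion.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <utility>

// Hypothetical stand-in for an enqueue call that returns an output event.
// The task runs asynchronously; the future becomes ready when it finishes.
std::future<void> mockEnqueueWithEvent(std::function<void()> Task) {
  return std::async(std::launch::async, std::move(Task));
}

// Returns true if the task had already finished at the point where
// completion would be reported to the scheduler.
bool dispatchAndNotify() {
  std::atomic<bool> TaskDone{false};
  std::future<void> Event =
      mockEnqueueWithEvent([&TaskDone] { TaskDone = true; });

  // Block until the backend event signals, analogous to waiting on the
  // UR event before notifying dependents.
  Event.wait();

  // Stand-in for the completion notification: by now the task has run.
  return TaskDone.load();
}
```

Without the wait, the final read could observe `false`, which is the model-level equivalent of notifying dependents before the host task has executed.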
    void NativeHostTask(void *funcPtr) {
      auto *func = static_cast<std::function<void()> *>(funcPtr);
      (*func)();
    }
NativeHostTask directly invokes the user callable with no exception boundary. If the callable throws, the exception will unwind through the UR/Level Zero callback boundary (undefined behavior / likely std::terminate) rather than being captured and reported as an async exception like the existing host-task path. Wrap the invocation in a catch-all and propagate/report the exception via the same async-exception mechanism used in DispatchHostTask (e.g., store exception_ptr in shared state and report after the UR host-task event completes).
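A guarded trampoline along the lines suggested above might look like the following. It is a minimal sketch under assumptions: `GuardedTask`, `guardedTrampoline`, and `runGuarded` are hypothetical names, and the callback is invoked synchronously here, whereas a real backend would call it asynchronously and the stored exception would be reported through the runtime's async-exception path.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Shared state passed to the callback: the callable plus a slot for any
// exception it throws.
struct GuardedTask {
  std::function<void()> Func;
  std::exception_ptr Error; // non-null only if Func threw
};

// C-style callback with an exception boundary: nothing escapes into the
// caller, so no exception unwinds through backend frames.
void guardedTrampoline(void *Data) noexcept {
  auto *Task = static_cast<GuardedTask *>(Data);
  try {
    Task->Func();
  } catch (...) {
    Task->Error = std::current_exception();
  }
}

// Runs Func through the trampoline and returns the captured exception's
// message, or an empty string if nothing was thrown.
std::string runGuarded(std::function<void()> Func) {
  GuardedTask Task;
  Task.Func = std::move(Func);
  guardedTrampoline(&Task); // a real backend would call this asynchronously

  if (!Task.Error)
    return {};
  try {
    std::rethrow_exception(Task.Error);
  } catch (const std::exception &E) {
    return E.what();
  } catch (...) {
    return "unknown exception";
  }
}
```

Storing a `std::exception_ptr` keeps the original exception type intact, so it can be rethrown later on a thread where handling it is safe.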