[SYCL] Improve enqueue function host task #21679

Open
KornevNikita wants to merge 1 commit into intel:sycl from KornevNikita:use-l0-native-host-task

@KornevNikita

Spec: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc
SYCL headers patch: #21456

This is the second part of the host_task enqueue function implementation. L0 provides an API to launch host tasks: zeCommandListAppendHostFunction. This API is used by the urEnqueueHostTaskExp UR function. This patch switches the enqueue function host_task to use this API when possible.

Copilot AI left a comment

Pull request overview

This PR updates the SYCL experimental sycl_ext_oneapi_enqueue_functions host_task path to prefer the Unified Runtime urEnqueueHostTaskExp API (backed by Level Zero zeCommandListAppendHostFunction) when supported, and adds an e2e test intended to validate the UR call path via tracing output.

Changes:

  • Route enqueue-functions host_task submissions through a new internal handler entry point that tags host tasks as originating from the enqueue-functions API.
  • In host-task dispatch, query UR_DEVICE_INFO_ENQUEUE_HOST_TASK_SUPPORT_EXP and call urEnqueueHostTaskExp when available.
  • Add an e2e test that checks UR tracing for the device-info query and urEnqueueHostTaskExp.
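The dispatch decision described in the list above can be sketched roughly as follows. This is a minimal, self-contained model and not the code from the patch: `HostTaskOrigin`, `FakeBackend`, `invokeStdFunction`, and `dispatchHostTask` are hypothetical names, and `FakeBackend::SupportsNativeHostTask` only stands in for the result of the `UR_DEVICE_INFO_ENQUEUE_HOST_TASK_SUPPORT_EXP` query.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical tag for where a host task came from (not the patch's type).
enum class HostTaskOrigin { CoreApi, EnqueueFunctions };

// Hypothetical stand-in for a backend; it models only what the sketch needs.
struct FakeBackend {
  bool SupportsNativeHostTask; // models the device-info query result
  int NativeCalls = 0;

  explicit FakeBackend(bool Supported) : SupportsNativeHostTask(Supported) {}

  // Models a native host-task enqueue. This fake runs the callback
  // synchronously; a real backend may run it later.
  void enqueueNative(void (*Fn)(void *), void *Data) {
    ++NativeCalls;
    Fn(Data);
  }
};

// Trampoline with a C-style signature that forwards to a std::function.
inline void invokeStdFunction(void *Data) {
  (*static_cast<std::function<void()> *>(Data))();
}

// Uses the native path only for enqueue-functions tasks on a supporting
// backend; otherwise calls the task directly. Returns the path taken.
inline std::string dispatchHostTask(FakeBackend &Backend,
                                    HostTaskOrigin Origin,
                                    std::function<void()> &Task) {
  if (Origin == HostTaskOrigin::EnqueueFunctions &&
      Backend.SupportsNativeHostTask) {
    Backend.enqueueNative(&invokeStdFunction, &Task);
    return "native";
  }
  Task();
  return "fallback";
}
```

In this model a core-API host task always takes the fallback path, even on a backend that reports support, which mirrors the origin tagging described in the first bullet.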

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| sycl/test-e2e/EnqueueFunctions/native_host_task.cpp | Adds an e2e trace-based test for urEnqueueHostTaskExp usage. |
| sycl/source/handler.cpp | Adds SetHostTaskFromExtEnqueueFunctions to tag host tasks created by enqueue-functions. |
| sycl/source/detail/scheduler/commands.cpp | Attempts to enqueue ext host tasks via urEnqueueHostTaskExp when supported. |
| sycl/source/detail/host_task.hpp | Tracks host-task origin (core API vs enqueue-functions API). |
| sycl/include/sycl/handler.hpp | Adds internal plumbing (HandlerAccess) for enqueue-functions host task submission. |
| sycl/include/sycl/ext/oneapi/experimental/enqueue_functions.hpp | Switches enqueue-functions host_task to use the new internal handler entry point. |

// REQUIRES: level_zero

// RUN: %{build} -o %t.out
// RUN: %{run} SYCL_UR_USE_LEVEL_ZERO_V2=1 SYCL_UR_TRACE=2 %t.out | FileCheck %s
Copilot AI Apr 6, 2026

The RUN line is passing SYCL_UR_USE_LEVEL_ZERO_V2=1 / SYCL_UR_TRACE=2 as command-line arguments to %{run}, not as environment variables. Other e2e tests set UR tracing via env ... %{run} ... and also redirect stderr to stdout so FileCheck can see trace output. Update this RUN line to use env SYCL_UR_USE_LEVEL_ZERO_V2=1 SYCL_UR_TRACE=2 %{run} %t.out 2>&1 | FileCheck %s (or the repo’s established equivalent).

Suggested change
- // RUN: %{run} SYCL_UR_USE_LEVEL_ZERO_V2=1 SYCL_UR_TRACE=2 %t.out | FileCheck %s
+ // RUN: env SYCL_UR_USE_LEVEL_ZERO_V2=1 SYCL_UR_TRACE=2 %{run} %t.out 2>&1 | FileCheck %s

Comment on lines +377 to +381
if (NativeHostTaskSupport)
  Queue->getAdapter().call<UrApiKind::urEnqueueHostTaskExp>(
      Queue->getHandleRef(), NativeHostTask,
      &HostTask.MHostTask->MHostTask, nullptr, 0, nullptr, nullptr);
else
Copilot AI Apr 6, 2026

urEnqueueHostTaskExp is being given &HostTask.MHostTask->MHostTask as the callback data, but HostTask.MHostTask.reset() is executed at the end of DispatchHostTask::operator(), which can destroy that std::function while the backend may still call the host callback asynchronously. This is a likely use-after-free. Ensure the callable’s storage outlives the backend callback (e.g., heap-allocate/capture a shared state and free it after the UR host-task event completes).
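One possible shape for the lifetime fix suggested above is to move the callable to the heap and let the callback own it. The sketch below is a model under stated assumptions, not UR or patch code: `DeferredBackend`, `runAndRelease`, and `enqueueOwned` are hypothetical, and the backend simply stores callbacks and runs them later to imitate asynchronous execution.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical backend that records callbacks and runs them only on drain().
struct DeferredBackend {
  std::vector<std::pair<void (*)(void *), void *>> Pending;

  void enqueue(void (*Fn)(void *), void *Data) {
    Pending.emplace_back(Fn, Data);
  }

  void drain() {
    for (auto &Entry : Pending)
      Entry.first(Entry.second);
    Pending.clear();
  }
};

// Callback that takes ownership of the heap-allocated callable. The
// unique_ptr frees it after the call, including when the call throws.
inline void runAndRelease(void *Data) {
  std::unique_ptr<std::function<void()>> Owned(
      static_cast<std::function<void()> *>(Data));
  (*Owned)();
}

// Moves Task into heap storage whose lifetime is tied to the callback,
// so the submitter may destroy its own copy right after this returns.
inline void enqueueOwned(DeferredBackend &Backend,
                         std::function<void()> Task) {
  auto Owned = std::make_unique<std::function<void()>>(std::move(Task));
  Backend.enqueue(&runAndRelease, Owned.release());
}
```

A caveat of this design: if the backend never runs the callback, the allocation leaks, so a real implementation would also need a cleanup path for cancelled or failed submissions.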

Comment on lines +378 to +385
    Queue->getAdapter().call<UrApiKind::urEnqueueHostTaskExp>(
        Queue->getHandleRef(), NativeHostTask,
        &HostTask.MHostTask->MHostTask, nullptr, 0, nullptr, nullptr);
  else
    HostTask.MHostTask->call(MThisCmd->MEvent->getHostProfilingInfo());
} else {
  HostTask.MHostTask->call(MThisCmd->MEvent->getHostProfilingInfo());
}
Copilot AI Apr 6, 2026

This path enqueues urEnqueueHostTaskExp without requesting an output event (phEvent == nullptr) and then immediately proceeds to NotifyHostTaskCompletion later in DispatchHostTask::operator(). If the UR host task executes asynchronously (as implied by UR’s API + conformance tests), the scheduler may treat the host task as complete too early, breaking dependency ordering across queues and making host-task profiling/async-exception behavior diverge. Request a UR event here and delay completion notification until that event signals (either urEventWait in this thread or an event callback-based completion path).
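The ordering requirement in this comment (wait for the backend's event, then notify completion) can be illustrated with a small single-threaded model. Everything here is hypothetical and only imitates the idea: `Event`, `DeferredBackend`, `Scheduler`, and `dispatch` are not UR or SYCL runtime types, and `DeferredBackend::wait` stands in for a blocking wait such as the one the comment suggests.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical completion event returned by the backend.
struct Event {
  bool Signaled = false;
};

// Hypothetical backend: enqueue() only records work; nothing runs until
// someone waits on an event.
struct DeferredBackend {
  struct Item {
    std::function<void()> Fn;
    std::shared_ptr<Event> Done;
  };
  std::deque<Item> Pending;

  std::shared_ptr<Event> enqueue(std::function<void()> Fn) {
    auto Done = std::make_shared<Event>();
    Pending.push_back(Item{std::move(Fn), Done});
    return Done;
  }

  // Models a blocking wait: runs queued work in order until Done signals.
  void wait(const std::shared_ptr<Event> &Done) {
    while (!Done->Signaled && !Pending.empty()) {
      Item Next = std::move(Pending.front());
      Pending.pop_front();
      Next.Fn();
      Next.Done->Signaled = true;
    }
  }
};

// Hypothetical scheduler that records the order of observable steps.
struct Scheduler {
  std::vector<std::string> Log;
  void notifyHostTaskCompletion() { Log.push_back("completed"); }
};

// Requests an event, waits for it, and only then reports completion.
inline void dispatch(DeferredBackend &Backend, Scheduler &Sched,
                     std::function<void()> Task) {
  std::shared_ptr<Event> Done = Backend.enqueue(std::move(Task));
  Backend.wait(Done); // skipping this would log "completed" too early
  Sched.notifyHostTaskCompletion();
}
```

Waiting in the dispatching thread is the simpler of the two options mentioned above; a callback-based completion path would avoid blocking that thread at the cost of more state to manage.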

Comment on lines +245 to +248
void NativeHostTask(void *funcPtr) {
  auto *func = static_cast<std::function<void()> *>(funcPtr);
  (*func)();
}
Copilot AI Apr 6, 2026

NativeHostTask directly invokes the user callable with no exception boundary. If the callable throws, the exception will unwind through the UR/Level Zero callback boundary (undefined behavior / likely std::terminate) rather than being captured and reported as an async exception like the existing host-task path. Wrap the invocation in a catch-all and propagate/report the exception via the same async-exception mechanism used in DispatchHostTask (e.g., store exception_ptr in shared state and report after the UR host-task event completes).
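A minimal sketch of the exception boundary described above, assuming a shared state object that outlives the callback. `HostTaskState`, `guardedHostTask`, and `reportError` are hypothetical names; the real runtime would hand the stored `std::exception_ptr` to its existing async-exception mechanism rather than turn it into a string.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical shared state that must outlive the backend callback.
struct HostTaskState {
  std::function<void()> Func;
  std::exception_ptr Error; // stays null unless Func throws
};

// C-style callback: catches everything so that no exception unwinds
// through the backend's frames.
inline void guardedHostTask(void *Data) noexcept {
  auto *State = static_cast<HostTaskState *>(Data);
  try {
    State->Func();
  } catch (...) {
    State->Error = std::current_exception();
  }
}

// Called after the task has completed. Returns the stored exception's
// message, or an empty string when the task did not throw.
inline std::string reportError(const HostTaskState &State) {
  if (!State.Error)
    return "";
  try {
    std::rethrow_exception(State.Error);
  } catch (const std::exception &E) {
    return E.what();
  } catch (...) {
    return "unknown exception";
  }
}
```

Storing the `std::exception_ptr` keeps the original exception object alive, so it can be rethrown later on a thread where reporting it is safe.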
