47 changes: 47 additions & 0 deletions chapters/how_does_opencl-opengl_interop.md
@@ -0,0 +1,47 @@
# OpenCL-OpenGL interop

Both OpenCL and OpenGL have specific extensions for sharing resources and synchronizing between the two runtimes. With them, one can avoid fetching data from the device only to send it immediately back, which can result in significant performance gains. Because of the way the two APIs work, there are a few things to keep in mind when designing applications that intend to interoperate.
Contributor

Suggested change
Both OpenCL and OpenGL have specific extensions targeting resource sharing and synchronizing between the two runtimes. Doing so one may omit fetching data from the device, only to send it immediately back resulting in significant performane gains. Because the way the two APIs work, there are few thing to keep in mind when designing applications that intend interoperating.
Both OpenCL and OpenGL have specific extensions targeting resource sharing and synchronizing between the two runtimes. Doing so one may omit fetching data from the device, only to send it immediately back resulting in significant performance gains. Because the way the two APIs work, there are few thing to keep in mind when designing applications that intend interoperating.


## How is it different than using OpenGL compute shaders?

OpenGL compute shaders are somewhat more restricted than OpenCL kernels, and the intermediate formats they compile to reflect this. When SPIR-V is used as the intermediate representation (IR), compute shaders are compiled to the graphics flavor of SPIR-V, which must exhibit structured control flow and must not use pointer arithmetic. Neither restriction can be violated from GLSL or other traditional shading languages. OpenCL C, being a C derivative, permits far more language constructs than shading languages do and therefore requires a more feature-complete IR: the so-called compute flavor of SPIR-V. Different compiler infrastructure is required behind the scenes to process the two kinds of workload, whether the input is IR or source.
Contributor

I think it's a bit of a distraction to discuss the IL formats here. I wonder if we can make this a bit simpler. To me, the key points are:

  1. OpenCL kernels are (generally) more capable than GLSL compute shaders.
  2. OpenCL kernels use a familiar C and C++ syntax compared to GLSL compute shaders.

For (2) we could link to the C++ for OpenCL section of this guide as an example.


Besides the OpenCL ecosystem having far more libraries and utilities tailored toward compute tasks, applications which are compute-heavy and graphically light may be better served by formulating most of the application in a pure compute fashion with a few graphics extensions, rather than dealing with render pipelines only to use one pipeline stage almost exclusively.

## Setting up interop

The core of the OpenGL API has remained backward compatible with itself all the way back to its initial incarnations. This feature of OpenGL imposes some restrictions on how interoperability can be set up.

In layman's terms, OpenCL is the "smarter" API: OpenGL performs its initialization unaware of OpenCL, possibly before any OpenCL API function has been invoked. The OpenGL context must exist first, because the OpenCL interop context is created from it. Shared resources (buffers and textures) are created in OpenGL as normal, and OpenCL (and only OpenCL) has special functions which take a `GLuint` to designate which OpenGL resource is being given a corresponding OpenCL handle. An OpenGL resource therefore has to exist before it is given an OpenCL handle, but it does not have to exist before the OpenCL interop context is created.
Contributor

This doesn't sound quite right. Although some information about the OpenGL context / etc. does need to be provided when an OpenCL interop context is created, the shared resources themselves don't necessarily need to be created before creating the OpenCL interop context. See my CL-GL sharing sample as an example:

https://github.com/bashbaug/SimpleOpenCLSamples/tree/master/samples/opengl/00_juliagl

It creates the OpenGL context first, then the OpenCL context (using the OpenGL context), then the shared texture.

Also, typo: "bing"


## Using shared resources

The asymmetry in responsibilities is visible in how resources are accessed as well. OpenGL rendering (without further extensions) is conducted as normal; once again, only OpenCL has specific functionality to mark the use of shared resources. OpenCL commands may only use a shared resource between an explicit acquire and release, enqueued on a command queue with `clEnqueueAcquireGLObjects` and `clEnqueueReleaseGLObjects`.

## Synchronizing the two APIs

There are a handful of ways the two APIs may be synchronized, depending on how your application is designed and what level of OpenCL-OpenGL interoperability is supported by the runtimes.

The following sections are practical paraphrases of the OpenCL Extension Specification section [Synchronizing OpenCL and OpenGL Access to Shared Objects](https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_gl_sharing__memobjs-synchronizing-opencl-and-opengl-access-to-shared-objects), and of the changes to this behavior when event sharing is supported, described in the section [Additions to the OpenCL Extension Specification](https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_gl_event-additions-to-extension-specification).
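
Which level of synchronization is available can be decided from the extension string, for example the one returned by `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`. The following is a small sketch of such a check; the enum `sync_level` and the functions `has_extension` and `pick_sync_level` are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    SYNC_UNSUPPORTED, /* cl_khr_gl_sharing missing: no interop at all   */
    SYNC_FINISH,      /* only cl_khr_gl_sharing: glFinish()/clFinish()  */
    SYNC_EVENT        /* cl_khr_gl_event too: implicit or explicit sync */
} sync_level;

/* Returns true if `name` appears as a whole, space-separated token in
 * `list`, so that a longer extension name with the same prefix does
 * not produce a false match. */
static bool has_extension(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *p = list;

    if (len == 0)
        return false;

    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == list) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += len;
    }
    return false;
}

/* Maps an extension string to the strongest sync level it allows. */
static sync_level pick_sync_level(const char *extensions)
{
    if (!has_extension(extensions, "cl_khr_gl_sharing"))
        return SYNC_UNSUPPORTED;
    if (has_extension(extensions, "cl_khr_gl_event"))
        return SYNC_EVENT;
    return SYNC_FINISH;
}
```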

### Basic sync

The most basic level of synchronization is available when only `cl_khr_gl_sharing` is supported. In this case the only portable sync pattern uses the most heavyweight sync operations. Rendering in OpenGL and compute in OpenCL must not overlap, and the developer must ensure this using `glFinish()`/`clFinish()`. Both functions block until all previously issued commands in their respective APIs have completed.

_(Note: glFinish() synchronizes the OpenGL client and server too, and as such requires OS intervention. In remote scenarios (such as X forwarding) it requires network communication as well.)_

### Implicit sync

If `cl_khr_gl_event` is supported, a faster sync is available even without using the API it adds, provided the application is designed in a compatible manner. If the OpenGL context is bound on the thread where the acquire/release commands and compute kernels are enqueued, the OpenCL runtime has a chance to observe the state of the OpenGL context. In such cases, acquiring the OpenGL objects waits for all previously issued OpenGL commands that used the acquired resources to finish, _and_ OpenGL calls using these resources which are issued after the release command will not start executing until the effects of the release are visible to the OpenGL context.

From the code's perspective, implicit sync resembles the previous approach, except that the queues are only flushed instead of finished. (Flushing a queue in OpenGL does not wait for the OpenGL server.)

_(Note: If GL-CL-GL-CL... commands are issued in succession in a loop, one blocking sync somewhere in the loop is still required. Otherwise the loop on the host may spin faster than rendering and compute commands are processed on the device, overflowing the limit of commands in the queues. The blocking wait can be on either an OpenGL sync object or an OpenCL event.)_
Contributor

Please check this sentence - there's a typo here ("one blocking sany somewhere") but I'm not quite sure how to fix it.


### Explicit sync

When `cl_khr_gl_event` is supported but the OpenGL context cannot be made current on the thread enqueueing OpenCL commands, one may still sync faster than by invoking `glFinish()`/`clFinish()`. Because the OpenCL runtime cannot directly observe the OpenGL context, the synchronization information has to be passed explicitly. As the name suggests, this extension involves events: specifically, one can create an OpenCL event from an OpenGL sync object.

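By mapping a sync object that is enqueued after a render command using some shared resource to an OpenCL event, one can pass that event in the event wait list of `clEnqueueAcquireGLObjects`. That way `glFinish()` may be omitted, as OpenCL can explicitly wait for a specific point in the rendering queue to complete. Note that this covers only the OpenGL-to-OpenCL direction: using this extension alone, OpenGL has no way to wait on OpenCL work, so strictly speaking `clFinish()` (or a host-side wait on the event returned by `clEnqueueReleaseGLObjects`) is still required before OpenGL uses the resource again.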
Contributor

Why is clFinish still required? Could a clFlush be used instead?


The counterpart to this extension is the OpenGL extension `GL_ARB_cl_event`, which allows syncing in the "reverse direction" by mapping OpenCL events to OpenGL sync objects. That way OpenGL too gets the chance to use a lightweight sync operation to make sure that the relevant OpenCL operations have completed.
2 changes: 1 addition & 1 deletion chapters/how_does_opencl_compare.md
@@ -14,7 +14,7 @@ Khronos has two high-level standards that focus on ease of programming with effe

SYCL and OpenVX implementations can be accelerated over lower level Khronos APIs such as Vulkan and OpenCL – though that is not mandated. Both Vulkan and OpenCL provide lower-level, explicit access to hardware resources for maximum flexibility and control.

[Vulkan](https://www.khronos.org/vulkan/) is a widely used new generation GPU API that can accelerate compute operations on any compatible GPU using compute shaders (shaders are the graphics equivalent of OpenCL's kernels), as well as rendering 3D graphics. When comparing OpenCL and GPU APIs such as Vulkan, some developers that are just interested in compute find that OpenCL provides a more straightforward programming model, a lighter weight runtime, more language flexibility compared to graphics shading languages - for example OpenCL C has pointers - and more rigorously defined numerical precision for math operations that can be critical for many applications. And of course, Vulkan can only be used to program GPUs, whereas OpenCL can be used to program heterogeneous accelerators.
[Vulkan](https://www.vulkan.org/) is a widely used new generation GPU API that can accelerate compute operations on any compatible GPU using compute shaders (shaders are the graphics equivalent of OpenCL's kernels), as well as rendering 3D graphics. When comparing OpenCL and GPU APIs such as Vulkan, some developers that are just interested in compute find that OpenCL provides a more straightforward programming model, a lighter weight runtime, more language flexibility compared to graphics shading languages - for example OpenCL C has pointers - and more rigorously defined numerical precision for math operations that can be critical for many applications. And of course, Vulkan can only be used to program GPUs, whereas OpenCL can be used to program heterogeneous accelerators.

Vulkan and many implementations of OpenCL use Khronos’ [SPIR-V](https://www.khronos.org/spir/) standard as a programming language intermediate representation that enables significant language compiler tooling flexibility.