Feature request: Introduce SyncStatus and InvalidSyncPoint for enhanced synchronization feedback

### Problem statement
Blade’s current `wait_for` API returns a `bool` to indicate whether a synchronization point was reached within the specified timeout. However, this design has significant limitations:
1. A `false` return value could mean either a **timeout** or an **error**.
2. In scenarios like **suspend/resume**, the GPU may be reinitialized, causing sync points to become **invalid**. The current API cannot distinguish between these cases.
3. Developers lack detailed feedback to handle synchronization outcomes effectively, making debugging and error recovery challenging.

To address these limitations, I propose a new API that reports the outcome as a `SyncStatus` instead of a `bool`: `SyncPoint::wait_for_detailed(timeout_ms) -> SyncStatus`.

### Proposed solution

#### Keep the existing API
The existing `wait_for` API will remain unchanged for backward compatibility. It will continue to return a `bool`:
- `true`: Synchronization completed successfully.
- `false`: Synchronization did not complete (timeout, error, or invalid sync point).

#### Introduce a new API
A new method, `sync_point.wait_for_detailed(timeout_ms)`, will provide detailed feedback through a `SyncStatus` enum that explicitly distinguishes between:

1. **Completed**: The synchronization point was reached successfully.
2. **Timeout**: The operation timed out before the synchronization point was reached.
3. **InvalidSyncPoint**: The sync point became invalid (e.g., due to GPU reinitialization).
4. **Error**: Any other errors or unexpected issues during synchronization, with a detailed error message.


#### New `SyncStatus` enum
The `SyncStatus` enum will provide detailed feedback for the new API.

```rust
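#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SyncStatus {
    /// The sync point was reached.
    Completed,
    /// The timeout elapsed before the sync point was reached.
    Timeout,
    /// The sync point can no longer be waited on (e.g., GPU reinitialized).
    InvalidSyncPoint,
    /// Any other failure, with a backend-specific message.
    Error { error_string: String },
}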
```

#### New `wait_for_detailed` method
The new method will be added to the `SyncPoint` trait alongside the existing `wait_for` and will return a `SyncStatus`.

```rust
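pub trait SyncPoint {
    /// Existing API: `true` only if the sync point was reached within `timeout_ms`.
    fn wait_for(&self, timeout_ms: u32) -> bool;
    /// New API: reports why the wait ended.
    fn wait_for_detailed(&self, timeout_ms: u32) -> SyncStatus;
}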
```
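Because the backward-compatible `wait_for` is fully determined by `wait_for_detailed`, one option (an assumption, not part of the proposal as written) is to give it a default body in the trait, so each backend implements only the detailed method. The sketch below uses a hypothetical `FixedSyncPoint` test double that always returns a preset status.

```rust
/// Mirrors the proposed enum.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SyncStatus {
    Completed,
    Timeout,
    InvalidSyncPoint,
    Error { error_string: String },
}

pub trait SyncPoint {
    /// Detailed wait: each backend implements only this method.
    fn wait_for_detailed(&self, timeout_ms: u32) -> SyncStatus;

    /// Backward-compatible wait derived from the detailed one:
    /// every outcome other than `Completed` collapses to `false`.
    fn wait_for(&self, timeout_ms: u32) -> bool {
        matches!(self.wait_for_detailed(timeout_ms), SyncStatus::Completed)
    }
}

/// Hypothetical test double that always reports the same status.
pub struct FixedSyncPoint(pub SyncStatus);

impl SyncPoint for FixedSyncPoint {
    fn wait_for_detailed(&self, _timeout_ms: u32) -> SyncStatus {
        self.0.clone()
    }
}
```

This keeps the two methods consistent by construction: a backend cannot accidentally report `true` from `wait_for` while `wait_for_detailed` reports a failure.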

### Example usage

#### Simple usage
```rust
let sync_point = device.create_sync_point();
// ...
if sync_point.wait_for(1000) {
    println!("GPU work completed!");
} else {
    println!("Timeout or error occurred."); // Cannot distinguish between timeout, error, or invalid sync point
}
```

#### Enhanced usage
```rust
let sync_point = device.create_sync_point();
// ...
match sync_point.wait_for_detailed(1000) {
    SyncStatus::Completed => println!("GPU work completed!"),
    SyncStatus::Timeout => println!("Timeout while waiting for GPU work."),
    SyncStatus::InvalidSyncPoint => println!("Sync point is invalid (e.g., GPU reinitialized)."),
    SyncStatus::Error { error_string } => println!("An error occurred: {}", error_string),
}
```
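Callers that prefer `?`-style propagation could convert the status into a `Result`. The sketch below adds a hypothetical `SyncError` type and `into_result` helper; neither is part of Blade or of this proposal, they only illustrate how the enum composes with ordinary Rust error handling.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SyncStatus {
    Completed,
    Timeout,
    InvalidSyncPoint,
    Error { error_string: String },
}

/// Hypothetical error type covering the non-completed outcomes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SyncError {
    Timeout,
    InvalidSyncPoint,
    Other(String),
}

impl fmt::Display for SyncError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SyncError::Timeout => write!(f, "timed out waiting for GPU work"),
            SyncError::InvalidSyncPoint => write!(f, "sync point is invalid (e.g., GPU reinitialized)"),
            SyncError::Other(msg) => write!(f, "synchronization error: {}", msg),
        }
    }
}

impl std::error::Error for SyncError {}

impl SyncStatus {
    /// Hypothetical helper: `Completed` becomes `Ok(())`, everything else an `Err`.
    pub fn into_result(self) -> Result<(), SyncError> {
        match self {
            SyncStatus::Completed => Ok(()),
            SyncStatus::Timeout => Err(SyncError::Timeout),
            SyncStatus::InvalidSyncPoint => Err(SyncError::InvalidSyncPoint),
            SyncStatus::Error { error_string } => Err(SyncError::Other(error_string)),
        }
    }
}
```

With such a helper, a caller could write `sync_point.wait_for_detailed(1000).into_result()?;` inside a function returning `Result<_, Box<dyn std::error::Error>>`.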


### How different APIs handle invalid sync points

#### 1. Vulkan
In Vulkan, synchronization primitives like semaphores and fences are tied to the logical device. If the device is lost (e.g., due to a GPU crash or driver issue), all synchronization primitives become invalid. Vulkan provides explicit mechanisms to detect device loss:
- **Device Lost Error**: When a device is lost, Vulkan operations return `VK_ERROR_DEVICE_LOST`. This can be used to detect invalid sync points.
- **Timeline semaphores**: After device loss, waits such as `vkWaitSemaphores` may return `VK_ERROR_DEVICE_LOST`, and the semaphore's counter value can no longer be relied on.

Example:
```rust
unsafe {
    match self.device.timeline_semaphore.wait_semaphores(&wait_info, timeout_ns) {
        Ok(_) => SyncStatus::Completed,
        Err(vk::Result::TIMEOUT) => SyncStatus::Timeout,
        Err(vk::Result::ERROR_DEVICE_LOST) => SyncStatus::InvalidSyncPoint,
        Err(err) => SyncStatus::Error {
            error_string: format!("Vulkan error: {:?}", err),
        },
    }
}
```

#### 2. Metal
In Metal, command buffers and their associated synchronization primitives are tied to the command queue and device. If the GPU is reset or the device is reinitialized, command buffers and their sync points may become invalid. Metal provides status checks for command buffers:
- **Command Buffer Status**: A command buffer can be in states like `NotEnqueued`, `Enqueued`, `Committed`, `Scheduled`, `Completed`, or `Error`.
- **Invalid State**: If a command buffer is in an invalid state (e.g., `NotEnqueued` after GPU reinitialization), it can be treated as an invalid sync point.

Example:
```rust
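// Evaluated when a terminal state is observed or once the wait deadline has passed.
match sync_point.cmd_buf.status() {
    metal::MTLCommandBufferStatus::Completed => SyncStatus::Completed,
    metal::MTLCommandBufferStatus::Error => {
        // Assumes an accessor exposing the command buffer's `error` property (an NSError).
        let error_message = sync_point.cmd_buf.error()
            .map(|e| e.to_string())
            .unwrap_or_else(|| "Unknown Metal error".to_string());
        SyncStatus::Error {
            error_string: error_message,
        }
    }
    metal::MTLCommandBufferStatus::NotEnqueued => SyncStatus::InvalidSyncPoint,
    // Enqueued, Committed, or Scheduled: still in flight when the deadline expired.
    _ => SyncStatus::Timeout,
}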
```
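The `match` above inspects the status at a single instant, so mapping the in-flight states to `Timeout` is only meaningful after the deadline has passed. A minimal polling sketch follows, assuming a 1 ms sleep between checks. `CmdBufStatus` is a hypothetical stand-in for `metal::MTLCommandBufferStatus` so the code runs without the `metal` crate, and the status source is passed in as a closure.

```rust
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SyncStatus {
    Completed,
    Timeout,
    InvalidSyncPoint,
    Error { error_string: String },
}

/// Hypothetical stand-in for `metal::MTLCommandBufferStatus`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CmdBufStatus {
    NotEnqueued,
    Enqueued,
    Committed,
    Scheduled,
    Completed,
    Error,
}

/// Polls `status` until a terminal state is seen or `timeout_ms` elapses.
/// `Timeout` is reported only after the deadline has actually passed.
pub fn poll_until<F>(mut status: F, timeout_ms: u32) -> SyncStatus
where
    F: FnMut() -> CmdBufStatus,
{
    let deadline = Duration::from_millis(timeout_ms as u64);
    let start = Instant::now();
    loop {
        match status() {
            CmdBufStatus::Completed => return SyncStatus::Completed,
            CmdBufStatus::Error => {
                return SyncStatus::Error { error_string: "command buffer error".to_string() }
            }
            // Follows the mapping proposed above for Metal.
            CmdBufStatus::NotEnqueued => return SyncStatus::InvalidSyncPoint,
            // Enqueued / Committed / Scheduled: still in flight, keep polling.
            _ => {}
        }
        if start.elapsed() >= deadline {
            return SyncStatus::Timeout;
        }
        thread::sleep(Duration::from_millis(1));
    }
}
```

Polling trades a little latency and CPU time for simplicity; a completion-handler based wait would avoid the sleep loop at the cost of more synchronization code.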


#### 3. GLES
In GLES, synchronization relies on **sync objects** (e.g., created with `glFenceSync`), which are tied to the GL context. If the context is lost, such as during suspend/resume or GPU reinitialization, all sync objects become invalid. The `glow` crate exposes these operations, including `glClientWaitSync` (as `client_wait_sync`), which waits for a sync object to be signaled and returns one of the following statuses:
- **`GL_ALREADY_SIGNALED`** or **`GL_CONDITION_SATISFIED`**: The sync completed successfully.
- **`GL_TIMEOUT_EXPIRED`**: The wait timed out.
- **`GL_WAIT_FAILED`**: The wait failed, indicating that the sync object is invalid (often due to context loss).

This allows explicit handling of synchronization outcomes, including errors and invalid states.

Example:
```rust
impl SyncPoint for GLESSyncPoint {
    fn wait_for(&self, timeout_ms: u32) -> bool {
        matches!(self.wait_for_detailed(timeout_ms), SyncStatus::Completed)
    }

    fn wait_for_detailed(&self, timeout_ms: u32) -> SyncStatus {
        let gl = self.lock();
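        // `!0` means "wait forever"; otherwise convert milliseconds to nanoseconds.
        let timeout_ns = if timeout_ms == !0 { !0 } else { timeout_ms as u64 * 1_000_000 };
        // glow takes the timeout as an i32, so clamp it (about 2.1 s at most).
        let timeout_ns_i32 = timeout_ns.min(i32::MAX as u64) as i32;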

        let status = unsafe {
            gl.client_wait_sync(self.fence, glow::SYNC_FLUSH_COMMANDS_BIT, timeout_ns_i32)
        };

        match status {
            glow::ALREADY_SIGNALED | glow::CONDITION_SATISFIED => SyncStatus::Completed,
            glow::TIMEOUT_EXPIRED => SyncStatus::Timeout,
            glow::WAIT_FAILED => SyncStatus::InvalidSyncPoint,
            _ => SyncStatus::Error { error_string: "GLES sync failed".to_string() },
        }
    }
}
```
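The timeout conversion in the example above can be isolated into a small helper, which makes its edge cases easy to check. This sketch (the name `gl_timeout_ns` is hypothetical) assumes, as the example does, that `!0` means "wait forever" and that `glow` takes the timeout as an `i32` number of nanoseconds.

```rust
/// Converts a millisecond timeout into the `i32` nanosecond value passed to
/// `client_wait_sync`. `!0` means "wait forever"; every value is clamped to
/// `i32::MAX` nanoseconds (about 2.147 seconds).
pub fn gl_timeout_ns(timeout_ms: u32) -> i32 {
    let timeout_ns: u64 = if timeout_ms == !0 {
        u64::MAX
    } else {
        timeout_ms as u64 * 1_000_000
    };
    timeout_ns.min(i32::MAX as u64) as i32
}
```

One consequence of the clamp is that even an "infinite" wait returns `TIMEOUT_EXPIRED` after roughly 2.1 seconds, so callers that want to wait longer need to retry on `SyncStatus::Timeout`.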

### Is "invalid sync point" a universal concept?
While the term **invalid sync point** isn’t explicitly defined in graphics APIs, the concept exists in practice. Each API has its own way of handling scenarios where synchronization primitives become unusable:
- **Vulkan**: device loss (`VK_ERROR_DEVICE_LOST`)
- **Metal**: command buffer invalidation (`NotEnqueued` or `Error`)
- **GLES**: context loss (`glow::WAIT_FAILED`)

By introducing the `SyncStatus` enum with an `InvalidSyncPoint` variant, we provide a unified way to handle these scenarios across all backends.

### Use cases
The `InvalidSyncPoint` variant is particularly useful for handling **suspend/resume** scenarios, where the GPU may be reinitialized and sync points become invalid. Without this variant, developers cannot distinguish between:
- A legitimate **timeout** (e.g., the GPU is busy but still operational), where the application can offer the user a choice to keep waiting or terminate (for example, through an operating-system dialog, provided the desktop environment is not stalled as well).
- An **invalid sync point** (e.g., the GPU was reinitialized and the sync point is no longer valid), where the application must either recover from the error state or shut down gracefully.

By explicitly including `InvalidSyncPoint`, we enable developers to handle these cases appropriately, improving robustness and debuggability.
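A sketch of how an application might act on these distinctions, under stated assumptions: the `Action` enum and `wait_with_policy` function are hypothetical, and the wait is passed in as a closure (in real code, something like `|t| sync_point.wait_for_detailed(t)`). Timeouts are retried a bounded number of times before the user is asked; an invalid sync point triggers recovery immediately.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SyncStatus {
    Completed,
    Timeout,
    InvalidSyncPoint,
    Error { error_string: String },
}

/// Hypothetical application-level decision after waiting.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    /// GPU work finished; continue normally.
    Proceed,
    /// GPU busy but alive: offer "keep waiting" or "terminate".
    AskUser,
    /// Sync point invalid: rebuild GPU state or shut down gracefully.
    Recover,
    /// Unexpected backend error.
    Abort(String),
}

/// Waits via `wait`, retrying on timeout until `max_timeouts` timeouts have occurred.
pub fn wait_with_policy<F>(mut wait: F, timeout_ms: u32, max_timeouts: u32) -> Action
where
    F: FnMut(u32) -> SyncStatus,
{
    let mut timeouts = 0;
    loop {
        match wait(timeout_ms) {
            SyncStatus::Completed => return Action::Proceed,
            SyncStatus::Timeout => {
                timeouts += 1;
                if timeouts >= max_timeouts {
                    return Action::AskUser;
                }
            }
            SyncStatus::InvalidSyncPoint => return Action::Recover,
            SyncStatus::Error { error_string } => return Action::Abort(error_string),
        }
    }
}
```

With a `bool` result, the `Timeout` and `InvalidSyncPoint` arms would be indistinguishable, so the application could neither retry safely nor recover promptly.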


