@@ -24,6 +24,7 @@ Before using AWS Lambda durable functions, verify:
2. **Runtime environment** is ready:
- For TypeScript/JavaScript: Node.js 22+ (`node --version`)
- For Python: Python 3.11+ (`python --version`). Currently, only Lambda Python runtimes 3.13+ ship with the Durable Execution SDK pre-installed. Python 3.11 is the minimum version the SDK itself supports; to run on it, bring your own OCI container image that includes your Python runtime and the Durable SDK.
- For Java: Java 17+ (`java --version`)

3. **Deployment capability** exists (one of):
- AWS SAM CLI (`sam --version`) 1.153.1 or higher
@@ -40,6 +41,7 @@ Override syntax:

- "use Python" → Generate Python code
- "use JavaScript" → Generate JavaScript code
- "use Java" → Generate Java code

When no language is specified, ALWAYS use TypeScript.

@@ -90,6 +92,28 @@ pip install aws-durable-execution-sdk-python
pip install aws-durable-execution-sdk-python-testing
```

**For Java (Maven):**

```xml
<properties>
    <aws-durable-execution-sdk-java.version>VERSION</aws-durable-execution-sdk-java.version>
</properties>

<dependencies>
    <dependency>
        <groupId>software.amazon.lambda.durable</groupId>
        <artifactId>aws-durable-execution-sdk-java</artifactId>
        <version>${aws-durable-execution-sdk-java.version}</version>
    </dependency>
    <dependency>
        <groupId>software.amazon.lambda.durable</groupId>
        <artifactId>aws-durable-execution-sdk-java-testing</artifactId>
        <version>${aws-durable-execution-sdk-java.version}</version>
        <scope>test</scope>
    </dependency>
</dependencies>
```

## When to Load Reference Files

Load the appropriate reference file based on what the user is working on:
@@ -132,14 +156,31 @@ def handler(event: dict, context: DurableContext) -> dict:
    return result
```

**Java:**

```java
import software.amazon.lambda.durable.DurableHandler;
import software.amazon.lambda.durable.DurableContext;

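public class MyHandler extends DurableHandler<MyInput, MyOutput> {
    @Override
    public MyOutput handleRequest(MyInput event, DurableContext ctx) {
        // Run the work inside a step so its result is checkpointed and reused on replay.
        // The step's result type matches the handler's declared output type.
        var result = ctx.step("process", MyOutput.class, s -> processData(event));
        return result;
    }
}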
```

### Critical Rules

1. **All non-deterministic code MUST be in steps** (Date.now, Math.random, API calls)
2. **Cannot nest durable operations** - use `runInChildContext` to group operations
3. **Closure mutations are lost on replay** - return values from steps
4. **Side effects outside steps repeat** - use `context.logger` (replay-aware)
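
The rules above follow from the checkpoint-and-replay model. The sketch below is a toy model in plain Java, not the Durable Execution SDK: `ToyContext`, its `step` method, and the in-memory checkpoint list are all hypothetical stand-ins. It only illustrates why a non-deterministic call wrapped in a step yields the same value on replay.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Toy model only: NOT the Durable Execution SDK.
final class ReplaySketch {

    /** Hypothetical stand-in for a durable context backed by a checkpoint log. */
    static final class ToyContext {
        private final List<Object> checkpoints; // survives across simulated invocations
        private int cursor = 0;                 // position within the current invocation

        ToyContext(List<Object> checkpoints) {
            this.checkpoints = checkpoints;
        }

        @SuppressWarnings("unchecked")
        <T> T step(String name, Supplier<T> body) {
            if (cursor < checkpoints.size()) {
                // Replay: return the recorded result without running the body again.
                return (T) checkpoints.get(cursor++);
            }
            // First execution: run the body and record its result.
            T result = body.get();
            checkpoints.add(result);
            cursor++;
            return result;
        }
    }

    /** Counts how many times the step body actually ran. */
    static final AtomicInteger BODY_RUNS = new AtomicInteger();

    /** Handler whose only non-deterministic call is wrapped in a step. */
    static long handler(ToyContext ctx) {
        return ctx.<Long>step("timestamp", () -> {
            BODY_RUNS.incrementAndGet();
            return System.nanoTime();
        });
    }

    public static void main(String[] args) {
        List<Object> log = new ArrayList<>();
        long first = handler(new ToyContext(log));    // initial invocation
        long replayed = handler(new ToyContext(log)); // simulated replay
        System.out.println(first == replayed);        // both runs see the same value
        System.out.println(BODY_RUNS.get());          // number of real executions
    }
}
```

If `System.nanoTime()` were called outside the step, each simulated invocation would observe a different value, which is the failure mode rule 1 guards against.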

### Language-Specific API Differences

**Python:**

The Python SDK differs from TypeScript in several key areas:

@@ -148,6 +189,27 @@ The Python SDK differs from TypeScript in several key areas:
- **Exceptions**: `ExecutionError` (permanent), `InvocationError` (transient), `CallbackError` (callback failures)
- **Testing**: Use `DurableFunctionTestRunner` class directly - instantiate with handler, use context manager, call `run(input=...)`

**Java:**

The Java SDK uses a class-based approach and type-safe patterns:

- **Handler**: Extend `DurableHandler<TInput, TOutput>` and implement `handleRequest(TInput, DurableContext)`
- **Steps**: `ctx.step("name", ResultType.class, stepCtx -> operation())` - type must be specified
- **Wait**: `ctx.wait("name", Duration.ofSeconds(n))` - always name waits for debugging
- **Generic Types**: Use `TypeToken` for generic types like `List<T>`: `ctx.step("name", new TypeToken<List<User>>() {}, stepCtx -> ...)`
- **Exceptions**:
  - `StepFailedException` - Step execution failed (permanent, business logic error)
  - `StepInterruptedException` - Step was interrupted (transient, can retry)
  - `CallbackTimeoutException` - Callback didn't complete within timeout
  - `CallbackFailedException` - Callback failed or was rejected
  - `WaitForConditionFailedException` - Condition check failed or max attempts exceeded
  - `InvokeFailedException` - Lambda invocation failed
  - `InvokeTimedOutException` - Lambda invocation timed out
  - `DurableExecutionException` - Base class for all SDK exceptions
- **Testing**: Use `DurableFunctionTestRunner` class from testing SDK
- **Async Operations**: `stepAsync()`, `invokeAsync()`, `waitAsync()`, `mapAsync()` return `DurableFuture<T>`
- **Parallel Operations**: Use `parallel()` for heterogeneous operations or `DurableFuture.allOf()` / `DurableFuture.anyOf()`
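
The **Generic Types** bullet above exists because Java erases generics at runtime, so there is no `List<User>.class` literal. Type tokens usually work around this with an anonymous subclass that records its generic superclass. The sketch below shows that idiom with a hypothetical `TypeRef` class written for illustration; it is not the SDK's `TypeToken`, whose internals are not described here.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Illustrative only: a hypothetical TypeRef, not the SDK's TypeToken class.
final class TypeTokenSketch {

    /** Captures T by forcing callers to create an anonymous subclass. */
    abstract static class TypeRef<T> {
        private final Type type;

        protected TypeRef() {
            // The anonymous subclass records its generic superclass,
            // for example TypeRef<List<String>>.
            Type superType = getClass().getGenericSuperclass();
            if (!(superType instanceof ParameterizedType)) {
                throw new IllegalStateException("TypeRef must be created with a type argument");
            }
            this.type = ((ParameterizedType) superType).getActualTypeArguments()[0];
        }

        Type getType() {
            return type;
        }
    }

    /** Returns the full generic type name captured by the reference. */
    static String describe(TypeRef<?> ref) {
        return ref.getType().getTypeName();
    }

    public static void main(String[] args) {
        // The trailing {} creates the anonymous subclass that carries the type argument.
        System.out.println(describe(new TypeRef<List<String>>() {}));
    }
}
```

The trailing `{}` in `new TypeToken<List<User>>() {}` serves the same purpose: without it there would be no subclass from which to read the type argument.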

### Invocation Requirements

Durable functions **require qualified ARNs** (version, alias, or `$LATEST`):
@@ -201,4 +263,5 @@ Access to sensitive data (like Lambda and API Gateway logs) is **not** enabled b
- [AWS Lambda durable functions Documentation](https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html)
- [JavaScript SDK Repository](https://github.com/aws/aws-durable-execution-sdk-js)
- [Python SDK Repository](https://github.com/aws/aws-durable-execution-sdk-python)
- [Java SDK Repository](https://github.com/aws/aws-durable-execution-sdk-java)
- [IAM Policy Reference](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSLambdaBasicDurableExecutionRolePolicy.html)
@@ -2,13 +2,18 @@

Advanced error handling patterns for durable functions, including timeout handling, circuit breakers, and conditional retry strategies.

**API Reference Conventions:**

- TypeScript/Python: Bare method names refer to methods on the `context` object (e.g., `waitForCallback` means `context.waitForCallback`)
- Java: Method names are written with the `ctx` prefix (e.g., `ctx.waitForCallback`), since `ctx` is the conventional name for the context variable in Java

## Timeout Handling with Callbacks

**Pattern:** Wait for an external callback with a timeout, and implement fallback logic if the timeout is reached.

**Implementation approach:**

1. Use `waitForCallback` (TypeScript), `wait_for_callback` (Python), or `ctx.waitForCallback` (Java) with a timeout set in the config argument

Review comment: Why does the Java reference include the `ctx` object while the others don't?

Author reply: Fixed

2. Wrap in try-catch to handle timeout errors
3. Check if the error is a timeout
4. Implement fallback logic in a step (e.g., escalate to manager, use default value, retry with different parameters)
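
The four steps above can be sketched as plain control flow. This is a minimal sketch under stated assumptions: `CallbackTimedOut` is a local, hypothetical stand-in for the SDK's timeout exception, and the two `Supplier` arguments stand in for the callback wait and the fallback step. No SDK calls are made, so the timeout itself is simulated by throwing.

```java
import java.util.function.Supplier;

// Control-flow sketch only; every type here is a local, hypothetical stand-in.
final class TimeoutFallbackSketch {

    /** Stand-in for the timeout error raised when a callback expires. */
    static final class CallbackTimedOut extends RuntimeException {
        CallbackTimedOut(String message) {
            super(message);
        }
    }

    /**
     * Waits for the callback, catches the failure, checks that it is a timeout,
     * and only then runs the fallback. Non-timeout errors propagate unchanged.
     */
    static String awaitOrFallback(Supplier<String> waitForCallback, Supplier<String> fallbackStep) {
        try {
            return waitForCallback.get();
        } catch (RuntimeException e) {
            if (e instanceof CallbackTimedOut) {
                return fallbackStep.get();
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        String outcome = awaitOrFallback(
                () -> { throw new CallbackTimedOut("no approval received in time"); },
                () -> "escalated-to-manager");
        System.out.println(outcome); // prints "escalated-to-manager"
    }
}
```

Rethrowing non-timeout errors keeps the fallback from masking genuine failures such as validation errors.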
@@ -32,7 +37,31 @@ Advanced error handling patterns for durable functions, including timeout handli
4. Execute fallback operation in a separate step

**Important limitation:**
In TypeScript, native `setTimeout` (and patterns such as `Promise.race` that rely on it) will fail during execution replays. To create a reliable timeout that persists across the whole execution, which can span multiple invocations, always use the timeout parameter provided by `waitForCallback` or `waitForCondition`.

**Java equivalent - DurableFuture.anyOf:**
Java provides `DurableFuture.anyOf()` for racing multiple async operations, similar to `Promise.race()` in TypeScript:

```java
// Race multiple async operations
var f1 = ctx.stepAsync("primary-api", Result.class, s -> callPrimaryAPI());
var f2 = ctx.stepAsync("backup-api", Result.class, s -> callBackupAPI());

// Wait until at least one of them has completed
DurableFuture.anyOf(f1, f2);

// Prefer the primary result; fall back to the backup if the primary failed.
// Note: this try/catch distinguishes success from failure, not completion order.
Result result;
try {
    result = f1.get();
    ctx.getLogger().info("Using primary API result");
} catch (Exception e) {
    result = f2.get();
    ctx.getLogger().info("Primary API failed; using backup API result");
}
```

For reliable cross-invocation timeouts that persist across replays, always use the timeout configuration in `CallbackConfig` or `WaitForConditionConfig`.
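
For readers more familiar with the JDK, the racing idea can be compared with `java.util.concurrent.CompletableFuture.anyOf`. The sketch below is an analogy only: `CompletableFuture` is not replay-aware, and whether `DurableFuture.anyOf` returns the winning value in the same way is not stated here.

```java
import java.util.concurrent.CompletableFuture;

// JDK analogy only: CompletableFuture is not replay-aware like DurableFuture.
final class AnyOfSketch {

    /** Returns the value of whichever future completes first. */
    static String firstResult(CompletableFuture<String> primary, CompletableFuture<String> backup) {
        // In the JDK, anyOf completes with the result of the first future to finish.
        Object winner = CompletableFuture.anyOf(primary, backup).join();
        return (String) winner;
    }

    public static void main(String[] args) {
        CompletableFuture<String> primary = new CompletableFuture<>(); // never completes in this demo
        CompletableFuture<String> backup = CompletableFuture.completedFuture("backup");
        System.out.println(firstResult(primary, backup)); // prints "backup"
    }
}
```

In the JDK version the winner's value comes straight from the combined future, so no exception handling is needed to find out which operation finished first.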

## Conditional Retry Based on Error Type

@@ -113,3 +142,62 @@ In TypeScript, native setTimeout (and patterns like Promise.race using it) will
- Callback timeouts - external system didn't respond in time
- External system delays - service is slow or unresponsive
- Long-running operations - operation exceeded expected duration

## Exception Type Reference

Exception types for each SDK, with their category and retry behavior:

### TypeScript SDK Exceptions

| Exception Type | Category | Retryable | Use Case |
| ------------------------------ | ---------------- | --------- | ------------------------------------------------- |
| `UnrecoverableInvocationError` | Permanent | No | Business logic failures (validation, not found) |
| `InvocationError` | Transient | Yes | Infrastructure issues (Lambda retries invocation) |
| `CallbackTimeoutError` | Timeout | No | Callback didn't complete within timeout duration |
| `CallbackError` | Callback Failure | No | Callback failed or was explicitly rejected |
| `WaitForConditionTimeoutError` | Timeout | No | Condition polling exceeded timeout |
| `DurableExecutionsError` | Base | — | Base class for all SDK exceptions |

### Python SDK Exceptions

| Exception Type | Category | Retryable | Use Case |
| ------------------------ | ---------------- | --------- | ------------------------------------------------- |
| `ExecutionError` | Permanent | No | Business logic failures (returns FAILED status) |
| `InvocationError` | Transient | Yes | Infrastructure issues (Lambda retries invocation) |
| `CallbackError` | Callback Failure | No | Callback handling failures |
| `DurableExecutionsError` | Base | — | Base class for all SDK exceptions |

### Java SDK Exceptions

| Exception Type | Category | Retryable | Use Case |
| --------------------------------- | ----------------- | --------- | ------------------------------------------------------- |
| `StepFailedException` | Permanent | No | Step execution failed (business logic error) |
| `StepInterruptedException` | Transient | Yes | Step was interrupted (can retry) |
| `CallbackTimeoutException` | Timeout | No | Callback didn't complete within timeout duration |
| `CallbackFailedException` | Callback Failure | No | Callback failed or was explicitly rejected |
| `WaitForConditionFailedException` | Condition Failure | No | Condition check failed or max polling attempts exceeded |
| `InvokeFailedException` | Invoke Failure | No | Lambda invocation failed |
| `InvokeTimedOutException` | Timeout | No | Lambda invocation timed out |
| `DurableExecutionException` | Base | — | Base class for all SDK exceptions |

### Usage Guidelines

**Permanent failures** - Stop execution immediately, no retry:

- Validation errors
- Resource not found
- Authentication failures
- Business rule violations

**Transient failures** - Retry with backoff:

- Network timeouts
- Service unavailable (503)
- Rate limiting (429)
- Database connection failures

**Timeout failures** - Implement fallback logic:

- Callback timeouts → escalate to manager, use default value
- Condition timeouts → return partial results, notify operators
- Wait timeouts → trigger alternative workflow
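
The guidelines above amount to a small decision table. The sketch below encodes it in plain Java under stated assumptions: the `Category` and `Action` enums, the HTTP status mapping (only 429 and 503 treated as transient, everything else as permanent), and the exponential backoff formula are illustrative choices, not SDK behavior.

```java
// Illustrative decision table; all names and mappings here are assumptions.
final class FailurePolicySketch {

    enum Category { PERMANENT, TRANSIENT, TIMEOUT }

    enum Action { FAIL_FAST, RETRY_WITH_BACKOFF, RUN_FALLBACK }

    /** Maps a failure category to the handling strategy from the guidelines. */
    static Action actionFor(Category category) {
        switch (category) {
            case PERMANENT:
                return Action.FAIL_FAST;
            case TRANSIENT:
                return Action.RETRY_WITH_BACKOFF;
            case TIMEOUT:
                return Action.RUN_FALLBACK;
            default:
                throw new IllegalArgumentException("Unknown category: " + category);
        }
    }

    /** Assumed mapping: 429 and 503 are transient, every other status is permanent. */
    static Category categorizeHttpStatus(int status) {
        return (status == 429 || status == 503) ? Category.TRANSIENT : Category.PERMANENT;
    }

    /** Exponential backoff: base * 2^(attempt - 1), capped at capMillis. */
    static long backoffMillis(int attempt, long baseMillis, long capMillis) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        int exponent = Math.min(attempt - 1, 30); // bound the shift to avoid overflow
        return Math.min(baseMillis * (1L << exponent), capMillis);
    }

    public static void main(String[] args) {
        System.out.println(actionFor(categorizeHttpStatus(503))); // prints "RETRY_WITH_BACKOFF"
        System.out.println(backoffMillis(3, 100, 5000));          // prints "400"
    }
}
```

A real service would likely need a richer mapping, for example treating other 5xx responses as transient, so the status table is best kept in one place where it can be adjusted.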