Feature: Per-Agent Retry Policies

## Summary

Add configurable retry policies at the agent level so transient failures (API errors, rate limits, timeouts) don't kill entire workflows.

## Motivation

Production agentic AI demands resilience. Today, one transient API failure kills the entire conductor workflow — the only recovery is checkpoint/resume (all-or-nothing). Production frameworks like LangGraph are positioned as leaders partly due to configurable retry, and real-world incidents (Amazon retail outages from AI agents, "two AIs talked for 2 hours booking nothing") underscore the need for fault tolerance.

## Proposed Design

```yaml
agents:
  - name: analyzer
    model: gpt-5.2
    retry:
      max_attempts: 3              # default: 1 (no retry)
      backoff: exponential         # or "fixed"
      delay_seconds: 2             # base delay
      retry_on:
        - provider_error           # API 500s, rate limits
        - timeout                  # agent-level timeout exceeded
        # NOT: validation_error — output schema mismatches indicate logic bugs, not transience
```

### Behavior

- Retry counter resets per agent execution (not per workflow)
- Exponential backoff: `delay * 2^attempt` with jitter
- Events emitted on each retry attempt (`agent_retry` event)
- Final failure after exhausting retries follows existing error handling (`fail_fast`, `continue_on_error`, etc.)
- Works in both sequential and parallel execution contexts

### What Does NOT Retry

- **Output validation errors** (schema mismatches) — these indicate prompt/schema issues, not transience
- **Human gate timeouts** — these are intentional pauses
- **Workflow-level timeout exceeded** — hard stop

## Why It Fits Conductor

- Declarative YAML config, no imperative code needed
- Per-agent granularity, not per-workflow — fine-grained control
- Pairs naturally with existing `fail_fast`/`continue_on_error` parallel semantics
- Pairs with agent-level timeouts (#82) and fallback models (#84) for a complete resilience story

## Effort Estimate

Low — wraps existing `AgentExecutor.execute()` in a retry loop with backoff logic.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: Per-Agent Retry Policies #80

Summary

Motivation

Proposed Design

Behavior

What Does NOT Retry

Why It Fits Conductor

Effort Estimate

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Feature: Per-Agent Retry Policies #80

Description

Summary

Motivation

Proposed Design

Behavior

What Does NOT Retry

Why It Fits Conductor

Effort Estimate

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions