Feature: Per-Agent Retry Policies #80

@jrob5756

Summary

Add configurable retry policies at the agent level so transient failures (API errors, rate limits, timeouts) don't kill entire workflows.

Motivation

Production agentic workflows need to tolerate transient failures. Today, a single transient API failure kills the entire Conductor workflow, and the only recovery is checkpoint/resume, which is all-or-nothing. Frameworks such as LangGraph are positioned as production leaders partly because they offer configurable retry, and reported real-world incidents (Amazon retail outages attributed to AI agents, "two AIs talked for 2 hours booking nothing") underscore the need for fault tolerance.

Proposed Design

agents:
  - name: analyzer
    model: gpt-5.2
    retry:
      max_attempts: 3              # default: 1 (no retry)
      backoff: exponential         # or "fixed"
      delay_seconds: 2             # base delay
      retry_on:
        - provider_error           # API 500s, rate limits
        - timeout                  # agent-level timeout exceeded
        # NOT: validation_error — output schema mismatches indicate logic bugs, not transience
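
To make the defaults and allowed values concrete, here is a minimal sketch of how the `retry` block might be parsed and validated. `RetryPolicy` and `from_dict` are hypothetical names, not existing Conductor APIs. The defaults for `backoff` and `delay_seconds` are assumptions, since the proposal only states a default for `max_attempts`. Rejecting `validation_error` at load time is also an assumption about how the "NOT" rule could be enforced.

```python
from dataclasses import dataclass

# Hypothetical: values taken from the YAML above; not an existing Conductor API.
ALLOWED_BACKOFF = {"exponential", "fixed"}
ALLOWED_RETRY_ON = {"provider_error", "timeout"}


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 1          # 1 means "run once, no retry" (from the proposal)
    backoff: str = "exponential"   # assumed default; the proposal does not state one
    delay_seconds: float = 2.0     # assumed default; the proposal does not state one
    retry_on: tuple = ()           # empty means nothing is retried

    @classmethod
    def from_dict(cls, raw):
        """Build a policy from the parsed `retry:` mapping (or None if absent)."""
        raw = raw or {}
        policy = cls(
            max_attempts=int(raw.get("max_attempts", 1)),
            backoff=raw.get("backoff", "exponential"),
            delay_seconds=float(raw.get("delay_seconds", 2.0)),
            retry_on=tuple(raw.get("retry_on", ())),
        )
        if policy.max_attempts < 1:
            raise ValueError("max_attempts must be >= 1")
        if policy.backoff not in ALLOWED_BACKOFF:
            raise ValueError(f"unknown backoff: {policy.backoff!r}")
        if policy.delay_seconds < 0:
            raise ValueError("delay_seconds must be >= 0")
        unknown = set(policy.retry_on) - ALLOWED_RETRY_ON
        if unknown:
            # Covers validation_error, which the proposal excludes from retry.
            raise ValueError(f"not retryable: {sorted(unknown)}")
        return policy
```

An agent with no `retry` block would get `RetryPolicy.from_dict(None)`, which keeps today's single-attempt behavior.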

Behavior

  • Retry counter resets per agent execution (not per workflow)
  • Exponential backoff: delay_seconds * 2^attempt, with jitter
  • Events emitted on each retry attempt (agent_retry event)
  • Final failure after exhausting retries follows existing error handling (fail_fast, continue_on_error, etc.)
  • Works in both sequential and parallel execution contexts
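
The backoff rule above could be sketched as follows. This is an illustration under stated assumptions, not a settled design: `attempt` is taken to be zero-based (0 for the first retry), jitter is a uniform ±25% spread, and the `fixed` mode is left without jitter. None of these details are specified in the proposal. `backoff_delay` is a hypothetical helper name.

```python
import random


def backoff_delay(attempt, delay_seconds, backoff="exponential", jitter=0.25, rng=random):
    """Return the seconds to wait before the next try.

    attempt: zero-based retry index (assumption: 0 = first retry).
    jitter:  fraction of the base delay used as a uniform +/- spread (assumption).
    rng:     any object with a uniform(a, b) method, e.g. random.Random(seed).
    """
    if backoff == "fixed":
        return float(delay_seconds)
    if backoff != "exponential":
        raise ValueError(f"unknown backoff: {backoff!r}")
    base = delay_seconds * (2 ** attempt)
    # Spread retries out so parallel agents do not all retry at the same instant.
    return base * (1 + rng.uniform(-jitter, jitter))
```

With `delay_seconds: 2` and jitter disabled (`jitter=0`), successive retries would wait 2, 4, and 8 seconds.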

What Does NOT Retry

  • Output validation errors (schema mismatches) — these indicate prompt/schema issues, not transience
  • Human gate timeouts — these are intentional pauses
  • Workflow-level timeout exceeded — hard stop
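
One way to encode these exclusions is a hard deny-list that is checked before the agent's `retry_on` setting, so a misconfigured policy cannot opt in to retrying them. The enum and function names below are hypothetical, and the identifiers `human_gate_timeout` and `workflow_timeout` are invented for illustration. Only `provider_error`, `timeout`, and `validation_error` appear in the proposal.

```python
from enum import Enum


class FailureKind(Enum):
    PROVIDER_ERROR = "provider_error"
    TIMEOUT = "timeout"
    VALIDATION_ERROR = "validation_error"
    HUMAN_GATE_TIMEOUT = "human_gate_timeout"   # hypothetical identifier
    WORKFLOW_TIMEOUT = "workflow_timeout"       # hypothetical identifier


# Failures that are never retried, regardless of configuration.
NEVER_RETRY = frozenset({
    FailureKind.VALIDATION_ERROR,
    FailureKind.HUMAN_GATE_TIMEOUT,
    FailureKind.WORKFLOW_TIMEOUT,
})


def is_retryable(kind, retry_on):
    """True only if the failure is not hard-excluded and the agent opted in to it."""
    if kind in NEVER_RETRY:
        return False
    return kind.value in retry_on
```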

Why It Fits Conductor

Effort Estimate

Low — wraps existing AgentExecutor.execute() in a retry loop with backoff logic.
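
As a rough illustration of the scope, the wrapper might look something like the sketch below. It takes a plain callable in place of `AgentExecutor.execute()`, whose real signature is not shown in this issue. `AgentFailure`, `execute_with_retry`, the `on_retry` callback, and the event payload fields are all hypothetical. Only exponential backoff is shown, with an assumed ±25% jitter.

```python
import random
import time


class AgentFailure(Exception):
    """Hypothetical error type carrying a failure kind such as "provider_error"."""

    def __init__(self, kind, message=""):
        super().__init__(message or kind)
        self.kind = kind


def execute_with_retry(execute, *, max_attempts=1, delay_seconds=2.0,
                       retry_on=(), on_retry=None, sleep=time.sleep):
    """Call execute() up to max_attempts times, backing off between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return execute()
        except AgentFailure as exc:
            # Out of attempts, or a kind the agent did not opt in to: re-raise
            # so the existing error handling (fail_fast, etc.) takes over.
            if attempt == max_attempts or exc.kind not in retry_on:
                raise
            delay = delay_seconds * 2 ** (attempt - 1) * random.uniform(0.75, 1.25)
            if on_retry is not None:
                # Assumed payload shape for the proposed agent_retry event.
                on_retry({"event": "agent_retry", "attempt": attempt,
                          "kind": exc.kind, "delay": delay})
            sleep(delay)
```

The counter is local to each call, which matches the "resets per agent execution" behavior. Passing `sleep` as a parameter keeps the loop testable without real delays.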

Metadata

Labels

enhancement: New feature or request
