
Feature: Output Guardrails & Validation Hooks #81

@jrob5756

Description

Summary

Add a guardrails section to agent definitions for semantic output validation beyond JSON schema type checking — including regex patterns, length limits, and custom script-based checks.

Motivation

Research shows that frontier models can spontaneously exhibit deceptive behavior in multi-agent settings (UC Berkeley/UC Santa Cruz study), that 30-50% of AI agents bypass ethical constraints under KPI pressure, and that RAG document poisoning can lead agents to output fabricated financial data. Conductor validates output types today (via JSON schema) but has no way to validate output content or semantics.

Proposed Design

agents:
  - name: financial_analyst
    model: gpt-5.2
    output:
      recommendation:
        type: string
    guardrails:
      - type: regex_deny
        pattern: "(?i)(guaranteed|risk.free|100%)"
        message: "Output contains prohibited financial claims"
      - type: regex_require
        pattern: "(?i)(disclaimer|risk)"
        message: "Output must include risk disclaimer"
      - type: max_length
        chars: 5000
      - type: custom_script
        command: "python validate_output.py"
        # stdin: agent output JSON
        # exit 0 = pass, exit 1 = fail (stderr = failure message)
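The custom_script contract in the comments above (agent output JSON on stdin, exit 0 to pass, exit 1 to fail, stderr as the failure message) could be honored on the Conductor side roughly as follows. This is a minimal Python sketch, not Conductor's actual script runner: `run_custom_script`, `GuardrailResult`, the 30-second timeout, and the choice to treat any nonzero exit code as a failure are all assumptions for illustration.

```python
import json
import shlex
import subprocess
from dataclasses import dataclass
from typing import Any


@dataclass
class GuardrailResult:
    passed: bool
    message: str = ""


def run_custom_script(command: str, output: dict[str, Any], timeout: float = 30.0) -> GuardrailResult:
    """Run a custom_script guardrail (hypothetical helper): agent output JSON
    on stdin, exit 0 = pass, nonzero = fail, stderr = failure message."""
    try:
        proc = subprocess.run(
            shlex.split(command),      # e.g. ["python", "validate_output.py"]; no shell involved
            input=json.dumps(output),  # agent output JSON goes to the script's stdin
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return GuardrailResult(False, f"custom_script timed out after {timeout}s")

    if proc.returncode == 0:
        return GuardrailResult(True)
    # Assumption: any nonzero exit code (not just 1) counts as a failure.
    message = proc.stderr.strip() or f"custom_script exited with code {proc.returncode}"
    return GuardrailResult(False, message)
```

Running the command without a shell (`shlex.split` rather than `shell=True`) keeps shell metacharacters in the configured command from being interpreted; whether the real runner does the same would depend on the existing script step infrastructure.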

Behavior on Failure

  • A guardrail failure triggers an agent re-run, with the violation feedback injected into the prompt
  • A configurable max_guardrail_retries (default: 2) caps these re-runs before a hard failure
  • Events emitted: guardrail_check, guardrail_pass, guardrail_fail
  • Works with retry policies (Feature: Per-Agent Retry Policies #80); guardrail retries are separate from provider error retries
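The failure behavior above amounts to a bounded retry loop. The Python sketch below assumes each guardrail is a callable that returns `None` on pass or a violation message on fail, that `run_agent` already handles provider-error retries internally (per #80) so only guardrail violations consume `max_guardrail_retries`, and that a default of 2 means one initial run plus up to two re-runs. `run_with_guardrails`, `GuardrailError`, the `emit` callback, and the feedback wording are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical check type: returns None on pass, or a violation message on fail.
Check = Callable[[str], Optional[str]]


class GuardrailError(Exception):
    """Raised when the output still violates a guardrail after all retries."""


def run_with_guardrails(
    run_agent: Callable[[str], str],
    prompt: str,
    checks: list[tuple[str, Check]],
    max_guardrail_retries: int = 2,
    emit: Callable[[str, dict], None] = lambda event, data: None,
) -> str:
    """Run the agent, re-running it with violation feedback until all checks pass.

    Assumption: run_agent handles provider-error retries (#80) on its own, so
    only guardrail failures count against max_guardrail_retries.
    """
    violations: list[str] = []
    # One initial attempt plus up to max_guardrail_retries re-runs.
    for attempt in range(max_guardrail_retries + 1):
        full_prompt = prompt
        if violations:
            # Inject the previous attempt's violations into the prompt.
            bullets = "\n".join(f"- {v}" for v in violations)
            full_prompt = f"{prompt}\n\nYour previous output violated these guardrails:\n{bullets}"
        output = run_agent(full_prompt)

        violations = []
        for name, check in checks:
            emit("guardrail_check", {"guardrail": name, "attempt": attempt})
            message = check(output)
            if message is None:
                emit("guardrail_pass", {"guardrail": name, "attempt": attempt})
            else:
                emit("guardrail_fail", {"guardrail": name, "attempt": attempt, "message": message})
                violations.append(message)

        if not violations:
            return output

    # Retries exhausted: hard failure carrying the last set of violations.
    raise GuardrailError("; ".join(violations))
```

Keeping the guardrail loop outside `run_agent` is what makes the two retry budgets independent: a transient provider error never burns a guardrail retry, and a policy violation never burns a provider retry.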

Built-in Guardrail Types

Type            Description
regex_deny      Fail if output matches pattern
regex_require   Fail if output does NOT match pattern
max_length      Fail if output exceeds character limit
min_length      Fail if output is below character limit
json_schema     Validate against an additional JSON schema (beyond output type)
custom_script   Run external script, pass output via stdin, check exit code
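The four text-based built-ins could each be a small factory that turns a YAML entry into a check function. This is a hedged Python sketch, assuming the checks run against the agent output serialized as a string (for example via `json.dumps`) and that `min_length` takes the same `chars` key as `max_length`; `build_check`, `BUILTINS`, and the default messages are hypothetical. `json_schema` (which would typically need a third-party validator) and `custom_script` are left out.

```python
import re
from typing import Callable, Optional

# Hypothetical check type: returns None on pass, or a violation message on fail.
Check = Callable[[str], Optional[str]]


def regex_deny(pattern: str, message: Optional[str] = None) -> Check:
    compiled = re.compile(pattern)

    def check(output: str) -> Optional[str]:
        match = compiled.search(output)
        if match:
            return message or f"Output matches denied pattern: {match.group(0)!r}"
        return None

    return check


def regex_require(pattern: str, message: Optional[str] = None) -> Check:
    compiled = re.compile(pattern)

    def check(output: str) -> Optional[str]:
        if compiled.search(output) is None:
            return message or f"Output does not match required pattern: {pattern!r}"
        return None

    return check


def max_length(chars: int, message: Optional[str] = None) -> Check:
    def check(output: str) -> Optional[str]:
        if len(output) > chars:
            return message or f"Output is {len(output)} characters; maximum is {chars}"
        return None

    return check


def min_length(chars: int, message: Optional[str] = None) -> Check:
    def check(output: str) -> Optional[str]:
        if len(output) < chars:
            return message or f"Output is {len(output)} characters; minimum is {chars}"
        return None

    return check


BUILTINS = {
    "regex_deny": regex_deny,
    "regex_require": regex_require,
    "max_length": max_length,
    "min_length": min_length,
}


def build_check(spec: dict) -> Check:
    """Turn one YAML guardrail entry (already parsed to a dict) into a check."""
    params = dict(spec)  # copy so the caller's dict is not mutated
    kind = params.pop("type")
    return BUILTINS[kind](**params)
```

With the patterns from the proposed YAML, `risk.free` matches both "risk-free" and "risk free" because `.` matches any character, and the leading `(?i)` makes the match case-insensitive.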

Why It Fits Conductor

  • Declarative and YAML-expressible: no code changes needed per workflow
  • Script-based guardrails reuse the existing script step infrastructure
  • Pairs with retry policies (Feature: Per-Agent Retry Policies #80): a guardrail violation triggers a retry with feedback context
  • Essential for regulated industries (finance, healthcare) adopting Conductor

Effort Estimate

Medium: a new post-output validation layer in AgentExecutor, new schema fields, and reuse of the script runner from the existing script step infrastructure.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests