Define digest SLOs and incident runbooks #45

@TheTrueAI

Why

Reliability work on the digest needs explicit, measurable targets and agreed rules for operational response when those targets are missed.

Scope

  • Define SLOs for digest run success, timeliness, and delivery integrity.
  • Document runbooks for common failures (LLM/provider outages, DB failures, email provider failures).
  • Include rollback and communication protocol.
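
To make "measurable thresholds" concrete, here is a minimal sketch of how the three SLOs above (run success, timeliness, delivery integrity) might be evaluated over a window of digest runs. Everything in it is an assumption for illustration: the `DigestRun` record, the `evaluate_slos` helper, and all threshold values are hypothetical placeholders, not targets agreed in this issue.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DigestRun:
    """Hypothetical record of one digest run (not an existing schema)."""
    scheduled_at: datetime
    finished_at: Optional[datetime]  # None if the run never completed
    succeeded: bool
    emails_expected: int
    emails_delivered: int


# Placeholder thresholds (assumptions, to be replaced by the agreed SLOs).
SUCCESS_TARGET = 0.99              # share of runs that succeed
TIMELINESS_TARGET = 0.95           # share of runs finished within MAX_DELAY
DELIVERY_TARGET = 0.995            # share of expected emails delivered
MAX_DELAY = timedelta(minutes=30)  # allowed delay after the scheduled time


def evaluate_slos(runs):
    """Return {slo_name: (observed_ratio, target_met)} for a window of runs."""
    if not runs:
        raise ValueError("need at least one run to evaluate SLOs")
    total = len(runs)

    # Run success: fraction of runs that completed successfully.
    success = sum(r.succeeded for r in runs) / total

    # Timeliness: fraction of runs that succeeded within the allowed delay.
    on_time = sum(
        r.succeeded
        and r.finished_at is not None
        and r.finished_at - r.scheduled_at <= MAX_DELAY
        for r in runs
    ) / total

    # Delivery integrity: delivered emails over expected emails.
    expected = sum(r.emails_expected for r in runs)
    delivered = sum(r.emails_delivered for r in runs)
    delivery = delivered / expected if expected else 1.0

    return {
        "run_success": (success, success >= SUCCESS_TARGET),
        "timeliness": (on_time, on_time >= TIMELINESS_TARGET),
        "delivery_integrity": (delivery, delivery >= DELIVERY_TARGET),
    }
```

A breached SLO in the returned mapping could be the trigger for opening the matching runbook. Counting a failed run as "not on time" is a design choice of this sketch; the documented SLOs may define timeliness only over successful runs.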

Success Criteria

  • SLOs documented with measurable thresholds.
  • Runbooks exist and can be exercised end to end through the on-call/self-maintainer workflow.
  • Post-incident review template included.

Effort / Impact

  • Effort: Small-Medium
  • Impact: High

Labels: documentation (Improvements or additions to documentation)
