Skip to content

feat: harden missionRuns state machine, heartbeats, retries, and run health dashboard#150

Merged
brianorwhatever merged 8 commits intomainfrom
feat/p0-6-missionruns-hardening
Mar 2, 2026
Merged

feat: harden missionRuns state machine, heartbeats, retries, and run health dashboard#150
brianorwhatever merged 8 commits intomainfrom
feat/p0-6-missionruns-hardening

Conversation

@krusty-agent
Copy link
Copy Markdown
Collaborator

@krusty-agent krusty-agent commented Mar 2, 2026

Summary

  • harden missionRuns schema with required P0-6 fields:
    • attempt, parentRunId, durationMs, terminalReason, lastHeartbeatAt, artifactRefs[]
    • plus runtime SLO helpers: heartbeat interval/miss counters + escalation metadata
  • implement mission run lifecycle in missionControlCore:
    • create runs, strict state transitions, heartbeat updates
    • heartbeat timeout monitor (degraded after threshold misses, failed after failure threshold)
    • retry baseline with max auto-retry cap and escalation handoff
    • artifact append API with typed refs (screenshot|log|diff|file|url)
  • add run visibility queries for dashboards:
    • run listing/filtering
    • aggregate run health metrics (success/intervention/timeout rates + degraded run visibility)
  • wire REST v1 endpoints in missionControlApi + router:
    • GET/POST /api/v1/runs
    • POST /api/v1/runs/:runId/{heartbeat|transition|retry|artifacts|pause|kill|escalate|reassign}
    • POST /api/v1/runs/monitor
    • GET /api/v1/dashboard/runs
  • expand Team Dashboard runtime operations:
    • add live run list + operator controls wired to /api/v1/runs/:runId/{pause|kill|escalate|reassign}
    • preserve existing team cards/tree and run-health surfaces
  • upgrade readiness drill script:
    • scripts/mission-control-readiness-drill.mjs now exercises live run creation + runtime control endpoints (with dry-run fallback)
  • expand API key scopes for run controls/visibility:
    • runs:read, runs:write, runs:control, dashboard:read
  • update API.md with mission run + dashboard endpoint docs

Mission Control checklist (running)

Validation

  • git fetch origin --prune && git rebase origin/main ✅ (already up to date)
  • npx eslint convex/schema.ts convex/missionControlCore.ts convex/missionControlApi.ts convex/http.ts API.md src/pages/TeamDashboard.tsx scripts/mission-control-readiness-drill.mjs ⚠️ expected existing mission-control @typescript-eslint/no-explicit-any baseline remains
  • npx tsc -p convex/tsconfig.json --noEmit ⚠️ fails due pre-existing generated API typing drift (didResources missing in generated convex API typings), not introduced by this PR

Notes

This keeps P0-6 missionRuns hardening as one cohesive backend/API/dashboard visibility slice, scoped for merge readiness.

@krusty-agent krusty-agent changed the title feat: mission control launch controls (pause/kill/reassign/escalate) feat: harden missionRuns state machine, heartbeats, retries, and run health dashboard Mar 2, 2026
@brianorwhatever brianorwhatever merged commit 62e6e4c into main Mar 2, 2026
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants