Open standard for agent observability.
One schema for how AI agents report work — regardless of runtime, model, or provider.
Code is commoditizing. The durable value in AI agent infrastructure is data, provenance, protocols, and evals. This spec is the protocol layer.
agent-run defines a standard AgentRun object that any agent runtime can emit and any dashboard can consume. It answers: what ran, how it ran, what it cost, and whether it worked.
| Object | Purpose |
|---|---|
| AgentRun | A single agent execution — the atomic unit of observability |
| Step | One action within a run (reasoning, tool call, error, handoff) |
| Cost | Token usage and dollar cost attribution |
| Provenance | Cryptographic proof of how an output was produced |
| EvalResult | Scoring and quality assessment of a run |
Install the TypeScript types:

```
pnpm add @agent-run/types
```

```ts
import type { AgentRun, Step, Provenance } from '@agent-run/types';

const run: AgentRun = {
  id: crypto.randomUUID(),
  agent_id: 'my-agent',
  status: 'completed',
  started_at: new Date().toISOString(),
  steps: [],
  cost: { input_tokens: 1500, output_tokens: 800, cost_usd: 0.012 },
  provenance: { run_hash: '...' }
};

// Report to any agent-run compliant server
await fetch('http://localhost:3000/api/v1/runs', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'X-API-Key': key },
  body: JSON.stringify(run)
});
```

For Rust:

```toml
[dependencies]
agent-run-types = "0.1"
```

```rust
use agent_run_types::AgentRun;
```

Schemas are in `schemas/`. Validate any JSON against them:
```
npx ajv validate -s schemas/agent-run.json -r 'schemas/*.json' -d my-run.json
```

```
schemas/
  agent-run.json    # Root object — the run itself
  step.json         # Individual steps within a run
  cost.json         # Token usage and cost
  provenance.json   # Cryptographic audit trail
  eval.json         # Evaluation/scoring results
  openapi.yaml      # Full API spec for compliant servers
```
Every run gets a `run_hash` — a SHA-256 of the canonical inputs (`agent_id`, `model`, `tools`, `config`, `trigger`). Runs triggered by other runs form a hash chain via `lineage`. This creates a verifiable audit trail: given the same inputs, anyone can reproduce the hash.
Optional Ed25519 signatures (`signed_by` + `signature`) enable tamper detection for enterprise use.
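As a sketch, the hash computation could look like this. The canonicalization scheme below (sorted-key JSON) is an illustrative assumption; the spec's `provenance.json` defines the actual canonical form.

```typescript
import { createHash } from 'node:crypto';

// Deterministic JSON serialization: keys sorted at every level.
// NOTE: this exact scheme is an assumption for illustration, not the spec's.
function canonicalize(value: unknown): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`;
  const obj = value as Record<string, unknown>;
  return `{${Object.keys(obj)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
    .join(',')}}`;
}

// run_hash = SHA-256 over the canonical inputs named in the spec.
function computeRunHash(inputs: {
  agent_id: string;
  model: string;
  tools: string[];
  config: Record<string, unknown>;
  trigger: string;
}): string {
  return createHash('sha256').update(canonicalize(inputs)).digest('hex');
}

const hash = computeRunHash({
  agent_id: 'my-agent',
  model: 'example-model',       // model name is a placeholder
  tools: ['search'],
  config: { max_steps: 10 },
  trigger: 'manual',
});
// Same inputs always reproduce the same hash, which is what makes
// the audit trail independently verifiable.
```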
Runs can be scored after completion. The `EvalResult` tracks:
- pass/fail against acceptance criteria
- score (0-100) for nuanced grading
- metrics: cost, duration, tool calls, retries, convergence
- regression detection via `regression_from` linking
Benchmark packs in `bench/` provide reproducible evaluation scenarios.
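A hedged sketch of a scored run's `EvalResult`: the authoritative field list lives in `eval.json`, so the names beyond `score` and `regression_from` (e.g. `run_id` and the metric keys) are assumptions for illustration.

```typescript
// Illustrative EvalResult shape — eval.json is the source of truth.
const evalResult = {
  run_id: 'b2f1c0d4-0000-4000-8000-000000000000', // hypothetical run id
  passed: true,            // pass/fail against acceptance criteria
  score: 87,               // 0-100 for nuanced grading
  metrics: {
    cost_usd: 0.012,       // metric keys here are assumed, not spec-defined
    duration_ms: 4200,
    tool_calls: 3,
    retries: 0,
  },
  regression_from: null,   // set to a prior run's id to flag a regression
};
```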
| Server | Status |
|---|---|
| Mission Control | Reference implementation |
Want to add yours? Open a PR.
| Runtime | Status |
|---|---|
| Mission Control (spawned agents) | Built-in |
| OneClaw | Planned |
| Claude Code (via MC MCP) | Via adapter |
Building an agent runtime? Emit AgentRun objects to any compliant server.
- Schema-first — JSON Schema is the source of truth. Types are generated.
- Privacy-preserving — `input_preview` and `output_preview` are truncated. Full prompts/outputs are never required by the spec.
- Runtime-agnostic — Works with any agent framework, model, or provider.
- Incrementally adoptable — Only `id`, `agent_id`, `status`, `started_at`, `steps`, `cost`, and `provenance` are required. Everything else is optional.
- Extensible — `metadata` fields on every object for runtime-specific data.
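The incrementally-adoptable minimum can be checked with a tiny guard. This is a sketch for illustration, not part of the spec (the JSON Schemas remain the real validator):

```typescript
// The seven required fields per the spec; everything else is optional.
const REQUIRED = [
  'id', 'agent_id', 'status', 'started_at', 'steps', 'cost', 'provenance',
] as const;

function isMinimalAgentRun(value: unknown): boolean {
  return (
    typeof value === 'object' &&
    value !== null &&
    REQUIRED.every((key) => key in (value as Record<string, unknown>))
  );
}

const minimal = isMinimalAgentRun({
  id: '1',
  agent_id: 'my-agent',
  status: 'completed',
  started_at: new Date().toISOString(),
  steps: [],
  cost: { input_tokens: 0, output_tokens: 0, cost_usd: 0 },
  provenance: { run_hash: 'abc' },
}); // true: all seven required fields are present
```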
MIT