Agent OS

A fully autonomous software organization — staffed by AI agents, managed by cron, and designed to run indefinitely without human input.

You give it a backlog. It ships product.

Public proof — everything is auditable: Reliability dashboard · Case study · Live discussion

See it work — real task, zero human intervention

Real execution: Issue #115 → agent dispatched → code written → tests pass → PR #122 merged → issue closed. No human touched it.

Goal

Make Agent OS the most credible autonomous software organization for technical founders and solo builders: a system that can reliably turn backlog input into useful shipped work, improve itself from operational evidence, and earn trust through visible results. Prioritize work that increases adoption, reliability, evidence quality, and operator confidence over work that only creates attention.

This README was written by an agent. The CI pipeline was built by an agent. The backlog groomer that generates improvement tickets was written by an agent dispatched from a ticket that was generated by the log analyzer. It's turtles all the way down.

Why Agent OS?

Most AI tools make individual developers faster. Agent OS asks a different question: what if the developers were optional?

Not because humans aren't valuable — but because most engineering work is structured, bounded, and repetitive enough that a well-orchestrated team of AI agents can handle it autonomously. The hard part was never the coding. It was the coordination: task state, routing, context preservation, failure recovery, quality gates, and institutional memory.

Agent OS solves coordination. The agents do the rest.

It's not a copilot. It's not a chatbot. It's a team you deploy.

The Loop

            GitHub Issue (Backlog)
                    │
            Status → Ready
                    │
            ┌───────▼────────┐
            │   Dispatcher   │  LLM-formats task, routes by repo + type
            └───────┬────────┘
                    │
            ┌───────▼────────┐
            │  Queue Engine  │  Worktree → Agent → Result → Retry/Escalate
            └───────┬────────┘
                    │
              Push branch, open PR
                    │
            ┌───────▼────────┐
            │  PR Monitor    │  CI green → merge · Conflict → rebase · Fail → escalate
            └───────┬────────┘
                    │
              Issue closed, board → Done
                    │
        ┌───────────┴───────────┐
        ▼                       ▼
  Log Analyzer            Backlog Groomer
  Files fix tickets ──► back into the backlog

That last arrow is the point. The system files tickets about its own failures. Those tickets enter the backlog. The agents fix them. The fixes get merged. Next week, the system is better. Indefinitely.
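The PR Monitor step in the diagram reduces to a small decision function. The sketch below is illustrative only: `PRState`, `Action`, and `next_action` are hypothetical names, not taken from `orchestrator/pr_monitor.py`, and checking conflicts before CI status is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    MERGE = "merge"
    REBASE = "rebase"
    ESCALATE = "escalate"
    WAIT = "wait"


@dataclass(frozen=True)
class PRState:
    """Hypothetical snapshot of a pull request; field names are illustrative."""
    ci_status: str        # "success", "failure", or "pending"
    has_conflicts: bool


def next_action(pr: PRState) -> Action:
    """Map a PR snapshot to the action shown in the loop diagram."""
    if pr.has_conflicts:
        return Action.REBASE      # Conflict -> rebase
    if pr.ci_status == "success":
        return Action.MERGE       # CI green -> merge
    if pr.ci_status == "failure":
        return Action.ESCALATE    # Fail -> escalate
    return Action.WAIT            # CI still running: look again next cycle
```

Keeping the decision as a pure function over a snapshot makes it easy to unit-test separately from the GitHub calls that would carry out the action.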

Recursive Self-Improvement

This is the part that makes Agent OS different from a task runner.

Every Monday — the log analyzer reads a week of execution metrics, synthesizes failure patterns, and files fix tickets with evidence and reasoning
Every Saturday — the backlog groomer scans for stale issues, risk flags, and undocumented known issues, then generates improvement tasks
Every sprint — the strategic planner evaluates business-outcome metrics, adjusts priorities, and selects the next sprint from the backlog

These generated issues are indistinguishable from human-written ones. They enter the same queue, get dispatched to the same agents, go through the same CI → merge pipeline. The system literally engineers itself.
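As a rough illustration of the Monday analysis step, the sketch below groups a week of failed runs by blocker category and turns recurring ones into ticket titles. The record fields (`status`, `blocker`), the threshold, and the title wording are assumptions made for this example; the real schema of the execution metrics is not documented here.

```python
import json
from collections import Counter


def failure_patterns(jsonl_text: str, min_count: int = 2) -> list:
    """Count failed runs per blocker category, most frequent first.

    Only categories seen at least `min_count` times are returned.
    """
    counts = Counter()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        if record.get("status") != "success":
            counts[record.get("blocker", "unknown")] += 1
    return [(blocker, n) for blocker, n in counts.most_common() if n >= min_count]


def ticket_title(blocker: str, count: int) -> str:
    """Build a ticket title that carries the evidence in plain sight."""
    return f"Fix recurring failure: {blocker} ({count} runs this week)"
```

A one-off failure stays out of the backlog under this threshold; only patterns that repeat become tickets.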

Get Started in 5 Minutes

Option A: Sandbox demo (2 minutes)

Zero config — creates a test issue, dispatches it to Claude, and shows the full loop:

git clone https://github.com/kai-linux/agent-os && cd agent-os
gh auth login          # authenticate gh; python3 and the claude CLI must also be installed
./demo.sh              # or: make demo

Requirements: gh (authenticated), python3, claude CLI. Works on macOS and Linux.

Option B: Production setup (5 minutes)

Run Agent OS against your own repo with full dispatch, CI gating, and auto-merge.

Step 1 — Clone and install

git clone https://github.com/kai-linux/agent-os && cd agent-os
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

Step 2 — Authenticate GitHub

gh auth login
gh auth refresh -s project            # needed for GitHub Projects board access

Step 3 — Configure

cp example.config.yaml config.yaml

Edit config.yaml — the minimum you need to set:

root_dir: "~/agent-os"
worktrees_dir: "/srv/worktrees"       # any writable path for agent worktrees
allowed_repos:
  - /path/to/your/repo                # local clone of the repo agents will work on
default_allow_push: true
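A quick sanity check on the minimum settings can catch typos before the first dispatch. This is a sketch over an already-loaded dictionary. `check_config` is a hypothetical helper, not part of Agent OS, and treating the three keys below as required is an assumption based on the minimal example above.

```python
REQUIRED_KEYS = ("root_dir", "worktrees_dir", "allowed_repos")


def check_config(config: dict) -> list:
    """Return a list of human-readable problems; empty means the minimum is set."""
    problems = []
    for key in REQUIRED_KEYS:
        if not config.get(key):
            problems.append(f"missing or empty: {key}")
    repos = config.get("allowed_repos")
    if repos and not isinstance(repos, list):
        problems.append("allowed_repos should be a list of local repo paths")
    return problems
```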

Step 4 — Create your first task

Open an issue on your repo with a clear title and body containing:

## Goal
<what you want done>

## Success Criteria
- <measurable outcome>

## Constraints
- <any boundaries>

Then move the issue to Ready on your GitHub Projects board (or add a Status: Ready label).
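Before moving an issue to Ready, it can help to confirm the body has all three sections from the template above. The following is a minimal sketch; `parse_issue_body` and `missing_sections` are hypothetical helpers, and whether the dispatcher itself enforces these headings is not something this README states.

```python
import re

REQUIRED_SECTIONS = ("Goal", "Success Criteria", "Constraints")


def parse_issue_body(body: str) -> dict:
    """Split a Markdown issue body into {heading: text} on '## ' headings."""
    sections = {}
    current = None
    lines = []
    for line in body.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            if current is not None:
                sections[current] = "\n".join(lines).strip()
            current, lines = match.group(1), []
        elif current is not None:
            lines.append(line)
    if current is not None:
        sections[current] = "\n".join(lines).strip()
    return sections


def missing_sections(body: str) -> list:
    """List required sections that are absent or have no content."""
    parsed = parse_issue_body(body)
    return [name for name in REQUIRED_SECTIONS if not parsed.get(name)]
```

For example, a body with only `## Goal` filled in would report `Success Criteria` and `Constraints` as missing.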

Step 5 — Dispatch and watch

# Run the dispatcher once to pick up your Ready issue
python3 -m orchestrator.github_dispatcher

# Run the queue to execute the task
python3 -m orchestrator.queue

# Check the result
cat runtime/mailbox/*/result/.agent_result.md

The agent creates an isolated worktree, writes code, runs tests, pushes a branch, and opens a PR.

Step 6 — View results

# See the PR the agent created
gh pr list --repo your-user/your-repo

# Auto-merge when CI passes (run on a loop or cron)
python3 -m orchestrator.pr_monitor

Optional: set up cron for full autonomy

# Add to crontab — see docs/configuration.md for full reference
(crontab -l 2>/dev/null; echo "
* * * * * cd $HOME/agent-os && .venv/bin/python3 -m orchestrator.github_dispatcher >> runtime/logs/dispatcher.log 2>&1
* * * * * cd $HOME/agent-os && .venv/bin/python3 -m orchestrator.queue >> runtime/logs/queue.log 2>&1
*/5 * * * * cd $HOME/agent-os && .venv/bin/python3 -m orchestrator.pr_monitor >> runtime/logs/pr_monitor.log 2>&1
") | crontab -

Once cron is running, the system dispatches, executes, reviews, and merges autonomously.

How It Works

Component	Role	Cadence
`github_dispatcher.py`	Triages backlog, assigns + formats tasks	Every minute
`queue.py`	Routes to best agent, retries, escalates	Per task
`pr_monitor.py`	CI gate, auto-merge, auto-rebase	Every 5 min
`log_analyzer.py`	Failure analysis → fix tickets	Weekly
`agent_scorer.py`	Execution + business-outcome scoring	Weekly
`backlog_groomer.py`	Backlog hygiene + task generation	Config-driven
`strategic_planner.py`	Sprint planning from evidence + objectives	Per sprint

4 agents in the pool: Claude, Codex, Gemini, DeepSeek — routed by task type with automatic fallback chains.
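Routing with a fallback chain can be pictured as trying agents in a fixed order until one succeeds. In the sketch below the task types, the chain ordering, and the `run_with_fallback` helper are all assumptions for illustration; the actual routing rules live in the project's configuration and `queue.py`.

```python
from typing import Callable, Optional

# Hypothetical routing table: task types and ordering are illustrative only.
FALLBACK_CHAINS = {
    "implementation": ["claude", "codex", "gemini", "deepseek"],
    "docs": ["gemini", "claude", "deepseek", "codex"],
}
DEFAULT_CHAIN = ["claude", "codex", "gemini", "deepseek"]


def run_with_fallback(task_type: str, run_agent: Callable[[str], bool]):
    """Try each agent in the chain for this task type until one succeeds.

    Returns (winning agent or None, agents attempted in order).
    """
    winner: Optional[str] = None
    attempted = []
    for agent in FALLBACK_CHAINS.get(task_type, DEFAULT_CHAIN):
        attempted.append(agent)
        if run_agent(agent):
            winner = agent
            break
    return winner, attempted  # winner is None when every agent failed
```

A `None` winner is the point at which a caller would escalate instead of retrying again.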

The backlog is GitHub Issues. The sprint board is GitHub Projects. The standup is Telegram. The office is a $5/month VPS.

Key Design Choices

GitHub is the entire control plane — no second system
Markdown files, not message brokers — you can ls the queue
Isolated worktrees — agents never collide
One contract, many agents — .agent_result.md is the only interface
Memory that compounds — CODEBASE.md grows with every completed task
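To make the "Markdown files, not message brokers" choice concrete, here is a minimal file-backed queue. The `inbox/` and `processing/` directory names and both helper functions are hypothetical and do not describe the actual layout under `runtime/`.

```python
from pathlib import Path
from typing import Optional


def enqueue(queue_dir: Path, task_id: str, body: str) -> Path:
    """Write a task as a Markdown file; the file itself is the queue entry."""
    inbox = queue_dir / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    path = inbox / f"{task_id}.md"
    path.write_text(body, encoding="utf-8")
    return path


def claim_next(queue_dir: Path) -> Optional[Path]:
    """Move the first task (by file name) from inbox/ to processing/."""
    inbox = queue_dir / "inbox"
    processing = queue_dir / "processing"
    inbox.mkdir(parents=True, exist_ok=True)
    processing.mkdir(parents=True, exist_ok=True)
    for path in sorted(inbox.glob("*.md")):
        target = processing / path.name
        try:
            path.rename(target)   # a rename within one filesystem is atomic on POSIX
        except FileNotFoundError:
            continue              # another worker claimed this file first
        return target
    return None
```

Because state is just files in directories, `ls` on the inbox shows what is pending, and a rename doubles as the claim operation.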

Capability Ladder

Level	What	Status
1	Reliable execution engine	Done
2	Strategic planning + retrospectives	Current
3	Evidence-driven planning (analytics, research, product inspection)	In progress
4	Closed-loop optimization (hypothesis → experiment → measurement)	Next
5+	Self-directed growth across repos and products	Future

Built with Agent OS

Agent OS manages its own development. In 27 days it shipped 75 merged PRs, closed 100 issues, and produced 327 commits — autonomously dispatching tasks, reviewing CI, and merging changes with zero human intervention per task.

Reliability dashboard: rolling 14-day success rate, per-agent breakdown, blocker categories, completion time, and escalation rate, drawn from PRODUCTION_FEEDBACK.md and agent_stats.jsonl.

Read the full case study · GitHub Discussion

Metric	Value
Issues closed	100 of 108 (93%)
PRs merged	75 of 83 (90%)
Commits	327 in 27 days (~12/day)
Agent tasks executed	143 (60.8% first-attempt success)
GitHub stars	2
GitHub forks	0
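The percentages in the table follow directly from the raw counts, and can be re-derived as a check. The first-attempt success count is not reported above; the value computed here is implied by the 60.8% figure rather than taken from the source.

```python
def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100 * part / whole


issues_closed_pct = rate(100, 108)            # rounds to 93
prs_merged_pct = rate(75, 83)                 # rounds to 90
commits_per_day = 327 / 27                    # rounds to 12
implied_first_attempt = round(143 * 0.608)    # derived estimate, not a reported figure

print(round(issues_closed_pct), round(prs_merged_pct), round(commits_per_day))
```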

Documentation

Topic	Link
Deployment guide for solo builders	docs/deployment-guide.md
Architecture, team roles, observability, safety	docs/architecture.md
Task execution, handoff contract, retry logic	docs/execution.md
Configuration, objectives, evidence, cron setup	docs/configuration.md
Roadmap and capability ladder	docs/roadmap.md
Case study: self-managed repo	docs/case-study-agent-os.md
Public reliability dashboard	docs/reliability/README.md
Case study discussion	GitHub Discussions #167

Get Involved

Try it — clone the repo and run ./demo.sh to see an agent ship code in minutes.

Contribute — check open issues or file one. PRs welcome.

Questions? — open a discussion or reach out via the repo.
