autoarc

Local service for benchmark-guided code improvement

Japanese · Why · Run Locally · Job Flow · Code Tour · API · Development


autoarc is an open-source Gleam service for running benchmark-driven code experiments on local Git repositories. It creates isolated worktrees, asks pi to propose changes, runs the benchmark itself, and records jobs plus experiments in SQLite.

autoarc keeps the loop intentionally small and inspectable: one repository, one benchmark contract, repeated measured code improvements.

For a deeper walkthrough, see docs/architecture.md and docs/code-tour.md.

Important

autoarc is designed for trusted local repositories. It runs model-authored edits, git, and bun benchmarks on your machine. Start with the bundled example, use it on repos you are comfortable modifying locally, and make sure you have a working local pi login before running real jobs. The trust model and operating guidance live in SECURITY.md.

What Autoarc Does

autoarc is intentionally narrow. It gives you:

  • a local HTTP service for running benchmark-guided code improvement jobs
  • a required human research_direction that steers the work
  • a model-authored benchmark contract under .autoarc/
  • isolated experiments in Git worktrees
  • benchmark-owned promotion decisions
  • durable job and experiment history in SQLite plus inspectable logs and worktrees

autoarc is not trying to be a hosted platform, a general-purpose agent framework, or a sandbox for untrusted code.

Why autoarc

  • Benchmark-owned decisions. Models can propose changes, but autoarc runs the metric and decides whether a candidate improves the frontier.
  • Git isolation by default. Each experiment runs in its own worktree against a clean base commit.
  • Durable experiment history. Jobs, experiments, commits, metrics, and summaries are stored in SQLite.
  • Manual or automatic promotion. Improved candidates can wait for review or be rechecked automatically on top of the latest frontier.
  • Bounded long-running work. Agent calls and benchmark runs have explicit wall-clock time limits so jobs fail clearly instead of hanging forever.

What It Is Not

  • Not a hosted service. autoarc is built to run on your machine against repos on your machine.
  • Not a product prioritizer. It improves what your benchmark measures; it does not decide what your benchmark should care about.
  • Not a sandbox. It executes model-authored code and benchmarks with your local user permissions.
  • Not a sprawling framework. The scope is intentionally small enough to read, understand, and extend.

Inspiration

autoarc was inspired by two adjacent projects:

  • autoresearch, which explores autonomous experiment loops against a measurable training setup
  • symphony, which explores autonomous implementation runs against project work

Start With The Example

The fastest way to understand the workflow is to run the bundled example repo:

make live-test

That command:

  • starts the local service
  • copies test/fixtures/example_repo/ into a temp repo
  • asks pi to design a benchmark contract
  • runs a few experiments and prints a job summary

It resets ./autoarc-data/ at the start of the run and requires pi, bun, and a working local pi login.

Run Locally

Requirements:

  • Erlang/OTP 28
  • Gleam 1.14
  • git
  • bun
  • pi with a working local login
  • a clean target repository

Start the service:

gleam deps download
gleam run

You can also copy .env.example to .env and edit it locally. By default, runtime data is written to ./autoarc-data/ in this repo. That directory is gitignored so you can inspect autoarc.sqlite, logs, and worktrees without cluttering the repo.

Job Flow

  1. POST /v1/jobs receives a local repo path, a required human research_direction, an experiment count, and a promotion mode.
  2. The design step turns that human direction into .autoarc/config.json, .autoarc/benchmark.ts, and .autoarc/design.md on a job branch.
  3. autoarc runs the baseline benchmark once to establish the starting frontier.
  4. Mutation experiments run in separate worktrees and stay within the allowed editable paths.
  5. Improved candidates are either held for manual promotion or rechecked automatically on top of the latest frontier before merge.

Benchmark Contract

The design step writes these files under .autoarc/:

  • .autoarc/config.json
  • .autoarc/benchmark.ts
  • .autoarc/design.md

config.json contains:

  • metric_name
  • direction as minimize or maximize
  • editable_paths as repo-relative file or directory prefixes
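Putting those three fields together, a config.json might look like the following (the metric name and paths here are illustrative, not defaults):

```json
{
  "metric_name": "workload_ms",
  "direction": "minimize",
  "editable_paths": ["src/", "lib/parser.ts"]
}
```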

The benchmark entrypoint is .autoarc/benchmark.ts, which autoarc runs with:

bun run .autoarc/benchmark.ts

The benchmark prints one JSON object to stdout:

{"metric_name":"score","metric_value":1.23,"summary":"short explanation"}
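As a sketch of that contract, a minimal benchmark.ts could time a hot path and print the single JSON object on stdout. The workload function and metric name below are illustrative stand-ins, not part of autoarc itself:

```typescript
// Illustrative .autoarc/benchmark.ts sketch. The only hard requirement
// from the contract is one JSON object printed to stdout.

// Hypothetical workload standing in for the code under measurement.
function workload(): number {
  let acc = 0;
  for (let i = 0; i < 100_000; i++) acc += Math.sqrt(i);
  return acc;
}

// Build the result object in the shape autoarc expects.
function makeResult(elapsedMs: number) {
  return {
    metric_name: "workload_ms",
    metric_value: elapsedMs,
    summary: `workload completed in ${elapsedMs.toFixed(2)} ms`,
  };
}

const start = performance.now();
workload();
const elapsedMs = performance.now() - start;

// autoarc reads exactly one JSON object from stdout.
console.log(JSON.stringify(makeResult(elapsedMs)));
```

Because direction in config.json would be minimize for a timing metric like this, lower metric_value means a better candidate.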

API

Create a job:

curl -X POST http://127.0.0.1:8000/v1/jobs \
  -H 'content-type: application/json' \
  -d '{
    "repo_path": "/absolute/path/to/repo",
    "research_direction": "Improve the benchmark by simplifying hot-path code and avoiding broad refactors.",
    "num_experiments": 4,
    "promotion_mode": "auto"
  }'

Inspect a job:

curl http://127.0.0.1:8000/v1/jobs/1

Inspect an experiment:

curl http://127.0.0.1:8000/v1/experiments/1

Promote a candidate manually:

curl -X POST http://127.0.0.1:8000/v1/experiments/1/promote

Configuration

Env var               What it controls                                           Default
HOST                  Bind host                                                  127.0.0.1
PORT                  HTTP port                                                  8000
DATA_DIR              Runtime data directory                                     ./autoarc-data
SECRET_KEY_BASE       Wisp/Mist signing secret                                   autoarc-dev-secret
API_KEY               Optional x-api-key requirement                             unset
MAX_CONCURRENCY       Worker concurrency                                         2
DEFAULT_MODEL         Default pi model override                                  unset
AGENT_TIMEOUT_MS      Wall-clock limit for each pi design or experiment command  900000
BENCHMARK_TIMEOUT_MS  Wall-clock limit for each benchmark run                    300000
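For example, a local .env overriding a few of these might look like this (the values are illustrative; the variable names and defaults come from the table above):

```shell
# .env — example overrides for local runs
PORT=8100
DATA_DIR=./autoarc-data
MAX_CONCURRENCY=1
AGENT_TIMEOUT_MS=600000
```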

Runtime data lives under DATA_DIR:

  • autoarc.sqlite lives at the top level of DATA_DIR
  • per-repo job artifacts live under DATA_DIR/repos/<repo_id>/jobs/<job_id>/
  • repo_id is autoarc's stable internal identifier for a canonical repo root
  • completed and failed jobs keep an inspectable frontier worktree in their job directory
  • internal experiment worktrees and temporary autoarc/* branches are cleaned up when they are no longer needed

Repository Layout

  • src/autoarc/runtime/ HTTP entrypoints, coordinator, workers, and message flow
  • src/autoarc/integration/ shell, git, benchmark, and pi boundaries
  • src/autoarc/persistence/ SQLite schema setup and queries
  • src/autoarc/types/ shared records and enums grouped by concern
  • docs/architecture.md lifecycle, promotion rules, and design notes
  • docs/code-tour.md reading order, file map, and common change paths
  • test/ API and workflow tests
  • test/fixtures/example_repo/ small repo used by tests and the live workflow harness

Code Tour

If you are opening the repo for the first time, start here:

  1. src/autoarc/runtime/app.gleam
  2. src/autoarc/runtime/api.gleam
  3. src/autoarc/runtime/coordinator.gleam
  4. src/autoarc/runtime/worker.gleam

That path gets you from boot, to HTTP entry, to scheduling, to the side-effect work itself.

For the deeper version, including “what file to read for what question,” see docs/code-tour.md.

Development

make help
make deps
make format
make test

You can still run the raw Gleam commands directly:

gleam format
gleam test

For the opt-in live workflow test:

make live-test

You can override the live-test setup from .env or on the command line:

make live-test \
  MODEL=gpt-5.4 \
  EXPERIMENT_COUNT=5 \
  CONCURRENCY=1 \
  RESEARCH_DIRECTION='Focus on parser and benchmark hot paths.'

The live test uses DATA_DIR itself as the runtime data directory, so you can inspect the normal on-disk layout directly in ./autoarc-data by default. It resets that directory at the start of the run, then uses a fresh temp copy of test/fixtures/example_repo/ as the input repo. It requires pi, bun, and a working local pi login.

See SECURITY.md for the trust model, CONTRIBUTING.md for contributor workflow, and AGENTS.md for repo-specific agent instructions.


Built by Arcnem AI in Tokyo.
