Graphsmith

AI-native planning and execution for graph-based programs.

Graphsmith turns natural language goals into typed executable graphs. An LLM plans semantic intent; Graphsmith compiles, validates, repairs, executes, and learns reusable skills from the result.

It is no longer just a text-pipeline planner. The current architecture supports typed IR planning, guarded execution, bounded loop lowering, local structural repair, closed-loop skill generation, skill promotion, and both local and remote skill registries.

Core idea

flowchart LR
    A["Goal"] --> B["Goal Policy + Deterministic Decomposition"]
    B --> C["Candidate Retrieval<br/>(Local + Remote Registry)"]
    C --> D["LLM IR Candidates"]
    D --> E["Deterministic Ranking + Selection"]
    E --> F["IR Compile + Normalize"]
    F --> G["Validate + Repair"]
    G --> H["Execute Graph"]
    H --> I["Trace + Outputs"]
    I --> J["Promotion Signals"]
    I --> K["Runtime / Region Repair"]
    K --> F
    I --> L["Closed-Loop Skill Generation"]
    L --> C

The LLM does not emit raw graph edges as the primary interface. It proposes semantic structure in IR, and Graphsmith handles graph mechanics, normalization, validation, and bounded repair deterministically.
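This README does not document Graphsmith's IR format. To show what "the LLM proposes IR, Graphsmith handles graph mechanics" can mean in practice, here is a minimal sketch of deterministic lowering. It assumes a hypothetical IR in which each step names an op and binds its input ports to either graph inputs (`input.<key>`) or another step's output (`<step>.<port>`). `IRStep`, `Graph`, `compile_ir`, and the op names are illustrative only and are not Graphsmith's API. The sketch uses the standard-library `graphlib` module (Python 3.9+) to reject cycles.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class IRStep:
    """A hypothetical IR step: a semantic name, an op, and input bindings."""
    name: str
    op: str
    # Maps an input port to "input.<key>" (graph input) or "<step>.<port>".
    bindings: dict[str, str] = field(default_factory=dict)


@dataclass
class Graph:
    nodes: dict[str, str]               # node id -> op
    edges: list[tuple[str, str, str]]   # (source node, target node, target port)
    outputs: dict[str, str]             # output name -> "<node>.<port>"
    order: list[str]                    # a valid execution order


def compile_ir(steps: list[IRStep], outputs: dict[str, str]) -> Graph:
    """Lower IR steps into explicit nodes and edges, validating references."""
    nodes = {s.name: s.op for s in steps}
    if len(nodes) != len(steps):
        raise ValueError("duplicate step names")
    edges = []
    for step in steps:
        for port, ref in step.bindings.items():
            source = ref.split(".", 1)[0]
            if source == "input":
                continue  # graph-level inputs do not become nodes
            if source not in nodes:
                raise ValueError(f"{step.name}.{port} binds unknown step {source!r}")
            edges.append((source, step.name, port))
    for out_name, ref in outputs.items():
        if ref.split(".", 1)[0] not in nodes:
            raise ValueError(f"output {out_name!r} references an unknown step")
    # Map each node to its predecessors; static_order() raises CycleError on cycles.
    deps = {name: {src for src, dst, _ in edges if dst == name} for name in nodes}
    order = list(TopologicalSorter(deps).static_order())
    return Graph(nodes, edges, outputs, order)
```

The edges, outputs, and execution order are all derived mechanically from the bindings, so the same IR always compiles to the same graph regardless of how the LLM phrased it.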

What Graphsmith does now

Plans in a typed IR: goals become structured plans with steps, bindings, loop blocks, and guarded execution hints.
Compiles deterministically: IR is lowered into executable graphs with explicit nodes, edges, outputs, and validation.
Repairs locally: Graphsmith can normalize malformed outputs, patch certain local contract issues, and regenerate failing regions instead of always replanning from scratch.
Executes and traces: the runtime records node-level traces, skipped branches, loop iterations, and failures for inspection and repair.
Generates missing skills: for bounded capability gaps, Graphsmith can generate, validate, publish, and reuse a new skill inside the same planning loop.
Promotes useful structure: repeated trace patterns can be surfaced as promotion candidates for reuse.
Uses local and remote registries: skills can be published locally or fetched from a hosted remote registry with provenance metadata.
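The README does not show the runtime's trace format. The following minimal sketch illustrates guarded execution with a node-level trace, under stated assumptions. Nodes arrive already in topological order. A node whose guard value is falsy is skipped, and so is anything that consumes a skipped value. A failure is recorded and stops the run so a repair step can target that node. `Node`, `execute`, and the trace dictionary shape are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Node:
    name: str
    fn: Callable[..., Any]
    inputs: list[str]             # names of values the node reads
    guard: Optional[str] = None   # boolean value that must be truthy for the node to run


def execute(nodes: list[Node], env: dict[str, Any]):
    """Run nodes in the given (already topological) order, recording a trace."""
    values = dict(env)
    skipped = set()
    trace = []
    for node in nodes:
        guard_closed = node.guard is not None and not values.get(node.guard)
        upstream_skipped = any(i in skipped for i in node.inputs)
        if guard_closed or upstream_skipped:
            skipped.add(node.name)
            trace.append({"node": node.name, "status": "skipped"})
            continue
        try:
            values[node.name] = node.fn(*(values[i] for i in node.inputs))
            trace.append({"node": node.name, "status": "ok"})
        except Exception as exc:  # record the failure so repair can target this node
            trace.append({"node": node.name, "status": "failed", "error": repr(exc)})
            break
    return values, trace
```

For example, a branch guarded by `is_long` is skipped for a two-word input, and the skip propagates to the node that consumes the branch's output.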

Current capability envelope

Deterministic and LLM-backed text/JSON pipelines
Structural branches with guarded execution
Bounded loop lowering and execution
Closed-loop single-skill generation with repair-aware re-entry
Local subgraph regeneration and runtime-trace-guided region repair
Promotion mining from traces
Local registry, file-backed remote mock, and hosted HTTP remote registry
Frontier and stress harnesses for probing generalization boundaries
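"Bounded" loop execution is not specified further here. One simple way to keep a loop bounded, sketched under the assumption that a loop block maps a body over a list, is to check the iteration count against a fixed budget before doing any work and to record each iteration in the trace. `run_bounded_loop` and `max_iterations` are hypothetical names, not Graphsmith's API.

```python
from typing import Any, Callable


def run_bounded_loop(
    items: list[Any],
    body: Callable[[Any], Any],
    max_iterations: int,
) -> tuple[list[Any], list[dict[str, Any]]]:
    """Apply `body` to each item, refusing to run past a fixed iteration budget."""
    if len(items) > max_iterations:
        # Fail before doing any work, so an oversized input never starts executing.
        raise ValueError(f"loop needs {len(items)} iterations, bound is {max_iterations}")
    results, trace = [], []
    for index, item in enumerate(items):
        out = body(item)
        results.append(out)
        trace.append({"iteration": index, "input": item, "output": out})
    return results, trace
```

Checking the bound up front, rather than counting during execution, means a rejected loop has no partial side effects to clean up.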

What it is not yet:

a general programming language runtime
a general code-editing agent by default
a system that can reliably synthesize arbitrary multi-region programs

Quickstart

# Install
pip install -e ".[dev]"

# Create a .env file with API keys (gitignored)
cp .env.example .env

# Publish example skills to a local registry
REG=$(mktemp -d)
for d in examples/skills/*/; do graphsmith publish "$d" --registry "$REG"; done

# Plan
graphsmith plan "normalize text and count words" \
  --registry "$REG" \
  --backend ir \
  --provider anthropic \
  --model claude-haiku-4-5-20251001

# Closed-loop solve
graphsmith solve "compute the median of numbers" --auto-approve
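The internals of `solve` are not described in this README. The shape of a bounded closed loop can be sketched with stand-in callables: plan, find the first missing skill, generate it, validate it, publish it, and replan, for at most a fixed number of rounds. `solve`, `plan`, `generate_skill`, `validate`, and `max_rounds` here are all hypothetical, and the registry is just a dict.

```python
from typing import Any, Callable


def solve(
    goal: str,
    registry: dict[str, Any],
    plan: Callable[[str, dict[str, Any]], list[str]],
    generate_skill: Callable[[str], Any],
    validate: Callable[[Any], bool],
    max_rounds: int = 2,
) -> list[str]:
    """Plan; if a skill is missing, generate, validate, and publish it, then replan."""
    for _ in range(max_rounds):
        steps = plan(goal, registry)
        missing = [name for name in steps if name not in registry]
        if not missing:
            return steps
        name = missing[0]  # generate a single skill per round
        skill = generate_skill(name)
        if not validate(skill):
            raise RuntimeError(f"generated skill {name!r} failed validation")
        registry[name] = skill  # "publish" so the next planning round can reuse it
    # Final planning pass after the generation budget is spent.
    steps = plan(goal, registry)
    if any(name not in registry for name in steps):
        raise RuntimeError("capability gap not closed within the round budget")
    return steps
```

Generating one skill per round and replanning afterwards keeps each generation step small and lets the planner react to what was actually published.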

Hosted remote registry

Graphsmith now also supports a hosted remote registry flow.

# Search remote skills
graphsmith search count \
  --remote-registry https://graphsmith-remote-registry.graphsmith.workers.dev

# Publish to a remote registry
export GRAPHSMITH_REMOTE_TOKEN=...
graphsmith remote-publish examples/skills/text.word_count.v1 \
  --remote-registry https://graphsmith-remote-registry.graphsmith.workers.dev

The current remote setup is still an early foundation:

immutable packages
search/fetch/publish
local cache
provenance metadata

Trust-aware ranking, moderation, and richer quality policy are still future work.
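The wire format and cache layout of the remote registry are not documented here. To illustrate what "immutable packages" plus "provenance metadata" in a local cache can mean, here is a minimal in-memory sketch. A `(skill id, version)` pair can be stored once. Re-publishing identical content is a no-op, and different content under the same version is rejected. The digest is a SHA-256 over canonical JSON. `ImmutablePackageCache`, `Provenance`, and their fields are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Provenance:
    publisher: str
    source_url: str
    digest: str  # SHA-256 of the canonical JSON manifest


class ImmutablePackageCache:
    """Local cache keyed by (skill id, version); a published version never changes."""

    def __init__(self) -> None:
        self._packages: dict[tuple[str, str], tuple[dict[str, Any], Provenance]] = {}

    def put(self, skill_id: str, version: str, manifest: dict[str, Any],
            publisher: str, source_url: str) -> Provenance:
        canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        key = (skill_id, version)
        if key in self._packages:
            _, existing = self._packages[key]
            if existing.digest != digest:
                raise ValueError(f"{skill_id}@{version} is immutable and content differs")
            return existing  # identical re-publish is a no-op
        provenance = Provenance(publisher, source_url, digest)
        # Store a decoded copy so later mutation of `manifest` cannot leak in.
        self._packages[key] = (json.loads(canonical), provenance)
        return provenance

    def get(self, skill_id: str, version: str) -> tuple[dict[str, Any], Provenance]:
        return self._packages[(skill_id, version)]
```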

Recommended planner configuration

graphsmith eval-planner \
  --backend ir \
  --ir-candidates 3 \
  --decompose \
  --provider anthropic \
  --model claude-haiku-4-5-20251001 \
  --goals evaluation/goals \
  --registry "$REG"

| Flag | Purpose |
|---|---|
| `--backend ir` | Use IR planning and deterministic compilation |
| `--ir-candidates 3` | Generate multiple candidates and rerank them |
| `--decompose` | Add deterministic semantic decomposition |
| `--provider ...` | Use a live provider for planning |
| `--registry` / `--remote-registry` | Search local and/or remote skills |
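The README says `--ir-candidates` candidates are reranked deterministically but does not list the ranking criteria. The sketch below assumes three hypothetical signals per candidate (validation errors, unresolved skills, and step count) and breaks ties on the candidate's canonical JSON. The winner then depends only on the candidates' content, not on the order the LLM returned them. `Candidate` and `rank` are illustrative names.

```python
import json
from dataclasses import dataclass


@dataclass
class Candidate:
    ir: dict                  # candidate IR as plain data
    validation_errors: int    # errors reported after compile + validate
    unresolved_skills: int    # steps whose skill was not found in any registry


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Sort candidates best-first using only deterministic, content-based keys."""
    def key(c: Candidate):
        return (
            c.validation_errors,
            c.unresolved_skills,
            len(c.ir.get("steps", [])),         # prefer smaller plans
            json.dumps(c.ir, sort_keys=True),   # stable tie-break independent of input order
        )
    return sorted(candidates, key=key)
```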

Project structure

graphsmith/
  planner/          IR planning, decomposition, compiler, repair, policy
  models/           Pydantic graph / package / planner models
  validator/        Deterministic graph + package validation
  runtime/          Execution engine, guards, loops, traces
  ops/              Primitive ops and provider-backed execution
  registry/         Local, aggregate, file-backed remote, HTTP client
  skills/           Closed-loop skill generation and autogen templates
  traces/           Trace storage and promotion mining
  evaluation/       Frontier, stress, and planner evaluation harnesses
  cli/              Typer CLI
examples/skills/    Example reusable skill packages
evaluation/         Goal suites for planner, frontier, and stress testing
docs/               Architecture notes and sprint docs
tests/              Automated regression coverage
cloudflare/         Hosted remote registry worker scaffold

Important CLI commands

| Command | Description |
|---|---|
| `plan` | Generate a plan from a natural language goal |
| `plan-and-run` | Plan and execute in one step |
| `run-plan` | Run a saved plan |
| `solve` | Run the bounded closed-loop generation path |
| `publish` | Publish a skill to a local registry |
| `remote-publish` | Publish a skill to a remote registry |
| `search` / `show` | Search or inspect local/remote skills |
| `eval-planner` | Planner evaluation on goal sets |
| `eval-frontier` | Structural frontier evaluation |
| `eval-stress-frontier` | Isolated vs cumulative stress runs |
| `promote-candidates` | Surface promotion opportunities from traces |

Run graphsmith --help for the full list.
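How `promote-candidates` mines traces is not described in this README. A simple stand-in for "repeated trace patterns" is to count contiguous op sequences of a fixed length that recur across traces. Each pattern is counted once per trace so a single long trace cannot dominate. `mine_promotion_candidates` and its parameters are hypothetical.

```python
from collections import Counter


def mine_promotion_candidates(
    traces: list[list[str]], length: int = 2, min_count: int = 2
) -> list[tuple[tuple[str, ...], int]]:
    """Find op sequences of `length` that recur in at least `min_count` traces."""
    counts: Counter = Counter()
    for ops in traces:
        # Count each pattern once per trace so one long trace cannot dominate.
        seen = {tuple(ops[i:i + length]) for i in range(len(ops) - length + 1)}
        counts.update(seen)
    frequent = [(p, n) for p, n in counts.items() if n >= min_count]
    # Most frequent first; ties broken alphabetically for a stable result.
    return sorted(frequent, key=lambda item: (-item[1], item[0]))
```

With traces `normalize → split → count`, `normalize → split → join`, and `load → normalize → split`, the only pair seen in two or more traces is `("normalize", "split")`, which makes it a candidate for promotion into a reusable skill.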

Providers

# Anthropic
export GRAPHSMITH_ANTHROPIC_API_KEY=sk-ant-...

# Groq / OpenAI-compatible
export GRAPHSMITH_GROQ_API_KEY=gsk_...
graphsmith eval-frontier \
  --provider openai \
  --model llama-3.1-8b-instant \
  --base-url https://api.groq.com/openai/v1

Or create a .env file:

GRAPHSMITH_ANTHROPIC_API_KEY=sk-ant-...
GRAPHSMITH_GROQ_API_KEY=gsk_...
GRAPHSMITH_REMOTE_TOKEN=...
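This README does not say how Graphsmith reads `.env` (it may rely on a library such as python-dotenv). The sketch below only illustrates the file format above and the usual precedence rule, under which variables already exported in the shell win unless `override=True`. `parse_env` and `load_env_file` are hypothetical helpers.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Copy values from `path` into os.environ; existing variables win unless override=True."""
    file = Path(path)
    if not file.is_file():
        return {}
    values = parse_env(file.read_text(encoding="utf-8"))
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```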

Testing

pytest
pytest -v
pytest -x

For live runs, use the evaluation harnesses:

graphsmith eval-frontier --goals evaluation/frontier_goals --registry "$REG" ...
graphsmith eval-stress-frontier --goals evaluation/stress_frontier_goals --registry "$REG" ...

Documentation

Architecture notes and sprint docs live in docs/.

Roadmap summary

Graphsmith now has a strong programmable planning substrate, but true general-purpose coding still requires a few major leaps:

Graph-native skill synthesis: generated skills need to become small subgraphs with contracts and tests, not mostly single-step templates.
Region-level repair: failing loops, branches, and subplans need local regeneration as a normal workflow, not an edge case.
Richer runtime semantics: more explicit bindings, structured errors, state/effects, and reusable subgraphs are needed to move from workflows toward real programs.
Tool and code environment integration: files, tests, shell commands, APIs, and code edits need to become first-class skill environments so Graphsmith can tackle real software tasks.
Trust-aware skill reuse: remote skill reuse should eventually depend on provenance, validation history, and policy, not just retrieval.
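The README does not specify how a failing region is chosen for regeneration. One simple, assumed heuristic is to take the node that failed in the trace plus everything downstream of it. Upstream results stay valid and can be reused, and only the region is handed back to the planner. `failing_region` is a hypothetical helper that operates on plain `(source, target)` edge pairs.

```python
from collections import deque


def failing_region(edges: list[tuple[str, str]], failed_node: str) -> set[str]:
    """Return the failed node plus everything downstream of it (the part to regenerate)."""
    children: dict[str, list[str]] = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
    region = {failed_node}
    queue = deque([failed_node])
    while queue:  # breadth-first walk over downstream dependents
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in region:
                region.add(child)
                queue.append(child)
    return region
```

For edges `load → parse → loop → format` and `load → stats`, a failure at `parse` selects `{parse, loop, format}`, while `load` and `stats` keep their results.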

The short version: Graphsmith is already a serious experimental substrate for AI-native program planning, but the next phase is about generalized synthesis and repair rather than adding more domain-specific tricks.

License

MIT
