Ratchet

Agentic prompt optimization with high watermarking. Ratchet uses an LLM proposer to iteratively improve a prompt, scores each change with a built-in LLM judge, and only keeps changes that beat the current high watermark. Like a ratchet wrench — it only moves forward.

Then it compresses the prompt for token efficiency while keeping quality above a configurable bar.

How It Works

┌───────────┐     ┌────────┐     ┌─────────┐     ┌───────────┐
│ Proposer  │────▶│ Prompt │────▶│  Judge  │────▶│ Watermark │
│ (Claude)  │     │ (.md)  │     │ (Claude)│     │   check   │
└───────────┘     └────────┘     └─────────┘     └───────────┘
      ▲                                                │
      │              ✓ kept (score > watermark)        │
      └────────────────────────────────────────────────┘
                     ✗ discarded (rollback)

Proposer — Claude reads your RATCHET.md spec, the current prompt, progress history, and learnings, then suggests one targeted change.

Judge — The built-in LLM judge runs the modified prompt against your test cases, then scores each response against your eval criteria. No custom scorer code needed.

Watermark — If the score beats the current watermark by at least --min-delta, the change is kept. Otherwise it's rolled back.

A Prompt Project is Three Files

prompt.md          ← the prompt being optimized
test_cases.json    ← inputs + expected outputs
RATCHET.md         ← goal, constraints, eval criteria

That's it. Ratchet handles everything else.

Quick Start

# Install
git clone git@github.com:hackerrdave/ratchet.git
cd ratchet && bun install

# Link globally
bun link

# Go to an example
cd examples/commit-writer
export ANTHROPIC_API_KEY="sk-ant-..."

# Phase 1: Optimize quality
ratchet start -n 10

# Inspect the prompt's tokenization
ratchet tokens -m stats

# Phase 2: Compress for token efficiency
ratchet compress -n 5

# Review everything
ratchet log
ratchet learnings

RATCHET.md

One file configures both sides of the loop — the proposer and the judge:

# Goal
Improve the commit message prompt to generate accurate conventional commits.

# Prompt
prompt.md

# Constraints
- Must keep the {{diff}} placeholder
- Must output only the commit message, no explanation
- Must use conventional commit format (type: description)

# Eval
- Test cases: test_cases.json
- Target: claude-haiku-4-5-20251001
- Correct commit type (30%)
- Accurately describes the change (40%)
- Conventional commit format with no extra text (15%)
- Concise — single line, under 72 characters (15%)

# Context
- Read test_cases.json to see the range of diffs and expected messages

Section	Purpose
Goal	What "better" means — fed to the proposer
Prompt	The file being optimized (the only file ratchet modifies)
Constraints	Rules the proposer must follow
Eval	Test cases, target model, and scoring criteria for the judge
Context	Additional files the proposer should read

Eval Section

Test cases — Path to a JSON file: [{"field": "value", "expected": "..."}, ...]. All fields except expected get substituted into the prompt as {{field}}. The judge compares LLM output against expected.
Target — The model your prompt is designed for. Runs test cases and judges output. Optional — defaults to haiku.
Criteria — Natural language scoring criteria with optional percentage weights. The judge evaluates all criteria for each test case and computes a weighted score.

Commands

`ratchet start` — Quality Phase

ratchet start -n 20 --model claude-haiku-4-5-20251001

Option	Default	Description
`-n, --iterations`	20	Number of iterations
`--min-delta`	0.001	Minimum improvement to accept
`--model`	claude-haiku-4-5-20251001	Proposer model
`--judge-model`	(target model)	Override judge model
`--fresh`	—	Start fresh, ignore saved state
`--max-spend`	—	Budget cap in USD

Each iteration shows a spinner while proposing and judging, then a structured result:

  Ratchet [a1b2c3d4]
  ──────────────────────────────────────────────────
  Prompt               prompt.md  24 tokens ($0.0000/call)
  Test cases           test_cases.json
  Proposer             claude-haiku-4-5-20251001
  Iterations           10
  Watermark            0.7500

  ⠹ [1/10] Proposing (3.2s)
  ✓ [1/10] Proposing (4.1s)
  ⠹ [1/10] Judging (test_cases.json) (5.1s)
  ✓ [1/10] Judging (test_cases.json) (8.3s)
  [1/10] ✓ kept  0.8200 (+0.0700) │ 48tok │ $0.0034 │ 12.3s
         Add explicit type prefix examples for common patterns

  [2/10] ✗ disc  0.7900 (-0.0300) │ 62tok │ $0.0041 │ 14.1s
         Add instruction to analyze file extensions

  Done
  ──────────────────────────────────────────────────
  Kept                 4
  Discarded            6
  Watermark            0.8650
  Total spend          $0.0382

`ratchet compress` — Efficiency Phase

Optimize token count while keeping quality within a margin:

ratchet compress -n 10 --quality-margin 0.03

Option	Default	Description
`-n, --iterations`	10	Compression attempts
`--model`	claude-haiku-4-5-20251001	Proposer model
`--judge-model`	(target model)	Override judge model
`--quality-bar`	(current watermark)	Quality score to protect
`--quality-margin`	0.03	Acceptable quality drop
`--min-token-reduction`	0.10	Required token savings to justify any quality drop
`--max-spend`	—	Budget cap in USD

Acceptance criteria:

Quality must stay ≥ quality bar − margin
Tokens must decrease
If quality drops: token reduction must be ≥ 10% (configurable)

`ratchet tokens` — Visualize Tokenization

See how the LLM tokenizes your prompt:

ratchet tokens              # color mode (default)
ratchet tokens -m stats     # aggregate breakdown
ratchet tokens -m heatmap   # efficiency heatmap
ratchet tokens -m boundary  # pipe-delimited tokens
ratchet tokens -m ids       # token ID table
ratchet tokens -f any.md    # any file, not just the prompt

Stats mode shows token distribution, most common tokens, chars/token ratio, and cost projections at various call volumes.

`ratchet log` — Iteration History

ratchet log

  Ratchet log
  ──────────────────────────────────────────────────
  Iterations           10 (7 kept, 3 discarded)
  Total cost           $0.01
  Model                claude-haiku-4-5-20251001
  Score                0.7800 → 0.8900

    1 ✓ 0.8200 +0.0400 │ 482tok │ $0.0012 │ quality
      Add structured output format with JSON schema
    2 ✗ 0.8100 -0.0100 │ 531tok │ $0.0014 │ quality
      Add chain-of-thought reasoning section
    ...
    1 ✓ 0.8850 -0.0050 │ 580tok │ $0.0015 │ efficiency
      Merge redundant JSON instructions into single block

  Token trajectory
  ──────────────────────────────────────────────────
  Tokens               482 → 465 (-17, -3.5%)
                       ▂▅▆█▅▃▁
  Quality              482 → 648 tokens (4 kept)
  Efficiency           580 → 465 tokens (3 kept)

`ratchet learnings` — What Worked and What Didn't

ratchet learnings

After each run, Claude extracts tactical learnings from the progress log. These are formatted with section icons and word-wrapped explanations, and fed back to the proposer on subsequent runs.

Other Commands

ratchet init              # Scaffold RATCHET.md interactively
ratchet watch             # Live staircase chart during a run
ratchet diff 1 3          # Diff prompt between two kept iterations
ratchet show 3            # Print prompt at a specific iteration
ratchet checkout 3        # Restore prompt to a specific iteration
ratchet pause             # Stop after current iteration
ratchet resume            # Clear pause flag

Models

Three distinct model roles:

Role	Set by	Purpose
Proposer	`--model` flag	The optimizer that suggests prompt changes
Target	`Target:` in RATCHET.md `# Eval`	The model your prompt is for. Runs test cases and (by default) judges output.
Judge	`--judge-model` flag	Scores responses against criteria. Defaults to target model.

Use a cheap proposer with a smarter judge:

ratchet start --model claude-haiku-4-5-20251001 --judge-model claude-sonnet-4-20250514

Both models are recorded in every progress entry so you always know what produced the results.

Project Layout

my-prompt/
  prompt.md              ← your prompt (the thing ratchet optimizes)
  test_cases.json        ← inputs + expected outputs
  RATCHET.md             ← config: goal, constraints, eval criteria
  .ratchet/              ← all generated output (gitignore this)
    watermark.txt        ←   current high watermark
    progress.log         ←   JSONL iteration history
    learnings.md         ←   extracted tactical learnings
    state.json           ←   resume checkpoint (deleted on completion)
    best/prompt.md       ←   best prompt so far
    snapshots/           ←   prompt at each kept iteration
      1/prompt.md
      3/prompt.md
      ...

Add .ratchet/ to your .gitignore. The only files you author are prompt.md, test_cases.json, and RATCHET.md.

Examples

Prompt Grader

Sentiment classification — optimize a prompt for accuracy, valid JSON, and confidence scores.

cd examples/prompt-grader
ratchet start -n 15

Commit Writer

Generate conventional commit messages from git diffs.

cd examples/commit-writer
ratchet start -n 10

Installation

Prerequisites

Bun v1.0+
An Anthropic API key

From Source

git clone git@github.com:hackerrdave/ratchet.git
cd ratchet
bun install
bun link   # makes `ratchet` available globally

Compile to Binary

bun run build           # ./dist/ratchet
bun run build:all       # macOS, Linux, Windows

Development

bun run src/cli.ts <command>   # run in dev
bun test                       # 86 tests
bun run typecheck              # type check

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
examples		examples
src		src
tests		tests
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
bun.lock		bun.lock
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Ratchet

How It Works

A Prompt Project is Three Files

Quick Start

RATCHET.md

Eval Section

Commands

`ratchet start` — Quality Phase

`ratchet compress` — Efficiency Phase

`ratchet tokens` — Visualize Tokenization

`ratchet log` — Iteration History

`ratchet learnings` — What Worked and What Didn't

Other Commands

Models

Project Layout

Examples

Prompt Grader

Commit Writer

Installation

Prerequisites

From Source

Compile to Binary

Development

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Ratchet

How It Works

A Prompt Project is Three Files

Quick Start

RATCHET.md

Eval Section

Commands

ratchet start — Quality Phase

ratchet compress — Efficiency Phase

ratchet tokens — Visualize Tokenization

ratchet log — Iteration History

ratchet learnings — What Worked and What Didn't

Other Commands

Models

Project Layout

Examples

Prompt Grader

Commit Writer

Installation

Prerequisites

From Source

Compile to Binary

Development

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

`ratchet start` — Quality Phase

`ratchet compress` — Efficiency Phase

`ratchet tokens` — Visualize Tokenization

`ratchet log` — Iteration History

`ratchet learnings` — What Worked and What Didn't

Packages