Evolutionary artifact optimization for Claude Code — an open-source implementation of AlphaEvolve-style evolutionary search that runs natively inside Claude Code sessions.
Claude Evolve uses MAP-Elites quality-diversity search with island-based populations to evolve programs, prompts, algorithms, configurations, and any text artifact — with Claude acting as both the intelligent mutation engine and an autonomous research agent.
Claude Evolve achieved sum of radii = 2.6359835671240317 (16 significant digits) for packing 26 non-overlapping circles in a unit square, starting from generic baselines only — no seeding from any known solution. This numerically exceeds all published results:
| System | Sum of Radii | Delta vs Ours | Source |
|---|---|---|---|
| Claude Evolve v4 | 2.6359835671240317 | — | This project |
| ThetaEvolve (8B model) | 2.63598308 | +4.87e-07 | ArXiv 2511.23473 |
| AlphaEvolve (DeepMind) | 2.6359830849 | +4.82e-07 | ArXiv 2506.13131 |
| ShinkaEvolve (Sakana AI) | 2.63597770 | +5.87e-06 | ArXiv 2509.19349 |
| OpenEvolve community | 2.635977 | +6.57e-06 | GitHub #156 |
| FICO Xpress (ZIB/MODAL) | 2.635916 | +6.81e-05 | FICO blog |
The solution uses 7 generic initialization patterns (ring, hex grid, Halton quasi-random, sunflower spiral, diagonal strips, billiard, corners+edges), 3-stage optimization (penalty relaxation + LP radii + SLSQP joint), ultra-fine coordinate descent, and warm-cache accumulation across evolution iterations. All constraints are evaluator-valid (gaps >= -1e-6).
Read the full paper | Verify the result | Solution code
- How It Works
- Key Features
- Architecture
- v2/v3 Research-Driven Features
- Installation
- Quick Start
- Evaluation Modes
- Configuration
- CLI Reference
- Plugin Commands
- MAP-Elites & Island Evolution
- Writing Evaluators
- Advanced Usage
- Project Structure
- License
Claude Evolve turns Claude Code into an evolutionary optimization engine. You provide:
- An artifact — the file you want to improve (a Python program, a prompt, a config, an algorithm)
- An evaluator — a script that scores candidates on 0.0-1.0 metrics, or a prompt for Claude-as-judge
Then Claude Evolve runs an evolution loop:
Each iteration, Claude receives different context — a new parent artifact selected from the population, inspiration from diverse high-performing solutions, stagnation diagnostics, strategy directives, and guidance on unexplored regions of the solution space.
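The loop above can be sketched with a toy example — random mutation stands in for Claude's intelligent mutations, and all names here are illustrative rather than the actual claude-evolve API:

```python
import random

def evaluate(x):
    # Toy evaluator: score in (0, 1], higher the closer x is to 3.0.
    return 1.0 / (1.0 + abs(x - 3.0))

def evolve(max_iterations=300, seed=0):
    # Minimal evolution loop: select a strong parent, mutate, keep the
    # candidate only if it improves on the weakest member of the population.
    rng = random.Random(seed)
    population = [rng.uniform(-10.0, 10.0) for _ in range(8)]
    for _ in range(max_iterations):
        parent = max(population, key=evaluate)     # exploit the current elite
        candidate = parent + rng.gauss(0.0, 0.5)   # mutate (Claude's role in practice)
        worst = min(population, key=evaluate)
        if evaluate(candidate) > evaluate(worst):  # survival-of-the-fittest replacement
            population[population.index(worst)] = candidate
    return max(population, key=evaluate)

best = evolve()
```

The real system replaces the random mutation with context-rich LLM edits and the single population with MAP-Elites islands, but the select-mutate-evaluate-insert cycle is the same.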
- MAP-Elites quality-diversity search — maintains diverse elite solutions across configurable feature dimensions
- Island-based evolution — multiple isolated populations with periodic migration
- Universal novelty system — 3-layer similarity (structural + behavioral + semantic) working across all artifact types, not just code
- Stepping stones archive — preserves diverse intermediate solutions that open new search space regions
- 7 built-in strategies — Incremental, Creative Leap, Hybrid Synthesis, Research-Driven, Solver Hybrid, Multi-Iteration Accumulation, Problem Decomposition
- Stagnation Engine — detects plateaus (5 levels: NONE to CRITICAL) and adapts exploration
- Continuous G_t Signal — AdaEvolve-inspired exponential moving average replacing discrete stagnation levels, driving all adaptation from a single continuous signal
- Research Agent — literature search and approach discovery via web search
- Diagnostician Agent — root cause analysis of why evolution is stuck
- UCB1 Strategy Selection — bandit-based strategy selection replacing weighted-random, with capped reward and exploration modulation
- Cross-Run Memory — persists learnings, failed approaches, and successful strategies across runs
- Meta-Scratchpad — periodic pattern synthesis from evolution history (ShinkaEvolve-inspired)
- Verbal Gradients — pairwise reflection comparing artifacts to generate directional mutation guidance (ReEvo-inspired)
- Thought-Code Coevolution — evolves natural-language rationale alongside code for better LLM reasoning (EoH-inspired)
- Warm-Start Cache — persists intermediate computation (numpy arrays, JSON, text) between iterations with LRU eviction
- Multi-Iteration Accumulation — each iteration continues from where the last left off, enabling sustained search across hundreds of iterations
- Evaluation Caching — skip re-evaluation of deterministic results
- Solution Seeding — inject known-good solutions into the population
- Power-Law Parent Selection — rank-based selection with adaptive alpha from G_t signal and offspring novelty weighting (ShinkaEvolve/FunSearch-inspired)
- Failure Reflexion — captures recent failures with reasons, injecting "avoid these" guidance into future iterations
- Pre-Evaluation Novelty Gate — rejects near-duplicate candidates before wasting evaluation budget
- IterationOrchestrator — unified coordination of all feature modules for next/submit lifecycle
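The UCB1 strategy selection listed above can be sketched with the generic UCB1 formula (mean reward plus an exploration bonus); the package's capped-reward and exploration-modulation details are not reproduced here:

```python
import math

def ucb1_pick(stats, c=1.4):
    """Pick a strategy by UCB1.

    stats maps strategy name -> (total_reward, times_played).
    Unplayed strategies are always tried first.
    """
    total_plays = sum(n for _, n in stats.values())
    best_name, best_score = None, float("-inf")
    for name, (reward, plays) in stats.items():
        if plays == 0:
            return name  # explore untested strategies before anything else
        # Average reward plus exploration bonus that shrinks with plays
        score = reward / plays + c * math.sqrt(math.log(total_plays) / plays)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

stats = {"incremental": (2.4, 4), "creative_leap": (0.9, 1), "hybrid": (0.0, 0)}
print(ucb1_pick(stats))  # prints: hybrid
```

Compared with weighted-random selection, the bonus term guarantees under-tried strategies keep getting revisited even when their average reward looks poor.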
- Structural similarity — token n-gram overlap analysis working across Python, JS, YAML, JSON, SQL, markdown, and prose
- Behavioral similarity — metric fingerprint comparison (normalized score vectors across evaluation dimensions)
- Semantic fingerprints — concept extraction identifying algorithmic ideas, data structures, and approaches
- Stepping stones — archive of diverse intermediate solutions injected into iteration context for crossover-inspired evolution
- Artifact-agnostic — same novelty pipeline handles code, prompts, configs, and any text artifact
- Native plugin — runs inside Claude Code sessions via the `/evolve` command
- Dynamic per-iteration prompts — each iteration gets fresh context with population insights and strategy directives
- Full autonomy per iteration — Claude can use web search, spawn subagents, run code, and apply any available skill
- Critic mode — Claude acts as adversarial evaluator for non-code artifacts
- Quantitative problems (math, optimization) — warm cache, multi-iteration accumulation, constraint propagation
- Qualitative problems (business, writing) — research agents, section-by-section iteration, style consistency
- Hybrid problems (data science, ML) — model checkpoints, hyperparameter search, problem decomposition
- 1039 tests covering unit, integration, and end-to-end flows
- Subprocess isolation — evaluator runs in isolated subprocess with timeout protection
- Checkpoint/resume — periodic snapshots with seamless resume
- Session isolation — multiple sessions don't interfere
- Fresh init — `init` clears stale state by default (cross-run memory preserved)
Claude Evolve is a three-layer hybrid system:
Layer 1 (Shell) manages the iteration lifecycle. The stop hook calls diagnose (stagnation detection) before next (context generation).
Layer 2 (Python) provides all deterministic logic — MAP-Elites database, evaluation, stagnation detection, strategy selection, warm-start caching, cross-run memory, research log management, and state persistence. This is a standalone pip-installable package with a claude-evolve CLI.
Layer 3 (Skill + Agents) teaches Claude how to behave during each iteration, with problem-type-specific guidance and specialized agents for research and diagnosis.
Detects when evolution has plateaued and adapts the search strategy:
| Level | Iterations Stuck | Response |
|---|---|---|
| NONE | 0-2 | Continue normally |
| MILD | 3-5 | Increase exploration, try new approaches |
| MODERATE | 6-10 | Paradigm shift, spawn research agent |
| SEVERE | 11-20 | Radical departure, spawn diagnostician |
| CRITICAL | 20+ | Full restart, problem reformulation |
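Based on the thresholds in the table above, the level mapping amounts to the following (a sketch; the actual engine also incorporates the continuous G_t signal):

```python
def stagnation_level(iterations_stuck):
    """Map iterations without improvement to a stagnation level."""
    if iterations_stuck <= 2:
        return "NONE"      # continue normally
    if iterations_stuck <= 5:
        return "MILD"      # increase exploration
    if iterations_stuck <= 10:
        return "MODERATE"  # paradigm shift, research agent
    if iterations_stuck <= 20:
        return "SEVERE"    # radical departure, diagnostician
    return "CRITICAL"      # full restart, problem reformulation
```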
7 built-in strategies, selected based on stagnation level and past performance:
- Incremental Improvement — small targeted changes (low exploration)
- Creative Leap — ignore current approach, try something novel (high exploration)
- Hybrid Synthesis — combine best elements from top solutions
- Research-Driven — 80% effort on literature review, then implement
- Solver Hybrid — formulate as constraint satisfaction, use solvers
- Multi-Iteration Accumulation — continue from warm-cached state
- Problem Decomposition — break into independent sub-problems
Persists intermediate computation between iterations:
```python
# In your candidate code:
import os
import numpy as np

# Load from previous iteration
cache_file = '.claude/evolve-state/warm_cache/items/best_matrix.npy'
if os.path.exists(cache_file):
    prev_best = np.load(cache_file)
    # Continue optimizing from prev_best

# Save for next iteration (my_result is your improved solution)
os.makedirs('.claude/evolve-state/warm_cache/items', exist_ok=True)
np.save(cache_file, my_result)
```

Learnings persist across evolution runs:
- Failed approaches (avoid repeating)
- Successful strategies (build on)
- Key insights from prior runs
Traditional code-evolution systems use code-specific similarity (AST diff, token overlap). Claude Evolve's novelty system works across all artifact types via three complementary layers:
| Layer | Method | What It Captures |
|---|---|---|
| Structural | Token n-gram overlap (bigrams + trigrams) | Surface-level textual similarity |
| Behavioral | Metric fingerprint cosine similarity | Functional equivalence (same scores = same behavior) |
| Semantic | Concept extraction + Jaccard overlap | Algorithmic ideas, data structures, approaches |
Combined similarity = weighted average (structural 0.4, behavioral 0.3, semantic 0.3). Candidates above the novelty threshold are rejected as duplicates, preserving population diversity.
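A minimal sketch of this weighted combination, with a simplified token-bigram Jaccard standing in for the structural layer (the real n-gram, fingerprint, and concept extractors are more involved):

```python
def ngram_set(text, n=2):
    # Token n-grams of a text artifact (bigrams by default)
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def structural_similarity(a, b, n=2):
    # Jaccard overlap of token n-grams: a simplified structural layer
    sa, sb = ngram_set(a, n), ngram_set(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def combined_similarity(structural, behavioral, semantic):
    # Weighted average of the three layer scores, each in [0, 1]
    return 0.4 * structural + 0.3 * behavioral + 0.3 * semantic

def is_duplicate(structural, behavioral, semantic, threshold=0.99):
    # Candidates at or above the threshold are rejected as near-duplicates
    return combined_similarity(structural, behavioral, semantic) >= threshold
```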
Inspired by FunSearch's best-shot prompting and ShinkaEvolve's novelty rejection sampling, the stepping stones archive preserves diverse intermediate solutions — not just the best. These are injected into iteration context to enable crossover-style evolution:
- Each submission is checked against the archive for novelty
- Sufficiently novel solutions are preserved regardless of fitness
- During context generation, stepping stones from different search space regions are selected
- Claude can combine ideas from stepping stones with the current best (semantic crossover)
This prevents the population from collapsing to a single approach and enables discovering solutions that require traversing low-fitness intermediates.
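The archive's fitness-independent novelty gate can be sketched as follows (`similarity_fn` is a hypothetical stand-in for the three-layer similarity score):

```python
class SteppingStonesArchive:
    """Keep solutions sufficiently different from everything archived,
    regardless of fitness (a simplified sketch of the real archive)."""

    def __init__(self, similarity_fn, threshold=0.99):
        self.similarity_fn = similarity_fn
        self.threshold = threshold
        self.stones = []

    def maybe_add(self, candidate):
        # Reject candidates too similar to any existing stepping stone
        for stone in self.stones:
            if self.similarity_fn(candidate, stone) >= self.threshold:
                return False
        # Note: fitness is never consulted -- diversity alone earns a slot
        self.stones.append(candidate)
        return True
```

Because admission depends only on novelty, low-fitness intermediates that open new regions of the search space survive long enough to be recombined later.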
- Python 3.10+
- Claude Code installed
- `jq` (for stop hook JSON processing): `apt install jq` or `brew install jq`
```shell
git clone https://github.com/BudEcosystem/ClaudeEvolve.git
cd ClaudeEvolve

# Install with a virtual environment (recommended)
bash install.sh --venv

# Or install directly
bash install.sh

# Verify the CLI and run the test suite
claude-evolve --help
cd claude_evolve && python -m pytest tests/ -q  # 1039 tests
```

```shell
cd ClaudeEvolve
claude  # Start Claude Code
```

```
# In Claude Code:
/evolve circle_packing/program.py circle_packing/evaluator.py --max-iterations 50 --target-score 1.0
/evolve ramsey_R5_5/program.py ramsey_R5_5/evaluator.py --max-iterations 500 --target-score 1.0
/evolve my_prompt.md eval_criteria.md --mode critic --max-iterations 20
```

The evaluator is a Python script that scores candidates:
```python
# evaluator.py
import json
import sys

def evaluate(candidate_path):
    return {
        "combined_score": 0.85,
        "accuracy": 0.9,
        "efficiency": 0.8,
    }

if __name__ == "__main__":
    result = evaluate(sys.argv[1])
    print(json.dumps(result))
```

Claude spawns an adversarial critic agent:

```
/evolve prompt.md eval_criteria.md --mode critic
```

Combines script evaluation with critic feedback:

```
/evolve algorithm.py benchmark.py --mode hybrid
```

Claude Evolve uses YAML configuration with sensible defaults. Key options:
| Category | Option | Default | Description |
|---|---|---|---|
| General | `max_iterations` | 50 | Maximum evolution iterations |
| General | `target_score` | null | Early stop threshold |
| Database | `num_islands` | 5 | Isolated populations |
| Database | `population_size` | 1000 | Max population per island |
| Database | `feature_dimensions` | ["complexity", "diversity"] | MAP-Elites grid |
| Database | `similarity_threshold` | 0.99 | Novelty rejection threshold |
| Evaluator | `timeout` | 300 | Evaluation timeout (seconds) |
| Stagnation | `enabled` | true | Enable stagnation detection |
| Stagnation | `mild_threshold` | 3 | Iterations for MILD level |
| Cross-Run | `enabled` | true | Persist learnings across runs |
| Research | `enabled` | false | Enable research agent |
| Research | `trigger` | "on_stagnation" | When to research |
See Configuration docs for full reference.
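A config file covering these options might look like the following — note the nesting of options under category keys is an assumption inferred from the table; see the Configuration docs for the authoritative layout:

```yaml
general:
  max_iterations: 50
  target_score: null
database:
  num_islands: 5
  population_size: 1000
  feature_dimensions: ["complexity", "diversity"]
  similarity_threshold: 0.99
evaluator:
  timeout: 300
stagnation:
  enabled: true
  mild_threshold: 3
research:
  enabled: false
  trigger: "on_stagnation"
```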
```shell
# Initialize a new evolution run (clears stale state)
claude-evolve init --artifact solution.py --evaluator benchmark.py

# Initialize preserving previous state
claude-evolve init --artifact solution.py --evaluator benchmark.py --keep-state

# Generate next iteration context (called by stop hook)
claude-evolve next --state-dir .claude/evolve-state

# Submit a candidate for evaluation
claude-evolve submit --candidate candidate.py --state-dir .claude/evolve-state

# Run stagnation diagnostics
claude-evolve diagnose --state-dir .claude/evolve-state

# Seed a known-good solution into the population
claude-evolve seed --artifact known_good.py --state-dir .claude/evolve-state

# Save/inspect warm cache
claude-evolve cache-put --key best_matrix --file result.npy --type numpy
claude-evolve cache --key best_matrix --state-dir .claude/evolve-state

# Cache evaluation results
claude-evolve cache-eval --n 42 --result '{"valid": true}'

# Append research findings
claude-evolve research-log --findings '{"approaches": [...]}'

# Show evolution progress
claude-evolve status --state-dir .claude/evolve-state

# Export best artifact
claude-evolve export --state-dir .claude/evolve-state --output best.py
```

| Command | Description |
|---|---|
| `/evolve <artifact> <evaluator> [OPTIONS]` | Start evolution loop |
| `/evolve-status` | Check progress |
| `/cancel-evolve` | Cancel and export best |
MAP-Elites maintains a map of the best solution in each region of a feature space, producing a diverse collection of high-performing solutions. Solutions are placed on a configurable grid:
```yaml
database:
  feature_dimensions: ["complexity", "diversity"]
  feature_bins: 10  # 10x10 = 100 cells
```

Islands evolve independently with periodic migration (ring topology), preventing premature convergence. Selection balances elite exploitation (70%), exploration (20%), and elite sampling (10%).
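Cell placement on such a grid can be sketched as follows (illustrative helpers, not the actual `database.py` API; features are assumed normalized to [0, 1]):

```python
def cell_index(features, bins=10):
    """Map normalized feature values (e.g. complexity, diversity) to a grid cell."""
    return tuple(min(int(v * bins), bins - 1) for v in features)

def maybe_insert(grid, features, candidate, score):
    # MAP-Elites rule: keep only the best-scoring solution in each cell
    cell = cell_index(features)
    if cell not in grid or score > grid[cell][1]:
        grid[cell] = (candidate, score)
        return True
    return False
```

Because each cell holds its own elite, a mediocre-overall candidate still survives if it is the best in an otherwise empty region of feature space — this is what produces the diverse archive.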
ClaudeEvolve/
├── claude_evolve/ # Python package (pip installable)
│ ├── claude_evolve/
│ │ ├── core/
│ │ │ ├── artifact.py # Artifact dataclass
│ │ │ ├── database.py # MAP-Elites + islands + novelty
│ │ │ ├── evaluator.py # Subprocess-isolated evaluation
│ │ │ ├── stagnation.py # Stagnation detection engine (v2)
│ │ │ ├── memory.py # Cross-run memory (v2)
│ │ │ ├── research.py # Research log management (v2)
│ │ │ ├── strategy.py # Strategy evolver (v2)
│ │ │ ├── warm_cache.py # Warm-start cache with LRU eviction (v3)
│ │ │ ├── novelty.py # Universal novelty system (v3)
│ │ │ ├── improvement_signal.py # Continuous G_t signal (v4, AdaEvolve)
│ │ │ ├── ucb_selector.py # UCB1 strategy selection (v4, ShinkaEvolve)
│ │ │ ├── reflection.py # Verbal gradients engine (v4, ReEvo)
│ │ │ ├── scratchpad.py # Meta-scratchpad synthesis (v4, ShinkaEvolve)
│ │ │ └── orchestrator.py # IterationOrchestrator (v4)
│ │ ├── prompt/
│ │ │ ├── context_builder.py # Per-iteration context generation
│ │ │ └── templates.py # Template management
│ │ ├── state/
│ │ │ ├── manager.py # State persistence (fresh init)
│ │ │ └── checkpoint.py # Checkpoint management
│ │ ├── cli.py # CLI (init/next/submit/diagnose/seed/cache)
│ │ └── config.py # 6 sub-configs + master config
│ └── tests/ # 1039 tests
├── plugin/ # Claude Code plugin
│ ├── hooks/stop-hook.sh # Evolution loop + stagnation
│ ├── skills/evolve/SKILL.md # Iteration protocol + problem guidance
│ ├── agents/
│ │ ├── critic.md # Adversarial evaluation
│ │ ├── researcher.md # Literature search (v2)
│ │ └── diagnostician.md # Root cause analysis (v2)
│ └── commands/ # /evolve, /evolve-status, /cancel-evolve
├── circle_packing/ # Circle packing problem (n=26)
│ ├── program.py # Seed program
│ └── evaluator.py # Evaluator (target: sum_radii >= 2.636)
├── ramsey_R5_5/ # Ramsey R(5,5) problem
│ ├── program.py # Seed program
│ └── evaluator.py # Evaluator (target: 0 mono-K_5 at n=43)
├── evolve_output/ # Best artifacts from evolution runs
│ ├── best_circle_packing_strict.py # Circle packing result (2.635983)
│ └── best_circle_packing_final.py # Final 15dp result (2.635982928557747)
├── docs/
│ ├── circle_packing_paper.md # Paper: circle packing result
│ └── verify_circle_packing.py # Independent verification script
├── install.sh
├── LICENSE # Apache 2.0
└── README.md
```shell
cd claude_evolve
python -m pytest tests/ -v               # All 1039 tests
python -m pytest tests/ -q               # Quick summary
python -m pytest tests/test_database.py  # Specific module
```

Claude Evolve is inspired by AlphaEvolve (Google DeepMind) and built upon the open-source OpenEvolve implementation. The loop mechanism is adapted from the Ralph Loop Claude Code plugin pattern.
The universal novelty system and stepping stones archive draw from research in automated discovery systems including FunSearch (best-shot prompting, stepping stones), ShinkaEvolve (code novelty rejection sampling), CodeEvolve (inspiration-based crossover), and DiscoPOP (quality-diversity for prompt optimization).
This project is licensed under the Apache License 2.0 — see the LICENSE file for details.