LLM-guided optimization for text artifacts using an iterative propose-evaluate-reflect loop with a bring-your-own evaluator.
```bash
# 1) Install
curl -fsSL https://raw.githubusercontent.com/ASRagab/optimize-anything/main/install.sh | bash

# 2) Create a seed artifact
echo "Write a concise support prompt" > seed.txt

# 3) Generate a starter evaluator (default: judge/python template)
optimize-anything generate-evaluator seed.txt \
  --objective "Score clarity, actionability, and specificity" \
  > eval.py

# 4) Optimize
optimize-anything optimize seed.txt \
  --judge-model openai/gpt-4o-mini \
  --objective "Improve clarity and specificity" \
  --model openai/gpt-4o-mini \
  --budget 20 \
  --parallel --workers 4 \
  --cache \
  --run-dir runs \
  --output result.txt
```

The CLI prints a JSON summary to stdout; see the Result Contract section for the full shape.
optimize-anything runs a GEPA (Guided Evolutionary Prompt Algorithm) loop: propose → evaluate → reflect, repeating until budget is exhausted or early stopping kicks in.
```
seed.txt ──► [Propose] ──► candidates
                 ▲              │
                 │          [Evaluate]
             [Reflect] ◄──── scores + diagnostics
```
- Propose — The optimizer generates candidate artifacts from your seed (or from scratch in seedless mode).
- Evaluate — Each candidate is scored by your evaluator. Three evaluator types are supported: a command evaluator (any executable that reads JSON on stdin and writes a score on stdout), an HTTP evaluator (a service that accepts POST requests), or a built-in LLM judge (no evaluator script required — just pass `--judge-model`).
- Reflect — Scores and diagnostics feed back into the next proposal round. The loop continues, progressively improving the artifact toward your objective.
The evaluator is the only thing you bring. Everything else — proposal strategy, reflection, early stopping, caching, parallelism — is handled by the optimizer.
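The loop can be sketched schematically in Python. Everything below is an illustrative placeholder — toy `propose` and `reflect` stand-ins, not optimize-anything's actual internals:

```python
# Toy, self-contained sketch of a propose -> evaluate -> reflect loop.
# All function bodies are placeholders, NOT optimize-anything's internals.

def propose(artifact: str, feedback: str) -> list[str]:
    # Stand-in for the LLM proposer: emit trivial variants of the artifact.
    return [artifact + " (variant A)", artifact + " (variant B)"]

def reflect(scored: list[tuple[float, str]]) -> str:
    # Stand-in for LLM reflection: summarize the best score for the next round.
    best_score, _ = max(scored)
    return f"best so far scored {best_score:.2f}"

def optimize(seed: str, evaluate, budget: int = 20) -> tuple[str, float]:
    best, best_score = seed, evaluate(seed)
    calls, feedback = 1, ""
    while calls < budget:
        candidates = propose(best, feedback)              # Propose
        scored = [(evaluate(c), c) for c in candidates]   # Evaluate
        calls += len(scored)
        feedback = reflect(scored)                        # Reflect
        top_score, top = max(scored)
        if top_score > best_score:
            best, best_score = top, top_score
    return best, best_score

# Toy evaluator rewarding longer artifacts, capped at 1.0.
best, score = optimize("Write a concise support prompt",
                       lambda c: min(len(c) / 60, 1.0), budget=6)
```

The real optimizer adds early stopping, caching, and parallel evaluation around this skeleton; the sketch only shows the control flow.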
Use `--dataset` for multi-task optimization (one evaluator call per example). Add `--valset` for generalization validation.
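A dataset is one JSON object per line, and the fields are whatever your evaluator expects — each row reaches the evaluator as the `example` payload field. The `input`/`expected_tone` names below are purely hypothetical:

```python
import json

# Hypothetical training rows -- the field names are up to you; each JSONL
# line is forwarded to your evaluator as the "example" payload field.
rows = [
    {"input": "Customer asks for a refund", "expected_tone": "empathetic"},
    {"input": "Customer reports a login bug", "expected_tone": "precise"},
]

with open("train.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```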
```bash
optimize-anything optimize prompt.txt \
  --judge-model openai/gpt-4o-mini \
  --objective "Generalize across customer request types" \
  --dataset data/train.jsonl \
  --valset data/val.jsonl \
  --model openai/gpt-4o-mini \
  --budget 120 --parallel --workers 6 --cache --run-dir runs
```

Cross-check one artifact with multiple judge providers:
```bash
optimize-anything validate result.txt \
  --providers openai/gpt-4o-mini anthropic/claude-sonnet-4-5 google/gemini-2.0-flash \
  --objective "Score clarity, constraints, and robustness" \
  --intake-file intake.json
```

No seed file required; GEPA bootstraps from the objective.
```bash
optimize-anything optimize --no-seed \
  --objective "Draft a concise, testable API prompt" \
  --model openai/gpt-4o-mini \
  --judge-model openai/gpt-4o-mini
```

`--no-seed` requires both `--objective` and `--model`.
- Early stop is auto-enabled when `--budget` > 30 (or force it with `--early-stop`)
- Reuse a prior evaluator cache with `--cache-from` (requires `--cache` + `--run-dir`)
```bash
optimize-anything optimize seed.txt \
  --evaluator-command bash eval.sh \
  --model openai/gpt-4o-mini \
  --budget 150 \
  --cache --cache-from runs/run-20260303-120000 \
  --run-dir runs \
  --early-stop --early-stop-window 12 --early-stop-threshold 0.003
```

For command/HTTP evaluators:

- `--score-range unit` (default): enforce score in `[0, 1]`
- `--score-range any`: allow any finite float
```bash
optimize-anything optimize seed.txt \
  --evaluator-command bash eval.sh \
  --model openai/gpt-4o-mini \
  --score-range any
```

Available subcommands: `optimize`, `generate-evaluator`, `intake`, `explain`, `budget`, `score`, `analyze`, `validate`.
optimize-anything is also a Claude Code plugin with guided slash commands and skills.
```
# In Claude Code
/plugin install ASRagab/optimize-anything
```

Or clone and install locally:

```
git clone https://github.com/ASRagab/optimize-anything.git
cd optimize-anything
/plugin install .
```

| Command | Description |
|---|---|
| `/optimize-anything:optimize` | Guided optimization workflow — walks you through mode selection, evaluator setup, execution, and results |
| `/optimize-anything:quick` | Zero-config one-shot optimization; just provide a file and an objective |
| `/optimize-anything:analyze` | Discover quality dimensions for an artifact and objective |
| `/optimize-anything:score` | Score a single artifact without optimization |
| `/optimize-anything:validate` | Cross-validate with multiple LLM judges |
| `/optimize-anything:compare` | Side-by-side comparison of two artifacts |
| `/optimize-anything:budget` | Get a budget recommendation for your artifact |
| `/optimize-anything:explain` | Preview the optimization plan without running it |
| `/optimize-anything:intake` | Normalize and validate an intake specification |
The plugin includes three skills that Claude Code can invoke automatically:
- `optimization-guide` — Full workflow walkthrough covering modes, configuration, budget, and result interpretation
- `generate-evaluator` — Choose the right evaluator pattern (judge, command, composite) and generate a script
- `evaluator-patterns` — Library of ready-to-use evaluator templates for prompts, code, docs, and agent instructions
```
/optimize-anything:analyze prompt.txt --objective "Score clarity"
  → discovers quality dimensions

/optimize-anything:quick prompt.txt "improve clarity and specificity"
  → runs analyze + optimize with sensible defaults, shows diff

/optimize-anything:validate result.txt --providers openai/gpt-4o anthropic/claude-sonnet-4-5
  → cross-checks the result with multiple judges
```
The CLI prints a JSON summary to stdout with these fields:

- `best_artifact`
- `total_metric_calls`
- `score_summary` (`initial`, `latest`, `best`, deltas, `num_candidates`)
- `top_diagnostics` (list of `{name, value}`)
- `plateau_detected`, `plateau_guidance`
- optional `evaluator_failure_signal`
- optional `early_stopped`, `stopped_at_iteration`
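Scripts can parse that stdout directly (e.g. after capturing it with `subprocess.run(..., capture_output=True)`). A sketch with a hand-written, illustrative summary showing only a subset of the fields:

```python
import json

# Illustrative stdout from a run -- these values are hand-written examples,
# not real results, and only a subset of the Result Contract is shown.
stdout = '''{
  "best_artifact": "Write a concise, specific support prompt...",
  "total_metric_calls": 20,
  "score_summary": {"initial": 0.42, "latest": 0.78, "best": 0.81, "num_candidates": 12},
  "top_diagnostics": [{"name": "clarity", "value": 0.9}],
  "plateau_detected": false,
  "plateau_guidance": ""
}'''

summary = json.loads(stdout)
improvement = summary["score_summary"]["best"] - summary["score_summary"]["initial"]
```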
Evaluator input payload (stdin JSON for command mode, POST JSON for HTTP mode):

```
{"_protocol_version": 2, "candidate": "...", "example": {...}, "task_model": "..."}
```

- `candidate` is required
- `_protocol_version`, `example`, and `task_model` are optional/additive
- legacy evaluators that only read `candidate` remain compatible

Evaluator output payload:

```
{"score": 0.75, "notes": "optional diagnostics"}
```

- `score` is required
- additional keys are treated as side-info
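A minimal command evaluator honoring this contract might look like the following sketch — the length-based scoring is a stand-in for real evaluation logic:

```python
import json
import sys

def score_candidate(payload: dict) -> dict:
    # "candidate" is the only required input field; optional fields such as
    # "example" and "task_model" can be read with payload.get(...).
    candidate = payload["candidate"]
    if not candidate.strip():
        return {"score": 0.0, "notes": "empty candidate"}
    # Placeholder scoring: full marks up to 50 chars, decaying beyond that.
    score = min(1.0, 50 / max(len(candidate), 50))
    return {"score": score, "notes": f"len={len(candidate)}"}

def main() -> None:
    payload = json.load(sys.stdin)               # JSON payload on stdin
    print(json.dumps(score_candidate(payload)))  # score JSON on stdout

# In a real eval.sh / eval.py, invoke main() under an
# `if __name__ == "__main__":` guard.
```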
Intake is optional structured guidance you pass to the optimizer to shape how evaluation works. Instead of relying solely on an `--objective` string, intake lets you declare quality dimensions, hard constraints, evaluation patterns, and execution preferences — giving you finer control over what "better" means for your artifact.
Use intake when your evaluation criteria are multi-dimensional, when you need to enforce hard constraints, or when you want consistent evaluation behavior across runs.
Pass it inline or from a file:
```bash
# Inline
optimize-anything optimize seed.txt \
  --intake-json '{"quality_dimensions": ["clarity", "specificity"], "hard_constraints": ["max 100 words"]}' \
  --judge-model openai/gpt-4o-mini

# From file
optimize-anything optimize seed.txt \
  --intake-file intake.json \
  --judge-model openai/gpt-4o-mini
```

`optimize-anything intake` normalizes and validates these keys:
- `artifact_class`
- `quality_dimensions`
- `hard_constraints`
- `evaluation_pattern`
- `execution_mode`
- `evaluator_cwd`
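An intake file can be generated programmatically; a sketch using a subset of those keys (the values below are hypothetical):

```python
import json

# Hypothetical intake spec -- keys drawn from the accepted set above,
# values purely illustrative.
intake = {
    "artifact_class": "prompt",
    "quality_dimensions": ["clarity", "specificity", "actionability"],
    "hard_constraints": ["max 100 words", "no unexplained jargon"],
}

with open("intake.json", "w") as f:
    json.dump(intake, f, indent=2)
```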
Exactly one evaluator source is required: `--evaluator-command`, `--evaluator-url`, or `--judge-model`.
| Flag | Description | Default |
|---|---|---|
| `--no-seed` | Run without seed file; bootstrap from objective | false |
| `--evaluator-command <cmd...>` | Command evaluator (stdin/stdout JSON) | -- |
| `--evaluator-url <url>` | HTTP evaluator endpoint | -- |
| `--intake-json <json>` | Inline intake spec | -- |
| `--intake-file <path>` | Intake spec file | -- |
| `--evaluator-cwd <path>` | Working dir for command evaluator | -- |
| `--objective <text>` | Optimization objective | -- |
| `--background <text>` | Extra domain context | -- |
| `--dataset <train.jsonl>` | Training dataset JSONL | -- |
| `--valset <val.jsonl>` | Validation dataset JSONL (requires `--dataset`) | -- |
| `--budget <int>` | Max evaluator calls | 100 |
| `--output, -o <file>` | Write best artifact to file | -- |
| `--model <model>` | Proposer model (or env fallback) | `OPTIMIZE_ANYTHING_MODEL` |
| `--judge-model <model>` | Built-in LLM judge evaluator model | -- |
| `--judge-objective <text>` | Judge objective override | falls back to `--objective` |
| `--api-base <url>` | Override LiteLLM API base | -- |
| `--diff` | Print unified diff (seed vs best) to stderr | false |
| `--run-dir <path>` | Save run artifacts in timestamped run dir | -- |
| `--parallel` | Enable parallel evaluator calls | false |
| `--workers <int>` | Max workers for parallel evaluation | -- |
| `--cache` | Enable evaluator cache | false |
| `--cache-from <run-dir>` | Copy prior `fitness_cache` into new run | -- |
| `--early-stop` | Enable plateau early stop | auto on when budget > 30 |
| `--early-stop-window <int>` | Plateau window size | 10 |
| `--early-stop-threshold <float>` | Min improvement required over window | 0.005 |
| `--spec-file <path>` | Load TOML spec defaults | -- |
| `--task-model <model>` | Optional metadata forwarded to evaluators | -- |
| `--score-range <unit\|any>` | Score validation mode for command/HTTP evaluators | unit |
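From a script, these flags compose like any CLI invocation. A sketch that assembles a run (assumes `optimize-anything` is installed and on PATH; the execution line is left commented so the snippet has no side effects):

```python
import subprocess  # used when the run line below is uncommented

# Assemble an invocation from the flag reference above.
cmd = [
    "optimize-anything", "optimize", "seed.txt",
    "--judge-model", "openai/gpt-4o-mini",
    "--model", "openai/gpt-4o-mini",
    "--budget", "50",
    "--cache", "--run-dir", "runs",
    "--output", "result.txt",
]
# result = subprocess.run(cmd, capture_output=True, text=True)
# summary = result.stdout  # JSON summary per the Result Contract
```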