GitHub - Sariel2018/forage: How can an autonomous agent know it's done — when 'done' itself must be discovered? Forage separates the agent that defines success from the agent that pursues it.

How can an autonomous agent know it's done — when "done" itself must be discovered?

Most agent frameworks assume a pre-defined goal. But the real world is full of tasks where "done" is unknown at the start:

A data collection agent told to "get all articles" doesn't know how many articles exist.
An AI scientist exploring a chemical space doesn't know how many compounds have the desired property.
A security auditor doesn't know how many vulnerabilities exist in the system.
A literature reviewer doesn't know how many relevant papers are out there.

In each case, the agent collects some results, declares victory, and stops --- often at single-digit completeness while self-reporting 100%. We call this denominator blindness: the agent confidently answers a question it was never equipped to ask --- "how much is left?"

Forage solves this with one architectural principle: separate the agent that defines success from the agent that pursues it. No human specifies the evaluation criteria. No human decides when to stop. No human intervenes at any point. The system autonomously discovers what "complete" means, verifies its own progress, and terminates when the goal is met --- or honestly reports what's missing when it isn't.

The name draws from Optimal Foraging Theory in ecology: like animals foraging in an unknown environment under energy budgets, Forage operates without knowing the total resource pool, iteratively refining its estimate of the world's boundaries.

Why This Matters

Current AI systems --- AutoML, AI-Scientist, autoresearch --- all use fixed evaluation criteria defined by humans before execution. This works when the goal is known. But when the goal space is open-ended, fixed evaluation becomes the bottleneck: the agent literally cannot know what it doesn't know.

Forage makes evaluation criteria co-evolve with the task:

The Evaluator's denominator estimate evolves across rounds: 308 → 551 → 296 → 295 (ground truth). No human intervention.

This is an epistemological question that pervades autonomous AI: how can an agent establish reliable knowledge about the boundaries of what it does not know? Data collection is our case study because completeness is quantitatively verifiable, but the principle applies to any domain where "done" is unknown at the start.

Results

In a 6-group ablation across two benchmarks:

Without independent evaluation, agents self-report 100% at 15.9% actual recall (coverage gap: +84pp).

Denominator blindness is real: self-evaluating agents average +54pp coverage gap
Architectural separation works: method isolation alone gives +25.9pp recall on the harder benchmark
Forage is calibrated: coverage gap of only -3pp (slightly conservative)
Forage is efficient: 3--4x cheaper than single-agent baselines

Full results in the paper.

How It Works

Each round:

Evaluator Agent (LLM) explores data sources, defines the denominator, writes eval.py
Planner Agent (LLM) reads coverage gaps, designs strategy, writes collect.py
Executor (deterministic) runs collect.py, then runs eval.py on the dataset
Stop? Continue if coverage insufficient; stop when converged or budget exhausted

The key constraint: the Evaluator never sees collect.py, and the Planner never sees eval.py. This method isolation prevents cognitive anchoring --- the same reason you don't let developers test their own code.

Quick Start

git clone https://github.com/Sariel2018/forage.git
cd forage
pip install -e .

# Requires Claude Code CLI: https://claude.ai/code
forage run tasks/whitehouse_trump2.yaml --knowledge knowledge/

Running Experiments

# 6-group ablation with 3 repeats
forage experiment tasks/fa_2011.yaml \
    --groups SA,M-no-eval,M-no-iso,M-co-eval,M-exp,M \
    --repeats 3 --knowledge knowledge/

Project Structure

forage/             # Core package
  agents/           #   Evaluator, Planner, Executor
  core/             #   Outer loop, spec parser
  experiments/      #   Experiment runner
tasks/              # Task specs (YAML)
knowledge/          # Experience knowledge base
scripts/            # Analysis & figure generation
tests/              # Tests

Citation

@article{xie2026forage,
  title={Forage: Solving Denominator Blindness in Autonomous Agents 
         via Co-Evolving Evaluation},
  author={Xie, Huaqing},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2026}
}

License

MIT

If you find this work useful, please consider giving it a star and citing our paper.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
assets		assets
forage		forage
knowledge		knowledge
scripts		scripts
tasks		tasks
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

How can an autonomous agent know it's done — when "done" itself must be discovered?

Why This Matters

Results

How It Works

Quick Start

Running Experiments

Project Structure

Citation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

How can an autonomous agent know it's done — when "done" itself must be discovered?

Why This Matters

Results

How It Works

Quick Start

Running Experiments

Project Structure

Citation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages