JovannyEspinal/heatcheck

Heatcheck

Demo video: heatcheck-demo.mov

A learning project I built while exploring agentic AI. It combines Camel-AI's OASIS for social simulation with Andrej Karpathy's autoresearch pattern into something actually useful: paste a tweet, describe your audience, and it runs the tweet through 100 synthetic personas before you post.

Good excuse to get function calling, the OpenAI API, and FastAPI SSE all working together in something real.


How it works

The simulation (OASIS)

OASIS spins up synthetic social media personas from a CSV. Each agent has a name, bio, interests, and posting style. Each one independently decides whether to like, repost, reply, or do nothing.

Five archetypes:

  • Lurker — rarely engages, only when something directly hits a core interest
  • Active — engages with content that fits their interests, skips noise
  • Curator — reposts quality content their followers would thank them for
  • Debater — replies to challenge assumptions and push back on claims
  • Reply guy — always in the comments, processes content through conversation
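One way to picture the persona records and archetype behavior (the field names and propensity numbers below are illustrative, not OASIS's actual CSV schema):

```python
from dataclasses import dataclass

# Illustrative persona record -- field names are hypothetical,
# not the actual OASIS CSV schema.
@dataclass
class Persona:
    name: str
    bio: str
    interests: list[str]
    archetype: str  # lurker | active | curator | debater | reply_guy

def base_rates(archetype: str) -> dict[str, float]:
    """Rough per-action propensities per archetype (made-up numbers)."""
    table = {
        "lurker":    {"like": 0.05, "repost": 0.01, "reply": 0.01},
        "active":    {"like": 0.40, "repost": 0.10, "reply": 0.15},
        "curator":   {"like": 0.20, "repost": 0.35, "reply": 0.05},
        "debater":   {"like": 0.10, "repost": 0.05, "reply": 0.50},
        "reply_guy": {"like": 0.20, "repost": 0.05, "reply": 0.70},
    }
    return table[archetype]
```

In the actual simulation each agent's decision comes from an LLM call conditioned on its bio and interests, not a fixed probability table; the table just captures the relative tendencies the archetypes are meant to encode.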

The scoring

score = reposts × 3 + replies × 2 + likes × 1

A repost means someone put their name on it. That's the signal. Likes are passive approval.
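As code, the weighting is just:

```python
def score(reposts: int, replies: int, likes: int) -> int:
    """Weighted engagement score: reposts count most, likes least."""
    return reposts * 3 + replies * 2 + likes * 1

# e.g. 2 reposts, 1 reply, 5 likes -> 2*3 + 1*2 + 5*1 = 13
```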

The refinement loop (autoresearch)

After each round, a refiner (GPT-4.1) reads the full engagement history and edits the tweet directly, making the smallest change it can justify. Then the simulation runs again.

draft → simulate 100 agents → score → refine tweet → repeat

Max 100 agents. Max 20 iterations. Returns the best-scoring version.
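The loop can be sketched like this (`simulate` and `refine` stand in for the OASIS run and the GPT-4.1 refiner; their signatures here are assumptions, not the repo's actual code):

```python
# Sketch of the draft -> simulate -> score -> refine loop.
# simulate(tweet, n_agents) returns engagement counts;
# refine(tweet, history) returns an edited tweet.
def optimize(tweet: str, simulate, refine,
             n_agents: int = 100, max_iters: int = 20) -> tuple[str, int]:
    best_tweet, best_score = tweet, -1
    history = []
    for _ in range(max_iters):
        engagement = simulate(tweet, n_agents)
        s = (engagement["reposts"] * 3
             + engagement["replies"] * 2
             + engagement["likes"])
        history.append((tweet, engagement, s))
        if s > best_score:
            best_tweet, best_score = tweet, s
        tweet = refine(tweet, history)  # smallest justifiable edit
    return best_tweet, best_score
```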


Architecture

frontend/          Next.js 16 + React 19
  app/
    page.tsx       Multi-step wizard (tweet → audience → key → simulate → results)
  components/
    StepAudience   Agent count, iterations, posting style mix
    StepAPIKey     OpenAI key input (never stored, sent directly to OpenAI)
    StepSimulation Live SSE feed of agent reactions
    StepResults    Iteration history, spread curve, best tweet

backend/           FastAPI + OASIS
  main.py          SSE endpoint (/simulate)
  personas.py      GPT-4o generates synthetic personas via function calling
  simulation.py    OASIS orchestration, agent behavior, scoring
  refiner.py       GPT-4.1 tweet refinement with chain-of-thought
  database.py      SQLite engagement queries (OASIS writes here)
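A sketch of how the personas.py step might call GPT-4o with function calling via the OpenAI Python SDK (the tool schema, prompt, and helper names are assumptions, not the repo's actual code):

```python
import json

# Hypothetical tool schema for structured persona output.
PERSONA_TOOL = {
    "type": "function",
    "function": {
        "name": "create_personas",
        "description": "Generate synthetic social media personas.",
        "parameters": {
            "type": "object",
            "properties": {
                "personas": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "bio": {"type": "string"},
                            "interests": {"type": "array", "items": {"type": "string"}},
                            "archetype": {"type": "string"},
                        },
                        "required": ["name", "bio", "interests", "archetype"],
                    },
                }
            },
            "required": ["personas"],
        },
    },
}

def parse_personas(arguments_json: str) -> list[dict]:
    """Decode the JSON arguments string the model returns for the tool call."""
    return json.loads(arguments_json)["personas"]

def generate_personas(audience: str, n: int) -> list[dict]:
    # Lazy import: requires `pip install openai` and OPENAI_API_KEY set.
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": f"Create {n} personas for this audience: {audience}"}],
        tools=[PERSONA_TOOL],
        tool_choice={"type": "function", "function": {"name": "create_personas"}},
    )
    return parse_personas(resp.choices[0].message.tool_calls[0].function.arguments)
```

Forcing the tool with `tool_choice` guarantees the model responds with arguments matching the schema rather than free-form text, which is what makes the personas loadable into OASIS without cleanup.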

Data flow:

  1. User pastes tweet + describes audience
  2. GPT-4o generates N personas via function calling
  3. OASIS runs each agent, which independently decides to like, repost, reply, or do nothing
  4. Engagement is scored, refiner edits the tweet
  5. Loop repeats up to max iterations
  6. Frontend streams all of this live via SSE
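Step 6 is the part FastAPI's streaming support makes straightforward. A minimal sketch of what an SSE endpoint like the `/simulate` route in backend/main.py could look like (the event names and payloads are my guesses, not the repo's actual wire format):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Event frame (event name + JSON payload)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def create_app():
    # Lazy import so the formatting helper works without FastAPI installed.
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()

    @app.get("/simulate")
    async def simulate():
        async def stream():
            # In the real app these events would come from the OASIS run;
            # the events below are illustrative.
            yield sse_event("agent_action", {"agent": "curator_07", "action": "repost"})
            yield sse_event("iteration_done", {"iteration": 1, "score": 7})
        return StreamingResponse(stream(), media_type="text/event-stream")

    return app
```

With the factory shape above you could run it as `uvicorn module:create_app --factory`; the browser side consumes it with `EventSource` and listens per event name.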

Stack

| Component | Tool | Why |
| --- | --- | --- |
| Agent simulation | OASIS (Camel-AI) | Multi-agent Twitter simulation with SQLite-backed state |
| Persona generation | GPT-4o | Function calling for structured persona output |
| Tweet refinement | GPT-4.1 | Chain-of-thought reasoning across the full history |
| Agent LLM | GPT-4.1-mini | Higher TPM limit (200K vs. 30K), needed for concurrent agents |
| Backend | FastAPI + SSE | Real-time streaming of agent actions to the frontend |
| Frontend | Next.js + Recharts | Step wizard, live feed, score visualization |

Setup

Requirements: Python 3.12+, Node 18+, OpenAI API key

Backend

python -m venv .venv
source .venv/bin/activate
pip install -e .

uvicorn backend.main:app --reload

Frontend

cd frontend
npm install
npm run dev

Open http://localhost:3000.

Your OpenAI key is entered in the UI and sent directly to OpenAI; it is never stored or logged.


Cost

Under $1 at max settings (100 agents × 20 iterations).

  • Persona generation: ~$0.02 one-time (GPT-4o)
  • Agents: ~$0.00028 per agent per iteration (GPT-4.1-mini)
  • Refiner: ~$0.006 per iteration (GPT-4.1)
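Those per-unit figures multiply out as:

```python
agents, iterations = 100, 20              # max settings
persona_cost = 0.02                       # one-time, GPT-4o
agent_cost = 0.00028 * agents * iterations  # GPT-4.1-mini, per agent per iteration
refiner_cost = 0.006 * iterations           # GPT-4.1, per iteration

total = persona_cost + agent_cost + refiner_cost
print(round(total, 2))  # -> 0.7, comfortably under $1
```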

Limitations

  • 100 synthetic personas aren't real Twitter. It's a model.
  • Scores are relative — a score of 10 means this version beat the others in this run, not that it'll go viral.
  • If early iterations score 0, the refiner has no signal to work from and can drift.
  • GPT-4.1-mini caps at 200K TPM. Drop agent count if you hit rate limit errors.

Results

I ran the tweet about this tool through the tool. Round 3 peaked at a score of 4 — one repost, one like. 17 more iterations couldn't beat it.

The agents were not impressed.

A finding from a separate run: brevity beats nuance. Short, punchy takes get instant reposts, while nuanced threads trigger debater spirals, with agents arguing among themselves for four rounds before anyone liked the original.
