A learning project I built while exploring agentic AI. Combines Camel-AI's OASIS for social simulation with Andrej Karpathy's autoresearch pattern to make something actually useful — paste a tweet, describe your audience, and it runs it through 100 synthetic personas before you post.
Good excuse to get function calling, the OpenAI API, and FastAPI SSE all working together in something real.
OASIS spins up synthetic social media personas from a CSV. Each agent has a name, bio, interests, and posting style. Each one independently decides whether to like, repost, reply, or do nothing.
Five archetypes:
- Lurker — rarely engages, only when something directly hits a core interest
- Active — engages with content that fits their interests, skips noise
- Curator — reposts quality content their followers would thank them for
- Debater — replies to challenge assumptions and push back on claims
- Reply guy — always in the comments, processes content through conversation
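The archetypes above could be sketched as action biases. This is a hypothetical illustration, not the actual OASIS agent logic; the probability values are invented for the example.

```python
# Hypothetical sketch: each archetype biases the agent's action choice.
# The weights are illustrative, not values taken from simulation.py.
import random

ACTIONS = ("nothing", "like", "repost", "reply")

ARCHETYPE_WEIGHTS = {
    # weights aligned with ACTIONS: (nothing, like, repost, reply)
    "lurker":    (0.90, 0.07, 0.01, 0.02),  # rarely engages
    "active":    (0.40, 0.35, 0.15, 0.10),  # engages with fitting content
    "curator":   (0.50, 0.15, 0.30, 0.05),  # repost-heavy
    "debater":   (0.45, 0.10, 0.05, 0.40),  # reply-heavy, pushes back
    "reply_guy": (0.20, 0.15, 0.05, 0.60),  # always in the comments
}

def choose_action(archetype: str, rng: random.Random) -> str:
    """Sample one action according to the archetype's bias."""
    return rng.choices(ACTIONS, weights=ARCHETYPE_WEIGHTS[archetype], k=1)[0]
```

In the real system each agent's decision is an LLM call conditioned on its persona; a weight table like this only conveys the intended tendencies.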
score = reposts × 3 + replies × 2 + likes × 1
A repost means someone put their name on it. That's the signal. Likes are passive approval.
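The scoring formula above is simple enough to state directly in code:

```python
def engagement_score(reposts: int, replies: int, likes: int) -> int:
    """Weighted score: reposts carry the most signal, likes the least."""
    return reposts * 3 + replies * 2 + likes * 1
```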
After each round, a refiner (GPT-4.1) looks at the full engagement history and edits the tweet directly. Makes the smallest change it can justify, then runs the simulation again.
draft → simulate 100 agents → score → refine tweet → repeat
Max 100 agents. Max 20 iterations. Returns the best-scoring version.
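The loop above can be sketched as follows. `simulate` and `refine` stand in for the OASIS run and the GPT refiner; the function names are placeholders, not the project's actual API.

```python
# Minimal sketch of draft → simulate → score → refine → repeat.
# simulate(tweet) -> (reposts, replies, likes); refine(tweet, score) -> new tweet.
def optimize_tweet(draft, simulate, refine, max_iterations=20):
    best_tweet, best_score = draft, -1
    current = draft
    for _ in range(max_iterations):
        reposts, replies, likes = simulate(current)
        score = reposts * 3 + replies * 2 + likes
        if score > best_score:  # keep the best-scoring version seen so far
            best_tweet, best_score = current, score
        current = refine(current, score)
    return best_tweet, best_score
```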
```
frontend/              Next.js 16 + React 19
  app/
    page.tsx           Multi-step wizard (tweet → audience → key → simulate → results)
  components/
    StepAudience       Agent count, iterations, posting style mix
    StepAPIKey         OpenAI key input (never stored, sent directly to OpenAI)
    StepSimulation     Live SSE feed of agent reactions
    StepResults        Iteration history, spread curve, best tweet
backend/               FastAPI + OASIS
  main.py              SSE endpoint (/simulate)
  personas.py          GPT-4o generates synthetic personas via function calling
  simulation.py        OASIS orchestration, agent behavior, scoring
  refiner.py           GPT-4.1 tweet refinement with chain-of-thought
  database.py          SQLite engagement queries (OASIS writes here)
```
Data flow:
- User pastes tweet + describes audience
- GPT-4o generates N personas via function calling
- OASIS runs each agent — independently decides like/repost/reply/nothing
- Engagement is scored, refiner edits the tweet
- Loop repeats up to max iterations
- Frontend streams all of this live via SSE
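The persona-generation step works by giving the model a function schema and parsing the structured arguments it returns. A hedged sketch of what that schema and parsing could look like; the field names mirror the persona attributes described above, but the exact schema in personas.py is an assumption.

```python
# Sketch of structured persona output via OpenAI function calling.
# The tool schema and field names are assumptions, not copied from personas.py.
import json

PERSONA_TOOL = {
    "type": "function",
    "function": {
        "name": "create_personas",
        "description": "Generate synthetic social media personas.",
        "parameters": {
            "type": "object",
            "properties": {
                "personas": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "bio": {"type": "string"},
                            "interests": {"type": "array", "items": {"type": "string"}},
                            "posting_style": {"type": "string"},
                        },
                        "required": ["name", "bio", "interests", "posting_style"],
                    },
                }
            },
            "required": ["personas"],
        },
    },
}

def parse_personas(tool_call_arguments: str) -> list[dict]:
    """Parse the JSON arguments string returned in the model's tool call."""
    return json.loads(tool_call_arguments)["personas"]
```

In the actual call, `PERSONA_TOOL` would go in the `tools` list of a chat-completions request, and `parse_personas` would receive `tool_calls[0].function.arguments` from the response.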
| Component | Tool | Why |
|---|---|---|
| Agent simulation | OASIS (Camel-AI) | Multi-agent Twitter simulation with SQLite-backed state |
| Persona generation | GPT-4o | Function-calling for structured persona output |
| Tweet refinement | GPT-4.1 | Chain-of-thought reasoning across full history |
| Agent LLM | GPT-4.1-mini | Higher TPM limit (200K vs 30K) — needed for concurrent agents |
| Backend | FastAPI + SSE | Real-time streaming of agent actions to frontend |
| Frontend | Next.js + Recharts | Step wizard + live feed + score visualization |
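The SSE streaming in the table boils down to the `text/event-stream` wire format. A sketch of the framing, independent of FastAPI so it stands alone; in main.py the generator would be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, and the event names here are illustrative.

```python
# Sketch of the server-sent-events frames the backend could stream.
# Event names ("agent_action", "done") are assumptions, not the real protocol.
import json

def sse_event(event: str, data: dict) -> str:
    """Format one SSE frame: an event name plus a JSON data line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def stream_simulation(actions):
    """Yield one SSE frame per agent action, then a terminal frame."""
    for action in actions:
        yield sse_event("agent_action", action)
    yield sse_event("done", {"status": "complete"})
```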
Requirements: Python 3.12+, Node 18+, OpenAI API key
Backend:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
uvicorn backend.main:app --reload
```

Frontend (separate terminal):

```bash
cd frontend
npm install
npm run dev
```

Open http://localhost:3000.
Your OpenAI key is entered in the UI and sent directly to OpenAI. Never stored or logged.
Under $1 at max settings (100 agents × 20 iterations).
- Persona generation: ~$0.02 one-time (GPT-4o)
- Agents: ~$0.00028 per agent per iteration (GPT-4.1-mini)
- Refiner: ~$0.006 per iteration (GPT-4.1)
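A back-of-envelope check of the estimate above, using the per-unit figures as given:

```python
# Worked cost check at max settings (100 agents × 20 iterations).
agents, iterations = 100, 20
persona_cost = 0.02                          # one-time GPT-4o persona generation
agent_cost = 0.00028 * agents * iterations   # GPT-4.1-mini, per agent per iteration
refiner_cost = 0.006 * iterations            # GPT-4.1, per iteration
total = persona_cost + agent_cost + refiner_cost  # ≈ $0.70, under the $1 ceiling
```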
- 100 synthetic personas aren't real Twitter; this is a model, not a measurement.
- Scores are relative — a score of 10 means this version beat the others in this run, not that it'll go viral.
- If early iterations score 0, the refiner has no signal to work from and can drift.
- GPT-4.1-mini caps at 200K TPM. Drop agent count if you hit rate limit errors.
I ran the tweet about this tool through the tool. Round 3 peaked at a score of 4 — one repost, one like. 17 more iterations couldn't beat it.
The agents were not impressed.
A finding from separate runs: brevity beats nuance. Short, punchy takes get instant reposts. Nuanced threads trigger debater spirals: agents arguing with each other for four rounds before anyone liked the original.