JovannyEspinal/heatcheck

Heatcheck

Demo video: heatcheck-demo.mov

A learning project I built while exploring agentic AI. It combines Camel-AI's OASIS for social simulation with Andrej Karpathy's autoresearch pattern into something actually useful: paste a tweet, describe your audience, and it runs the tweet through 100 synthetic personas before you post.

Good excuse to get function calling, the OpenAI API, and FastAPI SSE all working together in something real.


How it works

The simulation (OASIS)

OASIS spins up synthetic social media personas from a CSV. Each agent has a name, bio, interests, and posting style. Each one independently decides whether to like, repost, reply, or do nothing.

Five archetypes:

  • Lurker — rarely engages, only when something directly hits a core interest
  • Active — engages with content that fits their interests, skips noise
  • Curator — reposts quality content their followers would thank them for
  • Debater — replies to challenge assumptions and push back on claims
  • Reply guy — always in the comments, processes content through conversation
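One way to picture the persona records and archetype behavior (the field names and propensity numbers below are illustrative, not OASIS's actual CSV schema):

```python
from dataclasses import dataclass

# Illustrative persona record -- field names are hypothetical,
# not the actual OASIS CSV schema.
@dataclass
class Persona:
    name: str
    bio: str
    interests: list[str]
    archetype: str  # lurker | active | curator | debater | reply_guy

def base_rates(archetype: str) -> dict[str, float]:
    """Rough per-action propensities per archetype (made-up numbers)."""
    table = {
        "lurker":    {"like": 0.05, "repost": 0.01, "reply": 0.01},
        "active":    {"like": 0.40, "repost": 0.10, "reply": 0.15},
        "curator":   {"like": 0.20, "repost": 0.35, "reply": 0.05},
        "debater":   {"like": 0.10, "repost": 0.05, "reply": 0.50},
        "reply_guy": {"like": 0.20, "repost": 0.05, "reply": 0.70},
    }
    return table[archetype]
```

In the actual simulation each agent's decision comes from an LLM call conditioned on its bio and interests, not a fixed probability table; the table just captures the relative tendencies the archetypes are meant to encode.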

The scoring

score = reposts × 3 + replies × 2 + likes × 1

A repost means someone put their name on it. That's the signal. Likes are passive approval.
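As code, the weighting is just:

```python
def score(reposts: int, replies: int, likes: int) -> int:
    """Weighted engagement score: reposts count most, likes least."""
    return reposts * 3 + replies * 2 + likes * 1

# e.g. 2 reposts, 1 reply, 5 likes -> 2*3 + 1*2 + 5*1 = 13
```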

The refinement loop (autoresearch)

After each round, a refiner (GPT-4.1) reads the full engagement history and edits the tweet directly, making the smallest change it can justify. Then the simulation runs again.

draft → simulate 100 agents → score → refine tweet → repeat

Max 100 agents. Max 20 iterations. Returns the best-scoring version.
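The loop can be sketched like this (`simulate` and `refine` stand in for the OASIS run and the GPT-4.1 refiner; their signatures here are assumptions, not the repo's actual code):

```python
# Sketch of the draft -> simulate -> score -> refine loop.
# simulate(tweet, n_agents) returns engagement counts;
# refine(tweet, history) returns an edited tweet.
def optimize(tweet: str, simulate, refine,
             n_agents: int = 100, max_iters: int = 20) -> tuple[str, int]:
    best_tweet, best_score = tweet, -1
    history = []
    for _ in range(max_iters):
        engagement = simulate(tweet, n_agents)
        s = (engagement["reposts"] * 3
             + engagement["replies"] * 2
             + engagement["likes"])
        history.append((tweet, engagement, s))
        if s > best_score:
            best_tweet, best_score = tweet, s
        tweet = refine(tweet, history)  # smallest justifiable edit
    return best_tweet, best_score
```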


Architecture

frontend/          Next.js 16 + React 19
  app/
    page.tsx       Multi-step wizard (tweet → audience → key → simulate → results)
  components/
    StepAudience   Agent count, iterations, posting style mix
    StepAPIKey     OpenAI key input (never stored, sent directly to OpenAI)
    StepSimulation Live SSE feed of agent reactions
    StepResults    Iteration history, spread curve, best tweet

backend/           FastAPI + OASIS
  main.py          SSE endpoint (/simulate)
  personas.py      GPT-4o generates synthetic personas via function calling
  simulation.py    OASIS orchestration, agent behavior, scoring
  refiner.py       GPT-4.1 tweet refinement with chain-of-thought
  database.py      SQLite engagement queries (OASIS writes here)
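A sketch of how the personas.py step might call GPT-4o with function calling via the OpenAI Python SDK (the tool schema, prompt, and helper names are assumptions, not the repo's actual code):

```python
import json

# Hypothetical tool schema for structured persona output.
PERSONA_TOOL = {
    "type": "function",
    "function": {
        "name": "create_personas",
        "description": "Generate synthetic social media personas.",
        "parameters": {
            "type": "object",
            "properties": {
                "personas": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "bio": {"type": "string"},
                            "interests": {"type": "array", "items": {"type": "string"}},
                            "archetype": {"type": "string"},
                        },
                        "required": ["name", "bio", "interests", "archetype"],
                    },
                }
            },
            "required": ["personas"],
        },
    },
}

def parse_personas(arguments_json: str) -> list[dict]:
    """Decode the JSON arguments string the model returns for the tool call."""
    return json.loads(arguments_json)["personas"]

def generate_personas(audience: str, n: int) -> list[dict]:
    # Lazy import: requires `pip install openai` and OPENAI_API_KEY set.
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": f"Create {n} personas for this audience: {audience}"}],
        tools=[PERSONA_TOOL],
        tool_choice={"type": "function", "function": {"name": "create_personas"}},
    )
    return parse_personas(resp.choices[0].message.tool_calls[0].function.arguments)
```

Forcing the tool with `tool_choice` guarantees the model responds with arguments matching the schema rather than free-form text, which is what makes the personas loadable into OASIS without cleanup.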

Data flow:

  1. User pastes tweet + describes audience
  2. GPT-4o generates N personas via function calling
  3. OASIS runs each agent, which independently decides to like, repost, reply, or do nothing
  4. Engagement is scored, refiner edits the tweet
  5. Loop repeats up to max iterations
  6. Frontend streams all of this live via SSE
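Step 6 is the part FastAPI's streaming support makes straightforward. A minimal sketch of what an SSE endpoint like the `/simulate` route in backend/main.py could look like (the event names and payloads are my guesses, not the repo's actual wire format):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Event frame (event name + JSON payload)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def create_app():
    # Lazy import so the formatting helper works without FastAPI installed.
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()

    @app.get("/simulate")
    async def simulate():
        async def stream():
            # In the real app these events would come from the OASIS run;
            # the events below are illustrative.
            yield sse_event("agent_action", {"agent": "curator_07", "action": "repost"})
            yield sse_event("iteration_done", {"iteration": 1, "score": 7})
        return StreamingResponse(stream(), media_type="text/event-stream")

    return app
```

With the factory shape above you could run it as `uvicorn module:create_app --factory`; the browser side consumes it with `EventSource` and listens per event name.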

Stack

| Component | Tool | Why |
| --- | --- | --- |
| Agent simulation | OASIS (Camel-AI) | Multi-agent Twitter simulation with SQLite-backed state |
| Persona generation | GPT-4o | Function calling for structured persona output |
| Tweet refinement | GPT-4.1 | Chain-of-thought reasoning across the full history |
| Agent LLM | GPT-4.1-mini | Higher TPM limit (200K vs. 30K), needed for concurrent agents |
| Backend | FastAPI + SSE | Real-time streaming of agent actions to the frontend |
| Frontend | Next.js + Recharts | Step wizard, live feed, score visualization |

Setup

Requirements: Python 3.12+, Node 18+, OpenAI API key

Backend

python -m venv .venv
source .venv/bin/activate
pip install -e .

uvicorn backend.main:app --reload

Frontend

cd frontend
npm install
npm run dev

Open http://localhost:3000.

Your OpenAI key is entered in the UI and sent directly to OpenAI; it is never stored or logged.


Cost

Under $1 at max settings (100 agents × 20 iterations).

  • Persona generation: ~$0.02 one-time (GPT-4o)
  • Agents: ~$0.00028 per agent per iteration (GPT-4.1-mini)
  • Refiner: ~$0.006 per iteration (GPT-4.1)
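Those per-unit figures multiply out as:

```python
agents, iterations = 100, 20              # max settings
persona_cost = 0.02                       # one-time, GPT-4o
agent_cost = 0.00028 * agents * iterations  # GPT-4.1-mini, per agent per iteration
refiner_cost = 0.006 * iterations           # GPT-4.1, per iteration

total = persona_cost + agent_cost + refiner_cost
print(round(total, 2))  # -> 0.7, comfortably under $1
```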

Limitations

  • 100 synthetic personas aren't real Twitter. It's a model.
  • Scores are relative — a score of 10 means this version beat the others in this run, not that it'll go viral.
  • If early iterations score 0, the refiner has no signal to work from and can drift.
  • GPT-4.1-mini caps at 200K TPM. Drop agent count if you hit rate limit errors.

Results

I ran the tweet about this tool through the tool. Round 3 peaked at a score of 4 — one repost, one like. 17 more iterations couldn't beat it.

The agents were not impressed.

A finding from a separate run: brevity beats nuance. Short, punchy takes get instant reposts, while nuanced threads trigger debater spirals, with agents arguing among themselves for four rounds before anyone liked the original.
