A group chat where humans and AI models share the same room.
Not a chatbot. Not a tool. A council where intelligence gathers.
GPT · Llama · Gemini · Kimi · Qwen · LongCat
All in one room. All with perspectives. All synchronized.
Kōl is a collaborative AI board — a real-time group chat where multiple large‑language‑model advisors sit alongside human teammates in shared rooms. Instead of prompting a single AI as a service, you host a board meeting where each advisor contributes from its distinct expertise, reads what others have said, and builds on (or challenges) those perspectives.
Why "Kōl"? The name fuses coal (fuel for a fire) with council, reflecting a space where many minds generate heat‑driven insight.
All participants — human or AI — share a single chat thread. Each AI is a member of the group, not a service. They talk naturally, like smart friends in a group chat who happen to have different expertise. Sometimes one answers. Sometimes three jump in. Sometimes nobody says anything because a thumbs‑up doesn't need a response.
As conversations grow, a background Living Memory engine compresses history into a rolling third‑person narrative. This summary is silently injected into every future AI call, keeping the advisors grounded in the room's full history without token bloat, so conversations can grow indefinitely without losing context.
The Gate — a fast moderator model — reads the conversation after every human message and decides which AIs (if any) should respond, and in what speaking order. It enforces strict rules like no consecutive AI-only monologues and mention-based routing (if you say @kimi, only Kimi responds). The result is a group chat that feels alive and natural, not like a round-robin of verbose responses.
The AI pipeline is a compiled state graph built with LangGraph. Every message flows through a directed acyclic graph (DAG) after it's sent.
```mermaid
graph TD
    START((START)) --> Gate["🚦 Gate (llama-4-scout-17b)"]
    Gate -->|"should_respond: true"| Models["🎙️ Board Models (Sequential)"]
    Gate -->|"should_respond: false"| SumCheck{Messages > 20?}
    Models --> SumCheck
    SumCheck -->|Yes| Summarizer["🧠 Summarizer (llama-3.1-8b)"]
    SumCheck -->|No| END((END))
    Summarizer --> END
```
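The conditional edges in the graph above can be pictured as plain routing functions. This is an illustrative sketch: the node names and state shape are assumptions, not the actual LangGraph code in `agents/index.ts`.

```typescript
// Illustrative state shape — the real graph state may carry more fields.
interface PipelineState {
  shouldRespond: boolean;    // the Gate's decision
  unsummarizedCount: number; // messages not yet folded into room.memory
}

// Edge out of the Gate node: run the board models, or skip straight to
// the summarization check.
function routeAfterGate(state: PipelineState): "models" | "summary_check" {
  return state.shouldRespond ? "models" : "summary_check";
}

// Conditional edge into the Summarizer: only fire past the 20-message threshold.
function routeAfterSummaryCheck(state: PipelineState): "summarizer" | "end" {
  return state.unsummarizedCount > 20 ? "summarizer" : "end";
}
```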
The Gate is meta-llama/llama-4-scout-17b-16e-instruct running on Groq at near‑zero latency with temperature: 0 for deterministic decisions. It receives the full conversation history and the list of AI models present in the room, then produces a structured JSON decision:
```ts
// GateDecision Schema (Zod)
{
  should_respond: boolean,    // Whether any AI should respond at all
  reason: string,             // Short explanation (e.g. "User asked @gpt directly")
  responding_models: string[] // Ordered list (e.g. ["kimi", "llama"])
}
```

Hard Rules the Gate enforces:

- Only select models that are actually in the room — never hallucinate.
- If the last 3 messages are all from AIs → `should_respond: false`. Let humans breathe.
- STRICT MENTION RULE: If a user says `@gpt` or `hey kimi`, only that model responds.
- The model that spoke immediately before cannot be selected again in the next round.
- EXTREME SELECTIVITY: Default to exactly ONE model unless the topic explicitly requires diverse perspectives.
- REDUNDANCY BAN: Never pick two models that would give near-identical answers.
Speaking order matters: When multiple models are selected, they are ordered so the strongest domain expert speaks first (sets the foundation), complementary perspectives build on it, and the synthesizer/devil's advocate speaks last.
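Several of these hard rules can be enforced deterministically as a post-validation pass over the Gate's raw JSON, rather than trusting the model's output. The helper below is a sketch of that idea; the function and field names are assumptions, not the actual `gate.ts` logic.

```typescript
interface GateDecision {
  should_respond: boolean;
  reason: string;
  responding_models: string[];
}

// Hypothetical post-validation pass over the Gate's structured output.
function sanitizeDecision(
  raw: GateDecision,
  roomModels: string[],       // AI models actually present in the room
  lastThreeSenders: string[], // sender types of the last 3 messages ("human" | "ai")
  lastSpeaker: string | null  // model that spoke in the previous round
): GateDecision {
  // Rule: never select models that aren't in the room, and never re-select
  // the model that spoke immediately before.
  const models = raw.responding_models.filter(
    (m) => roomModels.includes(m) && m !== lastSpeaker
  );

  // Rule: if the last 3 messages are all from AIs, humans get to breathe.
  const aiMonologue =
    lastThreeSenders.length === 3 && lastThreeSenders.every((s) => s === "ai");

  if (!raw.should_respond || aiMonologue || models.length === 0) {
    return { should_respond: false, reason: raw.reason, responding_models: [] };
  }
  return { ...raw, responding_models: models };
}
```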
When the Gate approves a response, control passes to the model execution layer. Models run sequentially — each model's response is added to the conversation before the next model runs, so they can genuinely react to each other like real board members.
Every model has a distinct identity, API configuration, and personality:
| Model ID | LLM | Provider | Personality |
|---|---|---|---|
| `gpt` | `openai/gpt-oss-120b` | Groq | The deep thinker — structured reasoning, STEM, logical clarity |
| `llama` | `llama-3.3-70b-versatile` | Groq | The conversationalist — warm, plain language, creative angles |
| `kimi` | `moonshotai/kimi-k2-instruct-0905` | Groq | The engineer — systems thinking, code, implementation details |
| `qwen` | `qwen/qwen3-32b` | Groq | The critic — devil's advocate, stress‑tests ideas |
| `gemini` | `gemini-2.5-flash` | Google AI | The generalist — broad knowledge, long‑context synthesis |
| `longcat` | `LongCat-Flash-Chat` | LongCat | The actioner — multi-step plans, practical execution |
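One way to picture this table in code is a typed model registry keyed by the model ID the Gate routes on. The interface and field names below are illustrative assumptions; the real configuration lives on the server and may be shaped differently.

```typescript
// Hypothetical registry shape for the board models in the table above.
interface BoardModel {
  id: string;      // handle used for @mentions and Gate routing
  llm: string;     // provider-specific model string
  provider: "groq" | "google" | "longcat";
  persona: string; // one-line personality summary fed into the system prompt
}

const BOARD: Record<string, BoardModel> = {
  gpt:  { id: "gpt",  llm: "openai/gpt-oss-120b", provider: "groq", persona: "The deep thinker" },
  kimi: { id: "kimi", llm: "moonshotai/kimi-k2-instruct-0905", provider: "groq", persona: "The engineer" },
};

// The Gate's responding_models list indexes straight into the registry;
// unknown IDs are dropped rather than crashing the pipeline.
function resolveModels(ids: string[]): BoardModel[] {
  return ids.map((id) => BOARD[id]).filter((m): m is BoardModel => m !== undefined);
}
```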
Agentic Tool Access: Each model has two tools available during reasoning:
- `tavily_search_results_json` — real‑time web search via Tavily API.
- `read_url` — full content extraction from any URL via Jina AI.
Models decide autonomously when to use tools. Tool results are fed back into the model before it composes its final reply.
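The observe-then-reply cycle described here can be sketched as a small loop: the model either asks for a tool or emits a final answer, and tool results are appended to its context before the next step. Only the two tool names come from this document; the `ModelStep` shape, `runToolLoop`, and the iteration cap are illustrative assumptions.

```typescript
type ToolCall = { tool: "tavily_search_results_json" | "read_url"; input: string };
type ModelStep = { toolCall?: ToolCall; finalReply?: string };

// Hypothetical agentic loop: call the model step function, execute any
// requested tool, feed the observation back, repeat until a final reply.
async function runToolLoop(
  step: (observations: string[]) => Promise<ModelStep>,
  tools: Record<string, (input: string) => Promise<string>>,
  maxIterations = 3
): Promise<string> {
  const observations: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const out = await step(observations);
    if (out.finalReply !== undefined) return out.finalReply;
    if (out.toolCall) {
      // Tool result is injected back before the model composes its reply.
      observations.push(await tools[out.toolCall.tool](out.toolCall.input));
    }
  }
  return ""; // an empty reply is silently skipped downstream
}
```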
Response cleaning pipeline: After generation, responses are stripped of <think> tags, <tool_call> blocks, AI‑prefix labels (e.g., [AI: gpt]:) and partial multi-speaker contamination.
Redundancy gate: If a model produces an empty string after cleaning, it is silently skipped — no error is shown to the user.
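The cleaning and skip steps above amount to a few regex passes plus an emptiness check. The exact patterns in the server may differ; these are a minimal sketch of the stripping stages named in the text.

```typescript
// Sketch of the response-cleaning pipeline: strip reasoning traces,
// leftover tool-call blocks, and AI-prefix labels like "[AI: gpt]:".
function cleanResponse(raw: string): string {
  return raw
    .replace(/<think>[\s\S]*?<\/think>/g, "")
    .replace(/<tool_call>[\s\S]*?<\/tool_call>/g, "")
    .replace(/^\[AI:\s*\w+\]:\s*/gm, "")
    .trim();
}

// Redundancy gate: a model whose response is empty after cleaning is
// silently skipped, with no error surfaced to the user.
function shouldSkip(raw: string): boolean {
  return cleanResponse(raw).length === 0;
}
```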
Runs conditionally: triggered when the room has more than 20 unsummarized messages. Uses llama-3.1-8b-instant on Groq at temperature: 0 for consistent, factual compression.
The summarizer takes the first 20 unsummarized messages plus any existing summary and produces a merged, third‑person narrative that:
- Captures key topics, conclusions, decisions, and disagreements.
- Attributes opinions to specific speakers.
- Notes unresolved questions the group plans to revisit.
- Ignores social filler ("thanks", "got it", greetings).
- Stays under 400 words — ruthlessly concise.
This narrative is stored on the Room document as room.memory and injected into every future Gate and Model call. Summarized messages are flagged isSummarized: true so they are excluded from the rolling context window.
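The bookkeeping around the summarizer (the >20-message trigger, the 20-message batch, and the `isSummarized` flag) can be sketched as below. The LLM merge step itself is omitted; the helper names are assumptions, not the actual `summarizer.ts` code.

```typescript
interface StoredMessage { content: string; isSummarized: boolean }

// Trigger condition: only run once more than 20 messages are unsummarized.
function shouldSummarize(messages: StoredMessage[]): boolean {
  return messages.filter((m) => !m.isSummarized).length > 20;
}

// Select the first 20 unsummarized messages for the merge, then flag them
// so they drop out of the rolling context window on future calls.
function takeBatch(messages: StoredMessage[], batchSize = 20): StoredMessage[] {
  const batch = messages.filter((m) => !m.isSummarized).slice(0, batchSize);
  for (const m of batch) m.isSummarized = true;
  return batch;
}
```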
Every action in a Kōl room flows through a persistent WebSocket connection. The server lives in socket.ts.
All socket connections are authenticated before the connection event fires. The middleware reads the JWT from either socket.handshake.auth.token or the cookie header (to support httpOnly browser cookies), verifies it, and attaches userId and username to the socket instance. Unauthenticated sockets are immediately rejected.
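The token extraction step of that middleware can be sketched as follows: prefer the explicit `auth.token` field, otherwise parse the `cookie` header. JWT verification is omitted here, and the cookie name `token` is an assumption.

```typescript
// Minimal shape of the relevant parts of a Socket.io handshake.
interface Handshake {
  auth: { token?: string };
  headers: { cookie?: string };
}

// Pull the JWT from auth.token, falling back to the cookie header
// (supports httpOnly browser cookies). Returns null if neither is present,
// in which case the socket would be rejected.
function extractToken(handshake: Handshake, cookieName = "token"): string | null {
  if (handshake.auth.token) return handshake.auth.token;
  const cookie = handshake.headers.cookie;
  if (!cookie) return null;
  for (const pair of cookie.split(";")) {
    const [name, ...rest] = pair.trim().split("=");
    if (name === cookieName) return decodeURIComponent(rest.join("="));
  }
  return null;
}
```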
| Event | Direction | Payload | Description |
|---|---|---|---|
| `join_room` | Client → Server | `{ roomId }` | Verify membership, join Socket.io room, notify others. |
| `leave_room` | Client → Server | `{ roomId }` | Leave socket room, notify others. |
| `send_message` | Client → Server | `{ roomId, content }` | Save human message → broadcast → trigger AI pipeline. |
| `typing_start` | Client ↔ Server | `{ roomId }` / `{ userId, username, type }` | Human started typing. |
| `typing_stop` | Client ↔ Server | `{ roomId }` / `{ userId, username, type }` | Human stopped typing. |
| `receive_message` | Server → Client | `{ _id, roomId, senderName, senderType, modelId?, content, createdAt }` | New message (human or AI). |
| `ai_thinking` | Server → Client | `{ models: string[], status: "thinking" \| "idle" }` | AI pipeline is processing. |
| `user_joined` | Server → Client | `{ userId, username }` | A member joined the socket room. |
| `user_left` | Server → Client | `{ userId, username }` | A member left the socket room. |
- Human message is persisted to MongoDB (Daily Bucket pattern).
- Human message is broadcast to the room via `receive_message`.
- `ai_thinking` is emitted with `status: "thinking"`.
- The LangGraph pipeline is invoked with the room's context.
- `ai_thinking` is emitted with `status: "idle"`.
- For each AI response:
  - `typing_start` emitted with `type: "ai"`.
  - Realistic typing delay: `min(1500ms, 400ms + content.length × 3ms)`.
  - `typing_stop` emitted.
  - AI message persisted to MongoDB.
  - `receive_message` emitted with the AI's response.
  - 300ms pause between consecutive AI responses for natural pacing.
- If the summarizer ran, `room.memory` is updated and processed messages are flagged.
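The typing-delay formula in the flow above is a one-liner: a 400 ms base plus 3 ms per character, capped at 1.5 s so long replies don't stall the room.

```typescript
// min(1500ms, 400ms + content.length × 3ms), as described in the message flow.
function typingDelayMs(content: string): number {
  return Math.min(1500, 400 + content.length * 3);
}
```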
```
kol/
├── client/ # Next.js 16 (App Router)
│   ├── app/
│   │   ├── page.tsx # Public landing page (GSAP animations)
│   │   ├── layout.tsx # Root layout with global fonts
│   │   ├── globals.css # Tailwind base + custom CSS
│   │   ├── login/page.tsx # Login with httpOnly cookie auth
│   │   ├── signup/page.tsx # Signup with username validation
│   │   ├── invite/[code]/page.tsx # Invite landing & auto-join flow
│   │   ├── how-it-works/page.tsx # Technical walkthrough (GSAP)
│   │   ├── about/page.tsx # Mission & team
│   │   ├── privacy-policy/page.tsx
│   │   ├── terms/page.tsx
│   │   └── me/ # Authenticated dashboard
│   │       ├── page.tsx # Room list & creation
│   │       ├── friends/page.tsx # Friends list & search
│   │       ├── settings/page.tsx # User preferences
│   │       └── room/[id]/page.tsx # Core real-time chat experience
│   │
│   ├── components/
│   │   ├── Navbar.tsx # Top navigation bar
│   │   ├── Sidebar.tsx # Persistent global sidebar
│   │   ├── RoomCard.tsx # Room preview card
│   │   ├── CreateRoomModal.tsx # Room creation form (multi-step)
│   │   ├── RoomSettingsModal.tsx # Owner governance (invite, remove, delete)
│   │   ├── NotificationModal.tsx # Global notification system
│   │   └── Footer.tsx
│   │
│   ├── hooks/
│   │   ├── useSocket.ts # Socket.io connection & event management
│   │   └── useAuth.ts # Auth state & cookie management
│   │
│   ├── data/
│   │   └── AI_MODELS.ts # Model metadata: id, name, color, icon
│   │
│   └── proxy.ts # Next.js middleware: auth route protection
│
├── server/ # Express 5 + Bun Runtime
│   ├── server.ts # HTTP server + Socket.io bootstrap
│   ├── socket.ts # Socket event handlers & AI pipeline trigger
│   └── src/
│       ├── app.ts # Express setup & global middleware (CORS, cookies)
│       ├── agents/ # 🧠 LangGraph AI Orchestration
│       │   ├── index.ts # Compiled state graph (Gate → Models → Summarizer)
│       │   ├── nodes/
│       │   │   ├── gate.ts # Moderator — routing decision logic
│       │   │   ├── models.ts # Board — sequential multi-model execution
│       │   │   └── summarizer.ts # Memory — rolling conversation compression
│       │   └── tools/
│       │       ├── searchTool.ts # Tavily web search integration
│       │       └── urlTool.ts # Jina AI URL reader integration
│       ├── controllers/
│       │   ├── user.controller.ts # Auth (register, login, me)
│       │   ├── room.controller.ts # Room CRUD, invites, member governance
│       │   └── friend.controller.ts # Friends list, search, add
│       ├── middlewares/
│       │   └── auth.middleware.ts # JWT verification for REST routes
│       ├── models/
│       │   ├── user.model.ts # User (name, email, username, friends, online)
│       │   ├── room.model.ts # Room (owner, members, aiMembers, memory, messageCount)
│       │   ├── message.model.ts # DailyChat bucket (roomId, date, messages[])
│       │   └── invite.model.ts # Invite (code, roomId, 7-day TTL)
│       ├── routes/
│       │   ├── user.route.ts # POST /auth/register, /auth/login, GET /auth/me
│       │   ├── room.route.ts # CRUD + invite + governance endpoints
│       │   └── friend.route.ts # GET /friends/list, /friends/search, POST /friends/add
│       └── lib/
│           └── db.ts # MongoDB connection utility
│
└── README.md
```
- Auth: Premium dark-theme login/signup with username format validation (`/^[a-z0-9_]{3,20}$/`), 401 redirect guard, and httpOnly cookie sessions.
- Middleware: Next.js `proxy.ts` protects all `/me/*` routes and redirects authenticated users away from auth pages.
- Dashboard: Multi-page layout with persistent global sidebar, room list, and `CreateRoomModal`.
- Chat Room (`/me/room/[id]`): Real-time chat with infinite scroll, AI roster badges, "thinking" indicator, per-model typing animations, and message persistence.
- Governance (`RoomSettingsModal`): Owner-only panel for adding AI advisors, generating invite links, removing human members, and deleting the room.
- Social (`/me/friends`): Search users by name or username, add friends (reciprocal), view online status.
- Invite Flow (`/invite/[code]`): Smart landing page — unauthenticated users are redirected to signup with the code preserved in state; authenticated users are auto-joined to the room.
- Authentication: JWT signed tokens stored as httpOnly cookies (7-day expiry). Passwords hashed with bcrypt (salt rounds: 10). Generic "Invalid Email or Password" error prevents user enumeration.
- Database: MongoDB with four Mongoose schemas. Messages use a Daily Bucket pattern (`DailyChat` documents keyed by `roomId + date`) for efficient pagination and aggregation.
- Real-time: Socket.io server with JWT handshake auth (reads from `auth.token` or the `cookie` header), room membership verification on `join_room`, and realistic AI typing delays.
- AI Pipeline: Full LangGraph state machine with conditional edges, automatic summarization at >20 messages, tool-equipped model nodes, and response sanitization.
- Governance: Owner-only invite code generation (cryptographically random 12-char hex), invite join with 7-day TTL, member removal, room deletion, and dynamic AI board modification.
- Social: Reciprocal friend addition (one call adds both users to each other's friend list).
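The Daily Bucket pattern mentioned above can be sketched as a compound key: one `DailyChat` document per room per calendar day, so pagination reads whole day-buckets instead of scanning individual message documents. The key format below is an assumption for illustration.

```typescript
// Hypothetical bucket key for the Daily Bucket pattern: roomId + date.
// All messages sent in a room on the same UTC day land in one document.
function bucketKey(roomId: string, at: Date): { roomId: string; date: string } {
  return { roomId, date: at.toISOString().slice(0, 10) }; // "YYYY-MM-DD"
}
```

An insert would then upsert on this key and push into the document's `messages[]` array, which is what makes day-by-day pagination a single document read.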
| Phase | Status | Milestones |
|---|---|---|
| Phase 1 – Foundation | ✅ Completed | Next.js + Express scaffold, auth UI & API, MongoDB wiring |
| Phase 2 – The Brain | ✅ Completed | LangGraph state graph, Gate moderator, 6 models across 3 providers |
| Phase 3 – Real-time Chat | ✅ Completed | Socket.io integration, typing & AI-thinking indicators, daily-bucket persistence |
| Phase 4 – Social & Governance | ✅ Completed | Invite system, friend network, room settings, member removal |
| Phase 5 – Tool Layer | ✅ Completed | Tavily web search, Jina AI URL reader, agentic tool loop |
| Phase 6 – Scale | 🟣 Planned | Credit/usage tracking, mobile client (React Native), room memory viewer, @mention detection UI |
- Bun (recommended) or Node.js ≥ 18
- MongoDB — local instance or MongoDB Atlas cluster
- API keys for Groq, LongCat, Gemini, and Tavily (see env section below)
```env
# Server
PORT=8080
MONGODB_URI=mongodb://localhost:27017/kol
JWT_SECRET=your_super_secret_string_min_32_chars
FRONTEND_URL=http://localhost:3000

# LLM Providers
GROQ_API_KEY=your_groq_api_key       # Gate + GPT + Llama + Kimi + Qwen
LONGCAT_API_KEY=your_longcat_api_key # LongCat model
GEMINI_API_KEY=your_gemini_api_key   # Gemini 2.5 Flash

# Agentic Tools
TAVILY_API_KEY=your_tavily_api_key   # Web search
```

```env
# Client
NEXT_PUBLIC_BACKEND_URL=http://localhost:8080
NEXT_PUBLIC_FRONTEND_URL=http://localhost:3000
```

```bash
# Clone the repository
git clone <repo-url>
cd kol

# Install server dependencies
cd server && bun install

# Install client dependencies
cd ../client && bun install
```

Open two separate terminals:

```bash
# Terminal 1 — Backend (Express + Socket.io)
cd server && bun run dev

# Terminal 2 — Frontend (Next.js)
cd client && bun run dev
```

- Frontend will be available at http://localhost:3000
- Backend API will be available at http://localhost:8080
| Key | Service | Link |
|---|---|---|
| `GROQ_API_KEY` | Groq Cloud (free tier available) | console.groq.com |
| `GEMINI_API_KEY` | Google AI Studio (free tier available) | aistudio.google.com |
| `LONGCAT_API_KEY` | LongCat AI | longcat.chat |
| `TAVILY_API_KEY` | Tavily Search (free tier available) | tavily.com |
- Passwords are hashed with bcrypt (10 salt rounds) before storage.
- JWT tokens are issued with a 7-day expiry and stored in httpOnly cookies, making them inaccessible to JavaScript.
- CORS is configured with `credentials: true` and a strict `origin` whitelist (`FRONTEND_URL`).
- Login errors return a generic message ("Invalid Email or Password") regardless of whether the email or password was wrong, preventing user enumeration attacks.
- Socket.io connections require a valid JWT before any event is processed.
- All protected REST routes use `authMiddleware` to verify the JWT from cookies.
Kōl — Where intelligence gathers.
Built with obsession. Designed for conversation.