A high-performance, asynchronous Python backend for real-time AI conversations with function calling, streaming responses, and persistent storage.
- Real-time WebSocket Communication: Bi-directional streaming with low latency
- Modular Tool System: Extensible plugin architecture with auto-discovery (easily scale to 100+ tools)
- LLM Function Calling: AI can execute tools dynamically (weather, database search, email, etc.)
- Token-by-Token Streaming: Response tokens stream immediately to client
- Persistent Storage: All events logged to Supabase PostgreSQL in real-time
- Post-Session Automation: AI-generated summaries after conversation ends
- Dual LLM Support: Works with local Ollama or cloud Groq models
- Architecture
- Prerequisites
- Installation
- Database Setup
- Configuration
- Running the Application
- Testing
- API Documentation
- Design Decisions
- Project Structure
```
┌─────────────┐      WebSocket       ┌──────────────────┐
│   Client    │◄────────────────────►│  FastAPI Server  │
│  (Browser)  │    (bidirectional)   │  (async/await)   │
└─────────────┘                      └────────┬─────────┘
                                              │
                                    ┌─────────┴─────────┐
                                    │                   │
                              ┌─────▼──────┐      ┌─────▼─────┐
                              │  LLM API   │      │ Supabase  │
                              │ (streaming)│      │ (Postgres)│
                              └────────────┘      └───────────┘
```
Key Components:
- FastAPI: Async web framework with WebSocket support
- Supabase: PostgreSQL database with real-time capabilities
- Groq/Ollama: LLM providers with streaming and function calling
- Session Manager: In-memory state management for active sessions
- Python 3.11+
- Supabase account (https://supabase.com)
- Groq API key (https://console.groq.com) - Free tier available
- OR Ollama installed locally (https://ollama.com)
```
git clone https://github.com/allwin107/Realtime-AI-Backend.git
cd Realtime-AI-Backend
```
Create and activate a virtual environment:
```
python -m venv venv

# Windows
venv\Scripts\activate

# Linux/Mac
source venv/bin/activate
```
Install dependencies:
```
pip install -r requirements.txt
```
- Go to https://supabase.com
- Create new project
- Wait for database provisioning (~2 minutes)
Go to SQL Editor in Supabase dashboard and execute schema.sql:
```sql
-- Enable UUID extension
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Create sessions table
CREATE TABLE sessions (
    session_id VARCHAR(255) PRIMARY KEY,
    user_id VARCHAR(255) NOT NULL,
    start_time TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    end_time TIMESTAMP WITH TIME ZONE,
    duration_seconds INTEGER,
    summary TEXT,
    status VARCHAR(50) DEFAULT 'active',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create events table
CREATE TABLE events (
    event_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    session_id VARCHAR(255) NOT NULL REFERENCES sessions(session_id) ON DELETE CASCADE,
    timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    event_type VARCHAR(50) NOT NULL,
    content TEXT,
    metadata JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create indexes
CREATE INDEX idx_sessions_user_id ON sessions(user_id);
CREATE INDEX idx_sessions_start_time ON sessions(start_time DESC);
CREATE INDEX idx_events_session_id ON events(session_id);
CREATE INDEX idx_events_timestamp ON events(timestamp);
CREATE INDEX idx_events_type ON events(event_type);

-- Enable RLS
ALTER TABLE sessions ENABLE ROW LEVEL SECURITY;
ALTER TABLE events ENABLE ROW LEVEL SECURITY;

-- Create RLS policies
CREATE POLICY "Allow service role to insert sessions"
    ON sessions FOR INSERT TO anon, authenticated WITH CHECK (true);
CREATE POLICY "Allow service role to select sessions"
    ON sessions FOR SELECT TO anon, authenticated USING (true);
CREATE POLICY "Allow service role to update sessions"
    ON sessions FOR UPDATE TO anon, authenticated USING (true) WITH CHECK (true);
CREATE POLICY "Allow service role to delete sessions"
    ON sessions FOR DELETE TO anon, authenticated USING (true);
CREATE POLICY "Allow service role to insert events"
    ON events FOR INSERT TO anon, authenticated WITH CHECK (true);
CREATE POLICY "Allow service role to select events"
    ON events FOR SELECT TO anon, authenticated USING (true);
CREATE POLICY "Allow service role to update events"
    ON events FOR UPDATE TO anon, authenticated USING (true) WITH CHECK (true);
CREATE POLICY "Allow service role to delete events"
    ON events FOR DELETE TO anon, authenticated USING (true);
```
In Supabase Dashboard:
- Go to Project Settings → API
- Copy Project URL (e.g., https://xxxxx.supabase.co)
- Copy anon public key
Copy `.env.example` to `.env`:
```
cp .env.example .env
```
Edit `.env`:
```
# Supabase Configuration
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-anon-key-here

# LLM Configuration
LLM_PROVIDER=groq  # Options: "ollama" or "groq"

# Groq Settings (if using Groq)
GROQ_API_KEY=your-groq-api-key
GROQ_MODEL=llama-3.3-70b-versatile

# Ollama Settings (if using Ollama)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b

# Application Settings
HOST=0.0.0.0
PORT=8000
DEBUG=True
ENVIRONMENT=development
```
Start the server:
```
python -m app.main
```
Expected output:
```
Configuration loaded successfully!
LLM Provider: groq
Environment: development
Port: 8000
Database client initialized
LLM Service initialized: Groq (llama-3.3-70b-versatile)
Session Manager initialized
==================================================
REALTIME AI BACKEND STARTING
==================================================
Host: 0.0.0.0
Port: 8000
LLM: groq
Environment: development
==================================================
INFO:     Uvicorn running on http://0.0.0.0:8000
```
Step 1: Start the Server (if not already running)
```
python -m app.main
```
Step 2: Open the Frontend
- Navigate to your project directory
- Double-click `frontend.html`, OR right-click `frontend.html` → Open with → Your browser (Chrome, Firefox, Edge)
- The file will open at: `file:///path/to/realtime-ai-backend/frontend.html`
Step 3: Connect and Test
- In the frontend UI, click Connect button (uses default session ID)
- Send test messages:
- "What's the weather in Tokyo?"
- "What's the weather in London?"
- "Search our database for Python tutorials"
- Watch the AI respond in real-time with tool execution!
Note: Tool calls are intentionally displayed in the frontend to demonstrate the complex interaction requirement. The green boxes show when the AI calls functions, their arguments, and results.
To disable tool call display: If you prefer a cleaner UI without tool call boxes, comment out lines 393-397 in frontend.html:
```javascript
// case 'tool_call':
//     typingIndicator.classList.remove('active');
//     addToolCallMessage(data);
//     typingIndicator.classList.add('active');
//     break;
```
This will hide the tool execution details while still allowing the AI to use tools behind the scenes.
- Create new WebSocket Request
- Connect to: `ws://localhost:8000/ws/session/test-001?user_id=testuser`
- Send text messages
- Observe streaming tokens and tool calls
```python
import asyncio
import websockets
import json

async def test_websocket():
    uri = "ws://localhost:8000/ws/session/test-001?user_id=testuser"
    async with websockets.connect(uri) as websocket:
        # Send message
        await websocket.send("What's the weather in London?")

        # Receive responses
        async for message in websocket:
            data = json.loads(message)
            print(f"Received: {data}")

asyncio.run(test_websocket())
```
After testing:
- Go to Table Editor → sessions
- Check for your session with status "completed"
- View the AI-generated summary
- Go to events table to see chronological log
URL: `ws://localhost:8000/ws/session/{session_id}?user_id={user_id}`
Parameters:
- `session_id` (path): Unique session identifier
- `user_id` (query): User identifier
Message Format:
Client → Server:
Plain text message
Server → Client:
```
// System message
{"type": "system", "content": "Connected to session: xyz"}

// Streaming start
{"type": "start", "content": ""}

// Token streaming
{"type": "token", "content": "Hello"}

// Tool call
{
  "type": "tool_call",
  "tool_name": "get_weather",
  "arguments": {"location": "Tokyo"},
  "result": {"temperature": 22, "condition": "Sunny"}
}

// Streaming end
{"type": "end", "content": ""}

// Error
{"type": "error", "content": "Error message"}
```
Health Check:
GET /
Session Info:
GET /sessions/{session_id}
Detailed Health:
GET /health
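A client can dispatch on the `type` field of the server→client messages documented above. The following helper is illustrative (its name and return format are not part of this repo's API):

```python
import json

def handle_server_message(raw: str) -> str:
    """Dispatch a server->client WebSocket message by its 'type' field.
    Illustrative helper; returns a human-readable line per message."""
    data = json.loads(raw)
    kind = data.get("type")
    if kind == "token":
        return data["content"]  # append to the in-progress response
    if kind == "tool_call":
        return f"[tool: {data['tool_name']}({data['arguments']})]"
    if kind in ("system", "error", "start", "end"):
        return f"[{kind}] {data.get('content', '')}"
    return f"[unknown] {raw}"
```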
Why both Memory and Database?
- Memory (SessionManager): Fast access for active conversations, builds context for LLM
- Database (Supabase): Persistent storage, survives crashes, enables historical analysis
This pattern balances performance (memory) with reliability (database).
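The dual-write pattern can be sketched as follows. The `db` object here is a stand-in for the Supabase client, and the class is a simplification of the real `session_manager.py`:

```python
from datetime import datetime, timezone

class SessionManager:
    """Sketch of the memory + database pattern: in-memory context for the
    LLM plus an append-only persistent log (simplified illustration)."""

    def __init__(self, db):
        self.db = db        # persistent store (Supabase in the real app)
        self.sessions = {}  # session_id -> list of events (in memory)

    def add_message(self, session_id: str, role: str, content: str):
        event = {
            "session_id": session_id,
            "event_type": f"{role}_message",
            "content": content,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        self.sessions.setdefault(session_id, []).append(event)  # fast path
        self.db.insert("events", event)                         # durable path

    def context_for_llm(self, session_id: str):
        return self.sessions.get(session_id, [])

class FakeDB:
    """Stand-in for the database client, for demonstration only."""
    def __init__(self):
        self.rows = []
    def insert(self, table, row):
        self.rows.append((table, row))
```

Writing to both stores on every message is what gives the crash resilience described below: memory can vanish, but the event log survives.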
Why save events during the session, not just at the end?
- Crash resilience: Partial conversation history preserved
- Live monitoring: Can view ongoing conversations
- Audit trail: Exact chronological record of all events
Why stream instead of waiting for complete response?
- Lower perceived latency (~500ms to first token vs ~5s for complete)
- Better UX: Users see progress immediately
- Matches modern AI chat interfaces (ChatGPT, Claude)
Note on Groq Performance: The system implements true token-by-token streaming via WebSocket. However, Groq's inference is extremely fast (500+ tokens/second), which may make the streaming appear nearly instantaneous in the UI. The streaming architecture is fully functional - you can verify by checking browser console network logs or using slower LLM providers like Ollama. For demonstration purposes, add a small delay (await asyncio.sleep(0.02)) after line 127 in app/main.py to make streaming visually apparent.
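The effect of such a pacing delay can be seen in a standalone async generator (illustrative only; this is not the actual code in `app/main.py`):

```python
import asyncio

async def stream_tokens(tokens, delay: float = 0.02):
    """Yield tokens one at a time, pacing output so that streaming stays
    visible even with very fast providers (illustrative sketch)."""
    for token in tokens:
        await asyncio.sleep(delay)  # small pause between tokens
        yield token

async def collect():
    # Consume the stream exactly as a WebSocket sender loop would
    out = []
    async for tok in stream_tokens(["Hel", "lo", ", ", "world"], delay=0.001):
        out.append(tok)
    return "".join(out)

result = asyncio.run(collect())
```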
Why use the LLM's native tool calling vs. custom parsing?
- Reliability: LLM decides when tools are needed
- Flexibility: Easy to add new tools without prompt engineering
- Standards-based: Uses OpenAI-compatible format
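In the OpenAI-compatible format, a tool is declared as a JSON-schema function definition. The shape below follows that standard; the exact schema shipped in this repo's `get_weather` tool may differ in wording:

```python
# OpenAI-compatible tool definition (assumed shape; the repo's actual
# get_weather schema may differ in details such as descriptions).
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. Tokyo",
                },
            },
            "required": ["location"],
        },
    },
}
```

The LLM receives a list of such definitions with each request and emits a structured tool call when it decides one is needed.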
For this demo, why Groq?
- Memory constraints: Local LLMs require 4-8GB RAM
- Speed: Groq delivers 500+ tokens/sec (vs 20-50 for local)
- Free tier: Sufficient for development and demos
- Tool support: All models support function calling
Why plugin-based tools instead of hardcoded definitions?
- Scalability: Easily scale from 2 to 100+ tools without modifying core code
- Maintainability: Each tool isolated in its own file for independent testing
- Auto-discovery: Tools automatically loaded from the `app/tools/` directory
- Team collaboration: Multiple developers can work on different tools without conflicts
- Flexibility: Enable/disable tools by renaming files (no code changes)
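A simplified version of the registry idea looks like this. The real implementation discovers tools by scanning `app/tools/`; here a decorator stands in for that mechanism, and the weather result is stubbed:

```python
# Simplified tool-registry sketch. The real app discovers tools from the
# filesystem; a decorator stands in for that here.
TOOL_REGISTRY = {}

def register_tool(func):
    """Register a callable as a tool under its function name."""
    TOOL_REGISTRY[func.__name__] = func
    return func

@register_tool
def get_weather(location: str) -> dict:
    # Stubbed result; the real tool would call a weather API
    return {"location": location, "temperature": 22, "condition": "Sunny"}

def execute_tool(name: str, arguments: dict):
    """Look up a registered tool by name and invoke it with the
    arguments supplied by the LLM's tool call."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"Unknown tool: {name}")
    return TOOL_REGISTRY[name](**arguments)
```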
Why enable RLS policies?
- Security best practice: Defense in depth
- Access control: Can add user-specific policies later
- Compliance ready: Meets data protection requirements
```
realtime-ai-backend/
│
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI app & WebSocket endpoint
│   ├── config.py            # Configuration management
│   ├── database.py          # Supabase client & operations
│   ├── llm_service.py       # LLM integration (Groq/Ollama)
│   ├── session_manager.py   # In-memory session state
│   ├── models.py            # Pydantic models (optional)
│   └── tools/               # Modular tool system (auto-discovery)
│       ├── __init__.py      # Tool registry
│       ├── base.py          # Abstract base class for tools
│       ├── README.md        # Developer guide for adding tools
│       ├── get_weather.py   # Weather lookup tool
│       ├── search_database.py  # Database search tool
│       └── send_email.py    # Email sending tool (example)
│
├── frontend.html            # Simple web UI
├── requirements.txt         # Python dependencies
├── .env                     # Environment variables (not committed)
├── .env.example             # Environment template
└── README.md                # This file
```
- FastAPI: Modern async web framework
- Uvicorn: ASGI server with WebSocket support
- Supabase: PostgreSQL with real-time capabilities
- Groq: Ultra-fast LLM inference API
- Pydantic: Data validation and settings management
Stores high-level session metadata:
| Column | Type | Description |
|---|---|---|
| session_id | VARCHAR(255) | Primary key, unique session identifier |
| user_id | VARCHAR(255) | User identifier |
| start_time | TIMESTAMP | Session start time (auto) |
| end_time | TIMESTAMP | Session end time (set on close) |
| duration_seconds | INTEGER | Total session duration |
| summary | TEXT | AI-generated conversation summary |
| status | VARCHAR(50) | 'active' or 'completed' |
Stores granular event log:
| Column | Type | Description |
|---|---|---|
| event_id | UUID | Primary key (auto-generated) |
| session_id | VARCHAR(255) | Foreign key to sessions |
| timestamp | TIMESTAMP | Event timestamp (auto) |
| event_type | VARCHAR(50) | Event category |
| content | TEXT | Event content/message |
| metadata | JSONB | Additional structured data |
Event Types:
- `user_message`: User input
- `ai_response`: AI's complete response
- `tool_call`: Function/tool execution
- `system_event`: Connection, disconnection, errors
"ModuleNotFoundError: No module named 'app'"
- Use `python -m app.main` instead of `python app/main.py`
"Row-level security policy violation"
- Ensure RLS policies are created in Supabase
- Check SQL editor for errors during schema setup
"Model requires more memory" (if using Ollama)
- Switch to a smaller model: `ollama pull llama3.2:3b`
- Or use Groq: set `LLM_PROVIDER=groq` in `.env`
"WebSocket connection failed"
- Verify server is running on correct port
- Check firewall settings
- Try `localhost` instead of `0.0.0.0`
"Groq rate limit exceeded"
- Free tier: 30 requests/minute
- Wait 60 seconds or upgrade plan
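A simple client-side guard against the limit is retry with exponential backoff. This is a generic sketch, not tied to Groq's SDK; `RuntimeError` stands in for whatever rate-limit exception the provider raises:

```python
import time

def with_backoff(func, retries: int = 3, base_delay: float = 1.0):
    """Call func, retrying with exponential backoff on RuntimeError
    (a stand-in here; adapt to the SDK's rate-limit exception type)."""
    for attempt in range(retries):
        try:
            return func()
        except RuntimeError:
            if attempt == retries - 1:
                raise  # out of retries, surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```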
Potential improvements for future versions:
- Authentication: User login and session management
- Rate Limiting: Request throttling
- More Tools: Email, calendar, web search integration
- Multi-language: Support for different languages