A high-performance, asynchronous Python backend for real-time AI conversations with function calling, streaming responses, and persistent storage.
- Real-time WebSocket Communication: Bi-directional streaming with low latency
- Modular Tool System: Extensible plugin architecture with auto-discovery (easily scale to 100+ tools)
- LLM Function Calling: AI can execute tools dynamically (weather, database search, email, etc.)
- Token-by-Token Streaming: Response tokens stream immediately to client
- Persistent Storage: All events logged to Supabase PostgreSQL in real-time
- Post-Session Automation: AI-generated summaries after conversation ends
- Dual LLM Support: Works with local Ollama or cloud Groq models
- Architecture
- Prerequisites
- Installation
- Database Setup
- Configuration
- Running the Application
- Testing
- API Documentation
- Design Decisions
- Project Structure
```
┌─────────────┐      WebSocket       ┌──────────────────┐
│   Client    │◄────────────────────►│  FastAPI Server  │
│  (Browser)  │    (bidirectional)   │  (async/await)   │
└─────────────┘                      └────────┬─────────┘
                                              │
                                    ┌─────────┴─────────┐
                                    │                   │
                              ┌─────▼──────┐      ┌─────▼─────┐
                              │  LLM API   │      │ Supabase  │
                              │ (streaming)│      │ (Postgres)│
                              └────────────┘      └───────────┘
```
Key Components:
- FastAPI: Async web framework with WebSocket support
- Supabase: PostgreSQL database with real-time capabilities
- Groq/Ollama: LLM providers with streaming and function calling
- Session Manager: In-memory state management for active sessions
- Python 3.11+
- Supabase account (https://supabase.com)
- Groq API key (https://console.groq.com) - Free tier available
- OR Ollama installed locally (https://ollama.com)
```
git clone https://github.com/allwin107/Realtime-AI-Backend.git
cd Realtime-AI-Backend
```
Create and activate a virtual environment:
```
python -m venv venv

# Windows
venv\Scripts\activate

# Linux/Mac
source venv/bin/activate
```
Install dependencies:
```
pip install -r requirements.txt
```
- Go to https://supabase.com
- Create new project
- Wait for database provisioning (~2 minutes)
Go to SQL Editor in Supabase dashboard and execute schema.sql:
```sql
-- Enable UUID extension
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Create sessions table
CREATE TABLE sessions (
    session_id VARCHAR(255) PRIMARY KEY,
    user_id VARCHAR(255) NOT NULL,
    start_time TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    end_time TIMESTAMP WITH TIME ZONE,
    duration_seconds INTEGER,
    summary TEXT,
    status VARCHAR(50) DEFAULT 'active',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create events table
CREATE TABLE events (
    event_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    session_id VARCHAR(255) NOT NULL REFERENCES sessions(session_id) ON DELETE CASCADE,
    timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    event_type VARCHAR(50) NOT NULL,
    content TEXT,
    metadata JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create indexes
CREATE INDEX idx_sessions_user_id ON sessions(user_id);
CREATE INDEX idx_sessions_start_time ON sessions(start_time DESC);
CREATE INDEX idx_events_session_id ON events(session_id);
CREATE INDEX idx_events_timestamp ON events(timestamp);
CREATE INDEX idx_events_type ON events(event_type);

-- Enable RLS
ALTER TABLE sessions ENABLE ROW LEVEL SECURITY;
ALTER TABLE events ENABLE ROW LEVEL SECURITY;

-- Create RLS policies
CREATE POLICY "Allow service role to insert sessions"
    ON sessions FOR INSERT TO anon, authenticated WITH CHECK (true);
CREATE POLICY "Allow service role to select sessions"
    ON sessions FOR SELECT TO anon, authenticated USING (true);
CREATE POLICY "Allow service role to update sessions"
    ON sessions FOR UPDATE TO anon, authenticated USING (true) WITH CHECK (true);
CREATE POLICY "Allow service role to delete sessions"
    ON sessions FOR DELETE TO anon, authenticated USING (true);
CREATE POLICY "Allow service role to insert events"
    ON events FOR INSERT TO anon, authenticated WITH CHECK (true);
CREATE POLICY "Allow service role to select events"
    ON events FOR SELECT TO anon, authenticated USING (true);
CREATE POLICY "Allow service role to update events"
    ON events FOR UPDATE TO anon, authenticated USING (true) WITH CHECK (true);
CREATE POLICY "Allow service role to delete events"
    ON events FOR DELETE TO anon, authenticated USING (true);
```
In Supabase Dashboard:
- Go to Project Settings → API
- Copy Project URL (e.g., https://xxxxx.supabase.co)
- Copy anon public key
Copy `.env.example` to `.env`:
```
cp .env.example .env
```
Edit `.env`:
```
# Supabase Configuration
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-anon-key-here

# LLM Configuration
LLM_PROVIDER=groq  # Options: "ollama" or "groq"

# Groq Settings (if using Groq)
GROQ_API_KEY=your-groq-api-key
GROQ_MODEL=llama-3.3-70b-versatile

# Ollama Settings (if using Ollama)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b

# Application Settings
HOST=0.0.0.0
PORT=8000
DEBUG=True
ENVIRONMENT=development
```
Start the server:
```
python -m app.main
```
Expected output:
```
Configuration loaded successfully!
LLM Provider: groq
Environment: development
Port: 8000
Database client initialized
LLM Service initialized: Groq (llama-3.3-70b-versatile)
Session Manager initialized
==================================================
REALTIME AI BACKEND STARTING
==================================================
Host: 0.0.0.0
Port: 8000
LLM: groq
Environment: development
==================================================
INFO:     Uvicorn running on http://0.0.0.0:8000
```
Step 1: Start the Server (if not already running)
```
python -m app.main
```
Step 2: Open the Frontend
- Navigate to your project directory
- Double-click `frontend.html`, OR right-click `frontend.html` → Open with → Your browser (Chrome, Firefox, Edge)
- The file will open at: `file:///path/to/realtime-ai-backend/frontend.html`
Step 3: Connect and Test
- In the frontend UI, click Connect button (uses default session ID)
- Send test messages:
- "What's the weather in Tokyo?"
- "What's the weather in London?"
- "Search our database for Python tutorials"
- Watch the AI respond in real-time with tool execution!
Note: Tool calls are intentionally displayed in the frontend to demonstrate the complex interaction requirement. The green boxes show when the AI calls functions, their arguments, and results.
To disable tool call display: If you prefer a cleaner UI without tool call boxes, comment out lines 393-397 in frontend.html:
```javascript
// case 'tool_call':
//     typingIndicator.classList.remove('active');
//     addToolCallMessage(data);
//     typingIndicator.classList.add('active');
//     break;
```
This will hide the tool execution details while still allowing the AI to use tools behind the scenes.
- Create new WebSocket Request
- Connect to: `ws://localhost:8000/ws/session/test-001?user_id=testuser`
- Send text messages
- Observe streaming tokens and tool calls
```python
import asyncio
import websockets
import json

async def test_websocket():
    uri = "ws://localhost:8000/ws/session/test-001?user_id=testuser"
    async with websockets.connect(uri) as websocket:
        # Send message
        await websocket.send("What's the weather in London?")

        # Receive responses
        async for message in websocket:
            data = json.loads(message)
            print(f"Received: {data}")

asyncio.run(test_websocket())
```
After testing:
- Go to Table Editor → sessions
- Check for your session with status "completed"
- View the AI-generated summary
- Go to events table to see chronological log
URL: `ws://localhost:8000/ws/session/{session_id}?user_id={user_id}`
Parameters:
- `session_id` (path): Unique session identifier
- `user_id` (query): User identifier
Message Format:
Client → Server:
Plain text message
Server → Client:
```
// System message
{"type": "system", "content": "Connected to session: xyz"}

// Streaming start
{"type": "start", "content": ""}

// Token streaming
{"type": "token", "content": "Hello"}

// Tool call
{
  "type": "tool_call",
  "tool_name": "get_weather",
  "arguments": {"location": "Tokyo"},
  "result": {"temperature": 22, "condition": "Sunny"}
}

// Streaming end
{"type": "end", "content": ""}

// Error
{"type": "error", "content": "Error message"}
```
Health Check:
GET /
Session Info:
GET /sessions/{session_id}
Detailed Health:
GET /health
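A client can dispatch on the `type` field of the server→client messages documented above. The following helper is illustrative (its name and return format are not part of this repo's API):

```python
import json

def handle_server_message(raw: str) -> str:
    """Dispatch a server->client WebSocket message by its 'type' field.
    Illustrative helper; returns a human-readable line per message."""
    data = json.loads(raw)
    kind = data.get("type")
    if kind == "token":
        return data["content"]  # append to the in-progress response
    if kind == "tool_call":
        return f"[tool: {data['tool_name']}({data['arguments']})]"
    if kind in ("system", "error", "start", "end"):
        return f"[{kind}] {data.get('content', '')}"
    return f"[unknown] {raw}"
```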
Why both Memory and Database?
- Memory (SessionManager): Fast access for active conversations, builds context for LLM
- Database (Supabase): Persistent storage, survives crashes, enables historical analysis
This pattern balances performance (memory) with reliability (database).
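The dual-write pattern can be sketched as follows. The `db` object here is a stand-in for the Supabase client, and the class is a simplification of the real `session_manager.py`:

```python
from datetime import datetime, timezone

class SessionManager:
    """Sketch of the memory + database pattern: in-memory context for the
    LLM plus an append-only persistent log (simplified illustration)."""

    def __init__(self, db):
        self.db = db        # persistent store (Supabase in the real app)
        self.sessions = {}  # session_id -> list of events (in memory)

    def add_message(self, session_id: str, role: str, content: str):
        event = {
            "session_id": session_id,
            "event_type": f"{role}_message",
            "content": content,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        self.sessions.setdefault(session_id, []).append(event)  # fast path
        self.db.insert("events", event)                         # durable path

    def context_for_llm(self, session_id: str):
        return self.sessions.get(session_id, [])

class FakeDB:
    """Stand-in for the database client, for demonstration only."""
    def __init__(self):
        self.rows = []
    def insert(self, table, row):
        self.rows.append((table, row))
```

Writing to both stores on every message is what gives the crash resilience described below: memory can vanish, but the event log survives.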
Why save events during the session, not just at the end?
- Crash resilience: Partial conversation history preserved
- Live monitoring: Can view ongoing conversations
- Audit trail: Exact chronological record of all events
Why stream instead of waiting for complete response?
- Lower perceived latency (~500ms to first token vs ~5s for complete)
- Better UX: Users see progress immediately
- Matches modern AI chat interfaces (ChatGPT, Claude)
Note on Groq Performance: The system implements true token-by-token streaming via WebSocket. However, Groq's inference is extremely fast (500+ tokens/second), which may make the streaming appear nearly instantaneous in the UI. The streaming architecture is fully functional - you can verify by checking browser console network logs or using slower LLM providers like Ollama. For demonstration purposes, add a small delay (await asyncio.sleep(0.02)) after line 127 in app/main.py to make streaming visually apparent.
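The effect of such a pacing delay can be seen in a standalone async generator (illustrative only; this is not the actual code in `app/main.py`):

```python
import asyncio

async def stream_tokens(tokens, delay: float = 0.02):
    """Yield tokens one at a time, pacing output so that streaming stays
    visible even with very fast providers (illustrative sketch)."""
    for token in tokens:
        await asyncio.sleep(delay)  # small pause between tokens
        yield token

async def collect():
    # Consume the stream exactly as a WebSocket sender loop would
    out = []
    async for tok in stream_tokens(["Hel", "lo", ", ", "world"], delay=0.001):
        out.append(tok)
    return "".join(out)

result = asyncio.run(collect())
```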
Why use the LLM's native tool calling vs. custom parsing?
- Reliability: LLM decides when tools are needed
- Flexibility: Easy to add new tools without prompt engineering
- Standards-based: Uses OpenAI-compatible format
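In the OpenAI-compatible format, a tool is declared as a JSON-schema function definition. The shape below follows that standard; the exact schema shipped in this repo's `get_weather` tool may differ in wording:

```python
# OpenAI-compatible tool definition (assumed shape; the repo's actual
# get_weather schema may differ in details such as descriptions).
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. Tokyo",
                },
            },
            "required": ["location"],
        },
    },
}
```

The LLM receives a list of such definitions with each request and emits a structured tool call when it decides one is needed.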
For this demo, why Groq?
- Memory constraints: Local LLMs require 4-8GB RAM
- Speed: Groq delivers 500+ tokens/sec (vs 20-50 for local)
- Free tier: Sufficient for development and demos
- Tool support: All models support function calling
Why plugin-based tools instead of hardcoded definitions?
- Scalability: Easily scale from 2 to 100+ tools without modifying core code
- Maintainability: Each tool isolated in its own file for independent testing
- Auto-discovery: Tools automatically loaded from the `app/tools/` directory
- Team collaboration: Multiple developers can work on different tools without conflicts
- Flexibility: Enable/disable tools by renaming files (no code changes)
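A simplified version of the registry idea looks like this. The real implementation discovers tools by scanning `app/tools/`; here a decorator stands in for that mechanism, and the weather result is stubbed:

```python
# Simplified tool-registry sketch. The real app discovers tools from the
# filesystem; a decorator stands in for that here.
TOOL_REGISTRY = {}

def register_tool(func):
    """Register a callable as a tool under its function name."""
    TOOL_REGISTRY[func.__name__] = func
    return func

@register_tool
def get_weather(location: str) -> dict:
    # Stubbed result; the real tool would call a weather API
    return {"location": location, "temperature": 22, "condition": "Sunny"}

def execute_tool(name: str, arguments: dict):
    """Look up a registered tool by name and invoke it with the
    arguments supplied by the LLM's tool call."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"Unknown tool: {name}")
    return TOOL_REGISTRY[name](**arguments)
```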
Why enable RLS policies?
- Security best practice: Defense in depth
- Access control: Can add user-specific policies later
- Compliance ready: Meets data protection requirements
```
realtime-ai-backend/
│
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI app & WebSocket endpoint
│   ├── config.py            # Configuration management
│   ├── database.py          # Supabase client & operations
│   ├── llm_service.py       # LLM integration (Groq/Ollama)
│   ├── session_manager.py   # In-memory session state
│   ├── models.py            # Pydantic models (optional)
│   └── tools/               # Modular tool system (auto-discovery)
│       ├── __init__.py      # Tool registry
│       ├── base.py          # Abstract base class for tools
│       ├── README.md        # Developer guide for adding tools
│       ├── get_weather.py   # Weather lookup tool
│       ├── search_database.py  # Database search tool
│       └── send_email.py    # Email sending tool (example)
│
├── frontend.html            # Simple web UI
├── requirements.txt         # Python dependencies
├── .env                     # Environment variables (not committed)
├── .env.example             # Environment template
└── README.md                # This file
```
- FastAPI: Modern async web framework
- Uvicorn: ASGI server with WebSocket support
- Supabase: PostgreSQL with real-time capabilities
- Groq: Ultra-fast LLM inference API
- Pydantic: Data validation and settings management
Stores high-level session metadata:
| Column | Type | Description |
|---|---|---|
| session_id | VARCHAR(255) | Primary key, unique session identifier |
| user_id | VARCHAR(255) | User identifier |
| start_time | TIMESTAMP | Session start time (auto) |
| end_time | TIMESTAMP | Session end time (set on close) |
| duration_seconds | INTEGER | Total session duration |
| summary | TEXT | AI-generated conversation summary |
| status | VARCHAR(50) | 'active' or 'completed' |
Stores granular event log:
| Column | Type | Description |
|---|---|---|
| event_id | UUID | Primary key (auto-generated) |
| session_id | VARCHAR(255) | Foreign key to sessions |
| timestamp | TIMESTAMP | Event timestamp (auto) |
| event_type | VARCHAR(50) | Event category |
| content | TEXT | Event content/message |
| metadata | JSONB | Additional structured data |
Event Types:
- `user_message`: User input
- `ai_response`: AI's complete response
- `tool_call`: Function/tool execution
- `system_event`: Connection, disconnection, errors
"ModuleNotFoundError: No module named 'app'"
- Use `python -m app.main` instead of `python app/main.py`
"Row-level security policy violation"
- Ensure RLS policies are created in Supabase
- Check SQL editor for errors during schema setup
"Model requires more memory" (if using Ollama)
- Switch to a smaller model: `ollama pull llama3.2:3b`
- Or use Groq: set `LLM_PROVIDER=groq` in `.env`
"WebSocket connection failed"
- Verify server is running on correct port
- Check firewall settings
- Try `localhost` instead of `0.0.0.0`
"Groq rate limit exceeded"
- Free tier: 30 requests/minute
- Wait 60 seconds or upgrade plan
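A simple client-side guard against the limit is retry with exponential backoff. This is a generic sketch, not tied to Groq's SDK; `RuntimeError` stands in for whatever rate-limit exception the provider raises:

```python
import time

def with_backoff(func, retries: int = 3, base_delay: float = 1.0):
    """Call func, retrying with exponential backoff on RuntimeError
    (a stand-in here; adapt to the SDK's rate-limit exception type)."""
    for attempt in range(retries):
        try:
            return func()
        except RuntimeError:
            if attempt == retries - 1:
                raise  # out of retries, surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```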
Potential improvements for future versions:
- Authentication: User login and session management
- Rate Limiting: Request throttling
- More Tools: Email, calendar, web search integration
- Multi-language: Support for different languages