allwin107/Realtime-AI-Backend

Realtime AI Backend - WebSocket + Supabase

A high-performance, asynchronous Python backend for real-time AI conversations with function calling, streaming responses, and persistent storage.

Features

  • Real-time WebSocket Communication: Bi-directional streaming with low latency
  • Modular Tool System: Extensible plugin architecture with auto-discovery (easily scale to 100+ tools)
  • LLM Function Calling: AI can execute tools dynamically (weather, database search, email, etc.)
  • Token-by-Token Streaming: Response tokens stream immediately to client
  • Persistent Storage: All events logged to Supabase PostgreSQL in real-time
  • Post-Session Automation: AI-generated summaries after conversation ends
  • Dual LLM Support: Works with local Ollama or cloud Groq models

Architecture

┌─────────────┐      WebSocket       ┌──────────────────┐
│   Client    │◄────────────────────►│  FastAPI Server  │
│  (Browser)  │    (bidirectional)   │   (async/await)  │
└─────────────┘                      └────────┬─────────┘
                                              │
                                    ┌─────────┴─────────┐
                                    │                   │
                              ┌─────▼──────┐     ┌─────▼─────┐
                              │  LLM API   │     │  Supabase │
                              │ (streaming)│     │ (Postgres)│
                              └────────────┘     └───────────┘

Key Components:

  • FastAPI: Async web framework with WebSocket support
  • Supabase: PostgreSQL database with real-time capabilities
  • Groq/Ollama: LLM providers with streaming and function calling
  • Session Manager: In-memory state management for active sessions

Prerequisites

  • Python 3.10+ and pip
  • Git
  • A Supabase account (free tier works)
  • A Groq API key (free tier) or a local Ollama installation

Installation

1. Clone Repository

git clone https://github.com/allwin107/Realtime-AI-Backend.git
cd Realtime-AI-Backend

2. Create Virtual Environment

python -m venv venv

# Windows
venv\Scripts\activate

# Linux/Mac
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

Database Setup

1. Create Supabase Project

  1. Go to https://supabase.com
  2. Create new project
  3. Wait for database provisioning (~2 minutes)

2. Run Schema Setup

Go to SQL Editor in Supabase dashboard and execute schema.sql:

-- Enable UUID extension
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Create sessions table
CREATE TABLE sessions (
    session_id VARCHAR(255) PRIMARY KEY,
    user_id VARCHAR(255) NOT NULL,
    start_time TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    end_time TIMESTAMP WITH TIME ZONE,
    duration_seconds INTEGER,
    summary TEXT,
    status VARCHAR(50) DEFAULT 'active',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create events table
CREATE TABLE events (
    event_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    session_id VARCHAR(255) NOT NULL REFERENCES sessions(session_id) ON DELETE CASCADE,
    timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    event_type VARCHAR(50) NOT NULL,
    content TEXT,
    metadata JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create indexes
CREATE INDEX idx_sessions_user_id ON sessions(user_id);
CREATE INDEX idx_sessions_start_time ON sessions(start_time DESC);
CREATE INDEX idx_events_session_id ON events(session_id);
CREATE INDEX idx_events_timestamp ON events(timestamp);
CREATE INDEX idx_events_type ON events(event_type);

-- Enable RLS
ALTER TABLE sessions ENABLE ROW LEVEL SECURITY;
ALTER TABLE events ENABLE ROW LEVEL SECURITY;

-- Create RLS policies
CREATE POLICY "Allow service role to insert sessions"
ON sessions FOR INSERT TO anon, authenticated WITH CHECK (true);

CREATE POLICY "Allow service role to select sessions"
ON sessions FOR SELECT TO anon, authenticated USING (true);

CREATE POLICY "Allow service role to update sessions"
ON sessions FOR UPDATE TO anon, authenticated USING (true) WITH CHECK (true);

CREATE POLICY "Allow service role to delete sessions"
ON sessions FOR DELETE TO anon, authenticated USING (true);

CREATE POLICY "Allow service role to insert events"
ON events FOR INSERT TO anon, authenticated WITH CHECK (true);

CREATE POLICY "Allow service role to select events"
ON events FOR SELECT TO anon, authenticated USING (true);

CREATE POLICY "Allow service role to update events"
ON events FOR UPDATE TO anon, authenticated USING (true) WITH CHECK (true);

CREATE POLICY "Allow service role to delete events"
ON events FOR DELETE TO anon, authenticated USING (true);

3. Get Credentials

In Supabase Dashboard:

  • Go to Project Settings → API
  • Copy Project URL (e.g., https://xxxxx.supabase.co)
  • Copy anon public key

Configuration

1. Create .env File

Copy .env.example to .env:

cp .env.example .env

2. Configure Environment Variables

Edit .env:

# Supabase Configuration
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-anon-key-here

# LLM Configuration
LLM_PROVIDER=groq  # Options: "ollama" or "groq"

# Groq Settings (if using Groq)
GROQ_API_KEY=your-groq-api-key
GROQ_MODEL=llama-3.3-70b-versatile

# Ollama Settings (if using Ollama)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b

# Application Settings
HOST=0.0.0.0
PORT=8000
DEBUG=True
ENVIRONMENT=development
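
A minimal loader for these variables, sketched with the standard library (the actual app/config.py may use Pydantic settings instead; names and defaults mirror the template above):

```python
import os
from dataclasses import dataclass

@dataclass
class Config:
    supabase_url: str
    supabase_key: str
    llm_provider: str
    host: str
    port: int

def load_config() -> Config:
    """Read the environment variables documented above, with the same defaults."""
    return Config(
        supabase_url=os.getenv("SUPABASE_URL", ""),
        supabase_key=os.getenv("SUPABASE_KEY", ""),
        llm_provider=os.getenv("LLM_PROVIDER", "groq"),
        host=os.getenv("HOST", "0.0.0.0"),
        port=int(os.getenv("PORT", "8000")),
    )
```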

Running the Application

Start the Server

python -m app.main

Expected output:

Configuration loaded successfully!
   LLM Provider: groq
   Environment: development
   Port: 8000
Database client initialized
LLM Service initialized: Groq (llama-3.3-70b-versatile)
Session Manager initialized
==================================================
REALTIME AI BACKEND STARTING
==================================================
Host: 0.0.0.0
Port: 8000
LLM: groq
Environment: development
==================================================
INFO:     Uvicorn running on http://0.0.0.0:8000

Testing

Option 1: Web Frontend (Recommended)

Step 1: Start the Server (if not already running)

python -m app.main

Step 2: Open the Frontend

  • Navigate to your project directory
  • Double-click frontend.html OR
  • Right-click frontend.html → Open with → Your browser (Chrome, Firefox, Edge)
  • The file will open at: file:///path/to/realtime-ai-backend/frontend.html

Step 3: Connect and Test

  1. In the frontend UI, click Connect button (uses default session ID)
  2. Send test messages:
    • "What's the weather in Tokyo?"
    • "What's the weather in London?"
    • "Search our database for Python tutorials"
  3. Watch the AI respond in real-time with tool execution!

Note: Tool calls are intentionally displayed in the frontend to demonstrate the complex interaction requirement. The green boxes show when the AI calls functions, their arguments, and results.

To disable tool call display: If you prefer a cleaner UI without tool call boxes, comment out lines 393-397 in frontend.html:

// case 'tool_call':
//     typingIndicator.classList.remove('active');
//     addToolCallMessage(data);
//     typingIndicator.classList.add('active');
//     break;

This will hide the tool execution details while still allowing the AI to use tools behind the scenes.

Option 2: Postman

  1. Create new WebSocket Request
  2. Connect to: ws://localhost:8000/ws/session/test-001?user_id=testuser
  3. Send text messages
  4. Observe streaming tokens and tool calls

Option 3: Python Script

import asyncio
import websockets
import json

async def test_websocket():
    uri = "ws://localhost:8000/ws/session/test-001?user_id=testuser"
    
    async with websockets.connect(uri) as websocket:
        # Send message
        await websocket.send("What's the weather in London?")
        
        # Receive responses until the stream ends
        async for message in websocket:
            data = json.loads(message)
            print(f"Received: {data}")
            if data.get("type") == "end":
                break

asyncio.run(test_websocket())

Verify in Supabase

After testing:

  1. Go to Table Editor → sessions
  2. Check for your session with status "completed"
  3. View the AI-generated summary
  4. Go to events table to see chronological log

API Documentation

WebSocket Endpoint

URL: ws://localhost:8000/ws/session/{session_id}?user_id={user_id}

Parameters:

  • session_id (path): Unique session identifier
  • user_id (query): User identifier

Message Format:

Client → Server:

Plain text message

Server → Client:

// System message
{"type": "system", "content": "Connected to session: xyz"}

// Streaming start
{"type": "start", "content": ""}

// Token streaming
{"type": "token", "content": "Hello"}

// Tool call
{
  "type": "tool_call",
  "tool_name": "get_weather",
  "arguments": {"location": "Tokyo"},
  "result": {"temperature": 22, "condition": "Sunny"}
}

// Streaming end
{"type": "end", "content": ""}

// Error
{"type": "error", "content": "Error message"}
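
A client typically dispatches on the type field. A small handler sketch (the function name is illustrative):

```python
import json

def handle_server_message(raw: str) -> str:
    """Turn one server frame into a display string based on its type."""
    data = json.loads(raw)
    kind = data.get("type")
    if kind == "token":
        return data["content"]  # append to the live response
    if kind == "tool_call":
        return f"[tool: {data['tool_name']}({data['arguments']})]"
    if kind == "error":
        return f"[error: {data['content']}]"
    return ""  # system/start/end frames carry no display text here
```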

HTTP Endpoints

Health Check:

GET /

Session Info:

GET /sessions/{session_id}

Detailed Health:

GET /health

Design Decisions

1. Dual Storage Strategy

Why both Memory and Database?

  • Memory (SessionManager): Fast access for active conversations, builds context for LLM
  • Database (Supabase): Persistent storage, survives crashes, enables historical analysis

This pattern balances performance (memory) with reliability (database).
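
The in-memory half of this pattern can be as small as a dict of message lists; a sketch (the real SessionManager likely tracks more state, such as timestamps and status):

```python
class SessionManager:
    """Keeps per-session chat history in memory for fast LLM context building."""

    def __init__(self):
        self._sessions: dict[str, list[dict]] = {}

    def add_message(self, session_id: str, role: str, content: str) -> None:
        self._sessions.setdefault(session_id, []).append({"role": role, "content": content})

    def get_context(self, session_id: str) -> list[dict]:
        """Message list in the chat format LLM APIs expect."""
        return self._sessions.get(session_id, [])

    def end_session(self, session_id: str) -> list[dict]:
        """Drop in-memory state and return the final history (e.g. for summarization)."""
        return self._sessions.pop(session_id, [])
```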

2. Real-Time Event Logging

Why save events during the session, not just at the end?

  • Crash resilience: Partial conversation history preserved
  • Live monitoring: Can view ongoing conversations
  • Audit trail: Exact chronological record of all events
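
Each event can be written the moment it happens with a single insert; a sketch in the supabase-py call style (column names match the schema above; the client is injected so it can be faked in tests):

```python
from datetime import datetime, timezone
from typing import Optional

def log_event(client, session_id: str, event_type: str, content: str,
              metadata: Optional[dict] = None):
    """Insert one event row immediately (supabase-py style: table().insert().execute())."""
    row = {
        "session_id": session_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type,
        "content": content,
        "metadata": metadata or {},
    }
    return client.table("events").insert(row).execute()
```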

3. Token-by-Token Streaming

Why stream instead of waiting for complete response?

  • Lower perceived latency (~500ms to first token vs ~5s for complete)
  • Better UX: Users see progress immediately
  • Matches modern AI chat interfaces (ChatGPT, Claude)

Note on Groq Performance: The system implements true token-by-token streaming over WebSocket. Groq's inference is extremely fast (500+ tokens/second), so streaming may appear nearly instantaneous in the UI. The streaming architecture is fully functional; you can verify this in the browser's network logs, or by switching to a slower provider such as Ollama. For demonstration purposes, add a small delay (await asyncio.sleep(0.02)) after line 127 in app/main.py to make streaming visually apparent.
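
The forwarding loop can be sketched as an async helper with an optional pacing delay (names are illustrative; the real loop lives in app/main.py):

```python
import asyncio

async def stream_tokens(send, token_source, pace: float = 0.0) -> str:
    """Relay tokens to the client as they arrive; `pace` adds an artificial
    delay so streaming is visible even with very fast providers like Groq."""
    await send({"type": "start", "content": ""})
    parts = []
    async for token in token_source:
        parts.append(token)
        await send({"type": "token", "content": token})
        if pace:
            await asyncio.sleep(pace)
    await send({"type": "end", "content": ""})
    return "".join(parts)  # full response, e.g. for logging to the events table
```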

4. Function Calling Architecture

Why use the LLM's native tool calling vs. custom parsing?

  • Reliability: LLM decides when tools are needed
  • Flexibility: Easy to add new tools without prompt engineering
  • Standards-based: Uses OpenAI-compatible format
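
In the OpenAI-compatible format, each tool is described by a JSON schema the LLM sees; a sketch for the weather tool (the exact field values here are assumptions about this repo's definitions):

```python
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. Tokyo"},
            },
            "required": ["location"],
        },
    },
}
```

The list of such schemas is passed as the tools parameter of the chat completion call; when the model decides a tool is needed, it returns a tool_calls entry naming the function and its JSON arguments, which the backend executes and feeds back.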

5. Groq Over Local Models

For this demo, why Groq?

  • Memory constraints: Local LLMs require 4-8GB RAM
  • Speed: Groq delivers 500+ tokens/sec (vs 20-50 for local)
  • Free tier: Sufficient for development and demos
  • Tool support: the models used here support function calling

6. Modular Tool Architecture

Why plugin-based tools instead of hardcoded definitions?

  • Scalability: Easily scale from 2 to 100+ tools without modifying core code
  • Maintainability: Each tool isolated in its own file for independent testing
  • Auto-discovery: Tools automatically loaded from app/tools/ directory
  • Team collaboration: Multiple developers can work on different tools without conflicts
  • Flexibility: Enable/disable tools by renaming files (no code changes)
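
One way to get this behavior is a base class plus a registry built from its subclasses; a simplified sketch (the repo's actual auto-discovery scans app/tools/ for modules, which this stand-in approximates, and the weather result below is a stub):

```python
from abc import ABC, abstractmethod

class BaseTool(ABC):
    """Every tool declares a name and description and implements execute()."""
    name: str
    description: str

    @abstractmethod
    def execute(self, **kwargs):
        ...

def discover_tools() -> dict:
    """Collect every concrete BaseTool subclass into a name -> instance registry."""
    return {cls.name: cls() for cls in BaseTool.__subclasses__()}

class GetWeather(BaseTool):
    name = "get_weather"
    description = "Look up current weather for a location"

    def execute(self, location: str):
        return {"location": location, "temperature": 22, "condition": "Sunny"}  # stubbed
```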

7. Row-Level Security (RLS)

Why enable RLS policies?

  • Security best practice: Defense in depth
  • Access control: Can add user-specific policies later
  • Compliance ready: Meets data protection requirements

Project Structure

realtime-ai-backend/
│
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI app & WebSocket endpoint
│   ├── config.py            # Configuration management
│   ├── database.py          # Supabase client & operations
│   ├── llm_service.py       # LLM integration (Groq/Ollama)
│   ├── session_manager.py   # In-memory session state
│   ├── models.py            # Pydantic models (optional)
│   └── tools/               # Modular tool system (auto-discovery)
│       ├── __init__.py      # Tool registry
│       ├── base.py          # Abstract base class for tools
│       ├── README.md        # Developer guide for adding tools
│       ├── get_weather.py   # Weather lookup tool
│       ├── search_database.py  # Database search tool
│       └── send_email.py    # Email sending tool (example)
│
├── frontend.html            # Simple web UI
├── requirements.txt         # Python dependencies
├── .env                    # Environment variables (not committed)
├── .env.example            # Environment template
└── README.md               # This file

Key Technologies

  • FastAPI: Modern async web framework
  • Uvicorn: ASGI server with WebSocket support
  • Supabase: PostgreSQL with real-time capabilities
  • Groq: Ultra-fast LLM inference API
  • Pydantic: Data validation and settings management

Database Schema Details

Sessions Table

Stores high-level session metadata:

Column            Type          Description
----------------- ------------- ----------------------------------------
session_id        VARCHAR(255)  Primary key, unique session identifier
user_id           VARCHAR(255)  User identifier
start_time        TIMESTAMP     Session start time (auto)
end_time          TIMESTAMP     Session end time (set on close)
duration_seconds  INTEGER       Total session duration
summary           TEXT          AI-generated conversation summary
status            VARCHAR(50)   'active' or 'completed'

Events Table

Stores granular event log:

Column      Type          Description
----------- ------------- -------------------------------
event_id    UUID          Primary key (auto-generated)
session_id  VARCHAR(255)  Foreign key to sessions
timestamp   TIMESTAMP     Event timestamp (auto)
event_type  VARCHAR(50)   Event category
content     TEXT          Event content/message
metadata    JSONB         Additional structured data

Event Types:

  • user_message: User input
  • ai_response: AI's complete response
  • tool_call: Function/tool execution
  • system_event: Connection, disconnection, errors

Troubleshooting

"ModuleNotFoundError: No module named 'app'"

  • Use python -m app.main instead of python app/main.py

"Row-level security policy violation"

  • Ensure RLS policies are created in Supabase
  • Check SQL editor for errors during schema setup

"Model requires more memory" (if using Ollama)

  • Switch to a smaller model: ollama pull llama3.2:3b
  • Or use Groq: Set LLM_PROVIDER=groq in .env

"WebSocket connection failed"

  • Verify server is running on correct port
  • Check firewall settings
  • Try localhost instead of 0.0.0.0

"Groq rate limit exceeded"

  • Free tier: 30 requests/minute
  • Wait 60 seconds or upgrade plan

Roadmap

Potential improvements for future versions:

  • Authentication: User login and session management
  • Rate Limiting: Request throttling
  • More Tools: Email, calendar, web search integration
  • Multi-language: Support for different languages

About

A high-performance, asynchronous Python backend that simulates a real-time conversational session.
