NumenAI

Production-grade voice-enabled AI agent platform. Users create personalized Guide agents through a multi-step wizard, then interact via real-time voice or text conversations. Each agent operates from a JSON contract that defines its personality, communication style, and therapeutic approach.


Voice Pipeline

┌─────────────────────────────────────────────────────────────┐
│                    Real-Time Voice Flow                      │
│                                                             │
│  🎤 User Speaks                                             │
│       │                                                     │
│       ▼                                                     │
│  ┌──────────┐     ┌──────────────┐     ┌──────────────┐    │
│  │  LiveKit  │────▶│  Deepgram    │────▶│  LangGraph   │    │
│  │  Capture  │     │  STT         │     │  Agent       │    │
│  │          │     │  (streaming) │     │  (contract)  │    │
│  └──────────┘     └──────────────┘     └──────┬───────┘    │
│                                               │             │
│                                               ▼             │
│  🔊 User Hears    ┌──────────────┐     ┌──────────────┐    │
│       ▲            │  LiveKit     │◀────│  ElevenLabs  │    │
│       └────────────│  Stream      │     │  TTS         │    │
│                    │              │     │  (Turbo v2)  │    │
│                    └──────────────┘     └──────────────┘    │
│                                                             │
│                    End-to-end latency: < 300ms               │
└─────────────────────────────────────────────────────────────┘

Architecture

┌───────────────────────────────────────────────────────────────┐
│  Next.js 14 Frontend (React 18 + TailwindCSS + Framer Motion)│
│                                                               │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────┐   │
│  │  Intake Form  │  │  7-Step Agent │  │  Chat + Voice     │   │
│  │  (Discovery)  │  │  Builder      │  │  Interface        │   │
│  └──────┬───────┘  └──────┬───────┘  └─────────┬─────────┘   │
│         └─────────────────┴─────────────────────┘             │
│                            │ REST + WebSocket                 │
├────────────────────────────┼──────────────────────────────────┤
│  FastAPI Backend           │                                  │
│                            ▼                                  │
│  ┌─────────────────────────────────────────────────────┐      │
│  │              LangGraph Agent System                 │      │
│  │                                                     │      │
│  │  ┌─────────────┐         ┌─────────────────────┐   │      │
│  │  │ IntakeAgent  │────────▶│ Guide Agent         │   │      │
│  │  │ V2           │ creates │ (from JSON contract) │   │      │
│  │  │              │ contract│                     │   │      │
│  │  └─────────────┘         │ • Personality traits │   │      │
│  │                          │ • Communication style│   │      │
│  │  ┌─────────────┐        │ • Focus areas        │   │      │
│  │  │ Affirmation  │        │ • Voice selection    │   │      │
│  │  │ Agent        │        └─────────────────────┘   │      │
│  │  └─────────────┘                                   │      │
│  │  ┌─────────────┐                                   │      │
│  │  │ Protocol     │                                   │      │
│  │  │ Agent        │                                   │      │
│  │  └─────────────┘                                   │      │
│  └─────────────────────────────────────────────────────┘      │
│                            │                                  │
│              ┌─────────────┼─────────────┐                    │
│              ▼             ▼             ▼                    │
│       ┌──────────┐  ┌──────────┐  ┌──────────┐              │
│       │   Mem0   │  │PostgreSQL│  │ pgvector │              │
│       │ (Semantic│  │(Supabase)│  │(Vectors) │              │
│       │  Memory) │  │          │  │          │              │
│       └──────────┘  └──────────┘  └──────────┘              │
└───────────────────────────────────────────────────────────────┘

How It Works

1. Intake & Discovery — IntakeAgentV2 collects user goals, preferences, and therapeutic needs through a conversational flow powered by LangGraph.

2. Agent Creation — A 7-step builder wizard lets users customize their Guide agent across 9 personality trait sliders, communication style, voice selection (8 ElevenLabs voices with live preview), focus areas, and philosophical approach. The result is a JSON contract that fully defines the agent's behavior.

3. Conversational Sessions — Users interact with their personalized agent via text or real-time voice. The voice pipeline streams through LiveKit → Deepgram STT → LangGraph agent → ElevenLabs TTS (Turbo v2) → LiveKit, achieving sub-300ms end-to-end latency.

4. Content Generation — Specialized agents generate personalized affirmations, hypnotherapy scripts, and comprehensive manifestation protocols based on the user's goals and session history.

5. Semantic Memory — All interactions persist in Mem0 with namespace isolation (tenant:agent:context), enabling agents to recall relevant context from previous sessions through vector similarity search.
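
The voice path in step 3 can be pictured as a chain of async stages, each handing its output to the next. The sketch below is illustrative only: `fake_stt`, `fake_agent`, `fake_tts`, and `run_turn` are hypothetical stand-ins, not the project's services or the LiveKit, Deepgram, or ElevenLabs SDKs. It shows how per-stage timing could be recorded to check a latency budget.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable

# A stage takes the previous stage's output and returns its own.
Stage = Callable[[Any], Awaitable[Any]]


async def fake_stt(audio: bytes) -> str:
    """Hypothetical stand-in for streaming speech-to-text."""
    return audio.decode("utf-8")


async def fake_agent(transcript: str) -> str:
    """Hypothetical stand-in for the contract-driven Guide agent."""
    return f"I hear you saying: {transcript}"


async def fake_tts(reply: str) -> bytes:
    """Hypothetical stand-in for text-to-speech synthesis."""
    return reply.encode("utf-8")


async def run_turn(audio: bytes, stages: list[tuple[str, Stage]]) -> tuple[Any, dict[str, float]]:
    """Run one conversational turn and record each stage's latency in ms."""
    timings: dict[str, float] = {}
    payload: Any = audio
    for name, stage in stages:
        start = time.perf_counter()
        payload = await stage(payload)
        timings[name] = (time.perf_counter() - start) * 1000.0
    return payload, timings


PIPELINE = [("stt", fake_stt), ("agent", fake_agent), ("tts", fake_tts)]

if __name__ == "__main__":
    audio_out, stage_ms = asyncio.run(run_turn(b"hello", PIPELINE))
    print(audio_out, stage_ms)
```

Summing the values in `stage_ms` gives the turn's total processing time, which is the figure to compare against a latency target. Network transport time is not modeled here.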

Tech Stack

| Layer | Technology |
|---|---|
| Frontend | Next.js 14, React 18, TypeScript, TailwindCSS, Radix UI, Framer Motion |
| Backend | FastAPI 0.115+, Python 3.11+, Pydantic V2 |
| Agent Framework | LangGraph 0.2.27 (StateGraph with tool routing) |
| Agent Behavior | JSON contract-based prompt architecture |
| Voice (capture) | LiveKit 1.0.12 (WebRTC rooms) |
| Voice (STT) | Deepgram 3.7.0 (streaming transcription) |
| Voice (TTS) | ElevenLabs 1.8.0 (Turbo v2, 8 voice options) |
| Memory | Mem0 0.1.17 (cloud semantic memory with vector search) |
| Database | PostgreSQL via Supabase, pgvector for embeddings |
| Auth | Supabase Auth with cookie-based sessions |
| Infrastructure | Docker Compose, Azure Container Apps |

Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Supabase account
  • OpenAI API key

Setup

# Clone and install
git clone https://github.com/DTSP-AI/numen-ai.git
cd numen-ai

# Windows
scripts\install.bat

# Mac/Linux
./scripts/install.sh

Configure

Edit .env with your credentials:

# Required (core functionality)
SUPABASE_DB_URL=postgresql://...
SUPABASE_URL=https://xxx.supabase.co
SUPABASE_KEY=eyJ...
OPENAI_API_KEY=sk-proj-...

# Optional (voice features — app degrades gracefully without these)
ELEVENLABS_API_KEY=...
DEEPGRAM_API_KEY=...
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
LIVEKIT_URL=...

Run

# Windows
scripts\dev.bat

# Mac/Linux
./scripts/dev.sh

  • Frontend: http://localhost:3003
  • API Docs: http://localhost:8003/docs
  • Health: http://localhost:8003/health

API Endpoints

| Method | Endpoint | Purpose |
|---|---|---|
| POST | /api/agents | Create agent from JSON contract |
| GET | /api/agents | List all agents |
| GET | /api/agents/{id} | Get agent details |
| PATCH | /api/agents/{id} | Update agent contract |
| POST | /api/chat/{agent_id} | Send message (streaming response) |
| GET | /api/threads/{agent_id} | Get conversation history |
| GET | /api/voices | List available ElevenLabs voices |
| POST | /api/voices/preview | Generate voice preview |
| POST | /api/voices/synthesize | Text-to-speech synthesis |
| POST | /api/affirmations/generate | Generate personalized affirmations |
| POST | /api/livekit/token | Generate LiveKit room access token |
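
As a rough illustration of calling these endpoints from Python, the sketch below builds requests with the standard library only. The base URL comes from the Run section. The request body field `message` and the helper names `build_request` and `send` are assumptions, since the README does not document request schemas, and authentication cookies are omitted.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8003"  # API port listed in the Run section


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the given endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode a JSON response (not suitable for streaming endpoints)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# List agents (GET, no body).
list_agents = build_request("GET", "/api/agents")

# Send a chat message; "message" is an assumed field name, "abc123" a placeholder ID.
chat = build_request("POST", "/api/chat/abc123", {"message": "Hello"})
```

With the backend running, `send(list_agents)` would return the decoded JSON. The chat endpoint streams its response, so a real client would read it incrementally rather than through `send`.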

Design Decisions

Contract-Driven Agents — Every agent's personality, scope, tone, and behavior is defined in a JSON contract file. This means agent behavior can be modified, A/B tested, or cloned without touching Python code. The same pattern powers production multi-agent systems at scale.
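
To make the contract idea concrete, here is a minimal sketch of loading a contract and turning it into a system prompt. The field names (`name`, `communication_style`, `voice_id`, `focus_areas`, `traits`), the 0-100 slider range, and the `GuideContract` class are all assumptions for illustration; the project's actual schema is not shown in this README and may differ.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GuideContract:
    """Hypothetical contract shape; the real schema may differ."""
    name: str
    communication_style: str
    voice_id: str
    focus_areas: tuple[str, ...] = ()
    traits: dict[str, int] = field(default_factory=dict)  # slider values, assumed 0-100

    @classmethod
    def from_json(cls, raw: str) -> "GuideContract":
        """Parse a JSON contract and reject out-of-range trait values."""
        data = json.loads(raw)
        traits = data.get("traits", {})
        for trait, value in traits.items():
            if not 0 <= value <= 100:
                raise ValueError(f"trait {trait!r} must be in 0-100, got {value}")
        return cls(
            name=data["name"],
            communication_style=data["communication_style"],
            voice_id=data["voice_id"],
            focus_areas=tuple(data.get("focus_areas", ())),
            traits=traits,
        )

    def to_system_prompt(self) -> str:
        """Render the contract as prompt text; no Python change is needed to alter behavior."""
        trait_text = ", ".join(f"{k}={v}" for k, v in sorted(self.traits.items()))
        focus_text = ", ".join(self.focus_areas) or "general support"
        return (
            f"You are {self.name}, a personal guide. "
            f"Communication style: {self.communication_style}. "
            f"Focus areas: {focus_text}. "
            f"Trait levels (0-100): {trait_text}."
        )
```

Because the prompt is derived entirely from the JSON, cloning or A/B testing an agent amounts to copying the contract and changing a few values.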

Mem0 over local vector stores — Cloud-based semantic memory with namespace isolation provides multi-tenant memory without managing FAISS/Chroma infrastructure. Namespace pattern (tenant:agent:context) ensures complete data separation.
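
A namespace key of this shape could be built and parsed as in the sketch below. The helper names `build_namespace` and `parse_namespace` are hypothetical, and rejecting `:` inside a component is an assumed safeguard rather than documented project behavior.

```python
def build_namespace(tenant_id: str, agent_id: str, context: str) -> str:
    """Join the three components into a tenant:agent:context key."""
    parts = (tenant_id, agent_id, context)
    for part in parts:
        # An embedded ":" would let one ID spill into another component.
        if not part or ":" in part:
            raise ValueError(f"invalid namespace component: {part!r}")
    return ":".join(parts)


def parse_namespace(namespace: str) -> tuple[str, str, str]:
    """Split a key back into (tenant_id, agent_id, context)."""
    parts = namespace.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed namespace: {namespace!r}")
    return parts[0], parts[1], parts[2]
```

Validating components at construction time matters for isolation: without it, an ID such as `a:b` would make two different (tenant, agent) pairs map to the same key.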

Graceful Voice Degradation — The app checks service availability at startup and disables voice features if LiveKit/Deepgram/ElevenLabs keys are missing. Core text-based functionality always works. Health endpoint reports "healthy" or "degraded" with per-service status.
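
One simple way to derive such a report is to check which voice credentials are configured. The sketch below uses the variable names from the Configure section; the function name `health_report` and the response shape are assumptions, and it only checks that keys are present rather than probing the services themselves.

```python
import os
from typing import Mapping

# Environment variables each voice service needs (names from the Configure section).
VOICE_SERVICES = {
    "elevenlabs": ("ELEVENLABS_API_KEY",),
    "deepgram": ("DEEPGRAM_API_KEY",),
    "livekit": ("LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL"),
}


def health_report(env: Mapping[str, str] = os.environ) -> dict:
    """Return overall status plus per-service availability based on configured keys."""
    services = {
        name: all(env.get(var) for var in variables)
        for name, variables in VOICE_SERVICES.items()
    }
    voice_enabled = all(services.values())
    return {
        "status": "healthy" if voice_enabled else "degraded",
        "voice_enabled": voice_enabled,
        "services": services,
    }
```

Passing the environment as a parameter keeps the check easy to test, for example `health_report({"DEEPGRAM_API_KEY": "x"})` reports `"degraded"` because the other services are unconfigured.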

ElevenLabs Turbo v2 — Chosen for voice synthesis because it achieves sub-300ms latency, which is the threshold for natural-feeling conversation. Standard models add 500ms+ which creates noticeable pauses that break immersion.

Project Structure

backend/
├── agents/
│   ├── guide_agent/       # Personalized therapy agent
│   ├── intake_agent/      # Goal collection + contract generation
│   └── __init__.py
├── graph/                 # LangGraph StateGraph definitions
├── services/              # LiveKit, Deepgram, ElevenLabs, Memory
├── routers/               # FastAPI route handlers
├── models/                # Pydantic V2 schemas
├── memoryManager/         # Mem0 integration layer
├── middleware/             # Auth, CORS, rate limiting
├── config.py              # Environment + settings
└── main.py

frontend/
├── src/
│   ├── app/               # Next.js App Router
│   ├── components/        # React components (agent builder, chat, voice)
│   ├── lib/               # API client, hooks, utilities
│   └── types/             # TypeScript definitions
├── tailwind.config.js
└── package.json

Memory Architecture

Mem0 provides semantic memory with automatic embedding generation and vector similarity search:

  • Namespace Isolation: The {tenant_id}:{agent_id}:{context} pattern ensures complete data separation between users and agents
  • Automatic Relevance: The agent queries Mem0 before each response to surface relevant prior context
  • Multi-Session Persistence: Memory persists across sessions, enabling continuity without explicit state management
  • Structured Storage: PostgreSQL (via Supabase) handles agent configs, thread history, and generated content alongside Mem0's semantic layer
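
The recall-before-response flow described above can be sketched with a toy in-memory store. This is not Mem0's API: `ToyMemory` and `build_prompt` are hypothetical, and word overlap stands in for vector similarity so the example runs with the standard library alone.

```python
from collections import defaultdict


class ToyMemory:
    """In-memory stand-in for a namespaced semantic memory store."""

    def __init__(self) -> None:
        self._store: dict[str, list[str]] = defaultdict(list)

    def add(self, namespace: str, text: str) -> None:
        """Persist a memory under one namespace only."""
        self._store[namespace].append(text)

    def search(self, namespace: str, query: str, limit: int = 3) -> list[str]:
        """Rank memories in this namespace by shared words (a crude similarity proxy)."""
        query_tokens = set(query.lower().split())
        scored = []
        for text in self._store.get(namespace, []):
            overlap = len(query_tokens & set(text.lower().split()))
            if overlap:
                scored.append((overlap, text))
        scored.sort(key=lambda item: item[0], reverse=True)
        return [text for _, text in scored[:limit]]


def build_prompt(memory: ToyMemory, namespace: str, user_message: str) -> str:
    """Query memory first, then prepend any recalled context to the user message."""
    recalled = memory.search(namespace, user_message)
    context = "\n".join(f"- {item}" for item in recalled) or "- (no relevant memories)"
    return f"Relevant context:\n{context}\n\nUser: {user_message}"
```

Because every lookup is keyed by namespace, a memory stored for one tenant and agent never appears in another's prompt, which is the isolation property the list describes.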

License

MIT

Author

Pete Davidsmeier — AI Agent Architect, CTO at YBRYX Business Systems

LinkedIn
