Production-grade voice-enabled AI agent platform. Users create personalized Guide agents through a multi-step wizard, then interact via real-time voice or text conversations. Each agent operates from a JSON contract that defines its personality, communication style, and therapeutic approach.
┌─────────────────────────────────────────────────────────────┐
│                    Real-Time Voice Flow                     │
│                                                             │
│  🎤 User Speaks                                             │
│       │                                                     │
│       ▼                                                     │
│  ┌──────────┐     ┌──────────────┐     ┌──────────────┐     │
│  │ LiveKit  │────▶│  Deepgram    │────▶│  LangGraph   │     │
│  │ Capture  │     │  STT         │     │  Agent       │     │
│  │          │     │  (streaming) │     │  (contract)  │     │
│  └──────────┘     └──────────────┘     └──────┬───────┘     │
│                                               │             │
│                                               ▼             │
│  🔊 User Hears    ┌──────────────┐     ┌──────────────┐     │
│       ▲           │  LiveKit     │◀────│  ElevenLabs  │     │
│       └───────────│  Stream      │     │  TTS         │     │
│                   │              │     │  (Turbo v2)  │     │
│                   └──────────────┘     └──────────────┘     │
│                                                             │
│  End-to-end latency: < 300ms                                │
└─────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────┐
│   Next.js 14 Frontend (React 18 + TailwindCSS + Framer Motion)│
│                                                               │
│   ┌──────────────┐  ┌──────────────┐  ┌───────────────────┐   │
│   │ Intake Form  │  │ 7-Step Agent │  │   Chat + Voice    │   │
│   │ (Discovery)  │  │   Builder    │  │     Interface     │   │
│   └──────┬───────┘  └──────┬───────┘  └─────────┬─────────┘   │
│          └─────────────────┴────────────────────┘             │
│                            │ REST + WebSocket                 │
├────────────────────────────┼──────────────────────────────────┤
│   FastAPI Backend          │                                  │
│                            ▼                                  │
│   ┌─────────────────────────────────────────────────────┐     │
│   │               LangGraph Agent System                │     │
│   │                                                     │     │
│   │  ┌─────────────┐         ┌──────────────────────┐   │     │
│   │  │ IntakeAgent │────────▶│     Guide Agent      │   │     │
│   │  │     V2      │ creates │ (from JSON contract) │   │     │
│   │  │             │ contract│                      │   │     │
│   │  └─────────────┘         │ • Personality traits │   │     │
│   │                          │ • Communication style│   │     │
│   │  ┌─────────────┐         │ • Focus areas        │   │     │
│   │  │ Affirmation │         │ • Voice selection    │   │     │
│   │  │    Agent    │         └──────────────────────┘   │     │
│   │  └─────────────┘                                    │     │
│   │  ┌─────────────┐                                    │     │
│   │  │  Protocol   │                                    │     │
│   │  │    Agent    │                                    │     │
│   │  └─────────────┘                                    │     │
│   └─────────────────────────────────────────────────────┘     │
│                              │                                │
│                ┌─────────────┼─────────────┐                  │
│                ▼             ▼             ▼                  │
│           ┌──────────┐  ┌──────────┐  ┌──────────┐            │
│           │   Mem0   │  │PostgreSQL│  │ pgvector │            │
│           │ (Semantic│  │(Supabase)│  │(Vectors) │            │
│           │  Memory) │  │          │  │          │            │
│           └──────────┘  └──────────┘  └──────────┘            │
└───────────────────────────────────────────────────────────────┘
1. Intake & Discovery — IntakeAgentV2 collects user goals, preferences, and therapeutic needs through a conversational flow powered by LangGraph.
2. Agent Creation — A 7-step builder wizard lets users customize their Guide agent across 9 personality trait sliders, communication style, voice selection (8 ElevenLabs voices with live preview), focus areas, and philosophical approach. The result is a JSON contract that fully defines the agent's behavior.
3. Conversational Sessions — Users interact with their personalized agent via text or real-time voice. The voice pipeline streams through LiveKit → Deepgram STT → LangGraph agent → ElevenLabs TTS (Turbo v2) → LiveKit, achieving sub-300ms end-to-end latency.
4. Content Generation — Specialized agents generate personalized affirmations, hypnotherapy scripts, and comprehensive manifestation protocols based on the user's goals and session history.
5. Semantic Memory — All interactions persist in Mem0 with namespace isolation (tenant:agent:context), enabling agents to recall relevant context from previous sessions through vector similarity search.
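The voice path in step 3 can be pictured as a chain of streaming stages, each consuming the previous stage's output as it arrives. The following is a minimal sketch of that shape using Python async generators. The stage functions (`fake_stt`, `fake_agent`, `fake_tts`, `microphone`) are hypothetical stand-ins invented for illustration; they do not call LiveKit, Deepgram, LangGraph, or ElevenLabs, and they say nothing about real latency.

```python
import asyncio
from typing import AsyncIterator


async def microphone(phrases: list[str]) -> AsyncIterator[bytes]:
    """Simulates captured audio frames arriving over time."""
    for phrase in phrases:
        await asyncio.sleep(0)  # yield control, as real I/O would
        yield phrase.encode("utf-8")


async def fake_stt(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Stand-in for streaming STT: one transcript per audio frame."""
    async for frame in audio:
        yield frame.decode("utf-8")


async def fake_agent(transcripts: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stand-in for the contract-driven agent: one reply per transcript."""
    async for text in transcripts:
        yield f"I hear you say: {text}"


async def fake_tts(replies: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Stand-in for TTS: turns each reply into placeholder audio bytes."""
    async for reply in replies:
        yield reply.encode("utf-8")


async def run_pipeline(phrases: list[str]) -> list[bytes]:
    # Stages are composed lazily; nothing runs until the last one is iterated.
    audio_out = fake_tts(fake_agent(fake_stt(microphone(phrases))))
    return [frame async for frame in audio_out]


def run_pipeline_sync(phrases: list[str]) -> list[bytes]:
    """Convenience wrapper for calling the pipeline from synchronous code."""
    return asyncio.run(run_pipeline(phrases))
```

Because every stage is a generator, a reply for the first phrase can be produced before the second phrase has been captured, which is the property a real-time voice loop depends on.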
| Layer | Technology |
|---|---|
| Frontend | Next.js 14, React 18, TypeScript, TailwindCSS, Radix UI, Framer Motion |
| Backend | FastAPI 0.115+, Python 3.11+, Pydantic V2 |
| Agent Framework | LangGraph 0.2.27 (StateGraph with tool routing) |
| Agent Behavior | JSON contract-based prompt architecture |
| Voice — Capture | LiveKit 1.0.12 (WebRTC rooms) |
| Voice — STT | Deepgram 3.7.0 (streaming transcription) |
| Voice — TTS | ElevenLabs 1.8.0 (Turbo v2, 8 voice options) |
| Memory | Mem0 0.1.17 (cloud semantic memory with vector search) |
| Database | PostgreSQL via Supabase, pgvector for embeddings |
| Auth | Supabase Auth with cookie-based sessions |
| Infrastructure | Docker Compose, Azure Container Apps |
- Python 3.11+
- Node.js 18+
- Supabase account
- OpenAI API key
# Clone and install
git clone https://github.com/DTSP-AI/numen-ai.git
cd numen-ai
# Windows
scripts\install.bat
# Mac/Linux
./scripts/install.sh

Edit .env with your credentials:
# Required (core functionality)
SUPABASE_DB_URL=postgresql://...
SUPABASE_URL=https://xxx.supabase.co
SUPABASE_KEY=eyJ...
OPENAI_API_KEY=sk-proj-...
# Optional (voice features — app degrades gracefully without these)
ELEVENLABS_API_KEY=...
DEEPGRAM_API_KEY=...
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
LIVEKIT_URL=...

# Windows
scripts\dev.bat
# Mac/Linux
./scripts/dev.sh

- Frontend: http://localhost:3003
- API Docs: http://localhost:8003/docs
- Health: http://localhost:8003/health
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /api/agents | Create agent from JSON contract |
| GET | /api/agents | List all agents |
| GET | /api/agents/{id} | Get agent details |
| PATCH | /api/agents/{id} | Update agent contract |
| POST | /api/chat/{agent_id} | Send message (streaming response) |
| GET | /api/threads/{agent_id} | Get conversation history |
| GET | /api/voices | List available ElevenLabs voices |
| POST | /api/voices/preview | Generate voice preview |
| POST | /api/voices/synthesize | Text-to-speech synthesis |
| POST | /api/affirmations/generate | Generate personalized affirmations |
| POST | /api/livekit/token | Generate LiveKit room access token |
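
As a rough illustration of calling one of these endpoints, the sketch below builds (but does not send) a `POST /api/chat/{agent_id}` request with the standard library. The base URL comes from the local development address listed earlier. The request body shape `{"message": ...}` and the helper name `build_chat_request` are assumptions made for this example; check the API docs page for the actual schema. Authentication is not handled here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8003"  # local backend address from the setup steps


def build_chat_request(agent_id: str, message: str) -> urllib.request.Request:
    """Build a POST /api/chat/{agent_id} request without sending it.

    The {"message": ...} body is an assumed shape, not the documented schema.
    """
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/chat/{agent_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the backend running, the request could be sent with `urllib.request.urlopen(build_chat_request(agent_id, "Hello"))`; since the endpoint streams its response, the returned object would be read incrementally.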
Contract-Driven Agents — Every agent's personality, scope, tone, and behavior is defined in a JSON contract file. This means agent behavior can be modified, A/B tested, or cloned without touching Python code. The same pattern powers production multi-agent systems at scale.
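
To make the contract idea concrete, here is a minimal sketch of loading a contract and turning it into a system prompt. The field names (`name`, `traits`, `communication_style`, `focus_areas`, `voice_id`) and both helper functions are hypothetical and simplified; the project's real contract schema is not shown in this README. The sketch uses standard-library dataclasses to stay self-contained, whereas the backend is described as using Pydantic V2.

```python
import json
from dataclasses import dataclass, field


@dataclass
class GuideContract:
    """Hypothetical, simplified contract; field names are illustrative only."""
    name: str
    traits: dict[str, int] = field(default_factory=dict)  # slider values
    communication_style: str = "warm"
    focus_areas: list[str] = field(default_factory=list)
    voice_id: str = ""


def load_contract(raw_json: str) -> GuideContract:
    """Parse a JSON document into a contract object."""
    return GuideContract(**json.loads(raw_json))


def render_system_prompt(contract: GuideContract) -> str:
    """Derive the agent's system prompt purely from contract data."""
    traits = ", ".join(f"{k}={v}" for k, v in sorted(contract.traits.items()))
    focus = ", ".join(contract.focus_areas) or "general support"
    return (
        f"You are {contract.name}. "
        f"Style: {contract.communication_style}. "
        f"Traits: {traits}. "
        f"Focus: {focus}."
    )
```

Editing the JSON (for example, changing a trait value) changes the rendered prompt without any change to the Python code, which is the property the contract-driven design relies on.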
Mem0 over local vector stores — Cloud-based semantic memory with namespace isolation provides multi-tenant memory without managing FAISS/Chroma infrastructure. Namespace pattern (tenant:agent:context) ensures complete data separation.
Graceful Voice Degradation — The app checks service availability at startup and disables voice features if LiveKit/Deepgram/ElevenLabs keys are missing. Core text-based functionality always works. Health endpoint reports "healthy" or "degraded" with per-service status.
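
One way such a startup check might look is sketched below. It only inspects whether the optional environment variables from the configuration section are set; the report shape (`status`, `voice_enabled`, `services`) and the function name `health_report` are assumptions for illustration, not the actual response of the health endpoint.

```python
import os
from typing import Mapping

# Optional voice-related variables, grouped by the service that needs them.
VOICE_KEYS = {
    "elevenlabs": ["ELEVENLABS_API_KEY"],
    "deepgram": ["DEEPGRAM_API_KEY"],
    "livekit": ["LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL"],
}


def health_report(env: Mapping[str, str] = os.environ) -> dict:
    """Return an assumed health payload based on which keys are configured."""
    services = {
        name: all(bool(env.get(key)) for key in keys)
        for name, keys in VOICE_KEYS.items()
    }
    voice_enabled = all(services.values())
    return {
        "status": "healthy" if voice_enabled else "degraded",
        "voice_enabled": voice_enabled,  # text chat works either way
        "services": services,
    }
```

A real check would likely also probe the services themselves, since a key can be present but invalid; this sketch covers only the missing-key case described above.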
ElevenLabs Turbo v2 — Chosen for voice synthesis because it achieves sub-300ms latency, which is the threshold for natural-feeling conversation. Standard models add 500ms or more, which creates noticeable pauses that break immersion.
backend/
├── agents/
│ ├── guide_agent/ # Personalized therapy agent
│ ├── intake_agent/ # Goal collection + contract generation
│ └── __init__.py
├── graph/ # LangGraph StateGraph definitions
├── services/ # LiveKit, Deepgram, ElevenLabs, Memory
├── routers/ # FastAPI route handlers
├── models/ # Pydantic V2 schemas
├── memoryManager/ # Mem0 integration layer
├── middleware/ # Auth, CORS, rate limiting
├── config.py # Environment + settings
└── main.py
frontend/
├── src/
│ ├── app/ # Next.js App Router
│ ├── components/ # React components (agent builder, chat, voice)
│ ├── lib/ # API client, hooks, utilities
│ └── types/ # TypeScript definitions
├── tailwind.config.js
└── package.json
Mem0 provides semantic memory with automatic embedding generation and vector similarity search:
- Namespace Isolation — {tenant_id}:{agent_id}:{context} pattern ensures complete data separation between users and agents
- Automatic Relevance — Agent queries Mem0 before each response to surface relevant prior context
- Multi-Session Persistence — Memory persists across sessions, enabling continuity without explicit state management
- Structured Storage — PostgreSQL (via Supabase) handles agent configs, thread history, and generated content alongside Mem0's semantic layer
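
The namespace pattern and the query-before-response step can be illustrated with a toy, in-memory store. This is not Mem0's API: `ToyMemory`, `namespace`, and the bag-of-words cosine similarity are hypothetical stand-ins chosen so the example runs without any external service or embedding model.

```python
import math
from collections import Counter, defaultdict


def namespace(tenant_id: str, agent_id: str, context: str) -> str:
    """Build a {tenant_id}:{agent_id}:{context} key."""
    parts = (tenant_id, agent_id, context)
    if any(not p or ":" in p for p in parts):
        raise ValueError("namespace parts must be non-empty and contain no ':'")
    return ":".join(parts)


def _vector(text: str) -> Counter:
    """Word-count vector; a crude substitute for a learned embedding."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyMemory:
    """In-memory store where every read and write is scoped to one namespace."""

    def __init__(self) -> None:
        self._store: dict[str, list[str]] = defaultdict(list)

    def add(self, ns: str, text: str) -> None:
        self._store[ns].append(text)

    def search(self, ns: str, query: str, top_k: int = 1) -> list[str]:
        q = _vector(query)
        # Only entries in the requested namespace are ever considered.
        candidates = self._store.get(ns, [])
        ranked = sorted(candidates, key=lambda t: _cosine(q, _vector(t)), reverse=True)
        return ranked[:top_k]
```

Before generating a reply, an agent would call something like `memory.search(namespace(tenant_id, agent_id, "sessions"), user_message)` and include the results in its prompt. Because the namespace is part of every lookup, one tenant's query cannot surface another tenant's memories.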
License: MIT
Pete Davidsmeier — AI Agent Architect, CTO at YBRYX Business Systems