NumenAI

Production-grade voice-enabled AI agent platform. Users create personalized Guide agents through a multi-step wizard, then interact via real-time voice or text conversations. Each agent operates from a JSON contract that defines its personality, communication style, and therapeutic approach.

Voice Pipeline

┌─────────────────────────────────────────────────────────────┐
│                    Real-Time Voice Flow                      │
│                                                             │
│  🎤 User Speaks                                             │
│       │                                                     │
│       ▼                                                     │
│  ┌──────────┐     ┌──────────────┐     ┌──────────────┐    │
│  │  LiveKit  │────▶│  Deepgram    │────▶│  LangGraph   │    │
│  │  Capture  │     │  STT         │     │  Agent       │    │
│  │          │     │  (streaming) │     │  (contract)  │    │
│  └──────────┘     └──────────────┘     └──────┬───────┘    │
│                                               │             │
│                                               ▼             │
│  🔊 User Hears    ┌──────────────┐     ┌──────────────┐    │
│       ▲            │  LiveKit     │◀────│  ElevenLabs  │    │
│       └────────────│  Stream      │     │  TTS         │    │
│                    │              │     │  (Turbo v2)  │    │
│                    └──────────────┘     └──────────────┘    │
│                                                             │
│                    End-to-end latency: < 300ms               │
└─────────────────────────────────────────────────────────────┘

Architecture

┌───────────────────────────────────────────────────────────────┐
│  Next.js 14 Frontend (React 18 + TailwindCSS + Framer Motion)│
│                                                               │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────┐   │
│  │  Intake Form  │  │  7-Step Agent │  │  Chat + Voice     │   │
│  │  (Discovery)  │  │  Builder      │  │  Interface        │   │
│  └──────┬───────┘  └──────┬───────┘  └─────────┬─────────┘   │
│         └─────────────────┴─────────────────────┘             │
│                            │ REST + WebSocket                 │
├────────────────────────────┼──────────────────────────────────┤
│  FastAPI Backend           │                                  │
│                            ▼                                  │
│  ┌─────────────────────────────────────────────────────┐      │
│  │              LangGraph Agent System                 │      │
│  │                                                     │      │
│  │  ┌─────────────┐         ┌─────────────────────┐   │      │
│  │  │ IntakeAgent  │────────▶│ Guide Agent         │   │      │
│  │  │ V2           │ creates │ (from JSON contract) │   │      │
│  │  │              │ contract│                     │   │      │
│  │  └─────────────┘         │ • Personality traits │   │      │
│  │                          │ • Communication style│   │      │
│  │  ┌─────────────┐        │ • Focus areas        │   │      │
│  │  │ Affirmation  │        │ • Voice selection    │   │      │
│  │  │ Agent        │        └─────────────────────┘   │      │
│  │  └─────────────┘                                   │      │
│  │  ┌─────────────┐                                   │      │
│  │  │ Protocol     │                                   │      │
│  │  │ Agent        │                                   │      │
│  │  └─────────────┘                                   │      │
│  └─────────────────────────────────────────────────────┘      │
│                            │                                  │
│              ┌─────────────┼─────────────┐                    │
│              ▼             ▼             ▼                    │
│       ┌──────────┐  ┌──────────┐  ┌──────────┐              │
│       │   Mem0   │  │PostgreSQL│  │ pgvector │              │
│       │ (Semantic│  │(Supabase)│  │(Vectors) │              │
│       │  Memory) │  │          │  │          │              │
│       └──────────┘  └──────────┘  └──────────┘              │
└───────────────────────────────────────────────────────────────┘

How It Works

1. Intake & Discovery — IntakeAgentV2 collects user goals, preferences, and therapeutic needs through a conversational flow powered by LangGraph.

2. Agent Creation — A 7-step builder wizard lets users customize their Guide agent across 9 personality trait sliders, communication style, voice selection (8 ElevenLabs voices with live preview), focus areas, and philosophical approach. The result is a JSON contract that fully defines the agent's behavior.

3. Conversational Sessions — Users interact with their personalized agent via text or real-time voice. The voice pipeline streams through LiveKit → Deepgram STT → LangGraph agent → ElevenLabs TTS (Turbo v2) → LiveKit, achieving sub-300ms end-to-end latency.

4. Content Generation — Specialized agents generate personalized affirmations, hypnotherapy scripts, and comprehensive manifestation protocols based on the user's goals and session history.

5. Semantic Memory — All interactions persist in Mem0 with namespace isolation (tenant:agent:context), enabling agents to recall relevant context from previous sessions through vector similarity search.
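The voice path in step 3 can be pictured as a chain of streaming stages, each consuming the previous stage's output as it arrives. The sketch below is only an illustration of that shape: `transcribe`, `respond`, `synthesize`, and `microphone` are hypothetical stand-ins written for this example, not the LiveKit, Deepgram, LangGraph, or ElevenLabs APIs, and the "audio" is plain text encoded as bytes.

```python
import asyncio
from typing import AsyncIterator


async def microphone(utterances: list[str]) -> AsyncIterator[bytes]:
    """Hypothetical capture stage: emits one audio chunk per utterance."""
    for utterance in utterances:
        yield utterance.encode("utf-8")


async def transcribe(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Stand-in for streaming STT: yields one transcript per chunk."""
    async for chunk in audio:
        yield chunk.decode("utf-8")  # pretend the bytes are already text


async def respond(transcripts: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stand-in for the contract-driven agent: one reply per transcript."""
    async for text in transcripts:
        yield f"Guide: I heard '{text}'"


async def synthesize(replies: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Stand-in for TTS: yields audio bytes for each reply."""
    async for reply in replies:
        yield reply.encode("utf-8")


async def run_pipeline(utterances: list[str]) -> list[bytes]:
    """Wire the stages together and collect the outgoing audio frames."""
    audio_out = synthesize(respond(transcribe(microphone(utterances))))
    return [frame async for frame in audio_out]


if __name__ == "__main__":
    for frame in asyncio.run(run_pipeline(["hello", "I want to sleep better"])):
        print(frame.decode("utf-8"))
```

Because every stage is an async generator, a reply can start flowing toward the speaker as soon as its transcript is ready, rather than waiting for the whole session's audio.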

Tech Stack

Layer	Technology
Frontend	Next.js 14, React 18, TypeScript, TailwindCSS, Radix UI, Framer Motion
Backend	FastAPI 0.115+, Python 3.11+, Pydantic V2
Agent Framework	LangGraph 0.2.27 (StateGraph with tool routing)
Agent Behavior	JSON contract-based prompt architecture
Voice — Capture	LiveKit 1.0.12 (WebRTC rooms)
Voice — STT	Deepgram 3.7.0 (streaming transcription)
Voice — TTS	ElevenLabs 1.8.0 (Turbo v2, 8 voice options)
Memory	Mem0 0.1.17 (cloud semantic memory with vector search)
Database	PostgreSQL via Supabase, pgvector for embeddings
Auth	Supabase Auth with cookie-based sessions
Infrastructure	Docker Compose, Azure Container Apps

Quick Start

Prerequisites

Python 3.11+
Node.js 18+
Supabase account
OpenAI API key

Setup

# Clone and install
git clone https://github.com/DTSP-AI/numen-ai.git
cd numen-ai

# Windows
scripts\install.bat

# Mac/Linux
./scripts/install.sh

Configure

Edit .env with your credentials:

# Required (core functionality)
SUPABASE_DB_URL=postgresql://...
SUPABASE_URL=https://xxx.supabase.co
SUPABASE_KEY=eyJ...
OPENAI_API_KEY=sk-proj-...

# Optional (voice features — app degrades gracefully without these)
ELEVENLABS_API_KEY=...
DEEPGRAM_API_KEY=...
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
LIVEKIT_URL=...
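
A minimal sketch of a startup check for the required variables listed above. The function names are hypothetical and the repository's `config.py` may handle this differently; the idea is simply to fail early with a readable message instead of on the first database or model call.

```python
import os
from typing import Mapping

# Variable names taken from the "Required" group in this section.
REQUIRED_VARS = (
    "SUPABASE_DB_URL",
    "SUPABASE_URL",
    "SUPABASE_KEY",
    "OPENAI_API_KEY",
)


def missing_required(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def require_core_settings(env: Mapping[str, str] = os.environ) -> None:
    """Raise a clear error at startup if any required setting is missing."""
    missing = missing_required(env)
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
```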

Run

# Windows
scripts\dev.bat

# Mac/Linux
./scripts/dev.sh

Frontend: http://localhost:3003
API Docs: http://localhost:8003/docs
Health: http://localhost:8003/health

API Endpoints

Method	Endpoint	Purpose
`POST`	`/api/agents`	Create agent from JSON contract
`GET`	`/api/agents`	List all agents
`GET`	`/api/agents/{id}`	Get agent details
`PATCH`	`/api/agents/{id}`	Update agent contract
`POST`	`/api/chat/{agent_id}`	Send message (streaming response)
`GET`	`/api/threads/{agent_id}`	Get conversation history
`GET`	`/api/voices`	List available ElevenLabs voices
`POST`	`/api/voices/preview`	Generate voice preview
`POST`	`/api/voices/synthesize`	Text-to-speech synthesis
`POST`	`/api/affirmations/generate`	Generate personalized affirmations
`POST`	`/api/livekit/token`	Generate LiveKit room access token
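
As an illustration of calling one of these endpoints from Python, the sketch below builds (but does not send) a request for `POST /api/chat/{agent_id}` using only the standard library. The JSON body shape `{"message": ...}` is an assumption made for this example; the actual schema is whatever the interactive docs at `/docs` report. Since the project uses cookie-based Supabase sessions, a real call would most likely also need the session cookie attached.

```python
import json
from urllib.parse import quote
from urllib.request import Request

API_BASE = "http://localhost:8003"  # local dev address from the Run section


def build_chat_request(agent_id: str, message: str) -> Request:
    """Build a POST request for the chat endpoint without sending it."""
    # Assumed body shape; check the API docs for the real schema.
    body = json.dumps({"message": message}).encode("utf-8")
    return Request(
        url=f"{API_BASE}/api/chat/{quote(agent_id, safe='')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response is described as streaming, so a client would read it incrementally.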

Design Decisions

Contract-Driven Agents — Every agent's personality, scope, tone, and behavior is defined in a JSON contract file. This means agent behavior can be modified, A/B tested, or cloned without touching Python code. The same pattern powers production multi-agent systems at scale.
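
To make the contract idea concrete, here is a small sketch of loading a contract from JSON and turning it into a system prompt. The field names (`name`, `communication_style`, `voice_id`, `traits`, `focus_areas`), the 0 to 100 trait range, and the prompt wording are all assumptions for illustration, not the project's actual schema. The project lists Pydantic V2 for its models; the standard library is used here only to keep the example self-contained.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GuideContract:
    """Hypothetical contract shape; the real schema may differ."""
    name: str
    communication_style: str
    voice_id: str
    traits: dict[str, int] = field(default_factory=dict)  # assumed 0-100 sliders
    focus_areas: tuple[str, ...] = ()

    @classmethod
    def from_json(cls, raw: str) -> "GuideContract":
        data = json.loads(raw)
        traits = {key: int(value) for key, value in data.get("traits", {}).items()}
        for trait, value in traits.items():
            if not 0 <= value <= 100:
                raise ValueError(f"Trait {trait!r} out of range: {value}")
        return cls(
            name=data["name"],
            communication_style=data["communication_style"],
            voice_id=data["voice_id"],
            traits=traits,
            focus_areas=tuple(data.get("focus_areas", [])),
        )

    def system_prompt(self) -> str:
        """Render the contract as a prompt; no agent code changes needed."""
        trait_text = ", ".join(f"{k}={v}" for k, v in sorted(self.traits.items()))
        focus_text = ", ".join(self.focus_areas) or "general support"
        return (
            f"You are {self.name}. Style: {self.communication_style}. "
            f"Traits: {trait_text}. Focus: {focus_text}."
        )
```

Cloning or A/B testing an agent then amounts to copying the JSON and changing a field, since the prompt is derived entirely from the data.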

Mem0 over local vector stores — Cloud-based semantic memory with namespace isolation provides multi-tenant memory without managing FAISS/Chroma infrastructure. Namespace pattern (tenant:agent:context) ensures complete data separation.
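
The `tenant:agent:context` pattern can be sketched as a small helper that builds and parses namespace keys. This is an assumed implementation for illustration, not the code in `memoryManager/`; rejecting a `:` inside any part is a design choice added here so that one tenant's identifier cannot be crafted to collide with another's key.

```python
from typing import NamedTuple


class MemoryNamespace(NamedTuple):
    """Hypothetical value type for a tenant:agent:context key."""
    tenant_id: str
    agent_id: str
    context: str

    def key(self) -> str:
        """Join the parts, refusing empty parts or embedded separators."""
        for part in self:
            if not part or ":" in part:
                raise ValueError(f"Invalid namespace part: {part!r}")
        return ":".join(self)


def parse_namespace(key: str) -> MemoryNamespace:
    """Split a key back into its three parts."""
    parts = key.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected tenant:agent:context, got {key!r}")
    return MemoryNamespace(*parts)
```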

Graceful Voice Degradation — The app checks service availability at startup and disables voice features if LiveKit/Deepgram/ElevenLabs keys are missing. Core text-based functionality always works. Health endpoint reports "healthy" or "degraded" with per-service status.
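
A rough sketch of how a health payload with per-service status could be derived from configuration. It checks only whether keys are present, whereas the app is described as checking service availability, and the response shape (`status`, `voice_enabled`, `services`) is an assumption rather than the actual `/health` schema.

```python
from typing import Mapping

# Environment variables each voice service needs, per the Configure section.
SERVICE_KEYS = {
    "livekit": ("LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL"),
    "deepgram": ("DEEPGRAM_API_KEY",),
    "elevenlabs": ("ELEVENLABS_API_KEY",),
}


def health_report(env: Mapping[str, str]) -> dict:
    """Report 'healthy' only when every voice service is fully configured."""
    services = {
        name: all(env.get(key) for key in keys)
        for name, keys in SERVICE_KEYS.items()
    }
    voice_enabled = all(services.values())
    return {
        "status": "healthy" if voice_enabled else "degraded",
        "voice_enabled": voice_enabled,
        "services": services,
    }
```

In this sketch a missing Deepgram key marks the whole voice feature as unavailable while text chat is unaffected, which matches the "degraded" state described above.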

ElevenLabs Turbo v2 — Chosen for voice synthesis because it achieves sub-300ms latency, which is the threshold for natural-feeling conversation. Standard models add 500ms+ which creates noticeable pauses that break immersion.

Project Structure

backend/
├── agents/
│   ├── guide_agent/       # Personalized therapy agent
│   ├── intake_agent/      # Goal collection + contract generation
│   └── __init__.py
├── graph/                 # LangGraph StateGraph definitions
├── services/              # LiveKit, Deepgram, ElevenLabs, Memory
├── routers/               # FastAPI route handlers
├── models/                # Pydantic V2 schemas
├── memoryManager/         # Mem0 integration layer
├── middleware/            # Auth, CORS, rate limiting
├── config.py              # Environment + settings
└── main.py

frontend/
├── src/
│   ├── app/               # Next.js App Router
│   ├── components/        # React components (agent builder, chat, voice)
│   ├── lib/               # API client, hooks, utilities
│   └── types/             # TypeScript definitions
├── tailwind.config.js
└── package.json

Memory Architecture

Mem0 provides semantic memory with automatic embedding generation and vector similarity search:

Namespace Isolation — {tenant_id}:{agent_id}:{context} pattern ensures complete data separation between users and agents
Automatic Relevance — Agent queries Mem0 before each response to surface relevant prior context
Multi-Session Persistence — Memory persists across sessions, enabling continuity without explicit state management
Structured Storage — PostgreSQL (via Supabase) handles agent configs, thread history, and generated content alongside Mem0's semantic layer
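
The recall-before-response behavior can be illustrated with a toy in-memory store. This is not Mem0's API: `ToyMemory`, `embed`, and `cosine` are hypothetical names, and the "embedding" is a plain word count rather than a learned vector. It only shows the two properties described above, namely similarity-ranked recall and no leakage across namespaces.

```python
import math
from collections import Counter, defaultdict


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system uses a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyMemory:
    """In-memory stand-in for a namespaced semantic memory."""

    def __init__(self) -> None:
        self._items: dict[str, list[str]] = defaultdict(list)

    def add(self, namespace: str, text: str) -> None:
        self._items[namespace].append(text)

    def search(self, namespace: str, query: str, limit: int = 1) -> list[str]:
        """Return the most similar memories from this namespace only."""
        query_vec = embed(query)
        scored = [
            (cosine(query_vec, embed(text)), text)
            for text in self._items.get(namespace, [])
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:limit] if score > 0]


if __name__ == "__main__":
    memory = ToyMemory()
    memory.add("t1:a1:sessions", "user wants better sleep")
    memory.add("t2:a9:sessions", "user struggles with sleep")
    print(memory.search("t1:a1:sessions", "sleep routine"))
```

An agent following the pattern above would call `search` with the current user message before generating a reply and include any hits in its prompt.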

License

MIT

Author

Pete Davidsmeier — AI Agent Architect, CTO at YBRYX Business Systems
