
Sonance

A real-time, voice-powered AI DJ agent that listens, thinks, and plays the perfect track — instantly.


Table of Contents

  • Overview
  • Key Features
  • Architecture
  • Tech Stack
  • Getting Started
  • Data Ingestion
  • Running Sonance
  • DJ Personas
  • Project Structure
  • License

Overview

Sonance is a real-time AI DJ that you talk to naturally. Say "play something chill for a rainy evening" and Sonance translates that mood into precise acoustic parameters, runs a hybrid semantic + acoustic search across a 10k-track vector database, and starts playing β€” all within sub-second latency.

It combines live WebRTC voice capture, ultra-fast LLM reasoning (Groq), hybrid vector search (Superlinked + Qdrant), and DJ-quality text-to-speech (ElevenLabs) into a seamless, conversational music experience.


Key Features

  • πŸŽ™οΈ Real-time voice interaction β€” zero-latency WebRTC audio pipeline with VAD and Deepgram Nova-2 STT
  • 🧠 Mood-aware LLM reasoning β€” a LangChain ReAct agent powered by Groq maps abstract descriptions to acoustic dimensions
  • πŸ” Hybrid vector search β€” Superlinked fuses semantic lyric embeddings with numeric acoustic spaces (valence, energy, tempo, instrumentalness)
  • 🎡 Seamless music playback β€” YouTube IFrame API plays tracks in-browser with no Spotify auth required
  • πŸ“‹ 5-track smart queue β€” auto-advances through a curated queue with skip-forward, skip-back, and history
  • 🎭 Multiple DJ personas β€” choose between DJ, Tara, Leo, Zoe, and Mia, each with a distinct style and ElevenLabs voice
  • πŸ“Š Full observability β€” every interaction traced end-to-end with Opik
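The 5-track queue behaviour can be sketched in a few lines. This is an illustrative Python model only — the real logic lives in the frontend's `useYouTubePlayer.ts` hook, and the class name and method signatures here are hypothetical:

```python
class TrackQueue:
    """Illustrative sketch of the 5-track smart queue with auto-advance,
    skip-forward, skip-back, and history. Not the actual frontend code."""

    def __init__(self, tracks, size=5):
        self.queue = list(tracks[:size])  # upcoming tracks
        self.history = []                 # already-played tracks
        self.current = None

    def advance(self):
        """Auto-advance (or skip-forward): archive the current track and
        move to the next one in the queue."""
        if self.current is not None:
            self.history.append(self.current)
        self.current = self.queue.pop(0) if self.queue else None
        return self.current

    def skip_back(self):
        """Return to the most recently played track, pushing the current
        one back onto the front of the queue."""
        if self.history:
            if self.current is not None:
                self.queue.insert(0, self.current)
            self.current = self.history.pop()
        return self.current
```

The same `advance` method serves both the "track ended" auto-advance and an explicit skip-forward, which keeps the state machine small.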

Architecture

System Overview

The full pipeline spans four distinct layers: the browser captures voice and pipes it over WebRTC; the backend transcribes → reasons → searches → speaks; and a playback command is sent back to the frontend, which starts the music — all in under a second.

Sonance Sub-Second AI Voice Architecture


Sensory Layer β€” Voice Input

The browser captures the microphone stream via the WebRTC API. FastRTC handles audio capture and VAD (Voice Activity Detection). When the user finishes speaking, the audio buffer is sent to Deepgram Nova-2 for transcription in under 300 ms.
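Conceptually, VAD decides that an utterance has ended once the audio energy stays below a threshold for a run of consecutive frames. FastRTC ships its own VAD; the sketch below is a generic illustration of the idea, with made-up threshold and frame-count values:

```python
def frame_energy(samples):
    """Mean squared amplitude of one audio frame (samples in [-1, 1])."""
    return sum(s * s for s in samples) / len(samples)

def speech_ended(frame_energies, threshold=0.01, silence_frames=8):
    """Return True once the trailing `silence_frames` frames are all below
    the energy threshold, i.e. the speaker has gone quiet. Illustrative
    only; FastRTC's real VAD is more sophisticated."""
    if len(frame_energies) < silence_frames:
        return False
    return all(e < threshold for e in frame_energies[-silence_frames:])
```

Once `speech_ended` fires, the buffered audio is flushed to the STT service in one shot, which is what keeps transcription latency under a few hundred milliseconds.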

Sensory Layer β€” VAD and STT flow


Cognitive Layer β€” Agent Reasoning

The transcript is passed to a LangChain ReAct agent running on Groq (Llama 3 / Mixtral). The agent decides whether to call music_discovery_tool or pause_music_tool, then generates a natural-language DJ response in the voice of the selected persona.
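To make the mood → acoustic-parameter step concrete, here is a toy keyword-based stand-in for what the LLM does. The preset values and the `mood_to_acoustics` function are entirely hypothetical — in Sonance the agent derives these dimensions through LLM reasoning, not a lookup table:

```python
# Hypothetical presets; the real values come from the LLM's reasoning.
MOOD_PRESETS = {
    "chill":    {"valence": 0.5, "energy": 0.25, "tempo": 90,  "instrumentalness": 0.6},
    "hype":     {"valence": 0.8, "energy": 0.90, "tempo": 128, "instrumentalness": 0.1},
    "romantic": {"valence": 0.7, "energy": 0.35, "tempo": 95,  "instrumentalness": 0.3},
}

def mood_to_acoustics(description: str) -> dict:
    """Crude keyword match standing in for the agent's mood analysis."""
    for mood, params in MOOD_PRESETS.items():
        if mood in description.lower():
            return params
    # Neutral defaults when no mood keyword matches.
    return {"valence": 0.5, "energy": 0.5, "tempo": 110, "instrumentalness": 0.3}
```

The output of this step — a handful of numeric acoustic targets — is exactly what the retrieval layer consumes.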

Cognitive Layer β€” ReAct agent tool selection


Retrieval Layer β€” Hybrid Music Search

music_discovery_tool calls Superlinked with a natural language query and optional acoustic targets. Superlinked encodes both a semantic lyric vector and numeric acoustic vectors, fuses them in Qdrant, filters out already-played tracks, and returns the top-K results.
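The fusion step can be sketched as a weighted combination of semantic similarity and closeness to the acoustic targets, with played tracks filtered out. The weights, field names, and scoring formula below are illustrative assumptions — the real fusion happens inside Superlinked/Qdrant:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(tracks, query_vec, targets, played, k=5, w_sem=0.6, w_ac=0.4):
    """Score each unplayed track by fusing lyric-vector similarity with
    closeness to the acoustic targets, then return the top-k track ids.
    Weights and the linear acoustic-distance score are illustrative."""
    scored = []
    for t in tracks:
        if t["id"] in played:
            continue  # skip already-played tracks
        sem = cosine(t["lyric_vec"], query_vec)
        ac = 1.0 - sum(abs(t[f] - targets[f]) for f in targets) / len(targets)
        scored.append((w_sem * sem + w_ac * ac, t["id"]))
    scored.sort(reverse=True)
    return [tid for _, tid in scored[:k]]
```

Filtering the played set before scoring (rather than after top-k) guarantees the DJ never repeats a track even when the best matches have already been heard.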

Retrieval Layer β€” Superlinked hybrid search flow


Executive Layer β€” Playback & Response

The LLM's final response triggers two parallel streams: ElevenLabs TTS synthesises the DJ voice and streams audio chunks back over the WebRTC audio track; simultaneously, a JSON control_command is sent via the FastRTC data channel, triggering the hidden YouTube IFrame player in the browser. When a track ends, the queue auto-advances.

Executive Layer β€” TTS, control command, and auto-queue


Tech Stack

| Component | Technology | Role |
|---|---|---|
| Frontend | React 18 + Vite + TypeScript | 3-column DJ dashboard |
| WebRTC Transport | FastRTC | Sub-second audio streaming (browser ↔ backend) |
| Speech-to-Text | Deepgram Nova-2 (via Groq) | Transcription < 300 ms |
| LLM Reasoning | Groq · Llama 3 / Mixtral | ReAct agent: mood → acoustic parameters |
| Vector Search | Superlinked + Qdrant Cloud | Hybrid semantic + acoustic music retrieval |
| Text-to-Speech | ElevenLabs / Orpheus | DJ persona voices, streamed as WebRTC audio |
| Music Playback | YouTube IFrame API | In-browser audio; no Spotify auth required |
| Lyrics Data | LyricsGenius | Semantic lyric embeddings for search |
| Observability | Opik | Full trace of every agent interaction |
| API Framework | FastAPI | REST + WebRTC signalling server |

Getting Started

Prerequisites

  • Python 3.11+ and uv package manager
  • Node.js 18+ and npm
  • A Qdrant Cloud cluster (free tier is enough to start)
  • A Groq API key
  • A Genius API access token (for lyrics ingestion)
  • An ElevenLabs API key (for DJ voice)

Installation

# 1. Clone the repository
git clone https://github.com/your-org/sonance
cd sonance

# 2. Install backend dependencies
uv sync

# 3. Copy and configure environment variables
cp .env.example .env
# → Open .env and fill in your API keys (see Configuration below)

# 4. Install frontend dependencies
cd frontend && npm install

Configuration

Create a .env file at the project root based on .env.example:

| Variable | Required | Description |
|---|---|---|
| GROQ__API_KEY | ✅ | Groq API key for LLM reasoning and STT |
| QDRANT__CLUSTER_URL | ✅ | Your Qdrant Cloud cluster URL |
| QDRANT__API_KEY | ✅ | Your Qdrant Cloud API key |
| GENIUS__ACCESS_TOKEN | ✅ | Genius API token for lyrics ingestion |
| ELEVENLABS__API_KEY | ✅ | ElevenLabs API key for DJ voice TTS |
| SPOTIFY__CLIENT_ID | ⚪ | Spotify app client ID (metadata only) |
| SPOTIFY__CLIENT_SECRET | ⚪ | Spotify app client secret (metadata only) |
| HF_HOME | ⚪ | HuggingFace cache dir; point to a drive with ≥ 10 GB free |

Note

Spotify credentials are only needed during data ingestion to fetch track metadata and audio features. They are not needed to run the agent or play music.
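The double-underscore variable names (`GROQ__API_KEY`, `QDRANT__CLUSTER_URL`) suggest nested settings groups, which is how pydantic-settings' `env_nested_delimiter` works — `config.py` uses pydantic-settings, though the exact model layout is an assumption here. A minimal stdlib sketch of that parsing:

```python
def load_nested_settings(environ: dict, delimiter: str = "__") -> dict:
    """Group FOO__BAR=x style env vars into {'foo': {'bar': 'x'}},
    mirroring pydantic-settings' env_nested_delimiter behaviour.
    Illustrative only; the real config lives in config.py."""
    settings: dict = {}
    for key, value in environ.items():
        if delimiter not in key:
            continue  # flat vars like HF_HOME are handled separately
        section, field = key.lower().split(delimiter, 1)
        settings.setdefault(section, {})[field] = value
    return settings
```

So `GROQ__API_KEY=gsk_...` would populate a `groq` settings group with an `api_key` field, keeping each provider's credentials together.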


Data Ingestion

Before running the agent for the first time, populate your Qdrant cluster with the music index. The ingestion pipeline fetches tracks from curated Spotify playlists, downloads lyrics via Genius, computes Superlinked embeddings, and upserts everything into Qdrant.

Important

Run ingestion once before starting the agent. Start with --limit 5 to verify your API keys and Qdrant connectivity before a full run.

# Quick smoke-test: 5 tracks per playlist
uv run python src/sonance_agents/infrastructure/ingest.py --limit 5

# Full ingestion: ~100 tracks per playlist across 4 playlists (~400 tracks total)
uv run python src/sonance_agents/infrastructure/ingest.py --limit 100

What the pipeline does:

  1. Authenticates with Spotify via the Client Credentials flow (no browser popup)
  2. Fetches track metadata and audio features in bulk
  3. Downloads lyrics from Genius for each track
  4. Saves the raw dataset to data/seed_tracks.csv
  5. Upserts all tracks into Qdrant via Superlinked (semantic + acoustic vectors)
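The five steps above reduce to a simple orchestration loop. This sketch uses stub callables in place of the real Spotify/Genius clients and the Superlinked upsert, and every name in it is hypothetical rather than the actual `ingest.py` API:

```python
def ingest(playlists, fetch_tracks, fetch_lyrics, upsert, limit=5):
    """Fetch tracks per playlist, enrich each with lyrics, then upsert
    the whole batch. `fetch_tracks`, `fetch_lyrics`, and `upsert` stand
    in for the real Spotify, Genius, and Superlinked→Qdrant clients."""
    rows = []
    for playlist in playlists:
        for track in fetch_tracks(playlist)[:limit]:  # --limit flag
            track["lyrics"] = fetch_lyrics(track["title"], track["artist"])
            rows.append(track)
    upsert(rows)  # in the real pipeline: Superlinked embeddings → Qdrant
    return rows
```

Applying `--limit` at the per-playlist fetch (rather than globally) is what makes the 5-track smoke test exercise every playlist and every API key before a full run.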

Note

The first run downloads sentence-transformers/all-MiniLM-L6-v2 (~90 MB). Set HF_HOME to a drive with sufficient free space.


Running Sonance

Open two terminals from the project root:

Terminal 1 β€” Backend

uv run fastapi dev src/sonance_agents/api/main.py

The API is available at http://localhost:8000. The WebRTC signalling endpoint lives at http://localhost:8000/webrtc/offer.

Terminal 2 β€” Frontend

cd frontend
npm run dev

Open http://localhost:5173 in your browser. Click the voice button, start talking, and let your DJ take over.

DJ Personas

Select your DJ from the animated avatar picker in the left panel. Each persona has a distinct personality and voice:

| Avatar | Personality | Voice Style |
|---|---|---|
| DJ | Charismatic all-rounder | Energetic, confident |
| Tara | Warm & soulful | Smooth, expressive |
| Leo | Hype & energetic | Punchy, upbeat |
| Zoe | Chill & lo-fi | Laid-back, mellow |
| Mia | Romantic & dreamy | Soft, atmospheric |

Orb State Colours

The 3D orb in the center panel reflects the current agent state:

| State | Colour |
|---|---|
| Idle | Indigo / Blue-violet |
| Listening | Cyan / Sky blue |
| Thinking | Amber / Yellow |
| Talking | Purple / Violet |

Project Structure

sonance/
├── frontend/                          ← React + Vite + TypeScript dashboard
│   └── src/
│       ├── App.tsx                    ← 3-column DJ dashboard (chat · orb · now playing)
│       ├── App.css                    ← Futuristic dark theme with ambient blobs
│       ├── hooks/
│       │   ├── useWebRTC.ts           ← WebRTC connection, data channel, mic control
│       │   └── useYouTubePlayer.ts    ← YouTube IFrame player, queue, skip logic
│       └── components/ui/             ← Orb, VoiceButton, LiveWaveform, AnimatedTooltip…
│
├── src/sonance_agents/
│   ├── agent/
│   │   ├── fastrtc_agent.py           ← STT → ReAct Agent → TTS + control command dispatch
│   │   ├── stream.py                  ← VoiceAgentStream & make_control_command helpers
│   │   └── tools/
│   │       └── music_discovery.py     ← MusicDiscoveryTool: hybrid semantic + acoustic search
│   ├── avatars/
│   │   ├── base.py                    ← DJ system prompt template
│   │   └── definitions/               ← YAML persona configs (dj, tara, leo, zoe, mia)
│   ├── infrastructure/
│   │   ├── ingest.py                  ← Data ingestion pipeline (Spotify + Genius → Qdrant)
│   │   └── superlinked_integration/
│   │       ├── schema.py              ← Track schema
│   │       ├── index.py               ← Superlinked index definition
│   │       ├── query.py               ← Hybrid search query builder
│   │       └── service.py             ← MusicSearchService (query execution)
│   ├── api/
│   │   └── main.py                    ← FastAPI app: WebRTC signalling + REST endpoints
│   └── config.py                      ← Pydantic-settings (all env vars)
│
├── static/
│   ├── Sonance.png                    ← Hero banner
│   └── diagrams/                      ← Architecture diagrams
├── data/                              ← seed_tracks.csv (generated after ingestion)
└── pyproject.toml

License

This project is licensed under the MIT License — see the LICENSE file for details.
