TADA TTS Studio

A web studio for TADA TTS -- high-quality, open-source text-to-speech with voice cloning. Clone a voice from a short reference recording and generate natural-sounding speech from your browser.

Features

Voice cloning -- upload a reference audio clip (5-30s) to clone a voice
Two models -- TADA-1B (English, ~4GB) and TADA-3B (10 languages, ~9GB)
CPU/GPU support -- toggle between CPU and CUDA in the UI
Browser-based UI -- dark-themed single-page app served by Flask, no build step
Real-time download progress -- SSE streams progress when fetching models from HuggingFace
Breathing-block chunking -- long texts split into 150-200 char blocks for natural pacing
Async generation with abort -- chunked generation runs in background threads; cancel anytime
Audio enhancement -- LavaSR upscaling from 24kHz to 48kHz
Silence removal -- Silero VAD trims dead air with a configurable threshold (0.2s-1.0s)
Loudness normalization -- ffmpeg loudnorm for consistent volume
Karaoke word highlighting -- stable-ts forced alignment with real-time word tracking
Text normalization -- expands numbers, currency, abbreviations, dates, symbols
Speed control -- adjustable from 0.5x to 2.0x (persisted)
MP3 export -- WAV to MP3 conversion with live progress
3-version player -- Original, Enhanced, and Cleaned audio
Standalone forced alignment -- upload any audio plus its transcript to get word-level timestamps
Unified library -- TTS and alignment history with filter tabs
Soft delete -- deleted files are moved to a TRASH folder instead of being removed
Dark theme -- always-dark UI with navy/teal/coral palette
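
The breathing-block chunking in the list above can be pictured with a small sketch. It is not the code in backend.py: the function name split_into_blocks and the splitting rule (pack whole sentences up to a 200-character ceiling, wrapping an oversized sentence on word boundaries) are assumptions chosen for illustration.

```python
import re
import textwrap


def split_into_blocks(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into blocks of at most max_chars characters.

    Hypothetical helper; the real chunker may use different rules.
    A single word longer than max_chars is kept whole and may exceed the limit.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    blocks: list[str] = []
    current = ""
    for sentence in sentences:
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            # Oversized sentence: wrap on whitespace only, never inside a word.
            pieces = textwrap.wrap(
                sentence,
                width=max_chars,
                break_long_words=False,
                break_on_hyphens=False,
            )
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate  # piece still fits in the open block
            else:
                if current:
                    blocks.append(current)
                current = piece  # start a new block
    if current:
        blocks.append(current)
    return blocks
```

Packing whole sentences keeps each block ending at a natural pause, which is the point of splitting for pacing rather than at a fixed character count.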

Quick Start

REM 1. Setup (creates venv, installs dependencies)
setup.bat

REM 2. Run (starts server, opens browser)
runner.bat

Or manually:

python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
python backend.py

How It Works

Voice Cloning

1. Upload a reference audio file (WAV, MP3, FLAC, or OGG); 5-30 seconds is recommended.
2. Optionally provide a transcript of the reference audio.
3. The TADA encoder processes the reference to create a voice prompt.
4. Reuse the saved voice profile for all future generations.
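
Before uploading, a reference clip can be checked against the recommended 5-30 second range. The following is a minimal sketch using only the Python standard library; it handles WAV files only, and the helper names are hypothetical rather than part of backend.py.

```python
import wave

# Recommended reference length from the steps above, in seconds.
MIN_SECONDS, MAX_SECONDS = 5.0, 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file as frames divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(path: str) -> bool:
    """True when the clip falls inside the recommended 5-30 second range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

MP3, FLAC, and OGG inputs would need a decoder such as ffmpeg to measure; that part is left out here.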

Generation Pipeline

Step 1: Generate audio (TADA) -> WAV + JSON metadata
Step 2: Enhance (LavaSR) -> 48kHz upscaled WAV
Step 3: Clean (Silero VAD) -> silence removed
Step 4: Normalize (ffmpeg loudnorm) -> consistent volume
Step 5: Convert (ffmpeg) -> MP3

Each step is optional and runs in the background. Steps 2-5 chain automatically.
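
The chaining described above can be sketched as an ordered table of optional steps, each fed the previous step's output, with an abort flag checked between steps. The step functions below are placeholders that only derive file names; the real steps call LavaSR, Silero VAD, and ffmpeg. All names here (run_pipeline, make_stub, POST_STEPS) are hypothetical.

```python
import threading
from typing import Callable

Step = Callable[[str], str]  # input path -> output path


def make_stub(suffix: str, ext: str = "wav") -> Step:
    """Return a placeholder step that only derives the output file name."""
    def step(path: str) -> str:
        stem = path.rsplit(".", 1)[0]
        return f"{stem}_{suffix}.{ext}"
    return step


# Steps 2-5 in pipeline order (dicts preserve insertion order).
POST_STEPS: dict[str, Step] = {
    "enhance": make_stub("enhanced"),
    "clean": make_stub("cleaned"),
    "normalize": make_stub("normalized"),
    "convert": make_stub("final", ext="mp3"),
}


def run_pipeline(wav_path: str, enabled: set[str], abort: threading.Event) -> list[str]:
    """Run the enabled steps in order, feeding each step the previous output."""
    outputs = [wav_path]
    for name, step in POST_STEPS.items():
        if abort.is_set():
            break  # cancelled: stop before starting the next step
        if name in enabled:
            outputs.append(step(outputs[-1]))
    return outputs


if __name__ == "__main__":
    abort = threading.Event()
    results: list[str] = []
    # Run in a background thread so the caller stays responsive.
    worker = threading.Thread(
        target=lambda: results.extend(
            run_pipeline("take1.wav", {"enhance", "convert"}, abort)
        )
    )
    worker.start()
    worker.join()
    print(results)
```

Keeping every intermediate path in the returned list mirrors the idea of offering several versions of the same take in the player.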

Project Structure

TadaTTS-Studio/
+-- backend.py          # Flask API server
+-- frontend/
|   +-- index.html      # Single-file UI (inline CSS/JS, Tailwind CDN)
+-- voices/             # Voice profiles (reference audio + metadata)
+-- generated_assets/   # All generated output
+-- models/             # (gitignored) cached model files
+-- logs/               # (gitignored) rotating log files
+-- bin/                # (optional) local ffmpeg binary

Models

TADA-1B -- ~2B parameters, English only, ~4GB

HuggingFace: HumeAI/tada-1b

TADA-3B Multilingual -- ~4B parameters, 10 languages, ~9GB

HuggingFace: HumeAI/tada-3b-ml
Languages: English, Arabic, Chinese, German, Spanish, French, Italian, Japanese, Polish, Portuguese

TADA Codec Encoder (shared, required for voice cloning)

HuggingFace: HumeAI/tada-codec

Models auto-download from HuggingFace on first use and are cached locally.

HuggingFace Setup (Required)

TADA requires a HuggingFace account and acceptance of the Llama 3.2 license:

1. Create a free account at https://huggingface.co
2. Accept the Llama 3.2 license at https://huggingface.co/meta-llama/Llama-3.2-1B
3. Create an access token at https://huggingface.co/settings/tokens (Read access)

Log in from a terminal, replacing hf_YOUR_TOKEN with your token:

venv\Scripts\python -c "from huggingface_hub import login; login(token='hf_YOUR_TOKEN')"

Model Cache Location

Models are stored in the HuggingFace cache directory:

| OS      | Default path                               |
|---------|--------------------------------------------|
| Windows | `C:\Users\<USER>\.cache\huggingface\hub\`  |
| Linux   | `~/.cache/huggingface/hub/`                |
| macOS   | `~/.cache/huggingface/hub/`                |

Inside the cache, each model has its own folder:

huggingface/hub/
+-- models--HumeAI--tada-1b/          # TADA 1B (~4GB)
+-- models--HumeAI--tada-3b-ml/       # TADA 3B (~9GB)
+-- models--HumeAI--tada-codec/       # Encoder (~1GB)

To change the cache location, set the HF_HOME environment variable before starting the server:

set HF_HOME=D:\models\huggingface
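
To check where a model should land after changing HF_HOME, the lookup can be approximated in a few lines. This is a simplified sketch, not the huggingface_hub implementation: it only considers the HF_HUB_CACHE and HF_HOME variables and ignores other overrides such as XDG_CACHE_HOME. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def hub_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Approximate the hub cache directory from environment variables."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        # An explicit hub cache path takes precedence over HF_HOME.
        return Path(env["HF_HUB_CACHE"])
    home = env.get("HF_HOME")
    root = Path(home) if home else Path.home() / ".cache" / "huggingface"
    return root / "hub"  # models live in the "hub" subfolder of HF_HOME


def model_folder(repo_id: str) -> str:
    """Map a repo id such as HumeAI/tada-1b to its cache folder name."""
    return "models--" + repo_id.replace("/", "--")


if __name__ == "__main__":
    print(hub_cache_dir() / model_folder("HumeAI/tada-1b"))
```

With HF_HOME set as in the example above, the models end up under the hub subfolder of that directory rather than directly inside it.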

Manual Download (Faster)

If auto-download is slow, use aria2 to fetch files over 16 parallel connections:

REM Download TADA-1B into the cache snapshot folder (no trailing backslash before the closing quote)
aria2c -x 16 -s 16 -d "C:\Users\<USER>\.cache\huggingface\hub\models--HumeAI--tada-1b\snapshots\<hash>" https://huggingface.co/HumeAI/tada-1b/resolve/main/model.safetensors

REM Download TADA-3B (two parts); without -d, files are saved to the current directory
aria2c -x 16 -s 16 https://huggingface.co/HumeAI/tada-3b-ml/resolve/main/model-00001-of-00002.safetensors
aria2c -x 16 -s 16 https://huggingface.co/HumeAI/tada-3b-ml/resolve/main/model-00002-of-00002.safetensors

Or download directly from the browser:

TADA-1B: https://huggingface.co/HumeAI/tada-1b/tree/main
TADA-3B: https://huggingface.co/HumeAI/tada-3b-ml/tree/main

Dependencies

hume-tada (pulls in torch, torchaudio, transformers)
flask, flask-cors
loguru
openai-whisper, stable-ts (word alignment)
LavaSR (audio enhancement)
num2words (text normalization)

Credits

Built on TADA TTS by Hume AI.
