A web studio for TADA TTS -- high-quality open-source text-to-speech with voice cloning. Clone any voice from a short reference clip and generate natural speech in your browser.
- Voice cloning -- upload a 5-30 second reference clip to clone any voice
- Two models -- TADA-1B (English, ~4GB) and TADA-3B (10 languages, ~9GB)
- CPU/GPU support -- toggle between CPU and CUDA in the UI
- Browser-based UI -- dark-themed single-page app served by Flask, no build step
- Real-time download progress -- SSE streams progress when fetching models from HuggingFace
- Breathing-block chunking -- long texts split into 150-200 char blocks for natural pacing
- Async generation with abort -- chunked generation runs in background threads; cancel anytime
- Audio enhancement -- LavaSR upscaling from 24kHz to 48kHz
- Silence removal -- Silero VAD trims dead air with configurable threshold (0.2s-1.0s)
- Loudness normalization -- ffmpeg loudnorm for consistent volume
- Karaoke word highlighting -- stable-ts forced alignment with real-time word tracking
- Text normalization -- expands numbers, currency, abbreviations, dates, symbols
- Speed control -- adjustable from 0.5x to 2.0x (persisted)
- MP3 export -- WAV to MP3 conversion with live progress
- 3-version player -- Original, Enhanced, and Cleaned audio
- Standalone force alignment -- upload any audio + transcript for word-level timestamps
- Unified library -- TTS and alignment history with filter tabs
- Soft delete -- files moved to TRASH folder
- Dark theme -- always-dark UI with navy/teal/coral palette
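The breathing-block chunking above splits long texts into 150-200 character blocks. The studio's actual splitter isn't shown here; the following is a minimal sketch assuming blocks break at sentence and clause punctuation and grow toward the target size:

```python
import re

def breathing_blocks(text: str, target: int = 150, hard_max: int = 200) -> list[str]:
    """Split text into ~150-200 character blocks, breaking at
    sentence/clause punctuation so pacing sounds natural."""
    # Split into clauses, keeping the trailing punctuation with each piece.
    pieces = re.split(r"(?<=[.!?,;:])\s+", text.strip())
    blocks, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}".strip()
        if len(candidate) <= hard_max:
            current = candidate
            # Close the block once it reaches the target size.
            if len(current) >= target:
                blocks.append(current)
                current = ""
        else:
            # Adding this clause would exceed the hard cap; start a new block.
            if current:
                blocks.append(current)
            current = piece
    if current:
        blocks.append(current)
    return blocks
```

A single clause longer than `hard_max` would pass through unsplit in this sketch; a production splitter would also need a word-level fallback for that case.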
# 1. Setup (creates venv, installs dependencies)
setup.bat
# 2. Run (starts server, opens browser)
runner.bat
Or manually:
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
python backend.py
- Upload a reference audio file (WAV, MP3, FLAC, OGG) -- 5-30 seconds recommended
- Optionally provide the transcript of the reference audio
- The TADA encoder processes the reference to create a voice prompt
- Use the voice profile for all future generations
Step 1: Generate audio (TADA) -> WAV + JSON metadata
Step 2: Enhance (LavaSR) -> 48kHz upscaled WAV
Step 3: Clean (Silero VAD) -> silence removed
Step 4: Normalize (ffmpeg loudnorm) -> consistent volume
Step 5: Convert (ffmpeg) -> MP3
Each step is optional and runs in the background. Steps 2-5 chain automatically.
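Steps 4 and 5 are plain ffmpeg invocations. A sketch of how the commands might be assembled; the loudnorm targets shown (-16 LUFS / -1.5 dBTP / LRA 11, a common streaming-style preset) and the MP3 bitrate are assumptions, not the studio's documented settings:

```python
def loudnorm_cmd(src: str, dst: str, ffmpeg: str = "ffmpeg") -> list[str]:
    """Build an ffmpeg loudnorm (EBU R128) command.
    Targets here are a common streaming preset, assumed for illustration."""
    return [ffmpeg, "-y", "-i", src,
            "-af", "loudnorm=I=-16:TP=-1.5:LRA=11",
            dst]

def mp3_cmd(src: str, dst: str, bitrate: str = "192k", ffmpeg: str = "ffmpeg") -> list[str]:
    """Build a WAV -> MP3 conversion command; the bitrate is an assumption."""
    return [ffmpeg, "-y", "-i", src,
            "-codec:a", "libmp3lame", "-b:a", bitrate,
            dst]
```

Each list can be passed straight to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with paths containing spaces.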
TadaTTS-Studio/
+-- backend.py # Flask API server
+-- frontend/
| +-- index.html # Single-file UI (inline CSS/JS, Tailwind CDN)
+-- voices/ # Voice profiles (reference audio + metadata)
+-- generated_assets/ # All generated output
+-- models/ # (gitignored) cached model files
+-- logs/ # (gitignored) rotating log files
+-- bin/ # (optional) local ffmpeg binary
TADA-1B -- ~2B parameters, English only, ~4GB
- HuggingFace: HumeAI/tada-1b
TADA-3B Multilingual -- ~4B parameters, 10 languages, ~9GB
- HuggingFace: HumeAI/tada-3b-ml
- Languages: English, Arabic, Chinese, German, Spanish, French, Italian, Japanese, Polish, Portuguese
TADA Codec Encoder (shared, required for voice cloning)
- HuggingFace: HumeAI/tada-codec
Models auto-download from HuggingFace on first use and are cached locally.
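During that first download the backend reports progress to the browser over SSE. The studio's actual endpoint isn't shown; a minimal sketch of the SSE wire format (one `data:` line per event, blank-line terminated) with a hypothetical payload shape:

```python
import json

def sse_event(payload: dict, event: str = "") -> str:
    """Serialize one Server-Sent Event frame: an optional `event:` line,
    a `data:` line with JSON, and the blank-line terminator."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(payload)}")
    return "\n".join(lines) + "\n\n"

def download_progress(total_bytes: int, chunk_sizes):
    """Yield SSE frames as chunks arrive (payload fields are assumptions)."""
    done = 0
    for size in chunk_sizes:
        done += size
        yield sse_event({"downloaded": done, "total": total_bytes,
                         "percent": round(100 * done / total_bytes, 1)})
```

In Flask, such a generator would typically be wrapped in `Response(gen, mimetype="text/event-stream")` so the browser's `EventSource` can consume it.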
TADA requires a HuggingFace account and Llama 3.2 license acceptance:
- Create a free account at https://huggingface.co
- Accept the Llama 3.2 license at https://huggingface.co/meta-llama/Llama-3.2-1B
- Create an access token at https://huggingface.co/settings/tokens (Read access)
- Login via terminal:
venv\Scripts\python -c "from huggingface_hub import login; login(token='hf_YOUR_TOKEN')"
Models are stored in the HuggingFace cache directory:
| OS | Default Path |
|---|---|
| Windows | C:\Users\<USER>\.cache\huggingface\hub\ |
| Linux | ~/.cache/huggingface/hub/ |
| macOS | ~/.cache/huggingface/hub/ |
Inside the cache, each model has its own folder:
huggingface/hub/
+-- models--HumeAI--tada-1b/ # TADA 1B (~4GB)
+-- models--HumeAI--tada-3b-ml/ # TADA 3B (~9GB)
+-- models--HumeAI--tada-codec/ # Encoder (~1GB)
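The cache location above can be resolved programmatically the way `huggingface_hub` does it: `HF_HUB_CACHE` wins if set, then `HF_HOME/hub`, then the per-OS default. A small sketch:

```python
import os
from pathlib import Path

def hf_hub_cache() -> Path:
    """Resolve the HuggingFace hub cache directory, mirroring
    huggingface_hub's precedence: HF_HUB_CACHE > HF_HOME/hub > default."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    if "HF_HOME" in os.environ:
        return Path(os.environ["HF_HOME"]) / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"
```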
To change the cache location, set the HF_HOME environment variable before starting the server:
set HF_HOME=D:\models\huggingface
If auto-download is slow, use aria2 with 16 parallel connections:
# Download TADA-1B
aria2c -x 16 -s 16 -d "C:\Users\<USER>\.cache\huggingface\hub\models--HumeAI--tada-1b\snapshots\<hash>\" https://huggingface.co/HumeAI/tada-1b/resolve/main/model.safetensors
# Download TADA-3B (two parts)
aria2c -x 16 -s 16 https://huggingface.co/HumeAI/tada-3b-ml/resolve/main/model-00001-of-00002.safetensors
aria2c -x 16 -s 16 https://huggingface.co/HumeAI/tada-3b-ml/resolve/main/model-00002-of-00002.safetensors
Or download directly from the browser:
- TADA-1B: https://huggingface.co/HumeAI/tada-1b/tree/main
- TADA-3B: https://huggingface.co/HumeAI/tada-3b-ml/tree/main
- hume-tada (pulls torch, torchaudio, transformers)
- flask, flask-cors
- loguru
- openai-whisper, stable-ts (word alignment)
- LavaSR (audio enhancement)
- num2words (text normalization)
Built on TADA TTS by Hume AI.