Small Python project to batch-generate Russian "объяснительные" voice notes using the ElevenLabs HTTP API.
- Input: `scripts/samples.json` or `scripts/samples.txt`
- Output: `out/*.mp3`
- Optional conversion: `out_opus/*.ogg` (Opus via `ffmpeg`)
- Python 3.9+
- ElevenLabs API key
- Internet access
- Optional: `ffmpeg` for Opus conversion
```
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
Copy-Item .env.example .env
```
Then open `.env` and set `ELEVENLABS_API_KEY`.
```
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
```
Then edit `.env` and set `ELEVENLABS_API_KEY`.
```
python list_voices.py
```
This prints each voice name and `voice_id` so you can pick a suitable male voice.
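Under the hood, listing voices presumably maps to the ElevenLabs `GET /v1/voices` endpoint. A minimal standard-library sketch of that call (the helper names here are illustrative, not `list_voices.py`'s actual code):

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"

def list_voices_request(api_key: str) -> urllib.request.Request:
    # GET /v1/voices, authenticated with the xi-api-key header.
    return urllib.request.Request(f"{API_BASE}/voices",
                                  headers={"xi-api-key": api_key})

def format_voice(voice: dict) -> str:
    # One "name: voice_id" line per voice, like the script's output.
    return f"{voice['name']}: {voice['voice_id']}"

req = list_voices_request("YOUR_KEY")
# Requires network access and a valid key:
# with urllib.request.urlopen(req) as resp:
#     for v in json.load(resp)["voices"]:
#         print(format_voice(v))
```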
Your account list may show mostly default US/UK voices. To search the shared Voice Library:
```
python list_voices.py --shared --language ru --gender male --page-size 20
```
This prints both `voice_id` and owner (`public_owner_id`).
Add one shared voice to your account:
```
python list_voices.py --add-owner-id <PUBLIC_OWNER_ID> --add-voice-id <VOICE_ID> --new-name "RU Male 1"
```
Then verify it appears in your account:
```
python list_voices.py
```
Important:
- Adding shared voices via the API requires an API key with the `voices_write` permission.
- Using Voice Library voices via the API requires a paid ElevenLabs plan (the free plan returns HTTP 402 for library voices).
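The add step corresponds to a single API call. A hedged sketch, assuming the ElevenLabs `POST /v1/voices/add/{public_user_id}/{voice_id}` endpoint; `build_add_request` is a hypothetical helper that only assembles the request so it can be inspected before sending:

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"

def build_add_request(api_key: str, owner_id: str, voice_id: str, new_name: str):
    # Assemble (url, headers, body) for POST /v1/voices/add/{owner}/{voice}.
    url = f"{API_BASE}/voices/add/{owner_id}/{voice_id}"
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    body = json.dumps({"new_name": new_name}).encode("utf-8")
    return url, headers, body

url, headers, body = build_add_request("YOUR_KEY", "OWNER_ID", "VOICE_ID", "RU Male 1")
# Requires network access, a valid key, and the voices_write permission:
# urllib.request.urlopen(urllib.request.Request(url, data=body,
#                                               headers=headers, method="POST"))
```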
```
python tts_batch.py --voice-id <VOICE_ID>
```
If `ELEVENLABS_VOICE_ID` is set in `.env`, `--voice-id` is optional.
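For reference, each synthesized clip roughly corresponds to one ElevenLabs `POST /v1/text-to-speech/{voice_id}` request. This sketch assumes that endpoint and its `output_format` query parameter; it is illustrative, not `tts_batch.py`'s actual code:

```python
import json
import urllib.request

def tts_request(api_key: str, voice_id: str, text: str,
                model: str = "eleven_multilingual_v2",
                fmt: str = "mp3_44100_128") -> urllib.request.Request:
    # Build (but do not send) one text-to-speech request.
    payload = {
        "text": text,
        "model_id": model,
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.8},
    }
    return urllib.request.Request(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}?output_format={fmt}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = tts_request("YOUR_KEY", "VOICE_ID", "Это объяснительная.")
# Requires network access and a valid key; the response body is the MP3:
# with urllib.request.urlopen(req) as resp:
#     open("out/00_test.mp3", "wb").write(resp.read())
```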
Useful options:
```
python tts_batch.py \
  --in scripts/samples.json \
  --out out \
  --format mp3_44100_128 \
  --model eleven_multilingual_v2 \
  --stability 0.5 \
  --similarity 0.8 \
  --style 0.0 \
  --speaker-boost \
  --sleep-ms 0
```
Notes:
- `--in` supports two formats:
  - JSON: `{"scripts": ["...", "..."]}`
  - TXT: blocks separated by blank lines
- Filenames are generated as: `{index:02d}_{voice}_{slug}.mp3`
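A hypothetical sketch of how the two input formats and the filename pattern could be handled; the actual helpers in `tts_batch.py` (and its exact slug rules) may differ:

```python
import json
import re
from pathlib import Path

def load_scripts(path: str) -> list[str]:
    # JSON: {"scripts": [...]}; TXT: blocks separated by blank lines.
    text = Path(path).read_text(encoding="utf-8")
    if path.endswith(".json"):
        return json.loads(text)["scripts"]
    return [block.strip() for block in text.split("\n\n") if block.strip()]

def make_filename(index: int, voice: str, script: str, words: int = 4) -> str:
    # Slug from the first few words: non-alphanumeric runs collapse to "_".
    head = " ".join(script.split()[:words])
    slug = re.sub(r"[^\w]+", "_", head).strip("_").lower()
    return f"{index:02d}_{voice}_{slug}.mp3"
```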
```
python convert_to_opus.py
```
This scans `out/*.mp3` and creates `out_opus/*.ogg` via:
```
ffmpeg -y -i input.mp3 -c:a libopus -b:a 24k -vbr on output.ogg
```
If ffmpeg is missing, the script prints installation guidance and exits.
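The conversion loop can be sketched in a few lines. `opus_cmd` and `convert_all` are illustrative names, not the script's actual functions; the sketch assumes `ffmpeg` is on `PATH`:

```python
import shutil
import subprocess
from pathlib import Path

def opus_cmd(src: Path, dst: Path) -> list[str]:
    # Mirrors the ffmpeg invocation shown above.
    return ["ffmpeg", "-y", "-i", str(src), "-c:a", "libopus",
            "-b:a", "24k", "-vbr", "on", str(dst)]

def convert_all(src_dir: str = "out", dst_dir: str = "out_opus") -> None:
    # Bail out with guidance if ffmpeg is not installed.
    if shutil.which("ffmpeg") is None:
        raise SystemExit("ffmpeg not found; install it and re-run")
    Path(dst_dir).mkdir(exist_ok=True)
    for mp3 in sorted(Path(src_dir).glob("*.mp3")):
        subprocess.run(opus_cmd(mp3, Path(dst_dir) / (mp3.stem + ".ogg")),
                       check=True)
```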
A static webpage is included at `web/index.html` that mimics the original picture's style and renders a downsampled waveform from an MP3.
Default audio source: `out/compare_liam_best/01_liam_script.mp3`
Run a local server from the project root:
```
python -m http.server 8000
```
Then open:
```
http://localhost:8000/web/
```
Optional: point the page at a different MP3 file via the `audio` query parameter:
```
http://localhost:8000/web/?audio=../out/free_chris/01_script.mp3
```
Use the helper script to generate and score candidate texts against a target pattern:
- quiet intro: ~2-3s
- loud explanation: ~10-12s
- short bridge/pause phrase: ~2s
- calmer ending/apology: ~3-4s
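One way such a ranking could work (illustrative only, not the script's actual scorer) is to compare measured segment durations against the midpoints of the target ranges above:

```python
# Target segment durations in seconds: intro, explanation, bridge, ending
# (midpoints of the ranges listed above).
TARGET = [2.5, 11.0, 2.0, 3.5]

def score(segments: list[float]) -> float:
    # Lower is better; a mismatched segment count is disqualifying.
    if len(segments) != len(TARGET):
        return float("inf")
    return sum(abs(s - t) for s, t in zip(segments, TARGET))
```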
Run:
```
python match_waveform_phrase.py --voice-id TX3LPaxmHKxFdv7VOQHJ --in scripts/waveform_candidates.txt --out out/wave_match --target-seconds 20
```
Outputs:
- Generated candidates: `out/wave_match/*.mp3`
- Ranking report: `out/wave_match/match_report.json`
- Best text: `scripts/one_example_best.txt`
Do not imitate real people without permission. Prefer generic/synthetic voices for parody, fiction, and harmless experimentation.