Script to Voice Generator — Kokoro TTS

Welcome, script writers and authors!

Convert formatted script files into fully voiced audio using Kokoro ONNX — a high-quality, local, fully offline text-to-speech engine. No API keys, no cloud accounts, no usage limits. Everything runs on your machine.

Get it here: https://reactorcore.itch.io/script-to-voice-generator-kokoro-tts

Made by Reactorcore — https://linktr.ee/reactorcore

What It Does

Script to Voice Generator reads a formatted .txt or .md script file and:

Converts each dialogue line to speech using Kokoro TTS (49 built-in voices across 8 languages, plus custom blends).
Saves individual clips for each line — both clean (TTS only) and effects-processed.
Merges all clips into a single audio file, with smart pauses based on punctuation.
Produces both a raw merge and a loudness-normalized merge.
Generates a reference sheet listing every clip filename and its spoken text.

Multiple speakers are supported. Each speaker gets their own voice, pitch, speed, and audio effects settings, stored in character_profiles.json so they're remembered between sessions.

System Requirements

Windows 11 — built and tested on Windows 11.
Windows 10 — untested, use at your own risk.
Linux / macOS — the compiled .exe is Windows-only. No Linux or macOS build is available.
GPU — uses DirectML, which automatically uses your GPU if available and falls back to CPU if not. No setup required either way.

What You Need

Kokoro model files — Required for TTS generation.

The app expects these two files in a kokoro_tts/ folder next to the program:

kokoro_tts/
├── kokoro-v1.0.fp16.onnx
└── voices-v1.0.bin

Download links are available on the project's release page or from the Kokoro ONNX project.
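Before launching, the folder layout above can be checked with a few lines of Python. This is a minimal sketch, not the app's own startup check: the file names come from the layout shown above, while the helper name `missing_model_files` is made up for illustration.

```python
from pathlib import Path

# File names taken from the layout above; the helper itself is hypothetical.
REQUIRED_MODEL_FILES = ("kokoro-v1.0.fp16.onnx", "voices-v1.0.bin")


def missing_model_files(program_dir):
    """Return the required model files absent from <program_dir>/kokoro_tts/."""
    model_dir = Path(program_dir) / "kokoro_tts"
    return [name for name in REQUIRED_MODEL_FILES
            if not (model_dir / name).is_file()]


missing = missing_model_files(".")
if missing:
    print("Missing from kokoro_tts/:", ", ".join(missing))
else:
    print("Both Kokoro model files found.")
```

Run it from the program folder; an empty result means both files are in place.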

FFMPEG — Required for audio effects and merging.

Automatic installer (recommended): https://reactorcore.itch.io/ffmpeg-to-path-installer
Manual install: https://ffmpeg.org/download.html — add to system PATH after installing.
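To confirm that FFMPEG is reachable through PATH after either install method, a small check like the following may help. It is a sketch using only the Python standard library; `find_ffmpeg` is an illustrative name, not a function from this project.

```python
import shutil


def find_ffmpeg():
    """Return the full path of the ffmpeg executable on PATH, or None."""
    return shutil.which("ffmpeg")


path = find_ffmpeg()
print(f"FFMPEG found at {path}" if path else "FFMPEG is not on PATH")
```

If it reports that FFMPEG is not on PATH right after installing, open a new terminal first, since PATH changes only apply to newly started processes.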

Python 3.x — Required to run from source (not needed if using the compiled .exe). Use build_exe.bat to build to a single .exe in one click.

Quick Start

1. Write or prepare a script file

Scripts are .txt or .md files. Each spoken line uses the format:

SpeakerID: Dialogue text goes here.

Example:

# My Short Film

Alex: Hey, are you okay?
Jordan: Yeah, I'm fine. [sighs] Just tired.
(1.0s)
Alex: You sure? You look pale.
Jordan: I said I'm fine.

See Script Format below for full syntax details.

2. Load the script (Tab 1)

Launch the program and click Open Script File.
The parser checks for formatting errors and lists them in the log.
Fix any errors in your text editor and click Reload Script.
When the parse log shows no errors, click Continue →.

3. Configure voices (Tab 2)

Each detected speaker gets a panel with:

Voice — Choose from 49 built-in Kokoro voices across 8 languages (English US/UK, Chinese, Spanish, French, Hindi, Italian, Portuguese), plus any custom blended voices you've created in Tab 5. Custom blends appear with a ★ prefix.
Speed — Speaking rate from 0.5× to 2.0× (Kokoro native speed parameter).
Pitch — Multiplier from ×0.5 to ×2.0 (FFMPEG rubberband pitch shift). Default ×1.0 = no shift.
Level — 5–100% relative volume. 100% = full normalized output (default). Reduce to make a speaker quieter in the mix.
Yell Impact — Slows down single-word exclamatory lines (e.g. YES!). Makes such lines sound more deliberate and impactful. Set per speaker.
Audio Effects — Radio, Reverb, Distortion, Telephone, Robot Voice, Cheap Mic, Underwater, Megaphone, Worn Tape, Intercom, Alien Voice, Cave, and Pitch Shift. Most effects have Off / Mild / Medium / Strong levels.

Use Test Voice to generate a quick preview clip and hear the settings immediately.

Settings auto-save to character_profiles.json on every change, so known speakers are recalled automatically next session.
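The save-on-every-change behaviour can be pictured roughly as below. This is only a sketch: the actual layout of character_profiles.json is not documented here, so the keys (`voice`, `speed`, `pitch`, `level`), the example voice ID, and all three function names are assumptions.

```python
import json
from pathlib import Path

# Hypothetical per-speaker defaults; the real file's keys may differ.
DEFAULT_PROFILE = {"voice": "af_heart", "speed": 1.0, "pitch": 1.0, "level": 100}


def load_profiles(path):
    """Read all saved profiles, or return an empty dict if the file is absent."""
    path = Path(path)
    if not path.is_file():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def merge_profile(profiles, speaker, **changes):
    """Return a copy of profiles with one speaker's settings updated."""
    updated = dict(profiles)
    updated[speaker] = {**DEFAULT_PROFILE, **profiles.get(speaker, {}), **changes}
    return updated


def save_profiles(path, profiles):
    """Write every profile back to disk as readable JSON."""
    Path(path).write_text(json.dumps(profiles, indent=2), encoding="utf-8")
```

Calling `save_profiles` after each `merge_profile` mirrors the auto-save idea: a speaker seen in an earlier session is found again by name on the next load.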

4. Generate (Tab 3)

Enter a Project Name (used as a filename prefix, 20 chars max).
Choose an Output Folder.
Click Generate All and confirm.

The generation log shows progress. When done, all files appear in the output folder:

output_folder/
├── clips_clean/          ← Raw TTS clips (no FFMPEG effects)
│   └── project_0001_Speaker_line-text.mp3
├── clips_effect/         ← Effects-processed clips (peak-normalized)
│   └── project_0001_Speaker_line-text.mp3
├── sfx/                  ← Processed SFX copies (only if SFX effects active)
├── !project_merged_pure.mp3        ← Merged audio, no normalization
├── !project_merged_loudnorm.mp3    ← Merged audio, loudness-normalized
└── project_reference.txt           ← Line-by-line reference sheet
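The clip names in the tree above follow a prefix, a four-digit line number, the speaker, and a short slug of the line text. A sketch of how such a name could be built is shown here; the four-word slug limit and the `clip_filename` helper are assumptions, and the app's real naming rules may differ in detail.

```python
import re


def clip_filename(project, index, speaker, text, max_words=4):
    """Build a name shaped like project_0001_Speaker_line-text.mp3."""
    # Keep the first few words of the line, lowercased, as a readable slug.
    words = re.findall(r"[A-Za-z0-9']+", text.lower())[:max_words]
    slug = "-".join(w.replace("'", "") for w in words) or "line"
    # Drop characters that are awkward in file names.
    safe_speaker = re.sub(r"[^A-Za-z0-9_-]", "", speaker)
    # The project name is capped at 20 characters, as in Tab 3.
    return f"{project[:20]}_{index:04d}_{safe_speaker}_{slug}.mp3"


print(clip_filename("project", 1, "Alex", "Hey, are you okay?"))
```

Zero-padding the line number keeps the clips sorted in script order in any file browser.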

Voice Blender (Tab 5)

Tab 5 lets you create custom voices by blending two or three Kokoro base voices together.

Pick Voice A and Voice B (and optionally Voice C).
Set the blend ratio using the sliders (e.g. 70% A / 30% B).
Click Test Blend to hear a preview.
Enter a name and click Save Blend.

Saved blends appear in Tab 2 voice dropdowns with a ★ prefix, ready to use like any built-in voice.
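Conceptually, a blend is a weighted mix of the base voices. The sketch below assumes each voice can be represented as a list of numbers and that blending is a plain weighted average; both are simplifying assumptions, and `blend_voices` is an illustrative name rather than the function used in voice_blender.py.

```python
def blend_voices(vectors, weights):
    """Weighted average of two or three equal-length voice vectors.

    Weights may be given as percentages; they are normalized to sum to 1.
    """
    if len(vectors) != len(weights) or not 2 <= len(vectors) <= 3:
        raise ValueError("expected two or three voices with one weight each")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all voice vectors must have the same length")
    return [
        sum(w / total * v[i] for v, w in zip(vectors, weights))
        for i in range(length)
    ]


# 70% Voice A, 30% Voice B, using toy two-number "voices".
print(blend_voices([[1.0, 0.0], [0.0, 1.0]], [70, 30]))
```

Normalizing the weights means slider values do not have to add up to exactly 100.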

Multilingual Voices

Kokoro v1.0 includes voices in 8 languages. All are available in the Tab 2 voice dropdown — no extra model files needed.

Language	Voice count
American English	20
British English	8
Mandarin Chinese	8
Spanish	3
French	1
Hindi	4
Italian	2
Brazilian Portuguese	3

The correct language phonemization is selected automatically based on the voice — no settings to change.
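One plausible way to pick the language from the voice is the first letter of the voice ID, which in upstream Kokoro names such as `af_heart` or `bm_george` indicates the language. The sketch below assumes this app keeps those upstream IDs; the lookup function is hypothetical, and the counts are copied from the table above.

```python
# First letter of an upstream Kokoro voice ID -> language name from the table.
LANGUAGE_BY_PREFIX = {
    "a": "American English",
    "b": "British English",
    "z": "Mandarin Chinese",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "p": "Brazilian Portuguese",
}

# Voice counts per language, as listed in the table above.
VOICE_COUNTS = {
    "American English": 20, "British English": 8, "Mandarin Chinese": 8,
    "Spanish": 3, "French": 1, "Hindi": 4, "Italian": 2,
    "Brazilian Portuguese": 3,
}


def language_for_voice(voice_id):
    """Return the language for a voice ID, or None if the prefix is unknown."""
    return LANGUAGE_BY_PREFIX.get(voice_id[:1].lower())


print(language_for_voice("af_heart"), sum(VOICE_COUNTS.values()))
```

The counts add up to the 49 built-in voices mentioned throughout this README.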

Script Format

Dialogue lines

SpeakerID: Spoken text goes here.

SpeakerID must be 20 characters or fewer. Allowed: letters, numbers, spaces, hyphens, underscores.
All text after the first colon is spoken. Additional colons in the line are fine.
Lines longer than 500 characters produce a parse error.
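The dialogue rules above can be expressed as a small parser. This is a sketch, not the code in script_parser.py: it assumes ASCII letters only, applies the 500-character limit to the whole line, and uses a made-up function name.

```python
import re

MAX_SPEAKER_LEN = 20
MAX_LINE_LEN = 500

# Letters, digits, spaces, hyphens and underscores, then the first colon.
DIALOGUE_RE = re.compile(r"^([A-Za-z0-9 _-]+):(.*)$")


def parse_dialogue(line):
    """Return (speaker, text), None for a non-dialogue line,
    or raise ValueError when a rule is broken."""
    if len(line) > MAX_LINE_LEN:
        raise ValueError(f"line exceeds {MAX_LINE_LEN} characters")
    match = DIALOGUE_RE.match(line)
    if match is None:
        return None
    speaker = match.group(1).strip()
    if not speaker:
        return None
    if len(speaker) > MAX_SPEAKER_LEN:
        raise ValueError(f"SpeakerID exceeds {MAX_SPEAKER_LEN} characters")
    # Everything after the first colon is spoken, later colons included.
    return speaker, match.group(2).strip()


print(parse_dialogue("Alex: Note to self: stay calm."))
```

Because the SpeakerID pattern cannot contain a colon, the match always stops at the first one, which is why extra colons in the spoken text are harmless.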

Headings

# Scene title
## Sub-scene

Headings are treated as metadata: they set the script title and are not voiced.

Comments

// This is a comment
/* Multi-line
   comment */

Not voiced. Useful for stage directions, notes, or commented-out listener responses.

Pauses

(1.5s)
(pause 2.0)
(0.8)

Any line consisting only of parentheses around a number inserts a silent pause in the merged audio. The number is in seconds; an optional leading "pause" or trailing "s" is accepted, as in the examples above.
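The three pause forms shown above can be recognized with one regular expression. The pattern below is a sketch that covers exactly those forms; the real parser may accept more variants, and `parse_pause` is an illustrative name.

```python
import re

# Matches (1.5s), (pause 2.0) and (0.8); the number is captured in seconds.
PAUSE_RE = re.compile(
    r"^\(\s*(?:pause\s+)?(\d+(?:\.\d+)?)\s*s?\s*\)$", re.IGNORECASE
)


def parse_pause(line):
    """Return the pause length in seconds, or None if the line is not a pause."""
    match = PAUSE_RE.match(line.strip())
    return float(match.group(1)) if match else None


for example in ["(1.5s)", "(pause 2.0)", "(0.8)", "Alex: (wait)"]:
    print(example, "->", parse_pause(example))
```

A line such as `Alex: (wait)` is not a pause, because the parentheses must make up the entire line.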

Sound effects

{play filename.mp3, c1, loop}
{stop c1}
{stop all}
{play explosion.wav, c2, once}

Sound effect events are placed in the merge timeline at the correct position. Sound effect files must exist in the SFX folder specified in Tab 2.
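The `{play ...}` and `{stop ...}` forms above break down into a file, a channel, and a mode. A sketch of that breakdown follows; the dictionary keys and the function name are assumptions made for illustration, not the app's internal data model.

```python
import re

PLAY_RE = re.compile(r"^\{\s*play\s+(.+?)\s*,\s*(\w+)\s*,\s*(loop|once)\s*\}$")
STOP_RE = re.compile(r"^\{\s*stop\s+(\w+)\s*\}$")


def parse_sfx(line):
    """Return a dict describing a play/stop event, or None for other lines."""
    line = line.strip()
    match = PLAY_RE.match(line)
    if match:
        return {"action": "play", "file": match.group(1),
                "channel": match.group(2), "mode": match.group(3)}
    match = STOP_RE.match(line)
    if match:
        # The channel is a name such as c1, or the keyword "all".
        return {"action": "stop", "channel": match.group(1)}
    return None


print(parse_sfx("{play explosion.wav, c2, once}"))
print(parse_sfx("{stop all}"))
```

Naming the channel in `{play}` is what lets a later `{stop c1}` end one looping sound while others keep playing.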

Note: If a sound effect is the very last item in your script, it needs a pause after it to be heard in the merged audio. Add a pause line, such as (2.0s), that is at least as long as the sound effect, immediately after the {play} line. Without it, the base audio ends at the moment the SFX starts, and the SFX is cut off.

Like this:

Rei: Signing off.
{play cloth.wav, c1, once}
(2.0s)

Supported formats — Any audio format FFMPEG can read: .mp3, .wav, .ogg, .flac, .aac, .m4a, and others. The filename in your script must match the actual file exactly (including extension).

Inner thoughts

SpeakerID: (( This line is an inner thought. ))

Wrapping dialogue in double parentheses marks it as an inner thought. Inner thought lines are voiced with a special filtering effect configured in Tab 4 (Dissociated, Whisper, or Dreamlike presets, or Custom). The filter runs on top of all the speaker's regular effects.
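Detecting an inner thought comes down to checking whether the spoken text is fully wrapped in double parentheses. The following sketch works on the text part of a dialogue line (after the SpeakerID and colon); the function name is hypothetical.

```python
import re

# The whole text must be wrapped: (( ... ))
INNER_RE = re.compile(r"^\(\(\s*(.*?)\s*\)\)$", re.DOTALL)


def split_inner_thought(text):
    """Return (is_inner_thought, spoken_text) for a line's text."""
    stripped = text.strip()
    match = INNER_RE.match(stripped)
    if match:
        return True, match.group(1)
    return False, stripped


print(split_inner_thought("(( This line is an inner thought. ))"))
print(split_inner_thought("Just a normal line (with an aside)."))
```

Ordinary single parentheses inside a line do not trigger the effect, since the pattern requires the doubled form at both ends.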

Inline notation

[brackets] on a dialogue line are stripped before TTS — use for performance notes, sound effect cues, or character direction for human voice actors.
**bold**, _italic_, and ~~strikethrough~~ markers are stripped before TTS (Kokoro does not use SSML emphasis).
// after dialogue text starts an inline comment; everything after it is stripped. A space before // is required (so URLs are not accidentally stripped).
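The three stripping rules above can be sketched as a chain of regular-expression substitutions. The order of the steps and the whitespace cleanup at the end are assumptions; the app's own implementation may handle edge cases differently.

```python
import re


def clean_for_tts(text):
    """Remove inline notation so only the spoken words remain."""
    # Inline comment: whitespace is required before //, so URLs survive.
    text = re.sub(r"\s//.*$", "", text)
    # [brackets]: performance notes and cues.
    text = re.sub(r"\[[^\]]*\]", "", text)
    # Emphasis markers: keep the words, drop the markers.
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)
    text = re.sub(r"~~(.+?)~~", r"\1", text)
    # _italic_ only when the underscores are not inside a word.
    text = re.sub(r"(?<!\w)_(.+?)_(?!\w)", r"\1", text)
    # Collapse the gaps left behind by removed notation.
    return re.sub(r"\s{2,}", " ", text).strip()


print(clean_for_tts("Yeah, I'm fine. [sighs] Just tired. // deliver flat"))
```

The comment rule runs first so that notation inside a stripped comment is never processed.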

Settings Tab (Tab 4)

Silence Trim — Controls how leading/trailing silence is removed from each TTS clip. Default: trim beginning and end. Options: Off, Beginning only, End only, Beginning + End, All silence.

Merged Audio Pauses — Adjust the pause duration added after each punctuation type (period, comma, exclamation, question, hyphen, ellipsis, etc.).

Contextual Modifiers — Fine-tune how pause lengths are modified by context: speaker changes, short lines, long lines, inner thought padding, same-speaker reduction, first/last line padding.
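How a punctuation pause and a contextual modifier might combine can be sketched as follows. Every number here is a made-up placeholder, not one of the app's defaults, and only one modifier (speaker change) is modelled; the real values are whatever is set in Tab 4.

```python
# Hypothetical durations in seconds; the real defaults live in Tab 4.
PUNCTUATION_PAUSES = {
    "...": 0.8, "…": 0.8, ".": 0.5, "!": 0.5, "?": 0.6, ",": 0.25, "-": 0.3,
}
FALLBACK_PAUSE = 0.3         # line ends without listed punctuation
SPEAKER_CHANGE_EXTRA = 0.2   # extra gap when the next line has a new speaker


def pause_after(text, speaker, next_speaker):
    """Return the silence, in seconds, to insert after a spoken line."""
    stripped = text.rstrip()
    base = FALLBACK_PAUSE
    # Check longer marks first so "..." is not mistaken for ".".
    for mark in sorted(PUNCTUATION_PAUSES, key=len, reverse=True):
        if stripped.endswith(mark):
            base = PUNCTUATION_PAUSES[mark]
            break
    if next_speaker is not None and next_speaker != speaker:
        base += SPEAKER_CHANGE_EXTRA
    return round(base, 3)


print(pause_after("You sure?", "Alex", "Jordan"))
```

Explicit pause lines such as (1.0s) would be added on top of these automatic gaps.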

Inner Thoughts Effect — Choose from Whisper, Dreamlike, Dissociated presets or configure custom highpass/lowpass/echo parameters for the inner thought audio filter.

Audio Effects Reference

Effect	Description
Radio Filter	Walkie-talkie / comms radio effect. Bandpass + phaser + compression.
Reverb	Spatial depth. Configurable echo chains.
Distortion	Aggressive, gritty clipping and bit crushing.
Telephone	Lo-fi compressed sound. Narrow bandpass + bit crushing.
Robot Voice	Ring modulator for mechanical / robotic character.
Cheap Mic	Degraded quality, poor recording simulation.
Underwater	Muffled, wet, submerged sound. Lowpass + flanger.
Megaphone	Projected bullhorn. Treble-boosted, punchy, bandpassed.
Worn Tape	VHS/cassette degradation. Wow-flutter, lo-fi analog warble.
Intercom	Hallway speaker box. Flat, compressed, confined. Adds crackling static noise.
Alien Voice	Non-human vocal quality. Three variants: Insectoid, Dimensional, Warble.
Cave	Physical stone space reverb. Three variants: Tunnel, Cave, Abyss.
Pitch Shift	FFMPEG rubberband pitch shift. Multiplier ×0.5–×2.0. Works independently of speed.

Most effects have Off / Mild / Medium / Strong presets. Alien and Cave use named variants instead. Effects are combinable.
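As an illustration of the Pitch Shift row, the sketch below builds an FFMPEG command that uses the `rubberband` audio filter with its `pitch` option. It assumes an FFMPEG build with librubberband enabled; the exact filter chain the app generates is not documented here, and `pitch_shift_command` is a hypothetical helper.

```python
def pitch_shift_command(src, dst, multiplier):
    """Return an ffmpeg argument list that pitch-shifts src into dst."""
    if not 0.5 <= multiplier <= 2.0:
        raise ValueError("pitch multiplier must be between 0.5 and 2.0")
    # rubberband changes pitch without changing duration.
    return ["ffmpeg", "-y", "-i", src,
            "-af", f"rubberband=pitch={multiplier}", dst]


print(" ".join(pitch_shift_command("clip_in.mp3", "clip_out.mp3", 1.2)))
```

The list can be passed to `subprocess.run(cmd, check=True)` to perform the conversion. Combining effects amounts to joining several filters with commas inside the single `-af` argument.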

Tips

49 voices out of the box — Kokoro includes voices in 8 languages (English US/UK, Chinese, Spanish, French, Hindi, Italian, Portuguese). Audition them with the Test Voice button before committing.
Custom voice blends — Tab 5's Voice Blender lets you interpolate between two or three voices. Subtle blends (70/30) typically sound more coherent than 50/50.
Pitch is independent of speed — The pitch slider in Tab 2 uses FFMPEG rubberband pitch shifting. It does not change speaking speed and works for all Kokoro voices.
Bold and italic do NOT affect TTS output — Kokoro doesn't read SSML, and emphasis markers are stripped before TTS. Formatting carried over from a previous Google TTS workflow is harmless, but you can remove it to keep the script clean.
Test each voice before generating everything. The Test Voice button in Tab 2 saves a preview clip and opens it immediately.
Cheap Mic at Mild is a subtle effect that adds a hint of realism to otherwise very clean TTS voices. Worth trying as a default.
Prompt templates — The !docs/prompt_templates/ folder has templates for using AI chatbots to write scripts or generate voice line banks. Open them in any text editor.

Included Docs (`!docs/`)

Guides

File	Contents
`!docs/guides/Script_Writing_Guide.md`	Writing for TTS, pacing with punctuation and pauses, using effects as character design, AI-assisted workflow
`!docs/guides/Audio_Effects_Guide.md`	Full reference for all effects, preset levels, FFMPEG pipeline, Yell Impact, troubleshooting

Example Scripts

Ready-to-load .md script files — open any of them in Tab 1 to see the format in action.

File	What it demonstrates
`example_tiny.md`	Minimal 2-line script
`example_small.md`	Short 2-character scene with SFX, pause, and comments
`example_full_drama.md`	Full multi-character drama with SFX channels, inner thoughts, and scene structure
`example_monologue.md`	Single narrator, no character interaction
`example_meditation.md`	Atmospheric piece with long pauses and inner thought lines
`example_oneliners.md`	Voice bank format — one character, many independent lines by category
`example_game_scenes.md`	Multi-scene game dialogue with tactical characters, SFX, and inner thoughts

Prompt Templates

Fill-in-the-blank prompts for generating scripts with an AI chatbot. Copy, fill in characters/scenario, paste to a chatbot, save the output as a .md file, load in Tab 1.

File	Use case
`cohesive_script.md`	Continuous scene — characters talk to each other
`separate_voice_lines.md`	Voice bank — independent lines per category
`game_scene_pack.md`	Single game scene with character roles, SFX, and inner thoughts
`narrator_monologue.md`	Single narrator — story, documentary, speech, essay
`podcast_interview.md`	Two-person host/guest conversation
`ambient_narration.md`	Slow, atmospheric, mood-driven spoken word

Troubleshooting

Kokoro model not found — Make sure kokoro_tts/kokoro-v1.0.fp16.onnx and kokoro_tts/voices-v1.0.bin exist alongside the program. The model loads at startup; wait for the status bar to confirm it's ready before generating.

FFMPEG not found — Install FFMPEG and make sure it is in your system PATH. Use the automatic installer at https://reactorcore.itch.io/ffmpeg-to-path-installer then restart the program.

Parse errors on load — The parse log in Tab 1 lists every error with line numbers. Fix them in your text editor and click Reload Script.

Voice too quiet — The post-effects normalization pass ensures consistent loudness. If a speaker still sounds quiet relative to others, their Level slider may be below 100%.

Missing voice lines in output — Check the generation log in Tab 3 for per-line errors. A missing voice assignment or an FFMPEG issue on a specific line will be noted.

Test Voice not opening — The file is saved to output_test/ in the program folder. Open it manually if the auto-open fails.

Bold/italic has no effect — This is expected. Kokoro TTS does not use SSML. Emphasis markers are stripped before TTS. Write naturally with punctuation and word choice for expressive delivery instead.

Credits

Kokoro ONNX — Local TTS engine
ttkbootstrap — Modern themed tkinter UI
FFMPEG — Audio processing and merging
Script to Voice Generator — By Reactorcore

Links

Check out everything else I do: ⭐

https://linktr.ee/reactorcore
