Welcome to all script writers out there!
Convert formatted script files into fully voiced audio using Kokoro ONNX — a high-quality, local, fully offline text-to-speech engine. No API keys, no cloud accounts, no usage limits. Everything runs on your machine.
Get it here: https://reactorcore.itch.io/script-to-voice-generator-kokoro-tts
Made by Reactorcore — https://linktr.ee/reactorcore
Script to Voice Generator reads a formatted .txt or .md script file and:
- Converts each dialogue line to speech using Kokoro TTS (49 built-in voices across 8 languages, plus custom blends).
- Saves individual clips for each line — both clean (TTS only) and effects-processed.
- Merges all clips into a single audio file, with smart pauses based on punctuation.
- Produces both a raw merge and a loudness-normalized merge.
- Generates a reference sheet listing every clip filename and its spoken text.
Multiple speakers are supported. Each speaker gets their own voice, pitch, speed,
and audio effects settings, stored in character_profiles.json so they're remembered
between sessions.
- Windows 11 — built and tested on Windows 11.
- Windows 10 — untested, use at your own risk.
- Linux / macOS — the compiled `.exe` is Windows-only. No Linux or macOS build is available.
- GPU — uses DirectML, which automatically uses your GPU if available and falls back to CPU if not. No setup required either way.
Kokoro model files — Required for TTS generation.
The app expects these two files in a kokoro_tts/ folder next to the program:
kokoro_tts/
├── kokoro-v1.0.fp16.onnx
└── voices-v1.0.bin
Download links are available on the project's release page or from the Kokoro ONNX project.
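As a quick sanity check before launching, the layout above can be verified with a few lines of Python. This is only a sketch: the two file names come from the listing above, while the function name and the idea of passing the program folder as an argument are assumptions for illustration.

```python
from pathlib import Path

# File names taken from the folder layout above.
REQUIRED_MODEL_FILES = ("kokoro-v1.0.fp16.onnx", "voices-v1.0.bin")


def missing_model_files(program_dir):
    """Return the required Kokoro files absent from <program_dir>/kokoro_tts."""
    model_dir = Path(program_dir) / "kokoro_tts"
    return [name for name in REQUIRED_MODEL_FILES
            if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # "." stands in for the folder that contains the program (assumption).
    missing = missing_model_files(".")
    print("All model files present" if not missing else f"Missing: {missing}")
```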
FFMPEG — Required for audio effects and merging.
- Automatic installer (recommended): https://reactorcore.itch.io/ffmpeg-to-path-installer
- Manual install: https://ffmpeg.org/download.html — add to system PATH after installing.
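To confirm that FFMPEG is reachable through PATH, a small check like the following can help. It uses only the Python standard library; the function name is made up for this example and is not part of the program.

```python
import shutil


def find_ffmpeg():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    print(f"FFMPEG found at {path}" if path else "FFMPEG not found on PATH")
```

If this reports that FFMPEG is missing right after installing it, open a new terminal (or restart the program) so the updated PATH is picked up.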
Python 3.x — Required to run from source (not needed if using the compiled .exe).
Use build_exe.bat to build a single .exe in one click.
Scripts are .txt or .md files. Each spoken line uses the format:
SpeakerID: Dialogue text goes here.
Example:
# My Short Film
Alex: Hey, are you okay?
Jordan: Yeah, I'm fine. [sighs] Just tired.
(1.0s)
Alex: You sure? You look pale.
Jordan: I said I'm fine.
See Script Format below for full syntax details.
- Launch the program and click Open Script File.
- The parser checks for formatting errors and lists them in the log.
- Fix any errors in your text editor and click Reload Script.
- When the parse log shows no errors, click Continue →.
Each detected speaker gets a panel with:
- Voice — Choose from 49 built-in Kokoro voices across 8 languages (English US/UK, Chinese, Spanish, French, Hindi, Italian, Portuguese), plus any custom blended voices you've created in Tab 5. Custom blends appear with a ★ prefix.
- Speed — Speaking rate from 0.5× to 2.0× (Kokoro native speed parameter).
- Pitch — Multiplier from ×0.5 to ×2.0 (FFMPEG rubberband pitch shift). Default ×1.0 = no shift.
- Level — 5–100% relative volume. 100% = full normalized output (default). Reduce to make a speaker quieter in the mix.
- Yell Impact — Slows down single-word exclamatory lines (e.g. `YES!`). Makes such lines sound more deliberate and impactful. Set per speaker.
- Audio Effects — Radio, Reverb, Distortion, Telephone, Robot Voice, Cheap Mic, Underwater, Megaphone, Worn Tape, Intercom, Alien Voice, Cave, and Pitch Shift. Most effects have Off / Mild / Medium / Strong levels.
Use Test Voice to generate a quick preview clip and hear the settings immediately.
Settings auto-save to character_profiles.json on every change, so known speakers
are recalled automatically next session.
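The idea of persisting per-speaker settings in a JSON file can be sketched as below. The real schema of character_profiles.json is not documented here, so the keys (`voice`, `speed`, `pitch`, `level`) and both function names are hypothetical; only the file name comes from the text above.

```python
import json
from pathlib import Path


def load_profiles(path):
    """Load all saved speaker profiles; return an empty dict if the file is absent."""
    p = Path(path)
    if not p.is_file():
        return {}
    return json.loads(p.read_text(encoding="utf-8"))


def save_profile(path, speaker_id, settings):
    """Insert or update one speaker's settings and write the file back."""
    profiles = load_profiles(path)
    profiles[speaker_id] = settings
    Path(path).write_text(
        json.dumps(profiles, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return profiles


if __name__ == "__main__":
    # Hypothetical keys and values, for illustration only.
    save_profile("character_profiles_demo.json", "Alex",
                 {"voice": "af_heart", "speed": 1.0, "pitch": 1.0, "level": 100})
    print(load_profiles("character_profiles_demo.json"))
```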
- Enter a Project Name (used as a filename prefix, 20 chars max).
- Choose an Output Folder.
- Click Generate All and confirm.
The generation log shows progress. When done, all files appear in the output folder:
output_folder/
├── clips_clean/ ← Raw TTS clips (no FFMPEG effects)
│ └── project_0001_Speaker_line-text.mp3
├── clips_effect/ ← Effects-processed clips (peak-normalized)
│ └── project_0001_Speaker_line-text.mp3
├── sfx/ ← Processed SFX copies (only if SFX effects active)
├── !project_merged_pure.mp3 ← Merged audio, no normalization
├── !project_merged_loudnorm.mp3 ← Merged audio, loudness-normalized
└── project_reference.txt ← Line-by-line reference sheet
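The clip naming pattern shown above (project prefix, four-digit line number, speaker, shortened line text) could be reproduced roughly like this. The exact slug rules and the length cap are not stated in this document, so `max_slug_len` and the character filtering are assumptions.

```python
import re


def clip_filename(project, index, speaker, text, max_slug_len=30):
    """Build a name like project_0001_Speaker_line-text.mp3 (slug rules assumed)."""
    # Lowercase the line, turn runs of non-alphanumerics into hyphens, then trim.
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    slug = slug[:max_slug_len].rstrip("-")
    return f"{project}_{index:04d}_{speaker}_{slug}.mp3"


if __name__ == "__main__":
    print(clip_filename("project", 1, "Alex", "Hey, are you okay?"))
```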
Tab 5 lets you create custom voices by blending two or three Kokoro base voices together.
- Pick Voice A and Voice B (and optionally Voice C).
- Set the blend ratio using the sliders (e.g. 70% A / 30% B).
- Click Test Blend to hear a preview.
- Enter a name and click Save Blend.
Saved blends appear in Tab 2 voice dropdowns with a ★ prefix, ready to use like any built-in voice.
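Conceptually, a blend is a weighted average of the voices' style vectors. The sketch below shows that arithmetic with plain Python lists; it is not the program's implementation, the function name is invented, and real Kokoro voice data would be numeric arrays rather than short lists.

```python
def blend_voices(vectors, weights):
    """Weighted average of two or three equal-length voice vectors."""
    if len(vectors) != len(weights) or not 2 <= len(vectors) <= 3:
        raise ValueError("provide two or three vectors with one weight each")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same length")
    # Normalize so that 70/30 and 0.7/0.3 give the same result.
    norm = [w / total for w in weights]
    return [sum(w * v[i] for w, v in zip(norm, vectors)) for i in range(length)]


if __name__ == "__main__":
    voice_a = [1.0, 0.0]  # toy stand-ins for real voice vectors
    voice_b = [0.0, 1.0]
    print(blend_voices([voice_a, voice_b], [70, 30]))
```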
Kokoro v1.0 includes voices in 8 languages. All are available in the Tab 2 voice dropdown — no extra model files needed.
| Language | Voice count |
|---|---|
| American English | 20 |
| British English | 8 |
| Mandarin Chinese | 8 |
| Spanish | 3 |
| French | 1 |
| Hindi | 4 |
| Italian | 2 |
| Brazilian Portuguese | 3 |
The correct language phonemization is selected automatically based on the voice — no settings to change.
SpeakerID: Spoken text goes here.
- SpeakerID must be 20 characters or fewer. Allowed: letters, numbers, spaces, hyphens, underscores.
- All text after the first colon is spoken. Additional colons in the line are fine.
- Lines over 500 characters throw a parse error.
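The rules above translate into a small validator. This is a sketch under assumptions: the real parser may accept a wider set of letters (this regex is ASCII-only), and the error messages and function name are invented.

```python
import re

MAX_SPEAKER_LEN = 20
MAX_LINE_LEN = 500
# Letters, numbers, spaces, hyphens, underscores (ASCII-only is an assumption).
SPEAKER_RE = re.compile(r"[A-Za-z0-9 _-]+")


def parse_dialogue_line(line):
    """Split 'SpeakerID: text' at the first colon and validate the SpeakerID."""
    if len(line) > MAX_LINE_LEN:
        raise ValueError("line exceeds 500 characters")
    speaker, sep, text = line.partition(":")
    if not sep:
        raise ValueError("missing colon after SpeakerID")
    speaker = speaker.strip()
    if (not speaker or len(speaker) > MAX_SPEAKER_LEN
            or not SPEAKER_RE.fullmatch(speaker)):
        raise ValueError("invalid SpeakerID")
    # Everything after the first colon is spoken, including later colons.
    return speaker, text.strip()


if __name__ == "__main__":
    print(parse_dialogue_line("Alex: Time check: 10:30."))
```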
# Scene title
## Sub-scene
Heading lines are treated as metadata: they set the script title and are not voiced.
// This is a comment
/* Multi-line
comment */
Comments are not voiced. Useful for stage directions, notes, or commented-out listener responses.
(1.5s)
(pause 2.0)
(0.8)
Any line consisting only of a parenthesized number inserts a silent pause in the merged audio. The number is in seconds; an optional `pause` prefix or `s` suffix is accepted, as in the examples above.
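A regular expression covering the three pause spellings shown above might look like this. It is a sketch based only on those examples; the program's actual pattern may be stricter or more lenient.

```python
import re

# Matches "(1.5s)", "(pause 2.0)" and "(0.8)"; the line must contain nothing else.
PAUSE_RE = re.compile(r"^\(\s*(?:pause\s+)?(\d+(?:\.\d+)?)\s*s?\s*\)$", re.IGNORECASE)


def parse_pause(line):
    """Return the pause length in seconds, or None if the line is not a pause."""
    match = PAUSE_RE.match(line.strip())
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    for sample in ["(1.5s)", "(pause 2.0)", "(0.8)", "(sighs)"]:
        print(sample, parse_pause(sample))
```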
{play filename.mp3, c1, loop}
{stop c1}
{stop all}
{play explosion.wav, c2, once}
Sound effect events are placed in the merge timeline at the correct position. Sound effect files must exist in the SFX folder specified in Tab 2.
Note: If a sound effect is the very last item in your script, it needs a pause after it to actually be heard in the merged audio.
Add a (pause) line equal to or longer than the sound effect's duration immediately after the {play} line. Without it, the base audio ends at the same moment the SFX starts, and the SFX gets cut off.
Like this:
Rei: Signing off.
{play cloth.wav, c1, once}
(2.0s)
Supported formats — Any audio format FFMPEG can read: .mp3, .wav, .ogg, .flac, .aac, .m4a, and others.
The filename in your script must match the actual file exactly (including extension).
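The `{play ...}` and `{stop ...}` directives can be recognized with two patterns, as sketched below. The dictionary layout returned here is an invented structure for illustration, and the patterns are inferred from the four example lines rather than from the program's source.

```python
import re

PLAY_RE = re.compile(
    r"^\{\s*play\s+(.+?)\s*,\s*(\w+)\s*,\s*(loop|once)\s*\}$", re.IGNORECASE
)
STOP_RE = re.compile(r"^\{\s*stop\s+(\w+)\s*\}$", re.IGNORECASE)


def parse_sfx(line):
    """Parse a sound effect directive; return None for any other line."""
    line = line.strip()
    match = PLAY_RE.match(line)
    if match:
        # The filename is kept exactly as written, extension included.
        return {"action": "play", "file": match.group(1),
                "channel": match.group(2).lower(), "mode": match.group(3).lower()}
    match = STOP_RE.match(line)
    if match:
        # "all" is returned as a channel name here; callers decide what it means.
        return {"action": "stop", "channel": match.group(1).lower()}
    return None


if __name__ == "__main__":
    print(parse_sfx("{play explosion.wav, c2, once}"))
    print(parse_sfx("{stop all}"))
```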
SpeakerID: (( This line is an inner thought. ))
Wrapping dialogue in double parentheses marks it as an inner thought. Inner thought lines are voiced with a special filtering effect configured in Tab 4 (Dissociated, Whisper, or Dreamlike presets, or Custom). The filter runs on top of all the speaker's regular effects.
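Detecting the double-parenthesis wrapper is a simple pattern match. The sketch below assumes the whole dialogue text must be wrapped, as in the example above; the function name and return shape are made up for illustration.

```python
import re

INNER_RE = re.compile(r"^\(\(\s*(.*?)\s*\)\)$", re.DOTALL)


def split_inner_thought(text):
    """Return (spoken_text, is_inner_thought) for the text after 'SpeakerID:'."""
    stripped = text.strip()
    match = INNER_RE.match(stripped)
    if match:
        return match.group(1), True
    return stripped, False


if __name__ == "__main__":
    print(split_inner_thought("(( This line is an inner thought. ))"))
    print(split_inner_thought("This line is spoken aloud."))
```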
- `[brackets]` on a dialogue line are stripped before TTS — use for performance notes, sound effect cues, or character direction for human voice actors.
- `**bold**`, `_italic_`, and `~~strikethrough~~` markers are stripped before TTS (Kokoro does not use SSML emphasis).
- `//` after dialogue text starts an inline comment; everything after it is stripped. A space before `//` is required (so URLs are not accidentally stripped).
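The three stripping rules can be combined into one cleanup pass. This is an approximation for illustration, not the program's code: the order of the steps, the underscore handling, and the final whitespace collapse are assumptions.

```python
import re


def clean_for_tts(text):
    """Remove inline comments, [bracket] notes and emphasis markers from a line."""
    # Inline comment: "//" preceded by whitespace, so "https://..." survives.
    text = re.split(r"\s//", text, maxsplit=1)[0]
    # Bracketed performance notes such as [sighs].
    text = re.sub(r"\[[^\]]*\]", "", text)
    # Bold and strikethrough markers.
    text = re.sub(r"\*\*|~~", "", text)
    # Italic underscores around a word or phrase, but not snake_case names.
    text = re.sub(r"(?<!\w)_(.+?)_(?!\w)", r"\1", text)
    # Collapse the gaps left behind by removed notes.
    return re.sub(r"\s{2,}", " ", text).strip()


if __name__ == "__main__":
    print(clean_for_tts("Yeah, I'm **fine**. [sighs] Just _tired_. // flat delivery"))
```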
Silence Trim — Controls how leading/trailing silence is removed from each TTS clip. Default: trim beginning and end. Options: Off, Beginning only, End only, Beginning + End, All silence.
Merged Audio Pauses — Adjust the pause duration added after each punctuation type (period, comma, exclamation, question, hyphen, ellipsis, etc.).
Contextual Modifiers — Fine-tune how pause lengths are modified by context: speaker changes, short lines, long lines, inner thought padding, same-speaker reduction, first/last line padding.
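How a punctuation-based pause and a contextual modifier might combine can be sketched as follows. Every number here is a hypothetical placeholder (the real durations are whatever you set in the settings tab), and only the speaker-change modifier is modeled.

```python
# Hypothetical pause lengths in seconds, keyed by the line's final punctuation.
PAUSE_AFTER = {"...": 0.8, "…": 0.8, ".": 0.5, "!": 0.5, "?": 0.6, ",": 0.25, "-": 0.2}
DEFAULT_PAUSE = 0.3  # used when the line ends with none of the above (assumption)


def pause_after(text, speaker_changed=False, speaker_change_bonus=0.2):
    """Return the silence to insert after a line, in seconds."""
    stripped = text.rstrip()
    # Check longer endings first so "..." is not mistaken for a single period.
    for ending in sorted(PAUSE_AFTER, key=len, reverse=True):
        if stripped.endswith(ending):
            base = PAUSE_AFTER[ending]
            break
    else:
        base = DEFAULT_PAUSE
    return base + (speaker_change_bonus if speaker_changed else 0.0)


if __name__ == "__main__":
    print(pause_after("Wait..."))
    print(pause_after("You sure?", speaker_changed=True))
```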
Inner Thoughts Effect — Choose from Whisper, Dreamlike, Dissociated presets or configure custom highpass/lowpass/echo parameters for the inner thought audio filter.
| Effect | Description |
|---|---|
| Radio Filter | Walkie-talkie / comms radio effect. Bandpass + phaser + compression. |
| Reverb | Spatial depth. Configurable echo chains. |
| Distortion | Aggressive, gritty clipping and bit crushing. |
| Telephone | Lo-fi compressed sound. Narrow bandpass + bit crushing. |
| Robot Voice | Ring modulator for mechanical / robotic character. |
| Cheap Mic | Degraded quality, poor recording simulation. |
| Underwater | Muffled, wet, submerged sound. Lowpass + flanger. |
| Megaphone | Projected bullhorn. Treble-boosted, punchy, bandpassed. |
| Worn Tape | VHS/cassette degradation. Wow-flutter, lo-fi analog warble. |
| Intercom | Hallway speaker box. Flat, compressed, confined. Adds crackling static noise. |
| Alien Voice | Non-human vocal quality. Three variants: Insectoid, Dimensional, Warble. |
| Cave | Physical stone space reverb. Three variants: Tunnel, Cave, Abyss. |
| Pitch Shift | FFMPEG rubberband pitch shift. Multiplier ×0.5–×2.0. Works independently of speed. |
Most effects have Off / Mild / Medium / Strong presets. Alien and Cave use named variants instead. Effects are combinable.
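For reference, a pitch shift with FFMPEG's `rubberband` audio filter can be expressed as a command like the one built below. The filter and its `pitch` option are part of FFMPEG (in builds that include librubberband); the function name, the range check, and the rest of the command line are assumptions rather than the program's actual pipeline.

```python
def pitch_shift_command(src, dst, multiplier):
    """Build an ffmpeg command that shifts pitch without changing speed."""
    if not 0.5 <= multiplier <= 2.0:
        raise ValueError("multiplier must be between 0.5 and 2.0")
    return ["ffmpeg", "-y", "-i", src,
            "-af", f"rubberband=pitch={multiplier}", dst]


if __name__ == "__main__":
    # Pass the list to subprocess.run(cmd, check=True) to actually run it.
    print(" ".join(pitch_shift_command("clip.mp3", "clip_shifted.mp3", 1.2)))
```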
- 49 voices out of the box — Kokoro includes voices in 8 languages (English US/UK, Chinese, Spanish, French, Hindi, Italian, Portuguese). Audition them with the Test Voice button before committing.
- Custom voice blends — Tab 5's Voice Blender lets you interpolate between two or three voices. Subtle blends (70/30) typically sound more coherent than 50/50.
- Pitch shifting — The pitch slider in Tab 2 uses FFMPEG rubberband pitch shifting. It is independent of speaking speed and works for all Kokoro voices.
- Bold and italic do NOT affect TTS output — Kokoro doesn't read SSML. Remove emphasis formatting from your script if you added it from a previous Google TTS workflow.
- Test each voice before generating everything. The Test Voice button in Tab 2 saves a preview clip and opens it immediately.
- Cheap Mic at Mild is a subtle effect that adds a hint of realism to otherwise very clean TTS voices. Worth trying as a default.
- Prompt templates — The `!docs/prompt_templates/` folder has templates for using AI chatbots to write scripts or generate voice line banks. Open them in any text editor.
| File | Contents |
|---|---|
| `!docs/guides/Script_Writing_Guide.md` | Writing for TTS, pacing with punctuation and pauses, using effects as character design, AI-assisted workflow |
| `!docs/guides/Audio_Effects_Guide.md` | Full reference for all effects, preset levels, FFMPEG pipeline, Yell Impact, troubleshooting |
Ready-to-load .md script files — open any of them in Tab 1 to see the format in action.
| File | What it demonstrates |
|---|---|
| `example_tiny.md` | Minimal 2-line script |
| `example_small.md` | Short 2-character scene with SFX, pause, and comments |
| `example_full_drama.md` | Full multi-character drama with SFX channels, inner thoughts, and scene structure |
| `example_monologue.md` | Single narrator, no character interaction |
| `example_meditation.md` | Atmospheric piece with long pauses and inner thought lines |
| `example_oneliners.md` | Voice bank format: one character, many independent lines by category |
| `example_game_scenes.md` | Multi-scene game dialogue with tactical characters, SFX, and inner thoughts |
Fill-in-the-blank prompts for generating scripts with an AI chatbot. Copy, fill in characters/scenario, paste to a chatbot, save the output as a .md file, load in Tab 1.
| File | Use case |
|---|---|
| `cohesive_script.md` | Continuous scene: characters talk to each other |
| `separate_voice_lines.md` | Voice bank: independent lines per category |
| `game_scene_pack.md` | Single game scene with character roles, SFX, and inner thoughts |
| `narrator_monologue.md` | Single narrator: story, documentary, speech, essay |
| `podcast_interview.md` | Two-person host/guest conversation |
| `ambient_narration.md` | Slow, atmospheric, mood-driven spoken word |
Kokoro model not found — Make sure kokoro_tts/kokoro-v1.0.fp16.onnx and kokoro_tts/voices-v1.0.bin exist alongside the program. The model loads at startup; wait for the status bar to confirm it's ready before generating.
FFMPEG not found — Install FFMPEG and make sure it is in your system PATH. Use the automatic installer at https://reactorcore.itch.io/ffmpeg-to-path-installer then restart the program.
Parse errors on load — The parse log in Tab 1 lists every error with line numbers. Fix them in your text editor and click Reload Script.
Voice too quiet — The post-effects normalization pass ensures consistent loudness. If a speaker still sounds quiet relative to others, their Level slider may be below 100%.
Missing voice lines in output — Check the generation log in Tab 3 for per-line errors. A missing voice assignment or an FFMPEG issue on a specific line will be noted.
Test Voice not opening — The file is saved to output_test/ in the program folder.
Open it manually if the auto-open fails.
Bold/italic has no effect — This is expected. Kokoro TTS does not use SSML. Emphasis markers are stripped before TTS. Write naturally with punctuation and word choice for expressive delivery instead.
- Kokoro ONNX — Local TTS engine
- ttkbootstrap — Modern themed tkinter UI
- FFMPEG — Audio processing and merging
- Script to Voice Generator — By Reactorcore
Check out everything else I do: ⭐
