Skip to content

ReactorcoreGames/Script-to-Voice-Generator-EdgeTTS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Script to Voice Generator

Welcome to any script writing author out there!

You can convert formatted script files into fully voiced audio using Microsoft Edge-TTS, with per-character voice settings, audio effects, and smart merged output.

Get it here: https://reactorcore.itch.io/script-to-voice-generator

Made by Reactorcorehttps://linktr.ee/reactorcore


What You Need

FFMPEG — Required for audio effects and merging.

Python 3.x — Required to run from source (not needed if using the compiled .exe). You use to build_exe.bat to build the whole thing in a single click.


What It Does

Script to Voice Generator reads a formatted .txt or .md script file and:

  • Converts each dialogue line to speech using Edge-TTS (300+ voices).
  • Saves individual clips for each line — both clean (TTS only) and effects-processed.
  • Merges all clips into a single audio file, with smart pauses based on punctuation.
  • Produces both a raw merge and a loudness-normalized merge.
  • Generates a reference sheet listing every clip filename and its spoken text.

Multiple speakers are supported. Each speaker gets their own voice, pitch, speed, and audio effects settings, stored in character_profiles.json so they're remembered between sessions.


Quick Start

1. Write or prepare a script file

Scripts are .txt or .md files. Each spoken line uses the format:

SpeakerID: Dialogue text goes here.

Example:

# My Short Film

Alex: Hey, are you okay?
Jordan: Yeah, I'm fine. [sighs] Just tired.
(1.0s)
Alex: You sure? You look pale.
Jordan: I said I'm fine.

See Script Format below for full syntax details.

2. Load the script (Tab 1)

  1. Launch the program and click Open Script File.
  2. The parser checks for formatting errors and lists them in the log.
  3. Fix any errors in your text editor and click Reload Script.
  4. When the parse log shows no errors, click Continue →.

3. Configure voices (Tab 2)

Each detected speaker gets a panel with:

  • Voice — Choose from Edge-TTS voices. Use the filter checkbox to show only English voices, or uncheck it for all languages.
  • Pitch — Deep / Normal / High / Custom preset.
  • Speed — Percentage adjustment.
  • Level — 5–100% relative volume. 100% = full normalized output (default). Reduce to make a speaker quieter in the mix.
  • Yell Impact — Slows down single-word exclamatory lines (e.g. WOW!). (with monotone edge-tts voices it makes such single word lines sound more emphasized, impactful.)
  • Audio Effects — Radio, Reverb, Distortion, Telephone, Robot Voice, Cheap Mic. Each has Off / Mild / Medium / Strong levels.

Use Test Voice to generate a quick preview clip and hear the settings immediately.

Settings auto-save to character_profiles.json on every change, so known speakers are recalled automatically next session.

4. Generate (Tab 3)

  1. Enter a Project Name (used as a filename prefix, 20 chars max).
  2. Choose an Output Folder.
  3. Click Generate All and confirm.

The generation log shows progress. When done, all files appear in the output folder:

output_folder/
├── clips_clean/          ← Raw TTS clips (no FFMPEG effects)
│   └── project_0001_Speaker_line-text.mp3
├── clips_effect/         ← Effects-processed clips
│   └── project_0001_Speaker_line-text.mp3
├── !project_merged_pure.mp3        ← Merged audio, no normalization
├── !project_merged_loudnorm.mp3    ← Merged audio, loudness-normalized
└── project_reference.txt           ← Line-by-line reference sheet

Script Format

Dialogue lines

SpeakerID: Spoken text goes here.
  • SpeakerID must be 20 characters or fewer. Allowed: letters, numbers, spaces, hyphens, underscores.
  • All text after the first colon is spoken. Additional colons in the line are fine.
  • Lines over 500 characters throw a parse error.

Headings

# Scene title
## Sub-scene

Treated as metadata. Sets the script title. Not voiced.

Comments

// This is a comment
/* Multi-line
   comment */

Not voiced. Useful for stage directions, notes, or commented-out listener responses.

Pauses

(1.5s)
(pause 2.0)
(0.8)

Any line that is only parentheses containing a number inserts a silent pause in the merged audio. The number is in seconds.

Sound effects

{play filename.mp3, c1, loop}
{stop c1}
{stop all}
{play explosion.wav, once}

Sound effect events are placed in the merge timeline at the correct position. Sound effect files must exist in the SFX folder specified in Tab 2.

Note: If a sound effect is the very last item in your script, it needs a pause after it to actually be heard in the merged audio. Add a (pause) line equal to or longer than the sound effect's duration immediately after the {play} line. Without it, the base audio ends at the same moment the SFX starts, and the SFX gets cut off.

Like this:

Rei: Signing off.
{play cloth.wav, c1, once}
(2.0s)

Supported formats — Any audio format FFMPEG can read: .mp3, .wav, .ogg, .flac, .aac, .m4a, and others. The filename in your script must match the actual file exactly (including extension).

File size and length — No enforced limit. FFMPEG loads each SFX file into the mix at its playback position. Very large or very long SFX files will increase merge time and output file size, but will not cause errors on their own. Loop-mode SFX (loop) are automatically trimmed at the matching {stop} event, so file length only matters for once mode plays.

Inline notation

  • [brackets] on a dialogue line are stripped before TTS — use for performance notes, sound effect cues, or character direction for human voice actors.
  • bold, italic, strikethrough are stripped before TTS (markdown formatting).
  • // after dialogue text starts an inline comment; everything after it is stripped.

Settings Tab (Tab 4)

Merged Audio Pauses — Adjust the pause duration added after each punctuation type (period, comma, exclamation, question, hyphen, ellipsis, etc.).

Contextual Modifiers — Fine-tune how pause lengths are modified by context: speaker changes, short lines, long lines, inner thought padding, etc.

Inner Thoughts Effect — Choose from Whisper, Dreamlike, Dissociated presets or configure custom highpass/lowpass/echo parameters for the inner thought audio filter.


Audio Effects Reference

Effect Description
Radio Filter Walkie-talkie / comms radio effect. Bandpass + phaser + compression.
Reverb Spatial depth. Configurable echo chains.
Distortion Aggressive, gritty clipping and bit crushing.
Telephone Lo-fi compressed sound. Narrow bandpass + bit crushing.
Robot Voice Ring modulator for mechanical / robotic character.
Cheap Mic Degraded quality, poor recording simulation.
Underwater Muffled, wet, submerged sound. Lowpass + flanger.
Megaphone Projected bullhorn. Treble-boosted, punchy, bandpassed.
Worn Tape VHS/cassette degradation. Wow-flutter, lo-fi analog warble.
Intercom Hallway speaker box. Flat, compressed, confined. Adds crackling static noise.
Alien Voice Non-human vocal quality. Three variants: Insectoid, Dimensional, Warble.
Cave Physical stone space reverb. Three variants: Tunnel, Cave, Abyss.

Most effects have Off / Mild / Medium / Strong presets. Alien and Cave use named variants instead. Effects are combinable.


Tips

  • Write for TTS. Edge-TTS pronounces non-word vocalizations letter by letter — "grr" becomes "gee arr arr". Use real words instead:

    • Instead of "ugh" → "oh", "gosh", "nah"
    • Instead of "grr" → "gah", "bah", "come on"
    • Instead of "hnng" → "huh", "nope", "done"
  • Test each voice before generating everything. The Test Voice button in Tab 2 saves a preview clip and opens it immediately.

  • Cheap Mic at Mild is a subtle effect that adds a hint of realism to otherwise very clean TTS voices. Worth trying as a default.

  • Prompt templates — The !docs/prompt_templates/ folder has templates for using AI chatbots to write scripts or generate voice line banks. Open them in any text editor.


Included Docs (!docs/)

Guides

File Contents
!docs/guides/Script_Writing_Guide.md Writing for TTS, pacing with punctuation and pauses, Edge-TTS quirks, using effects as character design, AI-assisted workflow
!docs/guides/Audio_Effects_Guide.md Full reference for all 6 effects, preset levels, FFMPEG pipeline, Yell Impact, troubleshooting

Example Scripts

Ready-to-load .md script files — open any of them in Tab 1 to see the format in action.

File What it demonstrates
example_tiny.md Minimal 2-line script
example_small.md Short 2-character scene with SFX, pause, and comments
example_full_drama.md Full multi-character drama with SFX channels, inner thoughts, and scene structure
example_monologue.md Single narrator, no character interaction
example_meditation.md Atmospheric piece with long pauses and inner thought lines
example_oneliners.md Voice bank format — one character, many independent lines by category
example_game_scenes.md Multi-scene game dialogue with tactical characters, SFX, and inner thoughts

Prompt Templates

Fill-in-the-blank prompts for generating scripts with an AI chatbot. Copy, fill in characters/scenario, paste to a chatbot, save the output as a .md file, load in Tab 1.

File Use case
cohesive_script.md Continuous scene — characters talk to each other
separate_voice_lines.md Voice bank — independent lines per category
game_scene_pack.md Single game scene with character roles, SFX, and inner thoughts
narrator_monologue.md Single narrator — story, documentary, speech, essay
podcast_interview.md Two-person host/guest conversation
ambient_narration.md Slow, atmospheric, mood-driven spoken word

Troubleshooting

FFMPEG not found — Install FFMPEG and make sure it is in your system PATH. Use the automatic installer at https://reactorcore.itch.io/ffmpeg-to-path-installer then restart the program.

Parse errors on load — The parse log in Tab 1 lists every error with line numbers. Fix them in your text editor and click Reload Script.

Voice too quiet — The post-effects normalization pass ensures consistent loudness. If a speaker still sounds quiet relative to others, their Level slider may be below 100%.

Missing voice lines in output — Check the generation log in Tab 3 for per-line errors. A missing voice assignment or an FFMPEG issue on a specific line will be noted.

Test Voice not opening — The file is saved to output_test/ in the program folder. Open it manually if the auto-open fails.


Credits

  • Edge-TTS — Microsoft's text-to-speech engine
  • ttkbootstrap — Modern themed tkinter UI
  • FFMPEG — Audio processing and merging
  • Script to Voice Generator — By Reactorcore

Links

Check out everything else I do: ✨🚀 https://linktr.ee/reactorcore

About

Software that converts formatted scripts into fully voiced audio using Edge-TTS with per-character voices and effects.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors