Add Text-to-Speech (TTS) pipeline #106

Open
yangshuwe1 wants to merge 1 commit into main from shuwei-tts
@yangshuwe1
Collaborator

Overview

This PR adds a full TTS pipeline that converts math content to audio, letting students listen to problem bodies, steps, and hints read aloud.

Changes

New backend pipeline (src/math-to-speech/):

  • autoTTSProcessor.js — main script that scans all content and generates pacedSpeech fields in hint/step/problem JSON files. Supports incremental updates (skips unchanged content via hash), --force, --dry-run, and per-type filtering (--types hints,steps,problems)
  • hintProcessor.js — handles LaTeX → MathML (Python/SRE) → readable speech text conversion, with a process pool (4 Python + 8 SRE workers) for fast parallel processing (~39s for 54k hints)
  • latexToMathML.py, sreNode.js — persistent worker processes for LaTeX conversion
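
The incremental behaviour described above (skip unchanged content via a hash, with --force and --dry-run overrides) could look roughly like the sketch below. It is not the code in autoTTSProcessor.js: the `text` and `ttsHash` field names, the SHA-256 choice, and the `toSpeech` callback are all assumptions made for illustration. Only `pacedSpeech` is taken from this PR.

```javascript
const crypto = require("crypto");

// Hash the source text that the speech text is derived from.
// SHA-256 is an assumption; the PR only says "via hash".
function contentHash(text) {
  return crypto.createHash("sha256").update(text, "utf8").digest("hex");
}

// An item needs work if it is forced, has no pacedSpeech yet,
// or its stored hash (hypothetical `ttsHash` field) no longer matches.
function needsProcessing(item, options = {}) {
  if (options.force) return true;
  if (!item.pacedSpeech) return true;
  return item.ttsHash !== contentHash(item.text);
}

// Returns the (possibly updated) item and whether it would change.
// In dry-run mode the item is reported as changed but left untouched.
function processItem(item, toSpeech, options = {}) {
  if (!needsProcessing(item, options)) return { item, changed: false };
  if (options.dryRun) return { item, changed: true };
  const updated = {
    ...item,
    pacedSpeech: toSpeech(item.text),
    ttsHash: contentHash(item.text),
  };
  return { item: updated, changed: true };
}
```

Storing the hash next to the generated field means a second run over unchanged content does no conversion work, which is presumably what keeps incremental runs fast.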

New frontend components:

  • TTSPlayer — fetches audio from Lambda and handles play/pause/replay/segment chaining
  • TTSButtons — play/pause/replay button group
  • latexToReadable.js — lightweight LaTeX → readable text fallback when no pacedSpeech is available
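
As a rough illustration of the fallback path, the sketch below prefers a stored pacedSpeech value and otherwise derives readable text from the LaTeX source. The function `latexToReadableSketch` is a hypothetical stand-in: the actual rules in latexToReadable.js are not shown in this PR, and the `text` field name and the assumption that pacedSpeech is a string are also illustrative.

```javascript
// Hypothetical, deliberately tiny LaTeX-to-text converter.
// It handles only simple fractions, squares, and \times.
function latexToReadableSketch(latex) {
  return latex
    .replace(/\$/g, "") // drop math delimiters
    .replace(/\\frac\{([^{}]*)\}\{([^{}]*)\}/g, "$1 over $2")
    .replace(/\^\{?2\}?/g, " squared")
    .replace(/\\times/g, " times ")
    .replace(/\\[a-zA-Z]+/g, " ") // drop commands this sketch does not know
    .replace(/[{}]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Prefer pre-generated speech text; fall back to the lightweight converter.
function speechTextFor(item) {
  if (typeof item.pacedSpeech === "string" && item.pacedSpeech.trim() !== "") {
    return item.pacedSpeech;
  }
  return latexToReadableSketch(item.text || "");
}
```

For example, `speechTextFor({ text: "$\\frac{1}{2} \\times x^2$" })` yields "1 over 2 times x squared" in this sketch, while an item that already carries pacedSpeech is returned unchanged.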

Modified frontend (Problem.js, ProblemCard.js, HintSystem.js):

  • TTS buttons integrated into problem title, step title, and each hint
  • All gated behind enableTTS prop

Config (config.js):

  • Added TTS_API_URL read from REACT_APP_TTS_AWS_ENDPOINT environment variable (mirrors how DYNAMIC_HINT_URL works for the LLM agent)
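
A minimal sketch of this config pattern follows. Only the names TTS_API_URL and REACT_APP_TTS_AWS_ENDPOINT come from the PR; the helper functions, and the idea of treating an empty value as "do not call AWS", are assumptions for illustration rather than the contents of config.js.

```javascript
// Resolve the TTS endpoint from an env-like object (e.g. process.env at build time).
// Returns an empty string when the variable is unset or not a string.
function resolveTTSApiUrl(env) {
  const raw = env.REACT_APP_TTS_AWS_ENDPOINT;
  return typeof raw === "string" ? raw.trim() : "";
}

// Hypothetical guard a caller could check before attempting any network request.
function canRequestAudio(env) {
  return resolveTTSApiUrl(env) !== "";
}
```

In a config module this would typically be used as `const TTS_API_URL = resolveTTSApiUrl(process.env);`, so an unset variable yields an empty URL rather than `undefined`.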

How to enable

Add "allowTTS": true to a lesson in coursePlans.json, following the same pattern as allowDynamicHint. TTS is off by default for every lesson, so this PR makes no change to the current website.
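
To make the opt-in behaviour concrete, here is a small sketch of how a per-lesson flag could be evaluated. The lesson objects and the `lessonAllowsTTS` helper are hypothetical; only the allowTTS key is taken from this PR, and the real coursePlans.json structure may differ.

```javascript
// Hypothetical lesson entries; real entries in coursePlans.json carry more fields.
const lessons = [
  { id: "lesson-a", allowTTS: true },
  { id: "lesson-b" }, // flag omitted: TTS stays off
  { id: "lesson-c", allowTTS: "true" }, // wrong type: treated as off in this sketch
];

// Strict comparison so a missing or malformed flag never enables TTS.
function lessonAllowsTTS(lesson) {
  return Boolean(lesson) && lesson.allowTTS === true;
}

const enabledIds = lessons.filter(lessonAllowsTTS).map((lesson) => lesson.id);
```

Here only lesson-a ends up in `enabledIds`, matching the off-by-default behaviour described above.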

Current impact

None on existing pages. TTS is currently disabled in every course configuration. The content submodule is untouched: pacedSpeech fields don't exist yet, so all lessons behave exactly as before. The Lambda env var is also not configured in production yet, so no AWS calls will be made.

To generate speech text locally:

npm run process-tts          # incremental
npm run process-tts:force    # regenerate everything

This step can be integrated into the incremental content update GitHub Actions workflow in the future.

@yangshuwe1 yangshuwe1 self-assigned this Mar 10, 2026