A Python toolkit for speaker-diarized transcription and transcript analysis. Built on faster-whisper and pyannote.audio, it extracts word-level, speaker-labeled CSVs from audio, then lets you search, format, and chunk them, one step at a time.
| Module | Description | Docs |
|---|---|---|
| `extract` | Transcribe audio with speaker diarization | → |
| `format` | Format CSV transcripts into readable scripts | → |
| `chunk` | Split audio into segments via YAML config | → |
| `search` | Fuzzy search transcripts by word or phrase | → |
```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/beckettfrey/speech-mine
cd speech-mine
uv sync
```

See docs/installation.md for library dependency setup and HuggingFace token configuration.
> **Note**
> speech-mine is flexible and adapts to your use case. The commands below show a generalized example workflow. For more granular control, use the Python API directly.
```sh
# 1. (Optional) Chunk a long recording into segments
uv run speech-mine chunk recording.wav chunks.yaml chunks/
```
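The YAML config tells `chunk` where to cut. The real schema lives in the module docs; purely as an illustration (these field names are assumptions, not speech-mine's actual format), such a config might map named segments to start/end times in seconds:

```yaml
# Hypothetical chunks.yaml -- see the chunk module docs for the real schema.
chunks:
  - name: intro
    start: 0.0
    end: 120.0
  - name: interview
    start: 120.0
    end: 1800.0
```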
```sh
# 2. Extract a transcript
uv run speech-mine extract interview.mp3 output.csv \
    --hf-token YOUR_TOKEN \
    --num-speakers 2 \
    --compute-type float32
```
```sh
# 3. Format into a readable script
uv run speech-mine format output.csv script.txt
```
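The core idea behind formatting, collapsing a word-level, speaker-labeled CSV into readable script lines, can be sketched with the standard library. The column names (`speaker`, `start`, `word`) and the output layout here are assumptions for illustration, not speech-mine's actual schema:

```python
import csv
import io
from itertools import groupby

# A tiny word-level, speaker-labeled transcript (assumed columns).
raw = """speaker,start,word
SPEAKER_00,0.10,hello
SPEAKER_00,0.45,there
SPEAKER_01,1.20,hi
SPEAKER_01,1.50,back
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Merge consecutive words from the same speaker into one script line,
# timestamped with the first word's start time.
script = []
for speaker, words in groupby(rows, key=lambda r: r["speaker"]):
    words = list(words)
    text = " ".join(w["word"] for w in words)
    script.append(f"[{float(words[0]['start']):6.2f}] {speaker}: {text}")

print("\n".join(script))
# [  0.10] SPEAKER_00: hello there
# [  1.20] SPEAKER_01: hi back
```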
```sh
# 4. Search it
uv run speech-mine search "topic of interest" output.csv --pretty
```
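Fuzzy matching in this style can be sketched with `difflib` from the standard library: slide a query-sized word window over each segment and keep segments whose best window is similar enough. This is a rough sketch of the technique, not speech-mine's implementation; the data and threshold are made up:

```python
from difflib import SequenceMatcher

# Hypothetical (start, speaker, text) segments from a transcript.
transcript = [
    (12.4, "SPEAKER_00", "let's move on to the topic of interest rates"),
    (45.1, "SPEAKER_01", "the weather was great that weekend"),
]

def fuzzy_search(query, segments, threshold=0.8):
    """Return segments whose best query-sized word window resembles the query."""
    hits = []
    n = len(query.split())
    for start, speaker, text in segments:
        words = text.split()
        # Slide a window of n words over the segment and take the best ratio.
        best = max(
            SequenceMatcher(None, query, " ".join(words[i:i + n])).ratio()
            for i in range(max(1, len(words) - n + 1))
        )
        if best >= threshold:
            hits.append((start, speaker, text, round(best, 2)))
    return hits

hits = fuzzy_search("topic of interest", transcript)
print(hits)  # only the first segment matches (exact window, ratio 1.0)
```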
```sh
# 5. (Optional) Chunk the recording again around segments of interest
uv run speech-mine chunk recording.wav segments.yaml clips/
```

```sh
# Serve docs locally
uv run mkdocs serve
```

Or browse the docs/ folder directly.
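Under the hood, cutting clips out of a recording amounts to copying a time range of frames from one audio file to another. A minimal sketch with Python's standard `wave` module (file names and times are invented; speech-mine's own chunking may work differently and handles more than WAV):

```python
import wave

# Create a 2-second mono 16-bit silent WAV as a stand-in for recording.wav.
rate = 16000
with wave.open("recording_demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(rate)
    w.writeframes(b"\x00\x00" * rate * 2)

def cut(src, dst, start_s, end_s):
    """Copy frames in [start_s, end_s) from src into a new WAV file."""
    with wave.open(src, "rb") as r:
        params = r.getparams()
        r.setpos(int(start_s * r.getframerate()))
        frames = r.readframes(int((end_s - start_s) * r.getframerate()))
    with wave.open(dst, "wb") as w:
        w.setparams(params)  # nframes is patched on close to match the data
        w.writeframes(frames)

cut("recording_demo.wav", "clip.wav", 0.5, 1.5)
with wave.open("clip.wav", "rb") as c:
    print(c.getnframes())  # 16000 frames = 1.0 s at 16 kHz
```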
MIT
