A Python toolkit for speaker-diarized transcription and transcript analysis. Built on faster-whisper and pyannote.audio, it extracts word-level, speaker-labeled CSVs from audio, then lets you search, format, and chunk them, one step at a time.
| Module | Description | Docs |
|---|---|---|
| `extract` | Transcribe audio with speaker diarization | → |
| `format` | Format CSV transcripts into readable scripts | → |
| `chunk` | Split audio into segments via YAML config | → |
| `search` | Fuzzy search transcripts by word or phrase | → |
```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/beckettfrey/speech-mine
cd speech-mine
uv sync
```

See docs/installation.md for library dependency setup and HuggingFace token configuration.
> **Note**
> speech-mine is flexible and adapts to your use case. The commands below show a generalized example workflow. For more granular control, use the Python API directly.
```sh
# 1. (Optional) Chunk a long recording into segments
uv run speech-mine chunk recording.wav chunks.yaml chunks/
```
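The YAML config tells `chunk` where to cut. The real schema lives in the module docs; purely as an illustration (these field names are assumptions, not speech-mine's actual format), such a config might map named segments to start/end times in seconds:

```yaml
# Hypothetical chunks.yaml -- see the chunk module docs for the real schema.
chunks:
  - name: intro
    start: 0.0
    end: 120.0
  - name: interview
    start: 120.0
    end: 1800.0
```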
```sh
# 2. Extract a transcript
uv run speech-mine extract interview.mp3 output.csv \
    --hf-token YOUR_TOKEN \
    --num-speakers 2 \
    --compute-type float32
```
```sh
# 3. Format into a readable script
uv run speech-mine format output.csv script.txt
```
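The core idea behind formatting, collapsing a word-level, speaker-labeled CSV into readable script lines, can be sketched with the standard library. The column names (`speaker`, `start`, `word`) and the output layout here are assumptions for illustration, not speech-mine's actual schema:

```python
import csv
import io
from itertools import groupby

# A tiny word-level, speaker-labeled transcript (assumed columns).
raw = """speaker,start,word
SPEAKER_00,0.10,hello
SPEAKER_00,0.45,there
SPEAKER_01,1.20,hi
SPEAKER_01,1.50,back
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Merge consecutive words from the same speaker into one script line,
# timestamped with the first word's start time.
script = []
for speaker, words in groupby(rows, key=lambda r: r["speaker"]):
    words = list(words)
    text = " ".join(w["word"] for w in words)
    script.append(f"[{float(words[0]['start']):6.2f}] {speaker}: {text}")

print("\n".join(script))
# [  0.10] SPEAKER_00: hello there
# [  1.20] SPEAKER_01: hi back
```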
```sh
# 4. Search it
uv run speech-mine search "topic of interest" output.csv --pretty
```
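Fuzzy matching in this style can be sketched with `difflib` from the standard library: slide a query-sized word window over each segment and keep segments whose best window is similar enough. This is a rough sketch of the technique, not speech-mine's implementation; the data and threshold are made up:

```python
from difflib import SequenceMatcher

# Hypothetical (start, speaker, text) segments from a transcript.
transcript = [
    (12.4, "SPEAKER_00", "let's move on to the topic of interest rates"),
    (45.1, "SPEAKER_01", "the weather was great that weekend"),
]

def fuzzy_search(query, segments, threshold=0.8):
    """Return segments whose best query-sized word window resembles the query."""
    hits = []
    n = len(query.split())
    for start, speaker, text in segments:
        words = text.split()
        # Slide a window of n words over the segment and take the best ratio.
        best = max(
            SequenceMatcher(None, query, " ".join(words[i:i + n])).ratio()
            for i in range(max(1, len(words) - n + 1))
        )
        if best >= threshold:
            hits.append((start, speaker, text, round(best, 2)))
    return hits

hits = fuzzy_search("topic of interest", transcript)
print(hits)  # only the first segment matches (exact window, ratio 1.0)
```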
```sh
# 5. (Optional) Chunk the recording again around segments of interest
uv run speech-mine chunk recording.wav segments.yaml clips/
```

```sh
# Serve docs locally
uv run mkdocs serve
```

Or browse the docs/ folder directly.
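Under the hood, cutting clips out of a recording amounts to copying a time range of frames from one audio file to another. A minimal sketch with Python's standard `wave` module (file names and times are invented; speech-mine's own chunking may work differently and handles more than WAV):

```python
import wave

# Create a 2-second mono 16-bit silent WAV as a stand-in for recording.wav.
rate = 16000
with wave.open("recording_demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(rate)
    w.writeframes(b"\x00\x00" * rate * 2)

def cut(src, dst, start_s, end_s):
    """Copy frames in [start_s, end_s) from src into a new WAV file."""
    with wave.open(src, "rb") as r:
        params = r.getparams()
        r.setpos(int(start_s * r.getframerate()))
        frames = r.readframes(int((end_s - start_s) * r.getframerate()))
    with wave.open(dst, "wb") as w:
        w.setparams(params)  # nframes is patched on close to match the data
        w.writeframes(frames)

cut("recording_demo.wav", "clip.wav", 0.5, 1.5)
with wave.open("clip.wav", "rb") as c:
    print(c.getnframes())  # 16000 frames = 1.0 s at 16 kHz
```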
MIT
