lucharo/voice2text

voice2text

Local voice-to-text with Whisper + LLM cleanup. Push-to-talk (Right ⌘), pastes at cursor.

Voice-to-text tools like Wispr Flow, MacWhisper, and VoiceInk are becoming increasingly popular. It's a testament to our times that in 2025, ~270 lines of Python with local Whisper and a small language model served by Ollama (Qwen2.5 3B) can deliver a comparable experience on consumer hardware. Such tooling would have been unimaginable three years ago. This project is a proof of concept that demonstrates just that.

Note: Before anyone suggests splitting this into modules and submodules: keeping everything in one file is an intentional design choice, meant to demonstrate that the whole functionality fits in fewer than 300 lines of Python.

Note 2: This is macOS-only by design. We use:

  • mlx-whisper — optimized for Apple Silicon
  • osascript — for simulating Cmd+V paste via System Events
  • pbcopy/pbpaste — macOS clipboard
  • nowplaying-cli — macOS media control
  • System Preferences URLs for permissions

You're welcome to fork this and make it work on Linux or Windows!
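As a rough illustration of how the clipboard and paste pieces listed above can fit together, here is a minimal sketch. It is not the project's actual code: the function names and the exact AppleScript string are assumptions, and on a machine without `pbcopy` and `osascript` it simply does nothing.

```python
import shutil
import subprocess

# AppleScript asking System Events to press Cmd+V in the frontmost app.
# (Assumed wording; the project may phrase its script differently.)
PASTE_SCRIPT = 'tell application "System Events" to keystroke "v" using command down'


def paste_commands():
    """Return the two commands: copy stdin to the clipboard, then press Cmd+V."""
    return [["pbcopy"], ["osascript", "-e", PASTE_SCRIPT]]


def paste_at_cursor(text, run=subprocess.run):
    """Put `text` on the clipboard and paste it into the focused app.

    Returns False without doing anything when the macOS tools are missing.
    """
    copy_cmd, paste_cmd = paste_commands()
    if shutil.which(copy_cmd[0]) is None or shutil.which(paste_cmd[0]) is None:
        return False
    run(copy_cmd, input=text.encode("utf-8"), check=True)
    run(paste_cmd, check=True)
    return True
```

A port to Linux or Windows would mostly need to swap the two commands returned by `paste_commands()` for that platform's clipboard and key-simulation tools.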

Prerequisites

Skip this if using pixi — it handles ollama automatically.

brew install ollama
ollama pull qwen2.5:3b

Install

uvx (quick try)

The fastest way to try it out. Note: startup is slower because uvx creates a fresh virtual environment each time.

uvx --from voice2text v2t

Or from GitHub:

uvx --from git+https://github.com/lucharo/voice2text v2t

uv tool install (recommended for daily use)

Installs v2t as a persistent command — no virtual environment setup on each run, so startup is fast.

uv tool install voice2text
v2t

pip

pip install voice2text
v2t

Development install

git clone https://github.com/lucharo/voice2text.git
cd voice2text
uv sync
uv run v2t

Pixi

Pixi handles the ollama dependency automatically:

git clone https://github.com/lucharo/voice2text.git
cd voice2text
pixi run ollama pull qwen2.5:3b
pixi run v2t

Note: We don't publish to conda-forge/pixi channels yet, but may in the future.

Usage

v2t                      # strict mode (restructures sentences)
v2t --casual             # light cleanup (punctuation only)
v2t --pause-music        # pause media while recording (macOS only, requires nowplaying-cli via brew)

Hold Right Command to record, release to transcribe and paste.
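The hold-to-record behavior can be pictured as a small state machine. The sketch below is an assumption about the general shape, not the project's implementation: `PushToTalk`, the `"cmd_r"` key name, and the callbacks are hypothetical, and a real version would feed `press`/`release` from a keyboard listener library.

```python
class PushToTalk:
    """Track one hotkey: start recording on press, finish on release."""

    def __init__(self, hotkey, on_start, on_stop):
        self.hotkey = hotkey
        self.on_start = on_start
        self.on_stop = on_stop
        self.recording = False

    def press(self, key):
        # Listeners may repeat press events while a key is held,
        # so only the first press starts a recording.
        if key == self.hotkey and not self.recording:
            self.recording = True
            self.on_start()

    def release(self, key):
        # Releasing the hotkey is the cue to transcribe and paste.
        if key == self.hotkey and self.recording:
            self.recording = False
            self.on_stop()


events = []
ptt = PushToTalk("cmd_r", lambda: events.append("start"), lambda: events.append("stop"))
for kind, key in [("press", "cmd_r"), ("press", "cmd_r"), ("press", "a"), ("release", "cmd_r")]:
    getattr(ptt, kind)(key)
print(events)  # ['start', 'stop']
```

Guarding on `self.recording` keeps repeated press events and unrelated keys from starting or stopping a recording.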

Strict vs Casual Mode

| Raw transcription | Strict | Casual |
| --- | --- | --- |
| "Hey um I'll see you tomorrow at 9 actually no make it 10" | "Hey, I'll see you tomorrow at 10." | "Hey, I'll see you tomorrow at 9, actually no, make it 10." |
| "So basically I was thinking we could um you know maybe try the other approach" | "I was thinking we could try the other approach." | "So basically, I was thinking we could maybe try the other approach." |

Strict (default): Removes filler words, restructures for clarity, condenses.

Casual: Only adds punctuation and removes "um/uh", keeps your phrasing.
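One way the two modes could map onto the LLM cleanup step is a pair of prompts sent to the local Ollama server. This is a sketch under stated assumptions: the prompt wording and the helper names (`build_request`, `clean_up`) are hypothetical, and the request targets Ollama's default local `/api/generate` endpoint with streaming disabled.

```python
import json
import urllib.request

# Hypothetical prompts; the project's actual wording may differ.
PROMPTS = {
    "strict": (
        "Rewrite this transcript clearly: drop filler words, apply "
        "self-corrections, and condense. Return only the rewritten text."
    ),
    "casual": (
        "Add punctuation to this transcript and remove 'um' and 'uh'. "
        "Keep the speaker's phrasing. Return only the text."
    ),
}

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(transcript, casual=False, model="qwen2.5:3b"):
    """Build the POST request for one cleanup call without sending it."""
    mode = "casual" if casual else "strict"
    payload = {
        "model": model,
        "prompt": f"{PROMPTS[mode]}\n\nTranscript: {transcript}",
        "stream": False,  # ask for a single JSON reply instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def clean_up(transcript, casual=False):
    """Send the transcript to a running Ollama server and return its reply."""
    with urllib.request.urlopen(build_request(transcript, casual)) as resp:
        return json.loads(resp.read())["response"].strip()
```

With Ollama running and the model pulled, `clean_up("hey um see you at 9 actually no 10")` would return the strict rewrite, and passing `casual=True` would switch to the lighter prompt.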

--pause-music (macOS only)

Pauses any playing media while recording and resumes after. Requires:

brew install nowplaying-cli

Not available via pixi/conda-forge for now; this may change later.
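The pause-and-resume behavior could be wrapped around the recording step roughly as follows. This is a sketch, not the project's code: `paused_media` and its parameters are hypothetical, and it assumes the `pause` and `play` subcommands of `nowplaying-cli`.

```python
import shutil
import subprocess
from contextlib import contextmanager


@contextmanager
def paused_media(run=subprocess.run, available=None):
    """Pause media for the duration of the block, then resume it."""
    if available is None:
        # Skip silently when nowplaying-cli is not installed.
        available = shutil.which("nowplaying-cli") is not None
    if available:
        run(["nowplaying-cli", "pause"], check=False)
    try:
        yield
    finally:
        # Resume even if recording raised an exception.
        if available:
            run(["nowplaying-cli", "play"], check=False)
```

Recording would then happen inside `with paused_media(): ...`. Note that this simplified version resumes playback unconditionally; a fuller one would first check whether anything was playing.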
