`voice2text`

Local voice-to-text with Whisper + LLM cleanup. Push-to-talk (Right ⌘), pastes at cursor.

Voice-to-text tools like Wispr Flow, MacWhisper, and VoiceInk are becoming increasingly popular. It's a testament to our times that in 2025, ~270 lines of Python with local Whisper and a small language model served by Ollama (Qwen 2.5-3B) can deliver a comparable experience on consumer hardware. Such tooling would have been unimaginable three years ago. This project is a proof of concept to demonstrate just that.

Note: Before anyone suggests splitting this into modules and submodules: keeping everything in a single file is an intentional design choice, to demonstrate that the whole functionality fits in fewer than 300 lines of Python.

Note 2: This is macOS-only by design. We use:

mlx-whisper — optimized for Apple Silicon

osascript — for simulating Cmd+V paste via System Events

pbcopy/pbpaste — macOS clipboard

nowplaying-cli — macOS media control

System Preferences URLs for permissions

You're welcome to fork this and make it work on Linux or Windows!
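To make the macOS-specific pieces above concrete, here is a minimal sketch of "paste at cursor" using pbcopy and osascript. It is an illustration, not the project's actual code: the function name `paste_at_cursor`, the `run` parameter, and the exact AppleScript string are assumptions. The simulated keystroke typically only works once the terminal has been granted Accessibility permission.

```python
import subprocess

# AppleScript asking System Events to press Cmd+V in the frontmost app.
# (Assumed wording; the project's own script may differ.)
PASTE_SCRIPT = 'tell application "System Events" to keystroke "v" using command down'


def paste_at_cursor(text, run=subprocess.run):
    """Copy `text` to the macOS clipboard, then simulate Cmd+V.

    `run` is injectable so the logic can be exercised without macOS.
    """
    # pbcopy reads the new clipboard contents from stdin.
    run(["pbcopy"], input=text.encode("utf-8"), check=True)
    # osascript -e executes a one-line AppleScript.
    run(["osascript", "-e", PASTE_SCRIPT], check=True)
```

Because this overwrites the clipboard, a more careful version might save the previous contents with pbpaste and restore them afterwards.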

Prerequisites

Skip this if using pixi — it handles ollama automatically.

brew install ollama
ollama pull qwen2.5:3b

Install

uvx (quick try)

The fastest way to try it out. Note: startup is slower because uvx creates a fresh virtual environment each time.

uvx --from voice2text v2t

Or from GitHub:

uvx --from git+https://github.com/lucharo/voice2text v2t

uv tool install (recommended for daily use)

Installs v2t as a persistent command — no virtual environment setup on each run, so startup is fast.

uv tool install voice2text
v2t

pip

pip install voice2text
v2t

Development install

git clone https://github.com/lucharo/voice2text.git
cd voice2text
uv sync
uv run v2t

Pixi

Pixi handles the ollama dependency automatically:

git clone https://github.com/lucharo/voice2text.git
cd voice2text
pixi run ollama pull qwen2.5:3b
pixi run v2t

Note: We don't publish to conda-forge/pixi channels yet, but may in the future.

Usage

v2t                      # strict mode (restructures sentences)
v2t --casual             # light cleanup (punctuation, removes "um"/"uh")
v2t --pause-music        # pause media while recording (macOS only, requires nowplaying-cli via brew)

Hold Right Command to record, release to transcribe and paste.
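The hold-to-record behavior can be sketched as a tiny state machine. This is a hypothetical illustration: the class name `PushToTalk` and its callbacks are not taken from the project, and the README does not say which keyboard library it uses. The key point is that holding a key usually produces repeated press events, so only the first one should start a recording.

```python
class PushToTalk:
    """Tracks one hotkey: first press starts recording, release stops it."""

    def __init__(self, hotkey, on_start, on_stop):
        self.hotkey = hotkey
        self.on_start = on_start
        self.on_stop = on_stop
        self.recording = False

    def on_press(self, key):
        # Key auto-repeat fires many press events while held; ignore the repeats.
        if key == self.hotkey and not self.recording:
            self.recording = True
            self.on_start()

    def on_release(self, key):
        # Only the hotkey's release ends the recording.
        if key == self.hotkey and self.recording:
            self.recording = False
            self.on_stop()
```

With a listener library such as pynput (an assumption), `on_press` and `on_release` could be passed to `keyboard.Listener`, using `keyboard.Key.cmd_r` as the hotkey.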

Strict vs Casual Mode

| Raw transcription | Strict | Casual |
| --- | --- | --- |
| "Hey um I'll see you tomorrow at 9 actually no make it 10" | "Hey, I'll see you tomorrow at 10." | "Hey, I'll see you tomorrow at 9, actually no, make it 10." |
| "So basically I was thinking we could um you know maybe try the other approach" | "I was thinking we could try the other approach." | "So basically, I was thinking we could maybe try the other approach." |

Strict (default): Removes filler words, restructures for clarity, condenses.

Casual: Only adds punctuation and removes "um/uh", keeps your phrasing.
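One plausible way to implement the two modes is to send the raw transcription to the local Ollama server with a different system prompt per mode. The sketch below is an assumption throughout: the prompt wording, `build_request`, and `cleanup` are illustrative names, not the project's code. It targets Ollama's default local endpoint and the `qwen2.5:3b` model from the Prerequisites section.

```python
import json
import urllib.request

# Hypothetical prompts; the project's real wording is not shown in this README.
SYSTEM_PROMPTS = {
    "strict": (
        "Rewrite the dictated text clearly and concisely. Remove filler words "
        "and apply any self-corrections. Return only the rewritten text."
    ),
    "casual": (
        "Add punctuation and remove 'um' and 'uh'. Keep the speaker's phrasing. "
        "Return only the cleaned text."
    ),
}

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(raw_text, casual=False, model="qwen2.5:3b"):
    """Build the JSON payload for a single, non-streaming generation."""
    mode = "casual" if casual else "strict"
    return {
        "model": model,
        "system": SYSTEM_PROMPTS[mode],
        "prompt": raw_text,
        "stream": False,  # ask for one complete JSON response
    }


def cleanup(raw_text, casual=False):
    """Send the transcription to Ollama and return the cleaned text."""
    body = json.dumps(build_request(raw_text, casual)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

Calling `cleanup(text, casual=True)` requires a running Ollama server with the model already pulled; `build_request` can be inspected on its own without one.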

`--pause-music` (macOS only)

Pauses any playing media while recording and resumes after. Requires:

brew install nowplaying-cli

nowplaying-cli is not available via pixi/conda-forge for now; it may be published there later.
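The pause-and-resume behavior could be wrapped in a context manager around the recording step. This is a simplified sketch under stated assumptions: `media_paused` is a hypothetical name, and it relies only on the `pause` and `play` subcommands of nowplaying-cli. It resumes unconditionally, so a real implementation would likely first check whether anything was playing.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def media_paused(run=subprocess.run):
    """Pause system media on entry and resume it on exit.

    `run` is injectable so the logic can be exercised without nowplaying-cli.
    """
    # check=False: a failure to pause should not block dictation.
    run(["nowplaying-cli", "pause"], check=False)
    try:
        yield
    finally:
        # Resume even if recording raised an exception.
        run(["nowplaying-cli", "play"], check=False)
```

Usage would look like `with media_paused(): record_audio()`, where `record_audio` stands in for whatever function captures the microphone input.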
