A desktop app that transcribes audio locally with whisper.cpp and builds a searchable document library using Ollama models.
- Audio import — Import mp3, m4a, wav, flac, ogg, opus, or webm files
- Microphone recording — Record directly in the app with device selection
- Local transcription — whisper.cpp transcribes audio with timestamped segments
- AI-powered metadata — Ollama (Gemma 3) generates titles, summaries, and tags
- Semantic search — Natural language search across all documents via embeddings
- Document library — Browse, sort, filter, and manage transcribed documents
- Subtitle generation — SRT and VTT files generated alongside transcripts
| Layer | Technology |
|---|---|
| Frontend | SolidJS, Tailwind v4, TypeScript |
| Backend | Rust, Tauri 2 |
| Database | SQLite (rusqlite) |
| AI | Ollama (gemma3:4b, nomic-embed-text) |
| Audio | whisper.cpp, ffmpeg |
- Ollama installed and running (http://localhost:11434)
- Internet connection on first run (to download models)
Important
For development, you also need whisper-cli, ffmpeg, and yt-dlp on your PATH (or let setup.sh create sidecar wrappers that forward to them).
pnpm install
bash setup.sh
This creates target-suffixed wrapper scripts in src-tauri/binaries/ that forward to your system PATH binaries (whisper-cli, ffmpeg, yt-dlp). This keeps development smooth without committing large binaries to the repo.
pnpm tauri dev
On first launch, the app runs preflight checks and walks you through downloading the required models (whisper model + Ollama models).
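As a rough illustration of what a preflight check for the command-line tools might look like, here is a minimal sketch using only the Rust standard library. The function names (`find_in_dirs`, `missing_tools`, `path_dirs`) are hypothetical and are not taken from bootstrap.rs. The sketch only checks that a file with the expected name exists in a PATH directory; it ignores the `.exe` suffix on Windows and does not check the executable bit.

```rust
use std::env;
use std::path::PathBuf;

/// Tools the README lists as development prerequisites.
const REQUIRED_TOOLS: [&str; 3] = ["whisper-cli", "ffmpeg", "yt-dlp"];

/// Look for a file named `tool` in each of the given directories.
/// Hypothetical helper for illustration only.
fn find_in_dirs(tool: &str, dirs: &[PathBuf]) -> Option<PathBuf> {
    dirs.iter()
        .map(|dir| dir.join(tool))
        .find(|candidate| candidate.is_file())
}

/// Return the required tools that were not found in `dirs`,
/// in the order they appear in REQUIRED_TOOLS.
fn missing_tools(dirs: &[PathBuf]) -> Vec<&'static str> {
    REQUIRED_TOOLS
        .iter()
        .copied()
        .filter(|tool| find_in_dirs(tool, dirs).is_none())
        .collect()
}

/// Split the current PATH environment variable into directories.
/// Returns an empty list if PATH is not set.
fn path_dirs() -> Vec<PathBuf> {
    env::var_os("PATH")
        .map(|path| env::split_paths(&path).collect())
        .unwrap_or_default()
}
```

Calling `missing_tools(&path_dirs())` would give the list of tools to report to the user before setup continues. Checks for the whisper model and for Ollama would need separate logic and are not covered here.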
Structure
src/                  # Frontend (SolidJS + TypeScript)
  views/              # App views (Splash, Setup, Record, Import, Library, Document, Settings)
  state/              # Global app state (AppContext)
src-tauri/            # Backend (Rust + Tauri 2)
  src/
    commands.rs       # Tauri IPC commands
    bootstrap.rs      # Dependency checking & setup
    storage.rs        # SQLite database & file management
    models.rs         # Data structures & constants
    parsers.rs        # Whisper/Ollama output parsing
  binaries/           # Sidecar binaries (whisper-cli, ffmpeg, yt-dlp)
docs/
  spec.md             # Technical specification
  roadmap.md          # Development roadmap
How It Works
- Preflight — App checks for whisper-cli, ffmpeg, whisper model, Ollama, and required Ollama models
- Setup — First-run wizard downloads ggml-base.en.bin and pulls Ollama models
- Import/Record — Audio is converted to 16kHz mono WAV via ffmpeg
- Transcribe — whisper.cpp produces timestamped transcript + SRT/VTT subtitles
- Enrich — Gemma 3 generates a title, summary, and tags from the transcript
- Embed — Transcript is chunked (~512 tokens) and embedded via nomic-embed-text
- Search — Queries are embedded and matched against chunks using cosine similarity
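The Embed and Search steps above can be sketched as follows. This is a simplified illustration, not the app's actual code: it approximates tokens with whitespace-separated words, assumes the embedding vectors have already been obtained from nomic-embed-text, and ranks chunks with a plain linear scan. All function names here are hypothetical.

```rust
/// Split a transcript into chunks of at most `max_words` words.
/// Words are only a rough stand-in for tokens; a real tokenizer
/// would produce different chunk boundaries.
fn chunk_words(text: &str, max_words: usize) -> Vec<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    words
        .chunks(max_words.max(1)) // guard against a zero chunk size
        .map(|chunk| chunk.join(" "))
        .collect()
}

/// Cosine similarity of two vectors.
/// Returns 0.0 if the lengths differ or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Score every chunk embedding against the query embedding and
/// return up to `top_k` (chunk index, score) pairs, best match first.
fn top_matches(query: &[f32], chunks: &[Vec<f32>], top_k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = chunks
        .iter()
        .enumerate()
        .map(|(index, embedding)| (index, cosine_similarity(query, embedding)))
        .collect();
    scored.sort_by(|left, right| right.1.total_cmp(&left.1));
    scored.truncate(top_k);
    scored
}
```

A linear scan like this is usually adequate for a personal library of a few thousand chunks; a larger collection might call for an approximate nearest-neighbor index instead.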
Sidecar Packaging
For production builds, place real target-suffixed binaries in src-tauri/binaries/ before packaging. Sidecar entries are configured in src-tauri/tauri.conf.json:
- binaries/whisper-cli
- binaries/ffmpeg
- binaries/yt-dlp
See src-tauri/binaries/README.md for details.
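To make "target-suffixed" concrete: as far as I understand Tauri's sidecar convention, each configured binary is expected on disk with the Rust target triple appended to its name, plus `.exe` on Windows. The sketch below derives such a file name. Both helper names are hypothetical, and src-tauri/binaries/README.md remains the authoritative description for this project.

```rust
/// Extract the host target triple from the text printed by `rustc -vV`,
/// which contains a line of the form `host: <triple>`.
fn host_triple(rustc_verbose_version: &str) -> Option<&str> {
    rustc_verbose_version
        .lines()
        .find_map(|line| line.strip_prefix("host: "))
        .map(str::trim)
}

/// Build the target-suffixed file name for a sidecar binary,
/// adding `.exe` when the triple names a Windows target.
fn sidecar_file_name(name: &str, target_triple: &str) -> String {
    let extension = if target_triple.contains("windows") {
        ".exe"
    } else {
        ""
    };
    format!("{name}-{target_triple}{extension}")
}
```

For example, with the `binaries/ffmpeg` entry and an Apple Silicon host, the file to place in src-tauri/binaries/ would be named `ffmpeg-aarch64-apple-darwin`.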
Commands
pnpm dev # Vite dev server only
pnpm tauri dev # Full app (Vite + Tauri)
pnpm build # Build frontend
pnpm lint # ESLint
pnpm test # Vitest
pnpm check # TypeScript check (can use pnpm typecheck)
cargo test --manifest-path src-tauri/Cargo.toml # Tests
cargo clippy --manifest-path src-tauri/Cargo.toml --fix --allow-dirty # Linting
cargo fmt --manifest-path src-tauri/Cargo.toml # Formatting
