omachala/diction

Diction

You talk. We type.

Voice keyboard for iOS. Works in every app.
On-device, cloud, or self-hosted transcription. No limits.

Download on the App Store

Website · Self-Hosting Guide · Privacy Policy


You talk. We type.  No limits. No word caps. No catch.  What you say stays with you.  Self-host. Your server, your rules.

Why Diction?

  • Deep audio engineering. State-of-the-art audio filtering, a fine-tuned speech recognition model, and context-aware processing — built by a real engineer who goes deep on one problem.
  • Self-hosted. Run docker compose up and paste the server URL into the app. Your server, your models, your data.
  • Any Whisper-compatible model. Point Diction at any endpoint. Medical, legal, accent-tuned - run whatever you want.
  • End-to-end text encryption. Transcript text is encrypted with AES-256-GCM, with keys agreed via X25519 key exchange. X25519 is the same key exchange used by Signal and WireGuard.
  • Zero tracking. No analytics, no telemetry, no data collection. Audit the source yourself.
  • On-device. Whisper runs locally on your iPhone. No network, no server, nothing leaves the device.
  • AI enhancement. Optional LLM cleanup - only the transcript text is sent, never the audio.
  • Free and unlimited. On-device and self-hosted have no caps, no restrictions, no expiry.

How It Works

On-Device (Free, No Setup)

Install the app, add the keyboard, and start dictating. On-device transcription works offline with no server required.

Self-Hosted

Save this as docker-compose.yml and run docker compose up -d:

services:
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"

  whisper-small:
    image: fedirz/faster-whisper-server:latest-cpu
    environment:
      WHISPER__MODEL: Systran/faster-whisper-small
      WHISPER__INFERENCE_DEVICE: cpu

Your server needs to be reachable from your phone. See "No Public IP?" below for options like Cloudflare Tunnel, Tailscale, or ngrok.

Once reachable, open the Diction app, go to Self-Hosted, paste your server URL. Done.
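Before pasting the URL into the app, it can help to confirm that something is actually listening at that address. The following is a minimal sketch, not part of Diction: `host_and_port` and `is_reachable` are hypothetical helper names, and the example URL assumes the gateway is published on port 8080 as in the compose file above. A successful check from your laptop only shows the port is open from that machine; the phone still needs its own route to the server.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(url: str) -> tuple[str, int]:
    """Extract the host and TCP port from a server URL.

    Falls back to 443 for https and 80 for anything else when the URL
    does not name a port explicitly.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host in URL: {url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # Hypothetical LAN address; replace with your own server URL.
    print(is_reachable("http://192.168.1.10:8080"))
```

This only tests the TCP connection. It does not confirm that the gateway or a Whisper backend is healthy behind that port.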

More models

Swap or add models to your compose file. The gateway handles routing and streaming between them.

Model                    Parameters   RAM
whisper-small            244M         ~850 MB
whisper-medium           769M         ~2.1 GB
whisper-large-v3         1.5B         ~3.9 GB
whisper-large-v3-turbo   809M         ~2.3 GB
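To turn the table into a quick sizing decision, here is a small sketch that picks the model with the most parameters that still fits a RAM budget. The function name `pick_model` and the `headroom` parameter are illustrative, not part of Diction. The RAM figures are the approximate values from the table (with 1 GB taken as 1000 MB), and parameter count is used only as a rough proxy for accuracy.

```python
from typing import Optional

# (name, parameters in millions, approximate RAM in MB), copied from the table.
MODELS = [
    ("whisper-small", 244, 850),
    ("whisper-medium", 769, 2100),
    ("whisper-large-v3", 1500, 3900),
    ("whisper-large-v3-turbo", 809, 2300),
]


def pick_model(available_ram_mb: float, headroom: float = 0.0) -> Optional[str]:
    """Return the model with the most parameters that fits the RAM budget.

    headroom is the fraction of RAM to hold back for the OS and other
    containers (0.2 keeps 20% free). Returns None if nothing fits.
    """
    budget = available_ram_mb * (1.0 - headroom)
    fitting = [m for m in MODELS if m[2] <= budget]
    if not fitting:
        return None
    return max(fitting, key=lambda m: m[1])[0]


if __name__ == "__main__":
    for ram in (1000, 2500, 4096):
        print(ram, "MB ->", pick_model(ram))
```

The table lists only the model's own memory use, so on a shared machine it is worth setting a nonzero `headroom`.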

Bring your own model

Already running a speech model on your homelab? You don't need to run ours. Set CUSTOM_BACKEND_URL to point the gateway at your existing server:

services:
  gateway:
    image: ghcr.io/omachala/diction-gateway:latest
    ports:
      - "8080:8080"
    environment:
      CUSTOM_BACKEND_URL: http://my-server:8000
      CUSTOM_BACKEND_MODEL: your-model-name  # model name your server expects

If your server only accepts WAV audio, add CUSTOM_BACKEND_NEEDS_WAV: "true" and the gateway converts automatically. For servers behind an API key, add CUSTOM_BACKEND_AUTH: "Bearer sk-xxx".

Works with any server that implements POST /v1/audio/transcriptions. See the full guide for more examples.
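To check that a backend speaks this endpoint before pointing the gateway at it, you can build a test request by hand. The sketch below assumes the server follows the OpenAI-style convention of a multipart form with `file` and `model` fields; that field layout is an assumption, not something this README specifies. `make_silent_wav` and `build_transcription_request` are hypothetical helper names, and the code only constructs the request without sending it.

```python
import io
import urllib.request
import uuid
import wave
from typing import Optional


def make_silent_wav(seconds: float = 1.0, sample_rate: int = 16000) -> bytes:
    """Return a mono 16-bit PCM WAV containing only silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


def build_transcription_request(
    base_url: str,
    audio: bytes,
    model: str,
    filename: str = "speech.wav",
    auth: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = model_part + file_header + audio + closing

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if auth:
        # Same value you would put in CUSTOM_BACKEND_AUTH, e.g. "Bearer sk-xxx".
        headers["Authorization"] = auth
    url = base_url.rstrip("/") + "/v1/audio/transcriptions"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_transcription_request(
        "http://my-server:8000", make_silent_wav(), "your-model-name"
    )
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req, timeout=30).read()
```

A silent clip will not produce a meaningful transcript, but a successful response shows that the route, the form fields, and any API key are accepted.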

No Public IP?

You don't need to open ports on your router:

  • Cloudflare Tunnel - free, outbound-only connection. No port forwarding needed.
  • Tailscale - free WireGuard mesh VPN. Install on server + phone, connect from anywhere.
  • ngrok - instant public URL, great for testing.

See the Self-Hosting Guide for detailed instructions.

Privacy

Keyboards can read everything you type. Here's exactly what Diction does with your audio:

  • On-device: Everything stays on your phone. No network connection made.
  • Self-hosted: Audio goes to your server only. Nothing else sees it.
  • Diction One: Audio is transcribed and immediately discarded. Not stored, not used for training.
  • Zero third-party SDKs. No analytics, no tracking, no telemetry of any kind.
  • Full Access is required by iOS for any keyboard that makes network requests. Diction has no QWERTY input to log. It only uses the network to reach your transcription endpoint.

Read the full Privacy Policy.

Diction One

On-device and self-hosted are completely free with no word limits.

If you don't want to run a server, Diction One gives you a fine-tuned cloud model with advanced audio filters — without the setup. Audio is sent to the Diction endpoint, transcribed, and immediately discarded. Pricing and trial details are in the app.

Requirements

  • iOS 17.0+ (iPhone)
  • For self-hosting: any machine that can run Docker

Contributing

Contributions are welcome. See CONTRIBUTING.md.

License

MIT. See LICENSE.

The iOS app is distributed via the App Store. This repository contains the self-hosting infrastructure and documentation.
