Instant Voice Translate

Android app for real-time voice translation. Captures audio from microphone or system audio, recognizes speech locally on-device via Sherpa-ONNX, translates text online (Yandex API) or offline (NLLB-200), and speaks the result via Android TTS.

Overview

The app runs a full voice translation pipeline as an Android foreground service:

Audio input (mic / system audio)
        |
Streaming speech recognition (Sherpa-ONNX, on-device)
        |
Translation -- online (Yandex API) or offline (NLLB-200)
        |
Text-to-speech (Android TTS)

Speech recognition always runs on-device. Translation works in two modes: online (Yandex API, requires internet) or offline (NLLB-200-distilled-600M via ONNX Runtime, no internet needed after model download).

Features

Two audio sources -- microphone and system audio (AudioPlaybackCapture / MediaProjection)
Offline speech recognition -- streaming Zipformer models (Sherpa-ONNX), 26--188 MB per language, auto-downloaded from HuggingFace
Dual translation backend:
- Online -- Yandex Translate API + fallback, LRU cache (200 entries), 13 target languages
- Offline -- NLLB-200-distilled-600M via ONNX Runtime, ~1.3 GB download, 31 target languages, no internet needed
TTS playback -- Android TTS with queue, adjustable speed (0.5x--2.0x) and pitch
Feedback loop prevention -- microphone muting during TTS playback
7 source languages with ASR models, up to 31 target languages (offline mode)
Foreground service -- background operation, notification controls (pause/resume/stop)
Parallel translation -- up to 3 concurrent translations with ordered TTS delivery
i18n -- app UI available in English and Russian (Per-App Language Preferences)
Theming -- system / light / dark, Material You dynamic colors on Android 12+

Tech Stack

Component	Technology
Language	Kotlin 2.1.20, JVM 17
UI	Jetpack Compose + Material 3 (BOM 2026.02.01)
DI	Hilt 2.56.2 + KSP
Async	Kotlin Coroutines + Flow / StateFlow / Channel
Speech Recognition	Sherpa-ONNX (local AAR, streaming Zipformer)
Online Translation	Yandex Translate API + fallback (OkHttp 5.3.2)
Offline Translation	NLLB-200-distilled-600M (ONNX Runtime 1.17.1)
Text-to-Speech	Android TextToSpeech
Settings Storage	DataStore Preferences
Navigation	Navigation Compose 2.9.7
Build	AGP 8.9.3, Gradle
Min / Target SDK	29 (Android 10) / 36

Requirements

Android Studio Ladybug or newer
JDK 17
Android SDK 36 (compileSdk)
Device or emulator with API 29+ (Android 10 required for AudioPlaybackCapture)

Build & Run

# Debug APK
./gradlew assembleDebug

# Install on connected device
./gradlew installDebug

# Full build with lint
./gradlew build

# Release APK / AAB (requires keystore.properties)
./gradlew assembleRelease
./gradlew bundleRelease

Note: app/libs/sherpa-onnx.aar must be present -- it is a local dependency for Sherpa-ONNX speech recognition.

Headphones Recommended

When using microphone mode, use headphones to prevent feedback loops. Without them, the ASR model picks up TTS playback and re-translates its own output. Alternatively, enable "Mute mic during TTS" in Settings (may cause missed speech segments).

Supported Languages

Source languages require a local ASR model (26--188 MB, auto-downloaded). Target language availability depends on translation mode.

Language	Code	Source (ASR)	Target: Online	Target: Offline
English	en	yes	yes	yes
Russian	ru	yes	yes	yes
Spanish	es	yes	yes	yes
German	de	yes	yes	yes
French	fr	yes	yes	yes
Chinese	zh	yes	yes	yes
Korean	ko	yes	yes	yes
Japanese	ja	--	yes	yes
Portuguese	pt	--	yes	yes
Italian	it	--	yes	yes
Turkish	tr	--	yes	yes
Arabic	ar	--	yes	yes
Hindi	hi	--	yes	yes
Ukrainian	uk	--	--	yes
Polish	pl	--	--	yes
Dutch	nl	--	--	yes
Swedish	sv	--	--	yes
Czech	cs	--	--	yes
Romanian	ro	--	--	yes
Hungarian	hu	--	--	yes
Greek	el	--	--	yes
Danish	da	--	--	yes
Finnish	fi	--	--	yes
Norwegian	no	--	--	yes
Thai	th	--	--	yes
Vietnamese	vi	--	--	yes
Indonesian	id	--	--	yes
Malay	ms	--	--	yes
Hebrew	he	--	--	yes
Bulgarian	bg	--	--	yes
Georgian	ka	--	--	yes

Usage

On first launch, the app downloads the ASR model for the selected source language from HuggingFace.
Select audio source: Microphone or System Audio.
Tap the microphone button to start translation. System Audio requires screen capture permission.
Translated text appears on the main card; original phrase shown below.
Control via notification: pause, resume, stop.

Offline Translation Setup

Open Settings and scroll to Offline Translation.
Tap Download to get the NLLB-200 model (~1.3 GB, 5 files from HuggingFace).
Once downloaded, toggle Offline mode on. The target language list expands to 31 languages.
The model uses ~900 MB RAM when loaded.

Settings

App Language -- English, Russian, or system default
Languages -- source (7 with ASR) and target (13 online / 31 offline)
Offline Translation -- download/delete NLLB model, toggle offline mode
TTS -- speech rate, pitch, auto-playback, mic muting during TTS
Display -- show/hide original text and partial ASR results
Theme -- system / light / dark

Project Structure

app/src/main/java/com/example/instantvoicetranslate/
|
|-- MainActivity.kt                    # Activity, navigation, permissions
|-- InstantVoiceTranslateApp.kt        # Application, notification channel
|
|-- asr/
|   |-- SpeechRecognizer.kt            # ASR interface
|   |-- SherpaOnnxRecognizer.kt        # Sherpa-ONNX Zipformer implementation
|
|-- audio/
|   |-- AudioCaptureManager.kt         # Mic and system audio capture
|   |-- AudioDiagnostics.kt            # Raw audio WAV recording (debug)
|
|-- data/
|   |-- TranslationUiState.kt          # Singleton UI state (StateFlow)
|   |-- SettingsRepository.kt          # Settings via DataStore
|   |-- ModelDownloader.kt             # ASR model download from HuggingFace
|
|-- di/
|   |-- AppModule.kt                   # Hilt bindings
|
|-- service/
|   |-- TranslationService.kt          # Foreground service, pipeline orchestrator
|
|-- translation/
|   |-- TextTranslator.kt              # Translation interface
|   |-- FreeTranslator.kt              # Online: Yandex API + fallback, LRU cache
|   |-- NllbTranslator.kt              # Offline: NLLB-200 via ONNX Runtime
|   |-- NllbModelManager.kt            # NLLB model download and lifecycle
|   |-- NllbTokenizer.kt               # SentencePiece tokenizer for NLLB
|   |-- SentencePieceBpe.kt            # Pure Kotlin BPE tokenizer (no native deps)
|   |-- NllbLanguageCodes.kt           # ISO 639-1 to NLLB code mapping
|
|-- tts/
|   |-- TtsEngine.kt                   # Android TTS with queue
|
|-- ui/
    |-- screens/
    |   |-- MainScreen.kt              # Main screen: translation display
    |   |-- SettingsScreen.kt          # Settings screen
    |-- theme/
    |   |-- Theme.kt                   # Material 3, dynamic colors
    |-- utils/
    |   |-- LanguageUtils.kt           # Language lists and display names
    |-- viewmodel/
        |-- MainViewModel.kt           # Main screen ViewModel
        |-- SettingsViewModel.kt       # Settings ViewModel

Architecture

MVVM with a foreground service as the central orchestrator:

UI (Compose)  <-->  ViewModel  <-->  TranslationUiState (singleton StateFlow)
                                            ^
                                            |
                                    TranslationService (foreground)
                                            |
                        +-------------------+-------------------+
                        |                   |                   |
                AudioCaptureManager  SherpaOnnxRecognizer  TextTranslator
                (mic / system audio)  (streaming ASR)      |             |
                                                     FreeTranslator  NllbTranslator
                                                     (online)       (offline/ONNX)
                                                           |
                                                       TtsEngine
                                                   (playback queue)

TranslationUiState -- singleton StateFlow bridging the foreground service and Compose UI without Activity lifecycle binding. Keeps a rolling history of last 10 segments.
TranslationService -- manages audio capture, ASR, translation (parallel, up to 3 concurrent), and TTS. Selects FreeTranslator or NllbTranslator based on the offline mode setting.
NllbTranslator -- uses two separate decoder ONNX models (no-cache + with-past) to avoid an Android ONNX Runtime crash with the merged decoder's If-node. Pure Kotlin SentencePiece BPE tokenizer (no native dependencies).
TtsEngine -- Channel-based queue for sequential playback; isSpeaking StateFlow prevents feedback loops.

Permissions

Permission	Purpose
`RECORD_AUDIO`	Microphone recording
`FOREGROUND_SERVICE`	Background service operation
`FOREGROUND_SERVICE_MICROPHONE`	Foreground service type: microphone
`FOREGROUND_SERVICE_MEDIA_PROJECTION`	Foreground service type: system audio
`POST_NOTIFICATIONS`	Notifications (Android 13+)
`INTERNET`	Translation API and model downloads
`ACCESS_NETWORK_STATE`	Network availability check
`WAKE_LOCK`	Prevent sleep during service operation

Default Configuration

Parameter	Value
Source language	English (en)
Target language	Russian (ru)
Audio source	Microphone
Translation mode	Online
TTS speed	1.0x
TTS pitch	1.0x
Auto-playback	Enabled
Mute microphone on TTS	Disabled
Theme	System

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
.claude		.claude
app		app
gradle/wrapper		gradle/wrapper
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
build.gradle.kts		build.gradle.kts
gradle.properties		gradle.properties
gradlew		gradlew
gradlew.bat		gradlew.bat
settings.gradle.kts		settings.gradle.kts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Instant Voice Translate

Overview

Features

Tech Stack

Requirements

Build & Run

Headphones Recommended

Supported Languages

Usage

Offline Translation Setup

Settings

Project Structure

Architecture

Permissions

Default Configuration

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Instant Voice Translate

Overview

Features

Tech Stack

Requirements

Build & Run

Headphones Recommended

Supported Languages

Usage

Offline Translation Setup

Settings

Project Structure

Architecture

Permissions

Default Configuration

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages