Text-to-speech for KOReader with synchronized word highlighting, automatic page turns, and Bluetooth audio support. Works offline on Kobo, Kindle, Android, and Linux.
Download the .zip file from the latest release (look for audiobook-koplugin-v*.zip under Assets). Do not download "Source code (zip)" -- that only contains the Lua sources without the bundled TTS engines.
Unzip it and copy the audiobook.koplugin folder into KOReader's plugins directory:
| Platform | Path |
|---|---|
| Kobo | .adds/koreader/plugins/ |
| Kindle | koreader/plugins/ |
| Linux | ~/.config/koreader/plugins/ |
| Android | /sdcard/koreader/plugins/ |
| PocketBook | applications/koreader/plugins/ |
Restart KOReader after copying.
The pre-built release from step 1 already includes espeak-ng and Piper -- no extra install needed on Kobo or Kindle. Skip to step 3.
If you cloned the repository instead:
Kobo -- install espeak-ng via SSH or the terminal emulator (Menu > More tools > Terminal emulator):
opkg update && opkg install espeak-ng
If opkg is unavailable, grab the .ipk from nickel-packages and run opkg install /mnt/onboard/espeak-ng*.ipk.
Linux -- sudo apt install espeak-ng
Android (Boox, etc.) -- the pre-built release includes tts_helper.dex, which bridges to the device's built-in TTS engine (Google, Samsung, etc.). Just unzip and copy the folder like any other platform. No extra steps needed.
If you cloned the repo instead of downloading a release, build the .dex manually (requires Android SDK):
cd audiobook.koplugin/android/
./build-dex.sh
The bundled espeak-ng and Piper binaries are for Linux-based e-readers and will not run on Android. See Android support for details.
- Long-press a word to open the dictionary popup, then tap Read aloud from here.
- Or select a paragraph, then tap Read aloud from here in the selection menu.
- Or go to Tools > Audiobook Read-Along > Start reading from current page.
Piper sounds much more natural than espeak-ng. It runs fully offline on Kobo's ARM processor (~40 MB for engine + voice model). The pre-built release already includes Piper and a default voice (en_US-danny-low). For faster load times on Kobo, low-quality voices like this one are recommended (see Choosing a voice). To build a bundle yourself, see Building from source.
Switch between espeak-ng and Piper any time from Tools > Audiobook Read-Along > Voice settings.
Listen to samples and pick a voice: rhasspy.github.io/piper-samples
Piper voices come in four quality levels: x_low, low, medium, and high. The table below lists low, medium, and high:
| Quality | Sample rate | Size | Notes |
|---|---|---|---|
| low | 16 kHz | ~15 MB | Recommended for Kobo -- fast load, low RAM |
| medium | 22 kHz | ~60 MB | Better quality, but slower to load on Kobo |
| high | 22 kHz | ~100 MB | Best quality, more RAM/CPU |
On Kobo (512 MB RAM), low voices are recommended. medium works, but the model takes noticeably longer to load. Not every voice is available at every quality level -- check HuggingFace for what's offered.
Every voice needs two files: a .onnx model and a .onnx.json config. Place both in audiobook.koplugin/piper/.
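A quick way to check the two-file rule is a small script, sketched here in Python under the assumption that it runs on a computer with access to the plugin folder. The function name incomplete_voices is hypothetical and not part of the plugin.

```python
from pathlib import Path


def incomplete_voices(piper_dir):
    """Return model names (ending in .onnx) whose model or config file is missing."""
    piper_dir = Path(piper_dir)
    # "*.onnx" matches only names that end in .onnx, so configs are not counted here.
    models = {p.name for p in piper_dir.glob("*.onnx")}
    # Strip the trailing ".json" so config names can be compared with model names.
    configs = {p.name[: -len(".json")] for p in piper_dir.glob("*.onnx.json")}
    # Symmetric difference: names that appear on one side only.
    return sorted(models ^ configs)
```

Calling incomplete_voices("audiobook.koplugin/piper") returns an empty list when every model has its config and vice versa.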
Voices are hosted on HuggingFace. The URL pattern is:
https://huggingface.co/rhasspy/piper-voices/resolve/main/<lang>/<lang_REGION>/<speaker>/<quality>/
For example, to download en_US-lessac-medium:
cd audiobook.koplugin/piper/
curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json
Or for en_US-ryan-low:
curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/low/en_US-ryan-low.onnx
curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/low/en_US-ryan-low.onnx.json
Browse all available voices: huggingface.co/rhasspy/piper-voices
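The URL pattern can also be expanded programmatically. The following is a minimal Python sketch; the helper name voice_urls is hypothetical, and it assumes voice names always have the form <lang_REGION>-<speaker>-<quality>, as in the examples above.

```python
BASE = "https://huggingface.co/rhasspy/piper-voices/resolve/main"


def voice_urls(voice):
    """Build the .onnx and .onnx.json download URLs for a voice name."""
    # "en_US-lessac-medium" -> locale "en_US", speaker "lessac", quality "medium".
    locale, rest = voice.split("-", 1)
    speaker, quality = rest.rsplit("-", 1)
    lang = locale.split("_")[0]  # "en_US" -> "en"
    folder = f"{BASE}/{lang}/{locale}/{speaker}/{quality}"
    return [f"{folder}/{voice}.onnx", f"{folder}/{voice}.onnx.json"]


for url in voice_urls("en_US-ryan-low"):
    print(url)
```

The printed URLs are the same two that the en_US-ryan-low curl commands fetch.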
The plugin outputs audio through a Bluetooth A2DP connection when a BT device is paired. The connection is managed through the plugin menu:
Tools > Audiobook Read-Along > Bluetooth settings
Two Bluetooth stacks are supported, auto-detected at runtime:
| Stack | Devices | Audio path |
|---|---|---|
| MTK (mtkbtmwrpc) | Clara 2E, Sage, Libra Colour | GStreamer persistent pipeline |
| BlueZ (bluetoothd) | Libra 2 / Io | aplay via ALSA |
On MTK devices the BT audio pipeline uses an exclusive abstract socket. If audio stops working after a crash, restart KOReader -- the plugin kills orphan processes on startup.
On MTK Kobo devices, the mtkbtmwrpc daemon binds a single abstract socket. Only one GStreamer pipeline can hold it at a time. The plugin keeps one persistent pipeline alive across sentences to avoid reconnection gaps. On BlueZ devices, the plugin starts bluetoothd and resets the HCI adapter automatically when you power on Bluetooth.
For the full platform audio and Bluetooth architecture (Kobo generations, Kindle, Android), see docs/PLATFORM_AUDIO.md.
| Button | Action |
|---|---|
| Rewind | Previous sentence. Hold for 3x skip. |
| Play/Pause | Toggle playback. |
| Forward | Next sentence. Hold for 3x skip. |
| Close | Stop reading and dismiss the bar. |
Reading pauses automatically when you open a menu and resumes when you close it.
All settings are under Tools > Audiobook Read-Along:
- Bluetooth settings - pair, connect, disconnect alert interval
- Voice settings - TTS engine, voice, speech rate, pitch, volume, sentence/paragraph pauses (espeak-ng), sentence/paragraph gaps (Piper), word gap, clause pause
- Highlight style - background (default), invert (best for e-ink), underline, box
- Auto-advance pages - turn pages automatically
- Highlight words / sentences - toggle each independently
- Quick start with espeak - play first sentence with espeak-ng while Piper loads (avoids the ~3s cold start silence)
- Keep playing when lid is closed - prevents device suspend so audio continues with the case closed
- BT headset media buttons - use play/pause/next/prev on a Bluetooth headset or speaker to control TTS playback
audiobook.koplugin/
main.lua - entry point, menus, event hooks
synccontroller.lua - coordinates audio timing with highlights
ttsengine.lua - TTS synthesis, audio playback, backend detection
piperqueue.lua - persistent Piper server management
textparser.lua - sentence/word tokenization with positions
highlightmanager.lua - screen-coordinate highlight via crengine
playbackbar.lua - transport controls widget
menubuilder.lua - voice/highlight settings menus
btmanager.lua - Bluetooth device scanning and pairing (MTK + BlueZ)
btui.lua - BT menu UI and disconnect watcher
btmediacontrol.lua - BT headset media buttons (AVRCP play/pause/skip)
benchmarkrunner.lua - in-plugin TTS benchmark runner
wavutils.lua - WAV file reading, writing, and manipulation
androidtts.lua - Android TTS via JNI (DexClassLoader + TtsHelper)
utils.lua - shared helpers
Persistent Piper server. On Kobo's single-core ARM, loading the ONNX model takes ~4.5 seconds. A persistent server process keeps the model in memory and accepts sentences over a FIFO. Combined with 3-sentence batching this brings the realtime factor from 0.085x (old 2-server config) to 0.329x. See dev/benchmark/RESULTS.md for the full analysis.
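The load-once idea can be illustrated with a toy model. This Python sketch is not the plugin's Lua implementation: serve, run_demo, and the fake load_model are hypothetical names, and the "model" is a stand-in function. It only shows that a worker reading sentences from a FIFO pays the load cost a single time.

```python
import os
import tempfile
import threading


def serve(fifo_path, load_model, results):
    """Load the model once, then synthesize every line read from the FIFO."""
    synthesize = load_model()            # expensive step, paid a single time
    with open(fifo_path, "r") as fifo:   # blocks until a writer connects
        for line in fifo:                # loop ends when the writer closes the FIFO
            results.append(synthesize(line.rstrip("\n")))


def run_demo(sentences):
    """Send sentences through a FIFO and report how often the model was loaded."""
    loads = []

    def load_model():
        loads.append(1)                  # count model loads
        return lambda text: f"<audio for {len(text)} chars>"

    results = []
    with tempfile.TemporaryDirectory() as tmp:
        fifo_path = os.path.join(tmp, "tts.fifo")
        os.mkfifo(fifo_path)             # POSIX only
        server = threading.Thread(target=serve, args=(fifo_path, load_model, results))
        server.start()
        with open(fifo_path, "w") as fifo:  # blocks until the reader has opened
            for sentence in sentences:
                fifo.write(sentence + "\n")
        server.join()
    return len(loads), results
```

Calling run_demo with any number of sentences reports one model load. Spawning a fresh process per sentence would instead pay the load cost every time.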
Binary-search highlight alignment. CRe (crengine) snaps text selections to word boundaries, and proportional fonts make character-to-pixel estimates unreliable. The highlight manager uses the proportional estimate as an initial guess, then binary-searches the x coordinate by querying CRe until the selected text matches the target sentence. Converges in 2-4 queries.
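The search loop can be sketched in Python as follows. The query callback stands in for a crengine selection call and is purely hypothetical, as are find_end_x and make_fake_engine. The sketch assumes that the selected text only grows as x increases.

```python
def find_end_x(query, target, lo, hi, guess):
    """Binary-search an x coordinate whose snapped selection equals target.

    query(x) returns the text selected from the line start up to x.
    Returns (x, number_of_queries), or (None, number_of_queries) if no x matches.
    """
    x = min(max(guess, lo), hi)  # start from the proportional estimate
    queries = 0
    while lo <= hi:
        selected = query(x)
        queries += 1
        if selected == target:
            return x, queries
        if len(selected) < len(target):
            lo = x + 1           # selection too short: move right
        else:
            hi = x - 1           # selection too long: move left
        x = (lo + hi) // 2
    return None, queries


def make_fake_engine(words, widths, space=5):
    """Toy layout engine: a word is selected once x reaches its left edge."""
    starts, pos = [], 0
    for width in widths:
        starts.append(pos)
        pos += width + space
    return lambda x: " ".join(w for w, s in zip(words, starts) if s <= x)
```

In the toy engine, selections snap to whole words, so a whole range of x values yields the same text and the search can stop at the first match.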
Exclusive BT socket (MTK only). Kobo's MediaTek BT firmware exposes a single abstract socket (@kobo:mtkbtmwrpc). The plugin keeps one GStreamer pipeline alive for the entire reading session and feeds audio through a FIFO. Orphan pipelines from crashes are killed on startup via PID files and pkill. On BlueZ devices (Libra 2, etc.) audio goes through standard ALSA and this socket management is not needed.
Long-sentence splitting. Piper's attention mechanism scales quadratically with input length. On Kobo's 512 MB of RAM, the server runs out of memory (OOM) on sentences above ~1000 characters, and throughput drops from ~7 ch/s at 300 chars to ~3 ch/s at 1400 chars. The text parser automatically splits any sentence longer than 300 characters at natural clause boundaries (; : , and/but/or... -), then merges fragments shorter than 80 characters with a neighbour (below that, ~90% of synthesis time is wasted on per-request overhead), and finally re-splits anything still over 300 characters at word boundaries. See dev/benchmark/RESULTS_LONG.md for the full data.
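The three passes (split, merge, re-split) can be sketched in Python. This is an illustration under simplifying assumptions, not the plugin's textparser.lua: the clause pattern only covers ; : , and the conjunctions and/but/or, and the function names are hypothetical.

```python
import re

# Split after ; : , or before a conjunction (simplified clause boundaries).
CLAUSE = re.compile(r"(?<=[;:,])\s+|\s+(?=(?:and|but|or)\b)")


def _split_at_words(text, max_len):
    """Greedily pack words into chunks of at most max_len characters."""
    chunks, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) > max_len and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def split_long_sentence(sentence, max_len=300, min_len=80):
    """Split at clause boundaries, merge short fragments, re-split long ones."""
    if len(sentence) <= max_len:
        return [sentence]
    fragments = [f for f in CLAUSE.split(sentence) if f]
    merged = []
    for frag in fragments:
        # Merge when either neighbour is below the minimum useful length.
        if merged and (len(merged[-1]) < min_len or len(frag) < min_len):
            merged[-1] = merged[-1] + " " + frag
        else:
            merged.append(frag)
    result = []
    for frag in merged:
        result.extend(_split_at_words(frag, max_len))
    return result
```

Merging can push a fragment back over the limit, which is why the word-boundary pass runs last. A single word longer than max_len is left intact in this sketch.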
| Problem | Fix |
|---|---|
| Plugin not in menu | Folder must be audiobook.koplugin inside plugins/. The plugin only appears in the Tools menu when a book is open. Restart KOReader after copying. |
| No sound | Run espeak-ng "hello" -w /tmp/t.wav && aplay /tmp/t.wav over SSH. |
| No audio player found (Kindle) | Pair BT headphones via the Kindle top-swipe menu before starting playback. If already paired, restart KOReader so the plugin re-detects the audio output. |
| No TTS engine found | Install espeak-ng (see Quick start). |
| No TTS engine found (Android) | Ensure android/tts_helper.dex is present inside the plugin folder. The pre-built release includes it; if you cloned from source, run ./build-dex.sh in the android/ directory. The device must also have a TTS engine installed (most do by default). See Android support. |
| BT audio silent | Restart KOReader to kill orphan pipelines. Check BT is paired in the plugin menu. |
| SSH refused on port 22 | KOReader uses port 2222: ssh root@<ip> -p 2222 |
| .adds not visible | Enable hidden files on your OS. The folder starts with a dot. |
When reporting a problem, please attach both files described below. The bug report captures your device environment (hardware, audio, settings) while the crash log captures KOReader's runtime behavior (errors, warnings, timing). Together they give the full picture needed to diagnose an issue.
- Reproduce the problem (use the plugin normally until the issue occurs).
- Generate the plugin bug report (see below).
- Locate the KOReader crash log (see below).
- Attach both files to your GitHub issue.
Tip: Generate the report and grab the crash log before restarting KOReader. The crash log is truncated on every launch, so restarting may discard the relevant entries.
The plugin's diagnostic report captures device info, TTS engine detection, audio configuration, Bluetooth status, and plugin settings. There are two ways to generate it:
Option A: From the plugin menu
Tools > Audiobook Read-Along > Generate bug report
This saves a .txt file to your device's root storage (see locations below).
Option B: Standalone script (when the plugin menu is not visible)
If the plugin doesn't appear in the KOReader menu at all, you can run the report generator directly via SSH or KOReader's built-in terminal emulator (Menu > More tools > Terminal emulator):
sh /mnt/onboard/.adds/koreader/plugins/audiobook.koplugin/generate-report.sh # Kobo
sh /mnt/us/koreader/plugins/audiobook.koplugin/generate-report.sh # Kindle
sh /sdcard/koreader/plugins/audiobook.koplugin/generate-report.sh # Android
The report is printed to the terminal and also saved to a file. If using the terminal emulator, you can scroll up to read it on screen.
Report location:
| Platform | Report location |
|---|---|
| Kobo | /mnt/onboard/audiobook-bug-report-*.txt |
| Kindle | /mnt/us/audiobook-bug-report-*.txt |
| Android | /sdcard/audiobook-bug-report-*.txt |
| Linux | ~/audiobook-bug-report-*.txt |
What the report contains:
- Device model, platform, screen size, kernel version
- KOReader version
- TTS engine detection results (which backends were found/missing)
- Audio player availability (aplay, GStreamer, etc.)
- Plugin settings (speech rate, highlight style, etc.)
- Memory and disk info
What the report does NOT contain:
- Book titles, content, or reading positions
- File paths with usernames (sanitized automatically)
- Highlights, bookmarks, or notes
- Network information or credentials
KOReader logs all warnings, errors, and debug output to a file called crash.log in its installation directory. This is not generated by the plugin -- it is KOReader's own runtime log and captures everything that happens during a session, including TTS process spawning, fallback events, and Lua errors.
Crash log location:
| Platform | Path |
|---|---|
| Kobo | /mnt/onboard/.adds/koreader/crash.log |
| Kindle | /mnt/us/koreader/crash.log |
| PocketBook | /mnt/ext1/applications/koreader/crash.log |
| Linux | Inside the KOReader installation directory |
| Android | No crash.log file -- use adb logcat to capture KOReader output |
Connect your device via USB and copy the file. On Kobo the .adds folder is hidden -- enable hidden files in your file manager to see it.
KOReader truncates crash.log to ~500 KB on every launch. If you restart KOReader before copying the file, earlier entries may be lost. Copy it while KOReader is still running or immediately after the issue occurs.
| Diagnostic question | Bug report | Crash log |
|---|---|---|
| Device model, hardware specs | yes | no |
| KOReader version, plugin version | yes | no |
| Audio output (ALSA, BT, GStreamer) | yes | no |
| TTS engines installed (espeak, Piper, Android) | yes | no |
| Plugin settings (rate, highlight, voice) | yes | no |
| Bluetooth pairing and connection state | yes | no |
| Lua errors and stack traces | no | yes |
| TTS process spawning and fallback events | no | yes |
| Sentence progression and page turns | no | yes |
| Timing of operations (delays, freezes) | no | yes |
| Piper server startup and delivery | no | yes |
| Device freeze or resource exhaustion | no | yes |
The plugin includes a built-in benchmark that measures TTS synthesis speed on your device. It runs a fixed set of test sentences through each available engine (espeak-ng, Piper) and saves a report you can share on GitHub to help document device performance.
Tools > Audiobook Read-Along > Generate bug report > Run device benchmark
The benchmark synthesizes five sentences of varying length (short dialogue, narrative prose, technical text, academic text, and short fragments) with each engine and model it finds. espeak-ng tests finish in seconds; Piper tests may take several minutes on slow devices like Kobo.
A progress message is shown between engine runs. The screen may appear unresponsive during individual synthesis calls -- this is expected.
When complete, a .txt report is saved to your device's root storage:
| Platform | Report location |
|---|---|
| Kobo | /mnt/onboard/audiobook-benchmark-*.txt |
| Kindle | /mnt/us/audiobook-benchmark-*.txt |
| Android | /sdcard/audiobook-benchmark-*.txt |
| Linux | ~/audiobook-benchmark-*.txt |
The report contains:
- Device info (platform, model, CPU cores, RAM, kernel)
- Plugin version
- Per-sentence synthesis time, audio duration, file size, and realtime factor for each engine/model
- Aggregate totals and average realtime factor
No book content, highlights, or personal data is included.
=== Audiobook TTS Benchmark (v0.1.5.10) ===
Generated: 2026-03-27T12:00:00Z
── Device ──
platform: kobo
model: Kobo Clara 2E
cpu_cores: 1
memory: 510396 kB
kernel: 4.1.15
── Test sentences ──
[1] short_dialogue (57 chars)
[2] medium_narrative (268 chars)
[3] medium_technical (254 chars)
[4] long_academic (362 chars)
[5] short_fragments (79 chars)
── espeak-ng ──
short_dialogue synth= 82ms audio= 3200ms size= 102444B rt=0.03x
medium_narrative synth= 310ms audio=15800ms size= 505244B rt=0.02x
...
── Piper danny-low (size=15.8MB, sr=16000Hz) ──
short_dialogue synth= 4200ms audio= 3100ms size= 99244B rt=1.35x
medium_narrative synth=18200ms audio=16800ms size= 537644B rt=1.08x
...
=== End of Benchmark ===
A realtime factor below 1.0x means synthesis is faster than playback (good). Above 1.0x means the user will hear pauses between sentences while the engine catches up.
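The rt column in the sample report is consistent with synthesis time divided by audio duration. A short Python sketch makes the arithmetic concrete; the function name is hypothetical, and the input values are copied from the sample report.

```python
def realtime_factor(synth_ms, audio_ms):
    """Synthesis time divided by audio duration; below 1.0 keeps up with playback."""
    if audio_ms <= 0:
        raise ValueError("audio duration must be positive")
    return synth_ms / audio_ms


# (synth_ms, audio_ms) pairs taken from the sample report.
samples = {
    "espeak-ng short_dialogue": (82, 3200),
    "Piper short_dialogue": (4200, 3100),
    "Piper medium_narrative": (18200, 16800),
}
for name, (synth_ms, audio_ms) in samples.items():
    rt = realtime_factor(synth_ms, audio_ms)
    verdict = "keeps up" if rt < 1.0 else "falls behind"
    print(f"{name}: rt={rt:.2f}x ({verdict})")
```

For the three sample rows this gives 0.03x, 1.35x, and 1.08x, matching the report.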
Attach the report file to a GitHub issue or include it in a bug report. Benchmark data from different devices helps the project tune batch sizes, choose default voices, and set realistic expectations for each platform.
Android TTS is supported via a JNI bridge to the device's built-in TextToSpeech engine (Google, Samsung, etc.). No Termux, no extra APKs, no root required.
The pre-built release from GitHub Releases includes android/tts_helper.dex. Just unzip and copy:
- Download the release zip and extract it.
- Copy audiobook.koplugin/ to /sdcard/koreader/plugins/.
- Restart KOReader. The plugin auto-detects Android and initializes the JNI bridge to the device's TTS engine.
If you cloned the repo instead of using a release, build the .dex first (requires Android SDK + Java):
cd audiobook.koplugin/android/
./build-dex.sh
The plugin loads a small .dex file (tts_helper.dex, ~4 KB) at runtime via Android's DexClassLoader. This helper wraps android.speech.tts.TextToSpeech with a polling-friendly API (since LuaJIT cannot implement Java callback interfaces). Synthesis produces standard WAV files that feed into the same pipeline used by espeak-ng and Piper.
Audio playback uses Android's MediaPlayer instead of aplay or GStreamer. Pause, resume, and stop all work through the MediaPlayer API.
For the full technical analysis, see docs/ANDROID_TTS.md.
- Uses the device's default TTS voice (voice picker UI not yet implemented)
- Word timing is estimated (Android TTS does not provide per-word callbacks when synthesizing to file)
- First sentence may have a brief delay while the TTS engine initializes
The package-for-kobo.sh script cross-compiles espeak-ng for ARM and bundles the plugin into a ready-to-deploy directory. It requires Nix for the cross-compilation toolchain.
# Plugin + espeak-ng only
bash package-for-kobo.sh
# Plugin + espeak-ng + Piper neural TTS
bash package-for-kobo.sh --with-piper
# Use a specific Piper voice (default: en_US-danny-low)
bash package-for-kobo.sh --piper-voice en_US-ryan-low
The output is placed in kobo-tts-bundle/audiobook.koplugin/. Copy it to your device:
scp -P 2222 -r kobo-tts-bundle/audiobook.koplugin root@<kobo-ip>:/mnt/onboard/.adds/koreader/plugins/
If you don't want to use the packaging script, you can assemble the Piper runtime yourself:
- Download the armv7l binary from Piper releases (2023.11.14-2).
- Extract piper, its lib/ directory, and espeak-ng-data/ into audiobook.koplugin/piper/.
- Download a voice model (.onnx + .onnx.json) as described in Downloading additional voices and place them in the same piper/ directory.
Note: The rhasspy/piper repository was archived in October 2025. The binaries on the releases page still work. The project continues as OHF-Voice/piper1-gpl.
- Implement real word-level timing from TTS engines (SSML / phoneme callbacks)
- Add PDF/DjVu highlight support (currently EPUB only)
- Voice picker for Android TTS engines and voices
- Integrate more TTS backends
- Improve accessibility
- Support whole audiobook production with hash-based verification
- Evaluate plugin with other TTS models (e.g., KittenTTS)
- Test and optimize for ultralow-quality/size voice models
Copyright 2025-2026 gespitia - AGPL-3.0. See LICENSE.
| Bundled component | License |
|---|---|
| KOReader | AGPL-3.0 |
| espeak-ng | GPL-3.0+ |
| Piper | MIT |
| Piper voices | MIT |
| glibc (bundled .so) | LGPL-2.1 |