Audiobook Read-Along Plugin for KOReader

Text-to-speech for KOReader with synchronized word highlighting, automatic page turns, and Bluetooth audio support. Works offline on Kobo, Kindle, Android, and Linux.

Quick start

1. Download and copy the plugin

Download the .zip file from the latest release (look for audiobook-koplugin-v*.zip under Assets). Do not download "Source code (zip)" -- that only contains the Lua sources without the bundled TTS engines.

Unzip it and copy the audiobook.koplugin folder into KOReader's plugins directory:

Platform	Path
Kobo	`.adds/koreader/plugins/`
Kindle	`koreader/plugins/`
Linux	`~/.config/koreader/plugins/`
Android	`/sdcard/koreader/plugins/`
PocketBook	`applications/koreader/plugins/`

Restart KOReader after copying.

2. Install a TTS engine (if not using the pre-built release)

The pre-built release from step 1 already includes espeak-ng and Piper -- no extra install needed on Kobo or Kindle. Skip to step 3.

If you cloned the repository instead:

Kobo -- install espeak-ng via SSH or the terminal emulator (Menu > More tools > Terminal emulator):

opkg update && opkg install espeak-ng

If opkg is unavailable, grab the .ipk from nickel-packages and run opkg install /mnt/onboard/espeak-ng*.ipk.

Linux (Debian/Ubuntu) -- sudo apt install espeak-ng

Android (Boox, etc.) -- the pre-built release includes tts_helper.dex, which bridges to the device's built-in TTS engine (Google, Samsung, etc.). Just unzip and copy the folder like any other platform. No extra steps needed.

If you cloned the repo instead of downloading a release, build the .dex manually (requires Android SDK):

cd audiobook.koplugin/android/
./build-dex.sh

The bundled espeak-ng and Piper binaries are for Linux-based e-readers and will not run on Android. See Android support for details.

3. Start reading

Long-press a word to open the dictionary popup, then tap Read aloud from here.
Or select a paragraph, then tap Read aloud from here in the selection menu.
Or go to Tools > Audiobook Read-Along > Start reading from current page.

Optional: Piper neural TTS

Piper sounds much more natural than espeak-ng. It runs fully offline on Kobo's ARM processor and needs ~40 MB for the engine plus one voice model. The pre-built release already includes Piper and a default voice (en_US-danny-low). Low-quality voices like this one are recommended on Kobo because they load faster (see Choosing a voice). To build a bundle yourself, see Building from source.

Switch between espeak-ng and Piper any time from Tools > Audiobook Read-Along > Voice settings.

Choosing a voice

Listen to samples and pick a voice: rhasspy.github.io/piper-samples

Piper defines four quality levels (x_low, low, medium, high). The table covers the three larger ones:

Quality	Sample rate	Size	Notes
low	16 kHz	~15 MB	Recommended for Kobo -- fast load, low RAM
medium	22 kHz	~60 MB	Better quality, but slower to load on Kobo
high	22 kHz	~100 MB	Best quality, more RAM/CPU

On Kobo (512 MB RAM), low voices are recommended. Medium voices work, but the model takes noticeably longer to load. Not every voice is available at every quality level -- check HuggingFace for what's offered.

Downloading additional voices

Every voice needs two files: a .onnx model and a .onnx.json config. Place both in audiobook.koplugin/piper/.

Voices are hosted on HuggingFace. The URL pattern is:

https://huggingface.co/rhasspy/piper-voices/resolve/main/<lang>/<lang_REGION>/<speaker>/<quality>/

For example, to download en_US-lessac-medium:

cd audiobook.koplugin/piper/
curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

Or for en_US-ryan-low:

curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/low/en_US-ryan-low.onnx
curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/low/en_US-ryan-low.onnx.json
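
Because the path is fully determined by the voice name, the URL pattern can be wrapped in a small helper. The sketch below is not part of the plugin; the function names piper_voice_url and piper_voice_files are made up for illustration, and it assumes voice names always follow the <lang_REGION>-<speaker>-<quality> form used in the examples above.

```shell
# Hypothetical helper (not shipped with the plugin): build the HuggingFace
# base URL for a Piper voice from its name, following the pattern above.
piper_voice_url() {
    voice="$1"                 # e.g. en_US-lessac-medium
    region="${voice%%-*}"      # en_US
    lang="${region%%_*}"       # en
    rest="${voice#*-}"         # lessac-medium
    speaker="${rest%-*}"       # lessac
    quality="${rest##*-}"      # medium
    printf '%s/%s/%s/%s/%s/%s\n' \
        "https://huggingface.co/rhasspy/piper-voices/resolve/main" \
        "$lang" "$region" "$speaker" "$quality" "$voice"
}

# Print the two files every voice needs: the model and its config.
piper_voice_files() {
    base="$(piper_voice_url "$1")"
    printf '%s.onnx\n%s.onnx.json\n' "$base" "$base"
}

piper_voice_files en_US-ryan-low
```

To download instead of print, the output can be piped into curl from inside audiobook.koplugin/piper/, for example with piper_voice_files en_US-ryan-low | xargs -n 1 curl -LO.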

Browse all available voices: huggingface.co/rhasspy/piper-voices

Bluetooth audio (Kobo)

The plugin outputs audio through a Bluetooth A2DP connection when a BT device is paired. The connection is managed through the plugin menu:

Tools > Audiobook Read-Along > Bluetooth settings

Two Bluetooth stacks are supported, auto-detected at runtime:

Stack	Devices	Audio path
MTK (mtkbtmwrpc)	Clara 2E, Sage, Libra Colour	GStreamer persistent pipeline
BlueZ (bluetoothd)	Libra 2 / Io	aplay via ALSA

On MTK Kobo devices, the mtkbtmwrpc daemon binds a single abstract socket, and only one GStreamer pipeline can hold it at a time. The plugin keeps one persistent pipeline alive across sentences to avoid reconnection gaps. If audio stops working after a crash, restart KOReader -- the plugin kills orphan processes on startup.

On BlueZ devices, the plugin starts bluetoothd and resets the HCI adapter automatically when you power on Bluetooth.

For the full platform audio and Bluetooth architecture (Kobo generations, Kindle, Android), see docs/PLATFORM_AUDIO.md.

Playback controls

Button	Action
Rewind	Previous sentence. Hold for 3x skip.
Play/Pause	Toggle playback.
Forward	Next sentence. Hold for 3x skip.
Close	Stop reading and dismiss the bar.

Reading pauses automatically when you open a menu and resumes when you close it.

Settings

All settings are under Tools > Audiobook Read-Along:

Bluetooth settings - pair and connect devices, set the disconnect alert interval
Voice settings - TTS engine, voice, speech rate, pitch, volume, sentence/paragraph pauses (espeak-ng), sentence/paragraph gaps (Piper), word gap, clause pause
Highlight style - background (default), invert (best for e-ink), underline, box
Auto-advance pages - turn pages automatically
Highlight words / sentences - toggle each independently
Quick start with espeak - play first sentence with espeak-ng while Piper loads (avoids the ~3s cold start silence)
Keep playing when lid is closed - prevents device suspend so audio continues with the case closed
BT headset media buttons - use play/pause/next/prev on a Bluetooth headset or speaker to control TTS playback

Architecture

audiobook.koplugin/
  main.lua             - entry point, menus, event hooks
  synccontroller.lua   - coordinates audio timing with highlights
  ttsengine.lua        - TTS synthesis, audio playback, backend detection
  piperqueue.lua       - persistent Piper server management
  textparser.lua       - sentence/word tokenization with positions
  highlightmanager.lua - screen-coordinate highlight via crengine
  playbackbar.lua      - transport controls widget
  menubuilder.lua      - voice/highlight settings menus
  btmanager.lua        - Bluetooth device scanning and pairing (MTK + BlueZ)
  btui.lua             - BT menu UI and disconnect watcher
  btmediacontrol.lua   - BT headset media buttons (AVRCP play/pause/skip)
  benchmarkrunner.lua  - in-plugin TTS benchmark runner
  wavutils.lua         - WAV file reading, writing, and manipulation
  androidtts.lua       - Android TTS via JNI (DexClassLoader + TtsHelper)
  utils.lua            - shared helpers

Design notes

Persistent Piper server. On Kobo's single-core ARM, loading the ONNX model takes ~4.5 seconds. A persistent server process keeps the model in memory and accepts sentences over a FIFO. Combined with 3-sentence batching this brings the realtime factor from 0.085x (old 2-server config) to 0.329x. See dev/benchmark/RESULTS.md for the full analysis.
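
The FIFO pattern described above can be tried on any Linux shell. This is a minimal sketch, not the plugin's piperqueue.lua: the "server" is a stand-in loop, tr fakes the synthesis step, and all file and variable names are invented for the example.

```shell
# Stand-in for the persistent synthesis server. It pays the "model load"
# cost once, then handles one sentence per line read from a FIFO.
# `tr` fakes synthesis so the sketch runs without Piper installed.
workdir="$(mktemp -d)"
fifo="$workdir/sentences.fifo"
mkfifo "$fifo"

(
    echo "model loaded" > "$workdir/server.log"    # expensive step, done once
    while IFS= read -r sentence; do
        printf '%s\n' "$sentence" | tr 'a-z' 'A-Z' >> "$workdir/server.log"
    done < "$fifo"
) &
server_pid=$!

# Hold one writer open for the whole session so the server never sees
# end-of-file between sentences.
exec 3> "$fifo"
printf '%s\n' "First sentence." >&3
printf '%s\n' "Second sentence." >&3
exec 3>&-                  # closing the writer ends the session

wait "$server_pid"
session_log="$(cat "$workdir/server.log")"
rm -rf "$workdir"
printf '%s\n' "$session_log"
```

The log shows a single "model loaded" line followed by both sentences, which is the point of the design: the load cost is paid once per session rather than once per sentence.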

Binary-search highlight alignment. CRe (crengine) snaps text selections to word boundaries, and proportional fonts make character-to-pixel estimates unreliable. The highlight manager uses the proportional estimate as an initial guess, then binary-searches the x coordinate by querying CRe until the selected text matches the target sentence. Converges in 2-4 queries.
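
A toy model of that search, assuming a single line of text and a made-up list of word edges in place of real CRe queries. The names EDGES, word_at_x, and find_x_for_word are hypothetical and the pixel values are invented; the plugin matches selected text against the target sentence rather than comparing word indices.

```shell
# Mock of the engine query: right edges (in pixels) of six words on one line.
# In the plugin the answer comes from crengine; these numbers are made up.
EDGES="40 95 110 190 240 300"

# Print the 1-based index of the word that contains x.
word_at_x() {
    i=1
    for edge in $EDGES; do
        if [ "$1" -lt "$edge" ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "$((i - 1))"
}

# $1 = target word index, $2 = initial guess for x, $3 = line width.
# Prints "<x> <number of queries>" once the query lands on the target word.
find_x_for_word() {
    target="$1"; x="$2"; lo=0; hi="$3"; queries=0
    while [ "$lo" -le "$hi" ]; do
        queries=$((queries + 1))
        got="$(word_at_x "$x")"
        if [ "$got" -eq "$target" ]; then
            echo "$x $queries"
            return 0
        fi
        if [ "$got" -lt "$target" ]; then lo=$((x + 1)); else hi=$((x - 1)); fi
        x=$(( (lo + hi) / 2 ))
    done
    return 1
}

# Proportional guess: the middle of word 3 if all 6 words were equally wide.
guess=$(( (2 * 3 - 1) * 300 / (2 * 6) ))
find_x_for_word 3 "$guess" 300    # prints: 109 4
```

Word 3 is narrow (15 px), so the proportional guess of 125 lands on word 4, and the search needs four queries to reach x = 109.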

Exclusive BT socket (MTK only). Kobo's MediaTek BT firmware exposes a single abstract socket (@kobo:mtkbtmwrpc). The plugin keeps one GStreamer pipeline alive for the entire reading session and feeds audio through a FIFO. Orphan pipelines from crashes are killed on startup via PID files and pkill. On BlueZ devices (Libra 2, etc.) audio goes through standard ALSA and this socket management is not needed.

Long-sentence splitting. Piper's attention mechanism scales quadratically with input length. On Kobo's 512 MB of RAM the server runs out of memory on sentences above ~1000 characters, and throughput drops from ~7 ch/s at 300 chars to ~3 ch/s at 1400 chars. The text parser therefore processes any sentence longer than 300 characters in three passes: it splits at natural clause boundaries (; : , and/but/or... -), merges fragments shorter than 80 characters with a neighbour (below that length, ~90% of synthesis time is wasted on per-request overhead), and re-splits anything still over 300 characters at word boundaries. See dev/benchmark/RESULTS_LONG.md for the full data.
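
The idea can be approximated in a few lines of awk. This is a simplified variant, not the algorithm in textparser.lua: it cuts only after ; : and , and then greedily packs neighbouring fragments up to a length limit, which has a similar effect to the split and merge passes. It does not handle conjunctions or dashes, has no 80-character minimum, and leaves a fragment over the limit if it contains no clause boundary. The function name split_long is made up.

```shell
# split_long MAX: read one sentence per line on stdin and print chunks that
# end at clause punctuation and stay within MAX characters where possible.
split_long() {
    awk -v max="$1" '
    {
        gsub(/[;:,] /, "&\001")            # mark a cut point after each clause
        n = split($0, frag, "\001")
        chunk = ""
        for (i = 1; i <= n; i++) {
            # Start a new chunk when adding this fragment would exceed max.
            if (chunk != "" && length(chunk) + length(frag[i]) > max) {
                sub(/ $/, "", chunk)       # drop the trailing space
                print chunk
                chunk = ""
            }
            chunk = chunk frag[i]
        }
        if (chunk != "") print chunk
    }'
}

# Tiny limit so the effect is visible; the plugin's threshold is 300.
printf '%s\n' "alpha, beta; gamma: delta, epsilon" | split_long 14
```

With a limit of 14 the sample line comes out as three chunks: "alpha, beta;", "gamma: delta," and "epsilon".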

Troubleshooting

Problem	Fix
Plugin not in menu	Folder must be `audiobook.koplugin` inside `plugins/`. The plugin only appears in the Tools menu when a book is open. Restart KOReader after copying.
No sound	Run `espeak-ng "hello" -w /tmp/t.wav && aplay /tmp/t.wav` over SSH to check that synthesis and playback work outside the plugin.
No audio player found (Kindle)	Pair BT headphones via the Kindle top-swipe menu before starting playback. If already paired, restart KOReader so the plugin re-detects the audio output.
No TTS engine found	Install espeak-ng (see Quick start).
No TTS engine found (Android)	Ensure `android/tts_helper.dex` is present inside the plugin folder. The pre-built release includes it; if you cloned from source, run `./build-dex.sh` in the `android/` directory. The device must also have a TTS engine installed (most do by default). See Android support.
BT audio silent	Restart KOReader to kill orphan pipelines. Check BT is paired in the plugin menu.
SSH refused on port 22	KOReader uses port 2222: `ssh root@<ip> -p 2222`
`.adds` not visible	Enable hidden files on your OS. The folder starts with a dot.

Filing a bug report

When reporting a problem, please attach both files described below. The bug report captures your device environment (hardware, audio, settings) while the crash log captures KOReader's runtime behavior (errors, warnings, timing). Together they give the full picture needed to diagnose an issue.

Reproduce the problem (use the plugin normally until the issue occurs).
Generate the plugin bug report (see below).
Locate the KOReader crash log (see below).
Attach both files to your GitHub issue.

Tip: Generate the report and grab the crash log before restarting KOReader. The crash log is truncated on every launch, so restarting may discard the relevant entries.

1. Plugin bug report

The plugin's diagnostic report captures device info, TTS engine detection, audio configuration, Bluetooth status, and plugin settings. There are two ways to generate it:

Option A: From the plugin menu

Tools > Audiobook Read-Along > Generate bug report

This saves a .txt file to your device's root storage (see locations below).

Option B: Standalone script (when the plugin menu is not visible)

If the plugin doesn't appear in the KOReader menu at all, you can run the report generator directly via SSH or KOReader's built-in terminal emulator (Menu > More tools > Terminal emulator):

sh /mnt/onboard/.adds/koreader/plugins/audiobook.koplugin/generate-report.sh   # Kobo
sh /mnt/us/koreader/plugins/audiobook.koplugin/generate-report.sh              # Kindle
sh /sdcard/koreader/plugins/audiobook.koplugin/generate-report.sh              # Android

The report is printed to the terminal and also saved to a file. If using the terminal emulator, you can scroll up to read it on screen.

Report location:

Platform	Report location
Kobo	`/mnt/onboard/audiobook-bug-report-*.txt`
Kindle	`/mnt/us/audiobook-bug-report-*.txt`
Android	`/sdcard/audiobook-bug-report-*.txt`
Linux	`~/audiobook-bug-report-*.txt`

What the report contains:

Device model, platform, screen size, kernel version
KOReader version
TTS engine detection results (which backends were found/missing)
Audio player availability (aplay, GStreamer, etc.)
Plugin settings (speech rate, highlight style, etc.)
Memory and disk info

What the report does NOT contain:

Book titles, content, or reading positions
File paths with usernames (sanitized automatically)
Highlights, bookmarks, or notes
Network information or credentials

2. KOReader crash log

KOReader logs all warnings, errors, and debug output to a file called crash.log in its installation directory. This is not generated by the plugin -- it is KOReader's own runtime log and captures everything that happens during a session, including TTS process spawning, fallback events, and Lua errors.

Crash log location:

Platform	Path
Kobo	`/mnt/onboard/.adds/koreader/crash.log`
Kindle	`/mnt/us/koreader/crash.log`
PocketBook	`/mnt/ext1/applications/koreader/crash.log`
Linux	Inside the KOReader installation directory
Android	No `crash.log` file -- use `adb logcat` to capture KOReader output

Connect your device via USB and copy the file. On Kobo the .adds folder is hidden -- enable hidden files in your file manager to see it.

KOReader truncates crash.log to ~500 KB on every launch. If you restart KOReader before copying the file, earlier entries may be lost. Copy it while KOReader is still running or immediately after the issue occurs.

Why both files matter

Diagnostic question	Bug report	Crash log
Device model, hardware specs, KOReader version, plugin version, audio output (ALSA, BT, GStreamer), TTS engines installed (espeak, Piper, Android), plugin settings (rate, highlight, voice), Bluetooth pairing and connection state	yes	no
Lua errors and stack traces, TTS process spawning and fallback events, sentence progression and page turns, timing of operations (delays, freezes), Piper server startup and delivery, device freeze or resource exhaustion	no	yes

Device benchmark

The plugin includes a built-in benchmark that measures TTS synthesis speed on your device. It runs a fixed set of test sentences through each available engine (espeak-ng, Piper) and saves a report you can share on GitHub to help document device performance.

Running the benchmark

Tools > Audiobook Read-Along > Generate bug report > Run device benchmark

The benchmark synthesizes five sentences of varying length (short dialogue, narrative prose, technical text, academic text, and short fragments) with each engine and model it finds. espeak-ng tests finish in seconds; Piper tests may take several minutes on slow devices like Kobo.

A progress message is shown between engine runs. The screen may appear unresponsive during individual synthesis calls -- this is expected.

Output

When complete, a .txt report is saved to your device's root storage:

Platform	Report location
Kobo	`/mnt/onboard/audiobook-benchmark-*.txt`
Kindle	`/mnt/us/audiobook-benchmark-*.txt`
Android	`/sdcard/audiobook-benchmark-*.txt`
Linux	`~/audiobook-benchmark-*.txt`

The report contains:

Device info (platform, model, CPU cores, RAM, kernel)
Plugin version
Per-sentence synthesis time, audio duration, file size, and realtime factor for each engine/model
Aggregate totals and average realtime factor

No book content, highlights, or personal data is included.

Example output

=== Audiobook TTS Benchmark (v0.1.5.10) ===
Generated: 2026-03-27T12:00:00Z

── Device ──
  platform: kobo
  model: Kobo Clara 2E
  cpu_cores: 1
  memory: 510396 kB
  kernel: 4.1.15

── Test sentences ──
  [1] short_dialogue (57 chars)
  [2] medium_narrative (268 chars)
  [3] medium_technical (254 chars)
  [4] long_academic (362 chars)
  [5] short_fragments (79 chars)

── espeak-ng ──
  short_dialogue          synth=   82ms  audio= 3200ms  size= 102444B  rt=0.03x
  medium_narrative        synth=  310ms  audio=15800ms  size= 505244B  rt=0.02x
  ...

── Piper danny-low  (size=15.8MB, sr=16000Hz) ──
  short_dialogue          synth= 4200ms  audio= 3100ms  size=  99244B  rt=1.35x
  medium_narrative        synth=18200ms  audio=16800ms  size= 537644B  rt=1.08x
  ...

=== End of Benchmark ===

A realtime factor below 1.0x means synthesis is faster than playback (good). Above 1.0x means the user will hear pauses between sentences while the engine catches up.
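
The rt column in the example output is consistent with synthesis time divided by audio duration. A quick way to check a row by hand, using a hypothetical helper name (the plugin computes this value itself):

```shell
# Realtime factor = synthesis time / audio duration (both in milliseconds).
realtime_factor() {
    awk -v synth="$1" -v audio="$2" 'BEGIN { printf "%.2fx\n", synth / audio }'
}

realtime_factor 4200 3100    # Piper short_dialogue row      -> 1.35x
realtime_factor 82 3200      # espeak-ng short_dialogue row  -> 0.03x
```

In the Piper row, 4.2 seconds of synthesis produce 3.1 seconds of audio, so playback of a single sentence would finish before the next one is ready.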

Sharing your results

Attach the report file to a GitHub issue or include it in a bug report. Benchmark data from different devices helps the project tune batch sizes, choose default voices, and set realistic expectations for each platform.

Android support

Android TTS is supported via a JNI bridge to the device's built-in TextToSpeech engine (Google, Samsung, etc.). No Termux, no extra APKs, no root required.

Feature	Status
Plugin loads in KOReader	Works
Text parsing & highlighting	Works
Android system TTS	Via JNI bridge to `TextToSpeech` API
Audio playback	Via Android `MediaPlayer`
Bundled espeak-ng / Piper	Linux binaries, won't run on Android
espeak-ng via Termux	May work if `espeak-ng` is in PATH

Setup

The pre-built release from GitHub Releases includes android/tts_helper.dex. Just unzip and copy:

Download the release zip and extract it.
Copy audiobook.koplugin/ to /sdcard/koreader/plugins/.
Restart KOReader. The plugin auto-detects Android and initializes the JNI bridge to the device's TTS engine.

If you cloned the repo instead of using a release, build the .dex first (requires Android SDK + Java):

cd audiobook.koplugin/android/
./build-dex.sh

How it works

The plugin loads a small .dex file (tts_helper.dex, ~4KB) at runtime via Android's DexClassLoader. This helper wraps android.speech.tts.TextToSpeech with a polling-friendly API (since LuaJIT cannot implement Java callback interfaces). Synthesis produces standard WAV files that feed into the same pipeline used by espeak-ng and Piper.

Audio playback uses Android's MediaPlayer instead of aplay or GStreamer. Pause, resume, and stop all work through the MediaPlayer API.

For the full technical analysis, see docs/ANDROID_TTS.md.

Limitations

Uses the device's default TTS voice (voice picker UI not yet implemented)
Word timing is estimated (Android TTS does not provide per-word callbacks when synthesizing to file)
First sentence may have a brief delay while the TTS engine initializes

Building from source

The package-for-kobo.sh script cross-compiles espeak-ng for ARM and bundles the plugin into a ready-to-deploy directory. It requires Nix for the cross-compilation toolchain.

# Plugin + espeak-ng only
bash package-for-kobo.sh

# Plugin + espeak-ng + Piper neural TTS
bash package-for-kobo.sh --with-piper

# Use a specific Piper voice (default: en_US-danny-low)
bash package-for-kobo.sh --piper-voice en_US-ryan-low

The output is placed in kobo-tts-bundle/audiobook.koplugin/. Copy it to your device:

scp -P 2222 -r kobo-tts-bundle/audiobook.koplugin root@<kobo-ip>:/mnt/onboard/.adds/koreader/plugins/

Installing the Piper binary manually

If you don't want to use the packaging script, you can assemble the Piper runtime yourself:

Download the armv7l binary from Piper releases (2023.11.14-2).
Extract piper, its lib/ directory, and espeak-ng-data/ into audiobook.koplugin/piper/.
Download a voice model (.onnx + .onnx.json) as described in Downloading additional voices and place them in the same piper/ directory.

Note: The rhasspy/piper repository was archived in October 2025. The binaries on the releases page still work. The project continues as OHF-Voice/piper1-gpl.

To Do

Implement real word-level timing from TTS engines (SSML / phoneme callbacks)
Add PDF/DjVu highlight support (currently EPUB only)
Voice picker for Android TTS engines and voices
Integrate more TTS backends
Improve accessibility
Support whole audiobook production with hash-based verification
Evaluate plugin with other TTS models (e.g., KittenTTS)
Test and optimize for ultralow-quality/size voice models

License

Bundled component	License
KOReader	AGPL-3.0
espeak-ng	GPL-3.0+
Piper	MIT
Piper voices	MIT
glibc (bundled .so)	LGPL-2.1
