
stradichenko/audiobook.koplugin

Audiobook Read-Along Plugin for KOReader


Text-to-speech for KOReader with synchronized word highlighting, automatic page turns, and Bluetooth audio support. Works offline on Kobo, Kindle, Android, and Linux.

Quick start

1. Download and copy the plugin

Download the .zip file from the latest release (look for audiobook-koplugin-v*.zip under Assets). Do not download "Source code (zip)" -- that only contains the Lua sources without the bundled TTS engines.

Unzip it and copy the audiobook.koplugin folder into KOReader's plugins directory:

| Platform   | Path                             |
|------------|----------------------------------|
| Kobo       | .adds/koreader/plugins/          |
| Kindle     | koreader/plugins/                |
| Linux      | ~/.config/koreader/plugins/      |
| Android    | /sdcard/koreader/plugins/        |
| PocketBook | applications/koreader/plugins/   |

Restart KOReader after copying.

2. Install a TTS engine (if not using the pre-built release)

The pre-built release from step 1 already includes espeak-ng and Piper -- no extra install needed on Kobo or Kindle. Skip to step 3.

If you cloned the repository instead:

Kobo -- install espeak-ng via SSH or the terminal emulator (Menu > More tools > Terminal emulator):

opkg update && opkg install espeak-ng

If opkg is unavailable, grab the .ipk from nickel-packages and run opkg install /mnt/onboard/espeak-ng*.ipk.

Linux -- sudo apt install espeak-ng

Android (Boox, etc.) -- the pre-built release includes tts_helper.dex, which bridges to the device's built-in TTS engine (Google, Samsung, etc.). Just unzip and copy the folder like any other platform. No extra steps needed.

If you cloned the repo instead of downloading a release, build the .dex manually (requires the Android SDK and Java):

cd audiobook.koplugin/android/
./build-dex.sh

The bundled espeak-ng and Piper binaries are for Linux-based e-readers and will not run on Android. See Android support for details.

3. Start reading

  • Long-press a word to open the dictionary popup, then tap Read aloud from here.
  • Or select a paragraph, then tap Read aloud from here in the selection menu.
  • Or go to Tools > Audiobook Read-Along > Start reading from current page.

Optional: Piper neural TTS

Piper sounds much more natural than espeak-ng. It runs fully offline on Kobo's ARM processor (~40 MB for engine + voice model). The pre-built release already includes Piper and a default voice (en_US-danny-low). Low-quality voices like this one load faster on Kobo and are the recommended choice there (see Choosing a voice). To build a bundle yourself, see Building from source.

Switch between espeak-ng and Piper any time from Tools > Audiobook Read-Along > Voice settings.

Choosing a voice

Listen to samples and pick a voice: rhasspy.github.io/piper-samples

The three quality levels compared here:

| Quality | Sample rate | Size    | Notes                                       |
|---------|-------------|---------|---------------------------------------------|
| low     | 16 kHz      | ~15 MB  | Recommended for Kobo -- fast load, low RAM  |
| medium  | 22 kHz      | ~60 MB  | Better quality, but slower to load on Kobo  |
| high    | 22 kHz      | ~100 MB | Best quality, more RAM/CPU                  |

On Kobo (512 MB RAM), low voices are recommended. The medium voices work, but the model takes noticeably longer to load. Not every voice is available at every quality level -- check HuggingFace for what's offered.

Downloading additional voices

Every voice needs two files: a .onnx model and a .onnx.json config. Place both in audiobook.koplugin/piper/.

Voices are hosted on HuggingFace. The URL pattern is:

https://huggingface.co/rhasspy/piper-voices/resolve/main/<lang>/<lang_REGION>/<speaker>/<quality>/

For example, to download en_US-lessac-medium:

cd audiobook.koplugin/piper/
curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

Or for en_US-ryan-low:

curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/low/en_US-ryan-low.onnx
curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/low/en_US-ryan-low.onnx.json
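
The URL pattern above can also be expanded programmatically. The following is a minimal sketch, assuming voice names always follow the `<lang_REGION>-<speaker>-<quality>` shape seen in the two examples; `voice_urls` is a hypothetical helper, not part of the plugin.

```python
BASE_URL = "https://huggingface.co/rhasspy/piper-voices/resolve/main"


def voice_urls(voice: str) -> list[str]:
    """Return the .onnx and .onnx.json URLs for a voice such as 'en_US-lessac-medium'."""
    # Locale is everything before the first dash, quality everything after the last.
    locale, rest = voice.split("-", 1)
    speaker, quality = rest.rsplit("-", 1)
    lang = locale.split("_")[0]  # 'en_US' -> 'en'
    folder = f"{BASE_URL}/{lang}/{locale}/{speaker}/{quality}"
    return [f"{folder}/{voice}{ext}" for ext in (".onnx", ".onnx.json")]


if __name__ == "__main__":
    for url in voice_urls("en_US-ryan-low"):
        print(url)
```

Both files still need to end up in audiobook.koplugin/piper/, exactly as with the curl commands.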

Browse all available voices: huggingface.co/rhasspy/piper-voices

Bluetooth audio (Kobo)

The plugin outputs audio through a Bluetooth A2DP connection when a BT device is paired. The connection is managed through the plugin menu:

Tools > Audiobook Read-Along > Bluetooth settings

Two Bluetooth stacks are supported, auto-detected at runtime:

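| Stack              | Devices                      | Audio path                    |
|--------------------|------------------------------|-------------------------------|
| MTK (mtkbtmwrpc)   | Clara 2E, Sage, Libra Colour | GStreamer persistent pipeline |
| BlueZ (bluetoothd) | Libra 2 / Io                 | aplay via ALSA                |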

On MTK Kobo devices, the mtkbtmwrpc daemon binds a single abstract socket, so only one GStreamer pipeline can hold it at a time. The plugin keeps one persistent pipeline alive across sentences to avoid reconnection gaps. If audio stops working after a crash, restart KOReader -- the plugin kills orphan processes on startup.

On BlueZ devices, the plugin starts bluetoothd and resets the HCI adapter automatically when you power on Bluetooth.

For the full platform audio and Bluetooth architecture (Kobo generations, Kindle, Android), see docs/PLATFORM_AUDIO.md.

Playback controls

| Button     | Action                                 |
|------------|----------------------------------------|
| Rewind     | Previous sentence. Hold for 3x skip.   |
| Play/Pause | Toggle playback.                       |
| Forward    | Next sentence. Hold for 3x skip.       |
| Close      | Stop reading and dismiss the bar.      |

Reading pauses automatically when you open a menu and resumes when you close it.

Settings

All settings are under Tools > Audiobook Read-Along:

  • Bluetooth settings - pair, connect, disconnect alert interval
  • Voice settings - TTS engine, voice, speech rate, pitch, volume, sentence/paragraph pauses (espeak-ng), sentence/paragraph gaps (Piper), word gap, clause pause
  • Highlight style - background (default), invert (best for e-ink), underline, box
  • Auto-advance pages - turn pages automatically
  • Highlight words / sentences - toggle each independently
  • Quick start with espeak - play first sentence with espeak-ng while Piper loads (avoids the ~3s cold start silence)
  • Keep playing when lid is closed - prevents device suspend so audio continues with the case closed
  • BT headset media buttons - use play/pause/next/prev on a Bluetooth headset or speaker to control TTS playback

Architecture

audiobook.koplugin/
  main.lua             - entry point, menus, event hooks
  synccontroller.lua   - coordinates audio timing with highlights
  ttsengine.lua        - TTS synthesis, audio playback, backend detection
  piperqueue.lua       - persistent Piper server management
  textparser.lua       - sentence/word tokenization with positions
  highlightmanager.lua - screen-coordinate highlight via crengine
  playbackbar.lua      - transport controls widget
  menubuilder.lua      - voice/highlight settings menus
  btmanager.lua        - Bluetooth device scanning and pairing (MTK + BlueZ)
  btui.lua             - BT menu UI and disconnect watcher
  btmediacontrol.lua   - BT headset media buttons (AVRCP play/pause/skip)
  benchmarkrunner.lua  - in-plugin TTS benchmark runner
  wavutils.lua         - WAV file reading, writing, and manipulation
  androidtts.lua       - Android TTS via JNI (DexClassLoader + TtsHelper)
  utils.lua            - shared helpers

Design notes

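Persistent Piper server. On Kobo's single-core ARM, loading the ONNX model takes ~4.5 seconds. A persistent server process keeps the model in memory and accepts sentences over a FIFO, so the load cost is not paid again for every sentence. Combined with 3-sentence batching, this raises the realtime factor from 0.085x (old 2-server config) to 0.329x. These two figures appear to express audio produced per second of synthesis (higher is better), the inverse of the ratio used in the device benchmark report. See dev/benchmark/RESULTS.md for the full analysis.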
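
The batching step can be sketched as follows. This is an illustration only: the README states that sentences are batched three at a time, but not how the plugin groups or sends them, so `batch_sentences` is a hypothetical helper and the FIFO write is left out.

```python
from typing import Iterable, Iterator


def batch_sentences(sentences: Iterable[str], batch_size: int = 3) -> Iterator[list[str]]:
    """Yield consecutive groups of up to batch_size sentences."""
    batch: list[str] = []
    for sentence in sentences:
        batch.append(sentence)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, group
        yield batch


if __name__ == "__main__":
    page = ["One.", "Two.", "Three.", "Four.", "Five."]
    for group in batch_sentences(page):
        # A real client would write each group to the server's FIFO here.
        print(group)
```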

Binary-search highlight alignment. CRe (crengine) snaps text selections to word boundaries, and proportional fonts make character-to-pixel estimates unreliable. The highlight manager uses the proportional estimate as an initial guess, then binary-searches the x coordinate by querying CRe until the selected text matches the target sentence. Converges in 2-4 queries.
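
A minimal sketch of that search, under stated assumptions: `FakeLine` is a hypothetical stand-in for crengine that maps an x coordinate to a word-snapped selection, and the word widths are invented. The real plugin queries CRe itself; only the search strategy is illustrated here.

```python
from bisect import bisect_right


class FakeLine:
    """Hypothetical stand-in for the renderer: one line of words with pixel widths."""

    def __init__(self, words, widths, space=4):
        self.words = words
        self.starts = []  # x coordinate where each word begins
        x = 0
        for width in widths:
            self.starts.append(x)
            x += width + space
        self.width = x - space
        self.queries = 0

    def select_to(self, x):
        """Return the text from the line start through the word containing x."""
        self.queries += 1
        count = max(bisect_right(self.starts, x), 1)
        return " ".join(self.words[:count])


def find_end_x(line, target):
    """Binary-search an x whose word-snapped selection equals target."""
    full = " ".join(line.words)
    lo, hi = 0, line.width
    x = int(line.width * len(target) / len(full))  # proportional first guess
    while lo <= hi:
        selected = line.select_to(x)
        if selected == target:
            return x
        if len(selected) < len(target):
            lo = x + 1  # selection too short: look further right
        else:
            hi = x - 1  # selection too long: look further left
        x = (lo + hi) // 2
    return None


if __name__ == "__main__":
    line = FakeLine(["The", "quick", "brown", "fox", "jumps"], [20, 120, 20, 20, 20])
    print(find_end_x(line, "The quick brown"), "found after", line.queries, "queries")
```

The proportional guess is deliberately wrong in this example because one word is much wider than the others, which is the situation the paragraph describes for proportional fonts.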

Exclusive BT socket (MTK only). Kobo's MediaTek BT firmware exposes a single abstract socket (@kobo:mtkbtmwrpc). The plugin keeps one GStreamer pipeline alive for the entire reading session and feeds audio through a FIFO. Orphan pipelines from crashes are killed on startup via PID files and pkill. On BlueZ devices (Libra 2, etc.) audio goes through standard ALSA and this socket management is not needed.
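
The PID-file half of that cleanup might look like the sketch below. It is an assumption-laden illustration: the PID file location and the `kill_orphan` name are hypothetical, and the plugin's additional pkill sweep is not shown.

```python
import os
import signal
from pathlib import Path


def kill_orphan(pid_file: Path) -> bool:
    """Terminate the process recorded in pid_file. Return True if a signal was sent."""
    try:
        pid = int(pid_file.read_text().strip())
    except FileNotFoundError:
        return False  # nothing was recorded, so nothing to clean up
    except ValueError:
        pid_file.unlink(missing_ok=True)  # corrupt PID file
        return False
    if pid <= 1:
        # Never signal PID 0/1 or a process group by accident.
        pid_file.unlink(missing_ok=True)
        return False
    try:
        os.kill(pid, signal.SIGTERM)
        signalled = True
    except (ProcessLookupError, PermissionError):
        signalled = False  # already gone, or not ours to stop
    pid_file.unlink(missing_ok=True)
    return signalled


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(kill_orphan(Path("/tmp/example-pipeline.pid")))
```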

Long-sentence splitting. Piper's attention mechanism scales quadratically with input length. On Kobo's 512 MB of RAM the server runs out of memory on sentences above ~1000 characters, and throughput drops from ~7 chars/s at 300 characters to ~3 chars/s at 1400 characters. The text parser therefore handles any sentence longer than 300 characters in three passes: it splits at natural clause boundaries (; : , and/but/or... -), merges fragments shorter than 80 characters with a neighbour (below that length, ~90% of synthesis time is wasted on per-request overhead), and re-splits anything still over 300 characters at word boundaries. See dev/benchmark/RESULTS_LONG.md for the full data.
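
The three passes can be sketched as below. This is not the plugin's textparser.lua: the regular expression, the greedy merge order, and the function names are assumptions chosen to match the thresholds quoted above (300 and 80 characters).

```python
import re
import textwrap

# Split after ; : , or a spaced dash, and before and/but/or.
CLAUSE_BOUNDARY = re.compile(r"(?<=[;:,])\s+|(?<=\s-)\s+|\s+(?=(?:and|but|or)\s)")


def merge_short(fragments, min_len=80):
    """Glue fragments shorter than min_len onto a neighbour."""
    merged = []
    for fragment in fragments:
        if merged and len(merged[-1]) < min_len:
            merged[-1] = merged[-1] + " " + fragment
        else:
            merged.append(fragment)
    if len(merged) > 1 and len(merged[-1]) < min_len:
        tail = merged.pop()
        merged[-1] = merged[-1] + " " + tail
    return merged


def split_long_sentence(sentence, max_len=300, min_len=80):
    """Split an over-long sentence into fragments of at most max_len characters."""
    if len(sentence) <= max_len:
        return [sentence]
    fragments = [f for f in CLAUSE_BOUNDARY.split(sentence) if f]
    result = []
    for fragment in merge_short(fragments, min_len):
        if len(fragment) > max_len:
            # Last resort: break at word boundaries.
            result.extend(textwrap.wrap(fragment, width=max_len,
                                        break_long_words=False,
                                        break_on_hyphens=False))
        else:
            result.append(fragment)
    return result


if __name__ == "__main__":
    clause = " ".join(["alpha"] * 20)
    for part in split_long_sentence(f"{clause}, {clause}; {clause}, tiny end."):
        print(len(part), part[:40])
```

In this sketch the final word-boundary pass can leave a piece shorter than 80 characters, and a single word longer than 300 characters is kept whole.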

Troubleshooting

| Problem | Fix |
|---------|-----|
| Plugin not in menu | Folder must be audiobook.koplugin inside plugins/. The plugin only appears in the Tools menu when a book is open. Restart KOReader after copying. |
| No sound | Run espeak-ng "hello" -w /tmp/t.wav && aplay /tmp/t.wav over SSH. |
| No audio player found (Kindle) | Pair BT headphones via the Kindle top-swipe menu before starting playback. If already paired, restart KOReader so the plugin re-detects the audio output. |
| No TTS engine found | Install espeak-ng (see Quick start). |
| No TTS engine found (Android) | Ensure android/tts_helper.dex is present inside the plugin folder. The pre-built release includes it; if you cloned from source, run ./build-dex.sh in the android/ directory. The device must also have a TTS engine installed (most do by default). See Android support. |
| BT audio silent | Restart KOReader to kill orphan pipelines. Check BT is paired in the plugin menu. |
| SSH refused on port 22 | KOReader uses port 2222: ssh root@<ip> -p 2222 |
| .adds not visible | Enable hidden files on your OS. The folder starts with a dot. |
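
For the "No sound" and "No TTS engine found" rows on a Linux desktop, a quick check of whether the two commands named above are on PATH can be sketched like this. It is only a rough aid and an assumption about what to look for: the plugin's own backend detection may check other locations, such as its bundled binaries.

```python
import shutil


def find_tools(names=("espeak-ng", "aplay")):
    """Map each command name to its full path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found'}")
```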

Filing a bug report

When reporting a problem, please attach both files described below. The bug report captures your device environment (hardware, audio, settings) while the crash log captures KOReader's runtime behavior (errors, warnings, timing). Together they give the full picture needed to diagnose an issue.

  1. Reproduce the problem (use the plugin normally until the issue occurs).
  2. Generate the plugin bug report (see below).
  3. Locate the KOReader crash log (see below).
  4. Attach both files to your GitHub issue.

Tip: Generate the report and grab the crash log before restarting KOReader. The crash log is truncated on every launch, so restarting may discard the relevant entries.


1. Plugin bug report

The plugin's diagnostic report captures device info, TTS engine detection, audio configuration, Bluetooth status, and plugin settings. There are two ways to generate it:

Option A: From the plugin menu

Tools > Audiobook Read-Along > Generate bug report

This saves a .txt file to your device's root storage (see locations below).

Option B: Standalone script (when the plugin menu is not visible)

If the plugin doesn't appear in the KOReader menu at all, you can run the report generator directly via SSH or KOReader's built-in terminal emulator (Menu > More tools > Terminal emulator):

sh /mnt/onboard/.adds/koreader/plugins/audiobook.koplugin/generate-report.sh   # Kobo
sh /mnt/us/koreader/plugins/audiobook.koplugin/generate-report.sh              # Kindle
sh /sdcard/koreader/plugins/audiobook.koplugin/generate-report.sh              # Android

The report is printed to the terminal and also saved to a file. If using the terminal emulator, you can scroll up to read it on screen.

Report location:

| Platform | Report location                       |
|----------|---------------------------------------|
| Kobo     | /mnt/onboard/audiobook-bug-report-*.txt |
| Kindle   | /mnt/us/audiobook-bug-report-*.txt    |
| Android  | /sdcard/audiobook-bug-report-*.txt    |
| Linux    | ~/audiobook-bug-report-*.txt          |

What the report contains:

  • Device model, platform, screen size, kernel version
  • KOReader version
  • TTS engine detection results (which backends were found/missing)
  • Audio player availability (aplay, GStreamer, etc.)
  • Plugin settings (speech rate, highlight style, etc.)
  • Memory and disk info

What the report does NOT contain:

  • Book titles, content, or reading positions
  • File paths with usernames (sanitized automatically)
  • Highlights, bookmarks, or notes
  • Network information or credentials

2. KOReader crash log

KOReader logs all warnings, errors, and debug output to a file called crash.log in its installation directory. This is not generated by the plugin -- it is KOReader's own runtime log and captures everything that happens during a session, including TTS process spawning, fallback events, and Lua errors.

Crash log location:

| Platform   | Path                                                            |
|------------|-----------------------------------------------------------------|
| Kobo       | /mnt/onboard/.adds/koreader/crash.log                           |
| Kindle     | /mnt/us/koreader/crash.log                                      |
| PocketBook | /mnt/ext1/applications/koreader/crash.log                       |
| Linux      | Inside the KOReader installation directory                      |
| Android    | No crash.log file -- use adb logcat to capture KOReader output  |

Connect your device via USB and copy the file. On Kobo the .adds folder is hidden -- enable hidden files in your file manager to see it.

KOReader truncates crash.log to ~500 KB on every launch. If you restart KOReader before copying the file, earlier entries may be lost. Copy it while KOReader is still running or immediately after the issue occurs.


Why both files matter

| Diagnostic question                             | Bug report | Crash log |
|-------------------------------------------------|------------|-----------|
| Device model, hardware specs                    | yes        | no        |
| KOReader version, plugin version                | yes        | no        |
| Audio output (ALSA, BT, GStreamer)              | yes        | no        |
| TTS engines installed (espeak, Piper, Android)  | yes        | no        |
| Plugin settings (rate, highlight, voice)        | yes        | no        |
| Bluetooth pairing and connection state          | yes        | no        |
| Lua errors and stack traces                     | no         | yes       |
| TTS process spawning and fallback events        | no         | yes       |
| Sentence progression and page turns             | no         | yes       |
| Timing of operations (delays, freezes)          | no         | yes       |
| Piper server startup and delivery               | no         | yes       |
| Device freeze or resource exhaustion            | no         | yes       |

Device benchmark

The plugin includes a built-in benchmark that measures TTS synthesis speed on your device. It runs a fixed set of test sentences through each available engine (espeak-ng, Piper) and saves a report you can share on GitHub to help document device performance.

Running the benchmark

Tools > Audiobook Read-Along > Generate bug report > Run device benchmark

The benchmark synthesizes five sentences of varying length (short dialogue, narrative prose, technical text, academic text, and short fragments) with each engine and model it finds. espeak-ng tests finish in seconds; Piper tests may take several minutes on slow devices like Kobo.

A progress message is shown between engine runs. The screen may appear unresponsive during individual synthesis calls -- this is expected.

Output

When complete, a .txt report is saved to your device's root storage:

| Platform | Report location                        |
|----------|----------------------------------------|
| Kobo     | /mnt/onboard/audiobook-benchmark-*.txt |
| Kindle   | /mnt/us/audiobook-benchmark-*.txt      |
| Android  | /sdcard/audiobook-benchmark-*.txt      |
| Linux    | ~/audiobook-benchmark-*.txt            |

The report contains:

  • Device info (platform, model, CPU cores, RAM, kernel)
  • Plugin version
  • Per-sentence synthesis time, audio duration, file size, and realtime factor for each engine/model
  • Aggregate totals and average realtime factor

No book content, highlights, or personal data is included.

Example output

=== Audiobook TTS Benchmark (v0.1.5.10) ===
Generated: 2026-03-27T12:00:00Z

── Device ──
  platform: kobo
  model: Kobo Clara 2E
  cpu_cores: 1
  memory: 510396 kB
  kernel: 4.1.15

── Test sentences ──
  [1] short_dialogue (57 chars)
  [2] medium_narrative (268 chars)
  [3] medium_technical (254 chars)
  [4] long_academic (362 chars)
  [5] short_fragments (79 chars)

── espeak-ng ──
  short_dialogue          synth=   82ms  audio= 3200ms  size= 102444B  rt=0.03x
  medium_narrative        synth=  310ms  audio=15800ms  size= 505244B  rt=0.02x
  ...

── Piper danny-low  (size=15.8MB, sr=16000Hz) ──
  short_dialogue          synth= 4200ms  audio= 3100ms  size=  99244B  rt=1.35x
  medium_narrative        synth=18200ms  audio=16800ms  size= 537644B  rt=1.08x
  ...

=== End of Benchmark ===

A realtime factor below 1.0x means synthesis is faster than playback (good). Above 1.0x means the user will hear pauses between sentences while the engine catches up.
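
The rt column in the example output is consistent with synthesis time divided by audio duration. The sketch below recomputes it from the synth and audio values shown above; `realtime_factor` is a hypothetical helper, not the benchmark runner's own code.

```python
def realtime_factor(synth_ms: float, audio_ms: float) -> float:
    """Synthesis time divided by audio duration; below 1.0 is faster than playback."""
    if audio_ms <= 0:
        raise ValueError("audio duration must be positive")
    return synth_ms / audio_ms


if __name__ == "__main__":
    # (label, synth ms, audio ms) copied from the example report above.
    rows = [
        ("espeak-ng short_dialogue", 82, 3200),
        ("espeak-ng medium_narrative", 310, 15800),
        ("Piper short_dialogue", 4200, 3100),
        ("Piper medium_narrative", 18200, 16800),
    ]
    for label, synth_ms, audio_ms in rows:
        print(f"{label}: rt={realtime_factor(synth_ms, audio_ms):.2f}x")
```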

Sharing your results

Attach the report file to a GitHub issue or include it in a bug report. Benchmark data from different devices helps the project tune batch sizes, choose default voices, and set realistic expectations for each platform.

Android support

Android TTS is supported via a JNI bridge to the device's built-in TextToSpeech engine (Google, Samsung, etc.). No Termux, no extra APKs, no root required.

| Feature                     | Status   | Notes                                  |
|-----------------------------|----------|----------------------------------------|
| Plugin loads in KOReader    | Works    |                                        |
| Text parsing & highlighting | Works    |                                        |
| Android system TTS          | Works    | Via JNI bridge to TextToSpeech API     |
| Audio playback              | Works    | Via Android MediaPlayer                |
| Bundled espeak-ng / Piper   | No       | Linux binaries, won't run on Android   |
| espeak-ng via Termux        | Untested | May work if espeak-ng is in PATH       |

Setup

The pre-built release from GitHub Releases includes android/tts_helper.dex. Just unzip and copy:

  1. Download the release zip and extract it.
  2. Copy audiobook.koplugin/ to /sdcard/koreader/plugins/.
  3. Restart KOReader. The plugin auto-detects Android and initializes the JNI bridge to the device's TTS engine.

If you cloned the repo instead of using a release, build the .dex first (requires Android SDK + Java):

cd audiobook.koplugin/android/
./build-dex.sh

How it works

The plugin loads a small .dex file (tts_helper.dex, ~4 KB) at runtime via Android's DexClassLoader. This helper wraps android.speech.tts.TextToSpeech with a polling-friendly API, because LuaJIT cannot implement Java callback interfaces. Synthesis produces standard WAV files that feed into the same pipeline used by espeak-ng and Piper.
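
The idea of a polling-friendly wrapper around a callback-based API can be illustrated in a language-neutral way. The sketch below is not the TtsHelper code: `CallbackEngine`, `PollingHelper`, and `wait_until_done` are hypothetical names, and a timer stands in for real synthesis.

```python
import threading
import time


class CallbackEngine:
    """Hypothetical stand-in for a callback-based TTS API."""

    def synthesize(self, text, on_done):
        # Pretend synthesis finishes asynchronously a moment later.
        threading.Timer(0.05, on_done).start()


class PollingHelper:
    """Receives callbacks internally and exposes completion as a pollable flag."""

    def __init__(self, engine):
        self._engine = engine
        self._done = set()
        self._lock = threading.Lock()

    def start(self, utterance_id, text):
        self._engine.synthesize(text, lambda: self._mark_done(utterance_id))

    def _mark_done(self, utterance_id):
        with self._lock:
            self._done.add(utterance_id)

    def is_done(self, utterance_id):
        with self._lock:
            return utterance_id in self._done


def wait_until_done(helper, utterance_id, timeout=2.0, interval=0.01):
    """Poll the helper until the utterance completes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if helper.is_done(utterance_id):
            return True
        time.sleep(interval)
    return helper.is_done(utterance_id)


if __name__ == "__main__":
    helper = PollingHelper(CallbackEngine())
    helper.start("u1", "Hello")
    print(wait_until_done(helper, "u1"))
```

The caller never registers a callback of its own; it only asks whether a given utterance has finished, which is the shape a runtime without callback support can consume.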

Audio playback uses Android's MediaPlayer instead of aplay or GStreamer. Pause, resume, and stop all work through the MediaPlayer API.

For the full technical analysis, see docs/ANDROID_TTS.md.

Limitations

  • Uses the device's default TTS voice (voice picker UI not yet implemented)
  • Word timing is estimated (Android TTS does not provide per-word callbacks when synthesizing to file)
  • First sentence may have a brief delay while the TTS engine initializes
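
The README does not say how word timing is estimated. One plausible heuristic, shown here purely as an assumption, is to divide the audio duration among the words in proportion to their character counts; `estimate_word_timings` is a hypothetical name.

```python
def estimate_word_timings(sentence: str, audio_ms: int) -> list[tuple[str, int, int]]:
    """Return (word, start_ms, end_ms) with time shared in proportion to word length."""
    words = sentence.split()
    total_chars = sum(len(word) for word in words)
    if total_chars == 0:
        return []
    timings = []
    elapsed_chars = 0
    for word in words:
        start = audio_ms * elapsed_chars // total_chars
        elapsed_chars += len(word)
        end = audio_ms * elapsed_chars // total_chars
        timings.append((word, start, end))
    return timings


if __name__ == "__main__":
    for word, start, end in estimate_word_timings("Call me Ishmael", 1300):
        print(f"{word}: {start}-{end} ms")
```

A heuristic like this ignores pauses and speaking-rate variation, which is why real word-level timing appears on the To Do list.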

Building from source

The package-for-kobo.sh script cross-compiles espeak-ng for ARM and bundles the plugin into a ready-to-deploy directory. It requires Nix for the cross-compilation toolchain.

# Plugin + espeak-ng only
bash package-for-kobo.sh

# Plugin + espeak-ng + Piper neural TTS
bash package-for-kobo.sh --with-piper

# Use a specific Piper voice (default: en_US-danny-low)
bash package-for-kobo.sh --piper-voice en_US-ryan-low

The output is placed in kobo-tts-bundle/audiobook.koplugin/. Copy it to your device:

scp -P 2222 -r kobo-tts-bundle/audiobook.koplugin root@<kobo-ip>:/mnt/onboard/.adds/koreader/plugins/

Installing the Piper binary manually

If you don't want to use the packaging script, you can assemble the Piper runtime yourself:

  1. Download the armv7l binary from Piper releases (2023.11.14-2).
  2. Extract piper, its lib/ directory, and espeak-ng-data/ into audiobook.koplugin/piper/.
  3. Download a voice model (.onnx + .onnx.json) as described in Downloading additional voices and place them in the same piper/ directory.

Note: The rhasspy/piper repository was archived in October 2025. The binaries on the releases page still work. The project continues as OHF-Voice/piper1-gpl.

To Do

  • Implement real word-level timing from TTS engines (SSML / phoneme callbacks)
  • Add PDF/DjVu highlight support (currently EPUB only)
  • Voice picker for Android TTS engines and voices
  • Integrate more TTS backends
  • Improve accessibility
  • Support whole audiobook production with hash-based verification
  • Evaluate plugin with other TTS models (e.g., KittenTTS)
  • Test and optimize for ultralow-quality/size voice models

License

Copyright 2025-2026 gespitia - AGPL-3.0. See LICENSE.

| Bundled component   | License  |
|---------------------|----------|
| KOReader            | AGPL-3.0 |
| espeak-ng           | GPL-3.0+ |
| Piper               | MIT      |
| Piper voices        | MIT      |
| glibc (bundled .so) | LGPL-2.1 |
