epub4j

A Java library for reading, validating, repairing, normalizing, transforming, and writing EPUB files.

What it does

Read EPUB from path, stream, or resources
Write EPUB with package and metadata updates
Lazy-load resources for lower memory usage
Validate structure, metadata, manifest, spine, references, and accessibility
Run diagnostics with severity, error codes, and auto-fix hints
Auto-repair common issues in malformed EPUB files
Prune broken TOC entries and promote valid child entries
Remove unreferenced JavaScript resources from the manifest
Remove common non-content artifact files (iTunes metadata, authoring tool bookmarks, OS leftovers)
Validate EPUB mimetype entry and report strict/recover behavior
Normalize invalid language tags and remove stray img tags with missing src
Rebuild and normalize spine reading order from manifest XHTML resources
Reconcile spine href/idref alias drift to canonical manifest resources
Harden XHTML pre-parse well-formedness before downstream XML processing
Repair broken internal href/src/url link graph using safe alias rewrite heuristics
Generate KOReader-compatible partial MD5 checksums for dedupe/progress-sync IDs
Normalize mixed encodings to UTF-8
Normalize metadata fields and infer missing metadata
Detect cover and synthesize missing table of contents
Manipulate spine and split or merge XHTML
Run search and replace across content resources
Estimate word count
Deduplicate resources
Convert to kepub
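
The language tag normalization listed above can be illustrated with the JDK alone. This is a minimal sketch, not epub4j's implementation: the class LanguageTagSketch, the method normalize, and the "und" fallback are hypothetical names chosen for illustration. It only checks that a tag is well-formed BCP 47 and does not validate subtags against the IANA registry.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the epub4j API.
final class LanguageTagSketch {

    /** Returns a canonical BCP 47 tag, or the fallback when the input is not well-formed. */
    static String normalize(String raw, String fallback) {
        if (raw == null) {
            return fallback;
        }
        // Legacy metadata often uses underscores ("en_US") where BCP 47 requires hyphens.
        String candidate = raw.trim().replace('_', '-');
        if (candidate.isEmpty()) {
            return fallback;
        }
        try {
            // Locale.Builder rejects ill-formed tags; Locale canonicalizes subtag case.
            return new Locale.Builder().setLanguageTag(candidate).build().toLanguageTag();
        } catch (IllformedLocaleException e) {
            return fallback;
        }
    }
}
```

With this sketch, normalize("en_US", "und") yields en-US, and an ill-formed value such as "not a tag!" yields the fallback.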

Reliability and safety

Strict and recover processing modes
Archive path traversal protection
Duplicate entry detection
Archive-level byte budget
Per-entry byte budget
Total uncompressed byte budget
Bounded stream copy for input streams
Case-stable path deduplication using Locale.ROOT
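
To make three of the safeguards above concrete (path traversal protection, bounded stream copy, and case-stable deduplication), here is a minimal sketch using only the JDK. It is not epub4j's code: the class ArchiveSafetySketch and its method names are hypothetical, and the library's actual checks may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Set;

// Hypothetical helpers for illustration; not part of the epub4j API.
final class ArchiveSafetySketch {

    /** Resolves an archive entry name under root, rejecting names that escape it. */
    static Path resolveInside(Path root, String entryName) throws IOException {
        Path base = root.toAbsolutePath().normalize();
        // normalize() collapses "." and ".." so the prefix check sees the real target.
        Path target = base.resolve(entryName).normalize();
        if (!target.startsWith(base)) {
            throw new IOException("Entry escapes archive root: " + entryName);
        }
        return target;
    }

    /** Copies the stream, failing as soon as more than maxBytes have been read. */
    static long copyBounded(InputStream in, OutputStream out, long maxBytes) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
            if (total > maxBytes) {
                throw new IOException("Stream exceeds byte budget of " + maxBytes);
            }
            out.write(buffer, 0, read);
        }
        return total;
    }

    /** Returns true when an equivalent name was already recorded in seenKeys. */
    static boolean isDuplicate(Set<String> seenKeys, String entryName) {
        // Locale.ROOT keeps the key stable regardless of the JVM default locale
        // (for example, Turkish locales lowercase "I" differently).
        return !seenKeys.add(entryName.toLowerCase(Locale.ROOT));
    }
}
```

The budget check runs before the write, so an oversized entry fails without emitting the chunk that crossed the limit.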

Quick start

import org.grimmory.epub4j.domain.Book;
import org.grimmory.epub4j.epub.EpubProcessingPolicy;
import org.grimmory.epub4j.epub.EpubReader;

EpubProcessingPolicy policy = EpubProcessingPolicy.defaultPolicy()
    .withMaxArchiveBytes(256L * 1024 * 1024)
    .withMaxEntryBytes(32L * 1024 * 1024)
    .withMaxTotalUncompressedBytes(512L * 1024 * 1024);

EpubReader reader = new EpubReader(null, policy);
Book book = reader.readEpub(java.nio.file.Path.of("book.epub"));

Strict mode

import org.grimmory.epub4j.epub.EpubProcessingPolicy;
import org.grimmory.epub4j.epub.EpubReader;

EpubReader reader = new EpubReader(null, EpubProcessingPolicy.strictPolicy());
var book = reader.readEpubStrict(java.nio.file.Path.of("book.epub"));

Recover mode report

import org.grimmory.epub4j.epub.EpubReader;

EpubReader reader = new EpubReader();
EpubReader.ReadResult result = reader.readEpubWithReport(java.nio.file.Path.of("book.epub"));

if (result.report().hasWarnings()) {
    result.report().warnings().forEach(w ->
        System.out.println(w.code() + ": " + w.message())
    );
}

if (result.report().hasCorrections()) {
    result.report().corrections().forEach(System.out::println);
}

Advanced repair pass

BookRepair now includes a stricter cleanup pass for XHTML content:

Guarded lowercasing of legacy HTML tag and attribute names in XHTML resources
Preservation of namespaced attributes (for example xlink:href)
Removal of Adobe DRM meta markers and inline script artifacts
Pruning of broken TOC references against actual XHTML resources
Optional JavaScript resource pruning when files are no longer referenced
Removal of common non-content artifact files
Mimetype validation with strict failure or recover-mode warnings
Language tag normalization and stray <img> cleanup
Ebooklib-style spine normalization: drop invalid/duplicate/non-XHTML spine refs and append missing XHTML content docs
Manifest/spine alias reconciliation for href/idref drift in mixed-encoding paths
XHTML pre-parse hardening inspired by html5lib/lxml/xmllint defensive parsing workflows
Link graph repair pass for broken internal href/src/url targets with conservative rewrites

import org.grimmory.epub4j.epub.BookRepair;

// book is a Book instance loaded earlier, for example as shown in Quick start
BookRepair repair = new BookRepair();
BookRepair.RepairResult repaired = repair.repair(book);

repaired.actions().forEach(a ->
    System.out.println(a.code() + " -> " + a.description())
);
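
The spine normalization step in the list above (drop invalid, duplicate, and non-XHTML spine refs, then append missing XHTML content documents) can be sketched independently of the library. This is a simplified illustration, not BookRepair's implementation: SpineNormalizeSketch is a hypothetical class, and the manifest is modeled as a plain map from item id to media type in manifest order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of the epub4j API.
final class SpineNormalizeSketch {

    static final String XHTML = "application/xhtml+xml";

    /**
     * @param spineIdrefs idrefs in their current reading order
     * @param manifest    item id to media type, iterated in manifest order
     *                    (pass a LinkedHashMap to keep that order)
     */
    static List<String> normalize(List<String> spineIdrefs, Map<String, String> manifest) {
        // LinkedHashSet keeps first-seen order and silently drops duplicates.
        Set<String> ordered = new LinkedHashSet<>();
        for (String idref : spineIdrefs) {
            // Unknown ids map to null, so they fail this check along with non-XHTML items.
            if (XHTML.equals(manifest.get(idref))) {
                ordered.add(idref);
            }
        }
        // Append XHTML manifest items the spine did not reference.
        for (Map.Entry<String, String> item : manifest.entrySet()) {
            if (XHTML.equals(item.getValue())) {
                ordered.add(item.getKey());
            }
        }
        return new ArrayList<>(ordered);
    }
}
```

Existing reading order is kept for valid entries, and appended documents follow manifest order, which is a conservative default when no other ordering signal exists.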

KOReader-compatible checksum

Ported from CWA/KOReader checksum behavior for lightweight file identity workflows:

import org.grimmory.epub4j.util.KoReaderChecksum;

var byPath = KoReaderChecksum.calculate(java.nio.file.Path.of("book.epub"));
// epubBytes is assumed here to be a byte[] holding the full EPUB file contents
var byBytes = KoReaderChecksum.calculate(epubBytes);
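
For readers unfamiliar with the partial MD5 scheme, the sketch below shows the sampling pattern as it is commonly described for KOReader: hash up to 1024 bytes at offset 0, then at offsets 1024 << (2 * i) for i from 0 to 10, stopping at the first offset past the end of the file. This is an illustration under that assumption, not the KoReaderChecksum source; PartialMd5Sketch is a hypothetical class that works on an in-memory byte array.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of the epub4j API.
final class PartialMd5Sketch {

    private static final int SAMPLE_SIZE = 1024;

    /** Hashes sparse 1 KiB samples instead of the whole file and returns lowercase hex. */
    static String partialMd5(byte[] data) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
        for (int i = -1; i <= 10; i++) {
            // First sample starts at 0; later ones at 1 KiB, 4 KiB, 16 KiB, ... up to 1 GiB.
            long offset = (i < 0) ? 0L : 1024L << (2 * i);
            if (offset >= data.length) {
                break; // nothing left to sample
            }
            int length = (int) Math.min(SAMPLE_SIZE, data.length - offset);
            md5.update(data, (int) offset, length);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Because only a handful of samples are hashed, the result is an identity hint for dedupe and progress sync rather than an integrity check: bytes outside the sampled windows do not affect it.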

Roadmap

Broken link validation and auto-repair for guide, TOC, and in-document href/src
Unused CSS and unused image detection/removal
OPF metadata schema cleanup and stronger namespace normalization
Optional OPF2 to OPF3 upgrade helpers with nav document regeneration
Batch/background job execution API for large repair/validation runs
Metadata backup snapshot export and restore hooks
Ingest-safe MIME/content sniffing beyond extension checks
Optional duplicate detection heuristics for library hygiene

Build

./gradlew build

Runtime and toolchain requirements

Java 25
JVM flags for preview and native interop paths:

--enable-preview --enable-native-access=ALL-UNNAMED

Quality workflow

Run the verification path used by CI:

./gradlew check --warning-mode all

For focused module checks while iterating:

./gradlew :comic4j:check
