Can't handle JBIG2 pdf images #19
Open
Description
When I run the training step on some PDFs of historical Indian census files, I get the following error:
Extracting text line images from ../data/district_reports/raw_pdfs/1981/27582_1981_MAI.pdf, page 3
Error reading image
com.sun.pdfview.PDFParseException: Unknown coding method:JBIG2Decode
and then
java.lang.NullPointerException
at com.sun.pdfview.font.TTFFont.getOutline(TTFFont.java:170)
at com.sun.pdfview.font.CIDFontType2.getOutline(CIDFontType2.java:270)
at com.sun.pdfview.font.OutlineFont.getGlyph(OutlineFont.java:130)
at com.sun.pdfview.font.PDFFont.getCachedGlyph(PDFFont.java:308)
at com.sun.pdfview.font.PDFFontEncoding.getGlyphFromCMap(PDFFontEncoding.java:155)
at com.sun.pdfview.font.PDFFontEncoding.getGlyphs(PDFFontEncoding.java:115)
at com.sun.pdfview.font.PDFFont.getGlyphs(PDFFont.java:274)
at com.sun.pdfview.PDFTextFormat.doText(PDFTextFormat.java:269)
at com.sun.pdfview.PDFParser.iterate(PDFParser.java:752)
at com.sun.pdfview.BaseWatchable.run(BaseWatchable.java:101)
at java.base/java.lang.Thread.run(Thread.java:834)
I think what's going on here is that the PDF contains JBIG2-compressed images (streams that use the JBIG2Decode filter), but the PDF library the program relies on (com.sun.pdfview) doesn't know how to decode them.
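One way to check this guess before running training is to scan each PDF for the filter name from the error message. The sketch below is not part of the project: the class name `Jbig2Scan` and the method `countJbig2Filters` are made up for illustration. It only looks for the literal bytes `/JBIG2Decode` in the raw file, so it can miss references stored inside compressed object streams or written with `#` escapes. A nonzero count suggests the file is affected; a zero count is not proof that it is clean.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the project or of com.sun.pdfview.
class Jbig2Scan {
    // PDF name of the JBIG2 stream filter, as it appears in the error message.
    private static final byte[] NEEDLE =
            "/JBIG2Decode".getBytes(StandardCharsets.US_ASCII);

    /** Counts non-overlapping occurrences of "/JBIG2Decode" in the raw bytes. */
    static int countJbig2Filters(byte[] data) {
        int count = 0;
        for (int i = 0; i + NEEDLE.length <= data.length; i++) {
            int j = 0;
            while (j < NEEDLE.length && data[i + j] == NEEDLE[j]) {
                j++;
            }
            if (j == NEEDLE.length) {
                count++;
                i += NEEDLE.length - 1; // skip past this match
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a path to a PDF file to inspect.
        for (String arg : args) {
            Path p = Paths.get(arg);
            int n = countJbig2Filters(Files.readAllBytes(p));
            System.out.println(p + ": " + n + " /JBIG2Decode reference(s)");
        }
    }
}
```

With Java 11 or later this can be run directly as `java Jbig2Scan.java` followed by one or more PDF paths. Files that report a nonzero count are the ones likely to trigger the "Unknown coding method" error.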