Open
Description
I saved the first page as a JPEG using this site: https://smallpdf.com
The left column of the resulting image is quite readable at a suitable screen resolution.
ocrmypdf --pdfa-image-compression lossless -O0 0001.jpg formulierhocrjpg.pdf
Input file is not a PDF, checking if it is an image...
Input file is an image
Input image has no ICC profile, assuming sRGB
Image seems valid. Try converting to PDF...
Successfully converted to PDF, processing...
Scanning contents: 100%|████████████████████████| 1/1 [00:00<00:00, 73.93page/s]
OCR: 100%|██████████████████████████████████| 1.0/1.0 [00:09<00:00, 9.92s/page]
Postprocessing...
PDF/A conversion: 100%|█████████████████████████| 1/1 [00:00<00:00, 2.46page/s]
Optimize ratio: 1.00 savings: 0.0%
Output file is a PDF/A-2B (as expected)
pdfcomp formulierhocrjpg.pdf formulierhocrjpgkleiner.pdf
Compression factor: 9.617848822158944
The compressed output contains unreadable text in the left column. The hOCR contains "Toelichting 1.1", but in the compressed PDF that text is completely unreadable.
My patch for the inversion ratio makes it more readable:
formulierhocrjpgkleinerpatch.pdf
However, if you look at the mask image, it does not contain this text in the left column at all.
So my patch is not the only change that routine needs.
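To make the "mask does not contain the left column" observation checkable, here is a minimal sketch of how one might measure it. It assumes the mask has already been loaded as a 2D grid of 0/1 values (1 = foreground); the function name `foreground_ratio` and the toy mask are hypothetical and are not part of the tool's code.

```python
def foreground_ratio(mask, x_start, x_end):
    """Return the fraction of foreground (truthy) pixels in columns
    [x_start, x_end) of a 2D mask given as a list of rows."""
    total = 0
    foreground = 0
    for row in mask:
        for value in row[x_start:x_end]:
            total += 1
            if value:
                foreground += 1
    # Avoid division by zero for an empty region.
    return foreground / total if total else 0.0


# Toy 4x8 mask (hypothetical data): the right half holds "text" pixels,
# the left half is empty, mimicking the missing left column.
toy_mask = [
    [0, 0, 0, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

left = foreground_ratio(toy_mask, 0, 4)
right = foreground_ratio(toy_mask, 4, 8)
print(f"left column: {left:.2f}, right column: {right:.2f}")
```

A ratio near zero for the left half of the real mask, while the source image clearly has text there, would confirm that the text is lost before the mask is built rather than in the inversion step.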
