When dealing with scanned documents in PDF, the output file should not be much bigger than the input

I OCR'd a 119-page scanned document in PDF. The original file is 16MB; the output is 41MB. I doubt the text accounts for the additional 25MB. Perhaps it would be possible to add the text to the PDF without re-processing the images?

40 votes

Matthew Green shared this idea · June 30, 2009

planned

Duncan McGregor (Admin, VelOCRaptor) responded · July 02, 2009

Thanks for the suggestion. We currently re-encode the image into the new PDF, which may change its size. Was the original black and white or greyscale rather than colour?

Anonymous commented · September 05, 2012 02:21

I also have a black-and-white document. It started as 8MB and became 66MB afterwards.

fcchambers commented · February 18, 2010 08:40

I use a ScanSnap and your product is sooooooo close to astonishingly awesome (Sorry, for now it's just regular "awesome"...)

Are you actively working on the "files get kinda big" issue after OCR? Any timelines?

Also, I haven't purchased a license yet. (Just double-checking the implications of the file size issue, i.e., if I'd need to purchase more storage and backup, "way too expensive" Acrobat might actually be cheaper for me. <Sniff, because I like your product much better for my purposes...>)

Also, it would be kinda cool if license holders got more votes for features... :)

Matthew Green commented · July 02, 2009 08:38

The original was black and white (probably compressed TIFF files within the PDF file). Perhaps you need a way for the program to take note of the original format of the images and re-compress them using the same algorithm, if recompress you must (i.e. if you can't transfer the original images from the original PDF to the new one without reprocessing them).
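
To make the suggestion above concrete, here is a rough sketch in Python of "take note of the original format and re-compress using the same algorithm". It is only an illustration under stated assumptions: the function name `choose_output_filter` and its parameters are hypothetical and have nothing to do with VelOCRaptor's actual code. The filter names are the standard PDF stream filter names; the fallback choices are my own guess at sensible defaults.

```python
from typing import Optional

# Standard PDF stream filter names commonly used for image data.
BILEVEL_FILTERS = {"CCITTFaxDecode", "JBIG2Decode"}
OTHER_IMAGE_FILTERS = {"DCTDecode", "JPXDecode", "FlateDecode"}


def choose_output_filter(original_filter: Optional[str],
                         bits_per_component: int) -> str:
    """Hypothetical helper: pick a filter for the re-encoded page image.

    Reuses the filter of the original image when it is recognised and
    suits the pixel depth; otherwise falls back to an assumed default.
    """
    is_bilevel = bits_per_component == 1

    # Bilevel-only filters are reused only for 1-bit images.
    if original_filter in BILEVEL_FILTERS and is_bilevel:
        return original_filter
    # General-purpose image filters are reused as-is.
    if original_filter in OTHER_IMAGE_FILTERS:
        return original_filter

    # Assumed fallbacks: fax-style compression for black-and-white scans,
    # lossless Flate for everything else.
    return "CCITTFaxDecode" if is_bilevel else "FlateDecode"


if __name__ == "__main__":
    print(choose_output_filter("CCITTFaxDecode", 1))
    print(choose_output_filter(None, 8))
```

The point is only that a 1-bit scan stays in a 1-bit-friendly format instead of being re-encoded as greyscale or colour, which is the kind of change that could explain a file growing several times over.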
