-
32 votes
To be frank, I’d like to get English working better first, but I’ll look at this after that.
-
22 votes
Anonymous commented: I know of only two free tools that do a decent job of extracting text from PDFs (taking layout complexities into account): pdftotext (there are several tools by that name; I mean the one that comes with xpdf) and Multivalent. There need to be two modes (which at least pdftotext supports): one that attempts to preserve the layout even in plain text, and another that just extracts the raw strings (this is easier).
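The two modes described above correspond to pdftotext's `-layout` and `-raw` options. A minimal Python sketch, assuming the xpdf (or Poppler) `pdftotext` binary is on `PATH`; the helper names `pdftotext_command` and `extract_text` are hypothetical and only wrap the command line. Passing `-` as the output file makes pdftotext write the text to stdout.

```python
import subprocess

# Hypothetical mode names mapped to real pdftotext options.
MODE_FLAGS = {
    "layout": "-layout",  # try to keep the original physical layout
    "raw": "-raw",        # keep strings in content-stream order
}


def pdftotext_command(pdf_path: str, mode: str, out_path: str = "-") -> list[str]:
    """Build a pdftotext command line for one of the two extraction modes."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r} (expected 'layout' or 'raw')")
    return ["pdftotext", MODE_FLAGS[mode], pdf_path, out_path]


def extract_text(pdf_path: str, mode: str = "layout") -> str:
    """Run pdftotext (must be installed) and return the extracted text."""
    result = subprocess.run(
        pdftotext_command(pdf_path, mode),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

For example, `extract_text("scan.pdf", mode="raw")` (with a hypothetical `scan.pdf`) returns the strings in content-stream order, while the default `layout` mode tries to keep columns and spacing.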
-
When dealing with scanned documents in PDF, the output file should not be much bigger than the input
40 votes
Thanks for the suggestion. We currently re-encode the image into the new PDF, which may change its size. Was the original black and white or greyscale rather than colour?
Anonymous commented: I also have a black-and-white one. It started at 8 MB and was 66 MB afterwards.
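One plausible reason for the question about black and white versus colour is bit depth: if a 1-bit scan were re-encoded at a higher bit depth, the raw pixel data would grow many times over. The sketch below only does that arithmetic for a hypothetical A4 page scanned at 300 dpi (about 2480 x 3508 pixels); it ignores compression entirely, and the source does not say what format the images are actually re-encoded into.

```python
def raw_image_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Uncompressed size of a raster image, with each row padded to a whole byte."""
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8
    return bytes_per_row * height_px


# Hypothetical page: A4 scanned at 300 dpi is roughly 2480 x 3508 pixels.
WIDTH, HEIGHT = 2480, 3508

for label, bpp in [
    ("black and white (1-bit)", 1),
    ("greyscale (8-bit)", 8),
    ("colour (24-bit RGB)", 24),
]:
    size = raw_image_bytes(WIDTH, HEIGHT, bpp)
    print(f"{label:26} {size / 1_000_000:6.2f} MB per page, uncompressed")
```

Before compression, the 24-bit version of this page is exactly 24 times the size of the 1-bit version (about 26.1 MB versus 1.1 MB). Real PDF sizes depend heavily on the compression used, so this only shows the direction of the effect, not the 8 MB to 66 MB figure reported above.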
-
69 votes
Anonymous commented: In addition to a progress bar, an optional log window would be useful. Surely the underlying engine has some sort of debug mode. It would help to know, for example, when it is getting lost or uncertain.
I don't know whether this has to do with the GUI or with the documents being parsed. Presumably the Google folks are working on foreign-language font parsing?