16 results found
-
Support reading from screen grabs
Screen grabs have too low a resolution for VelOCRaptor to work well. Rescale them before reading so that they work.
98 votes -
Make the recognition accuracy better
Let's face it: the accuracy isn't great. Make it better.
89 votes -
Provide a progress bar to indicate how long the OCR will take
This should give an indication of progress, not just the spinny wheel.
69 votes -
Add ability to manually correct and edit OCR'd text
First of all, thank you for this software - it is brilliant! I would like to suggest adding the ability to correct OCR'd text, à la Paper Capture in Acrobat 5.0.
67 votes -
When dealing with scanned documents in PDF, the output file should not be much bigger than the input
I OCR'd a multi-page scanned document in PDF that is 119 pages long. The original file is 16MB; the output is 41MB. I doubt the text accounts for the additional 25MB... Perhaps it would be possible to add the text to the PDF without re-processing the images?
40 votes
Thanks for the suggestion. We currently re-encode the image into the new PDF, which may change its size. Was the original black and white or greyscale rather than colour?
-
Support multi-page tiffs
The app currently falls over with ...recognize.lua:180: cannot open file for reading
35 votes -
Allow saving over original filename
At the moment VelOCRaptor saves its processed files as a duplicate, adding "with text" to the names. This means, when used in a batch workflow (or just when being used quickly with multiple documents) you end up with twice the number of files. I can see this system would be useful for some, but I'd like to be able to select an option just to overwrite the original file with the processed one.
24 votes
For version 1.0 I’ve been a bit conservative – losing people’s files would be bad, and there are data-loss issues when the images are re-encoded into the PDF file. But you’re right, a preference would make sense.
-
Add AppleScript support
Let me tell you where to save, etc.
21 votes -
Add automatic file naming
Support formatted automatic file naming, e.g.
* The first "n" recognized characters
* The date and time of OCR (or last modified date of input file)
So, the file might be named "Vanguard Statement - 20090403:1022".
This would give a simple way to organize output, especially when a bunch of paper is being scanned.
12 votes
Nice suggestion, thanks. We’ll look at customisation and preferences after our first release.
-
Use multicore/multithreading
You could double the speed of the program by using multiple threads (if the algorithm is not parallelizable, at least scan more than one page at a time).
10 votes
Thanks for the suggestion. The next version of our engine supports multi-threading, but it will probably be some time before it is integrated. In the meantime, although each file is processed serially, if you read more than one file at once they should use different cores.
-
Keep the outline in source PDF files
I had carefully merged 35 separate scanned chapters of a doc in PDF so as to have an outline (using PDF Lab), and when I OCR'd it in VelOCRaptor (keep the name!) I lost the outline. I know, the quick solution is to OCR them before I merge them, but it would be nice not to have to do them one by one.
8 votes
Thanks for this feedback. To be honest, I hadn’t even considered this, so it’s good to know when there are issues.
We currently extract images from the source PDF, feed them through the OCR, and then reassemble a new PDF blind. This leads to the sort of problems that you are having. It would be quite a major architectural change to support updating an existing file, but I can see advantages, and will look into how it might be achieved. -
Add Growl notifications
Great software! Recognition is not perfect but your product shows amazing promise. If you could add support for Growl, simply to notify the user when the file is finished being processed that would be very helpful.
8 votes
Thanks for the suggestion, I’ll get on to it.
-
Add detailed progress report
I dropped in a large image-laden PDF and VelOCRaptor was spinning, and spinning. I can't judge whether it's going to take 10 minutes or 10 days.
2 votes
It’s in the pipeline, thanks for reminding us that it’s important.
-
Add ability to recognise tables
Lots of documents contain tables, and currently the recognition is quirky, so it leaves you having to recreate the table. In a large document this is very time consuming, as I have just found out. It is the main reason why I won't purchase the product. Apart from that, keep up the good work.
2 votes
You’re right – tables are our nemesis! I hope to be able to put this right with a new version of the engine.
-
1 vote
-
Overlay/replace image with OCRed text
It would be much easier to judge the quality of the OCR if the application drew the recognized text on top of the image.
If you could remove the letters from the image and replace them with rendered fonts, that would be awesome.
1 vote
Given how good our OCR currently is, I think that it suits us that you can’t judge!
When our engine gets good enough, I think that we will plan this as an option.