-
Keep the outline in source PDF files
I had carefully merged 35 separate scanned chapters of a document in PDF so as to have an outline (using PDF Lab), and when I OCR'd it in VelOCRaptor (keep the name!) I lost the outline. I know, the quick solution is to OCR them before I merge them, but it would be nice not to have to do them one by one.
8 votes
Thanks for this feedback. To be honest, I hadn’t even considered this, so it’s good to know when there are issues.
We currently extract the images from the source PDF, feed them through the OCR, and then reassemble a new PDF blind, so anything else in the original file, such as the outline, is not carried over. This leads to the sort of problems that you are having. It would be quite a major architectural change to support updating an existing file, but I can see advantages, and will look into how it might be achieved.
-
When dealing with scanned documents in PDF, the output file should not be much bigger than the input
I OCR'd a multi-page scanned document in PDF that is 119 pages long. The original file is 16MB; the output is 41MB. I doubt the text accounts for the additional 25MB... Perhaps it would be possible to add the text to the PDF without re-processing the images?
40 votes
Thanks for the suggestion. We currently re-encode the image into the new PDF, which may change its size. Was the original black and white or greyscale rather than colour?
-
Keep your product name!
Some people like it, some hate it. Personally I like it as I think it sums up our output sometimes, mis-spelt and with crap in the middle, but then it was my idea.
There's an entry for change your product name as well. If you feel strongly one way or the other please vote for this or that.
65 votes -
Let me save as raw text
The PDF is all very pretty, but I want to get at the raw text. Let me save as text rather than PDF.
22 votes -
1 vote
-
Use Document Window as the Progress Indicator
How about making the little view of the document scroll itself to the page being processed, as a way to indicate progress?
2 votes
Thanks for the suggestion – it’s very clever.
Personally I like to be able to read the document as VelOCRaptor is doing its stuff, and I worry that we’d fight over control. But we do need a progress indication, so I’ll think about it.
-
Allow saving over original filename
At the moment VelOCRaptor saves its processed files as a duplicate, adding "with text" to the names. This means, when used in a batch workflow (or just when being used quickly with multiple documents) you end up with twice the number of files. I can see this system would be useful for some, but I'd like to be able to select an option just to overwrite the original file with the processed one.
24 votes
For version 1.0 I’ve been a bit conservative – losing people’s files would be bad, and there are data-loss issues when the images are re-encoded into the PDF file. But you’re right, a preference would make sense.
-
Support reading from screen grabs
Screen grabs have too low a resolution for VelOCRaptor to work well. Rescale them before reading so that they work.
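As a rough sketch of what "rescale before reading" could mean: work out the pixel size an image would need to reach a resolution the OCR engine is comfortable with. The 72 dpi source and 300 dpi target below are assumptions for illustration only, not figures VelOCRaptor is known to use, and `upscaled_size` is a hypothetical helper.

```python
def upscaled_size(width, height, source_dpi=72, target_dpi=300):
    """Return the pixel dimensions needed to bring an image up to target_dpi.

    source_dpi and target_dpi are assumed values; adjust them for the
    actual screen and OCR engine.
    """
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    # Multiply before dividing so common cases stay exact.
    new_width = round(width * target_dpi / source_dpi)
    new_height = round(height * target_dpi / source_dpi)
    return new_width, new_height


# A 1440x900 screen grab would need to be resampled to this size.
print(upscaled_size(1440, 900))
```

The resampling itself would then be done by whatever image library the application already uses.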
98 votes -
Use dictionary to improve accuracy
Try finding words, and don't put special characters everywhere.
That's "text" recognized by velocraptor:
AM0nuNRmnArssAN¬¢xmMmmvAL Mn¬ nn.An ¬
1b¬¢hy¬§hew¬ m‚Äòmrd¬§lar¬§‚Äòinm¬§11yasu¬§v¬§dmh:‚Ä¢yrm1nynno‚Ä¢nswid‚Ä
‚ÄòA1i1nEghringm‚Äò.11nisisnnnnrprise‚Ä¢in¬§¬§pcpuhrm¬ªdinxrcnaux¬ in¬§
fnrmimzprumdng medieval 5§1ring,TI¤¢1n•¤di¢w.l
wurim¬ ‚Äôscn.fti
uhmreducedmdzemytlndurmrnbaununncrclyaxudclyhludgnunndouz
umh¤¤rluch=dmdsIuh¤d¤v•gelynYetwd1¤¤h|i1hed,l¤igl11y
nphisticsmd European Gglutingxymms existed. Eumpun ‚Äòmm¬§1afdd'¬§mc‚Äôpmdu¬§¬§dhundredsofd¬§idweH-iHu¬§:1r¬§dr¬§chr|i¬§x1mm|d
1 vote
We do use the dictionary, but if that’s your output, something is going badly wrong. Please email support@velocraptor.com, preferably with the image that you tried, so that we can see what’s up.
Thanks
-
Use multicore/multithreading
You can double the speed of the program by using multiple threads (if the algorithm is not parallelizable, at least scan more than one page at a time).
10 votes
Thanks for the suggestion. The next version of our engine supports multi-threading, but it will probably be some time before it is integrated. In the meantime, although each file is processed serially, if you read more than one file at once they should use different cores.
-
provide a progress bar to indicate how long the ocr will take
This should give an indication of progress, not just the spinny wheel
69 votes -
Update the Spotlight index without writing a PDF file
If I don't care about getting the text from a file, but simply want to find it, it would be really cool if you could just add the text to the Spotlight index
5 votes -
21 votes
Hey, we like this sort of feedback.
-
Overlay/replace image with OCRed text
It would be much easier to judge the quality of the OCR if the application drew it on top of the image.
If you could remove letters from the image and replace them with rendered fonts, that would be awesome.
1 vote
Given how good our OCR currently is, I think that it suits us that you can’t judge!
Once our engine gets good enough, I think we will plan this as an option.
-
Add detailed progress report
I dropped a large image-laden PDF and VelOCRaptor was spinning, and spinning. I can't judge whether it's going to take 10 minutes or 10 days.
2 votes
It’s in the pipeline, thanks for reminding us that it’s important.
-
Add ability to manually correct and edit OCR'd text
First of all, thank you for this software - it is brilliant! I would like to suggest adding the ability to correct OCR'd text, à la Paper Capture in Acrobat 5.0.
67 votes -
Add support for JSTalk rather than AppleScript
...go on, it's easier than adding applescript and the syntax wouldn't suck
http://gusmueller.com/blog/archives/2009/03/introducing_jstalk__an_alternative_to_applescript.html
1 vote
I’ve some sympathy for this suggestion, although it would have to follow after AppleScript, because we can support pretty much all scripting languages through that interface.
-
add vCard (vcf) option
Having the ability to transfer/convert into Address Book would be very useful
4 votes -
add automatic file naming
Support formatted automatic file naming, e.g.
* The first "n" recognized characters
* The date and time of OCR (or last modified date of input file)
So, the file might be named "Vanguard Statement - 20090403:1022".
This would give a simple way to organize output, especially when a bunch of paper is being scanned.
12 votes
Nice suggestion, thanks. We’ll look at customisation and preferences after our first release.
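For anyone wanting to see the naming scheme from the suggestion spelled out, here is a minimal sketch. It only illustrates the idea (first "n" recognized characters plus a timestamp); `auto_name` and its parameters are hypothetical and not part of VelOCRaptor.

```python
from datetime import datetime


def auto_name(recognized_text, when, n=18):
    """Build a name from the first n recognized characters and a timestamp.

    Hypothetical helper: follows the "Vanguard Statement - 20090403:1022"
    pattern from the suggestion above.
    """
    # Collapse runs of whitespace so OCR line breaks don't end up in the name.
    flat = " ".join(recognized_text.split())
    prefix = flat[:n].strip()
    stamp = when.strftime("%Y%m%d:%H%M")
    # Fall back to the timestamp alone if no text was recognized.
    return f"{prefix} - {stamp}" if prefix else stamp


print(auto_name("Vanguard Statement\nAccount summary", datetime(2009, 4, 3, 10, 22)))
```

A real implementation would probably want a different time separator, since a colon in a file name can cause trouble on the Mac.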
-
Add text-to-speech support
See Apple discussion http://discussions.apple.com/thread.jsp
1 vote
Thanks for the suggestion.
If you open the app without dropping a file on it – then drop a file, the window will stay open after OCR. You can then select all and use services to read it without saving as PDF.