Make the recognition accuracy better
Let's face it - the accuracy isn't great. Make it better.
-
Lloyd commented
I like the idea and it works quickly, but the accuracy was a problem. Lots of fiddly corrections. I will come back again later and see if it is more accurate.
-
Leigh Grossman commented
I'd love to be able to use this, but the recognition on a clean test file was unacceptable; it would have taken forever to clean up compared to the older programs I'm still using. I'll keep an eye out, and if the accuracy improves I'd be happy to use (and recommend) it.
-
Katie commented
Yes please! I'd love to be able to not only search, but do the occasional copy-paste of a sentence or a paragraph.
-
fcchambers commented
Kind of a marketing/positioning comment here: I was drawn to this product primarily to make my PDFs searchable via Spotlight... so in my case, if 90% of the text is accurate, it would probably be enough for me! Elsewhere in the comments was a suggestion to pass the text to Spotlight... I'm wondering if a "stripped down" version, where *all* it did was index a doc for Spotlight, might have merit?
-
HandyMac commented
I picked up VelOCRaptor on a whim when I saw it mentioned at MacInTouch last summer, mostly because I liked the spirit evident in the name and icon.
I have occasional need for OCR, say to copy a paragraph or a few pages out of a book I'm reading. In the classic Mac OS days I used OmniPage, which came with my scanner; it worked, but was somewhat confusingly complex for my needs. In the OS X era I've used OmniPage (awkwardly "ported" from its old version) occasionally, then tried ReadIRIS, which was really daunting in its complexity. Lately I've just been saving the scans for when I might have time and energy to master the software.
Until today, when I finally got around to trying VelOCRaptor, and found it, as the little bear said, "just right". I scanned about 15 pages out of a book (300dpi TIFFs), dropped the scans on the icon, then copied the text out of the PDFs VelOCRaptor created, pasted it into TextEdit and went over it. Added an extra return for each paragraph, then used Devon's excellent WordService service to delete all the superfluous returns (cmd-shift-7).
Yeah, it could be a little more accurate, which is why I'm posting in this thread -- but after all, "more accuracy" is what any serious developer of OCR software would be working toward all the time anyway, no? Does any OCR engine understand hyphenated words? And it was a little funny that it consistently read "McLean" as "McAllen", and read lower case "o" as "0" (zero) in some scans. So it's true that the OCR'd material requires a fairly close reading and a fair number of corrections, but it's sure a lot faster than typing it all in (and I'm a pretty fast typist when I get going).
Anyway, I'm sure VelOCRaptor will become more accurate as time goes on, but otherwise I like it fine. And I'm sure you have some good ideas to make it better, but don't go adding a whole lot of "kitchen sink" features trying to please everyone. There are already industrial-strength OCR programs available (with byzantine interfaces) for those who need them.
-
godffreypratt commented
I agree with Dyno wholeheartedly. Not to sound discouraging, but it's really of no use in its current state. Even using the crispest font, it doesn't recognise half the number characters. And the PDF output is blurry. It's actually rather cheeky getting users to act en masse as beta testers in this way. (More dubious practices to follow, no doubt).
-
dyno commented
The accuracy is so poor at present (even with excellent quality source files) that I'm afraid the app isn't really in a useable state. However the interface is delightfully simple and straightforward, which is why I would like to strongly encourage further development.
-
Our current engine is OCRopus, which does layout analysis and then hands over to Tesseract as its character recognizer. So yes!
I'm hoping to be able to announce big news on the accuracy front soon.
-
pornel commented
Have you considered the Tesseract engine?
http://code.google.com/p/tesseract-ocr/
-
pornel commented
I converted a blurry image and got the following result:
-¬ _
[¬ª¬
1
11*1111111111.11111.¬ J111‘1-1111:1tF ,I::¬
.111 1111`1t‘11.11’11111i11.11,111 :11%