OCR Data Acquisition
Posted on 2008-01-30
This is mostly a Linux question, but there is also a Windows question at the bottom.
Does anyone know of non-commercial OCR software that can be configured to look for specific words (data recognition) in a scanned document? For example, given a scanned document, I would have the software search for "Purchase Order", "PO", and other abbreviations of "Purchase Order." The software might return to my program "Found/Not Found", the actual string found, or even the location on the image where the string was found (e.g. [x1,y1] upper left and [x2,y2] lower right).
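To make the request concrete, here is a rough sketch of the kind of result I have in mind. It assumes the OCR step has already produced a list of recognized words with bounding boxes; the `Word` type, the `KEYWORDS` list, and `find_keyword` are all hypothetical names made up for illustration and do not come from any real OCR package.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """One recognized word and its bounding box (hypothetical OCR output)."""
    text: str
    x1: int  # upper-left x
    y1: int  # upper-left y
    x2: int  # lower-right x
    y2: int  # lower-right y


# Phrases to look for, split into lowercase words. This list is an assumption.
KEYWORDS = [["purchase", "order"], ["po"], ["p.o."]]


def normalize(text: str) -> str:
    """Lowercase a word and drop leading/trailing ':' and '#'."""
    return text.strip(":#").lower()


def find_keyword(words: List[Word]) -> Optional[Tuple[str, Tuple[int, int, int, int]]]:
    """Return (string found, (x1, y1, x2, y2)) for the first match, or None."""
    for i in range(len(words)):
        for phrase in KEYWORDS:
            chunk = words[i:i + len(phrase)]
            if [normalize(w.text) for w in chunk] == phrase:
                found = " ".join(w.text for w in chunk)
                # The box of the match is the union of the matched words' boxes.
                box = (min(w.x1 for w in chunk), min(w.y1 for w in chunk),
                       max(w.x2 for w in chunk), max(w.y2 for w in chunk))
                return found, box
    return None
```

A return value of `None` would correspond to "Not Found"; anything else gives the string as it appeared on the page plus its upper-left and lower-right corners.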
Furthermore, can the OCR software scan the immediate area where the "Purchase Order" string was found and return whatever string it finds there (e.g. the purchase order number)? The "immediate area" could be defined by the program as upper-left and lower-right coordinates.
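As an illustration of what "scan the immediate area" could mean, here is a minimal sketch. It assumes the OCR output is available as a list of words with bounding boxes; `Word` and `text_in_region` are hypothetical names, not part of any real library, and sorting by the top edge and then the left edge is only a crude stand-in for proper reading order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """One recognized word and its bounding box (hypothetical OCR output)."""
    text: str
    x1: int  # upper-left x
    y1: int  # upper-left y
    x2: int  # lower-right x
    y2: int  # lower-right y


def text_in_region(words: List[Word], region: Tuple[int, int, int, int]) -> str:
    """Return the text of all words lying entirely inside region.

    region is (x1, y1, x2, y2): upper-left and lower-right corners.
    """
    rx1, ry1, rx2, ry2 = region
    inside = [w for w in words
              if w.x1 >= rx1 and w.y1 >= ry1 and w.x2 <= rx2 and w.y2 <= ry2]
    # Approximate reading order: top to bottom, then left to right.
    inside.sort(key=lambda w: (w.y1, w.x1))
    return " ".join(w.text for w in inside)
```

The program would pick the region itself, for instance a box just to the right of wherever "Purchase Order" was found, and an empty string would mean nothing was recognized there.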
Another Purchase Order example would be to find the purchaser's name and address, delivery date, delivery location, item number, quantity, etc. Very ambitious.
I'm running Ubuntu 7.10 (Gutsy Gibbon). That gives me immediate access to Debian packages. Are there any packages for other distributions (Fedora, SuSE, Slackware, Mandriva, Gentoo, Xandros, etc.), as RPMs or whatever format they use to manage application software?
Are there any C libraries that aid in (1) OCR and looking for specific strings and (2) scanning a specific area of an image and returning any data found?
For Windows experts: are there OCX controls that do OCR and data acquisition?