OCR Data Acquisition

Hi Experts,

This is mostly a Linux question, but there is also a Windows question at the bottom.

Does anyone know of non-commercial OCR software that can be configured to look for specific words (data recognition) in a scanned document? For example, given a scanned document, I would have the software search for "Purchase Order", "PO", and other abbreviations of "Purchase Order." The software might return to my program "Found/Not Found", the actual string found, or even the location on the image where the string was found (e.g. [x1,y1] upper left and [x2,y2] lower right).

Furthermore, can the OCR software scan the immediate area where the "Purchase Order" string was found and return the text there (e.g. the purchase order number)? The "immediate area" could be defined by the program as upper-left and lower-right coordinates.

Another purchase order example is to find the purchaser's name and address, delivery date, delivery location, item number, quantity, etc. Very ambitious.

I'm running Ubuntu 7.10 (Gutsy Gibbon), which gives me immediate access to Debian packages. Are there also packages for Fedora (RPMs), SuSE, Slackware, Mandriva, Gentoo, Xandros, etc., in whatever format each uses to manage application software?

Are there any C libraries that help with (1) OCR and looking for specific strings and (2) scanning a specific area of an image and returning any data found?

For Windows experts: are there OCX controls that do OCR and data acquisition?

Thanks much!!!


http:// thevpn.guru Commented:
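Well, for Linux I know of these two programs that you might want to look at. Since they are open source, you can also read their source code. Both are useful for OCR work: Kooka can run OCR on a scan through an external engine, and unpaper cleans up scanned pages, which can help before OCR.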

kooka - scanner program for KDE
unpaper - post-processing tool for scanned pages

As for Windows components, you can surely find something relevant.

You can use Tesseract (http://code.google.com/p/tesseract-ocr/) to OCR the documents, and then write a script or program to handle the actual searching.
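As a rough illustration of the "OCR first, then search with a script" idea, here is a minimal Python sketch. It assumes the tesseract command-line program is installed and accepts "stdout" as its output base (true of reasonably recent releases, but check yours). The helper names ocr_image and find_po_label and the PO_LABEL pattern are made up for this example, and the pattern only covers a few spellings of the label.

```python
import re
import subprocess
from typing import Optional, Tuple

# Hypothetical pattern: "Purchase Order", "PO", "P.O.", "P O" (case-insensitive).
# The word boundaries keep it from firing inside words such as "POSTAGE".
PO_LABEL = re.compile(r"\b(?:purchase\s+order|p\.?\s?o)\b\.?", re.IGNORECASE)


def ocr_image(path: str) -> str:
    """Return the plain text tesseract recognises in the image at `path`.

    Assumption: a `tesseract` binary is on PATH and treats the output
    base "stdout" as "print the text instead of writing a file".
    """
    result = subprocess.run(
        ["tesseract", path, "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_po_label(text: str) -> Optional[Tuple[str, int, int]]:
    """Return (matched string, start offset, end offset), or None if not found."""
    match = PO_LABEL.search(text)
    if match is None:
        return None
    return match.group(0), match.start(), match.end()


if __name__ == "__main__":
    sample = "Invoice\nPurchase Order: 48213"
    hit = find_po_label(sample)
    print("Found" if hit else "Not Found", hit)
```

This gives "Found/Not Found" and the actual string, but the offsets are positions in the recognised text, not pixel coordinates on the image. Getting coordinates needs an OCR output format that includes word boxes.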

noci (Software Engineer) Commented:
gocr is another one: http://jocr.sourceforge.net
IT79637 (Author) Commented:
The data acquisition part of the question is a very difficult one. I'm looking for a keyword, such as "Purchase Order", on an image, and then want to find the purchase order data around that keyword. That kind of intelligence is significantly more difficult than vanilla OCR. The experts' responses pointed me to several Linux-based packages.
Thank you all very much!!!
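For anyone landing here later, the "find the keyword, then read the area around it" step might be sketched like this in Python. It assumes the OCR engine can already give you each recognised word with its bounding box; the Word tuple, the sample coordinates, and the function names below are all invented for illustration and are not any particular package's output format. Real documents will need fuzzier matching and layout handling than this.

```python
from typing import List, NamedTuple, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1) upper left, (x2, y2) lower right


class Word(NamedTuple):
    """One recognised word plus its box (assumed input format)."""
    text: str
    x1: int
    y1: int
    x2: int
    y2: int


def find_phrase(words: List[Word], phrase: str) -> Optional[Box]:
    """Return the box enclosing the first run of words that spells `phrase`."""
    tokens = phrase.lower().split()
    n = len(tokens)
    for i in range(len(words) - n + 1):
        run = words[i:i + n]
        # Ignore case and trailing label punctuation such as "Order:".
        if [w.text.lower().strip(":.#") for w in run] == tokens:
            return (min(w.x1 for w in run), min(w.y1 for w in run),
                    max(w.x2 for w in run), max(w.y2 for w in run))
    return None


def words_in_region(words: List[Word], region: Box) -> List[str]:
    """Return the words whose centre point lies inside `region`, in reading order."""
    rx1, ry1, rx2, ry2 = region
    hits = [w for w in words
            if rx1 <= (w.x1 + w.x2) / 2 <= rx2 and ry1 <= (w.y1 + w.y2) / 2 <= ry2]
    hits.sort(key=lambda w: (w.y1, w.x1))
    return [w.text for w in hits]


def text_right_of(words: List[Word], label: Box, width: int = 300) -> List[str]:
    """Return the words on the same line, up to `width` pixels right of `label`."""
    x1, y1, x2, y2 = label
    return words_in_region(words, (x2 + 1, y1, x2 + width, y2))


# Made-up sample data standing in for real OCR output.
SAMPLE = [
    Word("Purchase", 100, 50, 180, 70),
    Word("Order:", 190, 50, 250, 70),
    Word("PO-48213", 270, 50, 360, 70),
    Word("Date", 100, 90, 140, 110),
]

if __name__ == "__main__":
    label_box = find_phrase(SAMPLE, "Purchase Order")
    if label_box is not None:
        print(label_box, text_right_of(SAMPLE, label_box))
```

Looking only to the right of the label is a deliberate simplification. On many forms the value sits below the label instead, so the region passed to words_in_region would have to be chosen per document layout.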