Using OCR on PDF images with PHP

I have Adobe Acrobat 9 Standard installed on a Windows 2003 server, and its OCR function converts scanned documents back to text like a champ when I run it manually through the program.

What I would like to do is run that same conversion from PHP, so that I can then extract the text for manipulation.

Does Acrobat have a command-line interface that will run OCR on a PDF and then let me save the result?

Can you use Acrobat with PHP's COM functions?

Is there another program or class out there that will let me convert a scanned (image-only) PDF to text through PHP or the command line?
Rock_Lobster asked:
 
a1aait commented:
OmniPage is not cheap, but it's considered one of the best OCR apps out there. It has a lot more flexibility for OCR than anything that comes with Acrobat Std., like blocking out areas of a page, or dropping out form structures to capture the data only. It could be overkill. Perhaps they have an API you could license for less than buying the app.
 
a1aait commented:
Not exactly the method you are looking for, but a possible solution:

You might be able to do this with OmniPage Pro using Watched Folders. You would not even need any PHP scripting, except perhaps to upload or indicate the file to run OCR on. It is hard to tell from your post whether you will be sitting at the machine, scheduling tasks, running a web service, or something else.

http://www.nuance.com
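
If you do go the watched-folder route, the PHP side could be as small as the sketch below: copy the PDF into the folder the OCR tool watches, then poll the output folder until a text file appears. This is only a rough sketch under assumptions, not anything taken from OmniPage's documentation. The function name ocrViaWatchedFolder, the two folder arguments, and the idea that the tool writes a .txt file with the same base name as the PDF are all hypothetical, so adjust them to however your OCR tool is actually configured.

```php
<?php
/**
 * Hand a PDF to a watched-folder OCR tool and wait for the text result.
 *
 * Assumptions (hypothetical, not taken from any vendor documentation):
 *   - the OCR tool watches $inDir for new PDF files
 *   - it writes "<same base name>.txt" into $outDir when it is done
 *
 * Returns the extracted text, or null on copy failure or timeout.
 */
function ocrViaWatchedFolder($pdfPath, $inDir, $outDir, $timeoutSec = 120, $pollSec = 2)
{
    $base     = pathinfo($pdfPath, PATHINFO_FILENAME);
    $expected = rtrim($outDir, '/\\') . DIRECTORY_SEPARATOR . $base . '.txt';

    // Remove any stale result so an old file is not mistaken for a new one.
    if (file_exists($expected)) {
        unlink($expected);
    }

    // Drop the PDF into the folder the OCR tool is watching.
    $target = rtrim($inDir, '/\\') . DIRECTORY_SEPARATOR . basename($pdfPath);
    if (!copy($pdfPath, $target)) {
        return null;
    }

    // Poll until a non-empty text file shows up or the timeout expires.
    // Note: the file could still be mid-write when it is first seen; a more
    // careful version would wait until its size stops changing.
    $deadline = time() + $timeoutSec;
    while (time() <= $deadline) {
        clearstatcache();
        if (file_exists($expected) && filesize($expected) > 0) {
            return file_get_contents($expected);
        }
        sleep($pollSec);
    }

    return null; // OCR did not finish in time
}
```

Calling it would look something like ocrViaWatchedFolder('C:\\scans\\invoice.pdf', 'C:\\ocr\\in', 'C:\\ocr\\out'), where all three paths are placeholders. Polling keeps the end user out of the loop entirely, at the cost of the script blocking until the OCR finishes or the timeout is hit.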
 
Rock_Lobster (author) commented:
I'll take a look. It doesn't look like they have trial versions, so it may be tricky to test without dropping a few hundred.

The text extraction will be one piece of a larger script that will be run by an end user, so the less interaction on their end the better.
Question has a verified solution.