
Using OCR on PDF images with PHP

I have Adobe Acrobat 9 Standard installed on a Windows Server 2003 machine, and I can convert scanned documents back to text with its OCR function like a champ (manually, through the program).

What I would like to do is run that conversion from PHP so that I can then extract the text for manipulation.

Does Acrobat have a command-line interface that will OCR a PDF and then let me save the result?

Can you use Acrobat with PHP's COM functions?

Is there another program or class out there that will let me convert a PDF image to text through PHP or command line?  
Asked by: Rock_Lobster
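One possible command-line route, assuming the open-source tools poppler-utils (`pdftoppm`) and Tesseract OCR are installed and on the PATH (neither is mentioned in this thread, and neither involves Acrobat), is to rasterize each PDF page to an image and then OCR the images. The sketch below is in Python; the helper names are hypothetical, not part of any library.

```python
import glob
import os
import subprocess
import tempfile


def build_raster_command(pdf_path, workdir, dpi=300):
    """Return (argv, prefix) for rasterizing every page of pdf_path to PNG.

    Assumes poppler's pdftoppm is on PATH. It writes one image per page,
    named <prefix>-<page number>.png.
    """
    prefix = os.path.join(workdir, "page")
    argv = ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix]
    return argv, prefix


def build_ocr_command(image_path):
    """Argv for Tesseract; the output base 'stdout' prints the text instead of writing a file."""
    return ["tesseract", image_path, "stdout"]


def page_number(image_path):
    """Extract n from '<prefix>-<n>.png' so pages sort numerically, not lexically."""
    stem = os.path.splitext(os.path.basename(image_path))[0]
    return int(stem.rsplit("-", 1)[1])


def pdf_to_text(pdf_path, dpi=300):
    """Rasterize a scanned PDF and OCR each page, returning the text in page order."""
    with tempfile.TemporaryDirectory() as workdir:
        argv, prefix = build_raster_command(pdf_path, workdir, dpi)
        subprocess.run(argv, check=True)
        images = sorted(glob.glob(prefix + "-*.png"), key=page_number)
        pages = []
        for image in images:
            result = subprocess.run(
                build_ocr_command(image), check=True, capture_output=True, text=True
            )
            pages.append(result.stdout)
        return "\n".join(pages)
```

Calling `pdf_to_text("scan.pdf")` would return the recognized text as a string. From PHP, the same two command lines could be run with `shell_exec()` or `exec()` and the text read from standard output. Recognition quality will likely differ from Acrobat's OCR, so it is worth testing on your own scans.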
 
a1aait commented:
Not exactly the method you are looking for, but a possible solution:

You might be able to do this with OmniPage Pro using Watched Folders. You would not even need any PHP scripting, except perhaps to upload the file or indicate which file to run OCR on. It's hard to tell from your post whether you will be sitting at the machine, scheduling tasks, running a web service, or something else.

http://www.nuance.com
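The watched-folder idea can be approximated without OmniPage: a loop polls an input folder, runs an OCR step on any PDF that does not yet have a result, and writes the text to an output folder where a PHP script can pick it up. The sketch below is a minimal, hypothetical version in Python; `convert` stands in for whatever OCR engine you use, and the folder layout and `.txt` naming are assumptions, not how OmniPage itself behaves.

```python
import os
import tempfile
import time


def pending_pdfs(inbox, outbox):
    """PDF names in inbox that do not yet have a matching <stem>.txt in outbox."""
    done = {name[:-4] for name in os.listdir(outbox) if name.endswith(".txt")}
    return sorted(
        name
        for name in os.listdir(inbox)
        if name.lower().endswith(".pdf") and os.path.splitext(name)[0] not in done
    )


def process_once(inbox, outbox, convert):
    """Run convert(pdf_path) -> str on each pending PDF; write outbox/<stem>.txt."""
    handled = []
    for name in pending_pdfs(inbox, outbox):
        text = convert(os.path.join(inbox, name))
        # Write to a temporary file first, then rename, so a reader polling
        # outbox never sees a partially written .txt file.
        fd, tmp_path = tempfile.mkstemp(dir=outbox, suffix=".part")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        os.replace(tmp_path, os.path.join(outbox, os.path.splitext(name)[0] + ".txt"))
        handled.append(name)
    return handled


def watch(inbox, outbox, convert, interval=5.0):
    """Poll inbox forever, converting new PDFs every `interval` seconds."""
    while True:
        process_once(inbox, outbox, convert)
        time.sleep(interval)
```

Writing through a temporary file and renaming it with `os.replace` is the design choice that lets the end user's script simply wait for `<name>.txt` to appear, without needing to detect half-finished output.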
 
Rock_Lobster (author) commented:
I'll take a look. It doesn't look like they offer trial versions, so it may be tricky to test without spending a few hundred dollars.


The text extraction will be one piece of a larger script run by an end user, so the less interaction required on their end, the better.
 
a1aait commented:
OmniPage is not cheap, but it's considered one of the best OCR applications available. It offers far more OCR flexibility than anything that comes with Acrobat Standard, such as blocking out areas of a page or dropping out form structures to capture only the data. It could be overkill. Perhaps they have an API you could license for less than the price of the full application.