Using OCR on PDF images with PHP

Posted on 2010-01-06
Last Modified: 2013-12-13
I have Adobe Acrobat 9 Standard installed on a Windows 2003 server, and its OCR function converts scanned documents back to text like a champ when I run it manually through the program.

What I would like to do is run that conversion from PHP so that I can then extract the text for manipulation.

Does Acrobat have a command-line interface that will convert a PDF using OCR and then let me save the result?

Can you use Acrobat with PHP's COM functions?

Is there another program or class out there that will let me convert a PDF image to text through PHP or the command line?
Question by: Rock_Lobster

    Expert Comment

    Not exactly the method you are looking for, but a possible solution:

    You might be able to do this with OmniPage Pro using Watched Folders. You might not even need any PHP scripting, except perhaps to upload or indicate the file to run OCR on. It is hard to tell from your post whether you will be sitting at the machine, scheduling tasks, running a web service, or something else.
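
If the watched-folder route works out, the scripting side could be as small as "copy the PDF in, wait for the text to come out". Below is a minimal, language-neutral sketch of that copy-and-poll idea in Python; the same logic could be written in PHP. Everything specific here is an assumption for illustration: the function name `submit_and_wait`, the two folder arguments, and the convention that the OCR tool writes a `.txt` file with the same base name into an output folder. None of it is an OmniPage API, and the OCR product would have to be configured separately to behave that way.

```python
import shutil
import time
from pathlib import Path


def submit_and_wait(pdf_path, watch_dir, out_dir, timeout=300.0, poll_interval=1.0):
    """Copy a PDF into a watched folder and wait for the OCR text to appear.

    Assumption (hypothetical setup): the OCR tool watches watch_dir and
    writes <same base name>.txt into out_dir when it has finished.
    """
    pdf_path = Path(pdf_path)
    expected = Path(out_dir) / (pdf_path.stem + ".txt")

    # Drop the file where the OCR tool is configured to look.
    shutil.copy(pdf_path, Path(watch_dir) / pdf_path.name)

    # Poll for the output file until the timeout expires.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if expected.exists():
            # Note: if the OCR tool writes the file in place rather than
            # renaming it when done, this may read a partial result.
            return expected.read_text(encoding="utf-8", errors="replace")
        time.sleep(poll_interval)

    raise TimeoutError(f"No OCR output at {expected} after {timeout} seconds")
```

A caller would pass the uploaded file plus the two folders and get the extracted text back as a string, or a `TimeoutError` if nothing shows up in time, so the end user never has to touch the OCR program directly.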

    Author Comment

    I'll take a look. It doesn't look like they have trial versions, so it may be tricky to test without dropping a few hundred.

    The text extraction will be one piece of a larger script that will be run by an end user, so the less interaction on their end, the better.

    Accepted Solution

    OmniPage is not cheap, but it's considered one of the best OCR apps out there. It has a lot more flexibility for OCR than anything that comes with Acrobat Standard, such as blocking out areas of a page or dropping out form structures to capture only the data. It could be overkill. Perhaps they have an API you could license for less than the cost of buying the app.
