PDF OCR library

Hi there,
My company has a lot of PDF documents that need OCR.
Do you know of a free library for OCR on PDFs? Or a non-free one?
I need advice.

Joe Winograd, Fellow & MVE, Developer, Commented:
Hi Simlip,

Here are some free OCR tools:

(1) Tesseract OCR Engine, an open source product now maintained by Google:

It has numerous add-ons:

(2) FreeOCR, which uses a compiled version of the Tesseract engine:

(3) GOCR/JOCR, an open source OCR package developed under the GNU General Public License:

(4) Boxoft Free OCR (I use several Boxoft free tools):

(5) Google Drive/Docs has an option to perform OCR on uploaded files, but the resulting PDF doesn't hide the text layer, so the files look ugly.
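As a rough illustration of option (1): recent Tesseract releases can write a PDF with the page image on top and a hidden text layer underneath when the command line ends with the `pdf` config name. The sketch below only builds and runs that command from Python. It assumes the `tesseract` executable is on the PATH and that the input is an image file (TIFF, PNG, etc.), since Tesseract does not read PDF input directly, so scanned PDFs would first need to be converted to images. The function names and file names are hypothetical, chosen for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_pdf_command(image_path, output_base, language="eng"):
    """Return the argument list for a Tesseract run that writes <output_base>.pdf."""
    # CLI shape: tesseract <image> <output base> [-l lang] [configfile]
    # The trailing "pdf" config asks Tesseract for PDF output with a text layer.
    return ["tesseract", str(image_path), str(output_base), "-l", language, "pdf"]


def ocr_to_searchable_pdf(image_path, output_base, language="eng"):
    """Run Tesseract if it is installed; return the PDF path, or None if it is not."""
    if shutil.which("tesseract") is None:
        return None  # Tesseract is not installed or not on the PATH
    command = build_tesseract_pdf_command(image_path, output_base, language)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return Path(str(output_base) + ".pdf")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_tesseract_pdf_command("scan_page1.tif", "scan_page1_ocr"))
```

Running the script prints the command without executing it, which is a safe way to check the arguments before pointing it at real scans.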

Here are some non-free OCR packages. Two very well regarded ones are Nuance OmniPage and ABBYY FineReader. Here are links to more information:


Here are links to feature comparison charts:


I use both and can say that both are very accurate, but I can't say that one is always better than the other. I've tested them on the same documents, and sometimes one is better, sometimes the other is, but for the most part, the accuracy is similar - both very good! They both can make searchable PDF files (i.e., a PDF file with both the scanned image and a layer of text created by the OCR process).
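For anyone who wants to repeat that kind of side-by-side test, here is a minimal sketch of one way to score it. It assumes you have typed up a correct transcription of a sample page by hand and saved each engine's text output. The score is the similarity ratio from Python's standard `difflib` module, which is a convenient stand-in rather than a formal OCR metric such as character error rate. The engine names and sample strings are made up.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Collapse whitespace runs so layout differences do not count as errors."""
    return " ".join(text.split())


def similarity(reference, ocr_output):
    """Return a 0.0-1.0 similarity ratio between the reference and the OCR text."""
    matcher = SequenceMatcher(None, normalize(reference), normalize(ocr_output))
    return matcher.ratio()


def rank_engines(reference, outputs):
    """Return (engine name, score) pairs sorted by score, best first."""
    scores = {name: similarity(reference, text) for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data: a hand-typed reference and two engines' outputs.
    truth = "Invoice 1042 is due on 15 March."
    samples = {
        "engine_a": "Invoice 1042 is due on 15 March.",
        "engine_b": "lnvoice 1O42 is due on l5 March.",
    }
    for name, score in rank_engines(truth, samples):
        print(f"{name}: {score:.3f}")
```

A single page proves little, so it is worth scoring a handful of pages that represent your real documents (clean prints, faxes, skewed scans) before drawing conclusions.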

Another (non-free) idea is Nuance's PaperPort product, which is not a dedicated OCR package, but can perform OCR via Nuance's OmniPage, which is included "under the covers" (the OmniPage OCR engine is built into PaperPort):

PaperPort is a robust scanning/imaging package that does a lot more than just OCR (but for pure OCR, is not as robust as OmniPage and FineReader). I use PaperPort extensively (more than OmniPage and FineReader combined) to create PDF Searchable Image files. Unless you have extreme OCR requirements, I recommend PaperPort (in terms of the non-free products). Its OCR capabilities (via the built-in OmniPage) will likely be adequate for your purposes. But if not, then go with OmniPage or FineReader.

Yet another (non-free) possibility is Adobe Acrobat (not Adobe Reader), which is also a lot more than just OCR:

I'm not a big fan of Acrobat (it's too expensive for what it does, in my opinion), but many folks like it and its built-in OCR is good.

Another non-free product, but much less expensive than the other non-free products mentioned above (just $27), is A-PDF OCR:

This gives you a lot to experiment with, which I strongly recommend...try them on your documents. Regards, Joe

Joe Winograd, Fellow & MVE, Developer, Commented:
Hi Simlip,
Where do things stand on this project? Have you tried any of my suggestions? It will be helpful if you provide some feedback so we can keep this moving forward towards a solution. Thanks, Joe
Hi Joe,

Is Tesseract worth considering for enterprise use? I have to run OCR on several scanned PDF documents. Will it do the job, or should I use the TIFF filter in SharePoint?

Joe Winograd, Fellow & MVE, Developer, Commented:
Hi Vagesh,
I have done a fair amount of experimenting with Tesseract. Based on my results, I would not consider it for enterprise use. Plain and simple, its OCR accuracy is not good enough. I would go with a high-quality, commercial OCR package for enterprise use, such as ABBYY's FineReader or Nuance's OmniPage. I am unfamiliar with the TIFF filter in SharePoint, so I can't speak to that. Regards, Joe