PDF OCR library

Hi there,
My company has a lot of PDF documents that need OCR.
Do you know of a free library for OCR on PDFs? Or a non-free one?
I need advice.
Joe Winograd - EE Fellow & MVE, Developer, Commented:
Hi Simlip,

Here are some free OCR tools:

(1) Tesseract OCR Engine, an open source product now maintained by Google:

It has numerous add-ons:

(2) FreeOCR, which uses a compiled version of the Tesseract engine:

(3) GOCR/JOCR, an open source OCR package developed under the GNU General Public License:

(4) Boxoft Free OCR (I use several Boxoft free tools):

(5) Google Drive/Docs has an option to perform OCR on uploaded files, but the resulting PDF doesn't hide the text layer, so the files look ugly.
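
For option (1), here is a minimal sketch of driving the Tesseract command-line tool from Python to get a PDF with the page image plus a hidden text layer. It assumes a Tesseract build recent enough to support PDF output (3.03 or later) and that the `tesseract` binary is on the PATH. The function names and file names are hypothetical, chosen only for illustration. Note that Tesseract reads image files, not PDFs, so a scanned PDF has to be converted to page images first (that step is not shown here).

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_pdf_command(image_path, output_base, lang="eng"):
    """Build the Tesseract CLI call that writes <output_base>.pdf.

    Hypothetical helper: only the command layout
    (tesseract INPUT OUTPUTBASE -l LANG pdf) comes from Tesseract itself.
    """
    # The trailing "pdf" selects Tesseract's PDF output config, which
    # produces the page image with an invisible text layer on top.
    return ["tesseract", str(image_path), str(output_base), "-l", lang, "pdf"]


def ocr_to_searchable_pdf(image_path, output_base, lang="eng"):
    """Run Tesseract if it is installed; return the PDF path, else None."""
    if shutil.which("tesseract") is None:
        # Tesseract is not installed or not on the PATH.
        return None
    subprocess.run(tesseract_pdf_command(image_path, output_base, lang), check=True)
    # Tesseract appends the .pdf extension to the output base itself.
    return Path(str(output_base) + ".pdf")
```

Usage would look like `ocr_to_searchable_pdf("page-001.png", "page-001")`, where `page-001.png` stands in for one of your own scanned pages.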

Here are some non-free OCR packages. Two very well regarded ones are Nuance OmniPage and ABBYY FineReader. Here are links to more information:


Here are links to feature comparison charts:


I use both and can say that both are very accurate, but I can't say that one is always better than the other. I've tested them on the same documents, and sometimes one is better, sometimes the other is, but for the most part, the accuracy is similar - both very good! They both can make searchable PDF files (i.e., a PDF file with both the scanned image and a layer of text created by the OCR process).
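
To put rough numbers on a side-by-side test like this, here is a minimal sketch that scores each engine's text output against a hand-typed transcript of the same page. The function name and the idea of a hand-typed reference are assumptions for illustration, not something either product provides. The score is the similarity ratio from Python's standard `difflib` module, which is a simple proxy and not a formal OCR accuracy metric.

```python
from difflib import SequenceMatcher


def ocr_similarity(reference, ocr_text):
    """Return a 0.0-1.0 similarity between a reference transcript and OCR output.

    Hypothetical helper: whitespace is collapsed first so that differences
    in line breaks and spacing do not count against the OCR engine.
    """
    ref = " ".join(reference.split())
    hyp = " ".join(ocr_text.split())
    if not ref and not hyp:
        # Two empty texts are treated as a perfect match.
        return 1.0
    # autojunk=False keeps the comparison exact on long texts.
    return SequenceMatcher(None, ref, hyp, autojunk=False).ratio()


if __name__ == "__main__":
    # Placeholder strings; substitute the real transcript and engine outputs.
    truth = "Invoice 1234 dated 5 March"
    engine_a = "Invoice 1234 dated 5 March"
    engine_b = "lnvo1ce l234 dated 5 Narch"
    print("engine A:", ocr_similarity(truth, engine_a))
    print("engine B:", ocr_similarity(truth, engine_b))
```

Running the same comparison over a sample of your own documents shows whether one engine is consistently ahead for your material, or whether the two trade places from page to page.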

Another (non-free) idea is Nuance's PaperPort product, which is not a dedicated OCR package, but can perform OCR via Nuance's OmniPage, which is included "under the covers" (the OmniPage OCR engine is built into PaperPort):

PaperPort is a robust scanning/imaging package that does a lot more than just OCR (but for pure OCR, is not as robust as OmniPage and FineReader). I use PaperPort extensively (more than OmniPage and FineReader combined) to create PDF Searchable Image files. Unless you have extreme OCR requirements, I recommend PaperPort (in terms of the non-free products). Its OCR capabilities (via the built-in OmniPage) will likely be adequate for your purposes. But if not, then go with OmniPage or FineReader.

Yet another (non-free) possibility is Adobe Acrobat (not Adobe Reader), which is also a lot more than just OCR:

I'm not a big fan of Acrobat (it's too expensive for what it does, in my opinion), but many folks like it and its built-in OCR is good.

Another non-free product, but much less expensive than the other non-free products mentioned above (just $27), is A-PDF OCR:

This gives you a lot to experiment with, which I strongly recommend...try them on your documents. Regards, Joe
Joe Winograd - EE Fellow & MVE, Developer, Commented:
Hi Simlip,
Where do things stand on this project? Have you tried any of my suggestions? It will be helpful if you provide some feedback so we can keep this moving forward towards a solution. Thanks, Joe
Hi Joe,

Is Tesseract worth considering for enterprise use? I have to OCR several scanned PDF documents. Will it do the job, or should I use the TIFF filter in SharePoint?

Joe Winograd - EE Fellow & MVE, Developer, Commented:
Hi Vagesh,
I have done a fair amount of experimenting with Tesseract. Based on my results, I would not consider it for enterprise use. Plain and simple, its OCR accuracy is not good enough. I would go with a high-quality, commercial OCR package for enterprise use, such as ABBYY's FineReader or Nuance's OmniPage. I am unfamiliar with the TIFF filter in SharePoint, so I can't speak to that. Regards, Joe
Question has a verified solution.