• Status: Solved

SharePoint 2007 PDF full text search (using OCR)

I have a SharePoint 2007 Document Library that contains scanned contracts in PDF format (image PDFs).
I want to enable full-text search of these contracts while retaining the original image format of the PDF files.

I am looking for an OCR solution that would:
1. Run on a pre-defined schedule
2. Extract the text from the contract PDF files
3. Make this text searchable using the default SharePoint Search service

Note: I don't want the documents to be OCRed at the time of uploading to the Document Library. They should be OCRed on a pre-defined schedule (e.g. 7am and 7pm).
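
To make the scheduling requirement concrete, here is a minimal Python sketch of how a batch job could work out its next run time. It is only an illustration: the 7am/7pm times come from the note above, while the names `RUN_TIMES` and `next_run` are hypothetical and not part of SharePoint or any OCR product. In practice the same effect could probably be had from an operating-system scheduler that launches the OCR batch.

```python
from datetime import datetime, time, timedelta

# Assumed schedule, taken from the question: 7am and 7pm every day.
RUN_TIMES = (time(7, 0), time(19, 0))


def next_run(now, run_times=RUN_TIMES):
    """Return the first scheduled datetime strictly after `now`."""
    # Check the remaining slots today, then the slots tomorrow.
    for day_offset in (0, 1):
        day = now.date() + timedelta(days=day_offset)
        for slot in sorted(run_times):
            candidate = datetime.combine(day, slot)
            if candidate > now:
                return candidate
    # Only reachable when no run times were supplied.
    raise ValueError("run_times must not be empty")
```

A job that wakes up, OCRs any new PDFs, and then sleeps until `next_run(datetime.now())` would satisfy the "not at upload time" requirement.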

Does anyone know of any solutions?

2 Solutions
I do not have an answer for OCR, but have you installed the Adobe PDF iFilter in the SharePoint environment to allow for full-text indexing?

The attached document is from Adobe:

Joe Winograd - EE Fellow & MVE, Developer, commented:
Nuance's OmniPage Professional 18 does what you want. From its website:

"Automatically batch convert files."
"Archive documents directly into Microsoft SharePoint."

Here's a link to more information:

ABBYY FineReader 11 Corporate Edition is another product that does what you want. From its website:

"Users can...schedule conversion for specific times."
"Export to SharePoint"

Here's a link to more information:

Both of these products have been highly regarded in imaging/OCR circles for a long time. Regards, Joe
KnowledgeLake has an OCR solution that my client is using with reasonable success. I'm not sure about the "on schedule" capabilities of the product, but it does a reasonable job of OCR'ing the PDFs. The real question is how these PDFs are being generated: are they images of pages embedded in the file, or are they "printed" to PDF with text layers stored in the file? If the latter, you likely do not need OCR. SharePoint can index text-layer PDFs with the right add-in.
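
As a rough way to tell the two kinds of PDF apart before buying an OCR product, the following Python sketch scans a file's raw bytes for font resources. This is a heuristic under stated assumptions, not a reliable test: image-only scans usually contain no `/Font` entries, but PDFs that store their dictionaries in compressed object streams can hide them, so a proper PDF library would be needed for a definitive answer. The function names are hypothetical.

```python
def probably_has_text_layer(pdf_bytes):
    """Heuristic: text drawn on a page needs a font, so look for '/Font'.

    False negatives are possible when the PDF keeps its dictionaries in
    compressed object streams; treat the result as a hint only.
    """
    return b"/Font" in pdf_bytes


def needs_ocr(path):
    """Return True when the file at `path` looks like an image-only PDF."""
    with open(path, "rb") as handle:
        return not probably_has_text_layer(handle.read())
```

Running `needs_ocr` over a sample of the contracts would give a quick indication of whether most of them are image-only scans or already carry a text layer.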
Question has a verified solution.
