Solved

SharePoint 2007 PDF full text search (using OCR)

Posted on 2011-10-01
Last Modified: 2012-08-14
Hi,
I have a SharePoint 2007 Document Library that contains scanned contracts in PDF format (image PDFs).
I want to enable full-text search of these contracts while retaining the original image format of the PDF files.

I am looking for an OCR solution that would:
1. Run on a predefined schedule
2. Extract the text from the contract PDF files
3. Make this text searchable using the default SharePoint Search service

Note: I don't want the documents to be OCRed at the time of uploading to the Document Library. They should be OCRed on a predefined schedule (e.g. 7am and 7pm).
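To make the schedule requirement concrete, here is a minimal sketch of how a batch job could work out its next run time for the 7am and 7pm slots mentioned above. The names `RUN_TIMES` and `next_run` are hypothetical and not part of any OCR product or of SharePoint; the OCR step itself is not shown.

```python
from datetime import datetime, time, timedelta

# Assumed schedule taken from the question: 7am and 7pm every day.
RUN_TIMES = (time(7, 0), time(19, 0))


def next_run(now: datetime, run_times=RUN_TIMES) -> datetime:
    """Return the first scheduled run strictly after `now`."""
    # Check the remaining slots today, then the slots tomorrow.
    for day_offset in (0, 1):
        day = now.date() + timedelta(days=day_offset)
        for slot in sorted(run_times):
            candidate = datetime.combine(day, slot)
            if candidate > now:
                return candidate
    raise ValueError("run_times must not be empty")
```

For example, `next_run(datetime(2011, 10, 1, 8, 0))` gives 7pm on the same day. In practice the trigger would more likely come from an operating system scheduler or from the OCR product itself rather than from a script like this.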

Does anyone know of any solutions?

Thanks
Question by:TetraSA
3 Comments
 

Expert Comment

by:jessc7
ID: 36897011
I do not have an answer for OCR, but have you installed the Adobe PDF iFilter in the SharePoint environment to allow for full-text indexing?

The attached document is from Adobe:

http://www.adobe.com/special/acrobat/configuring_pdf_ifilter_for_ms_sharepoint_2007.pdf
configuring-pdf-ifilter-for-ms-s.pdf

Accepted Solution

by:
Joe Winograd, EE MVE 2015&2016 earned 300 total points
ID: 36897038
Nuance's OmniPage Professional 18 does what you want. From its website:

"Automatically batch convert files."
"Archive documents directly into Microsoft SharePoint."

Here's a link to more information:
http://nuance.com/for-business/by-product/omnipage/professional/index.htm

ABBYY FineReader 11 Corporate Edition is another product that does what you want. From its website:

"Users can...schedule conversion for specific times."
"Export to SharePoint"

Here's a link to more information:
http://finereader.abbyy.com/corporate/full_feature_list/

Both of these products have been highly regarded in imaging/OCR circles for a long time.

Regards, Joe

Assisted Solution

by:typerracer
typerracer earned 200 total points
ID: 36912332
KnowledgeLake has an OCR solution that my client is using with reasonable success. I'm not sure about the product's "on schedule" capabilities, but it does a reasonable job of OCR'ing the PDFs. The real question is how these PDFs are being generated: are they images of pages embedded in the file, or are they "printed" to PDF with text layers stored in the file? If the latter, you likely do not need OCR. SharePoint can index text-layer PDFs with the right add-in.
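A quick way to triage a library along these lines is to check each file for font resources, since an image-only scan normally has none. The sketch below is a rough heuristic using only a byte search, not a real PDF parser; the function name `probably_needs_ocr` is hypothetical, and files that use compressed object streams are reported as "unknown" because their fonts may not be visible to a plain search.

```python
from typing import Optional


def probably_needs_ocr(pdf_bytes: bytes) -> Optional[bool]:
    """Rough guess at whether a PDF is image-only.

    Returns False if font resources are visible (a text layer is likely),
    True if none are visible, and None if the file uses compressed object
    streams, where fonts could be hidden from a plain byte search.
    """
    if b"/Font" in pdf_bytes:
        return False
    if b"/ObjStm" in pdf_bytes:
        return None  # cannot tell without decompressing the object streams
    return True
```

It can be run over a file with `probably_needs_ocr(open(path, "rb").read())`. Anything that comes back `True` or `None` is worth opening by hand, or with a proper PDF library, before deciding whether OCR is needed.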