Solved

Converting PDF to searchable text format

Posted on 2004-08-11
Last Modified: 2006-11-17
Hi all,

I have a PDF file which seems to consist only of scanned page images (I can't search for specific words). I would like to convert this file to a PDF in which I can search the text. Is there some kind of OCR package that would do this?

Thanks,
Freerider.
Question by:Freerider
5 Comments
 
LVL 11

Expert Comment

by:lbertacco
ID: 11770899
If you have Office 2003, you can print it to the "Microsoft Office Document Image Writer" printer, then open the result with "Microsoft Office Document Imaging" and click Tools -> Send Text to Word.
 
LVL 44

Accepted Solution

by:
Karl Heinz Kremer earned 100 total points
ID: 11770962
You can use Adobe Acrobat (the full version): it comes with "Paper Capture", which is an OCR engine. If you don't have Acrobat, other options are ScanSoft's OmniPage Pro (http://www.scansoft.com/omnipage/) or ABBYY FineReader (http://www.abbyy.com/finereader/).
You have several options when you convert your image-only PDF. You can convert everything to "real" text and graphics, which may not be your best solution: you will very likely end up with a mix of recognized text and unrecognized text, and the unrecognized parts stay as scanned images. This means the characters in your text will switch between real characters and scanned images, which is visible even to the untrained eye. You can avoid this by selecting "image with hidden text", where the original scanned image is used for display and printing, but the recognized text is stored behind the image (in the correct location). This means that you can index and search the document. When you find a term, the correct section of the document is highlighted, but you still have the high-quality scan that you started with when you view or print the document.
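To make the "image with hidden text" idea concrete, here is a minimal Python sketch of how such a text layer can be expressed in a PDF page's content stream. It relies on the PDF text rendering mode 3 (the `3 Tr` operator), which draws text with neither fill nor stroke, so it stays searchable without covering the scan. The `Word` class, the helper names, the font resource name `F1`, and the coordinates are all hypothetical and only for illustration; this is not a description of how Paper Capture, OmniPage, or FineReader work internally, and it only handles plain ASCII text.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR result (hypothetical structure): text plus position in PDF points."""
    text: str
    x: float     # left edge, measured from the left side of the page
    y: float     # baseline, measured from the bottom of the page
    size: float  # font size roughly matching the height of the scanned glyphs


def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def hidden_text_ops(words: list[Word], font: str = "F1") -> str:
    """Build content-stream operators that place each word invisibly.

    '3 Tr' selects text rendering mode 3 (neither fill nor stroke), so the
    text can be found and selected but is never painted over the scan.
    """
    lines = ["BT", "3 Tr"]
    for w in words:
        lines.append(f"/{font} {w.size:g} Tf")
        # Tm sets an absolute text matrix, so each word lands at its own spot.
        lines.append(f"1 0 0 1 {w.x:g} {w.y:g} Tm")
        lines.append(f"({escape_pdf_string(w.text)}) Tj")
    lines.append("ET")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed sample data: two words an OCR engine might have located.
    print(hidden_text_ops([Word("searchable", 72, 700, 12),
                           Word("text", 140, 700, 12)]))
```

In a real file these operators would sit in the same page content as the scanned image, with the font declared in the page resources; an OCR tool takes care of all of that for you.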
 

Author Comment

by:Freerider
ID: 11863285
Thanks khkremer,
FineReader does the job. The only problem I have now is that the bookmarks from the original document have been removed. Any idea how to get them back? I've downloaded a few trial programs, but nothing seems to work.

Freerider.
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 11863758
Try this: take the original file (with the bookmarks) and open it in Acrobat, then select Document > Pages > Replace and choose to replace all pages with the pages from your OCR'ed document.
