Solved

Converting PDF to searchable text format

Posted on 2004-08-11
Last Modified: 2006-11-17
Hi all,

I have a PDF file which seems to be just a series of scanned pages (I can't search for specific words). I would like to convert this file to a PDF where I can search the text. Is there some kind of OCR package which would do this?

Thanks,
Freerider.
Question by:Freerider
LVL 11

Expert Comment

by:lbertacco
ID: 11770899
If you have Office 2003, you can print it to the "Microsoft Office Document Image Writer" printer, then open the result with "Microsoft Office Document Imaging" and click Tools > Send Text to Word.
 
LVL 44

Accepted Solution

by:
Karl Heinz Kremer earned 100 total points
ID: 11770962
You can use Adobe Acrobat (the full version): it comes with "Paper Capture", which is an OCR engine. If you don't have Acrobat, other options are ScanSoft's OmniPage Pro (http://www.scansoft.com/omnipage/) or ABBYY FineReader (http://www.abbyy.com/finereader/).
You have several options when you convert your image-only PDF. You can convert everything to "real" text and graphics, but that may not be your best solution: you will very likely end up with a mix of recognized and unrecognized text, and the unrecognized parts stay as scanned images. This means the text switches back and forth between real characters and scanned images, which is visible even to the untrained eye. You can avoid this by selecting "image with hidden text": the original scanned image is used for display and printing, and the recognized text is stored behind the image (in the correct location). This means that you can index and search the document. When you find a term, the correct section of the document is highlighted, but you still get the high-quality scan that you started with when you view or print the document.
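If you prefer a scriptable route, the "image with hidden text" idea can be sketched as below. This is only an illustration under assumptions: it uses the open-source OCRmyPDF command-line tool (a substitute, not one of the products named above), which keeps the page image and adds a searchable text layer. The function names and file names are hypothetical placeholders.

```python
import shutil
import subprocess


def build_ocr_command(src, dst, language="eng"):
    """Return the argument list for an OCRmyPDF run (nothing is executed here).

    src/dst are placeholder paths; language is a Tesseract language code.
    """
    return [
        "ocrmypdf",
        "--language", language,  # OCR language to use
        "--skip-text",           # leave pages that already contain text untouched
        str(src),
        str(dst),
    ]


def make_searchable(src, dst, language="eng"):
    """Run OCRmyPDF if it is installed; otherwise fail with a clear message."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    # check=True raises CalledProcessError if the tool reports a failure
    subprocess.run(build_ocr_command(src, dst, language), check=True)
```

Calling make_searchable("scan.pdf", "searchable.pdf") would write a new file and leave the original untouched, so you can compare the two before replacing anything.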
 

Author Comment

by:Freerider
ID: 11863285
Thanks khkremer,
FineReader does the job. The only problem I have now is that the bookmarks from the original document have been removed. Any idea how to get them back? I've downloaded a few trial programs but nothing seems to work.

Freerider.
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 11863758
Try this: Take the original file (with the bookmarks) and open it in Acrobat, then select Document>Pages>Replace and choose to replace all pages with the pages from your OCR'ed document.