
Convert PDF to text with Adobe Reader and MS Office

The ability to edit PDF documents can be useful, but it is not always a straightforward process. Many non-technical people don't realise that a PDF is a fixed-layout format rather than an editable text file, and that a scanned PDF is simply an image of the page, even if it shows nothing but text.

If the PDF document was created via tools in MS Word or similar, then a simple copy and paste should get you the text and most of the formatting. However, if the PDF was a scanned document or was created from a bitmap image, that option is not available.

At times you may also receive a protected PDF that doesn't allow copying of text but does allow printing.

You may also have an old version of Adobe Reader that doesn't have the tool to copy text, and for various reasons you cannot perform the update.

In these cases, to extract the text from a PDF document you need to perform Optical Character Recognition (OCR). How well this works depends on the quality of the original document you are converting and the quality of the OCR software.

Products such as Adobe Acrobat Professional have quality OCR, and the Export tool is specifically designed to convert a PDF into a Word document or other format. This software doesn't come cheap, but there are some alternatives. Depending on the level of security on your network, you may not be able to install the free alternatives. Before seeking your IT department's assistance, you can utilise software they have most likely already provided to you.

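This article assumes that you have Adobe Reader 7 or higher and Microsoft Office 2003 or 2007 installed. The Microsoft Office Document Imaging tools used below ship with those two versions and were dropped from Office 2010 onwards. Most businesses have these available for all staff.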

1. Open the PDF document and go to File > Print

2. Choose Microsoft Office Document Image Writer from the list of printers.

Make any other changes required such as pages to print.
Click OK. This will create a TIF file, which is required in the next steps.

[Screenshot: Print dialogue box]

3. Choose a location to save the file.

The Desktop is the preferred location for simplicity.

4. Load the MS Office Document Imaging software.

This can be launched by double-clicking C:\Program Files\Common Files\Microsoft Shared\MODI\11.0\MSPVIEW.EXE
Depending on your version of Office, the exact location may differ. For Office 2003 it is ...\MODI\11.0. For Office 2007 it is ...\MODI\12.0. The numbered folder should be the only folder in the MODI directory unless you have multiple versions of Office installed.

You then need to open the file saved from the previous step.

5. Select Tools > Send Text to Word.

[Screenshot: Send Text to Word menu item]

6. Retain the default values.

The default folder can be changed to anything you like. The output file will be a Word document with the same name you chose in step 3.
Click OK.

[Screenshot: Default options and save location]

7. Click OK on the confirmation message.

[Screenshot: Confirm process before starting]

8. The program will process the request.

[Screenshot: Progress bar]

9. Once complete, MS Word will open with the text for editing. Edit and save as necessary.
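The folder lookup in step 4 can be sketched in a few lines of Python, for readers who would rather not browse for the viewer by hand. This is only an illustration: the function names are hypothetical, the only version folders it knows about are the two named above (11.0 and 12.0), and it assumes Office is installed under C:\Program Files.

```python
from pathlib import Path, PureWindowsPath

# Version folders named in the article: 12.0 (Office 2007), 11.0 (Office 2003).
MODI_VERSIONS = ("12.0", "11.0")


def modi_viewer_candidates(program_files=r"C:\Program Files"):
    """Return the candidate MSPVIEW.EXE paths, newest Office version first."""
    base = PureWindowsPath(program_files) / "Common Files" / "Microsoft Shared" / "MODI"
    return [str(base / version / "MSPVIEW.EXE") for version in MODI_VERSIONS]


def find_modi_viewer(exists=None, program_files=r"C:\Program Files"):
    """Return the first candidate path that exists, or None if none do.

    `exists` is a hypothetical hook so the check can be replaced when
    experimenting; by default the real file system is queried.
    """
    if exists is None:
        exists = lambda path: Path(path).exists()
    for candidate in modi_viewer_candidates(program_files):
        if exists(candidate):
            return candidate
    return None
```

Calling find_modi_viewer() on a machine with Office 2003 or 2007 should return the full path to MSPVIEW.EXE; a result of None suggests the Document Imaging component is not installed in the expected folders.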

Here is a sample of the source PDF followed by the output.
[Screenshots: Source PDF; output example]

As you can see, this process is not perfect and your results will vary. Formatting may be lost and images may not transfer, although any text on the images, such as logos or letterheads, will. The text is presented either in left-aligned tables or as HTML, depending on the version of Word you have installed.

This method is useful for obtaining the text from a scanned document to then be reformatted into a different layout. If you want an exact conversion with formatting and images intact, then you will require other OCR software which will usually entail an investment.
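If the recovered text is going to be reformatted anyway, some of the tidying can be automated once the text has been copied out of Word. The sketch below is one possible approach and not part of the Office tools: tidy_ocr_text is a hypothetical helper that rejoins words hyphenated across line breaks, unwraps lines within a paragraph and collapses stray spacing. It assumes blank lines mark paragraph breaks.

```python
import re


def tidy_ocr_text(raw):
    """Clean common OCR layout artefacts without changing the wording."""
    # Normalise Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words split by a hyphen at the end of a line: "docu-\nment" -> "document".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Blank lines separate paragraphs; single line breaks inside a paragraph are unwrapped.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(paragraph.split()) for paragraph in paragraphs]
    return "\n\n".join(paragraph for paragraph in cleaned if paragraph)
```

Note that the hyphen rule also joins genuine compounds that happen to break at the end of a line ("well-" followed by "known" becomes "wellknown"), so the result still needs a proofread.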

There are several free OCR tools that do an adequate job, often better than the method above, but many IT departments won't allow freeware on the network. Adobe Acrobat Standard offers a nice halfway point between Reader and Professional in both price and features. I would recommend that product if you require better text recognition.

Enjoy.
Author: Rartemass
