Convert PDF to text with Adobe Reader and MS Office

The ability to edit PDF documents can be useful, but it is not always a straightforward process. Many non-technical people don't realise that a PDF document can be stored as an image rather than as text, even if it contains nothing but text.

If the PDF document was created via tools in MS Word or similar, then a simple copy and paste should get you the text and most of the formatting. However, if the PDF was a scanned document or created from a bitmap image, that option is not available.

At times you may also receive a protected PDF that doesn't allow copying of text but does allow printing.

You may also have an old version of Adobe Reader that doesn't have the tool to copy text, and for various reasons you cannot perform the update.

In these cases, to extract the text from a PDF document you need to perform Optical Character Recognition (OCR). The level of success depends on the quality of the original document you are converting and the quality of the OCR software.

Products such as Adobe Acrobat Professional have quality OCR, and the Export tool is specifically designed to convert a PDF into a Word document or other format. This software doesn't come cheap, but there are alternatives. Depending on the level of security on your network, you may not be able to install the free ones. Before seeking your IT department's assistance, you can use software they have most likely already provided.

This article makes the assumption that you have Adobe Reader 7 or higher and Microsoft Office 2003 or higher installed. Most businesses have these available for all staff.

1. Open the PDF document and go to File > Print

2. Choose Microsoft Office Document Image Writer from the list of printers.

Make any other changes required, such as the pages to print.
Click OK. This will create a TIF file, which is required in the next steps.

Print dialogue box

3. Choose a location to save the file.

The Desktop is the preferred location for simplicity.

4. Load the MS Office Document Imaging software.

This can be launched by double clicking on C:\Program Files\Common Files\Microsoft Shared\MODI\11.0\MSPVIEW.EXE
Depending on your version of Office, the exact location may differ. For Office 2003 it is ...\MODI\11.0. For Office 2007 it is ...\MODI\12.0. The version number should be the only folder in that directory unless you have multiple versions of Office installed.

You then need to open the file saved from the previous step.
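
The version-dependent location described in step 4 can be checked with a short script. This is only a minimal sketch: the function names `modi_candidates` and `find_viewer` are hypothetical, the two version folders are the ones named in this article, and it assumes Office was installed under the default `C:\Program Files` location.

```python
import os
from pathlib import PureWindowsPath

# Version folders named in this article; other Office versions are not covered.
MODI_VERSIONS = {"Office 2003": "11.0", "Office 2007": "12.0"}


def modi_candidates(program_files=r"C:\Program Files"):
    """Build the possible MSPVIEW.EXE paths, one per known MODI version.

    program_files is an assumption: the default install root.
    """
    base = (PureWindowsPath(program_files)
            / "Common Files" / "Microsoft Shared" / "MODI")
    return [str(base / version / "MSPVIEW.EXE")
            for version in MODI_VERSIONS.values()]


def find_viewer(exists=os.path.isfile):
    """Return the first candidate path that exists, or None if none do.

    The exists parameter lets you substitute another check when testing.
    """
    for path in modi_candidates():
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(find_viewer() or "MSPVIEW.EXE not found in the expected folders")
```

If the script reports that nothing was found, browse to the ...\MODI folder by hand, as the install location on your machine may differ from the default assumed here.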

5. Select Tools > Send Text to Word.

Send text to word menu item

6. Retain the default values.

The default folder can be changed to any location you like. The Word document saved there will have the same name as the one you chose in step 3.
Click OK.

Default options and save location

7. Click OK on the confirmation message

Confirm process before starting

8. The program will process the request.

Progress bar

9. Once complete, MS Word will open with the text for editing. Edit and save as necessary.

Here is a sample of the source PDF followed by the output.
Source PDF
Output example

As you can see, this process is not perfect and your results will vary. Formatting may be lost and images may not transfer, although any text in the images, such as logos or letterheads, will. The text is presented either in left-aligned tables or as HTML, depending on the version of Word you have installed.

This method is useful for obtaining the text from a scanned document to then be reformatted into a different layout. If you want an exact conversion with formatting and images intact, then you will require other OCR software which will usually entail an investment.

While there are several free versions that do an adequate job - often better than the above - they are freeware and many IT departments won't allow them on the network. Adobe Acrobat Standard offers a nice halfway point between Reader and Professional in both price and features. I would recommend that product if you require better text recognition.

Enjoy.
Author: Rartemass