Solved

Convert .PDF

Posted on 2016-11-25
Last Modified: 2016-11-30
I need to convert a .PDF document that contains tables into Excel or Word, then write a macro to extract the data.  The document is 36,000 pages.  I have tried extracting a few pages and converting them with some online sites.  The results are not exactly right, and I don't want to spend days breaking the document into smaller pieces, although I might have to.
Can someone recommend some software to do this?
Question by:rrhandle8
 
LVL 53

Expert Comment

by:Joe Winograd, EE MVE
ID: 41901768
Is there any reason for converting it to Excel or Word? It sounds as if the goal is to extract the data. If that's the case, then there's no reason to convert it to Excel or Word. Instead, convert it to plain text and then write a program/script to extract the data into whatever you want (could be Excel or Word or most anything else).

Here's a 5-minute EE video Micro Tutorial that explains how to download the Xpdf utilities — Xpdf - Command Line Utility for PDF Files - Part 1:
https://www.experts-exchange.com/videos/213/

Here's another 5-minute EE video Micro Tutorial that discusses PDFtoText, the Xpdf tool that can convert a PDF file into plain text (Xpdf - Convert PDF Files to Plain Text Files - Part 3):
https://www.experts-exchange.com/videos/217/
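
For a document of this size, it may be easier to run the conversion in page ranges rather than in one pass. The following is a minimal Python sketch, not a tested recipe for this particular PDF. It assumes the Xpdf `pdftotext` executable is on the PATH and uses its `-layout`, `-f` (first page), and `-l` (last page) options. The function names, the chunk size, and the output file naming are hypothetical choices made for illustration.

```python
import subprocess
from pathlib import Path


def page_ranges(total_pages, chunk_size):
    """Yield (first, last) 1-based page ranges that cover total_pages."""
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def build_command(pdf_path, out_path, first, last, exe="pdftotext"):
    """Build the pdftotext command line for one page range.

    Assumes `exe` resolves to the Xpdf pdftotext tool; -layout keeps the
    physical layout so table columns stay roughly aligned in the text.
    """
    return [exe, "-layout", "-f", str(first), "-l", str(last),
            str(pdf_path), str(out_path)]


def convert_in_chunks(pdf_path, out_dir, total_pages, chunk_size=1000):
    """Run pdftotext once per page range, writing one text file per range."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for first, last in page_ranges(total_pages, chunk_size):
        # Hypothetical naming scheme: pages_00001-01000.txt, and so on.
        out_path = out_dir / f"pages_{first:05d}-{last:05d}.txt"
        subprocess.run(build_command(pdf_path, out_path, first, last),
                       check=True)
```

A call such as `convert_in_chunks("big.pdf", "text_out", total_pages=36000)` would then produce 36 text files that can be processed one at a time (the file and folder names here are placeholders).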

After creating the plain text file from the PDF, extract the data however you want. I would use a powerful programming/scripting language. If you want to create an Excel or Word file, pick one with native COM support, such as AutoHotkey, which is discussed in the EE article, AutoHotkey - Getting Started:
https://www.experts-exchange.com/articles/18346/

But, of course, use whatever language you prefer. Regards, Joe
 

Author Comment

by:rrhandle8
ID: 41901777
If converted to plain text how would I know what data is what?  If converted to tables in Word, it is easy to write some VBA code to loop through the tables and I always know that column 4 is the Address field.
 
LVL 53

Accepted Solution

by:Joe Winograd, EE MVE (earned 500 total points)
ID: 41901807
> If converted to plain text how would I know what data is what?

By column headings and/or the table layout. I would experiment with the -layout and -table options of PDFtoText (https://www.experts-exchange.com/videos/217/).
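
To illustrate how column headings can identify the fields in layout-preserved text, here is a minimal Python sketch. It assumes that the converted text separates columns with runs of two or more spaces, that a header row is repeated on each page, and that no cell is empty. None of these assumptions has been checked against the actual document, and the function names and the "Address" header are hypothetical.

```python
import re


def split_columns(line):
    """Split one layout-preserved line on runs of two or more spaces."""
    return [field.strip() for field in re.split(r"\s{2,}", line.strip())]


def parse_table(text, header_marker):
    """Return table rows as dicts keyed by the column headings.

    The first line containing header_marker as a field is taken as the
    header row. Repeated header rows (one per page) are skipped, and so
    are lines whose field count differs from the header's.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = split_columns(line)
        if header is None:
            if header_marker in fields:
                header = fields
            continue
        if fields == header:
            continue  # header repeated at the top of a later page
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


sample = (
    "Name          City        Address\n"
    "Ann Lee       Austin      12 Oak St\n"
    "\n"
    "Name          City        Address\n"
    "Bob Ray       Dallas      9 Elm Ave\n"
)
for row in parse_table(sample, "Address"):
    print(row["Address"])
```

With this approach the Address field is found by its heading rather than by position. If some cells are empty, splitting on spaces will misalign the fields, and slicing each line at the character offsets of the headings is likely to be more reliable.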

But if you really want to convert to Excel or Word, here are some ideas for you.

For PDF to Word, I've had good (not perfect) results with this free online tool:
http://www.pdftoword.com/

If you prefer a local install, I've also had good (also not perfect) results with this free tool:
http://www.boxoft.com/pdf-to-word/

You may get better results with non-free products. I've gotten better (but still not perfect) results with Nuance's Power PDF (comes in both Standard and Advanced editions):
http://www.nuance.com/for-business/document-imaging-and-scanning/power-pdf-converter/index.htm

There's a free trial for the Advanced edition (but not Standard) so you can see how well it works for you before buying it:
http://www.nuance.com/for-business/imaging-solutions/document-conversion/power-pdf-converter/free-trial/index.htm

Another good (non-free) product is Able2Extract PDF Converter:
http://www.investintech.com/prod_downloadsa2e.htm

It also offers a free trial.

The free online tool linked above (http://www.pdftoword.com/) is the Nitro cloud. Nitro is a well-known name in PDF tools, and their Nitro Pro has a PDF to Word feature:
http://www.nitropdf.com/pro/features/convert-export

There's also a free trial for Nitro Pro, but I've never used it, so I can't vouch for its performance. However, it uses the same engine as the online tool, which I have used and found to be very good, so I would expect the same of Nitro Pro.

One more non-free product (but reasonably priced at $39) is CAD-KAS's PDF to Word:
http://www.cadkas.com/downengpdf9.php

I haven't used this product, but I have used their PDF Editor Objects, which is excellent. Based on the quality of PDF Editor Objects, I think that their PDF to Word is worth a try, and there's a free trial:
http://www.cadkas.com/pdf2word!.exe

It probably goes without saying, but Adobe Acrobat can do it — both Standard and Professional (but not Reader). As with everything, results aren't perfect.

I've been on previous threads here at EE where other experts have recommended these three (free) online tools:
http://www.convertpdftoword.org
http://www.pdfonline.com/pdf-to-word-converter
http://www.wondershare.net/pdf-converter/pdf-to-word-converter.html

I can't personally vouch for these, but based on the positive comments from other members, I'm passing them along for your consideration.

No matter which way you go, keep in mind that PDF-to-Word conversion is tricky business – maintaining the formatting/layout is tough stuff! I haven't found anything that is perfect, and results vary from one document to the next. So my suggestion is to put some, or all, of these products on your short list for evaluation. Define a few test docs – your docs! Compare the resulting Word files to see which, if any, of the tools produces Word files that are satisfactory.

For PDF to Excel, I've had good (not perfect) results with this free online tool:
http://www.pdftoexcel.org/

It does a decent job of maintaining the formatting, which is always the trick with any PDF-to-Excel (or PDF-to-Word) conversion. As mentioned above about Word, I don't know if it will work well on your particular PDFs, but it's worth a (free!) shot. If you do like it and would prefer a local install rather than the online tool, it is available for purchase and download (not free, but it has a 7-day free trial):
http://www.investintech.com/prod_downloadsa2e.htm

Another local install (not free, but reasonably priced at $27) is Boxoft PDF to Excel:
http://www.boxoft.com/pdf-to-excel/

Yet another local install worth trying is A-PDF to Excel (also not free, but reasonably priced at $39, and there's a free trial):
http://www.a-pdf.com/to-excel/index.htm

As mentioned in the Word section, Adobe Acrobat can also create Excel files, and again the results are not perfect.

As a disclaimer, I want to emphasize that I have no affiliation with any of the companies mentioned in my posts and no financial interest in them whatsoever. I am simply a happy user/customer. Regards, Joe
 
LVL 5

Expert Comment

by:Mdlinnett
ID: 41902072
Joe knows his stuff when it comes to PDF conversion. He's helped me in the past, so stick with him.

I had good results with the Nuance tool he suggested above.  The free trial is very useful for testing in advance of a purchase.

Office 2016 does a pretty fine job of converting from PDF too. MS really worked on that functionality for this version; 2013 was pretty poor at it.
 

Author Closing Comment

by:rrhandle8
ID: 41902121
Thanks for the help, Joe.  I found a couple of the desktop versions that worked well: Able2Extract and Nuance.  Some of the others were horrible.
 
LVL 53

Expert Comment

by:Joe Winograd, EE MVE
ID: 41907385
Mdlinnett,
Thanks for the nice words — I appreciate that!

rrhandle8,
I'm glad to hear that a couple of them worked well for you.

Regards, Joe
