Solved

Convert .PDF

Posted on 2016-11-25
Last Modified: 2016-11-30
I need to convert a PDF document that contains tables into Excel or Word, then write a macro to extract the data. The document is 36,000 pages. I have tried extracting a few pages and converting them with some online sites. The results are not exactly right, and I don't want to spend days breaking the document into smaller pieces, although I might have to.
Can someone recommend some software to do this?
Question by:rrhandle8
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 41901768
Is there any reason for converting it to Excel or Word? It sounds as if the goal is to extract the data. If that's the case, then there's no reason to convert it to Excel or Word. Instead, convert it to plain text and then write a program/script to extract the data into whatever you want (could be Excel or Word or most anything else).

Here's a 5-minute EE video Micro Tutorial, "Xpdf - Command Line Utility for PDF Files - Part 1", that explains how to download the Xpdf utilities:
https://www.experts-exchange.com/videos/213/

Here's another 5-minute EE video Micro Tutorial, "Xpdf - Convert PDF Files to Plain Text Files - Part 3", that discusses PDFtoText, the Xpdf tool that converts a PDF file into plain text:
https://www.experts-exchange.com/videos/217/

After creating the plain text file from the PDF, extract the data however you want. I would use a powerful programming/scripting language, preferably one with native COM support if you want to create an Excel or Word file. AutoHotkey is one such language; it is discussed in the EE article "AutoHotkey - Getting Started":
https://www.experts-exchange.com/articles/18346/
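To make the convert-then-script idea concrete, here is a minimal sketch in Python rather than AutoHotkey. It assumes the `pdftotext` executable is on the PATH and uses its `-f`/`-l` (first/last page) and `-layout` options. Splitting the 36,000 pages into ranges is my own assumption, not a requirement of the tool, and the function names, the file names, and the 500-page chunk size are all hypothetical.

```python
import subprocess
from pathlib import Path


def page_ranges(total_pages, chunk_size):
    """Yield (first, last) page numbers, 1-based and inclusive."""
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def pdftotext_command(pdf_path, txt_path, first, last):
    """Build the pdftotext argument list for one page range."""
    return ["pdftotext", "-layout", "-f", str(first), "-l", str(last),
            str(pdf_path), str(txt_path)]


def convert_in_chunks(pdf_path, out_dir, total_pages, chunk_size=500):
    """Run pdftotext once per page range (needs pdftotext on the PATH)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for first, last in page_ranges(total_pages, chunk_size):
        txt_path = out_dir / f"pages_{first:05d}_{last:05d}.txt"
        # check=True raises CalledProcessError if pdftotext reports a failure.
        subprocess.run(pdftotext_command(pdf_path, txt_path, first, last),
                       check=True)
```

Something like `convert_in_chunks("big.pdf", "text_out", total_pages=36000)` would then produce one text file per 500 pages. Converting only the first range is a quick way to check the output before committing to the whole document.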

But, of course, use whatever language you prefer. Regards, Joe

Author Comment

by:rrhandle8
ID: 41901777
If converted to plain text how would I know what data is what?  If converted to tables in Word, it is easy to write some VBA code to loop through the tables and I always know that column 4 is the Address field.
LVL 51

Accepted Solution

by:Joe Winograd, EE MVE (earned 500 total points)
ID: 41901807
> If converted to plain text how would I know what data is what?

By column headings and/or the table layout. I would experiment with the -layout and -table options of PDFtoText (https://www.experts-exchange.com/videos/217/).
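As a rough illustration of reading columns from layout-preserved text, here is a minimal Python sketch. The sample text is hypothetical (I made it up to resemble what `-layout` output might look like), and it assumes columns are separated by at least two spaces and that the header row starts with a known label. Real output will need adjusting.

```python
import csv
import io
import re

# Hypothetical sample of layout-preserved text for one table.
SAMPLE = """\
ID    Name           City         Address
1     Ann Lee        Springfield  12 Oak Street
2     Bob Stone      Riverton     7 Hill Road
"""


def split_columns(line):
    """Split a line on runs of two or more whitespace characters."""
    return re.split(r"\s{2,}", line.strip())


def parse_table(text, header_first="ID"):
    """Return (header, rows); rows are dicts keyed by column heading."""
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cells = split_columns(line)
        if header is None:
            # Skip everything until the header row is found.
            if cells[0] == header_first:
                header = cells
            continue
        # Keep only lines with the same number of cells as the header.
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return header, rows


def to_csv(header, rows):
    """Serialize the rows as CSV text, which Excel can open directly."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


header, rows = parse_table(SAMPLE)
addresses = [row["Address"] for row in rows]
```

With this approach "column 4 is the Address field" becomes a lookup by heading. Note that splitting on whitespace runs breaks down when a cell is empty; slicing each line at the character offsets of the headings would be sturdier in that case.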

But if you really want to convert to Excel or Word, here are some ideas for you.

For PDF to Word, I've had good (not perfect) results with this free online tool:
http://www.pdftoword.com/

If you prefer a local install, I've also had good (also not perfect) results with this free tool:
http://www.boxoft.com/pdf-to-word/

You may get better results with non-free products. I've gotten better (but still not perfect) results with Nuance's Power PDF (comes in both Standard and Advanced editions):
http://www.nuance.com/for-business/document-imaging-and-scanning/power-pdf-converter/index.htm

There's a free trial for the Advanced edition (but not Standard) so you can see how well it works for you before buying it:
http://www.nuance.com/for-business/imaging-solutions/document-conversion/power-pdf-converter/free-trial/index.htm

Another good (non-free) product is Able2Extract PDF Converter:
http://www.investintech.com/prod_downloadsa2e.htm

It also offers a free trial.

The first link in this post (pdftoword.com) is to the (free) Nitro cloud. Nitro is a well-known name in PDF tools and their Nitro Pro has a PDF to Word feature:
http://www.nitropdf.com/pro/features/convert-export

There's also a free trial for this, but I've never used it, so I can't vouch for its performance. However, it uses the same engine as the online tool, which I have used and which is very good, so I would expect the same of Nitro Pro.

One more non-free product (but reasonably priced at $39) is CAD-KAS's PDF to Word:
http://www.cadkas.com/downengpdf9.php

I haven't used this product, but I have used their PDF Editor Objects, which is excellent. Based on the quality of PDF Editor Objects, I think that their PDF to Word is worth a try, and there's a free trial:
http://www.cadkas.com/pdf2word!.exe

It probably goes without saying, but Adobe Acrobat can do it — both Standard and Professional (but not Reader). As with everything, results aren't perfect.

I've been on previous threads here at EE where other experts have recommended these three (free) online tools:
http://www.convertpdftoword.org
http://www.pdfonline.com/pdf-to-word-converter
http://www.wondershare.net/pdf-converter/pdf-to-word-converter.html

I can't personally vouch for these, but based on the positive comments from other members, I'm passing them along for your consideration.

No matter which way you go, keep in mind that PDF-to-Word conversion is tricky business – maintaining the formatting/layout is tough stuff! I haven't found anything that is perfect, and results vary from one document to the next. So my suggestion is to put some, or all, of these products on your short list for evaluation. Define a few test docs – your docs! Compare the resulting Word files to see which, if any, of the tools produces Word files that are satisfactory.

For PDF to Excel, I've had good (not perfect) results with this free online tool:
http://www.pdftoexcel.org/

It does a decent job of maintaining the formatting, which is always the trick with any PDF-to-Excel (or PDF-to-Word) conversion. As mentioned above about Word, I don't know if it will work well on your particular PDFs, but it's worth a (free!) shot. If you do like it and would prefer a local install rather than the online tool, it is available for purchase and download (not free, but it has a 7-day free trial):
http://www.investintech.com/prod_downloadsa2e.htm

Another local install (not free, but reasonably priced at $27) is Boxoft PDF to Excel:
http://www.boxoft.com/pdf-to-excel/

Yet another local install worth trying is A-PDF to Excel (also not free, but reasonably priced at $39, and there's a free trial):
http://www.a-pdf.com/to-excel/index.htm

As mentioned in the Word section, Adobe Acrobat can also create Excel files, again with imperfect results.

As a disclaimer, I want to emphasize that I have no affiliation with any of the companies mentioned in my posts and no financial interest in them whatsoever. I am simply a happy user/customer. Regards, Joe
LVL 5

Expert Comment

by:Mdlinnett
ID: 41902072
Joe knows his stuff when it comes to PDF conversion. He's helped me in the past, so stick with him.

I had good results with the Nuance tool he suggested above.  The free trial is very useful for testing in advance of a purchase.

Office 2016 does a pretty fine job of converting from PDF too; MS really worked on that functionality for this version. 2013 was pretty bad at it.

Author Closing Comment

by:rrhandle8
ID: 41902121
Thanks for the help, Joe. I found a couple of the desktop versions that worked well: Able2Extract and Nuance. Some of the others were horrible.
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 41907385
Mdlinnett,
Thanks for the nice words — I appreciate that!

rrhandle8,
I'm glad to hear that a couple of them worked well for you.

Regards, Joe