
  • Status: Solved
  • Priority: Medium
  • Security: Public

Convert .PDF

I need to convert a .PDF document that contains tables into Excel or Word, then write a macro to extract the data. The document is 36,000 pages. I have tried extracting a few pages and converting them with some online sites, but the results are not exactly right. I don't want to spend days breaking the document into smaller pieces, although I might have to.
Can someone recommend some software to do this?
Asked by: rrhandle8
 
Joe Winograd, EE MVE 2015&2016, Developer, commented:
Is there any reason for converting it to Excel or Word? It sounds as if the goal is to extract the data. If that's the case, then there's no reason to convert it to Excel or Word. Instead, convert it to plain text and then write a program/script to extract the data into whatever you want (could be Excel or Word or most anything else).

Here's a 5-minute EE video Micro Tutorial that explains how to download the Xpdf utilities, "Xpdf - Command Line Utility for PDF Files - Part 1":
https://www.experts-exchange.com/videos/213/

Here's another 5-minute EE video Micro Tutorial that discusses PDFtoText, the Xpdf tool that can convert a PDF file into plain text, "Xpdf - Convert PDF Files to Plain Text Files - Part 3":
https://www.experts-exchange.com/videos/217/

After creating the plain text file from the PDF, extract the data however you want. I would use a powerful programming/scripting language, and if you want to create an Excel or Word file, one with native COM support, such as AutoHotkey, which is discussed in the EE article "AutoHotkey - Getting Started":
https://www.experts-exchange.com/articles/18346/

But, of course, use whatever language you prefer. Regards, Joe
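A minimal Python sketch of the plain-text route, assuming Xpdf's pdftotext is on the PATH and that, in the -layout output, table columns are separated by two or more spaces while cell values contain only single spaces. The function and file names are hypothetical; only the pdftotext flags (-layout, -enc) come from the tool itself. The rows go to CSV, which Excel opens directly, so no COM automation is needed for this step.

```python
import csv
import re
import subprocess

# Assumption: two or more consecutive spaces mark a column gap in the -layout output,
# while values themselves contain at most single spaces (e.g. "1 Main St").
COLUMN_GAP = re.compile(r"\s{2,}")


def pdf_to_text(pdf_path: str, txt_path: str) -> None:
    """Convert a PDF to plain text with Xpdf's pdftotext (assumed to be on PATH).

    -layout keeps the physical column alignment; -enc UTF-8 sets the output encoding.
    """
    subprocess.run(["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, txt_path], check=True)


def text_to_rows(text: str) -> list[list[str]]:
    """Split each non-blank line into cells at runs of two or more spaces."""
    rows = []
    # str.splitlines() also breaks on the form feed (\f) pdftotext writes between pages.
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(COLUMN_GAP.split(line))
    return rows


def write_csv(rows: list[list[str]], csv_path: str) -> None:
    """Write the rows to a CSV file, which Excel can open directly."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

Usage: call pdf_to_text("report.pdf", "report.txt"), read report.txt, pass its contents to text_to_rows, and hand the result to write_csv. With a 36,000-page file it's worth trying this on a small extract first: an empty cell turns into one wider gap with this split, so the values after it shift one column to the left.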
 
rrhandle8 (Author) commented:
If it's converted to plain text, how would I know which data is which? If it's converted to tables in Word, it's easy to write some VBA code to loop through the tables, and I always know that column 4 is the Address field.
 
Joe Winograd, EE MVE 2015&2016, Developer, commented:
> If converted to plain text how would I know what data is what?

By column headings and/or the table layout. I would experiment with the -layout and -table options of PDFtoText (https://www.experts-exchange.com/videos/217/).
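To make "by column headings" concrete, here is a hedged Python sketch that reads -layout text, finds where each known heading starts on the header line, and cuts every data line at those offsets. It assumes that (1) every heading appears verbatim on a single header line, which may repeat at the top of each page; (2) cells are left-aligned under their headings; and (3) no cell wraps onto a second line. The heading and function names are illustrative and are not part of pdftotext. Because it slices by position, an empty cell stays empty instead of shifting the following values.

```python
def column_starts(header: str, names: list[str]) -> list[int]:
    """Return the character offset where each heading begins in the header line."""
    starts = []
    pos = 0
    for name in names:
        idx = header.index(name, pos)  # raises ValueError if a heading is missing
        starts.append(idx)
        pos = idx + len(name)
    return starts


def slice_row(line: str, starts: list[int]) -> list[str]:
    """Cut a fixed-width line at the column offsets; a blank slice becomes an empty cell."""
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    return [line[s:e].strip() for s, e in zip(starts, ends)]


def parse_table(text: str, names: list[str]) -> list[dict[str, str]]:
    """Turn -layout text into one dict per data row, keyed by column heading."""
    records = []
    starts = None
    for line in text.splitlines():  # also splits on the \f between pages
        if all(name in line for name in names):
            # A header line (possibly repeated per page): recompute offsets in case they shift.
            starts = column_starts(line, names)
            continue
        if starts is None or not line.strip():
            continue  # skip text before the first header, and blank lines
        records.append(dict(zip(names, slice_row(line, starts))))
    return records
```

Where the Word approach relies on "column 4 is the Address field", this one uses a key such as record["Address"]. A page whose columns sit at slightly different offsets still parses correctly, as long as that page has its own header line.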

But if you really want to convert to Excel or Word, here are some ideas for you.

For PDF to Word, I've had good (not perfect) results with this free online tool:
http://www.pdftoword.com/

If you prefer a local install, I've also had good (also not perfect) results with this free tool:
http://www.boxoft.com/pdf-to-word/

You may get better results with non-free products. I've gotten better (but still not perfect) results with Nuance's Power PDF (comes in both Standard and Advanced editions):
http://www.nuance.com/for-business/document-imaging-and-scanning/power-pdf-converter/index.htm

There's a free trial for the Advanced edition (but not Standard) so you can see how well it works for you before buying it:
http://www.nuance.com/for-business/imaging-solutions/document-conversion/power-pdf-converter/free-trial/index.htm

Another good (non-free) product is Able2Extract PDF Converter:
http://www.investintech.com/prod_downloadsa2e.htm

It also offers a free trial.

The first link in this post is to the (free) Nitro cloud. Nitro is a well-known name in PDF tools and their Nitro Pro has a PDF to Word feature:
http://www.nitropdf.com/pro/features/convert-export

There's also a free trial for this, but I've never used it, so I can't vouch for its performance. However, it uses the same engine as the online tool, which I have used and is very good, so I would expect the same of Nitro Pro.

One more non-free product (but reasonably priced at $39) is CAD-KAS's PDF to Word:
http://www.cadkas.com/downengpdf9.php

I haven't used this product, but I have used their PDF Editor Objects, which is excellent. Based on the quality of PDF Editor Objects, I think that their PDF to Word is worth a try, and there's a free trial:
http://www.cadkas.com/pdf2word!.exe

It probably goes without saying, but Adobe Acrobat can do it, in both Standard and Professional (but not Reader). As with everything, the results aren't perfect.

I've been on previous threads here at EE where other experts have recommended these three (free) online tools:
http://www.convertpdftoword.org
http://www.pdfonline.com/pdf-to-word-converter
http://www.wondershare.net/pdf-converter/pdf-to-word-converter.html

I can't personally vouch for these, but based on the positive comments from other members, I'm passing them along for your consideration.

No matter which way you go, keep in mind that PDF-to-Word conversion is tricky business – maintaining the formatting/layout is tough stuff! I haven't found anything that is perfect, and results vary from one document to the next. So my suggestion is to put some, or all, of these products on your short list for evaluation. Define a few test docs – your docs! Compare the resulting Word files to see which, if any, of the tools produces Word files that are satisfactory.

For PDF to Excel, I've had good (not perfect) results with this free online tool:
http://www.pdftoexcel.org/

It does a decent job of maintaining the formatting, which is always the trick with any PDF-to-Excel (or PDF-to-Word) conversion. As mentioned above about Word, I don't know if it will work well on your particular PDFs, but it's worth a (free!) shot. If you do like it and would prefer a local install rather than the online tool, it is available for purchase and download (not free, but it has a 7-day free trial):
http://www.investintech.com/prod_downloadsa2e.htm

Another local install (not free, but reasonably priced at $27) is Boxoft PDF to Excel:
http://www.boxoft.com/pdf-to-excel/

Yet another local install worth trying is A-PDF to Excel (also not free, but reasonably priced at $39, and there's a free trial):
http://www.a-pdf.com/to-excel/index.htm

As mentioned in the Word section, Adobe Acrobat can also create Excel files, again with imperfect results.

As a disclaimer, I want to emphasize that I have no affiliation with any of the companies mentioned in my posts and no financial interest in them whatsoever. I am simply a happy user/customer. Regards, Joe

 
Mdlinnett commented:
Joe knows his stuff when it comes to PDF conversion; he's helped me in the past, so stick with him.

I had good results with the Nuance tool he suggested above.  The free trial is very useful for testing in advance of a purchase.

Office 2016 also does a pretty good job of converting from PDF; MS really worked on that functionality for this version. 2013 was pretty garbage.
 
rrhandle8 (Author) commented:
Thanks for the help, Joe. I found a couple of the desktop versions that worked well: Able2Extract and Nuance. Some of the others were horrible.
 
Joe Winograd, EE MVE 2015&2016, Developer, commented:
Mdlinnett,
Thanks for the nice words — I appreciate that!

rrhandle8,
I'm glad to hear that a couple of them worked well for you.

Regards, Joe
