Solved

Convert .PDF

Posted on 2016-11-25
Last Modified: 2016-11-30
I need to convert a PDF document that contains tables into Excel or Word, then write a macro to extract the data. The document is 36,000 pages. I have tried extracting a few pages and converting them with some online sites. The results are not quite right, and I don't want to spend days breaking the document into smaller pieces, although I might have to.
Can someone recommend some software to do this?
Question by:rrhandle8
6 Comments
 
LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 41901768
Is there any reason for converting it to Excel or Word? It sounds as if the goal is to extract the data. If that's the case, then there's no reason to convert it to Excel or Word. Instead, convert it to plain text and then write a program/script to extract the data into whatever you want (could be Excel or Word or most anything else).

Here's a 5-minute EE video Micro Tutorial that explains how to download the Xpdf utilities — Xpdf - Command Line Utility for PDF Files - Part 1:
https://www.experts-exchange.com/videos/213/

Here's another 5-minute EE video Micro Tutorial that discusses PDFtoText, the Xpdf tool that converts a PDF file into plain text (Xpdf - Convert PDF Files to Plain Text Files - Part 3):
https://www.experts-exchange.com/videos/217/
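
For a 36,000-page file, one option is to let pdftotext work through the document a page range at a time instead of splitting the PDF by hand. The sketch below is only an illustration in Python: it assumes the Xpdf `pdftotext` executable is on the PATH, and the function names, output file names, and chunk size are all made up for the example. The `-f`, `-l`, and `-layout` options are the first-page, last-page, and keep-layout switches of pdftotext.

```python
import subprocess
from pathlib import Path


def page_ranges(total_pages, chunk_size):
    """Yield inclusive (first, last) page ranges covering 1..total_pages."""
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def pdftotext_command(pdf, txt, first_page, last_page, mode="-layout"):
    """Build the argument list for one pdftotext run over a page range."""
    return ["pdftotext", mode,
            "-f", str(first_page), "-l", str(last_page),
            str(pdf), str(txt)]


def convert_in_chunks(pdf, out_dir, total_pages, chunk_size=1000):
    """Write one text file per page range (assumes pdftotext is installed)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for first, last in page_ranges(total_pages, chunk_size):
        txt = out_dir / f"pages_{first:05d}_{last:05d}.txt"
        # check=True raises CalledProcessError if pdftotext reports a failure
        subprocess.run(pdftotext_command(pdf, txt, first, last), check=True)
```

Calling `convert_in_chunks("big.pdf", "text_out", 36000)` (hypothetical file and folder names) would produce 36 text files of 1,000 pages each. pdftotext may well handle the whole file in a single run, so the chunking is a convenience for testing on a small range first, not a requirement.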

After creating the plain text file from the PDF, extract the data however you want. I would use a powerful programming/scripting language, preferably one with native COM support if you want to create an Excel or Word file. One such language is AutoHotkey, which is discussed in the EE article, AutoHotkey - Getting Started:
https://www.experts-exchange.com/articles/18346/
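
If COM automation is more than you need, a simpler route to Excel is to write the extracted records to a CSV file, which Excel opens directly. This is a different technique from the AutoHotkey/COM approach described above, shown here as a minimal Python sketch; the function names and the field names in the usage note are hypothetical.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Return CSV text for a list of dicts, with a header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def save_for_excel(rows, fieldnames, path):
    """Save the CSV with a UTF-8 byte order mark so Excel detects the encoding."""
    # newline="" stops Python from altering the line endings the csv module wrote
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        handle.write(rows_to_csv(rows, fieldnames))
```

For example, `save_for_excel([{"ID": "1", "Address": "12 Oak St"}], ["ID", "Address"], "out.csv")` writes a two-line file. The csv module quotes any value that contains a comma, so addresses with commas stay in one cell.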

But, of course, use whatever language you prefer. Regards, Joe
 

Author Comment

by:rrhandle8
ID: 41901777
If converted to plain text how would I know what data is what?  If converted to tables in Word, it is easy to write some VBA code to loop through the tables and I always know that column 4 is the Address field.
 
LVL 55

Accepted Solution

by:Joe Winograd, EE MVE 2015&2016 (earned 2000 total points)
ID: 41901807
> If converted to plain text how would I know what data is what?

By column headings and/or the table layout. I would experiment with the -layout and -table options of PDFtoText (https://www.experts-exchange.com/videos/217/).
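
To make the column-heading idea concrete, here is a rough Python sketch of one way to do it. It assumes the layout-preserving output keeps each column starting at the same character position as its heading, and that headings are separated by at least two spaces; real output may need adjusting (for example, when a long value runs into the next column). The sample text and all names are invented for illustration.

```python
import re


def column_starts(header_line):
    """Return [(heading, start_offset), ...]; headings are split on 2+ spaces."""
    return [(m.group(), m.start())
            for m in re.finditer(r"\S+(?: \S+)*", header_line)]


def split_row(line, starts):
    """Cut one text line at the heading offsets and strip the padding."""
    offsets = [start for _, start in starts] + [None]
    return {name: line[offsets[i]:offsets[i + 1]].strip()
            for i, (name, _) in enumerate(starts)}


def parse_table(text):
    """Treat the first non-blank line as the header, the rest as data rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    starts = column_starts(lines[0])
    return [split_row(ln, starts) for ln in lines[1:]]


# Hypothetical sample in the style of layout-preserving output (not real data).
SAMPLE = "\n".join([
    "ID    Name          City       Address",
    "1     Ann Lee       Austin     12 Oak St",
    "2     Bob Ray                  9 Elm Ave",
])

for row in parse_table(SAMPLE):
    print(row["ID"], "->", row["Address"])
```

Slicing by heading offset, rather than splitting each line on whitespace, means an empty cell (the missing City in the second sample row) does not shift the Address into the wrong field. That gives the same "column 4 is always the Address" certainty as looping through Word tables.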

But if you really want to convert to Excel or Word, here are some ideas for you.

For PDF to Word, I've had good (not perfect) results with this free online tool:
http://www.pdftoword.com/

If you prefer a local install, I've also had good (also not perfect) results with this free tool:
http://www.boxoft.com/pdf-to-word/

You may get better results with non-free products. I've gotten better (but still not perfect) results with Nuance's Power PDF (comes in both Standard and Advanced editions):
http://www.nuance.com/for-business/document-imaging-and-scanning/power-pdf-converter/index.htm

There's a free trial for the Advanced edition (but not Standard) so you can see how well it works for you before buying it:
http://www.nuance.com/for-business/imaging-solutions/document-conversion/power-pdf-converter/free-trial/index.htm

Another good (non-free) product is Able2Extract PDF Converter:
http://www.investintech.com/prod_downloadsa2e.htm

It also offers a free trial.

The first PDF-to-Word link above (pdftoword.com) goes to the (free) Nitro cloud service. Nitro is a well-known name in PDF tools, and its Nitro Pro has a PDF to Word feature:
http://www.nitropdf.com/pro/features/convert-export

There's also a free trial for this, but I've never used it, so I can't vouch for its performance. However, it uses the same engine as the online tool, which I have used and found to be very good, so I would expect the same of Nitro Pro.

One more non-free product (but reasonably priced at $39) is CAD-KAS's PDF to Word:
http://www.cadkas.com/downengpdf9.php

I haven't used this product, but I have used their PDF Editor Objects, which is excellent. Based on the quality of PDF Editor Objects, I think that their PDF to Word is worth a try, and there's a free trial:
http://www.cadkas.com/pdf2word!.exe

It probably goes without saying, but Adobe Acrobat can do it — both Standard and Professional (but not Reader). As with everything, results aren't perfect.

I've been on previous threads here at EE where other experts have recommended these three (free) online tools:
http://www.convertpdftoword.org
http://www.pdfonline.com/pdf-to-word-converter
http://www.wondershare.net/pdf-converter/pdf-to-word-converter.html

I can't personally vouch for these, but based on the positive comments from other members, I'm passing them along for your consideration.

No matter which way you go, keep in mind that PDF-to-Word conversion is tricky business – maintaining the formatting/layout is tough stuff! I haven't found anything that is perfect, and results vary from one document to the next. So my suggestion is to put some, or all, of these products on your short list for evaluation. Define a few test docs – your docs! Compare the resulting Word files to see which, if any, of the tools produces Word files that are satisfactory.

For PDF to Excel, I've had good (not perfect) results with this free online tool:
http://www.pdftoexcel.org/

It does a decent job of maintaining the formatting, which is always the trick with any PDF-to-Excel (or PDF-to-Word) conversion. As mentioned above about Word, I don't know if it will work well on your particular PDFs, but it's worth a (free!) shot. If you do like it and would prefer a local install rather than the online tool, it is available for purchase and download (not free, but it has a 7-day free trial):
http://www.investintech.com/prod_downloadsa2e.htm

Another local install (not free, but reasonably priced at $27) is Boxoft PDF to Excel:
http://www.boxoft.com/pdf-to-excel/

Yet another local install worth trying is A-PDF to Excel (also not free, but reasonably priced at $39, and there's a free trial):
http://www.a-pdf.com/to-excel/index.htm

As mentioned in the Word section, Adobe Acrobat can also create Excel files, again with imperfect results.

As a disclaimer, I want to emphasize that I have no affiliation with any of the companies mentioned in my posts and no financial interest in them whatsoever. I am simply a happy user/customer. Regards, Joe
 
LVL 5

Expert Comment

by:Mdlinnett
ID: 41902072
Joe knows his stuff when it comes to PDF conversion. He's helped me in the past, so stick with him.

I had good results with the Nuance tool he suggested above.  The free trial is very useful for testing in advance of a purchase.

Office 2016 does a pretty fine job of converting from PDF too; MS really worked on that functionality for this version. 2013 was pretty garbage.
 

Author Closing Comment

by:rrhandle8
ID: 41902121
Thanks for the help, Joe. I found a couple of the desktop versions that worked well: Able2Extract and Nuance. Some of the others were horrible.
 
LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 41907385
Mdlinnett,
Thanks for the nice words — I appreciate that!

rrhandle8,
I'm glad to hear that a couple of them worked well for you.

Regards, Joe