VB6 code to programmatically convert pdf to excel

Looking for VB6 code to programmatically convert PDF to Excel.
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

KimputerIT ManagerCommented:
To convert PDF to Excel, you need EXTREME knowledge of PDF (and Excel for that matter). To have this knowledge, means you can earn money with it. Marketleader seems to be Nuance (for decades).
If they make money on it, it probably means you will need about 2000 man hours of VB6 coding to finish a successful PDF to Excel conversion. I don't think that's really worth it, and therefore I suggest you just buy the product (it's 100 bucks).
There are many free online and offline solutions, most are ad supported or adware, so try at your own risk.
fbroccoliAuthor Commented:
Thank you Kimputer for your prompt response.  I will pursue the product you suggest if I cannot get this to work (below).  I started playing with this after submitting my question:

Public acroavdoc As Acrobat.CAcroAVDoc
Public acroPDDoc As Acrobat.CAcroPDDoc

Set acroavdoc = CreateObject("AcroExch.AVDoc")
acroavdoc.Open("c:\test.pdf", "Accessing PDF's")
Set acroPDDoc = acroavdoc.GetPDDoc()

The parms for acropddoc.Save are (nType as integer, sFullPath as string)
The above code snippet will save to pdf so I'm thinking/hoping that if I can find the correct nType enumeration for an Excel file type, then it will create an Excel file.  And then, finding all the enumerations, I will also be able to save to MSWord, HTML, XML....etc...

Thank you again for your response,
KimputerIT ManagerCommented:
Acrobat only exposes the most simple routines to the user. The advanced coding, will never be made public (it's how they earn their money).
I've seen workarounds, but that involves the VB6 code opening Acrobat, and using SendKeys. Not a very nice solution.
Ensure Business Longevity with As-A-Service

Using the as-a-service approach for your business model allows you to grow your revenue stream with new practice areas, without forcing you to part ways with existing clients just because they don’t fit the mold of your new service offerings.

fbroccoliAuthor Commented:
I looked into Power PDF Converter, but it appears it requires user intervention.  For what I need, I can just have the user click on Save As=>Excel.  I do appreciate your help and sorry I said "prompt response", I meant "quick response".  I rarely use Experts Exchange, but have never received a response as quickly as I did yours.  I used to waiting much longer.  Thanks again.
KimputerIT ManagerCommented:
Most software like these don't expose API to the public (especially if a lot of settings need to be adjusted in GUI environment). I suggest you try again with Power PDF Converter, IN COMBINATION with AutoHotKey (since the windows/buttons/screens will probably quite predictable, it means you can "click" quickly).
Depending on the speed of the PC, you might not even see the window open/close and flashing
Joe Winograd, Fellow 2017DeveloperCommented:
> but it appears it requires user intervention

It does if you use its GUI, but not if you use its command line interface (CLI). Here's an EE article that shows how to use the CLI of Power PDF Advanced:
Batch Conversion of PDF, TIFF, and Other Image Formats via Command Line Interface to PDF, PDF Searchable, and TIFF

Following up on kimputer's excellent comments, PDF-to-Excel is tough stuff! I've had excellent results with this free online tool:

It does a good (but not perfect) job of maintaining the formatting, which is always the trick with any PDF-to-Excel (or PDF-to-Word) conversion. Another nice feature of this tool is that it performs OCR if the file is an image-only PDF, thereby automatically creating the text. Of course, you want local (VB6) code, so an online tool won't do it for you, but there is a local install of it, available for purchase and download (not free, but it has a 7-day free trial):

Another local install (not free, but reasonably priced at $27) is Boxoft PDF to Excel:

Yet another local install worth trying is A-PDF to Excel (also not free, but reasonably priced at $39, and there's a free trial):

Both Boxoft PDF to Excel and A-PDF to Excel require that the PDF have text (not just an image), e.g., if your docs are scanned images, the PDFs need to have been processed with OCR software. You may do that with any OCR tool that you have. Then the A-PDF and Boxoft products will be able to process the text generated by OCR and attempt to create a properly formatted Excel spreadsheet. If you don't have an OCR tool, Boxoft has a free one:

And A-PDF has a reasonably priced one ($27):

And here's a 5-minute EE video Micro Tutorial that shows another free product to perform OCR on PDFs:
How to OCR pages in a PDF with free software

Here's another approach instead of getting two products (one for OCR and one for conversion to Excel). Take a look at Nuance's PaperPort. Nuance, of course, is the same company that makes Power PDF. The standard edition of PaperPort (i.e., non-Professional) can do what you want. Although the list price is $99, the street price is usually less than half that. Right now, the latest version (14) is just $25 at Amazon:

It has been built-in OCR and a built-in feature to convert PDF docs to Excel. It can even scan directly to an Excel spreadsheet, as explained in these 5-minute EE video Micro Tutorials:
How to create custom scanning profiles in PaperPort - Part 1
How to create custom scanning profiles in PaperPort - Part 2

The Amazon purchase is for PaperPort 14.0, but this EE article explains how to upgrade for free to the latest point release, 14.5:
PaperPort 14 - Free Upgrade to Version 14.5

And this article shows how to apply the latest patch update (Patch 1):
How to install the Patch 1 update for PaperPort 14.5

Of course, if your PDF already has text (i.e., is not just a raster image/bitmap/graphic), then ignore all of my recommendations with respect to OCR.

As a disclaimer, I want to emphasize that I have no affiliation with any of the companies mentioned in this post and no financial interest in them whatsoever. I am simply a happy user/customer. Regards, Joe

Update: When I submitted this post, I see that kimputer commented on AutoHotkey. If you'd like to learn more about that, this EE article should help:
AutoHotkey - Getting Started

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
fbroccoliAuthor Commented:
Thank you Kimputer and Mr. Joe Winograd,

I have parsed many a pdf in the past but these new ones appear to have been scanned (or at least contain no text I can extract).   So, which software, or combination of software, would you suggest I purchase to convert these pdf's to Excel using a CLI with zero end user intervention?  

Thanks again for all your time.
Joe Winograd, Fellow 2017DeveloperCommented:
If they contain no text, then they're surely image-only PDFs, containing only raster graphics/images, so the method you use will have to perform OCR in order for conversion to Excel to be meaningful.

If you have many files with varying layouts, I don't think you'll get anything to work well — it's going to be error-prone. But if all the files have the same exact layout (only the data values are different), then you may be able to get an automated method to be reasonably effective in creating an Excel spreadsheet. But it's still not going to be 100% accurate, and for two reasons — OCR errors and PDF-to-Excel conversion errors.

So, first question: do all the files have the same exact layout or are the layouts different? Can you post a sample file (being careful not to include private/sensitive info)? Regards, Joe
fbroccoliAuthor Commented:
They all have the same layout, however, they do contain both private and sensitive information.
Joe Winograd, Fellow 2017DeveloperCommented:
Since it's a raster image, it should be easy to redact the private/sensitive info with an imaging tool. I'd like to see the table structure — i.e., what will become the columns and rows of the Excel spreadsheet.
fbroccoliAuthor Commented:
Well, I'm not sure if this will be of any help.  With the few tools I have, I captured the screen, blacked out the sensitive info, converted back to pdf.  These two attachments actually appear on one page of the original pdf that I wish I could send to you.
Joe Winograd, Fellow 2017DeveloperCommented:
> appear on one page of the original pdf that I wish I could send to you

Attached are the two images on one page. Also, I OCRed the PDF, which was just a raster image, as we suspected. It now has text from the OCR process in addition to the raster image.
fbroccoliAuthor Commented:
That is great!  So what/which software do you recommend I purchase to either convert to pdf where I can parse text or convert to Excel using CLI?   Thank you so much!
fbroccoliAuthor Commented:
Sorry, I should have included this in my last comment.  I'm kinda leaning toward ocr then conversion to Excel since these documents are apparently being scanned.  Grabbing the text via position may not be as consistent as reading cells from Excel.
Joe Winograd, Fellow 2017DeveloperCommented:
Yes, they're probably being scanned.

I really don't know what is going to work best for your particular docs. I don't even know if anything will work well. You should experiment with the software I mentioned earlier.

Here's a good first test. OCR the docs with the free software described in my 5-minute EE video Micro Tutorial:
How to OCR pages in a PDF with free software

You won't be able to call this with your VB6 code, but we'll worry about that later. For now, we just need the docs to have text in them via OCR. Then use the A-PDF To Excel product for the conversion (free trial available), which does have a command line interface that you could call from VB6 code. I emphasize that I have no idea how well it will do on your docs. Regards, Joe
fbroccoliAuthor Commented:
Understood and thanks Joe.  The OCR'd doc you sent me did not do a great job, however, I did have to screen capture the pdf, black out the sensitive information, save as image and convert back to pdf for the test.  So, I'm not giving up hope.  Worst case, I have the users open the pdf, click Save as, and export to text, or Excel or Word, and then run my program.
Joe Winograd, Fellow 2017DeveloperCommented:
> Worst case, I have the users open the pdf, click Save as, and export to text, or Excel or Word, and then run my program.

No need for that! If the PDF has text, then you can easily grab it in your VB6 code by calling one of the (free!) command line Xpdf utilities. Here's a 5-minute EE video Micro Tutorial that shows how to download and install them:
Xpdf - Command Line Utility for PDF Files

And here's another one that shows specifically how to extract the text:
Xpdf - Convert PDF Files to Plain Text Files

Before calling this from your VB6 code, experiment with the five different settings for the output mode:


Regards, Joe
fbroccoliAuthor Commented:
Joe, I retract my last statement that the ocr did not do a great job.  I'm an idiot.  I forgot that I had to black out the sensitive information in the document I sent you.  My bad.
Joe Winograd, Fellow 2017DeveloperCommented:
No problem, but I was surprised by the comment because the OCR looked reasonably accurate to me (not perfect, of course). Thanks for the correction. :)
fbroccoliAuthor Commented:
Thank you Joe for all your help and time!
Joe Winograd, Fellow 2017DeveloperCommented:
You're very welcome. Good luck on the project! Regards, Joe
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Visual Basic Classic

From novice to tech pro — start learning today.