Solved

Adobe PDF Data to MS Access

Posted on 2013-12-05
2
320 Views
Last Modified: 2013-12-25
This pdf file has two sections I would like to inquire about a solution to place the data into a database.

In section 2 it has product and ingredient info.  Because of this I am thinking this would make three tables in access.  

Table 1. Header info
Table 2. Product Info
Table 3. Product ingredient Info


Please advise with any ideas.  What I am trying to accomplish is importing hundreds of pdf files in this format into a database to do analysis rather than typing them in.

Any ideas will be appreciated.

Thanks
Test-Doc.pdf
0
Comment
Question by:MPDenver
2 Comments
 
LVL 84

Assisted Solution

by:Scott McDaniel (Microsoft Access MVP - EE MVE )
Scott McDaniel (Microsoft Access MVP - EE MVE ) earned 200 total points
ID: 39700618
You can do this manually by opening the PDF in Adobe Reader and Saving as Text, then import that Text file into Access. If you have hundreds of PDFs, however, that would be a time consuming task.

You cannot do this natively in Access VBA, unfortunately. You must use a 3rd party tool, like the Adobe SDK:

http://www.adobe.com/devnet/acrobat.html

There are also some other third party tools that might work:

http://www.foolabs.com/xpdf/download.html

xPDF converts a PDF to Text, so you could loop through all the files in a folder, convert the .PDF to .TXT and then use TransferText to move the data into your database.

You'd first create a Batch file that calls the xPDF utility, and then "run" that batch file for each PDF. The batch file would look like this:

@echo off
pdftotext.exe -layout %1

You'd then loop through the files, and call that batch file (let's name it ConvertPDF.bat):

Dim sFile As String
sFile = Dir("YOur Folder Directory here", "*.PDF")

Do Until Len(sFile) = 0
  ConvertPDF sFile
  Dir
Loop

Open in new window


After running this, you'll have a directory full of .txt files, so do the same basic thing:

Dim sTxt As String
sTxt = Dir("Your Folder Directory, "*.txt")

Do Until len(sTxt) = 0
  DoCmd.TransferText acImport, , "YourTable", sTxt
  Dir
Loop

See here for more info on TransferText: http://msdn.microsoft.com/en-us/library/office/ff835958.aspx

One caveat - xPDF can take a few moments to convert a PDF to Text, so you might want to put a "wait" function in. To do that:

Public Function Wait(NumSeconds As Integer)
  On Error Resume Next
  Dim dStop as Date
  dStop = DateAdd("s", NumSeconds, Now)

  Do While dStop < Now
     DoEvents
  Loop
End Function

Open in new window


You'd then call it like this:

Do Until Len(sFile) = 0
  ConvertPDF sFile
  Wait 5 '/ wait for 5 seconds
  Dir
Loop

Finally, you could combine the xPDF and TransferText, if you'd like, but that's another exercise ...
0
 
LVL 44

Accepted Solution

by:
Karl Heinz Kremer earned 300 total points
ID: 39707458
Do you have Adobe Acrobat? If so, you can automate it via Access VBA. I wrote a few tutorials that illustrate how you would integrate Acrobat via VBA (and JavaScript):

http://khkonsulting.com/2009/03/adobe-acrobat-and-vba-an-introduction/
http://khkonsulting.com/2009/03/acrobat-javascript-and-vb-walk-into-a-bar/
http://khkonsulting.com/2010/09/reading-pdf-form-fields-with-vba/

You would need the Acrobat SDK (which in this case is just documentation), which you can download for free from here: http://www.adobe.com/devnet/acrobat.html
0

Featured Post

Ransomware-A Revenue Bonanza for Service Providers

Ransomware – malware that gets on your customers’ computers, encrypts their data, and extorts a hefty ransom for the decryption keys – is a surging new threat.  The purpose of this eBook is to educate the reader about ransomware attacks.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Update 21-May-2015: I temporarily removed the source code and the code snippets to make major changes to the program. Regards, Joe INTRODUCTION This Article is a follow-up to the Article entitled How To Rename-Move a Batch of PDF Files Based o…
A simple tool to export all objects of two Access files as text and compare it with Meld, a free diff tool.
Learn how to number pages in an Access report over each group. Activate two pass printing by referencing the pages property: Add code to the Page Footers OnFormat event to capture the pages as there occur for each group. Use the pages property to …
We often encounter PDF files that are pure images, that is, they do not have text characters, but instead contain only raster graphics. The most common causes of this are document scanning software and faxing software/services that create image-only…

895 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now