Solved

Adobe PDF Data to MS Access

Posted on 2013-12-05
2
325 Views
Last Modified: 2013-12-25
This pdf file has two sections I would like to inquire about a solution to place the data into a database.

In section 2 it has product and ingredient info.  Because of this I am thinking this would make three tables in access.  

Table 1. Header info
Table 2. Product Info
Table 3. Product ingredient Info


Please advise with any ideas.  What I am trying to accomplish is importing hundreds of pdf files in this format into a database to do analysis rather than typing them in.

Any ideas will be appreciated.

Thanks
Test-Doc.pdf
0
Comment
Question by:MPDenver
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
2 Comments
 
LVL 84

Assisted Solution

by:Scott McDaniel (Microsoft Access MVP - EE MVE )
Scott McDaniel (Microsoft Access MVP - EE MVE ) earned 200 total points
ID: 39700618
You can do this manually by opening the PDF in Adobe Reader and Saving as Text, then import that Text file into Access. If you have hundreds of PDFs, however, that would be a time consuming task.

You cannot do this natively in Access VBA, unfortunately. You must use a 3rd party tool, like the Adobe SDK:

http://www.adobe.com/devnet/acrobat.html

There are also some other third party tools that might work:

http://www.foolabs.com/xpdf/download.html

xPDF converts a PDF to Text, so you could loop through all the files in a folder, convert the .PDF to .TXT and then use TransferText to move the data into your database.

You'd first create a Batch file that calls the xPDF utility, and then "run" that batch file for each PDF. The batch file would look like this:

@echo off
pdftotext.exe -layout %1

You'd then loop through the files, and call that batch file (let's name it ConvertPDF.bat):

Dim sFile As String
sFile = Dir("YOur Folder Directory here", "*.PDF")

Do Until Len(sFile) = 0
  ConvertPDF sFile
  Dir
Loop

Open in new window


After running this, you'll have a directory full of .txt files, so do the same basic thing:

Dim sTxt As String
sTxt = Dir("Your Folder Directory, "*.txt")

Do Until len(sTxt) = 0
  DoCmd.TransferText acImport, , "YourTable", sTxt
  Dir
Loop

See here for more info on TransferText: http://msdn.microsoft.com/en-us/library/office/ff835958.aspx

One caveat - xPDF can take a few moments to convert a PDF to Text, so you might want to put a "wait" function in. To do that:

Public Function Wait(NumSeconds As Integer)
  On Error Resume Next
  Dim dStop as Date
  dStop = DateAdd("s", NumSeconds, Now)

  Do While dStop < Now
     DoEvents
  Loop
End Function

Open in new window


You'd then call it like this:

Do Until Len(sFile) = 0
  ConvertPDF sFile
  Wait 5 '/ wait for 5 seconds
  Dir
Loop

Finally, you could combine the xPDF and TransferText, if you'd like, but that's another exercise ...
0
 
LVL 44

Accepted Solution

by:
Karl Heinz Kremer earned 300 total points
ID: 39707458
Do you have Adobe Acrobat? If so, you can automate it via Access VBA. I wrote a few tutorials that illustrate how you would integrate Acrobat via VBA (and JavaScript):

http://khkonsulting.com/2009/03/adobe-acrobat-and-vba-an-introduction/
http://khkonsulting.com/2009/03/acrobat-javascript-and-vb-walk-into-a-bar/
http://khkonsulting.com/2010/09/reading-pdf-form-fields-with-vba/

You would need the Acrobat SDK (which in this case is just documentation), which you can download for free from here: http://www.adobe.com/devnet/acrobat.html
0

Featured Post

Three Reasons Why Backup is Strategic

Backup is strategic to your business because your data is strategic to your business. Without backup, your business will fail. This white paper explains why it is vital for you to design and immediately execute a backup strategy to protect 100 percent of your data.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Phishing attempts can come in all forms, shapes and sizes. No matter how familiar you think you are with them, always remember to take extra precaution when opening an email with attachments or links.
In Part II of this series, I will discuss how to identify all open instances of Excel and enumerate the workbooks, spreadsheets, and named ranges within each of those instances.
In Microsoft Access, learn different ways of passing a string value within a string argument. Also learn what a “Type Mis-match” error is about.
In a recent question (https://www.experts-exchange.com/questions/28997919/Pagination-in-Adobe-Acrobat.html) here at Experts Exchange, a member asked how to add page numbers to a PDF file using Adobe Acrobat XI Pro. This short video Micro Tutorial sh…

738 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question