Solved

Get Text from an Acrobat PDF file into MS-Access database using IAC or DOM interface

Posted on 2006-06-17
Last Modified: 2012-08-14
I need to be able to parse a PDF file and get the data into MS-Access. Ideally the user should be able to highlight some text or a table in the PDF, make entries in an MS-Access form to say what is to be done, then push a button and have it happen. How do I connect the selection in the PDF with an AcroPDTextSelect object in the VBA? The only way seems to be to set the selection from the VBA, but I can't figure out what to put in the AcroRect object to use AcroExch.PDDoc.CreateTextSelect. When I try to build an AcroHiliteList object to use AcroPDPage.CreateWordHilite, it bombs on the .Add method.

The other alternative I have looked at is using the Accessibility (AcrobatAccess.lib) DOM interface, but I haven't found out how to attach the root IPDDomElement to a file.

Here is my test code.

***********Begin code
Option Compare Database
Option Explicit

' Copyright (c) 2006, Lyle Anderson  All rights reserved.
Dim appAcrobat As AcroApp
Dim strPDFfilePath As String
Dim pdfForEdit As AcroAVDoc
Dim avpPageForEdit As AcroAVPageView
Dim pddDocumentForEdit As AcroPDDoc
Dim avdDocumentForEdit As AcroAVDoc
Dim pdpPageForEdit As AcroPDPage
Dim pdtTextSelection As AcroPDTextSelect
Dim bolAcrobatForEditRunning As Boolean
Dim lngCurrentPageNum As Long
Dim recCurrentRectangle As AcroRect
Dim objHiliteList As AcroHiliteList
Public Sub readPDFdocument(varPath As Variant)
    Dim strPath As String
    If IsBlank(varPath) Then
        strPath = getPDFfilePath()
    Else
        strPath = setPDFfilePath(varPath)
    End If
    If appAcrobat Is Nothing Then
        Set appAcrobat = New Acrobat.AcroApp
    End If
    appAcrobat.Show  ' This works fine
    If pdfForEdit Is Nothing Then
        Set pdfForEdit = New AcroAVDoc
    End If
    If pdfForEdit.Open(strPath, "Temp Title") Then
        Set avpPageForEdit = pdfForEdit.GetAVPageView
        avpPageForEdit.GoTo (89) ' Page 90 shows on the screen OK
    End If
End Sub

Public Function ProcessPDFpage() As String
    Dim strFileName As String
    Dim strActiveTool As String
    Dim lngNumAVDocs As Long
    Dim intFirst As Integer
    Dim intCount As Integer
    lngCurrentPageNum = avpPageForEdit.GetPageNum
    Set pdpPageForEdit = avpPageForEdit.GetPage
    Set pddDocumentForEdit = avpPageForEdit.GetDoc
    Set recCurrentRectangle = New AcroRect
    Set recCurrentRectangle = appAcrobat.GetFrame  ' Gives reasonable values for frame rectangle
    strActiveTool = appAcrobat.GetActiveTool  ' Tells me the Hand is active
    recCurrentRectangle.bottom = 100
    recCurrentRectangle.Top = 20
    recCurrentRectangle.Left = 62
    recCurrentRectangle.Right = 250
    strFileName = pddDocumentForEdit.getFileName  ' Gets the file name.
    lngNumAVDocs = appAcrobat.GetNumAVDocs   ' works
    If pdfForEdit.FindText("Name / Role Name", 0, 0, 0) Then
        pdfForEdit.ShowTextSelect ' Finds and hilites the text just fine.
    End If
    Set objHiliteList = New AcroHiliteList
    intFirst = 0
    intCount = 5
    objHiliteList.Add intFirst, intCount  ' Bombs on an unhandled exception
    Set pdtTextSelection = pdpPageForEdit.CreateWordHilite(objHiliteList)
    If Not (pdtTextSelection Is Nothing) Then
        If pdfForEdit.SetTextSelection(pdtTextSelection) Then
            pdfForEdit.ShowTextSelect
        End If
    End If
End Function
************End code

I know how to do this with XML, Excel, Word, and PowerPoint, but it would be really helpful to be able to process PDF files without converting them to Word.  As a backup I can use the convert table to Excel feature in Adobe Acrobat, but I really want this to be more integrated than that.
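
For anyone comparing notes, here is a minimal, untested sketch of the two IAC routes described above: a word highlight list for a whole page, and a rectangle passed to CreateTextSelect. It creates every object late-bound with CreateObject and the AcroExch ProgIDs instead of New; whether that avoids the exception on .Add is an assumption, not something confirmed in this thread. The function names GetPageText, GetRegionText and JoinSelection are hypothetical, the count 32767 is just the largest Integer used as an "everything" upper bound, and the rectangle is assumed to be in PDF user-space points with the origin at the bottom-left of the page (so Top is larger than bottom). A full Acrobat installation is assumed, since Reader does not expose this part of the interface.

```vba
Option Explicit

' Hypothetical helper: concatenates the pieces of a PDTextSelect, then releases it.
Private Function JoinSelection(ByVal textSel As Object) As String
    Dim i As Long
    If textSel Is Nothing Then Exit Function
    For i = 0 To textSel.GetNumText - 1
        JoinSelection = JoinSelection & textSel.GetText(i)
    Next i
    textSel.Destroy
End Function

' Hypothetical helper: text of one whole page (page numbers are 0-based).
Public Function GetPageText(ByVal strPath As String, ByVal lngPage As Long) As String
    Dim pdDoc As Object
    Dim pdPage As Object
    Dim hiliteList As Object
    Dim textSel As Object

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If Not pdDoc.Open(strPath) Then Exit Function   ' returns "" if the file will not open

    Set pdPage = pdDoc.AcquirePage(lngPage)
    Set hiliteList = CreateObject("AcroExch.HiliteList")
    hiliteList.Add 0, 32767                         ' start at word 0, take as many words as exist
    Set textSel = pdPage.CreateWordHilite(hiliteList)

    GetPageText = JoinSelection(textSel)
    pdDoc.Close
End Function

' Hypothetical helper: text inside a rectangle on one page.
' Coordinates are ASSUMED to be user-space points, origin at bottom-left.
Public Function GetRegionText(ByVal strPath As String, ByVal lngPage As Long, _
        ByVal intLeft As Integer, ByVal intBottom As Integer, _
        ByVal intRight As Integer, ByVal intTop As Integer) As String
    Dim pdDoc As Object
    Dim acroRect As Object
    Dim textSel As Object

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If Not pdDoc.Open(strPath) Then Exit Function

    Set acroRect = CreateObject("AcroExch.Rect")
    acroRect.Left = intLeft
    acroRect.bottom = intBottom
    acroRect.Right = intRight
    acroRect.Top = intTop
    Set textSel = pdDoc.CreateTextSelect(lngPage, acroRect)

    GetRegionText = JoinSelection(textSel)
    pdDoc.Close
End Function
```

From the Immediate window this could be tried with something like `Debug.Print GetPageText("C:\docs\sample.pdf", 89)`, where the path is a placeholder. Neither function picks up a selection the user made by hand in the viewer, which is the part of the question that remains open.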
Question by:K3LJX
Accepted Solution

by: dannywareham (earned 500 total points)
ID: 16929031
I've not pulled data from a PDF before.
However, I think I know a man who has:

http://www.experts-exchange.com/Databases/MS_Access/Q_20822119.html

:-)

Danny

Author Comment

by:K3LJX
ID: 16942365
It appears that there is no way to do what I want short of buying PDF Library.

Expert Comment

by:dannywareham
ID: 16942500
Sorry chap
:-(
