Get Text from an Acrobat PDF file into MS-Access database using IAC or DOM interface

I need to parse a PDF file and get the data into MS-Access.  Ideally the user should be able to highlight some text or a table in the PDF, make entries in an MS-Access form to say what is to be done, then push a button and have it happen.  How do I connect the selection in the PDF with an AcroPDTextSelect object in the VBA?  The only way seems to be to set the selection from the VBA, but I can't figure out what to put in the AcroRect object to use AcroExch.PDDoc.CreateTextSelect.  When I try to build an AcroHiliteList object to use AcroPDPage.CreateWordHilite, it bombs on the .Add method.

The other alternative I have looked at is using the Accessibility (AcrobatAccess.lib) DOM interface, but I haven't found out how to attach the root IPDDomElement to a file.

Here is my test code.

***********Begin code
Option Compare Database
Option Explicit

' Copyright (c) 2006, Lyle Anderson  All rights reserved.
Dim appAcrobat As AcroApp
Dim strPDFfilePath As String
Dim pdfForEdit As AcroAVDoc
Dim avpPageForEdit As AcroAVPageView
Dim pddDocumentForEdit As AcroPDDoc
Dim avdDocumentForEdit As AcroAVDoc
Dim pdpPageForEdit As AcroPDPage
Dim pdtTextSelection As AcroPDTextSelect
Dim bolAcrobatForEditRunning As Boolean
Dim lngCurrentPageNum As Long
Dim recCurrentRectangle As AcroRect
Dim objHiliteList As AcroHiliteList

Public Sub readPDFdocument(varPath As Variant)
    Dim strPath As String
    If IsBlank(varPath) Then
        strPath = getPDFfilePath()
    Else
        strPath = setPDFfilePath(varPath)
    End If
    If appAcrobat Is Nothing Then
        Set appAcrobat = New Acrobat.AcroApp
    End If
    appAcrobat.Show  ' This works fine
    If pdfForEdit Is Nothing Then
        Set pdfForEdit = New AcroAVDoc
    End If
    If pdfForEdit.Open(strPath, "Temp Title") Then
        Set avpPageForEdit = pdfForEdit.GetAVPageView
        avpPageForEdit.GoTo (89) ' Page 90 shows on the screen OK
    End If
End Sub

Public Function ProcessPDFpage() As String
    Dim strFileName As String
    Dim strActiveTool As String
    Dim lngNumAVDocs As Long
    Dim intFirst As Integer
    Dim intCount As Integer
    lngCurrentPageNum = avpPageForEdit.GetPageNum
    Set pdpPageForEdit = avpPageForEdit.GetPage
    Set pddDocumentForEdit = avpPageForEdit.GetDoc
    Set recCurrentRectangle = New AcroRect
    Set recCurrentRectangle = appAcrobat.GetFrame  ' Gives reasonable values for frame rectangle
    strActiveTool = appAcrobat.GetActiveTool  ' Tells me the Hand is active
    recCurrentRectangle.bottom = 100
    recCurrentRectangle.Top = 20
    recCurrentRectangle.Left = 62
    recCurrentRectangle.Right = 250
    strFileName = pddDocumentForEdit.getFileName  ' Gets the file name.
    lngNumAVDocs = appAcrobat.GetNumAVDocs   ' works
    If pdfForEdit.FindText("Name / Role Name", 0, 0, 0) Then
        pdfForEdit.ShowTextSelect ' Finds and hilites the text just fine.
    End If
    Set objHiliteList = New AcroHiliteList
    intFirst = 0
    intCount = 5
    objHiliteList.Add intFirst, intCount  ' Bombs on an unhandled exception
    Set pdtTextSelection = pdpPageForEdit.CreateWordHilite(objHiliteList)
    If Not (pdtTextSelection Is Nothing) Then
        If pdfForEdit.SetTextSelection(pdtTextSelection) Then
            pdfForEdit.ShowTextSelect
        End If
    End If
End Function
************End code

I know how to do this with XML, Excel, Word, and PowerPoint, but it would be really helpful to be able to process PDF files without converting them to Word.  As a backup I can use the convert table to Excel feature in Adobe Acrobat, but I really want this to be more integrated than that.
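
For what it's worth, here is a minimal sketch of a narrower fallback: pulling all of the text from one page through the IAC, rather than reading what the user has highlighted on screen. It assumes a full Acrobat install (as far as I know the free Reader does not expose these objects) and uses late binding via CreateObject instead of the New syntax in the test code above. GetPageText and its parameters are hypothetical names, and 32767 is simply the largest count that fits the Short argument of Add, not a documented constant. Treat it as a starting point under those assumptions, not a verified solution.

***********Begin code
' Sketch only: late-bound, so no reference to the Acrobat type library is needed.
' GetPageText, strPath and lngPage are illustrative names, not part of the IAC.
Public Function GetPageText(ByVal strPath As String, ByVal lngPage As Long) As String
    Dim pdDoc As Object        ' AcroExch.PDDoc
    Dim pdPage As Object       ' AcroExch.PDPage
    Dim hiliteList As Object   ' AcroExch.HiliteList
    Dim textSelect As Object   ' AcroExch.PDTextSelect
    Dim i As Long
    Dim strText As String

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If Not pdDoc.Open(strPath) Then Exit Function   ' returns "" if the file cannot be opened

    Set pdPage = pdDoc.AcquirePage(lngPage)         ' page index is zero-based
    Set hiliteList = CreateObject("AcroExch.HiliteList")
    hiliteList.Add 0, 32767                         ' start at offset 0, take as much as a Short allows

    ' Build a text selection covering the hilite list, then read it piece by piece.
    Set textSelect = pdPage.CreatePageHilite(hiliteList)
    If Not textSelect Is Nothing Then
        For i = 0 To textSelect.GetNumText - 1
            strText = strText & textSelect.GetText(i)
        Next i
        textSelect.Destroy
    End If

    pdDoc.Close
    GetPageText = strText
End Function
************End code

Calling GetPageText(getPDFfilePath(), 89) would target the same page 90 used in the test code. The returned string is one unstructured run of text, so any table layout would still have to be rebuilt in Access.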

Asked by: K3LJX
 
dannywareham commented:
I've not pulled data from pdf before.
However, I think I know a man that has:

http://www.experts-exchange.com/Databases/MS_Access/Q_20822119.html

:-)

Danny
 
K3LJX (author) commented:
It appears that there is no way to do what I want short of buying PDF Library.
 
dannywareham commented:
Sorry chap
:-(

Question has a verified solution.
