Solved

Get Text from an Acrobat PDF file into MS-Access database using IAC or DOM interface

Posted on 2006-06-17
3
4,840 Views
Last Modified: 2012-08-14
I need to be able to parse a PDF file and get the data into MS-Access.  Ideally the user should be able to hilite some text or a table in the PDF, and make entries in an MS-Access form to say what is to be done, then push a button and have it happen.  How do I connect the connect the selection in the PDF with an AcroPDTextSelect object in the VBA?  The only way seems to be to set the selection from the VBA, but I can't figure out what to put in the AcroRect object to use AcroExch.PDDoc.CreateTextSelect.  When I try to build an AcroHiliteList object to use AcroPDPage.CreateWordHilite, it bombs on the .Add method.

The other alternative I have looked at is using the Accessibility (AcrobatAccess.lib) DOM interface, but I haven't found out how to attach the root IPDDomElement to a file.

Here is my test code.

***********Begin code
Option Compare Database
Option Explicit

' Copyright (c) 2006, Lyle Anderson  All rights reserved.
Dim appAcrobat As AcroApp
Dim strPDFfilePath As String
Dim pdfForEdit As AcroAVDoc
Dim avpPageForEdit As AcroAVPageView
Dim pddDocumentForEdit As AcroPDDoc
Dim avdDocumentForEdit As AcroAVDoc
Dim pdpPageForEdit As AcroPDPage
Dim pdtTextSelection As AcroPDTextSelect
Dim bolAcrobatForEditRunning As Boolean
Dim lngCurrentPageNum As Long
Dim recCurrentRectangle As AcroRect
Dim objHiliteList As AcroHiliteList
Public Sub readPDFdocument(varPath As Variant)
    Dim strPath As String
    If IsBlank(varPath) Then
        strPath = getPDFfilePath()
    Else
        strPath = setPDFfilePath(varPath)
    End If
    If appAcrobat Is Nothing Then
        Set appAcrobat = New Acrobat.AcroApp
    End If
    appAcrobat.Show  ' This works fine
    If pdfForEdit Is Nothing Then
        Set pdfForEdit = New AcroAVDoc
    End If
    If pdfForEdit.Open(strPath, "Temp Title") Then
        Set avpPageForEdit = pdfForEdit.GetAVPageView
        avpPageForEdit.GoTo (89) ' Page 90 shows on the screen OK
    End If
End Sub

Public Function ProcessPDFpage() As String
    Dim strFileName As String
    Dim strActiveTool As String
    Dim lngNumAVDocs As Long
    Dim intFirst As Integer
    Dim intCount As Integer
    lngCurrentPageNum = avpPageForEdit.GetPageNum
    Set pdpPageForEdit = avpPageForEdit.GetPage
    Set pddDocumentForEdit = avpPageForEdit.GetDoc
    Set recCurrentRectangle = New AcroRect
    Set recCurrentRectangle = appAcrobat.GetFrame  ' Gives reasonable values for frame rectangle
    strActiveTool = appAcrobat.GetActiveTool  ' Tells me the Hand is active
    recCurrentRectangle.bottom = 100
    recCurrentRectangle.Top = 20
    recCurrentRectangle.Left = 62
    recCurrentRectangle.Right = 250
    strFileName = pddDocumentForEdit.getFileName  ' Gets the file name.
    lngNumAVDocs = appAcrobat.GetNumAVDocs   ' works
    If pdfForEdit.FindText("Name / Role Name", 0, 0, 0) Then
        pdfForEdit.ShowTextSelect ' Finds and hilites the text just fine.
    End If
    Set objHiliteList = New AcroHiliteList
    intFirst = 0
    intCount = 5
    objHiliteList.Add intFirst, intCount  ' Bombs on an unhandled exception
        Set pdtTextSelection = pdpPageForEdit.CreateWordHilite(objHiliteList)
        If Not (pdtTextSelection Is Nothing) Then
            If pdfForEdit.SetTextSelection(pdtTextSelection) Then
                pdfForEdit.ShowTextSelect
            End If
        End If
End Function
************End code

I know how to do this with XML, Excel, Word, and PowerPoint, but it would be really helpful to be able to process PDF files without converting them to Word.  As a backup I can use the convert table to Excel feature in Adobe Acrobat, but I really want this to be more integrated than that.
0
Comment
Question by:K3LJX
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
3 Comments
 
LVL 26

Accepted Solution

by:
dannywareham earned 500 total points
ID: 16929031
I've not pulled data from pdf before.
However, I think I know a man that has:

http://www.experts-exchange.com/Databases/MS_Access/Q_20822119.html

:-)

Danny
0
 

Author Comment

by:K3LJX
ID: 16942365
It appears that there is no way to do what I want short of buying PDF Library.
0
 
LVL 26

Expert Comment

by:dannywareham
ID: 16942500
Sorry chap
:-(

0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Omit After Update event 5 45
Outlook VBA question - How do I mark a message as read 2 59
Microsoft Access report help 4 37
ms access - Helpdesk ticket system 5 51
A simple tool to export all objects of two Access files as text and compare it with Meld, a free diff tool.
You need to know the location of the Office templates folder, so that when you create new templates, they are saved to that location, and thus are available for selection when creating new documents.  The steps to find the Templates folder path are …
Familiarize people with the process of utilizing SQL Server views from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Access…
In Microsoft Access, learn how to “cascade” or have the displayed data of one combo control depend upon what’s entered in another. Base the dependent combo on a query for its row source: Add a reference to the first combo on the form as criteria i…

734 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question