Solved

Get Text from an Acrobat PDF file into MS-Access database using IAC or DOM interface

Posted on 2006-06-17
Last Modified: 2012-08-14
I need to parse a PDF file and get the data into MS-Access.  Ideally the user should be able to highlight some text or a table in the PDF, make entries in an MS-Access form to say what is to be done, then push a button and have it happen.  How do I connect the selection in the PDF with an AcroPDTextSelect object in the VBA?  The only way seems to be to set the selection from the VBA, but I can't figure out what to put in the AcroRect object to use AcroExch.PDDoc.CreateTextSelect.  When I try to build an AcroHiliteList object to use AcroPDPage.CreateWordHilite, it fails with an unhandled exception on the .Add method.

The other alternative I have looked at is using the Accessibility (AcrobatAccess.lib) DOM interface, but I haven't found out how to attach the root IPDDomElement to a file.

Here is my test code.

***********Begin code
Option Compare Database
Option Explicit

' Copyright (c) 2006, Lyle Anderson  All rights reserved.
Dim appAcrobat As AcroApp
Dim strPDFfilePath As String
Dim pdfForEdit As AcroAVDoc
Dim avpPageForEdit As AcroAVPageView
Dim pddDocumentForEdit As AcroPDDoc
Dim avdDocumentForEdit As AcroAVDoc
Dim pdpPageForEdit As AcroPDPage
Dim pdtTextSelection As AcroPDTextSelect
Dim bolAcrobatForEditRunning As Boolean
Dim lngCurrentPageNum As Long
Dim recCurrentRectangle As AcroRect
Dim objHiliteList As AcroHiliteList
Public Sub readPDFdocument(varPath As Variant)
    Dim strPath As String
    If IsBlank(varPath) Then
        strPath = getPDFfilePath()
    Else
        strPath = setPDFfilePath(varPath)
    End If
    If appAcrobat Is Nothing Then
        Set appAcrobat = New Acrobat.AcroApp
    End If
    appAcrobat.Show  ' This works fine
    If pdfForEdit Is Nothing Then
        Set pdfForEdit = New AcroAVDoc
    End If
    If pdfForEdit.Open(strPath, "Temp Title") Then
        Set avpPageForEdit = pdfForEdit.GetAVPageView
        avpPageForEdit.GoTo (89) ' Page 90 shows on the screen OK
    End If
End Sub

Public Function ProcessPDFpage() As String
    Dim strFileName As String
    Dim strActiveTool As String
    Dim lngNumAVDocs As Long
    Dim intFirst As Integer
    Dim intCount As Integer
    lngCurrentPageNum = avpPageForEdit.GetPageNum
    Set pdpPageForEdit = avpPageForEdit.GetPage
    Set pddDocumentForEdit = avpPageForEdit.GetDoc
    Set recCurrentRectangle = New AcroRect
    Set recCurrentRectangle = appAcrobat.GetFrame  ' Gives reasonable values for frame rectangle
    strActiveTool = appAcrobat.GetActiveTool  ' Tells me the Hand is active
    recCurrentRectangle.bottom = 100
    recCurrentRectangle.Top = 20
    recCurrentRectangle.Left = 62
    recCurrentRectangle.Right = 250
    strFileName = pddDocumentForEdit.getFileName  ' Gets the file name.
    lngNumAVDocs = appAcrobat.GetNumAVDocs   ' works
    If pdfForEdit.FindText("Name / Role Name", 0, 0, 0) Then
        pdfForEdit.ShowTextSelect ' Finds and hilites the text just fine.
    End If
    Set objHiliteList = New AcroHiliteList
    intFirst = 0
    intCount = 5
    objHiliteList.Add intFirst, intCount  ' Bombs on an unhandled exception
    Set pdtTextSelection = pdpPageForEdit.CreateWordHilite(objHiliteList)
    If Not (pdtTextSelection Is Nothing) Then
        If pdfForEdit.SetTextSelection(pdtTextSelection) Then
            pdfForEdit.ShowTextSelect
        End If
    End If
End Function
************End code

I know how to do this with XML, Excel, Word, and PowerPoint, but it would be really helpful to be able to process PDF files without converting them to Word.  As a backup I can use the convert table to Excel feature in Adobe Acrobat, but I really want this to be more integrated than that.
Question by:K3LJX

Accepted Solution

by:dannywareham (earned 500 total points)
ID: 16929031
I've not pulled data from a PDF before.
However, I think I know a man who has:

http://www.experts-exchange.com/Databases/MS_Access/Q_20822119.html

:-)

Danny

Author Comment

by:K3LJX
ID: 16942365
It appears that there is no way to do what I want short of buying PDF Library.

Expert Comment

by:dannywareham
ID: 16942500
Sorry chap
:-(
