Solved

Get Text from an Acrobat PDF file into MS-Access database using IAC or DOM interface

Posted on 2006-06-17
Last Modified: 2012-08-14
I need to be able to parse a PDF file and get the data into MS-Access. Ideally, the user should be able to highlight some text or a table in the PDF, make entries in an MS-Access form to say what is to be done, then push a button and have it happen. How do I connect the selection in the PDF with an AcroPDTextSelect object in the VBA? The only way seems to be to set the selection from the VBA, but I can't figure out what to put in the AcroRect object to use AcroExch.PDDoc.CreateTextSelect. When I try to build an AcroHiliteList object to use AcroPDPage.CreateWordHilite, it bombs on the .Add method.

The other alternative I have looked at is using the Accessibility (AcrobatAccess.lib) DOM interface, but I haven't found out how to attach the root IPDDomElement to a file.

Here is my test code.

***********Begin code
Option Compare Database
Option Explicit

' Copyright (c) 2006, Lyle Anderson  All rights reserved.
Dim appAcrobat As AcroApp
Dim strPDFfilePath As String
Dim pdfForEdit As AcroAVDoc
Dim avpPageForEdit As AcroAVPageView
Dim pddDocumentForEdit As AcroPDDoc
Dim avdDocumentForEdit As AcroAVDoc
Dim pdpPageForEdit As AcroPDPage
Dim pdtTextSelection As AcroPDTextSelect
Dim bolAcrobatForEditRunning As Boolean
Dim lngCurrentPageNum As Long
Dim recCurrentRectangle As AcroRect
Dim objHiliteList As AcroHiliteList
Public Sub readPDFdocument(varPath As Variant)
    Dim strPath As String
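    ' IsBlank, getPDFfilePath and setPDFfilePath are helper functions defined elsewhere in this database (not shown).
    If IsBlank(varPath) Then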
        strPath = getPDFfilePath()
    Else
        strPath = setPDFfilePath(varPath)
    End If
    If appAcrobat Is Nothing Then
        Set appAcrobat = New Acrobat.AcroApp
    End If
    appAcrobat.Show  ' This works fine
    If pdfForEdit Is Nothing Then
        Set pdfForEdit = New AcroAVDoc
    End If
    If pdfForEdit.Open(strPath, "Temp Title") Then
        Set avpPageForEdit = pdfForEdit.GetAVPageView
        avpPageForEdit.GoTo (89) ' Page 90 shows on the screen OK
    End If
End Sub

Public Function ProcessPDFpage() As String
    Dim strFileName As String
    Dim strActiveTool As String
    Dim lngNumAVDocs As Long
    Dim intFirst As Integer
    Dim intCount As Integer
    lngCurrentPageNum = avpPageForEdit.GetPageNum
    Set pdpPageForEdit = avpPageForEdit.GetPage
    Set pddDocumentForEdit = avpPageForEdit.GetDoc
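    ' GetFrame describes the Acrobat application window, not a region of the page.
    ' PDDoc.CreateTextSelect expects a rectangle in PDF user space instead: points
    ' measured from the lower-left corner of the page, so Top must be greater than bottom.
    Set recCurrentRectangle = appAcrobat.GetFrame  ' Gives reasonable values for frame rectangle
    strActiveTool = appAcrobat.GetActiveTool  ' Tells me the Hand is active
    recCurrentRectangle.bottom = 100  ' In user space this is inverted: bottom lies above Top
    recCurrentRectangle.Top = 20
    recCurrentRectangle.Left = 62
    recCurrentRectangle.Right = 250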
    strFileName = pddDocumentForEdit.getFileName  ' Gets the file name.
    lngNumAVDocs = appAcrobat.GetNumAVDocs   ' works
    If pdfForEdit.FindText("Name / Role Name", 0, 0, 0) Then
        pdfForEdit.ShowTextSelect ' Finds and highlights the text just fine.
    End If
    Set objHiliteList = New AcroHiliteList
    intFirst = 0   ' Index of the first word on the page to highlight
    intCount = 5   ' Number of words to highlight
    objHiliteList.Add intFirst, intCount  ' Bombs on an unhandled exception
    Set pdtTextSelection = pdpPageForEdit.CreateWordHilite(objHiliteList)
    If Not (pdtTextSelection Is Nothing) Then
        If pdfForEdit.SetTextSelection(pdtTextSelection) Then
            pdfForEdit.ShowTextSelect
        End If
    End If
End Function
************End code
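A minimal sketch of the CreateTextSelect route from a standard Access module, assuming full Adobe Acrobat is installed and exposes the AcroExch IAC objects. It follows the "set the selection from the VBA" approach described in the question rather than reading the user's on-screen selection. The point that matters for the AcroRect is that PDDoc.CreateTextSelect takes the rectangle in PDF user space: units are points (1/72 inch) and the origin is the lower-left corner of the page, so Top must be greater than bottom. The helper names BoxToUserSpace, GetTextInBox and SaveTextToTable, and the table tblPDFText with field ExtractedText, are hypothetical; page rotation and crop-box offsets are ignored.

```vba
Option Compare Database
Option Explicit

' Sketch only. Late binding (CreateObject) is used, so no Acrobat or DAO
' type-library reference is needed. tblPDFText / ExtractedText are hypothetical.

' Converts a box measured from the TOP-left corner of the page (in points)
' into PDF user-space edges, whose origin is the BOTTOM-left corner.
' Page rotation and crop-box offsets are ignored.
Public Sub BoxToUserSpace(ByVal pageHeight As Double, _
                          ByVal x As Double, ByVal yFromTop As Double, _
                          ByVal boxWidth As Double, ByVal boxHeight As Double, _
                          ByRef outLeft As Double, ByRef outTop As Double, _
                          ByRef outRight As Double, ByRef outBottom As Double)
    outLeft = x
    outRight = x + boxWidth
    outTop = pageHeight - yFromTop                  ' larger y = higher on the page
    outBottom = pageHeight - yFromTop - boxHeight   ' so outTop > outBottom
End Sub

' Returns the concatenated text of the words that intersect the box on page
' pageIndex (zero-based), or "" if the file cannot be opened or nothing is found.
Public Function GetTextInBox(ByVal strPath As String, ByVal pageIndex As Long, _
                             ByVal x As Double, ByVal yFromTop As Double, _
                             ByVal boxWidth As Double, ByVal boxHeight As Double) As String
    Dim pdDoc As Object, pdPage As Object, pageSize As Object
    Dim selRect As Object, textSel As Object
    Dim dblLeft As Double, dblTop As Double, dblRight As Double, dblBottom As Double
    Dim i As Long, strResult As String

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If Not pdDoc.Open(strPath) Then Exit Function

    Set pdPage = pdDoc.AcquirePage(pageIndex)
    Set pageSize = pdPage.GetSize                   ' AcroExch.Point: x = width, y = height
    BoxToUserSpace pageSize.y, x, yFromTop, boxWidth, boxHeight, _
                   dblLeft, dblTop, dblRight, dblBottom

    ' AcroExch.Rect edges are whole numbers of points.
    Set selRect = CreateObject("AcroExch.Rect")
    selRect.Left = CInt(dblLeft)
    selRect.Top = CInt(dblTop)
    selRect.Right = CInt(dblRight)
    selRect.bottom = CInt(dblBottom)

    Set textSel = pdDoc.CreateTextSelect(pageIndex, selRect)
    If Not (textSel Is Nothing) Then
        For i = 0 To textSel.GetNumText - 1         ' one element per selected word
            strResult = strResult & textSel.GetText(i)
        Next i
        textSel.Destroy
    End If

    pdDoc.Close
    GetTextInBox = strResult
End Function

' Appends one row to the hypothetical table tblPDFText (Text/Memo field ExtractedText).
Public Sub SaveTextToTable(ByVal strText As String)
    Dim rs As Object
    Set rs = CurrentDb.OpenRecordset("tblPDFText")
    rs.AddNew
    rs!ExtractedText = strText
    rs.Update
    rs.Close
End Sub
```

For example, a box 62 points in from the left edge, 20 points down from the top, 188 points wide and 80 points tall on a US Letter page (792 points tall) becomes Left 62, Top 772, Right 250, bottom 692. A form button could then call something like `SaveTextToTable GetTextInBox(strPath, 89, 62, 20, 188, 80)`, where 89 is the zero-based index of page 90.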

I know how to do this with XML, Excel, Word, and PowerPoint files, but it would be really helpful to be able to process PDF files without converting them to Word first. As a backup I can use the "convert table to Excel" feature in Adobe Acrobat, but I really want this to be more integrated than that.
Question by: K3LJX
3 Comments
 
Accepted Solution

by: dannywareham (earned 1500 total points)
ID: 16929031
I've not pulled data from pdf before.
However, I think I know a man that has:

http://www.experts-exchange.com/Databases/MS_Access/Q_20822119.html

:-)

Danny

Author Comment

by: K3LJX
ID: 16942365
It appears that there is no way to do what I want short of buying PDF Library.

Expert Comment

by: dannywareham
ID: 16942500
Sorry chap
:-(
