Using VBScript to read data within a PDF in order to name the file

I am hoping you can help me with an issue. I have about 200 single-page invoices, all in PDF format, each unique with an invoice#, person's name, etc. They come out of Access as one document. I am able to split them into single pages very easily using Adobe Pro, but of course it just names them invoice_1, invoice_2, invoice_3, etc.

My goal is to rename each file to its actual invoice# by parsing the PDF file and pulling out the invoice#. I have been able to find code that will do a "find" within a PDF (attached below). However, I will have no idea what the invoice number will be, so that won't work. My hope was to find the text "Invoice#" and then pull the 8 characters to the right of it, giving me the invoice# I need, and then use VBScript to rename the file, which I already know how to do. I just can't figure out how to do the text manipulation in VBScript with the Adobe object. If it were a TextStream object it would be a piece of cake. I know there are a few ways to convert the PDF to text and then read it, but I was hoping there was a simpler way. Also, if I could do it without actually having Adobe open on my screen, that would be a bonus. Finally, I have access to both Adobe Pro and Adobe Reader, so either is fine.
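Once the page text is available as a plain string (for example, after converting the PDF to a text file with one of the methods mentioned above), the "find Invoice# and take the next 8 characters" step is ordinary VBScript string handling with InStr and Mid. A minimal sketch, assuming the label is spelled "Invoice#" and the number is exactly 8 characters; ExtractInvoiceNumber and InvoiceFromTextFile are hypothetical helper names, not part of any Adobe API:

```vbscript
' Hypothetical helper: return the 8 characters after the "Invoice#" label,
' or "" when the label is not present in the text.
Function ExtractInvoiceNumber(text)
    Const MARKER = "Invoice#"
    Dim pos
    ExtractInvoiceNumber = ""
    ' vbTextCompare makes the search case-insensitive
    pos = InStr(1, text, MARKER, vbTextCompare)
    If pos > 0 Then
        ' Skip past the label, drop leading spaces, then keep 8 characters
        ExtractInvoiceNumber = Left(LTrim(Mid(text, pos + Len(MARKER))), 8)
    End If
End Function

' Hypothetical helper: read a text file (for example, one produced by
' converting the PDF to text) with a TextStream and extract the number.
Function InvoiceFromTextFile(txtPath)
    Dim fso, ts, text
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(txtPath, 1)   ' 1 = ForReading
    text = ""
    ' ReadAll fails on an empty file, so check for end of stream first
    If Not ts.AtEndOfStream Then text = ts.ReadAll
    ts.Close
    InvoiceFromTextFile = ExtractInvoiceNumber(text)
End Function
```

The LTrim call is a design choice: if the page reads "Invoice# 12345678", taking the 8 characters immediately after the label would pick up the space and drop the last digit.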

Any help on this is much appreciated.  It is my first time working with a PDF in VBScript.

Option Explicit
Dim accapp, acavdocu
Dim pdf_path, bReset, Wrd_count

'Path of the PDF to search (placeholder; point this at a real file)
pdf_path = "C:\Invoices\invoice_1.pdf"

'AcroExch.App is the Acrobat application object
Set accapp = CreateObject("AcroExch.App")

'Need to create one AVDoc object per displayed document
Set acavdocu = CreateObject("AcroExch.AVDoc")

'Opening the PDF
If acavdocu.Open(pdf_path, "") Then
    bReset = 1 : Wrd_count = 0
    'FindText finds the specified text, scrolls so that it is visible, and highlights it
    'Arguments: text, case-sensitive, whole words only, reset (1 = start from the top)
    Do While acavdocu.FindText("Invoice#", 1, 1, bReset)
        bReset = 0 : Wrd_count = Wrd_count + 1
        'Wait 0, 200
    Loop
    'Close the document without saving
    acavdocu.Close True
End If

MsgBox "The word 'Invoice#' was found " & Wrd_count & " times"

accapp.Exit
Set acavdocu = Nothing : Set accapp = Nothing


1 Solution
From: http://www.eggheadcafe.com/software/aspnet/33958218/reading-pdf-file.aspx

In Excel:

Go into the VBA IDE (Alt-F11)
Go into Tools->References
Check all the Adobe Libraries

I have:
Adobe Acrobat 7.0 Browser Control Type Library 1.0
Adobe Acrobat 7.0 Type Library

Go into Object Browser

See if you can get a VBA Sub going that looks like this:

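Sub SearchPDF()
    Dim a As AcroAVDoc, b As Boolean
    Set a = New AcroAVDoc
    ' A document must be open before FindText can search it (placeholder path)
    If a.Open("C:\Invoices\invoice_1.pdf", "") Then
        ' FindText(text, caseSensitive, wholeWordsOnly, reset)
        b = a.FindText("SearchTextString", 1, 1, 1)
        MsgBox CStr(b)
        a.Close True   ' close without saving
    End If
End Sub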


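Once you get that to work, the next step is to translate it into VBScript. (The FindText arguments are listed in Adobe's IAC API Reference: the text to find, case-sensitive, whole words only, and reset, where reset = 1 starts the search from the beginning of the document.)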

Someone might be able to help you here with another post.
You'd need to convert this VBA:

Set a = New AcroAVDoc

'into VBScript, which creates the object by its ProgID instead of the type library class:

Set a = CreateObject("AcroExch.AVDoc")


Oops. I forgot to tell you what to do once you get to the Object Browser.

You probably figured that out...

In the VBA IDE select View->Object Browser

In the drop-down in the middle of the page, where it says <All Libraries>, select Acrobat.

Peruse the objects.

For example, click AcroAVDoc and you will see the method FindText.
Hope that helps.
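FindText only returns True or False, so it cannot hand back an invoice number you don't already know. One alternative, assuming Acrobat Pro is installed (Reader does not expose the AcroExch COM objects), is to open the file as an AcroExch.PDDoc and use GetJSObject to reach Acrobat's JavaScript methods getPageNumWords and getPageNthWord, which return the words on a page. A PDDoc does not open a document window, although Acrobat still runs in the background. This sketch assumes each invoice is a single page, that Acrobat's word breaker returns the label and the number as separate words, and that the number is the word right after the label; PdfPageWords and InvoiceFromWords are hypothetical helper names.

```vbscript
' Hypothetical helper: given the words of a page (a VBScript array),
' return the word that follows the "Invoice#" label, or "".
Function InvoiceFromWords(words)
    Dim i
    InvoiceFromWords = ""
    For i = LBound(words) To UBound(words) - 1
        ' Remove "#" so the match works whether or not Acrobat stripped it
        If Replace(LCase(words(i)), "#", "") = "invoice" Then
            InvoiceFromWords = words(i + 1)
            Exit Function
        End If
    Next
End Function

' Hypothetical helper: read the words of page 0 through Acrobat's
' JavaScript bridge. Assumes Acrobat Pro (GetJSObject is not in Reader).
Function PdfPageWords(pdfPath)
    Dim pdDoc, jso, n, i, words()
    PdfPageWords = Array()
    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If pdDoc.Open(pdfPath) Then
        Set jso = pdDoc.GetJSObject
        n = jso.getPageNumWords(0)          ' page numbers are 0-based
        If n > 0 Then
            ReDim words(n - 1)
            For i = 0 To n - 1
                words(i) = jso.getPageNthWord(0, i)
            Next
            PdfPageWords = words
        End If
        Set jso = Nothing
        pdDoc.Close
    End If
    Set pdDoc = Nothing
End Function
```

For example, WScript.Echo InvoiceFromWords(PdfPageWords("C:\Invoices\invoice_1.pdf")) would print the number for one file (the path is a placeholder). Keeping the word matching separate from the Acrobat calls means it can be tested without Acrobat. If the word "Invoice" also appears elsewhere on the page, such as in a title, tighten the check, for example by also testing that the following word is 8 characters long.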
