Solved

Using VBScript to Read data within PDF in order to name file.

Posted on 2011-02-20
Last Modified: 2012-05-11
Experts,
I am hoping you can help me with an issue.  I have about 200 single-page invoices in PDF format, each unique, with an invoice#, person's name, etc.  They come out of Access as one document. I can split them into single pages very easily using Adobe Pro, but of course it just names them invoice_1, invoice_2, invoice_3, etc.

My goal is to rename each file to the actual invoice# by parsing the PDF file and pulling the invoice#.  I have been able to find code that will do a "find" within a PDF (which I will attach).  However, I will have no idea what the invoice number will be, so that won't work.  My hope is to find the text "Invoice#" and then pull the 8 characters to the right of it, thus giving me the invoice# that I need, and then use VBScript to rename the file, which I already know how to do.  I just can't seem to figure out how to do the text manipulation within VBScript with the Adobe object.  If it were a TextStream object it would be a piece of cake.  I know there are a few ways to convert the PDF to text and then read it, but I was hoping there was a simpler way.  Also, if I could do it without actually having Adobe open on my screen, that would be a bonus.  One more point: I have access to Adobe Pro and Adobe Reader; either is fine.

Any help on this is much appreciated.  It is my first time working with a PDF in VBScript.

Thanks,
Mike
Option Explicit
Dim accapp, acavdocu
Dim pdf_path, bReset, Wrd_count
pdf_path = "C:\LS\Test\Invoices\02_2011_PDF\rpt_Invoice_1.pdf"
Wrd_count = 0

'AcroExch.App is the Acrobat application object
Set accapp = CreateObject("AcroExch.App")
accapp.Show()

'Need to create one AVDoc object per displayed document
Set acavdocu = CreateObject("AcroExch.AVDoc")

'Open the PDF
If acavdocu.Open(pdf_path, "") Then
    acavdocu.BringToFront()
    bReset = 1
    'FindText(text, caseSensitive, wholeWordsOnly, reset) finds the specified text,
    'scrolls so that it is visible, and highlights it
    Do While acavdocu.FindText("Invoice#", 1, 1, bReset)
        bReset = 0 : Wrd_count = Wrd_count + 1
        'Wait 0, 200
    Loop
End If

accapp.CloseAllDocs()
accapp.Exit()
MsgBox "The word 'Invoice#' was found " & Wrd_count & " times"
Set acavdocu = Nothing : Set accapp = Nothing

Question by:uconnfb13

Accepted Solution

by:
dev00790 earned 500 total points
ID: 34938139
From: http://www.eggheadcafe.com/software/aspnet/33958218/reading-pdf-file.aspx
-----------------------------

In Excel:

Go into the VBA IDE (Alt-F11)
Go into Tools->References
Check all the Adobe Libraries

I have:
Adobe Acrobat 7.0 Browser Control Type Library 1.0
Adobe Acrobat 7.0 Type Library

Go into Object Browser

See if you can get a VBA Sub going that looks like this:

 
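Sub SearchPDF()
    Dim a As AcroAVDoc
    Dim b As Boolean
    Set a = New AcroAVDoc
    'Open takes the file path and a window title ("" uses the default)
    If a.Open("C:\mypdf.pdf", "") Then
        'FindText(text, caseSensitive, wholeWordsOnly, reset)
        b = a.FindText("SearchTextString", 1, 1, 1) 'b is a boolean
        MsgBox CStr(b)
        a.Close True 'True = close without saving
    End If
End Sub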


*IF* you get that to work, the next step is to translate this into VBScript.
(FindText takes four arguments: the search text, a case-sensitive flag, a
whole-words-only flag, and a reset flag.)

Someone might be able to help you here with another post.
You'd need to convert this VBA:

Set a = New AcroAVDoc

'into VBScript that might look like this:

 
'VBScript has no New for COM classes; use the ProgID instead
Set a = CreateObject("AcroExch.AVDoc")


Oops. I forgot to tell you what to do once you get to Object Browser.

You probably figured that out...

In the VBA IDE select View->Object Browser

In the drop down in the middle of the page where it says <All
Libraries> select Acrobat

Peruse the objects.

For example, click AcroAVDoc - and you see the method FindText.
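
FindText only reports whether a string is present; it does not return the text that follows it. One possible way to get at the invoice number itself is sketched below in VBScript. It is an untested sketch under several assumptions: Acrobat Pro is installed (Reader does not expose these automation objects), the Acrobat JavaScript bridge (PDDoc.GetJSObject with getPageNumWords and getPageNthWord) is usable from VBScript, the invoice number is the 8 characters after "Invoice#" on the first page, and those characters are valid in a file name. ExtractInvoiceNumber and ReadFirstPageText are hypothetical helper names, not part of any Adobe API. Because it uses PDDoc rather than AVDoc, no document window should need to be shown.

```vbscript
Option Explicit

' Hypothetical helper: returns the 8 characters that follow "Invoice#"
' in pageText (skipping leading spaces), or "" if the marker is absent.
Function ExtractInvoiceNumber(pageText)
    Const MARKER = "Invoice#"
    Dim pos, rest
    ExtractInvoiceNumber = ""
    pos = InStr(1, pageText, MARKER, vbTextCompare)
    If pos > 0 Then
        rest = LTrim(Mid(pageText, pos + Len(MARKER)))
        ExtractInvoiceNumber = Left(rest, 8)
    End If
End Function

' Hypothetical helper: concatenates every word on the first page (page 0).
' Assumes the Acrobat JavaScript bridge is available via GetJSObject.
Function ReadFirstPageText(pdfPath)
    Dim pdDoc, jso, i, text
    text = ""
    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If pdDoc.Open(pdfPath) Then
        Set jso = pdDoc.GetJSObject
        For i = 0 To jso.getPageNumWords(0) - 1
            ' False = do not strip punctuation and whitespace from the word
            text = text & jso.getPageNthWord(0, i, False)
        Next
        pdDoc.Close
    End If
    Set jso = Nothing : Set pdDoc = Nothing
    ReadFirstPageText = text
End Function

Dim fso, pdfPath, invoiceNo
pdfPath = "C:\LS\Test\Invoices\02_2011_PDF\rpt_Invoice_1.pdf"
Set fso = CreateObject("Scripting.FileSystemObject")

invoiceNo = ExtractInvoiceNumber(ReadFirstPageText(pdfPath))
If invoiceNo <> "" Then
    ' Rename in place; the PDF has already been closed above
    fso.GetFile(pdfPath).Name = invoiceNo & ".pdf"
End If
```

To process all 200 files, the last block could be wrapped in a For Each loop over fso.GetFolder(...).Files. How Acrobat splits "Invoice#" and the number into words may vary with the PDF, so it is worth dumping the text for one file with MsgBox first and adjusting the marker or the length if needed.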

Expert Comment

by:dev00790
ID: 34938142
Hope that helps