Solved

Using VBScript to Read data within PDF in order to name file.

Posted on 2011-02-20
12,630 Views
Last Modified: 2012-05-11
Experts,
I am hoping you can help me with an issue. I have about 200 single-page invoices, all in PDF format, each unique with an invoice#, person's name, etc. They come out of Access as one document. I can split them into single pages very easily using Adobe Pro, but of course it just names them invoice_1, invoice_2, invoice_3, etc.

My goal is to rename each file to its actual invoice# by parsing the PDF file and pulling out the invoice#. I have been able to find code that will do a "find" within a PDF (attached below). However, I will have no idea what the invoice number will be, so that won't work on its own. My hope was to find the text "Invoice#" and then pull the 8 characters to the right of it, giving me the invoice# I need, and then use VBScript to rename the file, which I already know how to do. I just can't figure out how to do the text manipulation in VBScript with the Adobe object. If it were a TextStream object it would be a piece of cake. I know there are a few ways to convert the PDF to text and then read it, but I was hoping there was a simpler way. Also, if I could do it without actually having Adobe open on my screen, that would be a bonus. One more point: I have access to both Adobe Pro and Adobe Reader, so either is fine.
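Once the page text is available as an ordinary string, the "find `Invoice#` and take the 8 characters to its right" step is plain VBScript string handling with `InStr`, `Mid`, `LTrim` and `Left`. The sketch below assumes the marker is literally `Invoice#` and the number is exactly 8 characters, as described above; `ExtractAfterMarker` is a hypothetical helper name, not part of any Adobe API.

```vbscript
Option Explicit

' Hypothetical helper: returns up to numChars characters that follow the
' first occurrence of marker in text (case-insensitive). Spaces right after
' the marker are skipped. Returns "" when the marker is not found.
Function ExtractAfterMarker(text, marker, numChars)
    Dim pos, rest
    ExtractAfterMarker = ""
    pos = InStr(1, text, marker, vbTextCompare)
    If pos > 0 Then
        ' Everything after the marker, minus leading spaces
        rest = LTrim(Mid(text, pos + Len(marker)))
        ExtractAfterMarker = Left(rest, numChars)
    End If
End Function

Dim invoiceNo
invoiceNo = ExtractAfterMarker("Invoice# 20110215 John Smith", "Invoice#", 8)
' invoiceNo is now "20110215"
WScript.Echo invoiceNo
```

The returned value could then be passed to `FileSystemObject.MoveFile` to do the rename. Checking for an empty result before renaming avoids clobbering files where the marker was not found.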

Any help on this is much appreciated.  It is my first time working with a PDF in VBScript.

Thanks,
Mike
Option Explicit
Dim accapp, acavdocu
Dim pdf_path, bReset, Wrd_count
pdf_path="C:\LS\Test\Invoices\02_2011_PDF\rpt_Invoice_1.pdf"
'AcroExch.App is the Acrobat application object
Set accapp=CreateObject("AcroExch.App")
accapp.Show()

'Need to create one AVDoc object per displayed document
Set acavdocu=CreateObject("AcroExch.AVDoc")

'Opening the PDF
If acavdocu.Open(pdf_path,"") Then
acavdocu.BringToFront()
bReset=1 : Wrd_count = 0
'FindText finds the specified text, scrolls so that it is visible, and highlights it
Do While acavdocu.FindText("Invoice#", 1, 1, bReset)
bReset=0 : Wrd_count=Wrd_count+1
'Wait 0, 200
Loop
End If

accapp.CloseAllDocs()
accapp.Exit()
msgbox "The word 'Invoice#' was found " & Wrd_count & " times"
Set acavdocu=nothing : Set accapp=nothing
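`FindText` only reports whether the text exists; it cannot hand back the characters next to a match. One way to get the words on a page as a string, without opening a document window, is to open the file with `AcroExch.PDDoc` and call the Acrobat JavaScript methods `getPageNumWords` and `getPageNthWord` through `GetJSObject`. This is a sketch assuming Acrobat Pro is installed (these objects come with Acrobat, not Adobe Reader); `GetPageText` is a hypothetical helper name. How Acrobat splits text into words depends on the PDF, so `Invoice#` might come back as one word or as two, which is why the sketch prints the text first.

```vbscript
Option Explicit

' Hypothetical helper. Assumes Acrobat Pro is installed: AcroExch.PDDoc and
' GetJSObject are Acrobat objects and are not available with Reader alone.
Function GetPageText(pdfPath, pageIndex)
    Dim pdDoc, jso, nWords, i, text
    text = ""
    ' PDDoc works on the file directly; no document window is displayed
    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If pdDoc.Open(pdfPath) Then
        ' Bridge to the Acrobat JavaScript Doc object for this file
        Set jso = pdDoc.GetJSObject()
        nWords = jso.getPageNumWords(pageIndex)
        For i = 0 To nWords - 1
            ' bStrip = False keeps punctuation such as "#" on each word
            text = text & jso.getPageNthWord(pageIndex, i, False) & " "
        Next
        Set jso = Nothing
        pdDoc.Close
    End If
    Set pdDoc = Nothing
    GetPageText = text
End Function

' Page indexes are zero-based, so 0 is the first (and only) page
WScript.Echo GetPageText("C:\LS\Test\Invoices\02_2011_PDF\rpt_Invoice_1.pdf", 0)
```

Because the result is an ordinary string, the "Invoice#" lookup and the 8-character extraction can then be done with `InStr` and `Mid`, just as with a TextStream.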

Question by:uconnfb13
2 Comments

Accepted Solution

by: dev00790 (earned 500 total points)
ID: 34938139
From: http://www.eggheadcafe.com/software/aspnet/33958218/reading-pdf-file.aspx
-----------------------------

In Excel:

Go into the VBA IDE (Alt-F11)
Go into Tools->References
Check all the Adobe Libraries

I have:
Adobe Acrobat 7.0 Browser Control Type Library 1.0
Adobe Acrobat 7.0 Type Library

Go into Object Browser

See if you can get a VBA Sub going that looks like this:

 
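Sub SearchPDF()
    ' Requires references to the Adobe Acrobat type libraries (see above)
    Dim a As AcroAVDoc
    Dim b As Boolean
    Set a = New AcroAVDoc
    ' Open takes the full path and a temporary window title ("" = default)
    If a.Open("C:\mypdf.pdf", "") Then
        ' FindText(text, caseSensitive, wholeWordsOnly, reset), flags as 0/1
        b = a.FindText("SearchTextString", 1, 1, 1)
        MsgBox CStr(b)
        a.Close True ' True = close without saving
    End If
End Sub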



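If you get that working, the next step is to translate it into VBScript. (FindText takes the search text followed by three flags passed as 0 or 1: case-sensitive, whole words only, and reset search; they are listed in the Acrobat SDK's Interapplication Communication API reference.)

Someone here might be able to help you with that in another post.
You'd need to convert this VBA: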

Set a = New AcroAVDoc

into VBScript, which would look like this:

Set a = CreateObject("AcroExch.AVDoc")
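A fuller VBScript translation of the SearchPDF sub might look like this sketch. VBScript has no `New` for COM classes, so the object is created from its ProgID. It assumes Acrobat (not just Reader) is installed; turning the sub into a `SearchPDF` function is an illustrative choice, and like the VBA version it only reports whether the text exists.

```vbscript
Option Explicit

' Illustrative function version of the VBA SearchPDF sub.
' Assumes Acrobat is installed, since AcroExch.AVDoc is an Acrobat object.
Function SearchPDF(pdfPath, searchText)
    Dim avDoc
    SearchPDF = False
    ' Late binding via the ProgID replaces "New AcroAVDoc"
    Set avDoc = CreateObject("AcroExch.AVDoc")
    If avDoc.Open(pdfPath, "") Then
        ' FindText(text, caseSensitive, wholeWordsOnly, reset); 1 = True
        SearchPDF = CBool(avDoc.FindText(searchText, 1, 1, 1))
        ' True = close without saving changes
        avDoc.Close True
    End If
    Set avDoc = Nothing
End Function

WScript.Echo CStr(SearchPDF("C:\mypdf.pdf", "SearchTextString"))
```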



Oops. I forgot to tell you what to do once you get to the Object Browser.

You probably figured that out...

In the VBA IDE select View->Object Browser

In the drop-down in the middle of the page where it says <All Libraries>, select Acrobat.

Peruse the objects.

For example, click AcroAVDoc and you will see the FindText method.

Expert Comment

by:dev00790
ID: 34938142
Hope that helps