Solved

Using VBScript to Read data within PDF in order to name file.

Posted on 2011-02-20
Last Modified: 2012-05-11
Experts,
I am hoping you can help me with an issue.  I have about 200 single-page invoices, all in PDF format, each unique with an invoice#, person's name, etc.  They come out of Access as one document. I am able to split them into single pages very easily using Adobe Pro, but of course it just names them invoice_1, invoice_2, invoice_3, etc.

My goal is to rename each file to the actual invoice#, by parsing the PDF file and pulling the invoice#.  I have been able to find code that will do a "find" within a PDF (which I will attach).  However, I will have no idea what the invoice number will be, so that won't work.  My hope is to find the text "Invoice#" and then pull the 8 characters to the right of that, thus giving me the invoice# that I need, and then use VBScript to rename the file, which I already know how to do.  I just can't seem to figure out how to do the text manipulation within VBS with the Adobe object.  If it were a TextStream object it would be a piece of cake.  I know there are a few ways to convert the PDF to text and then read it, but I was hoping there was a simpler way.  Also, if I could do it without actually having Adobe open on my screen, that would be a bonus.  Another point: I have access to Adobe Pro and Adobe Reader; either is fine.

Any help on this is much appreciated.  It is my first time working with a PDF in VBScript.

Thanks,
Mike
Option Explicit
Dim accapp, acavdocu
Dim pdf_path, bReset, Wrd_count
pdf_path = "C:\LS\Test\Invoices\02_2011_PDF\rpt_Invoice_1.pdf"
Wrd_count = 0
'AcroExch.App is the Acrobat application object
Set accapp = CreateObject("AcroExch.App")
accapp.Show()

'Need to create one AVDoc object per displayed document
Set acavdocu = CreateObject("AcroExch.AVDoc")

'Open the PDF (second argument is an optional window title)
If acavdocu.Open(pdf_path, "") Then
    acavdocu.BringToFront()
    bReset = 1
    'FindText finds the specified text, scrolls so that it is visible, and highlights it
    'Arguments: text, case sensitive, whole words only, reset to start of document
    Do While acavdocu.FindText("Invoice#", 1, 1, bReset)
        bReset = 0 : Wrd_count = Wrd_count + 1
        'Wait 0, 200
    Loop
End If

accapp.CloseAllDocs()
accapp.Exit()
MsgBox "The word 'Invoice#' was found " & Wrd_count & " times"
'Release both objects (the original released a misspelled, undeclared name)
Set acavdocu = Nothing : Set accapp = Nothing
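
If the page text is obtained as a plain string (for example through one of the PDF-to-text conversions mentioned in the question), the "find Invoice# and take the next 8 characters" step is ordinary VBScript string handling. A minimal sketch, assuming the number follows the label directly or after spaces; the function name ExtractInvoiceNumber and the sample text are hypothetical, not part of the original post:

```vbscript
Option Explicit

' Returns the numLen characters that follow label in text,
' skipping any spaces after the label. Returns "" if the label
' is missing or fewer than numLen characters follow it.
Function ExtractInvoiceNumber(text, label, numLen)
    Dim pos, rest
    ExtractInvoiceNumber = ""
    pos = InStr(1, text, label, vbTextCompare)
    If pos = 0 Then Exit Function
    rest = LTrim(Mid(text, pos + Len(label)))
    If Len(rest) >= numLen Then
        ExtractInvoiceNumber = Left(rest, numLen)
    End If
End Function

' Hypothetical sample text standing in for one invoice page
Dim sample
sample = "ACME Corp  Invoice# 20110042  Customer: J. Smith"
WScript.Echo ExtractInvoiceNumber(sample, "Invoice#", 8)
```

An empty return value is a signal to skip the rename rather than produce a file with a blank name.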

Question by:uconnfb13
Accepted Solution

by: dev00790 (earned 500 total points)
ID: 34938139
From: http://www.eggheadcafe.com/software/aspnet/33958218/reading-pdf-file.aspx
-----------------------------

In Excel:

Go into the VBA IDE (Alt-F11)
Go into Tools->References
Check all the Adobe Libraries

I have:
Adobe Acrobat 7.0 Browser Control Type Library 1.0
Adobe Acrobat 7.0 Type Library

Go into Object Browser

See if you can get a VBA Sub going that looks like this:

 
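Sub SearchPDF()
    Dim a As AcroAVDoc
    Dim b As Boolean
    Set a = New AcroAVDoc
    'Open takes the file path and an optional window title
    If a.Open("C:\mypdf.pdf", "") Then
        'Arguments: text, case sensitive, whole words only, reset to start
        b = a.FindText("SearchTextString", 1, 1, 1)
        MsgBox CStr(b)
        a.Close 1  '1 = close without saving
    End If
End Sub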


*IF* you ever get that to work (FindText takes four arguments:
the text, case sensitive, whole words only, and reset), the
next step is to translate this into VBScript.

Someone might be able to help you here with another post.
You'd need to convert this VBA:

Set a = New AcroAVDoc

into VBScript that might look like this:

 
'VBScript cannot use New with a type library class; create the object by its ProgID
Set a = CreateObject("AcroExch.AVDoc")


Oops. I forgot to tell you what to do once you get to the Object Browser.

You probably figured that out...

In the VBA IDE select View->Object Browser

In the drop down in the middle of the page where it says <All
Libraries> select Acrobat

Peruse the objects.

For example, click AcroAVDoc - and you see the method FindText.
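
FindText only reports whether the text was found; it does not return the text itself. One approach often used with Acrobat Pro (not Reader) is to open the file as an AcroExch.PDDoc, which does not display a window, and read the page's words through the document's JavaScript object (GetJSObject, then getPageNumWords and getPageNthWord). A rough sketch under those assumptions follows. The helper names PickNumberAfterLabel and ReadFirstPageWords are made up for illustration, and how Acrobat splits "Invoice#" into words depends on the PDF, so the matching rule may need adjusting:

```vbscript
Option Explicit

' Pure helper: given the page's words in order, return the first
' numLen-character word that directly follows a word starting with label.
Function PickNumberAfterLabel(words, label, numLen)
    Dim i
    PickNumberAfterLabel = ""
    For i = 0 To UBound(words) - 1
        If InStr(1, words(i), label, vbTextCompare) = 1 And Len(words(i + 1)) = numLen Then
            PickNumberAfterLabel = words(i + 1)
            Exit Function
        End If
    Next
End Function

' Reads the words of page 1 without showing an Acrobat window.
' Assumes Acrobat Pro is installed; returns an empty array on failure.
Function ReadFirstPageWords(pdfPath)
    Dim pdDoc, jso, n, i, words
    words = Array()
    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If pdDoc.Open(pdfPath) Then
        Set jso = pdDoc.GetJSObject
        n = jso.getPageNumWords(0)              ' page index is zero-based
        If n > 0 Then
            ReDim words(n - 1)
            For i = 0 To n - 1
                ' By default getPageNthWord strips punctuation and whitespace
                words(i) = Trim(jso.getPageNthWord(0, i))
            Next
        End If
        Set jso = Nothing
        pdDoc.Close
    End If
    Set pdDoc = Nothing
    ReadFirstPageWords = words
End Function
```

A call such as PickNumberAfterLabel(ReadFirstPageWords(pdf_path), "Invoice", 8) would then give the value to use in the rename. Because punctuation is stripped by default, the label is matched as "Invoice" rather than "Invoice#"; if the page also has a heading "Invoice" followed by another 8-character word, a stricter check (for example IsNumeric) would be needed.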
Expert Comment

by:dev00790
ID: 34938142
Hope that helps