
Using VBScript to read data within a PDF in order to name the file

Posted on 2011-02-20
Last Modified: 2012-05-11
Experts,
I am hoping you can help me with an issue. I have about 200 single-page invoices, all in PDF format, each unique with an invoice#, person's name, etc. They come out of Access as one document. I am able to split them into single pages very easily using Adobe Pro, but of course it just names them invoice_1, invoice_2, invoice_3, etc.

My goal is to rename each file to the actual invoice# by parsing the PDF file and pulling out the invoice#. I have been able to find code that will do a "find" within a PDF (which I will attach). However, I will have no idea what the invoice number will be, so searching for it won't work. My hope was to find the text "Invoice#" and then pull the 8 characters to the right of it, giving me the invoice# I need, and then use VBScript to rename the file, which I already know how to do. I just can't figure out how to do the text manipulation in VBScript with the Adobe object. If it were a TextStream object it would be a piece of cake. I know there are a few ways to convert the PDF to text and then read it, but I was hoping there was a simpler way. Also, if I could do it without actually having Adobe open on my screen, that would be a bonus. One more point: I have access to both Adobe Pro and Adobe Reader, so either is fine.
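A minimal sketch of the text manipulation described above, assuming the page text is already available as an ordinary string (for example, read through a TextStream after converting the PDF to text). `TextAfterLabel` is a hypothetical helper name; the label "Invoice#" and the length 8 come from the question, and the case-insensitive match is an assumption.

```vbscript
' Hypothetical helper: returns the `length` characters that follow `label`
' in `text`, skipping any spaces between the label and the value.
' Returns "" when the label is not present.
Function TextAfterLabel(text, label, length)
    Dim pos
    TextAfterLabel = ""
    ' vbTextCompare makes the search case-insensitive ("INVOICE#" also matches)
    pos = InStr(1, text, label, vbTextCompare)
    If pos > 0 Then
        ' Mid without a length returns everything after the label;
        ' LTrim removes leading spaces only (not tabs or line breaks),
        ' and Left keeps the next `length` characters
        TextAfterLabel = Left(LTrim(Mid(text, pos + Len(label))), length)
    End If
End Function
```

For example, `WScript.Echo TextAfterLabel("Invoice# 20110215  John Smith", "Invoice#", 8)` prints `20110215`. Skipping the gap before taking the 8 characters means the result does not depend on whether the extracted text has a space after the label.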

Any help on this is much appreciated.  It is my first time working with a PDF in VBScript.

Thanks,
Mike
Option Explicit
Dim accapp, acavdocu
Dim pdf_path, bReset, Wrd_count
pdf_path="C:\LS\Test\Invoices\02_2011_PDF\rpt_Invoice_1.pdf"
'AcroExch is acrobat application object
Set accapp=CreateObject("AcroExch.App")
accapp.Show()

'Need to create one AVDoc object per displayed document
Set acavdocu=CreateObject("AcroExch.AVDoc")

'Opening the PDF
If acavdocu.Open(pdf_path,"") Then
acavdocu.BringToFront()
bReset=1 : Wrd_count = 0
'FindText finds the specified text, scrolls so that it is visible, and highlights it
Do While acavdocu.FindText("Invoice#", 1, 1, bReset)
bReset=0 : Wrd_count=Wrd_count+1
'Wait 0, 200
Loop
End If

accapp.CloseAllDocs()
accapp.Exit()
msgbox "The word 'Invoice#' was found " & Wrd_count & " times"
'Release the COM objects (both names must be declared under Option Explicit)
Set acavdocu=Nothing : Set accapp=Nothing
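The script above can only count matches: FindText highlights the text and returns True or False, but it never hands back the characters after the match, so it cannot produce the invoice number itself. One alternative, sketched here under several assumptions, is to read the page's words through Acrobat's JavaScript bridge (`PDDoc.GetJSObject`, then the JavaScript `getPageNumWords` / `getPageNthWord` methods) and parse the resulting string. Assumptions: Acrobat Pro is installed (as far as I know the AcroExch objects are not available with Reader alone); the label appears as "Invoice#" in the extracted words; `FirstPageText`, `InvoiceFileName` and `RenameInvoices` are hypothetical names. `AcroExch.PDDoc` opens the file without a document window, although Acrobat may still run as a background process.

```vbscript
Option Explicit

' Hypothetical helper: returns the words on the first page (index 0) of a PDF
' joined by spaces, or "" if the file cannot be opened. Assumes Acrobat Pro.
Function FirstPageText(pdfPath)
    Dim pdDoc, jso, i, n
    FirstPageText = ""
    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If pdDoc.Open(pdfPath) Then
        ' GetJSObject exposes the Acrobat JavaScript Doc object
        Set jso = pdDoc.GetJSObject
        n = jso.getPageNumWords(0)
        For i = 0 To n - 1
            ' bStrip = False keeps punctuation such as "#" on the word
            FirstPageText = FirstPageText & jso.getPageNthWord(0, i, False) & " "
        Next
        Set jso = Nothing
        pdDoc.Close
    End If
    Set pdDoc = Nothing
End Function

' Hypothetical helper: builds "<invoice#>.pdf" from the 8 characters after
' "Invoice#" (the rule from the question), or returns "" if the label is missing.
Function InvoiceFileName(text)
    Dim pos, invNo
    InvoiceFileName = ""
    pos = InStr(1, text, "Invoice#", vbTextCompare)
    If pos > 0 Then
        invNo = Trim(Left(LTrim(Mid(text, pos + Len("Invoice#"))), 8))
        If invNo <> "" Then InvoiceFileName = invNo & ".pdf"
    End If
End Function

' Renames each PDF in folderPath after the invoice number it contains.
' Paths are collected first so the folder is not changed mid-enumeration;
' files with no label, or whose target name already exists, are skipped.
Sub RenameInvoices(folderPath)
    Dim fso, f, paths, p, newName
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set paths = CreateObject("Scripting.Dictionary")
    For Each f In fso.GetFolder(folderPath).Files
        If LCase(fso.GetExtensionName(f.Path)) = "pdf" Then paths.Add f.Path, True
    Next
    For Each p In paths.Keys
        newName = InvoiceFileName(FirstPageText(p))
        If newName <> "" Then
            If Not fso.FileExists(fso.BuildPath(folderPath, newName)) Then
                fso.MoveFile p, fso.BuildPath(folderPath, newName)
            End If
        End If
    Next
End Sub
```

Usage might look like `RenameInvoices "C:\LS\Test\Invoices\02_2011_PDF"`. Before running it on all 200 files, it is worth echoing `FirstPageText` for one invoice, since Acrobat's word splitting may not keep "Invoice#" together exactly as printed.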

Question by: uconnfb13
Accepted Solution

by: dev00790, earned 2000 total points
ID: 34938139
From: http://www.eggheadcafe.com/software/aspnet/33958218/reading-pdf-file.aspx
-----------------------------

In Excel:

Go into the VBA IDE (Alt-F11)
Go into Tools->References
Check all the Adobe Libraries

I have:
Adobe Acrobat 7.0 Browser Control Type Library 1.0
Adobe Acrobat 7.0 Type Library

Go into Object Browser

See if you can get a VBA Sub going that looks like this:

 
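Sub SearchPDF()
    Dim a As AcroAVDoc
    Dim b As Boolean
    Set a = New AcroAVDoc
    ' AVDoc.Open takes the file path and a window title ("" keeps the default)
    If a.Open("C:\mypdf.pdf", "") Then
        ' FindText(text, caseSensitive, wholeWordsOnly, reset) returns True if found
        b = a.FindText("SearchTextString", 1, 1, 1)
        MsgBox CStr(b)
        a.Close 1   ' 1 = close without saving
    End If
End Sub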



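*IF* you ever get that to work, the next step is to translate this into VBScript. (FindText takes four arguments: the search text, a case-sensitive flag, a whole-words-only flag and a reset flag, just as the question's script calls it.)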

Someone might be able to help you here with another post.
You'd need to convert this VBA:

Set a = New AcroAVDoc

into VBScript, which has no New for COM classes, so the object is created from its ProgID instead (the same ProgID the question's script uses):

Set a = CreateObject("AcroExch.AVDoc")



Oops. I forgot to tell you what to do once you get to the Object Browser.

You probably figured that out...

In the VBA IDE select View->Object Browser

In the drop-down in the middle of the page where it says <All Libraries>, select Acrobat.

Peruse the objects.

For example, click AcroAVDoc - and you see the method FindText.
Expert Comment

by: dev00790
ID: 34938142
Hope that helps