Link to home
Start Free TrialLog in
Avatar of rsburge
rsburgeFlag for United States of America

asked on

Search for a Time in a PDF using VBA

Hi Experts - I am using the attached function in an Access database to search for today's date in a PDF.  I need help adjusting this code (or writing a new function) to search for today's date and a time and store the time in 24 hour format in a variable labled "pdfTime".  I will be using this variable to compare it to the current database time.

For example, if the date/time are showing in the PDF in any of the following formats, I need to make sure the date is today's date and then store the time in the variable "pdfTime" in the 24 hour format.  

10/4/2011 9:25 PM
10/4/11 9:25 PM
10/4/2011 21:25
10/4/11 21:25

The end purpose of the function I need is to compare the current date/time in a PDF to the current database time and return False if the date/time in the PDF are less than the current database date/time.
Public Function PDFdatecheck() As Boolean
SDdrive = Forms("frmFirewall").txtSDdriveLetter.Value
'-----CHECK PDF DATE-----'
    'IAC objects
    'DoCmd.Hourglass True
    Dim gAvDoc As Object
    
    'variables
    Dim Resp 'For message box responses
    Dim gPDFPath As String
    Dim sText1 As String 'String to search for
    Dim sText2 As String 'String to search for
    Dim sText3 As String 'String to search for
    Dim sText4 As String 'String to search for
    Dim sText5 As String 'String to search for
    Dim sText6 As String 'String to search for
    Dim sText7 As String 'String to search for
    Dim sText8 As String 'String to search for
    Dim sText9 As String 'String to search for
    Dim sText10 As String 'String to search for
    Dim sText11 As String 'String to search for
    Dim sText12 As String 'String to search for
    Dim sText13 As String 'String to search for
    Dim sStr As String 'Message string
    Dim foundText1 As Integer 'Holds return value from "FindText" method
    Dim foundText2 As Integer 'Holds return value from "FindText" method
    Dim foundText3 As Integer 'Holds return value from "FindText" method
    Dim foundText4 As Integer 'Holds return value from "FindText" method
    Dim foundText5 As Integer 'Holds return value from "FindText" method
    Dim foundText6 As Integer 'Holds return value from "FindText" method
    Dim foundText7 As Integer 'Holds return value from "FindText" method
    Dim foundText8 As Integer 'Holds return value from "FindText" method
    Dim foundText9 As Integer 'Holds return value from "FindText" method
    Dim foundText10 As Integer 'Holds return value from "FindText" method
    Dim foundText11 As Integer 'Holds return value from "FindText" method
    Dim foundText12 As Integer 'Holds return value from "FindText" method
    Dim foundText13 As Integer 'Holds return value from "FindText" method
    Dim Respi
    Dim Path As String
    
    'hard coding for a PDF to open, it can be changed when needed.
        'Path = SDdrive & ":\aLXE-Pricing\Reunion\Branch300.pdf"
    gPDFPath = filePath
 
    'Initialize Acrobat by creating App object
    Set gApp = CreateObject("AcroExch.App")
    gApp.Hide
        
    'Set AVDoc object
    Set gAvDoc = CreateObject("AcroExch.AVDoc")
        
    ' open the PDF
    If gAvDoc.Open(gPDFPath, "") Then
        sText1 = Format(Date, "mmmm d, yyyy") 'enter your searchstring here
        sText2 = Format(Date, "mmm-d-yyyy")  'enter your searchstring here
        sText3 = Format(Date, "mm/dd/yy")  'enter your searchstring here
        sText4 = Format(Date, "mm/dd/yyyy")  'enter your searchstring here
        sText5 = Format(Date, "m/d/yyyy")  'enter your searchstring here
        sText6 = Format(Date, "m/d/yy")  'enter your searchstring here
        sText7 = Format(Date, "mm_dd_yyyy")  'enter your searchstring here
        sText8 = Format(Date, "dddd, mmmm dd, yyyy")  'enter your searchstring here
        sText9 = Format(Date, "mmm-dd-yy")  'enter your searchstring here
        sText10 = Format(Date, "mmmm dd, yyyy")  'enter your searchstring here
        sText11 = Format(Date, "mmm dd, yyyy")  'enter your searchstring here
        sText12 = Format(Date, "mmm d, yyyy")  'enter your searchstring here
        sText13 = Format(Date, "d-mmm-yy")  'enter your searchstring here
        
        'FindText params: StringToSearchFor, caseSensitive (1 or 0), WholeWords (1 or 0), ResetSearchToBeginOfDocument (1 or 0)
        foundText1 = gAvDoc.FindText(sText1, 1, 0, 1) 'Returns -1 if found, 0 otherwise
        foundText2 = gAvDoc.FindText(sText2, 1, 0, 1) 'Returns -1 if found, 0 otherwise
        foundText3 = gAvDoc.FindText(sText3, 1, 0, 1) 'Returns -1 if found, 0 otherwise
        foundText4 = gAvDoc.FindText(sText4, 1, 0, 1) 'Returns -1 if found, 0 otherwise
        foundText5 = gAvDoc.FindText(sText5, 1, 0, 1) 'Returns -1 if found, 0 otherwise
        foundText6 = gAvDoc.FindText(sText6, 1, 0, 1) 'Returns -1 if found, 0 otherwise
        foundText7 = gAvDoc.FindText(sText7, 1, 0, 1) 'Returns -1 if found, 0 otherwise
        foundText8 = gAvDoc.FindText(sText8, 1, 0, 1) 'Returns -1 if found, 0 otherwise
        foundText9 = gAvDoc.FindText(sText9, 1, 0, 1) 'Returns -1 if found, 0 otherwise
        foundText10 = gAvDoc.FindText(sText10, 1, 0, 1) 'Returns -1 if found, 0 otherwise
        foundText11 = gAvDoc.FindText(sText11, 1, 0, 1) 'Returns -1 if found, 0 otherwise
        foundText12 = gAvDoc.FindText(sText12, 1, 0, 1) 'Returns -1 if found, 0 otherwise
        foundText13 = gAvDoc.FindText(sText13, 1, 0, 1) 'Returns -1 if found, 0 otherwise
    Else
        ' if failed, show error message
        'Resp = MsgBox("Cannot open" & gPDFPath, vbOKOnly)
        PDFdatecheck = True
        gAvDoc.BringToFront
         Sleep 2000
         SendKeys ("^q"), True   'exit document
         Sleep 1500
        On Error Resume Next
        gApp.Exit
        Set gAvDoc = Nothing
        Set gApp = Nothing
        'numberlock
        Exit Function
    End If
 
 If foundText1 = -1 Or foundText2 = -1 Or foundText3 = -1 Or foundText4 = -1 Or foundText5 = -1 Or foundText6 = -1 Or foundText7 = -1 Or foundText8 = -1 Or foundText9 = -1 Or foundText10 = -1 Or foundText11 = -1 Or foundText12 = -1 Or foundText13 = -1 Then
        'compose a message
        'sStr = "Rates are Current"
        'Resp = MsgBox(sStr, vbOKOnly)
        gAvDoc.BringToFront
        Sleep 2000
        SendKeys ("^q"), True   'exit document
        Sleep 1500
        On Error Resume Next
        gApp.Exit
        Set gAvDoc = Nothing
        Set gApp = Nothing
        'SendKeys ("{ENTER}"), True   'exit document
  Else
        ' if failed, show error message
        'Resp = MsgBox("Today's Rates Not Available Yet", vbOKOnly)
        PDFdatecheck = True
        gAvDoc.BringToFront
         Sleep 2000
         SendKeys ("^q"), True   'exit document
         Sleep 1500
        On Error Resume Next
        gApp.Exit
        Set gAvDoc = Nothing
        Set gApp = Nothing
    End If
End Function

Open in new window

Avatar of Chris Raisin
Chris Raisin
Flag of Australia image

Stand By....
So is your function returning anything already? I gather it is not working, hence the question. What is the function returning presently?

What version of Access are you using and I assume you want the code in VB6 or VBA.

Are you looking for a date WITHIN the PDF or are you lookiing at the PDF's creation/modified date and time?

Once I clear up these questions I will get the answer for you ASAP.

Cheers
Chris (craisin)
Melbourne,Australia

One final thing, is the PDF any old PDF or are they a series of PDFs usually in the same format? The reason I ask is that I want to know if it is OK to test on any PDF I currently have. Perhaps you can supply a PDF if you want to ensure it works on yours.

Also, what if MULTIPLE dates and times appear in the PDF? Do we take just the first one found, or the latest? (Remember that there may be FUTURE dates in the text of a PDF if the author is writing about the future).

Cheers
Chris
You appear to be using a few "global" values in your code.

Can you supply the main module containing the "globals" as well please?
Further question...what references are you using?

I am looking for the library AccroExch. Where did you obtain that?
Avatar of rsburge

ASKER

Hi - Thank you for your help.  Answers to your questions are below...

1. This code does currently work - it opens a PDF and searches through all of the dates in the PDF looking for today's date and if found, closes the PDF and the remainder of my parent function runs.  If today's date isn't found, then the function returns FALSE and the remainder of the parent code ceases and moves on to something else.  I supplied this code because it is working, but I need to enhance it to look for date and time rather than just date, or create new code that will run after this current code runs if today's date is found that looks for the time in the PDF.  It could do something like if today's date is found, get the time associated with it and store it in pdfTime.

2.  Access 2007 and Yes, VBA

3. Date within the PDF

4.  This will be used on many different PDF's, but for now there are 5 that are the most urgent.  I will attach samples.  The only problem is all of the samples I have are from yesterday, 10/4.  Today's PDF's are not available yet.  I have highlighted the date/time that I need to look at in yellow.  So it isn't necessarily the first date found.  

5.  I will also attach the shared decs module.

SunTrustWhsl1.pdf PHH.pdf MetLife.pdf Merrimack.pdf AFRwholesale.pdf
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Public Declare Function ShellExecute Lib "shell32.dll" Alias "ShellExecuteA" (ByVal hwnd As Long, ByVal lpOperation As String, ByVal lpFile As String, ByVal lpParameters As String, ByVal lpDirectory As String, ByVal nShowCmd As Long) As Long
Public Declare Function FindWindow Lib "user32" Alias "FindWindowA" (ByVal lpClassName As String, ByVal lpWindowName As String) As Long
Public Declare Function PostMessage Lib "user32" Alias "PostMessageA" (ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, lParam As Any) As Long
Public Declare Function MoveFile Lib "kernel32" Alias "MoveFileA" (ByVal lpExistingFileName As String, ByVal lpNewFileName As String) As Long
Public Declare Function SendMessage Lib "user32" Alias "SendMessageA" (ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, lParam As Any) As Long
Public Declare Function SetFocusAPI Lib "user32" Alias "SetForegroundWindow" (ByVal hwnd As Long) As Long
Public WinWnd As Long, windowName As String, retval As Long, lpClassName As String
'Public Declare Function SetFocusAPI Lib "user32" Alias "SetForegroundWindow" _
'    (ByVal hwnd As Long) As Long
Public Declare Function GetWindow Lib "user32" _
    (ByVal hwnd As Long, ByVal wCmd As Long) As Long
Public Declare Function GetDesktopWindow Lib "user32" () As Long
Public Declare Function GetWindowLW Lib "user32" Alias "GetWindowLongA" _
    (ByVal hwnd As Long, ByVal nIndex As Long) As Long
Public Declare Function GetParent Lib "user32" (ByVal hwnd As Long) As Long
Public Declare Function GetClassName Lib "user32" Alias "GetClassNameA" _
    (ByVal hwnd As Long, ByVal lpClassName As String, _
     ByVal nMaxCount As Long) As Long
Public Declare Function GetWindowText Lib "user32" Alias "GetWindowTextA" _
    (ByVal hwnd As Long, ByVal lpString As String, ByVal cch As Long) As Long

'*********************************************************************************************************************************************************

Public Const GWL_ID = (-12)
Public Const GW_HWNDNEXT = 2
Public Const GW_CHILD = 5
Public Const SW_SHOWNORMAL = 1
Public Const WM_CLOSE = &H10


'*********************************************************************************************************************************************************

Public TargetUrl As String
Public blnComp As Boolean
Public strHtml As String
Public sPos As Long
Public ePos As Long
Public ReturnValue
Public IE As Object
Public IEb As Object
Public xlApp As Object
Public xlws As Excel.Worksheet
Public xlwb As Excel.Workbook
Public xlapp2 As Object
Public xlws2 As Excel.Worksheet
Public xlwb2 As Excel.Workbook
Public xlwb3 As Excel.Workbook
Public xlwb4 As Excel.Workbook
Public sheet As Excel.Worksheet
Public WshShell As Object

'*********************************************************************************************************************************************************

Public attchnm As String, atttchnm2 As String    '<----- Used in ST functions; this is the usual name of the email attachment
Public bn As String '<----- Bot Number
Public cwrg As String
Public Cnm As String    'name of client to sync
Public LXSync As String 'URL for the client to sync
Public filePath As String, filePath2 As String, filePath3 As String
Public fName As String  '<----- file name for saving file
Public ln As String, LNm As String '<----- Lender ID & lender name
Public layout As String '<----- path for pdf2xl layout
Public kw As String '<----- Bot Keyword
Public ebot As String   '<----- external bot path
Public ieLink, I As Long
Public ibot As String   '<----- internal bot path
Public extb As String, intb As String, extab As String, intab As String  '---- ext is for external and int is for internal base url for bots
Public un As String, userID As String   '<----- website username userID is for the company code
Public pw As String '<----- website password
Public polgov As String, polconv As String  '<-----file save locations for polaris
Public merge1 As String, merge2 As String, merge3 As String, merge4 As String, merge5 As String
Public psid As String  '<-----price sheet id retrieved from LXE
Public rng As String, tsrng As String, tsrng2 As String, tsrng3 As String, tsrng4 As String
Public sec1 As String, sec2 As String, sec3 As String   '<----- for Amtrust security questions
Public strSQL As String, strBFO As String   '<----- to define the SQL statements
Public urlExpire As String  '<-----url used to update status to expired
Public urlInactive As String  '<-----url used to update status to inactive
Public urlDoc As String
Public urlMain As String
Public urlPDF  As String
Public urlXls As String
Public vresponse As String
Public xlsx As String
Public st As String, st2 As String
Public sResponse1 As String, sResponse2 As String, sResponse3 As String, sResponse4 As String, sResponse5 As String  '<----- set to True or False from Function to use later in code  (see BoAc for examples)
Public chaseResponse1 As String, chaseResponse2 As String, chaseResponse3 As String, chaseResponse4 As String, chaseResponse5 As String  '<----- set to True or False from Function to use later in code  (Chase)
Public usbcResponse1 As String, usbcResponse2 As String  '<----- set to True or False from Function to use later in code  (USBank Corr)
Public usbwResponse1 As String, usbwResponse2 As String  '<----- set to True or False from Function to use later in code  (USBank Whsl)
Public mlResponse1 As String, mlResponse2 As String, mlResponse3 As String, mlResponse4 As String, mlResponse5 As String '<----- set to True or False from Function to use later in code  (MetLife)
Public boaResponse1 As String, boaResponse2 As String, boaResponse3 As String, boaResponse4 As String, boaResponse5 As String  '<----- set to True or False from Function to use later in code  (BoAc)
Public atResponse1 As String, atResponse2 As String, atResponse3 As String, atResponse4 As String, atResponse5 As String, atResponse6 As String  '<----- set to True or False from Function to use later in code  (Amtrust)
Public citiResponse1 As String, citiResponse2 As String  '<----- set to True or False from Function to use later in code  (Citi)
Public ebResponse1 As String, ebResponse2 As String, ebResponse3 As String  '<----- set to True or False from Function to use later in code  (EverBank)
Public pfResponse1 As String, pfResponse2 As String  '<----- set to True or False from Function to use later in code  (Provident)
Public spResponse1 As String, spResponse2 As String, spResponse3 As String, spResponse4 As String '<----- set to True or False from Function to use later in code  (Sierra Pacific)
Public crescentResponse1 As String  '<----- set to True or False from Function to use later in code  (Crescent)
Public tmplt As String
Public USBnm As String
Public EAST As String, CENTRAL As String, WEST As String    '----for wells fargo regions
'Public SDdrive As String
Public SDletter As String
Public Sync As String
Public syncLender As String


'*********************************************************************************************************************************************************

Public db As DAO.Database
Public rs As DAO.Recordset
Public rs2 As DAO.Recordset

'*********************************************************************************************************************************************************
'Private Declare Function FindWindow Lib "user32" Alias "FindWindowA" (ByVal lpClassName As String, ByVal lpWindowName As String) As Long
Public Declare Function FindWindowEx Lib "user32" Alias "FindWindowExA" (ByVal HWND1 As Long, ByVal HWND2 As Long, ByVal lpsz1 As String, ByVal lpsz2 As String) As Long
Public Declare Function SetActiveWindow Lib "user32.dll" (ByVal hwnd As Long) As Long
Public Declare Function GetWindowTextLength Lib "user32" Alias "GetWindowTextLengthA" (ByVal hwnd As Long) As Long
'public Declare Function SendMessage Lib "user32" Alias "SendMessageA" (ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, ByVal lParam As Long) As Long
Public Declare Function SendMessageByString Lib "user32" Alias "SendMessageA" (ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, ByVal lParam As String) As Long
Public Declare Function SetForegroundWindow Lib "user32" (ByVal hwnd As Long) As Long
'public Declare Function PostMessage Lib "user32" Alias "PostMessageA" (ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, ByVal lParam As Long) As Long
Public Declare Function GetWindowRect Lib "user32.dll" (ByVal hwnd As Long, lpRect As RECT) As Long
Public Declare Function SetCursorPos Lib "user32.dll" (ByVal x As Integer, ByVal y As Integer) As Integer

Public Type RECT
        Left As Long
        Top As Long
        Right As Long
        Bottom As Long
End Type

Public Const WM_SETTEXT = &HC
Public Const WM_GETTEXT = &HD
Public Const WM_GETTEXTLENGTH = &HE
Public Const BM_CLICK = &HF5

Open in new window

Avatar of rsburge

ASKER

Let me look into that AccroExch - I can't remember.  It may be because I am also using Adobe Acrobat 9.0 Professional and not just a reader.

Would it be helpful if I attach files with today's date once they become available?
Its OK. I went in VBA within Excel and it all seemed to work fine!
(Goodness knows what library holds it)

I am changing the function slightly to accept a filename and also a date on which to search. Thf function will then return a value in a certain format.

I did that so the function can be used on a much bigger scale so it does not matter that the date is not todays date (really jhelps the testing!).

I ran across one slight problem to start off...in Austral;ia the date is 6th October, and we write that as 06/10/2011. To make the function truly generic I will also include a "opattern" parameter which will default to your format of "mm/dd/yy"

Anyway...things are goinf well

Stand by....

Chris
I am getting a lot of messages when the date is NOT found.

Is that supposed to happen?

It appears to be automatically produced by the function gAvDoc.FindText()
Avatar of rsburge

ASKER

I thought I had commented out the message boxes, but I might not have.  

Are the messages saying "Today's Rates Not Available Yet"
Yes...That is correct.

There is a way around it nut it is complicated.

You need to read all the text in from the PDF and then search the sords extracted
using something like "Instr()"

Actually that would be a good way to go, because it makes the finsing of the time so much easier.

Should I proceed with that method?

Off to bed now since it is 4am and I need at least 4 hours sleep!  :-)
I must be getting tired....Repeating my last mesage so it makes sense:


Yes...That is correct.

There is a way around it but it is complicated.

You need to read all the text in from the PDF and then search the words extracted
using something like "Instr()"

Actually that would be a good way to go, because it makes the finding of the time so much easier.

Should I proceed with that method?

Off to bed now since it is 4am and I need at least 4 hours sleep!  :-)
(It is now 4.23 am actually)
Avatar of rsburge

ASKER

Thank you again for all of your help!  I hope you get some sleep.  :)

The method you mention would be fine as long as it won't run too slowly when searching through the PDF's.

If it helps, the search can be limited to the first 3 pages of any pdf...  the effective date and time are always within the first 3 pages; although some have only 1 page and some have over 10 pages.
OK thanks.

The speed is not an issue (it is always very fast).

Please let me know what references you are using. For some reason I changed mine and now the PDFs are becoming corrupt when I search them.
(Look under "Tools/References")

Cheers
Chris
Avatar of rsburge

ASKER

I am away from computer right now...  I will look as soon as I can.
Avatar of rsburge

ASKER

There are a whole slew of references...  I didn't do the initial design, so there could be several in here that I don't need.  I will attach file for you that shows all of the references References.pdf
OK....thanks for that...stand by
For some reason I keep getting "corruption" messages, and yet first time I used your PDF files they worked!

I think I am having problems becasue Acrobat 5 keeps firing up when I fire up Acrobat using the line:
     'Initialize Acrobat by creating App object
     Set gApp = CreateObject("AcroExch.App")

Trying to see if we can capture text by other means.
Stand by......
Avatar of rsburge

ASKER

Would it make any difference that the code I supplied works for acrobat 9.0?
No...its OK...have made a major breakthrough and am now just picking up times.

I need time patterns,

e.g.   "09:10AM"
         "13:45:00"
         "15.41PM"
         "07:19 PM"

or are they all the same format?

Have done away with need to use AcroEch altogether and it runs VERY fast.
I think you will be pleased.

I await your patterns.
OK I have it working.

I just re-read your question and I think I have it all in hand now.

I will post some TEST code for you to look at, and we can tweak it where needed.

Stand by.....
Avatar of rsburge

ASKER

Excellent!  Thank you so much for all of your help!

The format of the time can vary from PDF to PDF.  That makes it a nightmare, right?  I have seen all of the formats you listed plus others.  Do you need me to go through all of the PDFs I get and figure out all of the formats?
It is strange, but every file returns the correct time, except the file "AFRWholesale".

Can you think of a reason for that?

I attach the associated text file that is being searched.

Cheers
Chris
SunTrustWhsl1.txt
Avatar of rsburge

ASKER

Hmmm...  The AFR file is a PDF printed file...  In other words, it comes in a completely locked down Excel file and I have other code that prints it to PDF.  Could that be it; that it is printed to PDF?  Could the code possibly read a protected excel file too?  If so, I could read that before printing it to PDF.
Its OK...I have fixed it!  :-)

It was because the text was being written out as a continuous line and there were multiple occurrences of the "times" in the same line.

I changed the extracted text output to be "text extractyed with same structure as showing in PDF" and that fixed the problem.

OK...here it all is.

There are several functions you need so I will list them all, one post at a time.

The code "Main" is test code to show how the process works.
I wrote this in VB6 but it should run fine as VBA in the editor within Access/Excel.

In "Main" you need to change a couple of lines to indicate the date for which you are searching and also the path in which the PDF files are stored. Of course at "real run time" the date you will search for will be "date()"

Please note that there is one external executable file you need to install in the same directory as the PDF files. It is a public domain executable and it took me HOURS to find it on the net. It is called "PDFToText.exe" and is probably written in "C" (source code not available). It is normally run from the DOS "command" line and if you go into "Cmd" from the "Start" button on your sektop and simply type in "PDFTotext.exe" and press Enter you can see the parameters it accepts.

Luckily it runs very fast and is not noticible when called from another program (like out VBA code). You have to give it time to execute before it moves on to the rest of the VB code, so I have included some code I wrote to pause until the executable has finished (subroutine called "IsRunning") .

All the code starting with "CKR" is my own code (my initilas are CKR).

Give it a go and see if it does what you want.
If so, all you will need to do is (in your own code) call "PDFFindDateTime" which in turn calls "FindTime". The value returned is the FIRST time found AFTER the the program fiunds the first occurrence of the stipulated date. The date returend is in 24 hour format (although the code does allow you to change this to AM/PM if desired).

You will also need to include the code for the various "CKR" subroutines unless you can find alternatives:
                    CKRFileExists
                    CKRIsProcessRunning
                    CKRIsRunning
                    CKRKillProcess
                    CKRPadL

The references I have used within VB6 are:
     Visual Basic for Applications
     Visual Basic Runtime objects and procedures
     Visual Basic objects and Procedures
     Adobe Acrobat Browser Control Type Library 1.0

The executable file you need is "PDFToText.exe" which is available from the following link I have set up for you:

    https://filedb.experts-exchange.com/incoming/ee-stuff/8149-pd.renamed-from-exe-for-your-safety

Now here comes the code...Please stand by.
There should be 6 submissions.


'Code 1 of 6

Option Explicit

Public Sub Main()
  Dim dDate    As Date      'date to be searched (normally today's date
  Dim cPath    As String
  Dim cFile    As String
  Dim nFile    As Integer
  Dim cRet     As String
  Dim cRet1    As String
  Dim cFiles() As String 'All the PDF file names are stored in this array
  ReDim cFiles(0)
  'In this test code change the values of the next two lines
  'to suit
  dDate = CDate("4/10/2011")
  'Path where the PDF files are stored
  'Note: The executable "PDFToText.exe" must be stored in that directory also
  cPath = "F:\DEV\# EE AND IT\FINDDATETIMEINPDF\"

  'Just a check a final backslash is stored in the path
  If Right$(cPath, 1) <> "\" Then
    cPath = cPath + "\"

  End If

  ChDir cPath
  cFile = Dir(cPath + "\*.pdf")

  Do While cFile <> ""
    ReDim Preserve cFiles(UBound(cFiles) + 1)
    cFiles(UBound(cFiles)) = cPath + cFile
    cFile = Dir
  Loop

  For nFile = 1 To UBound(cFiles)
    cRet1 = PDFFindDateTime(cFiles(nFile), dDate, False) 'Get time in 24 hourformat
    cRet = cRet + cFiles(nFile) + ":" + cRet1 + vbCrLf
  Next
  MsgBox Trim(cRet)
  End

End Sub

Public Function PDFFindDateTime(cPDFFile As String, _
                                dDate As Date, _
                                bAMPM As Boolean) As String
  Dim sText(13) As String
  Dim x         As Integer
  Dim cLine     As String
  Dim nHandle   As Integer
  Dim BFound    As Boolean
  Dim cRun      As String
  Dim nRet      As Long
  Dim strTime   As String

  If Dir(cPDFFile) <> "" Then
    sText(1) = Format(dDate, "mmmm d, yyyy") 'enter your searchstring here
    sText(2) = Format(dDate, "mmm-d-yyyy")  'enter your searchstring here
    sText(3) = Format(dDate, "mm/dd/yy")  'enter your searchstring here
    sText(4) = Format(dDate, "mm/dd/yyyy")  'enter your searchstring here
    sText(5) = Format(dDate, "m/d/yyyy")  'enter your searchstring here
    sText(6) = Format(dDate, "m/d/yy")  'enter your searchstring here
    sText(7) = Format(dDate, "mm_dd_yyyy")  'enter your searchstring here
    sText(8) = Format(dDate, "dddd, mmmm dd, yyyy")  'enter your searchstring here
    sText(9) = Format(dDate, "mmm-dd-yy")  'enter your searchstring here
    sText(10) = Format(dDate, "mmmm dd, yyyy")  'enter your searchstring here
    sText(11) = Format(dDate, "mmm dd, yyyy")  'enter your searchstring here
    sText(12) = Format(dDate, "mmm d, yyyy")  'enter your searchstring here
    sText(13) = Format(dDate, "d-mmm-yy")  'enter your searchstring here
    cRun = Chr(34) + App.Path + "\PDFtoText.exe" + Chr(34) + " -layout " + Chr(34) + cPDFFile + Chr(34) + " " + Chr(34) + App.Path + "\PDF2Text.txt" + Chr(34)
    nRet = Shell(cRun, vbMinimizedNoFocus)

    'don't continue until the PDFTText.exe has finished
    Do While CKRIsRunning("PDFTOTEXT.EXE")
      DoEvents
    Loop

    If nRet > 0 Then   'The program ran successfully

      For x = 1 To UBound(sText)
        nHandle = FreeFile
        Open App.Path + "\PDF2Text.txt" For Input As #nHandle

        Do While Not EOF(nHandle)
          Line Input #nHandle, cLine

          If InStr(cLine, sText(x)) > 0 Then
            PDFFindDateTime = cLine
            BFound = True
            strTime = FindTime(nHandle, cLine, bAMPM)
            Exit Do

          End If

        Loop
        Close #nHandle

        If BFound Then
          Exit For

        End If

      Next
    Else
      MsgBox "Error: PDFToText.exe could not run!"
      End

    End If

  End If

  PDFFindDateTime = strTime

End Function

Private Function FindTime(nHandle As Integer, cLine As String, bAMPM As Boolean) As String
  Dim cRet       As String
  Dim cTimes(12) As String
  Dim cToken()   As String
  Dim BFound     As Boolean
  Dim nToken     As Integer
  Dim y          As Integer
  'The various formats for time
  cTimes(1) = "#:##"
  cTimes(2) = "#.##"  'if followed by AM or PM this is a time, else normal decimal
  cTimes(3) = "##:##"
  cTimes(4) = "##.##" 'if followed by AM or PM this is a time, else normal decimal
  cTimes(5) = "#:##AM"
  cTimes(6) = "#.##AM"
  cTimes(7) = "##:##AM"
  cTimes(8) = "##.##AM"
  cTimes(9) = "#:##PM"
  cTimes(10) = "#.##PM"
  cTimes(11) = "##:##PM"
  cTimes(12) = "##.##PM"

  Do While Not EOF(nHandle)
    cToken() = Split(Trim(cLine), " ")

    For nToken = 0 To UBound(cToken)
      For y = 1 To UBound(cTimes)

        If UCase$(cToken(nToken)) Like cTimes(y) Then
          If InStr(UCase(cToken(nToken)), "AM") = 0 And InStr(UCase(cToken(nToken)), "PM") = 0 Then

            If Len(cToken(nToken + 1)) > 0 Then
              If InStr("AM PM", UCase(cToken(nToken + 1))) > 0 Then
                If bAMPM Then
                  FindTime = cToken(nToken) + " " + cToken(nToken + 1)
                Else

                  If UCase$(cToken(nToken + 1)) = "PM" Then
                    FindTime = Left$(cToken(nToken), InStr(cToken(nToken), ":") - 1)

                    If Val(FindTime) < 12 Then
                      FindTime = CStr(Val(FindTime) + 12) + Mid$(cToken(nToken), InStr(cToken(nToken), ":"))
                    Else
                      FindTime = cToken(nToken)

                    End If

                  Else
                    FindTime = cToken(nToken)

                  End If

                End If

                Exit Do

              End If

            End If

          Else

            If InStr(cToken(nToken), ":") > 0 Then
              FindTime = Left$(cToken(nToken), InStr(cToken(nToken), ":") - 1)

              If UCase(Right$(cToken(nToken), 2)) = "PM" Then
                If Val(FindTime) < 12 Then
                  FindTime = CStr(Val(FindTime) + 12) + Mid$(cToken(nToken), InStr(cToken(nToken), ":"), 3)
                Else
                  FindTime = FindTime + Mid$(cToken(nToken), InStr(cToken(nToken), ":"), 3)

                End If

              Else
                FindTime = Left(cToken(nToken), Len(cToken(nToken)) - 2)

              End If

              Exit Do
            Else
              FindTime = Left$(cToken(nToken), InStr(cToken(nToken), ".") - 1)

              If UCase(Right$(cToken(nToken), 2)) = "PM" Then
                FindTime = CStr(Val(FindTime) + 12) + Mid$(cToken(nToken), InStr(cToken(nToken), "."), 3)
              Else
                FindTime = Left(cToken(nToken), Len(cToken(nToken)) - 2)

              End If

              Exit Do

            End If

          End If

        End If

      Next y
    Next nToken

    Line Input #nHandle, cLine
  Loop

End Function

Open in new window


'Code 2 of 6
Option Explicit

Public Function CKRFileExists(FileName) As Boolean
  Dim fs As Object
  Set fs = CreateObject("Scripting.FileSystemObject")
  CKRFileExists = fs.FileExists(FileName)
End Function

Open in new window


'Code 3 of 6
Option Explicit

Function CKRIsProcessRunning(cProgram As String, _
                             Optional lDontErase As Boolean, _
                             Optional ExcludeThisInstance As Boolean, _
                             Optional bKillInstance As Boolean) As Boolean
  Dim process              As Object
  Dim nHandleOut           As Long
  Dim cTime                As Variant
  Dim cTempFile            As String
  Dim dDate                As Date
  Dim nFound               As Integer
  Dim cRun                 As String
  Dim nMaxInstancesAllowed As Integer
  cTime = Time
  dDate = Date
  CKRIsProcessRunning = False

  If ExcludeThisInstance Then
    nMaxInstancesAllowed = 1   'cannot count the current one running!
  Else
    nMaxInstancesAllowed = 0

  End If

  nHandleOut = FreeFile
  'Produce a file for debugging purposes
  'If you want to see what IS running comment out the
  'KILL block at the end of this code
  cTempFile = App.Path + "\WindowsList_" + CStr(Year(dDate)) + CKRPadL(CStr(Month(dDate)), 2, "0") + CKRPadL(CStr(Day(dDate)), 2, "0") + "_" + CKRPadL(CStr(Hour(cTime)), 2, "0") + CKRPadL(CStr(Minute(cTime)), 2, "0") + CKRPadL(CStr(Second(cTime)), 2, "0") + ".txt"
  Open cTempFile For Output As #nHandleOut

  For Each process In GetObject("winmgmts://").InstancesOf("win32_process")

    Print #nHandleOut, UCase(process.Name)

    If UCase(process.Name) = UCase(cProgram) + ".EXE" Then
      nFound = nFound + 1

      If nFound > nMaxInstancesAllowed Then
        If bKillInstance Then
          If MsgBox("Do you wish to stop " + cProgram + "?", vbQuestion + vbYesNo) = vbYes Then
            CKRKillProcess cProgram

          End If

        End If

        CKRIsProcessRunning = True
        Exit For

      End If

    End If

  Next
  Close #nHandleOut

  If Not lDontErase Then
    If CKRFileExists(cTempFile) Then
      Kill cTempFile

    End If

  End If

End Function

Open in new window


'code 4 of 6
Option Explicit
Private Const MAX_PATH = 260
Private Const PROCESS_TERMINATE = &H1
Private Const PROCESS_CREATE_THREAD = &H2
Private Const PROCESS_VM_OPERATION = &H8
Private Const PROCESS_VM_READ = &H10
Private Const PROCESS_VM_WRITE = &H20
Private Const PROCESS_DUP_HANDLE = &H40
Private Const PROCESS_CREATE_PROCESS = &H80
Private Const PROCESS_SET_QUOTA = &H100
Private Const PROCESS_SET_INFORMATION = &H200
Private Const PROCESS_QUERY_INFORMATION = &H400

Type MODULEINFO

  lpBaseOfDLL As Long
  SizeOfImage As Long
  EntryPoint As Long

End Type

Type PROCESS_MEMORY_COUNTERS

  cb As Long
  PageFaultCount As Long
  PeakWorkingSetSize As Long
  workingSetSize As Long
  QuotaPeakPagedPoolUsage As Long
  QuotaPagedPoolUsage As Long
  QuotaPeakNonPagedPoolUsage As Long
  QuotaNonPagedPoolUsage As Long
  PagefileUsage As Long
  PeakPagefileUsage As Long

End Type

Type PSAPI_WS_WATCH_INFORMATION

  FaultingPc As Long
  FaultingVa As Long

End Type

Public Declare Function EmptyWorkingSet Lib "psapi.dll" (ByVal hProcess As Long) As Long
Public Declare Function EnumDeviceDrivers _
               Lib "psapi.dll" (lpImageBase() As Long, _
                                ByVal cb As Long, _
                                lpcbNeeded As Long) As Long
Public Declare Function EnumProcesses _
               Lib "psapi.dll" (lpidProcess As Long, _
                                ByVal cb As Long, _
                                cbNeeded As Long) As Long
Public Declare Function EnumProcessModules _
               Lib "psapi.dll" (ByVal hProcess As Long, _
                                lphModule As Long, _
                                ByVal cb As Long, _
                                lpcbNeeded As Long) As Long
Public Declare Function GetDeviceDriverBaseName _
               Lib "psapi.dll" _
               Alias "GetDeviceDriverBaseNameA" (ByVal ImageBase As Long, _
                                                 ByVal lpBaseName As String, _
                                                 ByVal nSize As Long) As Long
Public Declare Function GetDeviceDriverFileName _
               Lib "psapi.dll" _
               Alias "GetDeviceDriverFileNameA" (ByVal ImageBase As Long, _
                                                 ByVal lpFileName As String, _
                                                 ByVal nSize As Long) As Long
Public Declare Function GetMappedFileName _
               Lib "psapi.dll" _
               Alias "GetMappedFileNameA" (ByVal hProcess As Long, _
                                           ByVal lpv As Long, _
                                           ByVal lpFileName As String, _
                                           ByVal nSize As Long) As Long
Public Declare Function GetModuleBaseName _
               Lib "psapi.dll" _
               Alias "GetModuleBaseNameA" (ByVal hProcess As Long, _
                                           ByVal hModule As Long, _
                                           ByVal lpFileName As String, _
                                           ByVal nSize As Long) As Long
Public Declare Function GetModuleFileNameEx _
               Lib "psapi.dll" _
               Alias "GetModuleFileNameExA" (ByVal hProcess As Long, _
                                             ByVal hModule As Long, _
                                             ByVal lpFileName As String, _
                                             ByVal nSize As Long) As Long
Public Declare Function GetModuleInformation _
               Lib "psapi.dll" (ByVal hProcess As Long, _
                                ByVal hModule As Long, _
                                lpmodinfo As MODULEINFO, _
                                ByVal cb As Long) As Long
Public Declare Function GetProcessMemoryInfo _
               Lib "psapi.dll" (ByVal hProcess As Long, _
                                ppsmemCounters As PROCESS_MEMORY_COUNTERS, _
                                ByVal cb As Long) As Long
Public Declare Function GetWsChanges _
               Lib "psapi.dll" (ByVal hProcess As Long, _
                                lpWatchInfo As PSAPI_WS_WATCH_INFORMATION, _
                                ByVal cb As Long) As Long
Public Declare Function InitializeProcessForWsWatch _
               Lib "psapi.dll" (ByVal hProcess As Long) As Long
Public Declare Function QueryWorkingSet _
               Lib "psapi.dll" (ByVal hProcess As Long, _
                                pv As Long, _
                                ByVal cb As Long) As Long
Private Declare Function OpenProcess _
                Lib "kernel32" (ByVal dwDesiredAccess As Long, _
                                ByVal bInheritHandle As Long, _
                                ByVal dwProcessID As Long) As Long
Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
   
'Using the API
'This example demonstrates how to enumerate all running process and provides information about each of the
'modules running, along with their working set usage. The first thing to do is to get a list of running
'processes. The EnumProcesses takes an array which you must pre-initialize to the maximum number of processes
'you can work with, then tells you how many were actually returned.
'The final parameter (optional) allow you to "Kill" any instances found.
Public Function CKRIsRunning(cProgram As String, _
                             Optional lDontEraseListing As Boolean, _
                             Optional bExcludeThisInstance As Boolean, _
                             Optional bKillInstance As Boolean, _
                             Optional bStartRunning As Boolean) As Boolean
  Dim aProcesses() As Long
  Dim cbNeeded     As Long
  Dim cProcesses   As Long
  Dim lProcessId   As Long
  'If nTestForInstances = 0 Then
  '  nTestForInstances = 1
  'End If
  'now superceded by process IsProcessRunning
  CKRIsRunning = CKRIsProcessRunning(cProgram, lDontEraseListing, bExcludeThisInstance, bKillInstance)
  Exit Function
  ReDim aProcesses(0 To 1023) As Long

  If (EnumProcesses(aProcesses(0), 1024 * 4, cbNeeded) <> 0) Then
    '// Calculate how many process identifiers were returned.
    cProcesses = cbNeeded / 4

  End If

  'Having got a process list, you can then find the specific information about each process.
  'The first thing to do is to convert the process ID into a handle to a process:
  Dim hProcess As Long
  hProcess = OpenProcess(PROCESS_QUERY_INFORMATION Or PROCESS_VM_READ, 0, lProcessId)
  CloseHandle hProcess
  'Once you have a handle, you can then find the memory usage information and a list of modules
  'associated with the process. The list of modules is obtained using an array in the same manner
  'as getting a list of processes in the first place:
  Dim hMod(0 To 1023) As Long
  'Dim cbNeeded As Long
  Dim cModules        As Long
  Dim tPMC            As PROCESS_MEMORY_COUNTERS
  Dim workingSetSize  As Long
  GetProcessMemoryInfo hProcess, tPMC, Len(tPMC)
  workingSetSize = tPMC.workingSetSize

  If (hProcess <> 0) Then
    If (EnumProcessModules(hProcess, hMod(0), 1024 * 4, cbNeeded)) Then
      cModules = cbNeeded \ 4

    End If

  End If

  'Then you can find the name, file name and information about the memory usage of the process:
  Dim szProcessName   As String
  Dim processName     As String
  Dim processFileName As String
  Dim lR              As Long
  Dim i               As Integer
  Dim lLen            As Integer

  For i = 0 To cModules - 1
    szProcessName = String$(MAX_PATH, 0)
    LSet szProcessName = "unknown"
    lR = GetModuleBaseName(hProcess, hMod(i), szProcessName, lLen)
    processName = szProcessName
    szProcessName = String$(MAX_PATH, 0)
    LSet szProcessName = "unknown"
    lR = GetModuleFileNameEx(hProcess, hMod(i), szProcessName, lLen)
    processFileName = szProcessName

    If Left(LCase(processName), Len(cProgram)) = LCase(cProgram) Then
      CKRIsRunning = True

      If bKillInstance Then
        If MsgBox("Do you want to stop " + cProgram + " ?", vbYesNo + vbQuestion) = vbYes Then
          CKRKillProcess cProgram

        End If

      End If

      If bStartRunning Then
        OpenProcess 0, 0, hProcess

      End If

      Exit Function

    End If

  Next i

End Function

'The demonstration application uses these techniques to present a complete process list with names, file names and memory usage in a ListView.
'Conclusion
'This tip demonstrates how to use the PSAPI.DLL functions under Windows NT/2000/XP to investigate running
'processes. You can use this information to monitor memory usage of an application, or to look into which
'DLLs a particular application is using. See also the GUI Resource Tracer utility which uses these techniques
'to provide a correct task list matched against the running processes.

Open in new window


'Code 5 of 6
Option Explicit
Public Declare Function OpenProcess _
               Lib "Kernel32.dll" (ByVal dwDesiredAccessas As Long, _
                                   ByVal bInheritHandle As Long, _
                                   ByVal dwProcId As Long) As Long
Public Declare Function EnumProcesses _
               Lib "psapi.dll" (ByRef lpidProcess As Long, _
                                ByVal cb As Long, _
                                ByRef cbNeeded As Long) As Long
Public Declare Function EnumProcessModules _
               Lib "psapi.dll" (ByVal hProcess As Long, _
                                ByRef lphModule As Long, _
                                ByVal cb As Long, _
                                ByRef cbNeeded As Long) As Long
Const MAX_PATH = 260
Public Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
Public Declare Function GetModuleFileNameExA _
               Lib "psapi.dll" (ByVal hProcess As Long, _
                                ByVal hModule As Long, _
                                ByVal ModuleName As String, _
                                ByVal nSize As Long) As Long
Public Declare Function TerminateProcess _
               Lib "kernel32" (ByVal hProcess As Long, _
                               ByVal uExitCode As Long) As Long
Private Const PROCESS_QUERY_INFORMATION As Long = 1024
Private Const PROCESS_ALL_ACCESS        As Long = &H1F0FFF
Private Const PROCESS_VM_READ           As Long = &H10

 
Public Function CKRKillProcess(cProcess As String) As Long
  Dim cb                As Long
  Dim cbNeeded          As Long
  Dim NumElements       As Long
  Dim ProcessIDs()      As Long
  Dim cbNeeded2         As Long
  Dim NumElements2      As Long
  Dim Modules(1 To 200) As Long
  Dim nRet              As Long
  Dim ModuleName        As String
  Dim nSize             As Long
  Dim hProcess          As Long
  Dim i                 As Long
  Dim bTerminated       As Boolean

  If Len(Trim(cProcess)) > 0 Then
    'Get the array containing the process id's for each process object
    cb = 8
    cbNeeded = 96

    Do While cb <= cbNeeded
      cb = cb * 2
      ReDim ProcessIDs(cb / 4) As Long
      nRet = EnumProcesses(ProcessIDs(1), cb, cbNeeded)
    Loop
    NumElements = cbNeeded / 4

    For i = 1 To NumElements
      'Get a handle to the Process
      hProcess = OpenProcess(PROCESS_QUERY_INFORMATION Or PROCESS_VM_READ, 0, ProcessIDs(i))

      'Got a Process handle
      If hProcess <> 0 Then
        'Get an array of the module handles for the specified                 'process
        nRet = EnumProcessModules(hProcess, Modules(1), 200, cbNeeded2)

        'If the Module Array is retrieved, Get the ModuleFileName
        If nRet <> 0 Then
          ModuleName = Space(MAX_PATH)
          nSize = 500
          nRet = GetModuleFileNameExA(hProcess, Modules(1), ModuleName, nSize)
          ModuleName = Trim(Left(ModuleName, nRet))

          If LCase(Right(Trim(ModuleName), Len(cProcess) + 4)) = (LCase(cProcess) + ".exe") Then
            hProcess = OpenProcess(PROCESS_ALL_ACCESS, False, ProcessIDs(i))
            TerminateProcess hProcess, 0
            bTerminated = True 'flag to determine whether to continue in the loop or not

          End If

        End If

      End If

      'Close the handle to the process
      nRet = CloseHandle(hProcess)

      If bTerminated Then
        MsgBox cProcess + " Terminated"
        Exit For

      End If

    Next
    CKRKillProcess = nRet

    If Not bTerminated Then
      MsgBox cProcess + " could NOT be Terminated"

    End If

  End If

End Function

Open in new window


'Code 6 of 6
Option Explicit

Public Function CKRPadL(cString As String, _
                        nLen1 As Integer, _
                        Optional cPad As String) As String

  'this function pads a certain character to the left of s string
  'until the string is a certain length
  'cString is the string to which characters are to be pre-pended
  'nLen1 is the desired length of the new string with the pre-pended characters
  'cPad is the optional) character to be prepended. If not defined then spaces will be pre-pended
  If cPad = "" Then
    cPad = " "

  End If

  'remove all leading and trailing spaces from current string
  cString = LTrim(RTrim(cString))

  'while the working string is shorter than the desired length
  Do While Len(cString) < nLen1
    'prepend one desired character to the working string
    cString = cPad + cString
  Loop
  'return the working string as the newly created string with leading characters
  CKRPadL = cString

End Function

Open in new window

Whew!   :-)

Well, I hope this solves your problem....

It was great fun fuinding the solution (I just love doing thsio...but I don't get much sleep).

Please let me know if you have any problems...

Cheers
Chris
Oops, I have just noticed in your comments that it is not necessarily the first date found.

Provided you always know the occurrence of the date in a particular PDF format, we can adjust the module "PDFFindDateTime" to find the "nth" occurrence of a specified date. Of course the trouble is, you ill not know if the first date found is also the same as the "nth" date for which you are searching.
I suppose the code could be tweaked to find every occurrence of a date, and then on the "nth" occurrence of a date, decide whether to accept it if THAT date matched the date passed to the function.

Of course, you run across problems whent the authors of the PDF;'s change the format.

I will work on this alternative and post the other version of "PDFFindDateTime" to allow for thsi check on the "nth" occurrence of a date.

Test out the first lot of code while you wait....

Cheers
Chris
Avatar of rsburge

ASKER

Hi - I apologize, i had a family emergency and didn't have a chance to respond.  

Thank you for all of the work you have done on this.  In response to your last post; Would it work to just search through all of the dates in the PDF and stop if it finds today's date rather than looking for a specific instance of a date?
Yes, that is what it is designed to do, if so desired.

All you have to do in your code is call the function as follows:

     PDFFindDateTime(cFiles(nFile), date(), False) 'Get time in 24 hourformat

You set the "Date" to search for as todays date by specifying "date()"
(which returns the current system date).

That way you do not need to find the "nth" occurrrence of the date (unless you wanted to, of source).

I hope everything turned out OK fmily-wise.

Cheers
Chris
Avatar of rsburge

ASKER

Hi.  Thank you for all of your work on this code.  I will test it later tonight or tomorrow.  The server has been acting up today and I have to get that worked out first.

Everything with the family is good; thank you for asking.  My daughter fell of the slide at school on Friday and broke her arm in 3 places.  We spent the weekend meeting with doctors trying to figure out what needs to be done.  Tomorrow we are off to the pediatric orthopedic surgeon.

I will get back to you as soon as I can get to testing.  :)
Avatar of rsburge

ASKER

Hi - Quick question on this line

'Path where the PDF files are stored
  'Note: The executable "PDFToText.exe" must be stored in that directory also
  cPath = "F:\DEV\# EE AND IT\FINDDATETIMEINPDF\"

All of my PDF's are all stored in various folders in the following directory...

D:\aLXEPricing\"folder name"\"file Name.pdf"

If I save the PDFToText.exe in the following location, will it work as expected?

D:\aLXEPricing\PDFToText.exe
Avatar of rsburge

ASKER

Hi - I also get an error at this line in code 4 of 6...

Public Declare Function EmptyWorkingSet Lib "psapi.dll" (ByVal hProcess As Long) As Long

The error says that only comments may appear after End Sub, End Function, or End Property

And...

I am not able to locate these references...

     Visual Basic Runtime objects and procedures (this I was able to locate as msvbvm60.dll, but I can't replace the existing visual basic reference with this one)
     Visual Basic objects and Procedures
Avatar of rsburge

ASKER

I did put all of the code into one module...  Now that I think about it maybe I need to put them in separate modules?
Avatar of rsburge

ASKER

Ok...  I'm sorry for all of the follow-up comments.

I get the same compile error on all of the Public Declare Functions in code 4 of 6
Am working on answers to your questions now...stand by.
In answer to your first question.

I think it best that yo place the executable in a central location
(e.g. "C:\Program Files\PDFToText") and then alter the code as showing below.

I have made chnages to both the "Test" code (with comments to help explain things) and also to the subroutine "PDFFindDateTime" which now accepts an extra paramater.

I will post the code, then move on to your next question.


Option Explicit
Public Sub Main()
  Dim dDate  As Date   'date to be searched (normally today's date??)
  Dim cPath  As String
  Dim cFile  As String
  Dim nFile  As Integer
  Dim cRet   As String
  Dim cRet1  As String
  
  'All the PDF file names are stored in this array for TESTING only
  Dim cFiles()   As String 
  'This variable store path & name of executable text extraction file
  Dim cPDFToText As String 

  ReDim cFiles(0)

  'In this test code change the values of the next two lines to suit
  
  'Date on which search is performed 
  '(may be changed to "date()" to obtain the current date
  dDate = CDate("4/10/2011")
  'Full details of location of file "PDFToText.exe" is stored.
  cPDFToText = "C:\Program Files\PDFToText\PDFToText.exe"
    
  'The next bit of code is for test purposes only to get
  'multiple PDF's for testing. If you are passing various files
  'from various folders one file at a time simply state the
  'filename instead as per axample below
  cPath = "F:\DEV\# EE AND IT\FINDDATETIMEINPDF\"
  ChDir cPath
  cFile = Dir(cPath + "*.pdf")
  Do While cFile <> ""
    ReDim Preserve cFiles(UBound(cFiles) + 1)
    cFiles(UBound(cFiles)) = cPath + cFile
    cFile = Dir
  Loop
  'OR replace the above lines with just the name of a single
  'file:  e.g. cFile = "C:\Data\MyFiles\Test.pdf" and
  'then ignore the following lines and replace them with
  'cRet = PDFFindDateTime(cFile, dDate, cPDFToText, False) 

  For nFile = 1 To UBound(cFiles)
    'Get time in 24 hourformat
     cRet1 = PDFFindDateTime(cFiles(nFile), dDate, cPDFToText,False) 
     cRet = cRet + cFiles(nFile) + ":" + cRet1 + vbCrLf
  Next
  MsgBox Trim(cRet)
  End
End Sub

'The filename passed as "cPDFFile" to the following subroutine
'must include the full folder name for the file 
'e.g. "c:\Data\MyFiles\abc.pdf"
Public Function PDFFindDateTime(cPDFFile As String, _
                                dDate As Date, _
                                cPDFToText As String, _
                                bAMPM As Boolean) As String
  Dim sText(13) As String
  Dim x         As Integer
  Dim cLine     As String
  Dim nHandle   As Integer
  Dim bFound    As Boolean
  Dim cRun      As String
  Dim nRet      As Long
  Dim strTime   As String

  If Dir(cPDFFile) <> "" Then
    sText(1) = Format(dDate, "mmmm d, yyyy") 
    sText(2) = Format(dDate, "mmm-d-yyyy")  
    sText(3) = Format(dDate, "mm/dd/yy")  
    sText(4) = Format(dDate, "mm/dd/yyyy")
    sText(5) = Format(dDate, "m/d/yyyy")  
    sText(6) = Format(dDate, "m/d/yy")  
    sText(7) = Format(dDate, "mm_dd_yyyy")
    sText(8) = Format(dDate, "dddd, mmmm dd, yyyy")
    sText(9) = Format(dDate, "mmm-dd-yy")  
    sText(10) = Format(dDate, "mmmm dd, yyyy")
    sText(11) = Format(dDate, "mmm dd, yyyy") 
    sText(12) = Format(dDate, "mmm d, yyyy")  
    sText(13) = Format(dDate, "d-mmm-yy")  
    cRun = Chr(34) + cPDFToText + Chr(34) + " -layout " + _
           Chr(34) + cPDFFile + Chr(34) + " " + Chr(34) + _
           App.Path + "\PDF2Text.txt" + Chr(34)
    nRet = Shell(cRun, vbMinimizedNoFocus)

    'don't continue until the PDFTText.exe has finished
    Do While CKRIsRunning("PDFTOTEXT.EXE")
      DoEvents
    Loop

    If nRet > 0 Then   'The program ran successfully

      For x = 1 To UBound(sText)
        nHandle = FreeFile
        Open App.Path + "\PDF2Text.txt" For Input As #nHandle

        Do While Not EOF(nHandle)
          Line Input #nHandle, cLine

          If InStr(cLine, sText(x)) > 0 Then
            PDFFindDateTime = cLine
            bFound = True
            strTime = FindTime(nHandle, cLine, bAMPM)
            Exit Do

          End If

        Loop
        Close #nHandle

        If bFound Then
          Exit For

        End If

      Next
    Else
      MsgBox "Error: PDFToText.exe could not run!"
      End

    End If

  End If

  PDFFindDateTime = strTime

End Function

Open in new window

Please note that there was no change to the code in the subroutine "FindTime"
In the new code I have submitted you can store your file in the folder D:\aLXEPricing and you must then simply state in the code
(at line 23)

          cPDFToText = "D:\aLXEPricing\PDFToText.exe"

I suggest instead the folder "C:\Proram Files\PDFToText" though, since you may want to use that utility at other times when not using aLXEPricing.

Just an idea.  :-)

Stand by for further responses......
It probably does not matter which version of "Visual Basic For Applications" you are using. (That is what msvbvm60.dll is - VBA version 6)

The comment "code 4 of 6" is there just for your reference so you can distinguish the seperate code modules.

Please remove it (and do the same for the other modules).
You CAN place  comment code there if running VB6, but perhaps VBA has some restriction where you place comment code. (I cannot imagine why)
Comment lines all start with a single apostrophe ( ' ).

Yes, you should have these in separate module if you are running VB6.
If you are running within a VBA macro, you can combine them into the one module but you should place all "Type...End Type" declarations and "Public Declare" declarations together at the very top of the module.

On to the next question...stand by!
Oops! No more quetions!

OK, give the changes a go and see if you get the following results as I have here:

 User generated image
Avatar of rsburge

ASKER

Thank you so much for all of your responses.  I will give this all a shot and see how it goes.  Have a super day!
Avatar of rsburge

ASKER

Hi - I have made all of the changes and ran into something else...  I get a variable not defined error wherever this is...  App.Path
Avatar of rsburge

ASKER

I did try to test it even though I had the error above and I received another error...  Just hoping this will help you identify where the issue is.  

I must admit all of this code you provided is way beyond my experience level so I am not sure where to look for the problem.

In the PDFFindDateTime Function... At this line, I get an Object Required run time error...
    cRun = Chr(34) + cPDFToText + Chr(34) + " -layout " + _
           Chr(34) + cPDFFile + Chr(34) + " " + Chr(34) + _
           App.Path + "\PDF2Text.txt" + Chr(34)
Are you running this in VBA environment within a Microsoft product (such as Access or Powerpoint or Outloook or Word)? Or are you running it in VB6 programming environment?

I suspect you are running it in VBA since "App.Path" only exsts in VB6.
In VBA it is "Application.Path"

I will investigate the code under VBA and get back to you.

Cheers
Chris
Avatar of rsburge

ASKER

Hi - Yes, it is VBA in Access 2007.  Thank you!
OK....

I ran the code in VBA and had to do a few alterations. This included deleting the output text file (used by the PDFTextExtractor) if it already exists before the macro starts. (Tricky...I had to use "On error resume" code within loops unfortunately but it was the only way I could get the macro to slow down enough to successfully delete the file before it continued. (It is VERY fast)

There were a few other minor changes, so please use this new code rather than the code I supplied beofre (which only works in VB6)

The problem we had with "App.Path" occurred becuase VBA does not have the obj "App" and uses "Appliucation" instead, but THAT obkect does not have an instance variable called "Path", so I resorted instead to using "CurDir()" which simply says "the current working directory". I was only using that for the temporary text file which is deleted anyway.

Rather than list all the VBA code again, I have uploaded the code into separate modules.

You need to go into the VBA editor and then do a "File/Import" and import each of the modules. They will show in the Project Explorer (if you press Ctrl+R or click on "View/Project Explorer").

You just need to change a couple of lines in the module "TestFindingDateTimeInPDF" to suit your needs, then run the macro from within that module by pressing "F5".

If it all works (as it did at my end), then all you need to do is ensure you have all the modules in your VBA environment (except the "TestFindingDateTimeInPDF" module of course)  then make a call like this in your own code to return the time in 24 hour format:

               pdfTime = PDFFindDateTime(cFile, dDate, cPDFToText, False)
 
where:  
   cFile is the name of the file you are searching
   dDate is the date you are looking for
   cPDFToText is the folder name and filename of the executable PDFTOPText
   False (optional) says whether you want the time returned in AM/PM format
   (the default for the last paramneter is "false" so you can leave it off if you like)

e.g.  To search for todays date in a file and return the time in 24 hour format:

        cExeFile = "C:\Program Files\PDFToText\PDFToText.exe"
        cLookInFile = "D:\folder name\filename.pdf"
        'place 24 hour time found in file into variable.
        pdfTime = PDFFindDateTime(cLookInFile, date(), cExeFile)

I hope this works OK for you.

I had no problems using Access 2003 VBA  

Cheers
Chris

       FindDateTimeInPDF.zip
Avatar of rsburge

ASKER

Thank you so much for all of this!  I will get these modules imported and try again.  I am sure everything will work just fine.  :)
Avatar of rsburge

ASKER

Hi - I am getting closer.  I imported all of the modules and started testing...  I get either a File Not Found error or a Permission Denied error at the lines below...          

Kill CurDir() + "\PDF2text.txt"
Open CurDir() + "\PDF2Text.txt" For Input As #nHandle

When I did a debug, when it stopped at the first line, the file was not in the directory, so I moved back to the beginning and ran it a line at a time and it ran fine until I got to the error at the second line.  When debugging at this line, I can verify that the file is now in the proper location, but I can't get past this line when I hit play again.

This code is running on a server that is very, very fast.  Is it possible that it is running too fast (as you mention above you had to do some on errors within loops because the code runs so fast)?  As a side note, I didn't see anything in any of the modules that I imported that specifically says "On Error".
Good pick-up!

I actually upploaded the incorrect code for "PDFFindDateTime.bas" so I will recreate the zip file with the correct code shortly (am off to bridge now and will be back in about 6 hours).

Please stand by.....        
ASKER CERTIFIED SOLUTION
Avatar of Chris Raisin
Chris Raisin
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of rsburge

ASKER

Thank you so much!  I have to take my daughter back to the surgeon today, but with any luck it won't take too long and I will be able to get this fully tested today.

Have a super evening!
Avatar of rsburge

ASKER

This is really awesome!  I think it is working properly, I just need to test a few more scenarios.

I did have to change the call line to...

pdftime = PDFFindDateTime((cPDFFile), (dDate), (cPDFToText), False)

THANK YOU!!!!
Avatar of rsburge

ASKER

Everything is working great!  Thank you SO much!!!  

I do have one question that is related, but not.  :)

If you think there is an answer out there of if you know the answer, I will submit another question.

Is there a way, other than using SendKeys, to remove MetaData and Hidden Text from a PDF and/or reduce the file size?  
You will have to submit that as a seperate question.

You should award the points for this question first, then add a new question.

There is a way (tricky), but I will answer that in a seperate question.

I would not classify that new question though as related to this one, since it is dealing with a replacement for sendkeys (which is always prone to error due to the always unreliable screen interaction ) rather than this questions main theme (which was to do a text search in a PDF).

Glad it all worked out OK.

Cheers
Chris
Avatar of rsburge

ASKER

Thank you for the feedback.  I will submit a new question.  I actually have two...  One for the SendKeys stuff and one similar to this one but searching through an Excel sheet instead of a PDF (in case you are interested).
Avatar of rsburge

ASKER

This code is super awesome and extremely fast!  Thank you for all of your help!!
Thanks for the points (although the ranking of 4.2 "Average" was a little disappointing seeing how the code is running so fast for you).

Anyway, I cannot see your new questions anywhere.

I have created a filter to run to let me know when you post new questions using your userid "rsburge" but I cannot see them.

I have started work on the  "remove MetaFData" question and it looks promising. The Excel question will be interesting too. I look forward to your posting of the questions.

Cheers
Chris - Melbourne
(craisin)
Avatar of rsburge

ASKER

I actually gave you an "A" and the full 500 points on the last post with the zip file.  I would have given you an A++++++ and a gazillion points if I cod have.  

I accepted the solution from my phone, maybe I did something wrong.  I will ask for help to fix that.

I haven't posted the two new questions, I will be home in a few hours and will post the questions then.

Thank you again for all of your help!
No worries...I supposed that was the reason. Perhaps you cannot easily see how to grade an answer using your iPhone. I suspect by default that if a grading is not given then it defaults to "Avergae" (4.5 out of 9)

I look forward to your questions.

No need to leave me a message. I have a special filter set up to ook for any questions you have and I will always help if I can.

Thanks again.

Cheers
Chris