Script to compare existing filenames to filenames in a text file AND MORE

Asked by HopperSI. Last Modified: 2012-05-07
I have a directory full of PDFs and subdirectories of PDFs that need to be "processed" by an application which can run from the command line. Once a PDF has been processed, it never needs to be processed again, but running the application on that directory re-processes every PDF, including those already done. I'm looking for a batch file or VBScript (or a combination) that will tell this application to run (which will "process" all the PDFs), then output all those filenames to a text file. On later runs, the script can read the filenames from that text file, skip those files, execute the application only on the PDFs whose filenames are not listed, and then add the new filenames to the text file. This way I can avoid the time it takes to re-process already processed files.

The VBScript below (taken from Hey Scripting Guy) successfully outputs existing filenames to a text file.  
strComputer = "."
 
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
 
strFolderName = "C:\docs"
 
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.CreateTextFile("C:\docs\list.txt")
 
Set colSubfolders = objWMIService.ExecQuery _
    ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
        & "Where AssocClass = Win32_Subdirectory " _
            & "ResultRole = PartComponent")
 
Set colFiles = objWMIService.ExecQuery _
    ("ASSOCIATORS OF {Win32_Directory.Name='" & strFolderName & "'} Where " _
        & "ResultClass = CIM_DataFile")
 
For Each objFile in colFiles
    If objFile.Extension = "pdf" Then
        objTextFile.WriteLine objFile.FileName 
    End If
Next
 
' Walk the subfolder tree once, starting from the top-level folder.
GetSubFolders strFolderName
 
' ByVal keeps each recursive call from overwriting the caller's folder name.
Sub GetSubFolders(ByVal strFolderName)
 
    Set colSubfolders2 = objWMIService.ExecQuery _
        ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
            & "Where AssocClass = Win32_Subdirectory " _
                & "ResultRole = PartComponent")
 
    For Each objFolder2 in colSubfolders2
        strFolderName = objFolder2.Name
 
        ' List the PDFs directly inside this subfolder.
        Set colFiles = objWMIService.ExecQuery _
            ("ASSOCIATORS OF {Win32_Directory.Name='" & strFolderName & "'} Where " _
                & "ResultClass = CIM_DataFile")
 
        For Each objFile in colFiles
            If objFile.Extension = "pdf" Then
                objTextFile.WriteLine objFile.FileName
            End If
        Next
 
        ' Recurse into this subfolder's own subfolders.
        GetSubFolders strFolderName
    Next
End Sub

Commented:
Are the files in the sub-folder being processed too?

Author

Commented:
Yes.  They are mostly in subfolders.

Commented:
So the batch file needs to 'walk' through all the sub-folders and process only those files which have not yet been processed - yes?

So we would also need to include the file's path in the text file in case of duplicate filenames - or is this not a worry?

Rather than using an 'exclude' file, is it possible to rename the file itself (slightly) so that we could use this as a flag instead? - Perhaps we could add the date processed to the filename.

How about the file's archive attribute - can we use that instead, or is it likely to be changed by a backup program you run?



Author

Commented:
The PDFs will be saved by a document management application which assigns unique numbers as the filenames, so duplicate filenames are not an issue, but the filenames cannot be changed.

The archive attribute will change with backups and also if/when a PDF is edited.

All valid ideas though, which I have dismissed for the above reasons. I've also toyed with the idea of actively monitoring the top level directory with a VBScript to open any new files automatically (in a batch file pointing to my "processing" application). This would eliminate the need for the above script, since all new files would be handled "on-the-fly." I am unable to modify the VBScript below to include subfolders though. If you can, that would be a huge step for me. (The script below is also taken from Hey Scripting Guy.)
Set objShell = CreateObject("Wscript.Shell")
 
strComputer = "."
 
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
 
Set colMonitoredEvents = objWMIService.ExecNotificationQuery _
    ("SELECT * FROM __InstanceCreationEvent WITHIN 5 WHERE " _
        & "Targetinstance ISA 'CIM_DirectoryContainsFile' and " _
            & "TargetInstance.GroupComponent= " _
                & "'Win32_Directory.Name=""c:\\\\test""'")
 
Do
    Set objLatestEvent = colMonitoredEvents.NextEvent
    strNewFile = objLatestEvent.TargetInstance.PartComponent
    arrNewFile = Split(strNewFile, "=")
    strFileName = arrNewFile(1)
    strFileName = Replace(strFileName, "\\", "\")
    strFileName = Replace(strFileName, Chr(34), "")
    objShell.Run("notepad.exe " & strFileName)
Loop

Commented:
Copy and paste the following code into Notepad and save it as PROCPDF.BAT in the parent folder where your PDF files reside.

The batch file traverses the sub-folders, processing unprocessed files and adding their filenames to the file PROCESSED.TXT. This file will automatically be created if it doesn't already exist.

The only line you need to change is:

   echo Unprocessed File: "%%~nxa"

Replace this line with the command that runs your application. The "%%~nxa" is the name of the unprocessed file you need to pass to your application, as in the following example:

   YourApplication "%%~nxa"

Trusting this meets your need.


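@echo off
setlocal enabledelayedexpansion

rem Create an empty PROCESSED.TXT on the first run.
if not exist processed.txt type nul >processed.txt

rem List every PDF below the current folder, one full path per line.
for /f "tokens=*" %%a in ('dir /a-d /b /s *.pdf') do (
   rem FIND sets errorlevel 1 when the name is not yet in PROCESSED.TXT.
   find /i "%%~nxa" processed.txt >nul
   if !errorlevel!==1 (
      echo Unprocessed File: "%%~nxa"
      echo %%a>>processed.txt
   )
)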

Author

Commented:
That looks solid. It seems simpler than what I was envisioning, which is why I reached out to the experts in the first place.

...and what if the directory with the files is on a machine that does not have the application....can I use a mapped drive or a UNC path with that script?
Top Expert 2009

Commented:
Excellent!
But I think the "echo Unprocessed File" line should use "%%a", and the line below it should be echo "%%~nxa".

Author

Commented:
I wouldn't need a list of computers, it's only one computer, but the application doesn't exist on the same computer as the directory.

So the final script would look like this?

@echo off
setlocal enabledelayedexpansion

if not exist processed.txt echo nul >processed.txt

for /f "tokens=*" %%a in ('dir /a-d /b /s \\computername\\d:\docs\*.PDF') do (
   find /i "%%a" processed.txt >nul
   if !errorlevel!==1 (
      echo Unprocessed File: "%%a"
      echo %%a >>processed.txt
   )
)
)

Commented:
Shouldn't that be something along the following lines:

   :
   for /f "tokens=*" %%a in ('dir /a-d /b /s "\\computername\d$\docs\*.PDF"') do (
   :

Also, you need to remove the extra ')' (closing bracket) at the end of your code.

Commented:
Or....

   for /f "tokens=*" %%a in ('dir /a-d /b /s "\\computername\d\docs\*.PDF"') do (

Anyway, whichever it is, the ':' (colon) shouldn't be there.
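
Putting the corrections from the comments above together, the whole script might look roughly like the sketch below. It is only an illustration under a few assumptions: \\computername\d$\docs is the administrative share suggested above (substitute whatever share name actually exists), YourApplication is a placeholder for the real command-line tool, and processed.txt lives in the folder the batch file is run from. The lookup uses the full path "%%a" rather than the bare file name, because FIND does a substring match and a name such as 1.pdf would otherwise be treated as processed as soon as 11.pdf is in the list.

```batch
@echo off
setlocal

rem Assumed values: adjust the share path and log file name to your setup.
set "SRC=\\computername\d$\docs"
set "LOG=processed.txt"

rem Create an empty log on the first run.
if not exist "%LOG%" type nul >"%LOG%"

rem DIR /B /S prints one full path per line; delims= keeps spaces in paths intact.
for /f "delims=" %%a in ('dir /a-d /b /s "%SRC%\*.pdf" 2^>nul') do (
   rem FIND returns errorlevel 1 when the path is not in the log yet.
   find /i "%%a" "%LOG%" >nul
   if errorlevel 1 (
      echo Unprocessed file: "%%a"
      rem Placeholder: replace the next line with the real command, e.g. YourApplication "%%a"
      >>"%LOG%" echo %%a
   )
)
```

Because the log holds full UNC paths, it stays compatible with any later clean-up pass that checks each logged path with IF EXIST.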

Commented:
Well, there's a surprise for you. Thank you. And just to show you my appreciation check out the following suggestion.

There may come a time when some PDF files are deleted or moved from your folders; however, their entries will remain in the PROCESSED.TXT file. Ideally, these 'dead' entries should be removed from PROCESSED.TXT, keeping the file as small as possible to help speed up PROCPDF.BAT. The following batch file will do this for you.

Copy and paste the following code into Notepad and save it as SYNCPDF.BAT in the same folder as PROCESSED.TXT.


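@echo off
rem Start from an empty temp file so the move still works when no entries survive.
type nul >processed.tmp
for /f "tokens=*" %%a in ('type processed.txt') do (
   rem Keep only the entries whose file still exists.
   if exist "%%a" echo %%a>>processed.tmp
)
move /y processed.tmp processed.txt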