HopperSI

Script to compare existing filenames to filenames in a text file AND MORE

I have a directory full of PDFs and subdirectories of PDFs that need to be "processed" by an application which can run from the command line.  Once a PDF has been "processed", it never needs to be processed again.  Running the application on that directory will re-process all the PDFs that have already been processed.  I'm looking to run a batch file or VBScript (or a combination) that will tell this application to run (which will "process" all the PDFs), then output all those filenames to a text file.  On later runs the script can read the filenames from that text file, skip those files, execute the application only on the PDFs whose filenames are not in the text file, then add the new filenames to it.  This way I can avoid the time it takes to re-process already processed files.

The VBScript below (taken from Hey Scripting Guy) successfully outputs existing filenames to a text file.  
strComputer = "."
 
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
 
strFolderName = "C:\docs"
 
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.CreateTextFile("C:\docs\list.txt")
 
Set colSubfolders = objWMIService.ExecQuery _
    ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
        & "Where AssocClass = Win32_Subdirectory " _
            & "ResultRole = PartComponent")
 
Set colFiles = objWMIService.ExecQuery _
    ("ASSOCIATORS OF {Win32_Directory.Name='" & strFolderName & "'} Where " _
        & "ResultClass = CIM_DataFile")
 
For Each objFile in colFiles
    If objFile.Extension = "pdf" Then
        objTextFile.WriteLine objFile.FileName 
    End If
Next
 
' One call is enough: GetSubFolders walks every level of subfolders itself.
GetSubFolders strFolderName
 
Sub GetSubFolders(strFolderName)
 
    Set colSubfolders2 = objWMIService.ExecQuery _
        ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
            & "Where AssocClass = Win32_Subdirectory " _
                & "ResultRole = PartComponent")
 
    For Each objFolder2 in colSubfolders2
        strFolderName = objFolder2.Name
 
        Set colFiles = objWMIService.ExecQuery _
            ("ASSOCIATORS OF {Win32_Directory.Name='" & strFolderName & "'} Where " _
                & "ResultClass = CIM_DataFile")
 
        For Each objFile in colFiles
            If objFile.Extension = "pdf" Then
                objTextFile.WriteLine objFile.FileName 
            End If
        Next
 
        GetSubFolders strFolderName
    Next
End Sub

t0t0

Are the files in the sub-folder being processed too?
HopperSI

ASKER

Yes.  They are mostly in subfolders.
So the batch file needs to 'walk' through all the sub-folders and process only those files which have not yet been processed - yes?

So we would also need to include the file's path in the text file in case of duplicate filenames - or is this not a worry?

Rather than using an 'exclude' file, is it possible to rename the file itself (slightly) so that we could use this as a flag instead? - Perhaps we could add the date processed to the filename.

How about the files' archive attributes - can we use those instead, or are they likely to be changed by a backup program you run?



The PDFs will be saved by a document management application which assigns unique numbers as the filenames, so duplicate filenames are not an issue, but the filenames cannot be changed.  

The archive attribute will change with backups and also if/when a PDF is edited.

All valid ideas though, which I have dismissed for the above reasons.  I've also toyed with the idea of actively monitoring the top level directory with a VBScript that opens any new files automatically (in a batch file pointing to my "processing" application).  This would eliminate the need for the above script, since all new files would be handled "on-the-fly."  I am unable to modify the VBScript below to include subfolders though.  If you can, that would be a huge step for me.  (The script below is also taken from Hey Scripting Guy.)
Set objShell = CreateObject("Wscript.Shell")
 
strComputer = "."
 
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
 
Set colMonitoredEvents = objWMIService.ExecNotificationQuery _
    ("SELECT * FROM __InstanceCreationEvent WITHIN 5 WHERE " _
        & "Targetinstance ISA 'CIM_DirectoryContainsFile' and " _
            & "TargetInstance.GroupComponent= " _
                & "'Win32_Directory.Name=""c:\\\\test""'")
 
Do
    Set objLatestEvent = colMonitoredEvents.NextEvent
    strNewFile = objLatestEvent.TargetInstance.PartComponent
    arrNewFile = Split(strNewFile, "=")
    strFileName = arrNewFile(1)
    strFileName = Replace(strFileName, "\\", "\")
    strFileName = Replace(strFileName, Chr(34), "")
    objShell.Run("notepad.exe " & strFileName)
Loop


Copy and paste the following code into Notepad and save it as PROCPDF.BAT in the parent folder where your PDF files reside.

The batch file traverses the sub-folders, processes any unprocessed files, and adds their filenames to the file PROCESSED.TXT. This file will automatically be created if it doesn't already exist.

The only line you need to change is:

   echo Unprocessed File: "%%~nxa"

Replace this line to run your application. The "%%~nxa" is the name of the unprocessed file you need to pass to your application as in the following example:

   YourApplication "%%~nxa"

Trusting this meets your need.


@echo off
setlocal enabledelayedexpansion

if not exist processed.txt type nul >processed.txt

for /f "tokens=*" %%a in ('dir /a-d /b /s *.pdf') do (
   find /i "%%~nxa" processed.txt >nul
   if !errorlevel!==1 (
      echo Unprocessed File: "%%~nxa"
      echo %%a >>processed.txt
   )
)
That looks solid.  It seems simpler than what I was envisioning, which is why I reached out to the experts in the first place.

And what if the directory with the files is on a machine that does not have the application?  Can I use a mapped drive or a UNC path with that script?
AmazingTech
Excellent!
But I think the "echo Unprocessed File" line should use "%%a", and the line below it should echo "%%~nxa".
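
A minimal sketch of a variation on this suggestion, not tested against the real application. The full path ("%%a") is passed to the program so that files in sub-folders can be found, while PROCESSED.TXT keeps full paths as in the batch file above. YourApplication is a placeholder name, and the sketch assumes it returns exit code 0 on success. Because find matches substrings, the search string is given a leading backslash so that 123.pdf is not treated as processed merely because 5123.pdf is already in the list.

```batch
@echo off
setlocal

rem type nul creates an empty list on the first run.
if not exist processed.txt type nul >processed.txt

for /f "delims=" %%a in ('dir /a-d /b /s *.pdf') do (
   rem Search for "\name.pdf"; find sets errorlevel 1 when it is absent.
   find /i "\%%~nxa" processed.txt >nul || (
      rem Placeholder command; the path is logged only if it exits with 0.
      YourApplication "%%a" && echo %%a>>processed.txt
   )
)
```

Searching by name rather than by full path means a PDF that is later moved to another sub-folder is still recognised as processed, which relies on the filenames being unique.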
ASKER CERTIFIED SOLUTION
t0t0
I wouldn't need a list of computers, it's only one computer, but the application doesn't exist on the same computer as the directory.

So the final script would look like this?

@echo off
setlocal enabledelayedexpansion

if not exist processed.txt type nul >processed.txt

for /f "tokens=*" %%a in ('dir /a-d /b /s \\computername\\d:\docs\*.PDF') do (
   find /i "%%a" processed.txt >nul
   if !errorlevel!==1 (
      echo Unprocessed File: "%%a"
      echo %%a >>processed.txt
   )
)
)

Shouldn't that be something along the following lines:

   :
   for /f "tokens=*" %%a in ('dir /a-d /b /s "\\computername\d$\docs\*.PDF"') do (
   :

Also, you need to remove the extra ')' (closing bracket) at the end of your code.
Or...

   for /f "tokens=*" %%a in ('dir /a-d /b /s "\\computername\d\docs\*.PDF"') do (

Anyway, whichever it is, the ':' (colon) shouldn't be there.
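
On the mapped drive / UNC question, a sketch under some assumptions: the folder to scan is passed as the first argument, so the same file works for a local folder, a mapped drive, or a UNC path such as "\\computername\d$\docs", and PROCESSED.TXT is kept beside the batch file (%~dp0) rather than on the remote machine. YourApplication is a placeholder, assumed to accept the path form it is given and to return exit code 0 on success.

```batch
@echo off
setlocal

rem First argument: folder to scan. Defaults to the current folder.
set "ROOT=%~1"
if not defined ROOT set "ROOT=."

rem Keep the list next to this batch file, not in the scanned folder.
set "LOG=%~dp0processed.txt"
if not exist "%LOG%" type nul >"%LOG%"

for /f "delims=" %%a in ('dir /a-d /b /s "%ROOT%\*.pdf"') do (
   rem dir /s /b prints full paths; find sets errorlevel 1 when one is absent.
   find /i "%%a" "%LOG%" >nul || (
      rem Placeholder command; the path is logged only if it exits with 0.
      YourApplication "%%a" && echo %%a>>"%LOG%"
   )
)
```

Assuming the file is saved as PROCPDF.BAT, it could be called as: PROCPDF.BAT "\\computername\d$\docs"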
Well, there's a surprise for you. Thank you. And just to show you my appreciation check out the following suggestion.

There may come a time when some PDF files are either deleted or moved from your folders; however, their entries will remain in the PROCESSED.TXT file. Ideally, these 'dead' entries should be removed from the PROCESSED.TXT file, thereby keeping the file as small as possible to help speed up PROCPDF.BAT. The following batch file will do this for you.

Copy and paste the following code into Notepad and save it as SYNCPDF.BAT in the same folder as PROCESSED.TXT.


@echo off
del processed.tmp 2>nul
for /f "tokens=*" %%a in ('type processed.txt') do (
   if exist "%%a" echo %%a>>processed.tmp
)
move /y processed.tmp processed.txt
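
One edge case in the batch file above: if none of the listed files exist any more, processed.tmp is never created, so the move fails and the stale list stays in place. A sketch of a variant that handles this, assuming each line of PROCESSED.TXT is a full path:

```batch
@echo off
setlocal

rem Nothing to clean up if the list has not been created yet.
if not exist processed.txt exit /b 0

rem Start from an empty temp file so the move below always has a source.
type nul >processed.tmp

for /f "usebackq delims=" %%a in ("processed.txt") do (
   rem Keep only the entries whose file still exists.
   if exist "%%a" echo %%a>>processed.tmp
)

move /y processed.tmp processed.txt >nul
```

If every entry is dead, PROCESSED.TXT ends up empty rather than unchanged.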