Script to compare existing filenames to filenames in a text file AND MORE

I have a directory full of PDFs and subdirectories of PDFs that need to be "processed" by an application which can run from the command line.  Once a PDF has been "processed", it never needs to be processed again.  However, running the application on that directory will re-process all the PDFs that have already been processed.  I'm looking to run a batch file or VBScript (or a combination) that will tell this application to run (which will "process" all the PDFs), then output all those filenames to a text file.  On later runs, the script can read the filenames out of that text file, skip those files, execute the application only on the PDFs whose filenames are not in the text file, then add the new filenames to the text file.  This way I can avoid the time it takes to re-process already processed files.

The VBScript below (taken from Hey Scripting Guy) successfully outputs existing filenames to a text file.  
strComputer = "."
 
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
 
strFolderName = "C:\docs"
 
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.CreateTextFile("C:\docs\list.txt")
 
Set colSubfolders = objWMIService.ExecQuery _
    ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
        & "Where AssocClass = Win32_Subdirectory " _
            & "ResultRole = PartComponent")
 
Set colFiles = objWMIService.ExecQuery _
    ("ASSOCIATORS OF {Win32_Directory.Name='" & strFolderName & "'} Where " _
        & "ResultClass = CIM_DataFile")
 
For Each objFile in colFiles
    If objFile.Extension = "pdf" Then
        objTextFile.WriteLine objFile.FileName 
    End If
Next
 
For Each objFolder in colSubfolders
    GetSubFolders strFolderName
Next
 
Sub GetSubFolders(strFolderName)
 
    Set colSubfolders2 = objWMIService.ExecQuery _
        ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
            & "Where AssocClass = Win32_Subdirectory " _
                & "ResultRole = PartComponent")
 
    For Each objFolder2 in colSubfolders2
        strFolderName = objFolder2.Name
 
        Set colFiles = objWMIService.ExecQuery _
            ("ASSOCIATORS OF {Win32_Directory.Name='" & strFolderName & "'} Where " _
                & "ResultClass = CIM_DataFile")
 
        For Each objFile in colFiles
            If objFile.Extension = "pdf" Then
                objTextFile.WriteLine objFile.FileName
            End If
        Next
 
        GetSubFolders strFolderName
    Next
End Sub

Asked by: HopperSI
 
t0t0 Commented:
Are the files in the sub-folder being processed too?
 
HopperSI (Author) Commented:
Yes.  They are mostly in subfolders.
 
t0t0 Commented:
So the batch file needs to 'walk' through all the sub-folders and process only those files which have not yet been processed - yes?

So we would also need to include the file's path in the text file in case of duplicate filenames - or is this not a worry?

Rather than using an 'exclude' file, is it possible to rename the file itself (slightly) so that we could use this as a flag instead? - Perhaps we could add the date processed to the filename.

How about the file's archive attribute - can we use that instead, or is it likely to be changed by a backup program you run?




 
HopperSI (Author) Commented:
The PDFs will be saved by a document management application which assigns unique numbers as the filenames, so duplicate filenames are not an issue, but the filenames cannot be changed.

The archive attribute will change with backups and also if/when a PDF is edited.

All valid ideas though, which I have dismissed for the above reasons.  I've also toyed with the idea of actively monitoring the top level directory with a VBScript to open any new files automatically (in a batch file pointing to my "processing" application).  This would eliminate the need for the above script, since all new files would be handled "on-the-fly."  I am unable to modify the VBScript below to include subfolders, though.  If you can, that would be a huge step for me.  (The script below is also taken from Hey Scripting Guy.)
Set objShell = CreateObject("Wscript.Shell")
 
strComputer = "."
 
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
 
Set colMonitoredEvents = objWMIService.ExecNotificationQuery _
    ("SELECT * FROM __InstanceCreationEvent WITHIN 5 WHERE " _
        & "Targetinstance ISA 'CIM_DirectoryContainsFile' and " _
            & "TargetInstance.GroupComponent= " _
                & "'Win32_Directory.Name=""c:\\\\test""'")
 
Do
    Set objLatestEvent = colMonitoredEvents.NextEvent
    strNewFile = objLatestEvent.TargetInstance.PartComponent
    arrNewFile = Split(strNewFile, "=")
    strFileName = arrNewFile(1)
    strFileName = Replace(strFileName, "\\", "\")
    strFileName = Replace(strFileName, Chr(34), "")
    objShell.Run("notepad.exe " & strFileName)
Loop
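
One way to cover subfolders without WMI is to poll the whole directory tree and compare snapshots. The following is only a minimal sketch in Python rather than VBScript; the names `snapshot`, `watch`, and `on_new` are hypothetical, and it assumes that polling every few seconds (like the `WITHIN 5` clause above) is acceptable.

```python
import os
import time


def snapshot(root):
    """Return the set of full paths of all PDF files under root, subfolders included."""
    found = set()
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".pdf"):
                found.add(os.path.join(folder, name))
    return found


def watch(root, on_new, interval=5.0, cycles=None):
    """Call on_new(path) for each PDF that appears under root after the watch starts.

    cycles=None polls forever; a number limits how many polls are made.
    """
    known = snapshot(root)
    polls = 0
    while cycles is None or polls < cycles:
        time.sleep(interval)
        current = snapshot(root)
        for path in sorted(current - known):
            on_new(path)
        known = current
        polls += 1
```

For example, `watch(r"C:\test", print)` would print each new PDF path as it shows up; a real handler would launch the processing application instead. Note that a file may be noticed while it is still being written, so the handler may need to retry.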

 
t0t0 Commented:
Copy and paste the following code into Notepad and save it as PROCPDF.BAT in the parent folder where your PDF files reside.

The batch file traverses the sub-folders, processes any unprocessed files, and adds their filenames to the file PROCESSED.TXT. This file will be created automatically if it doesn't already exist.

The only line you need to change is:

   echo Unprocessed File: "%%~nxa"

Replace this line with the command that runs your application. The "%%~nxa" is the name of the unprocessed file, which you need to pass to your application as in the following example:

   YourApplication "%%~nxa"

Trusting this meets your need.


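@echo off
setlocal enabledelayedexpansion

rem Create an empty list file on the first run.
if not exist processed.txt type nul >processed.txt

rem Walk this folder and all sub-folders, looking at PDF files only.
for /f "tokens=*" %%a in ('dir /a-d /b /s *.pdf') do (
   rem FIND sets errorlevel 1 when the filename is not yet listed.
   find /i "%%~nxa" processed.txt >nul
   if !errorlevel!==1 (
      echo Unprocessed File: "%%~nxa"
      echo %%a >>processed.txt
   )
)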
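
For comparison, here is a minimal Python sketch of the same "skip what is already listed" idea, offered as an illustration rather than a replacement for the batch file. The function names and the `handler` callback are hypothetical. Unlike a substring search with FIND, it compares whole paths, so an entry for 4123.pdf would not cause 123.pdf to be skipped.

```python
import os


def load_processed(list_file):
    """Return the recorded paths as a case-folded set (empty if the list does not exist)."""
    if not os.path.exists(list_file):
        return set()
    with open(list_file, encoding="utf-8") as fh:
        return {line.strip().casefold() for line in fh if line.strip()}


def find_unprocessed(root, list_file):
    """Return full paths of PDFs under root that are not yet recorded in list_file."""
    done = load_processed(list_file)
    pending = []
    for folder, dirs, files in os.walk(root):
        dirs.sort()  # visit sub-folders in a stable order
        for name in sorted(files):
            if name.lower().endswith(".pdf"):
                full = os.path.join(folder, name)
                if full.casefold() not in done:
                    pending.append(full)
    return pending


def process_all(root, list_file, handler):
    """Run handler(path) on each unprocessed PDF, recording it only after handler returns."""
    for pdf in find_unprocessed(root, list_file):
        handler(pdf)
        with open(list_file, "a", encoding="utf-8") as fh:
            fh.write(pdf + "\n")
```

A handler could be something like `lambda p: subprocess.run(["YourApplication", p], check=True)`, where `YourApplication` stands in for the real command. Because a path is recorded only after the handler finishes, a file whose processing fails will be retried on the next run.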
 
HopperSI (Author) Commented:
That looks solid.  It seems simpler than what I was envisioning, which is why I reached out to the experts in the first place.

...and what if the directory with the files is on a machine that does not have the application....can I use a mapped drive or a UNC path with that script?
 
AmazingTech Commented:
Excellent!
But I think echo Unprocessed file should be "%%a" and the line below should be echo "%%~nxa"
 
t0t0 Commented:
AmazingTech

Thank you. It always means a lot to me when you comment on my code - especially when you use words like "Excellent!"

I fully agree regarding the %%~nxa. I only wanted to display the actual filename there; however, on reflection, it might have been wiser to include the full path as well, as in %%a.

HopperSI

Yes, you can adapt this to include files on remote PCs. You will need a text file containing a list of computer names say, COMPUTERS.TXT and include a 'container' FOR loop as in the following example:

   @echo off
   setlocal enabledelayedexpansion
   if not exist processed.txt type nul >processed.txt
   FOR /F %%p IN (Computers.txt) DO (

Then you add the previous code as in:

      for /f "tokens=*" %%a in ('dir /a-d /b /s "\\%%p\%Source%\*.PDF"') do (

Here, I've changed the filespec to \\ (which means networked PC), %%p (which is the networked PC's name), %Source% (which is the parent folder containing your PDF files and sub-folders) and *.PDF itself (the PDF files themselves).

You're going to have to either include a line near the start defining %Source% as in the following example:

   SET Source=C$\Text Files

or replace the '%Source%' (in the FOR line above) with the actual pathname itself.

If the PDF files are in different locations (I'm referring to the parent folder here) on different PCs then please let me know, because this will affect the structure of the COMPUTERS.TXT file and how we read our data from it.

Then, as before, with only slight changes:

      find /i "%%a" processed.txt >nul
      if !errorlevel!==1 (
         echo Unprocessed File: "%%a"
         echo %%a >>processed.txt
      )
   )

And a final ')' to close the containing FOR:

)

I don't have a network here to test this on so if there are problems please get back to us for further assistance.

 
HopperSI (Author) Commented:
I wouldn't need a list of computers, it's only one computer, but the application doesn't exist on the same computer as the directory.

So the final script would look like this?

@echo off
setlocal enabledelayedexpansion

if not exist processed.txt echo nul >processed.txt

for /f "tokens=*" %%a in ('dir /a-d /b /s \\computername\\d:\docs\*.PDF') do (
   find /i "%%a" processed.txt >nul
   if !errorlevel!==1 (
      echo Unprocessed File: "%%a"
      echo %%a >>processed.txt
   )
)
)

 
t0t0 Commented:
Shouldn't that be something along the following lines:

   :
   for /f "tokens=*" %%a in ('dir /a-d /b /s "\\computername\d$\docs\*.PDF"') do (
   :

Also, you need to remove the extra ')' (closing bracket) at the end of your code.
 
t0t0 Commented:
Or....

   for /f "tokens=*" %%a in ('dir /a-d /b /s "\\computername\d\docs\*.PDF"') do (

Anyway, whichever it is, the ':' (colon) shouldn't be there.
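
To make the path conversion concrete, here is a small Python sketch that turns a local path such as d:\docs into the administrative-share form \\computername\d$\docs. The helper name `to_admin_share` is hypothetical, and it assumes the default administrative shares (d$ and so on) are enabled and that you have rights to use them; an ordinary share name or a mapped drive would work just as well.

```python
from pathlib import PureWindowsPath


def to_admin_share(server, local_path):
    """Convert a local Windows path like d:\\docs to \\\\server\\d$\\docs."""
    path = PureWindowsPath(local_path)
    drive = path.drive  # e.g. "d:"
    if len(drive) != 2 or not drive[0].isalpha() or drive[1] != ":":
        raise ValueError("expected a path starting with a drive letter, e.g. d:\\docs")
    if not path.root:
        raise ValueError("expected an absolute path, e.g. d:\\docs rather than d:docs")
    # parts[0] is the anchor ("d:\\"); the remaining parts are the folders.
    tail = "".join("\\" + part for part in path.parts[1:])
    return "\\\\" + server + "\\" + drive[0] + "$" + tail
```

Calling `to_admin_share("computername", r"d:\docs")` gives `\\computername\d$\docs`, which is the form used in the first suggestion: the drive letter is followed by `$` and the colon is gone.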
 
t0t0 Commented:
Well, there's a surprise for you. Thank you. And just to show you my appreciation check out the following suggestion.

There may come a time when some PDF files are either deleted or moved from your folders; however, their entries will remain in the PROCESSED.TXT file. Ideally, these 'dead' entries should be removed from PROCESSED.TXT, keeping the file as small as possible to help speed up PROCPDF.BAT. The following batch file will do this for you.

Copy and paste the following code into Notepad and save it as SYNCPDF.BAT in the same folder as PROCESSED.TXT.


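@echo off
rem Start with an empty temp file so the MOVE below works even if no entries survive.
type nul >processed.tmp
rem Copy across only the entries whose files still exist.
for /f "tokens=*" %%a in ('type processed.txt') do (
   if exist "%%a" echo %%a>>processed.tmp
)
move /y processed.tmp processed.txt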
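
The same clean-up can be sketched in Python. This is only an illustration under the assumption that the list file holds one full path per line; the function name `prune_missing` is hypothetical.

```python
import os


def prune_missing(list_file):
    """Rewrite list_file keeping only entries whose file still exists.

    Returns the number of dead entries removed.
    """
    with open(list_file, encoding="utf-8") as fh:
        entries = [line.strip() for line in fh if line.strip()]
    kept = [entry for entry in entries if os.path.exists(entry)]
    # Write to a temporary file first, then swap it in, as the batch file does.
    tmp = list_file + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.writelines(entry + "\n" for entry in kept)
    os.replace(tmp, list_file)
    return len(entries) - len(kept)
```

Writing to a temporary file and then replacing the original means an interrupted run leaves the old list intact rather than a half-written one.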