Solved

Script to compare existing filenames to filenames in a text file AND MORE

Posted on 2009-06-30
Last Modified: 2012-05-07
I have a directory full of PDFs and subdirectories of PDFs that need to be "processed" by an application which can run from the command line.  Once a PDF has been "processed", it never needs to be processed again, but running the application on that directory re-processes every PDF, including those already done.  I'm looking for a batch file or VBScript (or a combination) that will run the application (which will "process" all the PDFs), then output all those filenames to a text file.  On later runs, the script can read the filenames from that text file, skip those files, execute the application only on the PDFs whose filenames are not listed, and then add the new filenames to the text file.  This way I can avoid the time it takes to re-process already processed files.

The VBScript below (taken from Hey Scripting Guy) successfully outputs existing filenames to a text file.  
strComputer = "."
 
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
 
strFolderName = "C:\docs"
 
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.CreateTextFile("C:\docs\list.txt")
 
Set colSubfolders = objWMIService.ExecQuery _
    ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
        & "Where AssocClass = Win32_Subdirectory " _
            & "ResultRole = PartComponent")
 
Set colFiles = objWMIService.ExecQuery _
    ("ASSOCIATORS OF {Win32_Directory.Name='" & strFolderName & "'} Where " _
        & "ResultClass = CIM_DataFile")
 
For Each objFile in colFiles
    If objFile.Extension = "pdf" Then
        objTextFile.WriteLine objFile.FileName 
    End If
Next
 
' GetSubFolders walks every level itself, so a single call on the root is enough.
GetSubFolders strFolderName
 
Sub GetSubFolders(strFolderName)
 
    Set colSubfolders2 = objWMIService.ExecQuery _
        ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
            & "Where AssocClass = Win32_Subdirectory " _
                & "ResultRole = PartComponent")
 
    For Each objFolder2 in colSubfolders2
        strFolderName = objFolder2.Name
 
        Set colFiles = objWMIService.ExecQuery _
            ("ASSOCIATORS OF {Win32_Directory.Name='" & strFolderName & "'} Where " _
                & "ResultClass = CIM_DataFile")
 
        For Each objFile in colFiles
            If objFile.Extension = "pdf" Then
                objTextFile.WriteLine objFile.FileName
            End If
        Next
 
        GetSubFolders strFolderName
    Next
End Sub

Question by:HopperSI
12 Comments
 
LVL 16

Expert Comment

by:t0t0
ID: 24750176
Are the files in the sub-folder being processed too?
 

Author Comment

by:HopperSI
ID: 24750183
Yes.  They are mostly in subfolders.
 
LVL 16

Expert Comment

by:t0t0
ID: 24750242
So the batch file needs to 'walk' through all the sub-folders and process only those files which have not yet been processed - yes?

So we would also need to include the file's path in the text file in case of duplicate filenames - or is this not a worry?

Rather than using an 'exclude' file, is it possible to rename the file itself (slightly) so that we could use this as a flag instead? - Perhaps we could add the date processed to the filename.

How about the file's archive attribute - can we use that instead, or is it likely to be changed by a backup program you run?



 

Author Comment

by:HopperSI
ID: 24750263
The PDFs will be saved by a document management application which assigns unique numbers as the filenames---duplicate filenames are not an issue, but filename cannot be changed.  

The archive attribute will change with backups and also if/when a PDF is edited.

All valid ideas though, which I have dismissed for the above reasons.  I've also toyed with the idea of actively monitoring the top-level directory with a VBScript to open any new files automatically (in a batch file pointing to my "processing" application).  This would eliminate the need for the above script, since all new files would be handled "on-the-fly."  I am unable to modify the VBScript below to include subfolders though.  If you can, that would be a huge step for me.  (Script below also taken from Hey Scripting Guy.)
Set objShell = CreateObject("Wscript.Shell")
 
strComputer = "."
 
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
 
Set colMonitoredEvents = objWMIService.ExecNotificationQuery _
    ("SELECT * FROM __InstanceCreationEvent WITHIN 5 WHERE " _
        & "Targetinstance ISA 'CIM_DirectoryContainsFile' and " _
            & "TargetInstance.GroupComponent= " _
                & "'Win32_Directory.Name=""c:\\\\test""'")
 
Do
    Set objLatestEvent = colMonitoredEvents.NextEvent
    strNewFile = objLatestEvent.TargetInstance.PartComponent
    arrNewFile = Split(strNewFile, "=")
    strFileName = arrNewFile(1)
    strFileName = Replace(strFileName, "\\", "\")
    strFileName = Replace(strFileName, Chr(34), "")
    objShell.Run("notepad.exe " & strFileName)
Loop

 
LVL 16

Expert Comment

by:t0t0
ID: 24750441
Copy and paste the following code into Notepad and save it as PROCPDF.BAT in the parent folder where your PDF files reside.

The batch file traverses the sub-folders, processes any unprocessed files and adds their filenames to the file PROCESSED.TXT. This file will automatically be created if it doesn't already exist.

The only line you need to change is:

   echo Unprocessed File: "%%~nxa"

Replace this line with the command that runs your application. The "%%~nxa" is the name of the unprocessed file you need to pass to your application, as in the following example:

   YourApplication "%%~nxa"

Trusting this meets your need.


@echo off
setlocal enabledelayedexpansion

if not exist processed.txt echo nul >processed.txt

for /f "tokens=*" %%a in ('dir /a-d /b /s *.pdf') do (
   find /i "%%~nxa" processed.txt >nul
   if !errorlevel!==1 (
      echo Unprocessed File: "%%~nxa"
      echo %%a >>processed.txt
   )
)
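
One caveat when matching on the bare file name: find does substring matching, so a name such as 12.pdf is reported as already processed once 112.pdf is in the log. As a rough illustration of the same skip-and-log idea with exact matching on the full path, here is a minimal sketch in Python rather than batch. The function name process_new_pdfs, the command parameter and YourApplication.exe are placeholders for illustration, not anything from this thread's application.

```python
import os
import subprocess


def process_new_pdfs(root, log_path, command=None):
    """Run `command` on every PDF under `root` that is not yet in the log.

    `command` is a hypothetical argument list such as ["YourApplication.exe"];
    pass None to only record the files (a dry run).
    Returns the list of paths handled on this run.
    """
    # Load the paths logged on earlier runs. Compare in lower case,
    # since Windows file names are case-insensitive.
    done = set()
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as log:
            done = {line.strip().lower() for line in log if line.strip()}

    handled = []
    with open(log_path, "a", encoding="utf-8") as log:
        # os.walk visits the root folder and every sub-folder below it.
        for folder, _subdirs, files in os.walk(root):
            for name in files:
                if not name.lower().endswith(".pdf"):
                    continue
                path = os.path.join(folder, name)
                if path.lower() in done:
                    continue  # exact match on the whole path, not a substring
                if command is not None:
                    # check=True raises if the application fails, so the
                    # file is only logged after a successful run.
                    subprocess.run(command + [path], check=True)
                log.write(path + "\n")
                done.add(path.lower())
                handled.append(path)
    return handled
```

A call such as process_new_pdfs("C:\\docs", "processed.txt", ["YourApplication.exe"]) would mirror what PROCPDF.BAT does; keeping the log as one full path per line also makes it easy to prune later.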
 

Author Comment

by:HopperSI
ID: 24750571
That looks solid.  Seems simpler than what I was envisioning, which is why I reached out to the experts in the first place.

...and what if the directory with the files is on a machine that does not have the application....can I use a mapped drive or a UNC path with that script?
 
LVL 21

Expert Comment

by:AmazingTech
ID: 24752560
Excellent!
But I think echo Unprocessed file should be "%%a" and the line below should be echo "%%~nxa"
 
LVL 16

Accepted Solution

by:
t0t0 earned 500 total points
ID: 24752764
AmazingTech

Thank you. It always means a lot to me when you comment on my code - especially when you use words like "Excellent!"

I fully agree regarding the %%~nxa. I only wanted to display the actual filename there; however, on reflection, it might have been wiser to include the full path as well, as in %%a.

HopperSI

Yes, you can adapt this to include files on remote PCs. You will need a text file containing a list of computer names say, COMPUTERS.TXT and include a 'container' FOR loop as in the following example:

   @echo off
   setlocal enabledelayedexpansion
   if not exist processed.txt echo nul >processed.txt
   FOR /F %%p IN (Computers.txt) DO (

Then you add the previous code as in:

      for /f "tokens=*" %%a in ('dir /a-d /b /s "\\%%p\%Source%\*.PDF"') do (

Here, I've changed the filespec to \\ (which means networked PC), %%p (which is the networked PC's name), %Source% (which is the parent folder containing your PDF files and sub-folders) and *.PDF itself (the PDF files themselves).

You're going to have to either include a line near the start defining %Source% as in the following example:

   SET Source=C$\Text Files

or replace the '%Source%' (in the FOR line above) with the actual pathname itself.

If the PDF files are in different locations (I'm referring to the parent folder here) on different PCs then please let me know, because this will affect the structure of the COMPUTERS.TXT file and how we read our data from it.

Then, as before, with only slight changes:

      find /i "%%a" processed.txt >nul
      if !errorlevel!==1 (
         echo Unprocessed File: "%%a"
         echo %%a >>processed.txt
      )
   )

And a final ')' to close the containing FOR:

)

I don't have a network here to test this on so if there are problems please get back to us for further assistance.

 

Author Comment

by:HopperSI
ID: 24774096
I wouldn't need a list of computers, it's only one computer, but the application doesn't exist on the same computer as the directory.

So the final script would look like this?

@echo off
setlocal enabledelayedexpansion

if not exist processed.txt echo nul >processed.txt

for /f "tokens=*" %%a in ('dir /a-d /b /s \\computername\\d:\docs\*.PDF') do (
   find /i "%%a" processed.txt >nul
   if !errorlevel!==1 (
      echo Unprocessed File: "%%a"
      echo %%a >>processed.txt
   )
)
)

 
LVL 16

Expert Comment

by:t0t0
ID: 24777962
Shouldn't that be something along the following lines:

   :
   for /f "tokens=*" %%a in ('dir /a-d /b /s "\\computername\d$\docs\*.PDF"') do (
   :

also, you need to remove the extra ')' (closing bracket) at the end of your code.
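
To make the path correction concrete: a local path such as d:\docs on another machine is reached through a share, and the built-in administrative share for a drive is the drive letter followed by $ (which normally requires administrator rights on that machine). The following is a small Python sketch of that conversion only; the helper name to_admin_share is made up for illustration.

```python
import ntpath  # Windows path rules, usable on any platform


def to_admin_share(host, local_path):
    """Turn a drive-letter path on `host` into its admin-share UNC path."""
    drive, rest = ntpath.splitdrive(local_path)  # e.g. "d:" and "\docs"
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError("expected a drive-letter path such as d:\\docs")
    # \\host\d$\docs : the colon is replaced by $, there is no doubled
    # backslash after the host name.
    return "\\\\" + host + "\\" + drive[0] + "$" + rest
```

For example, to_admin_share("computername", "d:\\docs") gives \\computername\d$\docs, which is the form used in the dir command above. If the folder is published under an ordinary share name instead, that share name takes the place of d$.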
 
LVL 16

Expert Comment

by:t0t0
ID: 24777968
Or....

   for /f "tokens=*" %%a in ('dir /a-d /b /s "\\computername\d\docs\*.PDF"') do (

anyway, whichever it is, the ':' (colon) shouldn't be there.
 
LVL 16

Expert Comment

by:t0t0
ID: 24789373
Well, there's a surprise for you. Thank you. And just to show you my appreciation check out the following suggestion.

There may come a time when some PDF files are either deleted or moved from your folders; however, their entries will remain in the PROCESSED.TXT file. Ideally, these 'dead' entries should be removed from the PROCESSED.TXT file, thereby keeping the file as small as possible to help speed up PROCPDF.BAT. The following batch file will do this for you.

Copy and paste the following code into Notepad and save it as SYNCPDF.BAT in the same folder as PROCESSED.TXT.


@echo off
del processed.tmp 2>nul
for /f "tokens=*" %%a in ('type processed.txt') do (
   if exist "%%a" echo %%a>>processed.tmp
)
move /y processed.tmp processed.txt
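
For comparison, the same pruning step can be sketched in Python. This is only an illustration and assumes the log holds one full path per line; the function name prune_log is hypothetical. Unlike the batch version, it always writes the temporary file, so the log is also rewritten when every entry turns out to be dead.

```python
import os


def prune_log(log_path):
    """Drop entries from the log whose file no longer exists.

    Returns the number of entries removed.
    """
    with open(log_path, encoding="utf-8") as log:
        entries = [line.strip() for line in log if line.strip()]

    # Keep only the paths that still point at a real file.
    kept = [path for path in entries if os.path.isfile(path)]

    # Write to a temporary file first, then swap it in, so the log is
    # never left half-written if something goes wrong midway.
    tmp_path = log_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as tmp:
        tmp.writelines(path + "\n" for path in kept)
    os.replace(tmp_path, log_path)

    return len(entries) - len(kept)
```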