Solved

Script to compare existing filenames to filenames in a text file AND MORE

Posted on 2009-06-30
Last Modified: 2012-05-07
I have a directory full of PDFs and subdirectories of PDFs that need to be "processed" by an application which can run from the command line.  Once a PDF has been "processed", it never needs to be processed again.  Running the application on that directory will re-process all the PDFs that have already been processed.  I'm looking to run a batch file or VBScript (or combination) that will tell this application to run (which will "process" all the PDFs), then output all those filenames to a text file.  The script can then read the filenames out of that text file, skip those files, and execute the application on the PDFs whose filenames are not in that text file, then add the new filenames to that text file.  This way I can avoid the time it takes to re-process already processed files.

The VBScript below (taken from Hey Scripting Guy) successfully outputs existing filenames to a text file.  
strComputer = "."

Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")

strFolderName = "C:\docs"

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.CreateTextFile("C:\docs\list.txt")

Set colSubfolders = objWMIService.ExecQuery _
    ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
        & "Where AssocClass = Win32_Subdirectory " _
            & "ResultRole = PartComponent")

Set colFiles = objWMIService.ExecQuery _
    ("ASSOCIATORS OF {Win32_Directory.Name='" & strFolderName & "'} Where " _
        & "ResultClass = CIM_DataFile")

For Each objFile in colFiles
    If objFile.Extension = "pdf" Then
        objTextFile.WriteLine objFile.FileName
    End If
Next

For Each objFolder in colSubfolders
    GetSubFolders strFolderName
Next

Sub GetSubFolders(strFolderName)

    Set colSubfolders2 = objWMIService.ExecQuery _
        ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
            & "Where AssocClass = Win32_Subdirectory " _
                & "ResultRole = PartComponent")

    For Each objFolder2 in colSubfolders2
        strFolderName = objFolder2.Name

        Set colFiles = objWMIService.ExecQuery _
            ("ASSOCIATORS OF {Win32_Directory.Name='" & strFolderName & "'} Where " _
                & "ResultClass = CIM_DataFile")

        For Each objFile in colFiles
            If objFile.Extension = "pdf" Then
                objTextFile.WriteLine objFile.FileName
            End If
        Next

        GetSubFolders strFolderName
    Next

End Sub

Question by:HopperSI
12 Comments
 
LVL 16

Expert Comment

by:t0t0
ID: 24750176
Are the files in the sub-folder being processed too?
0
 

Author Comment

by:HopperSI
ID: 24750183
Yes.  They are mostly in subfolders.
0
 
LVL 16

Expert Comment

by:t0t0
ID: 24750242
So the batch file needs to 'walk' through all the sub-folders and process only those files which have not yet been processed - yes?

So we would also need to include the file's path in the text file in case of duplicate filenames - or is this not a worry?

Rather than using an 'exclude' file, is it possible to rename the file itself (slightly) so that we could use this as a flag instead? - Perhaps we could add the date processed to the filename.

How about the file's archive attribute - can we use that instead, or is it likely to be changed by a backup program you run?



0
 

Author Comment

by:HopperSI
ID: 24750263
The PDFs will be saved by a document management application which assigns unique numbers as the filenames---duplicate filenames are not an issue, but filename cannot be changed.  

The archive attribute will change with backups and also if/when a PDF is edited.

All valid ideas though, which I have dismissed for the above reasons.  I've also toyed with the idea of actively monitoring the top level directory with a VBScript to open any new files automatically (in a batch file pointing to my "processing" application).  This would eliminate the need for the above script, since all new files would be handled "on-the-fly."  I am unable to modify the VBScript below to include subfolders though.  If you can, that would be a huge step for me.  (Script below also taken from Hey Scripting Guy.)
Set objShell = CreateObject("Wscript.Shell")

strComputer = "."

Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")

Set colMonitoredEvents = objWMIService.ExecNotificationQuery _
    ("SELECT * FROM __InstanceCreationEvent WITHIN 5 WHERE " _
        & "Targetinstance ISA 'CIM_DirectoryContainsFile' and " _
            & "TargetInstance.GroupComponent= " _
                & "'Win32_Directory.Name=""c:\\\\test""'")

Do
    Set objLatestEvent = colMonitoredEvents.NextEvent
    strNewFile = objLatestEvent.TargetInstance.PartComponent
    arrNewFile = Split(strNewFile, "=")
    strFileName = arrNewFile(1)
    strFileName = Replace(strFileName, "\\", "\")
    strFileName = Replace(strFileName, Chr(34), "")
    objShell.Run("notepad.exe " & strFileName)
Loop

0
 
LVL 16

Expert Comment

by:t0t0
ID: 24750441
Copy and paste the following code into Notepad and save it as PROCPDF.BAT in the parent folder where your PDF files reside.

The batch file traverses the sub-folders, processes any unprocessed files and adds their filenames to the file PROCESSED.TXT. This file will automatically be created if it doesn't already exist.

The only line you need to change is:

   echo Unprocessed File: "%%~nxa"

Replace this line to run your application. The "%%~nxa" is the name of the unprocessed file you need to pass to your application as in the following example:

   YourApplication "%%~nxa"

Trusting this meets your need.


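@echo off
setlocal enabledelayedexpansion

rem Create an empty log of processed files on the first run.
if not exist processed.txt type nul >processed.txt

rem Walk the current folder and all sub-folders looking for PDF files.
for /f "tokens=*" %%a in ('dir /a-d /b /s *.pdf') do (
   rem find sets errorlevel 1 when the filename is not yet in the log.
   find /i "%%~nxa" processed.txt >nul
   if !errorlevel!==1 (
      echo Unprocessed File: "%%~nxa"
      echo %%a >>processed.txt
   )
)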
0
 

Author Comment

by:HopperSI
ID: 24750571
That looks solid.  Seems simpler than what I was envisioning, which is why I reached out to the experts in the first place.

...and what if the directory with the files is on a machine that does not have the application....can I use a mapped drive or a UNC path with that script?
0
 
LVL 21

Expert Comment

by:AmazingTech
ID: 24752560
Excellent!
But I think echo Unprocessed file should be "%%a" and the line below should be echo "%%~nxa"
0
 
LVL 16

Accepted Solution

by:
t0t0 earned 500 total points
ID: 24752764
AmazingTech

Thank you. It always means a lot to me when you comment on my code - especially when you use words like "Excellent!"

I fully agree regarding the %%~nxa. I only wanted to display the actual filename there; however, on reflection, it might have been wiser to include the full path as well, as in %%a.
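One caveat with the find /i test: it matches substrings, so a file named 12.pdf would be treated as already processed once 112.pdf is in PROCESSED.TXT. A minimal sketch of a stricter variant follows, assuming the log stores one bare filename per line (the filenames are unique here) while the application receives the full path. APP is a hypothetical placeholder for the real command; findstr with /x /l /i /c: does a whole-line, literal, case-insensitive comparison.

```batch
@echo off
setlocal

rem Hypothetical stand-in for the real processing application.
set "APP=echo Would process:"

rem Create an empty log on the first run.
if not exist processed.txt type nul >processed.txt

for /f "tokens=*" %%a in ('dir /a-d /b /s *.pdf') do (
   rem Whole-line literal match on the bare filename, ignoring case.
   findstr /x /l /i /c:"%%~nxa" processed.txt >nul
   if errorlevel 1 (
      rem Pass the full path to the application, log the name only.
      %APP% "%%a"
      >>processed.txt echo %%~nxa
   )
)
```

Writing the redirection first (>>processed.txt echo ...) keeps a trailing space out of the log, which would otherwise defeat the whole-line match.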

HopperSI

Yes, you can adapt this to include files on remote PCs. You will need a text file containing a list of computer names (say, COMPUTERS.TXT) and a 'container' FOR loop as in the following example:

   @echo off
   setlocal enabledelayedexpansion
   if not exist processed.txt type nul >processed.txt
   FOR /F %%p IN (Computers.txt) DO (

Then you add the previous code as in:

      for /f "tokens=*" %%a in ('dir /a-d /b /s "\\%%p\%Source%\*.PDF"') do (

Here, I've changed the filespec to \\ (which means networked PC), %%p (which is the networked PC's name), %Source% (which is the parent folder containing your PDF files and sub-folders) and *.PDF itself (the PDF files themselves).

You're going to have to either include a line near the start defining %Source% as in the following example:

   SET Source=C$\Text Files

or replace the '%Source%' (in the FOR line above) with the actual pathname itself.

If the PDF files are in different locations (I'm referring to the parent folder here) on different PCs then please let me know, because this will affect the structure of the COMPUTERS.TXT file and how we read our data from it.

Then, as before, with only slight changes:

      find /i "%%a" processed.txt >nul
      if !errorlevel!==1 (
         echo Unprocessed File: "%%a"
         echo %%a >>processed.txt
      )
   )

And a final ')' to close the containing FOR:

)

I don't have a network here to test this on so if there are problems please get back to us for further assistance.

0
 

Author Comment

by:HopperSI
ID: 24774096
I wouldn't need a list of computers, it's only one computer, but the application doesn't exist on the same computer as the directory.

So the final script would look like this?

@echo off
setlocal enabledelayedexpansion

if not exist processed.txt type nul >processed.txt

for /f "tokens=*" %%a in ('dir /a-d /b /s \\computername\\d:\docs\*.PDF') do (
   find /i "%%a" processed.txt >nul
   if !errorlevel!==1 (
      echo Unprocessed File: "%%a"
      echo %%a >>processed.txt
   )
)
)

0
 
LVL 16

Expert Comment

by:t0t0
ID: 24777962
Shouldn't that be something along the following lines:

   :
   for /f "tokens=*" %%a in ('dir /a-d /b /s "\\computername\d$\docs\*.PDF"') do (
   :

also, you need to remove the extra ')' (closing bracket) at the end of your code.
0
 
LVL 16

Expert Comment

by:t0t0
ID: 24777968
Or....

   for /f "tokens=*" %%a in ('dir /a-d /b /s "\\computername\d\docs\*.PDF"') do (

anyway, whichever it is, the ':' (colon) shouldn't be there.
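Putting the corrections from this thread together, the single-computer version might look like the sketch below. It is untested against a real network: computername and the d$ administrative share are the placeholders used in the posts above, and ROOT and APP are hypothetical variables introduced here for the share path and the real processing command. PROCESSED.TXT is created in whatever folder the batch file is run from, on the machine that has the application.

```batch
@echo off
setlocal

rem Placeholders: adjust the UNC root and the command to run.
set "ROOT=\\computername\d$\docs"
set "APP=echo Unprocessed File:"

rem Create an empty log on the first run.
if not exist processed.txt type nul >processed.txt

for /f "tokens=*" %%a in ('dir /a-d /b /s "%ROOT%\*.pdf"') do (
   rem Substring search on the full path. Errorlevel 1 or higher means not logged yet.
   find /i "%%a" processed.txt >nul
   if errorlevel 1 (
      %APP% "%%a"
      >>processed.txt echo %%a
   )
)
```

Whether the application itself accepts UNC paths is a separate question; if it does not, mapping the share to a drive letter and pointing ROOT at that letter is the usual fallback.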
0
 
LVL 16

Expert Comment

by:t0t0
ID: 24789373
Well, there's a surprise for you. Thank you. And just to show you my appreciation check out the following suggestion.

There may come a time when some PDF files are either deleted or moved from your folders; however, their entries will remain in the PROCESSED.TXT file. Ideally, these 'dead' entries should be removed from the PROCESSED.TXT file, thereby keeping the file as small as possible to help speed up PROCPDF.BAT. The following batch file will do this for you.

Copy and paste the following code into Notepad and save it as SYNCPDF.BAT in the same folder as PROCESSED.TXT.


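@echo off
rem Start with an empty temp file so the move works even if no entries survive.
type nul >processed.tmp
for /f "tokens=*" %%a in ('type processed.txt') do (
   rem Keep only the entries whose file still exists.
   if exist "%%a" echo %%a>>processed.tmp
)
move /y processed.tmp processed.txt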
0
