Article Update 13-March-2020:
I removed the full source code and the code snippets. The article that remains should act as a "design roadmap" for members who want to write the code in the programming language of their choice. If you are interested in discussing the program further, please contact me via the EE message system.
This article was inspired by a recent question here at Experts Exchange. In response to that, I wrote a "quick-and-dirty" script that performs the function requested by the Original Poster, but it has numerous shortcomings. This article describes a major revision of that script, which addresses the shortcomings in the original one.
The objective is to report a count of the files, for each file type (file extension), in a folder and all of its subfolders. In addition, the report should contain the total size of all files for each file type. Example:
I wrote the original "quick-and-dirty" script in AutoHotkey, an excellent (free!) programming/scripting language. The quick explanation for installing AutoHotkey is to visit its website. A more comprehensive explanation is to read my EE article, AutoHotkey - Getting Started. After installation, AutoHotkey will own the AHK file type, supporting the solution discussed in the remainder of this article.
The new script improves upon the previous one in the following ways:
The original script hard-codes the name of the source folder. The new script provides a standard Windows "Browse for Folder" dialog that allows the user to navigate to the folder (or type or copy/paste the folder name).
The original script always saves the results to the same file, so the results are not preserved across multiple executions unless the user manually saves the file. The new script saves the results in a file with the date and time of execution (including seconds) in the file name (yyyy-MM-dd_HH.mm.ss), so there are never duplicate or overwritten results files.
The original script saves the results in a simple text file. The new script saves the results in a CSV file that may be easily loaded into Excel for additional processing – sorting, formatting, printing, whatever.
The original script stores the results file in the source base folder. The new script stores the results file in the same folder where the script is located – a better choice. I considered allowing the user to select the folder (and even the file name), but decided not to complicate the process of entering parameters for the run.
The original script always includes subfolders. The new script provides an option to include or exclude subfolders.
The original script does no error checking. The new script does extensive error checking.
The original script terminates with a runtime error if there is a file type that contains a character which is invalid in AutoHotkey variable names (see the GOOD NEWS AND BAD NEWS section below). The new script executes properly when these file types are present.
HOW TO RUN THIS SCRIPT
Download the attached file called Counts-and-Sizes.ahk and simply double-click on it in Windows/File Explorer or whatever file manager you use. Since its file type is AHK, AutoHotkey will be launched to process it. If you prefer, the file may be turned into an executable via the AutoHotkey compiler, which is installed during the standard installation of AutoHotkey. If you right-click on an AHK file in Windows Explorer or whatever file manager you use, there will be a context menu pick called Compile Script:
Select that menu item and it will create an EXE file, which is a stand-alone/no-install executable of the AHK script.
HOW THE SCRIPT WORKS
For those interested in understanding how the script works, the remainder of this article shows code snippets, with a description of what each snippet does, including screenshots where appropriate (this also acts as a form of documentation for the script).
What it does: The #Warn statement provides a warning when a variable is read without having been initialized or assigned a value. The SetBatchLines statement sets the script to run at maximum speed, i.e., no "sleeping" will occur in the script.
What it does: Asks the user to enter the full path of the source folder:
It allows the user to navigate/browse to it or type/paste it in. It looks for an ending backslash on the path name and if one was not entered, it appends one (in other words, it works whether or not the user includes the ending backslash in the path). It then checks to see if a source folder was entered, and if so, if the folder exists. If either is not true, it gives the user the opportunity to exit or continue. Note: whether or not the source folder can be reported as null with the Browse For Folder dialog depends on the operating system, so the code checks for it. When the source folder is obtained, it creates a variable with the source files by appending *.* to the source folder.
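Since the original AutoHotkey snippet is no longer part of this article, here is a minimal Python sketch of the checks described above. It is not the script's code: the function names normalize_source_folder and source_pattern are hypothetical, and instead of offering the user a chance to exit or continue, this sketch simply raises an error that the caller can handle.

```python
import os


def normalize_source_folder(folder: str) -> str:
    """Return the folder path with a trailing separator appended if missing.

    Raises ValueError if no folder was entered or the folder does not exist.
    (Assumption: the caller decides whether to re-prompt or exit.)
    """
    if not folder:
        # Some folder dialogs can report an empty selection, so check for it.
        raise ValueError("no source folder was entered")
    if not folder.endswith(("\\", "/")):
        folder += os.sep
    if not os.path.isdir(folder):
        raise ValueError(f"folder does not exist: {folder}")
    return folder


def source_pattern(folder: str) -> str:
    """Build the 'all files' pattern by appending *.* to the source folder."""
    return normalize_source_folder(folder) + "*.*"
```

A caller would typically wrap normalize_source_folder in a loop that asks again after a ValueError, which mirrors the exit-or-continue choice described above.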
What it does: Asks if the user wants to include subfolders:
What it does: Creates a file name to store the results. It includes the date and time to the second (ss), so there can never be a duplicate file name. It stores the file in the same folder where the script is located.
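As a rough illustration in Python (not the script's own code), the timestamped name could be built like this. The function name results_file_name and the "Counts-and-Sizes_" prefix are assumptions made for the example; only the yyyy-MM-dd_HH.mm.ss pattern and the CSV file type come from this article.

```python
import os
from datetime import datetime
from typing import Optional


def results_file_name(script_dir: str, now: Optional[datetime] = None) -> str:
    """Return a results path whose name embeds the date and time to the second."""
    moment = now if now is not None else datetime.now()
    # yyyy-MM-dd_HH.mm.ss, using periods because colons are invalid in Windows file names
    stamp = moment.strftime("%Y-%m-%d_%H.%M.%S")
    # The prefix is a hypothetical choice; the article does not state one.
    return os.path.join(script_dir, f"Counts-and-Sizes_{stamp}.csv")
```

Passing the script's own folder as script_dir keeps the results next to the script, as described above.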
What it does: Initializes some variables for the first loop through the files. The BadVarName variables are discussed in the GOOD NEWS AND BAD NEWS section later in this article.
What it does: The script makes three loops through the folders/files. This is the first. Its purpose is to detect all of the file types and create/initialize a dynamic variable for each file type to store its count and size. This is discussed in detail in the GOOD NEWS AND BAD NEWS section later in this article.
What it does: This is the second loop through the folders/files. Since a dynamic variable for each file type was created/initialized in the first loop, this loop utilizes those variables by incrementing the count and size for each file found.
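For readers writing this in another language, the first two loops can be sketched in Python as below. This is a substitute technique, not the script's method: a dictionary keyed by file type replaces the dynamic variables, so detection and counting happen in a single pass. The function name tally_by_type is hypothetical, and upper-casing the file type (so that pdf and PDF are counted together) is an assumption of this sketch.

```python
import os
from collections import defaultdict


def tally_by_type(folder: str, include_subfolders: bool = True) -> dict:
    """Map each file type (upper case, no dot) to [count, total size in bytes]."""
    totals = defaultdict(lambda: [0, 0])
    for root, _dirs, files in os.walk(folder):
        for name in files:
            # Files with no extension are grouped under the empty string.
            file_type = os.path.splitext(name)[1].lstrip(".").upper()
            try:
                size = os.path.getsize(os.path.join(root, name))
            except OSError:
                continue  # skip files that cannot be read
            entry = totals[file_type]
            entry[0] += 1
            entry[1] += size
        if not include_subfolders:
            break  # os.walk yields the top folder first, so stop after it
    return dict(totals)
```

Because the dictionary creates an entry the first time a file type is seen, no separate initialization pass is needed in this version.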
What it does: Initializes some variables for the third loop. Writes out the header row for the comma-separated values (CSV) results file, terminating with a fatal error if the append (write) operation gives a non-zero return code. The header looks like one of these two lines, depending on the choice of including/excluding subfolders:
File Type,Count,Size(bytes),Folder=D:\BaseFolder\ (with subfolders)
File Type,Count,Size(bytes),Folder=D:\BaseFolder\ (without subfolders)
What it does: This is the third and final loop through the folders/files. It appends a line to the CSV results file for each file type with its count and size (the OutputFlag variable ensures that only one line is appended for each file type), terminating with a fatal error if the append (write) operation gives a non-zero return code.
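The header row and the per-type rows can be sketched in Python as follows. This is an illustration only: write_results is a hypothetical name, the totals argument is assumed to map each file type to a [count, size] pair, and sorting the rows by file type is a choice made for this sketch. Python reports a failed write by raising OSError rather than through a return code, so no explicit check appears here.

```python
import csv


def write_results(csv_path: str, folder: str, totals: dict, include_subfolders: bool) -> None:
    """Write the header row, then one row per file type with its count and size."""
    suffix = "with subfolders" if include_subfolders else "without subfolders"
    header = ["File Type", "Count", "Size(bytes)", f"Folder={folder} ({suffix})"]
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for file_type in sorted(totals):
            count, size = totals[file_type]
            writer.writerow([file_type, count, size])
```

Since each file type is a single dictionary key, every type is written exactly once without needing a flag to suppress duplicates.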
The CSV lines look like this:
After loading the CSV results file into Excel, doing some formatting, sorting by the File Type column, and putting in a TOTAL row with SUM formulas for Count and Size, the Excel spreadsheet looks like this:
What it does: Displays a dialog box with the fully qualified file name of the CSV results file:
After displaying this dialog, the script exits.
GOOD NEWS AND BAD NEWS
In discussing the last code snippet, we get to the GOOD NEWS AND BAD NEWS section mentioned previously in the article. The good news is that the script does not require the user to specify in advance what file types to process. It takes advantage of a powerful AutoHotkey feature to create dynamically (at runtime) a variable with a name that is based on the contents of another variable. For example, consider the variables named:
If the variable %FileExt% contains the value PDF at runtime, then these variables have the names:
This technique is used in the script to dynamically create Count variables for every file type encountered during the search, without requiring the user to specify file types in advance.
The bad news is that some characters are valid in Windows file names but invalid in AutoHotkey variable names. The list of these characters may be seen in the code snippet below. For example, if the script encounters a file type of
and attempts to create the variables
it will terminate with a runtime error; that's what happens with the script I posted in the original question. So this new script checks every character in the file types that it finds and looks for "bad" characters, i.e., characters that would result in an invalid AutoHotkey variable name. When it finds them, it stores the counts and sizes in these variables:
It reports these values in the results file with *Other* as the file type.
More good news is that these characters are not found in most file types of interest, such as ahk, bat, csv, doc, docx, exe, flv, gif, htm, html, ico, java, kbd, lnk, m4v, nsi, opd, pdf, qt, rtf, sys, tif, tiff, uni, vbs, wma, xls, xlsx, y4m, zip (couldn't resist the A to Z approach).
What it does: This is a function that takes a file type as the input parameter and returns TRUE if any character is a "bad" one, i.e., if it contains any character that is invalid in an AutoHotkey variable name. Otherwise, it returns FALSE.
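A check of this kind might look like the Python sketch below. The script's actual list of bad characters is not reproduced in this article, so the rule used here is an assumption: letters, digits, non-ASCII characters, and the punctuation # _ @ $ are treated as valid, and everything else as bad. Verify that rule against the documentation for your AutoHotkey version before relying on it. The names has_bad_var_char, ASSUMED_VALID_PUNCTUATION, and report_bucket are hypothetical.

```python
import string

# Assumption: punctuation accepted in variable names; adjust for your AutoHotkey version.
ASSUMED_VALID_PUNCTUATION = "#_@$"
_ASCII_OK = set(string.ascii_letters + string.digits + ASSUMED_VALID_PUNCTUATION)


def has_bad_var_char(file_type: str) -> bool:
    """Return True if any character would be invalid under the assumed rule."""
    for ch in file_type:
        if ord(ch) > 127:
            continue  # assumption: non-ASCII characters are accepted
        if ch not in _ASCII_OK:
            return True
    return False


def report_bucket(file_type: str) -> str:
    """Group file types with bad characters under *Other*, as the article describes."""
    return "*Other*" if has_bad_var_char(file_type) else file_type
```

In a language that stores the totals in a dictionary or map, this check is only needed if you want to reproduce the *Other* grouping, because any string can serve as a key.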
That's it! I hope this helps the Original Poster as well as other EE members. If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe