Count and Total Size by File Type

Joe Winograd, Developer
CERTIFIED EXPERT
50+ years in computers
EE FELLOW 2017 — first ever recipient of Fellow award
MVE 2015,2016,2018
CERTIFIED GOLD EXPERT
DISTINGUISHED EXPERT
Edited by: Andrew Leniart
Article Update 13-March-2020: I removed the full source code and the code snippets. The article that remains should act as a "design roadmap" for members who want to write the code in the programming language of their choice. If you are interested in discussing the program further, please contact me via the EE message system.

INTRODUCTION

This article was inspired by a recent question here at Experts Exchange. In response to that, I wrote a "quick-and-dirty" script that performs the function requested by the Original Poster, but it has numerous shortcomings. This article describes a major revision of that script, which addresses the shortcomings in the original one.

PROBLEM DESCRIPTION

The objective is to report a count of the files, for each file type (file extension), in a folder and all of its subfolders. In addition, the report should contain the total size of all files for each file type. Example:


SOLUTION

I wrote the original "quick-and-dirty" script in AutoHotkey, an excellent (free!) programming/scripting language. The quick explanation for installing AutoHotkey is to visit its website. A more comprehensive explanation is to read my EE article, AutoHotkey - Getting Started. After installation, AutoHotkey will own the AHK file type, supporting the solution discussed in the remainder of this article.

The new script improves upon the previous one in the following ways:

o  The original script hard-codes the name of the source folder. The new script provides a standard Windows "Browse for Folder" dialog that allows the user to navigate to the folder (or type or copy/paste the folder name).

o  The original script always saves the results to the same file, so the results of one run are lost on the next run unless the user manually saves the file. The new script saves the results in a file with the date and time of execution (including seconds) in the file name (yyyy-MM-dd_HH.mm.ss), so there are never duplicate or overwritten results files.

o  The original script saves the results in a simple text file. The new script saves the results in a CSV file that may be easily loaded into Excel for additional processing – sorting, formatting, printing, whatever.

o  The original script stores the results file in the source base folder. The new script stores the results file in the same folder where the script is located – a better choice. I considered allowing the user to select the folder (and even the file name), but decided not to complicate the process of entering parameters for the run.

o  The original script always includes subfolders. The new script provides an option to include or exclude subfolders.

o  The original script does no error checking. The new script does extensive error checking.

o  The original script terminates with a runtime error if there is a file type that contains a character which is invalid in AutoHotkey variable names (see the GOOD NEWS AND BAD NEWS section below). The new script executes properly when these file types are present.

HOW TO RUN THIS SCRIPT

Download the attached file called Counts-and-Sizes.ahk and simply double-click on it in Windows/File Explorer or whatever file manager you use. Since its file type is AHK, AutoHotkey will be launched to process it. If you prefer, the file may be turned into an executable via the AutoHotkey compiler, which is installed during the standard installation of AutoHotkey. If you right-click on an AHK file in Windows Explorer or whatever file manager you use, there will be a context menu pick called Compile Script:


Select that menu item and it will create an EXE file, which is a stand-alone/no-install executable of the AHK script.

HOW THE SCRIPT WORKS

For those interested in understanding how the script works, the remainder of this article shows code snippets, with a description of what each snippet does, including screenshots where appropriate (this also acts as a form of documentation for the script).

Code snippet:
 
removed
What it does: The #Warn statement provides a warning when a variable is read without having been initialized or assigned a value. The SetBatchLines statement sets the script to run at maximum speed, i.e., no "sleeping" will occur in the script.

Code snippet:
 
removed
What it does: Asks the user to enter the full path of the source folder:


It allows the user to navigate/browse to it or type/paste it in. It looks for an ending backslash on the path name and if one was not entered, it appends one (in other words, it works whether or not the user includes the ending backslash in the path). It then checks to see if a source folder was entered, and if so, if the folder exists. If either is not true, it gives the user the opportunity to exit or continue. Note: whether or not the source folder can be reported as null with the Browse For Folder dialog depends on the operating system, so the code checks for it. When the source folder is obtained, it creates a variable with the source files by appending *.* to the source folder.
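
Since the original snippet has been removed, here is a minimal Python sketch of the path handling described above. It is not the article's AutoHotkey code: the function names are hypothetical, and it raises an exception where the script offers the user a choice to exit or continue.

```python
import os

def normalize_source_folder(folder: str) -> str:
    """Append a trailing separator if missing, then verify the folder exists."""
    if not folder:
        # The Browse for Folder dialog may return nothing on some systems.
        raise ValueError("no source folder was entered")
    if not folder.endswith(("\\", "/")):
        folder += os.sep
    if not os.path.isdir(folder):
        raise FileNotFoundError(f"folder does not exist: {folder}")
    return folder

def source_pattern(folder: str) -> str:
    """Build the 'source files' value by appending *.* to the folder."""
    return folder + "*.*"
```

With this sketch, D:\BaseFolder and D:\BaseFolder\ are treated the same way, and both yield the pattern D:\BaseFolder\*.* on Windows.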

Code snippet:
 
removed
What it does: Asks if the user wants to include subfolders:

Code snippet:
 
removed
What it does: Creates a file name to store the results. It includes the date and time to the second (ss), so there can never be a duplicate file name. It stores the file in the same folder where the script is located.
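
As an illustration of the naming scheme, the following Python sketch builds a results file name from a timestamp in the yyyy-MM-dd_HH.mm.ss format. The Counts-and-Sizes_ prefix and the function name are assumptions made for this example; the article does not show the exact name the script uses.

```python
from datetime import datetime
from pathlib import Path

def results_file_path(script_dir: Path, now: datetime) -> Path:
    """Return a CSV path in script_dir whose name embeds the run's date and time."""
    stamp = now.strftime("%Y-%m-%d_%H.%M.%S")  # yyyy-MM-dd_HH.mm.ss
    # The file name prefix is hypothetical, chosen only for illustration.
    return script_dir / f"Counts-and-Sizes_{stamp}.csv"
```

A caller would typically pass the folder containing the script and datetime.now().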

Code snippet:
 
removed
What it does: Initializes some variables for the first loop through the files. The BadVarName variables are discussed in the GOOD NEWS AND BAD NEWS section later in this article.

Code snippet:
 
removed
What it does: The script makes three loops through the folders/files. This is the first. Its purpose is to detect all of the file types and create/initialize a dynamic variable for each file type to store its count and size. This is discussed in detail in the GOOD NEWS AND BAD NEWS section.

Code snippet:
 
removed
What it does: This is the second loop through the folders/files. Since a dynamic variable for each file type was created/initialized in the first loop, this loop utilizes those variables by incrementing the count and size for each file found.
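
For readers writing this in another language, the first two loops can often be collapsed into one pass. The Python sketch below does so by swapping in a dictionary keyed by file type in place of AutoHotkey's dynamic variables. The function name is hypothetical, and lowercasing the extension is an assumption (so that PDF and pdf are counted together).

```python
import os
from collections import defaultdict

def count_by_type(folder: str, include_subfolders: bool = True) -> dict:
    """Map each file extension to a (count, total size in bytes) tuple."""
    totals = defaultdict(lambda: [0, 0])  # extension -> [count, bytes]
    for root, _dirs, files in os.walk(folder):
        for name in files:
            # Assumption: compare extensions case-insensitively.
            # Files without an extension end up under the empty string.
            ext = os.path.splitext(name)[1].lstrip(".").lower()
            totals[ext][0] += 1
            totals[ext][1] += os.path.getsize(os.path.join(root, name))
        if not include_subfolders:
            break  # the first iteration is the base folder itself
    return {ext: (count, size) for ext, (count, size) in totals.items()}
```

Because a dictionary key can be any string, this approach does not run into the variable-name restrictions discussed in the GOOD NEWS AND BAD NEWS section.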

Code snippet:
 
removed
What it does: Initializes some variables for the third loop. Writes out the header row for the Comma-Separated Values (CSV) results file, terminating with a fatal error if the append (write) operation gives a non-zero return code. The header looks like one of these two lines, depending on the choice of including/excluding subfolders:

File Type,Count,Size(bytes),Folder=D:\BaseFolder\ (with subfolders)
File Type,Count,Size(bytes),Folder=D:\BaseFolder\ (without subfolders)
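
A small Python sketch of how such a header line might be assembled follows. The function name is hypothetical; the column text is copied from the two examples above.

```python
def header_row(folder: str, include_subfolders: bool) -> str:
    """Build the CSV header, recording the source folder and subfolder choice."""
    scope = "with subfolders" if include_subfolders else "without subfolders"
    return f"File Type,Count,Size(bytes),Folder={folder} ({scope})"
```

Note that a folder name containing a comma would spill into an extra column with this format.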

Code snippet:
 
removed
What it does: This is the third and final loop through the folders/files. It appends a line to the CSV results file for each file type with its count and size (the OutputFlag variable ensures that only one line is appended for each file type), terminating with a fatal error if the append (write) operation gives a non-zero return code.

The CSV lines look like this:

="jpg",1040,178585752
="rtf",9,6260767
="pdf",673,694644678
="docx",36,9087570
="html",249,11401711
="bat",9,8866
*Other*,2,1954
="class",3391,9249790
="webpage",1,112
="0A1",1,17920
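
The line format above can be reproduced with a sketch like the following Python function (the name is hypothetical). Each file type is wrapped as ="...", presumably so that Excel treats it as text, although the article does not state the reason. The *Other* bucket is written without the wrapper, as in the sample.

```python
def data_row(file_type: str, count: int, size: int) -> str:
    """Format one CSV line in the style of the sample output."""
    if file_type == "*Other*":
        label = file_type              # the catch-all bucket is written as-is
    else:
        label = f'="{file_type}"'      # e.g. ="jpg"
    return f"{label},{count},{size}"
```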

After loading the CSV results file into Excel, doing some formatting, sorting by the File Type column, and putting in a TOTAL row with SUM formulas for Count and Size, the Excel spreadsheet looks like this:


Code snippet:
 
removed

What it does: Displays a dialog box with the fully qualified file name of the CSV results file:


After displaying this dialog, the script exits.

GOOD NEWS AND BAD NEWS

In discussing the last code snippet, we get to the GOOD NEWS AND BAD NEWS section mentioned previously in the article. The good news is that the script does not require the user to specify in advance what file types to process. It takes advantage of a powerful AutoHotkey feature to create dynamically (at runtime) a variable with a name that is based on the contents of another variable. For example, consider the variables named:

Count_%FileExt%
Size_%FileExt%

If the variable FileExt contains the value PDF at runtime, then these variables have the names:

Count_PDF
Size_PDF

This technique is used in the script to create dynamically Count and Size variables for every file type encountered during the search, without requiring the user to specify file types in advance.

The bad news is that some characters are valid in Windows file names but invalid in AutoHotkey variable names. The list of these characters may be seen in the code snippet below. For example, if the script encounters a file type of

ab!

and attempts to create the variables

Count_ab!
Size_ab!

it will terminate with a runtime error – that's what happens with the script I posted in the original question. So this new script checks every character in the file types that it finds and looks for "bad" characters, i.e., characters that would result in an invalid AutoHotkey variable name. When it finds them, it stores the counts and sizes in these variables:

Count_BadVarName
Size_BadVarName

It reports these values in the results file with *Other* as the file type.

More good news is that these characters are not found in most file types of interest, such as ahk, bat, csv, doc, docx, exe, flv, gif, htm, html, ico, java, kbd, lnk, m4v, nsi, opd, pdf, qt, rtf, sys, tif, tiff, uni, vbs, wma, xls, xlsx, y4m, zip (couldn't resist the A to Z approach).

Code snippet:
 
removed

What it does: This is a function that takes a file type as the input parameter and returns TRUE if any character is a "bad" one, i.e., if it contains any character that is invalid in an AutoHotkey variable name. Otherwise, it returns FALSE.
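
Because the snippet with the actual character list has been removed, the Python sketch below only illustrates the idea. It assumes, for illustration, that ASCII letters, digits, and the underscore are the safe characters; the script's real list may differ. Both function names are hypothetical.

```python
import string

# Assumption: characters treated as safe in a variable name for this sketch.
SAFE_CHARS = set(string.ascii_letters + string.digits + "_")

def has_bad_char(file_type: str) -> bool:
    """Return True if any character falls outside the assumed safe set."""
    return any(ch not in SAFE_CHARS for ch in file_type)

def bucket_for(file_type: str) -> str:
    """Route file types with bad characters to the *Other* bucket."""
    return "*Other*" if has_bad_char(file_type) else file_type
```

Under these assumptions, a file type of ab! is reported under *Other*, while pdf and 0A1 keep their own rows.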

That's it! I hope this helps the Original Poster as well as other EE members.  If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe