A common requirement for Office Automation applications is to use the data in your spreadsheet / database / application to manipulate files stored on your local filing system. In VBA on a Windows platform there are two approaches you could use to do this:
The built-in VBA statements (ChDrive, CurDir, Dir, MkDir, RmDir, FileCopy, Name and Kill), although you will then need to create your own “if exists” checks and you can only copy one file at a time.
The Windows Scripting Runtime which, amongst other things, provides an object-oriented wrapper for the Windows file system API. This approach is more complete, more flexible, and faster than the built-in VBA statements. In my opinion the object orientation also produces cleaner code.
In this article I use Excel as the host environment, but this approach works with anything that hosts the standard Visual Basic for Applications environment, including ASP. In Web applications, though, it only makes sense for server-side code; enabling local file manipulation from an Internet client is totally unsafe, so do not ask it of your users. I find myself using it mainly from Excel and occasionally from Access.
Please note that if you are using a .Net based development environment then you should use the System.IO.File library built into the framework…there is no need to touch the scripting objects at all.
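As a taste of the first approach’s limitation, even a simple “does this file exist” test has to be hand-rolled around Dir. A minimal sketch (the function name is mine, not part of VBA):

```vba
' Hand-rolled existence check using only built-in VBA statements.
' Dir returns an empty string when nothing matches the supplied path.
Public Function FileExistsBuiltIn(ByVal path As String) As Boolean
    FileExistsBuiltIn = (Len(Dir(path)) > 0)
End Function
```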
So we are going to use something called the FileSystemObject in the Microsoft Scripting environment. The FileSystemObject appears in a number of the DLLs available for managing scripting, but we just want to use the object, not host our own scripting, so we only need the scripting runtime DLL.
Referencing the DLL in Your Project
To gain the benefits of Intellisense and to be able to inspect the scripting type library in the VBA Object Browser we need to add a reference to the DLL to our projects. You do this from the References dialog accessed via the Tools > References… menu in the VBA Window.
You then need to select the DLL…either browse for scrrun.dll in the \\Windows\system32 folder or look for it under Microsoft Scripting Runtime in the Available References listbox on the dialog (see screenshot).
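Once the reference is set you can declare early-bound variables against the library. A minimal sketch, with the late-bound alternative shown for comparison (it works without the reference, but you lose Intellisense and compile-time checking):

```vba
' Early bound - requires the Microsoft Scripting Runtime reference.
Dim fs As Scripting.FileSystemObject
Set fs = New Scripting.FileSystemObject

' Late bound - no reference needed, but no Intellisense or compile-time checks.
Dim fsLate As Object
Set fsLate = CreateObject("Scripting.FileSystemObject")
```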
Exploring the Library
Having referenced the type library like this you can now use VBA’s built-in object browser (see screenshot below) to explore the objects, methods and properties provided by the library. Remember to select the Scripting type library in the top left dropdown to access the Scripting library hierarchy. There are three top level objects:
A Dictionary object which in my opinion is not really useful within a sophisticated host environment such as VBA; the VBA collection class offers much greater capabilities.
The FileSystemObject object that we are interested in. When you create an instance of the FileSystemObject it comes pre-loaded with a Drives collection and you can drill down from there. Alternatively you can use relative filenames and avoid too much wandering around the folder tree.
An Encoder object used to encode scripts on Web pages to obfuscate VBScript or JScript source code. Again not relevant for our current purpose.
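As a quick check that the reference is working, you can create a FileSystemObject and drill into its Drives collection from a test macro. A sketch (output depends on the local machine):

```vba
' List the ready drives on the local machine via the Drives collection.
Sub ListDrives()
    Dim fs As New Scripting.FileSystemObject
    Dim drv As Scripting.Drive
    For Each drv In fs.Drives
        If drv.IsReady Then
            Debug.Print drv.DriveLetter & ": " & drv.VolumeName
        End If
    Next drv
End Sub
```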
In this example we are going to copy some files from a base folder to programmatically created sub-folders and rename the files as we do so. It is a simplification of a real application for judging photographs in a competition.
Each photo is a jpg file with an anonymous ID and an Excel spreadsheet is used to record a score from a panel of 3 judges against the ID of the photo.
After the initial scoring the judges have to review the images again to award medals and ribbons; but they need to do that within each ability category rather than globally and obviously they want to see the images in reverse order of score. So we prefix the original filename with the score so the images will sort by score in an image browser; and then we copy the files, renaming them as we go, into sub-folders that match the categories.
The demo has 5 columns…image ID, category, score, source filename and destination filename. It copies all the files from the source folder to a sub-folder created using the categories. Outside the scope of this article there is also a simple macro to allow the user to select the base image folder (the sub-folders are created under that).
The zip file attached here contains a pdf of the article, the demo project spreadsheet and 10 appropriately named dummy jpg’s so you can test it. Just unzip to wherever you want then open the spreadsheet and click on the ‘Get Source Folder’ button and select the folder containing the jpgs. Then press ‘Extract Files’ to run the renaming code.
Key Features in the Example Code
The code is fairly self-explanatory, with the main procedure shown below. All the actual folder and file handling is done in just two lines of code inside the For/Next loop.
fs.FolderExists(destPath) – is used to see if the category based sub-folder already exists.
fs.CreateFolder(destPath) – will create the category sub-folder the first time the category is found.
fs.CopyFile – sets the optional OverwriteFiles to True so no exceptions are raised if you run this more than once…it simply overwrites any existing file.
Also of note is that the CopyFile method will accept wildcards allowing you to copy multiple files with a single command; however there is also a single file method called Copy if you prefer that.
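Put together, the pattern those calls form looks like this (a sketch only; variable names such as fs, srcFile, destPath and destName stand in for the demo’s own):

```vba
' Create the category sub-folder on first sight, then copy with overwrite.
Dim fs As New Scripting.FileSystemObject
If Not fs.FolderExists(destPath) Then
    fs.CreateFolder destPath
End If
' Third argument True = OverWriteFiles, so re-runs replace files silently.
fs.CopyFile srcFile, destPath & "\" & destName, True
```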
Basically this very useful scripting library provides a simple to use and powerful set of tools for drive, folder and file level manipulation from inside Excel, Access etc. It also supports sequential text file IO for programmatically working with structured text files (e.g. creating or parsing a delimited file such as a CSV, or writing one field per line).
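For example, writing and re-reading a small CSV through the library’s TextStream object might look like this (the path and field names are illustrative):

```vba
' Sequential text IO with the FileSystemObject's TextStream object.
Sub CsvRoundTrip()
    Dim fs As New Scripting.FileSystemObject
    Dim ts As Scripting.TextStream

    Set ts = fs.CreateTextFile("C:\Temp\scores.csv", True) ' True = overwrite
    ts.WriteLine "ID,Category,Score"
    ts.WriteLine "IMG001,Novice,8"
    ts.Close

    Set ts = fs.OpenTextFile("C:\Temp\scores.csv", ForReading)
    Do While Not ts.AtEndOfStream
        Debug.Print ts.ReadLine
    Loop
    ts.Close
End Sub
```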
It opens up the use of Excel in particular, as a host for some very sophisticated file handling utilities.
Mea culpa on the row data type; now changed to long. Although anyone using Excel to do file operations on >32760 or so files needs their head examining!:-)
Nice article introducing users to the FSO. I use that class quite often myself.
I did kind of wonder about one line from your article above:
I just published an article about using Dictionaries in VBA code; I'd appreciate it if you would take a look, and then let me know if you still think the Collection "offers much greater capabilities" than a Dictionary :)
Wow. Your article is a tour de force indeed! Excellent work. And I fully acknowledge that the magnitude of the performance difference surprised me...although I was not surprised to find that there was one! One of my favorite pieces of advice to people asking how to improve the performance of VBA is "don't use it!":-) I have seen close to an order of magnitude difference between using creative native Excel formulas and doing the same thing in VBA...that's 1000%...putting the 50% or so you found into perspective. But it would definitely be significant if you are working with VERY large lists like your test cases.
The "shortcoming" of not having access to key values can easily be remedied in collections by using structured content that also contains the key. Then I can enumerate the keys easily - but I acknowledge it is as much a fix as the one for dictionaries "shortcomings" in the para below. That said I find that most of the time I might need to do that it is even more efficient to move the data into a proper database and use SQL to manipulate whole sets of data at one time. The record by record approach of both Collections and Dictionaries is really best left to smaller scenarios...the SQL performance gain over this is HUGE!
Finally I very much like the ability to reference a Collection by an index value. If you put an ordered list in then you can then use simple math functions for random access (not For/Each/Next) without needing to know the key at all (which is why keys are optional in Collections, of course). The only way I can gain that sort of random access into a Dictionary is to maintain a separate array of keys. For me at least, this is a MAJOR flexibility advantage for Collections...but if you have never used them that way then it wouldn't matter.
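To illustrate the indexed access I mean, a toy sketch (not from the article's demo):

```vba
' An ordered Collection supports 1-based positional access without keys,
' so simple arithmetic picks out any element directly.
Sub IndexedAccess()
    Dim scores As New Collection
    scores.Add 95
    scores.Add 87
    scores.Add 82          ' items stored in descending score order

    Dim middle As Long
    middle = (scores.Count + 1) \ 2
    Debug.Print scores(middle)   ' positional access: prints 87
End Sub
```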
I suspect the truth is that, as you say, there is really little to choose between them in situations where it is sensible to stay with in-memory list management. Their differences will suit them more or less well to different applications.
Which is a change of my position...to some degree at least!:-)