How To Rename-Move a Batch of PDF Files Based on the Contents of Corresponding Excel Files

Joe WinogradDeveloper
CERTIFIED EXPERT
50+ years in computers
EE FELLOW 2017 — first ever recipient of Fellow award
MVE 2015,2016,2018
CERTIFIED GOLD EXPERT
DISTINGUISHED EXPERT
Published:
Updated:
Edited by: Andrew Leniart
Article Update 13-March-2020: I removed the source code. The article that remains should act as a "design roadmap" for members who want to write the code in the programming language of your choice. If you are interested in discussing the program further, please contact me via the EE message system.

In a previous Experts Exchange article, How To Rename-Move a Batch of PDF Files Based on Contents of the Files, I presented a program that solved a question asked here at EE. In that article and program, the situation was that a large number of PDF files in a folder had to be renamed automatically (en masse) with a suffix contained somewhere in each of the PDF files.

For example, a file might be named:

D123456.pdf

And inside that file at a fixed location (specific page, specific line, specific column) might be:

John Smith

In this example, the requirement is to rename the file as:

D123456 John Smith.pdf

A more recent question here at EE has a similar requirement, except that the suffix is contained in a specific cell of a corresponding Excel spreadsheet. In this newer question, there's an automated process that creates hundreds of PDF documents from Excel spreadsheets. The problem is that the PDF documents need to be renamed after they are created. The renaming should add a suffix to the current PDF file name, as in the previous question, but the suffix is in a fixed cell location of the Excel spreadsheet from which the PDF was created.

For example, a file might be named:

Order Number 12345.xls

In which case, the automated PDF creation process would create:

Order Number 12345.pdf

Now, suppose that cell V16 in each spreadsheet contains the invoice number for each order and that this should be the suffix for the PDF filename. In other words, if cell V16 in file Order Number 12345.xls contains:

K56789

Then the PDF file should be renamed to:

Order Number 12345K56789.pdf

Or perhaps the more visually pleasing:

Order Number 12345 K56789.pdf

Or maybe even:

Order Number 12345_Invoice K56789.pdf

In thinking about this new requirement, it occurred to me that I could leverage the bulk of the code created for the previous article. As with that previous program, the new program presented here is written in AutoHotkey, an excellent (free!) programming/scripting language. The quick explanation for installing AutoHotkey is to visit its website. A more comprehensive explanation is to read my EE article, AutoHotkey - Getting Started. After installation, AutoHotkey will own the AHK file type, supporting the solution discussed in the remainder of this article. Also, the setup utility will install a compiler that turns an AutoHotkey source code script into a stand-alone/no-install executable (an EXE file).

The source code is attached to this article in the file MassRenamePX.ahk, which is a plain text file ( PX in the file name stands for PDF files based on E Xcel cell contents). As mentioned above, AutoHotkey will own the file type AHK, so double-clicking on MassRenamePX.ahk in Windows/File Explorer (or whatever file manager you use) will invoke AutoHotkey to run the program. Alternatively, you may run the AutoHotkey compiler to create a stand-alone executable called MassRenamePX.exe (or whatever you'd like to call it).

For those folks who do not want to install AutoHotkey, I compiled it into MassRenamePX.exe and uploaded it to EE-Stuff. The regular EE site does not allow uploading EXE files, but EE-Stuff does. You log into EE-Stuff with the same username and password as EE. Then click on the Expert Area tab, then the "Find files for an article" link, and enter this article's ID (12872) in the box. When you download the file, the file name will be MassRenamePX.renamed-from-exe-for-your-safety. Rename it to MassRenamePX.exe (or whatever you want to call it, but make sure the file type is EXE).

When running the program, it goes through the following process:

(1) Warns that existing files in the destination folder will be overwritten with no warning and asks if you want to continue:


(2) Asks for the Excel cell (such as A1 or C5 or V16) containing the suffix:


And then performs error checking on the cell address, as follows:

(a) The cell address must be entered:


(b) The first character of the cell address must be alphabetic:


(c) The cell address must be entirely alphanumeric:


(d) The cell address must contain at least one number that is not zero – in other words, cells such as A0 or B00 are not valid:


(3) Asks for a separator to put between the current file name and the suffix, such as a space or hyphen or underscore (or no separator):


This may be any number of characters but, of course, must contain only the characters that are valid in a file name. If you enter any invalid characters (see list below in item #8), it shows this:


(4) Asks for the source folder:


This is the folder where the PDF files and Excel spreadsheets are located (the latter may have file types of XLS or XLSX). You may enter it by browsing to it or typing the name into the box or copying/pasting into the box. The source folder must exist, otherwise it displays this:


(5) Asks for the destination folder:


This is the folder where the renamed PDF files will be stored. If the source and destination folders are the same, the program does just a Rename. If the source and destination folders are not the same, the program does a Rename and a Move. As with the source folder, you may enter the destination folder by browsing to it or typing the name into the box or copying/pasting into the box. But this dialog has one more option not available for the source folder, namely, Make New Folder, providing the ability to create the destination folder if it doesn't exist.

(6) Asks for confirmation that the parameters are correct, since at this point it has all of the information needed:



As you can see in the two examples above, the program shows if the source and destination folders are the same or different, and what that means the result will be, namely, either a rename/in-place or a rename/move.

(7) Loops through all of the files in the source folder looking for PDF files that have an XLS or XLSX file with a matching name. When it finds such a pair, it reads the suffix cell in the Excel file and performs the rename/move on the PDF file. During this process, it displays a green progress bar moving to the right so you know that processing is taking place. The progress bar dialog also contains the name of the file that it is currently working on:


(8) Detects if the suffix contains a character that is not valid for a file name. Specifically, it checks every character in the suffix against this list:

< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
| (vertical bar or pipe)
\ (backslash)
? (question mark)
* (asterisk)

All of these are invalid characters for a Windows file name, and if the program finds one in a suffix, it will not attempt to perform the rename/move.

(9) Captures the following statistics during a run:
 
 Number of PDF files found
 Number of PDF files without a matching Excel spreadsheet (neither an XLS nor an XLSX)
 Number of Excel files with a character in the suffix cell that is invalid for a file name
 Number of times that an attempted rename/move returned an error code
 Elapsed time in minutes and seconds
At the completion of the run, the operational statistics (and input parameters) are saved in a plain text file (in the destination folder) called:

Operational_Statistics_YYYY-MM-DD_hh.mm.ss.txt

where YYYY-MM-DD and hh.mm.ss are the date and time when the run began. The file looks like this:

Operational Statistics from Batch-Mass-Rename-Move-PDF-Files-Excel
Beginning date and time: 2014-01-17_16.02.04
Source folder: D:\0tempD\TestPDFrename-in\
Destination folder: D:\0tempD\TestPDFrename-out\
Excel cell containing the suffix: a2
Separator between file name and suffix: _
Number of PDF files processed: 173
Number of PDF files without a matching Excel file: 8
Number of Excel files with invalid character in suffix: 3
Number of times attempted rename/move returned error: 0
Ending date and time: 2014-01-17_16.05.22
Elapsed time (minutes:seconds): 3:18

If the attempt to write the Operational Statistics file fails (returns an error code), the program displays this dialog:


As with the previous program, I did some generalization, but I realize that the solution is still rather specific to the requirements in the posted question. However, by providing the source code, I hope that other members with similar requirements will be able to modify the program to suit their purposes.

If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe
4
9,385 Views
Joe WinogradDeveloper
CERTIFIED EXPERT
50+ years in computers
EE FELLOW 2017 — first ever recipient of Fellow award
MVE 2015,2016,2018
CERTIFIED GOLD EXPERT
DISTINGUISHED EXPERT

Comments (0)

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.