<

How To Rename-Move a Batch of PDF Files Based on the Contents of Corresponding Excel Files

Published on
12,952 Points
5,552 Views
4 Endorsements
Last Modified:
Approved
Joe Winograd, Fellow&MVE
50+ years in computer industry. Everything from development to sales. CIO. Document imaging. EE MVE 2015, EE MVE 2016, EE FELLOW 2017.
Update 21-May-2015: I temporarily removed the source code to make major changes to the program. Regards, Joe

In a previous Experts Exchange article, How To Rename-Move a Batch of PDF Files Based on Contents of the Files, I presented a program that solved a question asked here at EE. In that article and program, the situation was that a large number of PDF files in a folder had to be renamed automatically (en masse) with a suffix contained somewhere in each of the PDF files.

For example, a file might be named:

D123456.pdf

And inside that file at a fixed location (specific page, specific line, specific column) might be:

John Smith

In this example, the requirement is to rename the file as:

D123456 John Smith.pdf

A more recent question here at EE has a similar requirement, except that the suffix is contained in a specific cell of a corresponding Excel spreadsheet. In this newer question, there's an automated process that creates hundreds of PDF documents from Excel spreadsheets. The problem is that the PDF documents need to be renamed after they are created. The renaming should add a suffix to the current PDF file name, as in the previous question, but the suffix is in a fixed cell location of the Excel spreadsheet from which the PDF was created.

For example, a file might be named:

Order Number 12345.xls

In which case, the automated PDF creation process would create:

Order Number 12345.pdf

Now, suppose that cell V16 in each spreadsheet contains the invoice number for each order and that this should be the suffix for the PDF filename. In other words, if cell V16 in file Order Number 12345.xls contains:

K56789

Then the PDF file should be renamed to:

Order Number 12345K56789.pdf

Or perhaps the more visually pleasing:

Order Number 12345 K56789.pdf

Or maybe even:

Order Number 12345_Invoice K56789.pdf

In thinking about this new requirement, it occurred to me that I could leverage the bulk of the code created for the previous article. As with that previous program, the new program presented here is written in AutoHotkey, an excellent (free!) programming/scripting language. The quick explanation for installing AutoHotkey is to visit its website. A more comprehensive explanation is to read my EE article, AutoHotkey - Getting Started. After installation, AutoHotkey will own the AHK file type, supporting the solution discussed in the remainder of this article. Also, the setup utility will install a compiler that turns an AutoHotkey source code script into a stand-alone/no-install executable (an EXE file).

The source code is attached to this article in the file MassRenamePX.ahk, which is a plain text file (PX in the file name stands for PDF files based on EXcel cell contents). As mentioned above, AutoHotkey will own the file type AHK, so double-clicking on MassRenamePX.ahk in Windows/File Explorer (or whatever file manager you use) will invoke AutoHotkey to run the program. Alternatively, you may run the AutoHotkey compiler to create a stand-alone executable called MassRenamePX.exe (or whatever you'd like to call it).

For those folks who do not want to install AutoHotkey, I compiled it into MassRenamePX.exe and uploaded it to EE-Stuff. The regular EE site does not allow uploading EXE files, but EE-Stuff does. You log into EE-Stuff with the same username and password as EE. Then click on the Expert Area tab, then the "Find files for an article" link, and enter this article's ID (12872) in the box. When you download the file, the file name will be MassRenamePX.renamed-from-exe-for-your-safety. Rename it to MassRenamePX.exe (or whatever you want to call it, but make sure the file type is EXE).

When running the program, it goes through the following process:

(1) Warns that existing files in the destination folder will be overwritten with no warning and asks if you want to continue:

Overwrite Warning
(2) Asks for the Excel cell (such as A1 or C5 or V16) containing the suffix:

Enter Excel cell
And then performs error checking on the cell address, as follows:

(a) The cell address must be entered:

Excel cell not entered
(b) The first character of the cell address must be alphabetic:

Excel cell first character not alpha
(c) The cell address must be entirely alphanumeric:

Excel cell not alphanumeric
(d) The cell address must contain at least one number that is not zero – in other words, cells such as A0 or B00 are not valid:

Excel cell missing non-zero numeric
(3) Asks for a separator to put between the current file name and the suffix, such as a space or hyphen or underscore (or no separator):

Enter separator
This may be any number of characters but, of course, must contain only the characters that are valid in a file name. If you enter any invalid characters (see list below in item #8), it shows this:

Separator invalid
(4) Asks for the source folder:

Enter source folder
This is the folder where the PDF files and Excel spreadsheets are located (the latter may have file types of XLS or XLSX). You may enter it by browsing to it or typing the name into the box or copying/pasting into the box. The source folder must exist, otherwise it displays this:

Source folder missing
(5) Asks for the destination folder:

Enter destination folder
This is the folder where the renamed PDF files will be stored. If the source and destination folders are the same, the program does just a Rename. If the source and destination folders are not the same, the program does a Rename and a Move. As with the source folder, you may enter the destination folder by browsing to it or typing the name into the box or copying/pasting into the box. But this dialog has one more option not available for the source folder, namely, Make New Folder, providing the ability to create the destination folder if it doesn't exist.

(6) Asks for confirmation that the parameters are correct, since at this point it has all of the information needed:

Confirm Parameters - Source Destination same
Confirm Parameters - Source Destination different
As you can see in the two examples above, the program shows if the source and destination folders are the same or different, and what that means the result will be, namely, either a rename/in-place or a rename/move.

(7) Loops through all of the files in the source folder looking for PDF files that have an XLS or XLSX file with a matching name. When it finds such a pair, it reads the suffix cell in the Excel file and performs the rename/move on the PDF file. During this process, it displays a green progress bar moving to the right so you know that processing is taking place. The progress bar dialog also contains the name of the file that it is currently working on:

Progress Bar
(8) Detects if the suffix contains a character that is not valid for a file name. Specifically, it checks every character in the suffix against this list:

< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
| (vertical bar or pipe)
\ (backslash)
? (question mark)
* (asterisk)

All of these are invalid characters for a Windows file name, and if the program finds one in a suffix, it will not attempt to perform the rename/move.

(9) Captures the following statistics during a run:
 
 Number of PDF files found
 Number of PDF files without a matching Excel spreadsheet (neither an XLS nor an XLSX)
 Number of Excel files with a character in the suffix cell that is invalid for a file name
 Number of times that an attempted rename/move returned an error code
 Elapsed time in minutes and seconds
At the completion of the run, the operational statistics (and input parameters) are saved in a plain text file (in the destination folder) called:

Operational_Statistics_YYYY-MM-DD_hh.mm.ss.txt

where YYYY-MM-DD and hh.mm.ss are the date and time when the run began. The file looks like this:

Operational Statistics from Batch-Mass-Rename-Move-PDF-Files-Excel
Beginning date and time: 2014-01-17_16.02.04
Source folder: D:\0tempD\TestPDFrename-in\
Destination folder: D:\0tempD\TestPDFrename-out\
Excel cell containing the suffix: a2
Separator between file name and suffix: _
Number of PDF files processed: 173
Number of PDF files without a matching Excel file: 8
Number of Excel files with invalid character in suffix: 3
Number of times attempted rename/move returned error: 0
Ending date and time: 2014-01-17_16.05.22
Elapsed time (minutes:seconds): 3:18

If the attempt to write the Operational Statistics file fails (returns an error code), the program displays this dialog:

Error writing results file
As with the previous program, I did some generalization, but I realize that the solution is still rather specific to the requirements in the posted question. However, by providing the source code, I hope that other members with similar requirements will be able to modify the program to suit their purposes.

If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe
4
Comment
0 Comments

Featured Post

Cloud Class® Course: Certified Penetration Testing

This CPTE Certified Penetration Testing Engineer course covers everything you need to know about becoming a Certified Penetration Testing Engineer. Career Path: Professional roles include Ethical Hackers, Security Consultants, System Administrators, and Chief Security Officers.

Join & Write a Comment

Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…
Finds all prime numbers in a range requested and places them in a public primes() array. I've demostrated a template size of 30 (2 * 3 * 5) but larger templates can be built such 210  (2 * 3 * 5 * 7) or 2310  (2 * 3 * 5 * 7 * 11). The larger templa…

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month