How To Rename-Move a Batch of PDF Files Based on Contents of the Files

Article Update 13-March-2020: I removed the full source code and the code snippets. The article that remains should act as a "design roadmap" for members who want to write the code in the programming language of their choice. If you are interested in discussing the program further, please contact me via the EE message system.

A recent question here at Experts Exchange piqued my interest, so I decided to provide a thorough solution and publish this Article about it. The Original Poster (OP) of the question has approximately one thousand PDF files containing 7-character sequential alphanumeric file names (and, of course, all of the file extensions are PDF). Although the OP did not state this, it is likely that the sequential alphanumerics represent unique identifiers for his customers, perhaps customer numbers. The alphanumeric file name is cryptic, in no way identifiable with the customer, so the OP would like the file name to contain the customer name in addition to the number. For example, a file might be named:

D123456.PDF

The OP would like this file to be renamed:

D123456 John Smith.PDF

The customer name always begins in column 16 on the first line of the first page in the PDF file (and runs to the end of the line). The OP wants an automated way to rename the thousand PDF files, based on the customer name in the contents of each file – in essence, a batch/mass rename. The program documented in this Article (and provided in source code) performs this function.

Two excellent freeware products are needed for this solution – the AutoHotkey scripting language (the program is written in this) and the Xpdf package to convert the PDF files to text (so the program can extract the customer names for renaming the files). Here are the links for downloading both:

http://ahkscript.org (also, see my EE article: AutoHotkey - Getting Started)
http://www.foolabs.com/xpdf/download.html

The Xpdf binaries (both 32-bit and 64-bit) contain seven tools, but the only one needed is pdftotext.exe. The script looks for this file in Program Files and Program Files (x86), but you may put pdftotext.exe wherever you want and simply navigate to it – the script gives you a file browse dialog if it doesn't find it in Program Files or Program Files (x86).

The program generalizes the solution by allowing any number of characters in the original file name (for the OP it is 7) and any starting column number for the string that will be in the new file name (for the OP it is 16). The latter is an area where significant generalization and/or customization can take place. For example, the string that will be in the new file name may be on some line other than the first; or the string may start in a variable column, preceded by a string that identifies it, such as Account Number:. With the source code in hand, you may modify the way the program parses the text extracted from the PDF files in order to create the new file names. If you need something different from a fixed column number on the first line of the first page (and you don't feel comfortable modifying the script), please post your requirement in a comment on this Article and (within reason) I'll modify the script for you and post it here.

For those interested in understanding how the script works, the remainder of this Article shows the entire script broken down into code snippets, with a description of what each snippet does, including screenshots where appropriate. This also acts as a form of documentation for the program.

Code snippet:

SetBatchLines, -1 ; run at maximum speed

What it does: Sets the script to run at maximum speed, i.e., no "sleeping" will occur in the program.

Code snippet:

removed

What it does: Checks to see if Xpdf's pdftotext.exe is located in C:\Program Files (x86)\xpdf\ or C:\Program Files\xpdf\. If not, it displays a message and provides a file browse dialog so the user may navigate to pdftotext.exe (or exit the program).
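
For readers following the roadmap in another language, here is a minimal Python sketch of this lookup. It is not the original AutoHotkey code; the function name find_pdftotext and the constant DEFAULT_CANDIDATES are hypothetical, and only the two folder paths come from the description above.

```python
from pathlib import Path
from typing import Iterable, Optional

# The two default install locations named in the article.
DEFAULT_CANDIDATES = (
    r"C:\Program Files (x86)\xpdf\pdftotext.exe",
    r"C:\Program Files\xpdf\pdftotext.exe",
)


def find_pdftotext(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None if none do."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    # Nothing found: the caller would show a file browse dialog or exit.
    return None
```

When the function returns None, the calling code would take the fallback route described above (message, then a browse dialog or exit).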

Code snippet:

removed

What it does: Warns the user that existing files in the destination folder will be overwritten with no warning, and then gives the user the opportunity to exit or continue.

Code snippet:

removed

What it does: Initializes some variables.

Code snippet:

removed

What it does: Asks the user for the number of characters in the source file names (before the .PDF extension). If the entry is not an integer and/or not greater than zero, it displays a message and gives the user the opportunity to try again or exit.

Code snippet:

removed

What it does: Asks the user for the starting column number of the string that will be appended to the current file name. If the entry is not an integer and/or not greater than zero, it displays a message and gives the user the opportunity to try again or exit.
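
The two prompts above (file name length and starting column) apply the same rule: the entry must be an integer greater than zero. A minimal Python sketch of that check follows. The helper name parse_positive_int is hypothetical, and treating surrounding spaces as acceptable is an assumption rather than something the article specifies.

```python
from typing import Optional


def parse_positive_int(entry: str) -> Optional[int]:
    """Return the entry as an int if it is a whole number above zero, else None."""
    text = entry.strip()  # assumption: leading/trailing spaces are tolerated
    # Reject blanks, signs, decimals, and anything that is not plain ASCII digits.
    if not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    return value if value > 0 else None
```

A None result corresponds to the "try again or exit" branch of the dialog.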

Code snippet:

removed

What it does: Asks the user to enter the full path of the source folder. It allows the user to navigate/browse to it or type/paste it in. It looks for an ending backslash on the path name and if one was not entered, it appends one (in other words, it works whether or not the user includes the ending backslash in the path). It then checks to see if a source folder was entered, and if so, if the folder exists. If either is not true, it gives the user the opportunity to exit or continue. Note: whether or not the source folder can be reported as null with the Browse For Folder dialog depends on the operating system, so the program checks for it.

Code snippet:

removed

What it does: Asks the user to enter the full path of the destination folder. It allows the user to navigate/browse to it or type/paste it in or create it. It looks for an ending backslash on the path name and if one was not entered, it appends one (in other words, it works whether or not the user includes the ending backslash in the path). It then checks to see if a destination folder was entered, and if so, if the folder exists, giving the user the opportunity to create it, exit, or try again to enter the name. Note: whether or not the destination folder can be reported as null with the Browse For Folder dialog depends on the operating system, so the program checks for it.
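
The trailing-backslash handling used for both the source and destination folders can be sketched in Python as below. The function name ensure_trailing_backslash is hypothetical. An empty entry is returned unchanged so that the "no folder entered" check can still detect it.

```python
def ensure_trailing_backslash(folder: str) -> str:
    """Append a backslash unless the path is empty or already ends with one."""
    folder = folder.strip()
    if folder and not folder.endswith("\\"):
        folder += "\\"
    return folder
```

With this in place, later code can build a full file name by simple concatenation of folder and file name.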

Code snippet:

removed

What it does: Checks to see if the source folder is the same as the destination folder. If it is, a message tells the user that the operation will be a file rename; if it isn't, a message tells the user that the operation will be a file move. It asks if the user wishes to continue or exit.
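
One way to make the rename-versus-move decision is to compare the two folder paths after normalizing them. The Python sketch below is an assumption about how such a comparison could be done, not the article's method; operation_kind is a hypothetical name. The comparison is purely textual, so it does not resolve relative paths, mapped drives, or links that point to the same folder.

```python
import ntpath


def operation_kind(source_folder: str, destination_folder: str) -> str:
    """Return "rename" when both paths name the same folder, else "move"."""

    def canonical(folder: str) -> str:
        # ntpath applies Windows path rules on any OS: normpath drops the
        # trailing backslash, normcase lowercases and unifies the slashes.
        return ntpath.normcase(ntpath.normpath(folder))

    if canonical(source_folder) == canonical(destination_folder):
        return "rename"
    return "move"
```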

Code snippet:

removed

What it does: Asks the user to confirm that the chosen parameters are correct, providing the option to continue or exit.

Code snippet:

removed

What it does: Initializes variables that are used to track operational statistics, which will be reported in the Operation Complete dialog.

Code snippet:

removed

What it does: Loops through all of the files in the source folder, sorted in file name order (ascending), ignoring any non-PDF files. It counts the number of PDF files and the number of non-PDF files. It displays a dialog box with a green progress bar that moves to the right during processing, also showing the percentage that is done and the name of the file currently being processed.
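
The filtering, counting, and percentage logic of this loop can be sketched in Python as follows. The function names split_pdfs and with_progress are hypothetical. The case-insensitive sort and the choice to report the percentage after each file are assumptions, and the progress dialog itself is left out.

```python
from typing import Iterable, Iterator, List, Tuple


def split_pdfs(file_names: Iterable[str]) -> Tuple[List[str], int]:
    """Return (PDF names sorted ascending, number of non-PDF names)."""
    names = list(file_names)
    pdfs = sorted(
        (name for name in names if name.lower().endswith(".pdf")),
        key=str.lower,  # assumption: ordering ignores case
    )
    return pdfs, len(names) - len(pdfs)


def with_progress(pdfs: List[str]) -> Iterator[Tuple[str, int]]:
    """Yield (file name, percent done once this file is processed)."""
    total = len(pdfs)
    for index, name in enumerate(pdfs, start=1):
        yield name, index * 100 // total
```

A caller would pass the folder listing to split_pdfs, then update its progress display with each pair that with_progress yields.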

Code snippet:

removed

What it does: Calls Xpdf's pdftotext.exe to read the first page of the PDF file into a text file (with the same file name as the PDF file, but with a file extension of .TXT). The -f parameter specifies the first page to convert (1) and the -l specifies the last page to convert (also 1). The -layout parameter maintains the original physical layout of the text so that multiple lines are not concatenated into a single line.
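
As a rough illustration, the call described above could be assembled in Python like this. The function names pdftotext_command and convert_first_page are hypothetical; the -f, -l, and -layout options are the ones named in the description, and the output file keeps the PDF's name with a .TXT extension.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import List


def pdftotext_command(pdftotext_exe: str, pdf_path: str) -> List[str]:
    """Build the argument list that converts only page 1, keeping the layout."""
    # Same folder and name as the PDF, but with a .TXT extension.
    txt_path = str(PureWindowsPath(pdf_path).with_suffix(".TXT"))
    return [pdftotext_exe, "-f", "1", "-l", "1", "-layout", pdf_path, txt_path]


def convert_first_page(pdftotext_exe: str, pdf_path: str) -> int:
    """Run pdftotext, wait for it to finish, and return its exit code."""
    return subprocess.run(pdftotext_command(pdftotext_exe, pdf_path)).returncode
```

Passing the arguments as a list lets Python handle the quoting of paths that contain spaces, such as those under Program Files.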

Code snippet:

removed

What it does: After converting the first page of the PDF file to a text file, it looks in the first line of the text file for the string starting in the column number specified by the user and then concatenates the original file name with that string, putting a space between them (and it deletes the text file). It then renames/moves the file with its new file name. Note that AutoHotkey doesn't have a FileRename command, per se, but instead provides a FileMove command, which serves a dual purpose – it, in essence, does a file rename when the source and destination folders are the same, but does a file move when they are different.
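
The column extraction and the construction of the new name can be sketched in Python as below. The function names extract_from_column and new_file_name are hypothetical. Trimming surrounding whitespace from the extracted string is an assumption, and the sketch does not filter characters that Windows disallows in file names.

```python
from pathlib import PureWindowsPath


def extract_from_column(first_line: str, start_column: int) -> str:
    """Return the text from a 1-based column to the end of the line, trimmed."""
    return first_line[start_column - 1:].strip()


def new_file_name(original_name: str, appended: str) -> str:
    """Join the original name and the extracted string with a single space."""
    path = PureWindowsPath(original_name)
    return f"{path.stem} {appended}{path.suffix}"
```

For instance, with a made-up first line in which 15 characters of other text precede the name, extract_from_column(line, 16) would return "John Smith", and new_file_name("D123456.PDF", "John Smith") gives "D123456 John Smith.PDF", matching the OP's example.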

Code snippet:

removed

What it does: Finalizes and formats all of the statistics from the operation and displays them in an Operation Completed dialog box.

That's it! I hope this helps the OP as well as other EE members. If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe

Comments (63)

doctordigital

Commented: 2014-12-02

That would be great. Thanks

Member_2_7970298

Commented: 2016-08-04

How do I obtain a copy of your AutoHotKey script for "How To Rename-Move a Batch of PDF Files Based on Contents of the Files"?
I have what I believe is an application for renaming multiple PDF files and would greatly appreciate receiving a copy of the AutoHotKey script.
Thank You

Joe Winograd

Author

Hi New Member,

Perhaps you missed my answer to your same question at my Split-Rename-Move article, so I'll repeat it here for you.

When I removed the source code last year from six articles that I published here at EE, my intention was that the removal be temporary. I began a project to rewrite all of the programs in my portfolio in order to generalize them for a broader audience and to have a standard user interface, including both a GUI (graphical user interface) and, where it makes sense, a CLI (command line interface). It wound up being a much larger effort than I anticipated, and I'm still not ready to post or distribute the source code for this program (or any of the other five published at EE — and I don't know when or even if that will be, for a variety of reasons).

I have created customized versions of these various programs for EE members who became clients of mine. I provided licenses for the run-time programs (the executables, i.e., the compiled EXE files) for an agreed-upon fee, but I did not provide the source code. I did this previously when EE had the "Hire Me" button, but that no longer exists. The mechanism now at EE for such work is the new Gigs feature, if that interests you.

Regards, Joe

Joe,
Thanks for your response.
I appreciate your comments & issues.
I would greatly appreciate it if you could see your way clear to send me your original AutoHotKey script.
I'm trying to learn more about AutoHotKey scripts and especially how it interfaces with Xpdf's pdftotext.exe
Thanks

Hi Member_2_7970298 (???),

I received your email at my personal email address, which I'll respond to in a moment. I already responded to your post at the AHK forum, which led you to this article, and then to my Split-Rename-Move article. Instead of three different communication venues (EE, AHK, email), let's continue this discussion via just email.

That said, a quick note on your comments: the Tutorials forum and the Scripts and Functions forum at the AHK boards (as well as the Tutorial at the AHK docs site) are the way to go "to learn more about AutoHotKey scripts".

There's not much to learn about "how it interfaces with Xpdf's pdftotext.exe" — the RunWait command is it. Here's an actual call from one of my programs:

RunWait,%pdftotextEXE% -f 1 -l 1 -raw "%FullFileNameCurrent%" "%DestinationFolder%%FileNameCurrentTXT%"

I'm sure from the names of the variables you can figure out what that line does. Also, I gave you links at the AHK forum to my two 5-minute EE video Micro Tutorials that should help you with learning about how to use the pdftotext.exe tool:
Xpdf - Command Line Utility for PDF Files
Xpdf - Convert PDF Files to Plain Text Files

If you haven't viewed them yet, I think you'll find them to be a worthwhile expenditure of 10 minutes. Regards, Joe