
How To Rename-Move a Batch of PDF Files Based on Contents of the Files

Joe Winograd
50+ years in computer industry • Everything from development to sales • CIO • Windows • Document Imaging • EE MVE 2015, 2016, 2018 • EE FELLOW 2017
Update 21-May-2015: I temporarily removed the source code and the code snippets to make major changes to the program. Regards, Joe

A recent question here at Experts Exchange piqued my interest, so I decided to provide a thorough solution and publish this Article about it. The Original Poster (OP) of the question has approximately one thousand PDF files containing 7-character sequential alphanumeric file names (and, of course, all of the file extensions are PDF). Although the OP did not state this, it is likely that the sequential alphanumerics represent unique identifiers for his customers, perhaps customer numbers. The alphanumeric file name is cryptic, in no way identifiable with the customer, so the OP would like the file name to contain the customer name in addition to the number. For example, a file might be named:

D123456.PDF

The OP would like this file to be renamed:

D123456 John Smith.PDF

The customer name always begins in column 16 on the first line of the first page in the PDF file (and runs to the end of the line). The OP wants an automated way to rename the thousand PDF files, based on the customer name in the contents of each file – in essence, a batch/mass rename. The program documented in this Article (and provided in source code) performs this function.

Two excellent freeware products are needed for this solution – the AutoHotkey scripting language (the program is written in this) and the Xpdf package to convert the PDF files to text (so the program can extract the customer names for renaming the files). Here are the links for downloading both:

http://ahkscript.org (also, see my EE article: AutoHotkey - Getting Started)
http://www.foolabs.com/xpdf/download.html

The Xpdf binaries (both 32-bit and 64-bit) contain seven tools, but the only one needed is pdftotext.exe. The script looks for this file in Program Files and Program Files (x86), but you may put pdftotext.exe wherever you want and simply navigate to it – the script gives you a file browse dialog if it doesn't find it in Program Files or Program Files (x86).

The program generalizes the solution by allowing any number of characters in the original file name (for the OP it is 7) and any starting column number for the string that will be in the new file name (for the OP it is 16). The latter is an area where significant generalization and/or customization can take place. For example, the string that will be in the new file name may be on some line other than the first, or it may start in a variable column but be preceded by a string that identifies it, such as Account Number:. Because the source code is provided, you may modify the way the program parses the text extracted from the PDF files in order to create the new file names. If an EE member needs something different from a fixed column number on the first line of the first page (and you don't feel comfortable modifying the script), please post your requirement in a comment on this Article and (within reason) I'll modify the script for you and post it here.
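
As an illustration of the label-based variation mentioned above, here is a minimal sketch in Python (not AutoHotkey, and not part of the program). The function name extract_after_label and the decision to trim surrounding whitespace are assumptions made for the example.

```python
def extract_after_label(lines, label):
    """Return the text that follows `label` on the first line containing it.

    Returns None when no line contains the label. Surrounding whitespace is
    trimmed, which is an assumption of this sketch.
    """
    for line in lines:
        pos = line.find(label)
        if pos != -1:
            return line[pos + len(label):].strip()
    return None
```

Called with the lines of the extracted text and the label "Account Number:", it returns whatever follows the label on that line, wherever the line and column happen to be.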

For those interested in understanding how the script works, the remainder of this Article shows the entire script broken down into code snippets, with a description of what each snippet does, including screenshots where appropriate. This also acts as a form of documentation for the program.

Code snippet:
 
SetBatchLines, -1 ; run at maximum speed


What it does: Sets the script to run at maximum speed, i.e., no "sleeping" will occur in the program.

Code snippet:
 
temporarily removed


What it does: Checks to see if Xpdf's pdftotext.exe is located in C:\Program Files (x86)\xpdf\ or C:\Program Files\xpdf\. If not, it displays a message and provides a file browse dialog so the user may navigate to pdftotext.exe (or exit the program).
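
The lookup described above might be sketched as follows. This is Python rather than the script's AutoHotkey, and it is not the removed source. The two folders come from the description, while the function name find_pdftotext and returning None (leaving the browse dialog to the caller) are assumptions.

```python
import os

# The two folders named in the description above.
DEFAULT_DIRS = [r"C:\Program Files (x86)\xpdf", r"C:\Program Files\xpdf"]


def find_pdftotext(dirs=DEFAULT_DIRS, exe_name="pdftotext.exe"):
    """Return the full path of the first existing pdftotext.exe, or None.

    A None result is the point at which a caller would show the message and
    offer a file browse dialog (or exit).
    """
    for folder in dirs:
        candidate = os.path.join(folder, exe_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```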

Screenshots: pdftotext not found; find pdftotext

Code snippet:
 
temporarily removed


What it does: Warns the user that existing files in the destination folder will be overwritten with no warning, and then gives the user the opportunity to exit or continue.

Screenshot: Overwrite Warning

Code snippet:
 
temporarily removed


What it does: Initializes some variables.

Code snippet:
 
temporarily removed


What it does: Asks the user for the number of characters in the source file names (before the .PDF extension). If the entry is not an integer and/or not greater than zero, it displays a message and gives the user the opportunity to try again or exit.
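
The validation rule described above (and the identical one used for the starting column number) can be sketched in Python as below. The function name parse_positive_int and the exact error strings are assumptions for illustration. The real program shows its own dialogs and lets the user retry or exit.

```python
def parse_positive_int(text):
    """Validate a typed entry as an integer greater than zero.

    Returns (value, None) on success or (None, reason) on failure, so the
    caller can show the reason and offer another try.
    """
    try:
        value = int(text.strip())
    except ValueError:
        return None, "must be an integer"
    if value < 1:
        return None, "must be at least 1"
    return value, None
```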

Screenshots: Enter num first chars; Num first chars must be integer; Num first chars must be at least 1

Code snippet:
 
temporarily removed


What it does: Asks the user for the starting column number of the string that will be appended to the current file name. If the entry is not an integer and/or not greater than zero, it displays a message and gives the user the opportunity to try again or exit.

Screenshots: Enter starting column number; Column num must be integer; Column num must be at least 1

Code snippet:
 
temporarily removed


What it does: Asks the user to enter the full path of the source folder. It allows the user to navigate/browse to it or type/paste it in. It looks for an ending backslash on the path name and if one was not entered, it appends one (in other words, it works whether or not the user includes the ending backslash in the path). It then checks to see if a source folder was entered, and if so, if the folder exists. If either is not true, it gives the user the opportunity to exit or continue. Note: whether or not the source folder can be reported as null with the Browse For Folder dialog depends on the operating system, so the program checks for it.
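
The backslash handling and the empty-entry check described above could look like this in Python (a sketch, not the script's AutoHotkey code). The name normalize_folder and the use of None for "nothing entered" are assumptions. Checking that the folder exists is left to the caller.

```python
def normalize_folder(path):
    """Return `path` with exactly one guaranteed trailing backslash.

    Returns None when nothing was entered, which a caller would treat as
    "source folder must be specified".
    """
    path = path.strip()
    if not path:
        return None
    return path if path.endswith("\\") else path + "\\"
```

Appending the backslash once, up front, means later code can build a full file name by simple concatenation of folder and file name.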

Screenshots: Enter source folder; Source folder must be specified

Code snippet:
 
temporarily removed


What it does: Asks the user to enter the full path of the destination folder. It allows the user to navigate/browse to it or type/paste it in or create it. It looks for an ending backslash on the path name and if one was not entered, it appends one (in other words, it works whether or not the user includes the ending backslash in the path). It then checks to see if a destination folder was entered, and if so, if the folder exists, giving the user the opportunity to create it, exit, or try again to enter the name. Note: whether or not the destination folder can be reported as null with the Browse For Folder dialog depends on the operating system, so the program checks for it.

Screenshot: Enter destination folder

Code snippet:
 
temporarily removed


What it does: Checks to see if the source folder is the same as the destination folder. If it is, a message tells the user that the operation will be a file rename; if it isn't, a message tells the user that the operation will be a file move. It asks if the user wishes to continue or exit.
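
A Python sketch of the rename-versus-move decision follows. The description above only says the two folders are compared. Treating the comparison as case-insensitive and ignoring a trailing backslash are assumptions of this sketch, as is the function name operation_kind.

```python
import ntpath  # Windows path rules, importable on any platform


def operation_kind(source, destination):
    """Return "rename" when both folders are the same, otherwise "move"."""
    def canonical(folder):
        # normpath drops a trailing backslash; normcase lowercases the path.
        return ntpath.normcase(ntpath.normpath(folder))

    return "rename" if canonical(source) == canonical(destination) else "move"
```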

Screenshots: Source Destination same - Rename; Source Destination different - Move

Code snippet:
 
temporarily removed


What it does: Asks the user to confirm that the chosen parameters are correct, providing the option to continue or exit.

Screenshot: Confirm params

Code snippet:
 
temporarily removed


What it does: Initializes variables that are used to track operational statistics, which will be reported in the Operation Completed dialog.

Code snippet:
 
temporarily removed


What it does: Loops through all of the files in the source folder, sorted in file name order (ascending), ignoring any non-PDF files. It counts the number of PDF files and the number of non-PDF files. It displays a dialog box with a green progress bar that moves to the right during processing, also showing the percentage that is done and the name of the file currently being processed.
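
The bookkeeping behind the loop described above (sort, skip non-PDF files, count both kinds, compute the percentage done) might be sketched in Python like this. The function name plan_batch, the case-insensitive sort, and the rounding of the percentage are assumptions. The progress bar itself is not modeled.

```python
def plan_batch(file_names):
    """Split a folder listing into PDF work items and counts.

    Returns (progress, pdf_count, non_pdf_count), where progress is a list of
    (file_name, percent_done_after_this_file) in ascending file name order.
    """
    pdfs = sorted(
        (name for name in file_names if name.lower().endswith(".pdf")),
        key=str.lower,
    )
    non_pdf_count = len(file_names) - len(pdfs)
    progress = [
        (name, round(100 * (index + 1) / len(pdfs)))
        for index, name in enumerate(pdfs)
    ]
    return progress, len(pdfs), non_pdf_count
```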

Screenshot: Progress Bar

Code snippet:
 
temporarily removed


What it does: Calls Xpdf's pdftotext.exe to read the first page of the PDF file into a text file (with the same file name as the PDF file, but with a file extension of .TXT). The -f parameter specifies the first page to convert (1) and the -l specifies the last page to convert (also 1). The -layout parameter maintains the original physical layout of the text so that multiple lines are not concatenated into a single line.
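
For reference, the pdftotext call described above can be assembled as shown in this Python sketch (the script itself does this from AutoHotkey). The -f, -l, and -layout options are the ones named in the description. The function names and the use of subprocess are assumptions of the example.

```python
import subprocess


def build_pdftotext_command(exe, pdf_path, txt_path):
    """Build the argument list that converts only page 1, keeping the layout."""
    return [exe, "-f", "1", "-l", "1", "-layout", pdf_path, txt_path]


def convert_first_page(exe, pdf_path, txt_path):
    """Run pdftotext; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_pdftotext_command(exe, pdf_path, txt_path), check=True)
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces, such as C:\Program Files (x86)\xpdf\pdftotext.exe.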

Code snippet:
 
temporarily removed


What it does: After converting the first page of the PDF file to a text file, it looks in the first line of the text file for the string starting in the column number specified by the user and then concatenates the original file name with that string, putting a space between them (and it deletes the text file). It then renames/moves the file with its new file name. Note that AutoHotkey doesn't have a FileRename command, per se, but instead provides a FileMove command, which serves a dual purpose – it, in essence, does a file rename when the source and destination folders are the same, but does a file move when they are different.
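
The name-building step described above can be sketched in Python as follows. Columns are counted from 1, as in the Article, so column 16 is index 15. The function name new_file_name and the trimming of surrounding whitespace are assumptions. The sketch does not handle an empty result or characters that are invalid in Windows file names.

```python
def new_file_name(original_stem, first_line, start_column, extension=".PDF"):
    """Append the text starting at `start_column` (1-based) to the file name.

    `original_stem` is the current file name without its extension and
    `first_line` is the first line of the extracted text.
    """
    extracted = first_line[start_column - 1:].strip()
    return f"{original_stem} {extracted}{extension}"
```

With the OP's example, a first line holding John Smith from column 16 onward turns D123456 into D123456 John Smith.PDF.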

Code snippet:
 
temporarily removed


What it does: Finalizes and formats all of the statistics from the operation and displays them in an Operation Completed dialog box.

Screenshot: Operation Completed

That's it! I hope this helps the OP as well as other EE members. If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe
 