I temporarily removed the source code to make major changes to the program. Regards, Joe
This article presents a solution to a question asked here at Experts Exchange. The situation is that there's a large number of subfolders (400 in the original question), each of which has a number of PDF files (two in the original question). The goal is to combine/merge the PDF files in each subfolder (in ascending date order) into a single PDF file, storing the combined file in each subfolder. The source PDF files in each subfolder may have any file names and the user should be able to specify the file name of the combined file.
The method presented in this article requires AutoHotkey, an excellent (free!) programming/scripting language. The quick explanation for installing AutoHotkey is to visit its website. A more comprehensive explanation is to read my EE article, AutoHotkey - Getting Started. After installation, AutoHotkey will own the AHK file type, supporting the solution discussed in the remainder of this article.
The program utilizes another excellent (free!) piece of software: PDF Toolkit (PDFtk). It comes in both command line and GUI versions. The command line version is called PDFtk Server. Don't be misled by "Server" in the name. I don't know why they called it that, but it's just an executable (with a supporting DLL) that runs on XP, Vista, W7, and W8. That is, it does not have to run on a "server" OS.
In order to implement the solution in this article, you must have AutoHotkey and PDFtk on your computer (downloads are available at the links above). The solution should work on XP, Vista, W7, and W8 (32-bit and 64-bit), but I did all of the development and testing on W7/64-bit, so that is the only certified platform.
RUNNING THE PROGRAM
Download the source code for the program, which is attached to this article in a plain text file called Combine-Merge-PDF-files-20140826.ahk. After installing AutoHotkey, it will own the file type AHK, so simply double-click on the downloaded source code file in Windows/File Explorer (or whatever file manager you prefer) to run it. You may also compile the source code into a stand-alone/no-install executable by right-clicking on the source code file and selecting Compile Script:
After compiling it, simply run the EXE file that the compiler created.
The solution works on any number of subfolders and any number of PDF files in each subfolder (it ignores non-PDF files). It provides an option to combine/merge the files sorted in three ways — by file name, by modified date ascending (oldest first), by modified date descending (newest first).
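The three sort options can be illustrated with a short sketch. It is written in Python rather than AutoHotkey and is not the program's actual code; the function name list_pdfs and the three option labels are hypothetical, and the case-insensitive name comparison is an assumption.

```python
import os
from pathlib import Path

# Hypothetical labels for the three sort options described in the article.
SORT_BY_NAME = "name"
SORT_OLDEST_FIRST = "date_ascending"
SORT_NEWEST_FIRST = "date_descending"


def list_pdfs(folder, order=SORT_BY_NAME):
    """Return the PDF files directly inside `folder`, sorted as requested.

    Non-PDF files are ignored, as the article says the program does.
    """
    pdfs = [p for p in Path(folder).iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"]
    if order == SORT_BY_NAME:
        # Assumption: names are compared without regard to case.
        return sorted(pdfs, key=lambda p: p.name.lower())
    if order == SORT_OLDEST_FIRST:
        return sorted(pdfs, key=lambda p: p.stat().st_mtime)
    if order == SORT_NEWEST_FIRST:
        return sorted(pdfs, key=lambda p: p.stat().st_mtime, reverse=True)
    raise ValueError(f"unknown sort order: {order}")
```

Sorting on the modified time (st_mtime) corresponds to the "modified date" options; the original question's requirement of ascending date order is the oldest-first case.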
The remainder of this section discusses the solution in detail by going through the user interface, showing the screenshots from various executions of the program (all screenshots are from a W7/64-bit system).
The first step is to check for the installation of PDFtk Server by looking for pdftk.exe in its default locations. If it isn't found, you will see this error dialog:
If you installed it in a non-default location, click OK and you will get a browse-for-file dialog:
Navigate to pdftk.exe and select it.
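A check of this kind might look like the following Python sketch. The article does not list which locations the program examines, so the two paths below are assumptions based on a typical PDFtk Server install, and the function name find_pdftk is hypothetical. The fallback to the PATH search is also an addition for illustration, not something the article describes.

```python
import shutil
from pathlib import Path

# Assumed default install locations; the article does not say which
# folders the program actually checks.
ASSUMED_DEFAULT_LOCATIONS = [
    r"C:\Program Files (x86)\PDFtk Server\bin\pdftk.exe",
    r"C:\Program Files\PDFtk Server\bin\pdftk.exe",
]


def find_pdftk(candidates=ASSUMED_DEFAULT_LOCATIONS):
    """Return the first candidate path that exists as a file.

    If none exists, fall back to searching the PATH; the result is
    None when pdftk cannot be found at all.
    """
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return shutil.which("pdftk")
```

When the function returns None, a program would do what the article describes next: report the problem and let the user browse to pdftk.exe.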
The next step is to select the root folder:
You may navigate to it or type/paste it in. The program looks for an ending backslash on the path name and appends one if it is missing (in other words, it works whether or not you include the ending backslash in the path). It then checks that you entered a folder and that the folder exists. If either is not true, it gives you the opportunity to try again or exit the program:
Note: whether or not a folder can be reported as null with the browse-for-folder dialog depends on the operating system, so the program checks for it.
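The root-folder checks described above can be sketched in Python as follows. This is an illustration only, not the program's code; normalize_root is a hypothetical name, and returning None to signal "ask the user again" is an assumption about how the retry could be driven.

```python
import os


def normalize_root(path):
    """Validate a root folder and make sure it ends with a separator.

    Returns the path with a trailing separator, or None when nothing
    was entered or the folder does not exist.
    """
    if not path or not path.strip():
        return None  # null/empty selection from the dialog
    path = path.strip()
    if not os.path.isdir(path):
        return None  # not a folder, or it does not exist
    if not path.endswith(("\\", "/")):
        path += os.sep  # a backslash on Windows
    return path
```

A caller would loop on the folder dialog until the function returns a path or the user chooses to exit.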
Now it's time to enter the parameters for the run:
In the top box, enter the name for the combined/merged file (without the .PDF file type). You must enter a name, otherwise it shows this:
The program then checks for characters that are invalid in a file name and displays this dialog if it finds any:
Once the file name is valid, it appends .pdf as the file type. For example, if you enter
combined PDF file
in the box, then the name of each combined/merged file will be
combined PDF file.pdf
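The name checks can be sketched in Python like this. The set of invalid characters is the standard Windows list; the function name build_output_name and the use of exceptions in place of dialogs are assumptions made for the illustration.

```python
# Characters that Windows does not allow in a file name.
INVALID_FILENAME_CHARS = '\\/:*?"<>|'


def build_output_name(base_name):
    """Validate the name typed by the user and append the .pdf file type.

    Raises ValueError where the article's program would show a dialog.
    """
    if not base_name:
        raise ValueError("a file name is required")
    bad = sorted({ch for ch in base_name if ch in INVALID_FILENAME_CHARS})
    if bad:
        raise ValueError("invalid characters in file name: " + " ".join(bad))
    return base_name + ".pdf"
```

With the article's example, build_output_name("combined PDF file") returns "combined PDF file.pdf".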
Then select a radio button for the order in which to combine/merge the files (default is By file name). Finally, select a radio button for which folders to process: subfolders only (the default), the root folder only, or the root folder and subfolders.
The program now processes the selected folders (if subfolders are selected, they are processed to an unlimited depth). It calls PDFtk to combine all of the PDF files in each subfolder into a single PDF file in each subfolder, with a file name as described above. During processing, it displays a green progress bar that moves to the right so that you know it is working, not hanging. The progress bar also displays the name of the current subfolder being processed and the percentage completion:
The percentage completion is based on the number of folders (not on time or number of PDFs).
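The folder selection and the folder-based percentage can be sketched in Python as follows. The mode labels and function names are hypothetical, and truncating the percentage to a whole number is an assumption; the article does not say how the program rounds.

```python
import os


def folders_to_process(root, mode="subfolders"):
    """Return the folders to process for one of three hypothetical modes:
    'subfolders' (the default), 'root', or 'both'.
    """
    if mode == "root":
        return [root]
    # os.walk descends to unlimited depth and yields the root itself first.
    all_folders = [dirpath for dirpath, _dirs, _files in os.walk(root)]
    if mode == "both":
        return all_folders
    if mode == "subfolders":
        return all_folders[1:]
    raise ValueError(f"unknown mode: {mode}")


def percent_complete(folders_done, folders_total):
    """Percentage based on the folder count, truncated to an integer."""
    if folders_total == 0:
        return 100
    return 100 * folders_done // folders_total
```

Because the percentage counts folders, a folder with many large PDFs advances the bar by the same amount as a folder with two small ones.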
If a call to PDFtk results in an error code, you will see this dialog:
It shows you the folder causing the error so you may investigate that folder to determine the problem. The most common reason for this error is that an input file has the same name as the output file — PDFtk does not allow this. This would happen if you ran the program a second time, giving it the same combined file name without having first deleted (or moved) the combined file from the previous run.
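The PDFtk call and the name collision can be illustrated with a Python sketch. The command form (input files, then cat output and the output file) is PDFtk's standard syntax for concatenation. The function names are hypothetical, and checking for the collision before calling PDFtk is an addition for illustration; the article's program reports the error after PDFtk returns an error code.

```python
import os
import subprocess


def build_merge_command(pdftk_exe, input_files, output_file):
    """Build the argument list: pdftk <inputs...> cat output <output>.

    Raises ValueError if the output file is also one of the inputs,
    which PDFtk does not allow.
    """
    def canonical(path):
        return os.path.normcase(os.path.abspath(path))

    if canonical(output_file) in {canonical(f) for f in input_files}:
        raise ValueError(f"output file is also an input: {output_file}")
    return [pdftk_exe, *input_files, "cat", "output", output_file]


def merge_pdfs(pdftk_exe, input_files, output_file):
    """Run PDFtk and return its exit code (0 means success)."""
    command = build_merge_command(pdftk_exe, input_files, output_file)
    return subprocess.run(command).returncode
```

Passing the arguments as a list avoids quoting problems with folder and file names that contain spaces.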
When the program finishes, it displays this Operation Completed dialog:
The operational statistics are stored in a plain text file in the source folder. The file name of this results file is "Operational_Statistics_YY
, where the date/time is the beginning time of the run. Since seconds (SS) are in the file name, it is not possible to have a duplicate file (so there's no issue with respect to overwriting a file).
The results file contains this:
Operational Statistics from Combine-Merge-PDFs
Name of merged file: combined PDF file.pdf
Root folder: D:\0tempD\test combine\
Sort order: By file name
Folders processed: Root folder and subfolders
Number of folders processed: 11
Beginning date and time: 2014-08-26_01.52.39
Ending date and time: 2014-08-26_01.52.42
Elapsed time (minutes:seconds): 0:3
The elapsed time measurement begins after the parameters are entered so that it measures just the processing time, not including the time spent waiting for user input.
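The elapsed-time line in the sample results file can be reproduced with a short Python sketch. The timestamp format is taken from the sample file; the function name is hypothetical, and computing the difference from the two printed timestamps is an assumption, since the program may time the run internally in another way.

```python
from datetime import datetime

# Format of the timestamps shown in the sample results file.
STAMP_FORMAT = "%Y-%m-%d_%H.%M.%S"


def elapsed_minutes_seconds(begin_stamp, end_stamp):
    """Return elapsed time as minutes:seconds without zero padding,
    matching the style of the sample results file.
    """
    begin = datetime.strptime(begin_stamp, STAMP_FORMAT)
    end = datetime.strptime(end_stamp, STAMP_FORMAT)
    total_seconds = int((end - begin).total_seconds())
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds}"
```

For the sample run, elapsed_minutes_seconds("2014-08-26_01.52.39", "2014-08-26_01.52.42") gives "0:3".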
If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe