
How to Combine-Merge PDF Files in Many Subfolders

Joe Winograd
50+ years in computer industry •Everything from development to sales •CIO •Windows •Document Imaging •EE MVE 2015,2016,2018 •EE FELLOW 2017
Update 21-May-2015: I temporarily removed the source code to make major changes to the program. Regards, Joe

INTRODUCTION

This article presents a solution to a question asked here at Experts Exchange. The situation is that there's a large number of subfolders (400 in the original question), each of which has a number of PDF files (two in the original question). The goal is to combine/merge the PDF files in each subfolder (in ascending date order) into a single PDF file, storing the combined file in each subfolder. The source PDF files in each subfolder may have any file names and the user should be able to specify the file name of the combined file.

REQUIRED SOFTWARE

The method presented in this article requires AutoHotkey, an excellent (free!) programming/scripting language. The quick explanation for installing AutoHotkey is to visit its website. A more comprehensive explanation is to read my EE article, AutoHotkey - Getting Started. After installation, AutoHotkey will own the AHK file type, supporting the solution discussed in the remainder of this article.

The program utilizes another excellent (free!) piece of software: PDF Toolkit (PDFtk). It comes in both command line and GUI versions. The command line version is called PDFtk Server. Don't be misled by "Server" in the name. I don't know why they called it that, but it's just an executable (with a supporting DLL) that runs on XP, Vista, W7, and W8. That is, it does not have to run on a "server" OS.

In order to implement the solution in this article, you must have AutoHotkey and PDFtk on your computer (downloads are available at the links above). The solution should work on XP, Vista, W7, and W8 (32-bit and 64-bit), but I did all of the development and testing on W7/64-bit, so that is the only certified platform.

RUNNING THE PROGRAM

Download the source code for the program, which is attached to this article in a plain text file called Combine-Merge-PDF-files-20140826.ahk. Once AutoHotkey is installed, it owns the AHK file type, so simply double-click the downloaded source code file in Windows/File Explorer (or whatever file manager you prefer) to run it. You may also compile the source code into a stand-alone/no-install executable by right-clicking the source code file and selecting Compile Script:

AutoHotkey Compile Script
After compiling it, simply run the EXE file that the compiler created.

SOLUTION DESCRIPTION

The solution works on any number of subfolders and any number of PDF files in each subfolder (it ignores non-PDF files). It can combine/merge the files in one of three sort orders: by file name, by modified date ascending (oldest first), or by modified date descending (newest first).
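The three sort orders can be illustrated with a short Python sketch. This is not the author's AutoHotkey code; the function name `list_pdfs` and the three order labels are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Hypothetical labels for the three sort orders described above.
BY_NAME = "name"
BY_DATE_ASC = "date_asc"    # oldest first
BY_DATE_DESC = "date_desc"  # newest first


def list_pdfs(folder, order=BY_NAME):
    """Return the PDF files directly inside `folder`, sorted as requested.

    Non-PDF files are ignored, as in the article.
    """
    pdfs = [p for p in Path(folder).iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"]
    if order == BY_NAME:
        # Case-insensitive comparison, which matches how Windows treats names.
        return sorted(pdfs, key=lambda p: p.name.lower())
    if order in (BY_DATE_ASC, BY_DATE_DESC):
        # Sort on the last-modified timestamp.
        return sorted(pdfs, key=lambda p: p.stat().st_mtime,
                      reverse=(order == BY_DATE_DESC))
    raise ValueError("unknown sort order: " + str(order))
```

Calling `list_pdfs(folder, BY_DATE_ASC)` would give the order requested in the original question (ascending date).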

The remainder of this section discusses the solution in detail by going through the user interface, showing the screenshots from various executions of the program (all screenshots are from a W7/64-bit system).

The first step is to check for the installation of PDFtk Server by looking for pdftk.exe in default locations. If it isn't found, you will see this error dialog:

PDFtk not found
If you installed it in a non-default location, click OK and you will get a browse-for-file dialog:

Navigate to PDFtk
Navigate to pdftk.exe and select it.
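A minimal Python sketch of this kind of lookup follows. The article does not list which default locations the program checks, so the two paths in `CANDIDATE_PATHS` are assumptions, and `find_pdftk` is a hypothetical name. The fallback to the system PATH is also an addition for illustration.

```python
import os
import shutil

# Assumed install locations; the article does not say which paths are checked.
CANDIDATE_PATHS = [
    r"C:\Program Files (x86)\PDFtk Server\bin\pdftk.exe",
    r"C:\Program Files\PDFtk Server\bin\pdftk.exe",
]


def find_pdftk(candidates=CANDIDATE_PATHS):
    """Return the first existing candidate path, else whatever is on PATH.

    Returns None when pdftk cannot be found, which is the point at which
    a program would ask the user to browse for the file.
    """
    for path in candidates:
        if os.path.isfile(path):
            return path
    return shutil.which("pdftk")
```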

The next step is to select the root folder:

Select root folder
You may navigate to it or type/paste it in. The program looks for an ending backslash on the path name and appends one if it is missing (in other words, it works whether or not you include the ending backslash in the path). It also checks that you entered a folder and that the folder exists. If either check fails, it gives you the opportunity to try again or exit the program:

Root folder not specified
Root folder does not exist
Note: whether or not a folder can be reported as null with the browse-for-folder dialog depends on the operating system, so the program checks for it.
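The root folder checks described above might look like this in Python. It is a sketch rather than the author's code, and `ensure_trailing_backslash` and `check_root` are hypothetical names.

```python
import os


def ensure_trailing_backslash(path):
    """Append a backslash only when the path does not already end with one."""
    return path if path.endswith("\\") else path + "\\"


def check_root(raw):
    """Validate the root folder entered by the user.

    Raises ValueError for an empty entry or a folder that does not exist;
    otherwise returns the path with a trailing backslash.
    """
    path = raw.strip()
    if not path:
        raise ValueError("Root folder not specified")
    if not os.path.isdir(path):
        raise ValueError("Root folder does not exist: " + path)
    return ensure_trailing_backslash(path)
```

A caller would catch the `ValueError` and offer the user the choice to try again or exit.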

Now it's time to enter the parameters for the run:

Enter-Parameters.jpg
In the top box, enter the name for the combined/merged file (without the .PDF file type). You must enter a name; otherwise, the program shows this:

Combined file name not specified
The program then checks for characters that are invalid in a file name and displays this dialog if it finds any:

Invalid character
Once the file name is valid, it appends .pdf as the file type. For example, if you enter

combined PDF file

in the box, then the name of each combined/merged file will be

combined PDF file.pdf
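The file name validation can be sketched in Python as follows. The set of characters is the standard list that Windows rejects in file names; the article does not show the exact list the program uses, so treat it as an assumption. `build_output_name` is a hypothetical name.

```python
# Characters that Windows does not allow in a file name.
INVALID_CHARS = set('\\/:*?"<>|')


def build_output_name(name):
    """Validate the name typed by the user and append the .pdf file type."""
    if not name.strip():
        raise ValueError("Combined file name not specified")
    bad = sorted(ch for ch in set(name) if ch in INVALID_CHARS)
    if bad:
        raise ValueError("Invalid character(s) in file name: " + " ".join(bad))
    return name + ".pdf"
```

With the example above, `build_output_name("combined PDF file")` returns `combined PDF file.pdf`.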

Then select a radio button for the order in which to combine/merge the files (default is By file name). Finally, select a radio button for which folders to process: subfolders only (the default), the root folder only, or the root folder and subfolders.
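The three folder choices map naturally onto a recursive directory walk. Here is a Python sketch under that assumption; the constant names and `folders_to_process` are hypothetical, and the author's AutoHotkey program may enumerate folders differently.

```python
import os

# Hypothetical labels for the three radio-button choices.
SUBFOLDERS_ONLY = "subfolders"
ROOT_ONLY = "root"
ROOT_AND_SUBFOLDERS = "both"


def folders_to_process(root, scope=SUBFOLDERS_ONLY):
    """Return the list of folders selected by `scope`.

    Subfolders are collected to an unlimited depth.
    """
    if scope == ROOT_ONLY:
        return [root]
    # os.walk visits the root first, then every folder beneath it.
    all_folders = [dirpath for dirpath, _dirs, _files in os.walk(root)]
    if scope == SUBFOLDERS_ONLY:
        return all_folders[1:]
    if scope == ROOT_AND_SUBFOLDERS:
        return all_folders
    raise ValueError("unknown scope: " + str(scope))
```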

The program now processes the selected folders (if subfolders are selected, they are processed to an unlimited depth). It calls PDFtk to combine all of the PDF files in each subfolder into a single PDF file in each subfolder, with a file name as described above. During processing, it displays a green progress bar that moves to the right so that you know it is working, not hanging. The progress bar also displays the name of the current subfolder being processed and the percentage completion:

Progress Bar
The percentage completion is based on the number of folders (not on time or number of PDFs).
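For readers curious about the PDFtk call itself, the sketch below builds a command using PDFtk's `cat` operation (`pdftk in1.pdf in2.pdf cat output out.pdf`) and computes a folder-based percentage. The function names are hypothetical, and the exact command line the author's program issues is not shown in the article.

```python
import subprocess


def build_pdftk_command(pdftk_exe, input_pdfs, output_pdf):
    """Build the argument list for: pdftk in1.pdf in2.pdf ... cat output out.pdf"""
    if not input_pdfs:
        raise ValueError("no input PDF files")
    return [pdftk_exe, *map(str, input_pdfs), "cat", "output", str(output_pdf)]


def merge(pdftk_exe, input_pdfs, output_pdf):
    """Run PDFtk and return its exit code (non-zero signals an error)."""
    command = build_pdftk_command(pdftk_exe, input_pdfs, output_pdf)
    return subprocess.run(command).returncode


def percent_done(folders_done, folders_total):
    """Percentage completion based on folder count, rounded down."""
    if folders_total == 0:
        return 100
    return folders_done * 100 // folders_total
```

Passing the arguments as a list avoids quoting problems with folder and file names that contain spaces.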

If a call to PDFtk results in an error code, you will see this dialog:

PDFtk fatal error
It shows you the folder causing the error so you may investigate that folder to determine the problem. The most common reason for this error is that an input file has the same name as the output file — PDFtk does not allow this. This would happen if you ran the program a second time, giving it the same combined file name without having first deleted (or moved) the combined file from the previous run.
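One way to detect this situation before calling PDFtk is to compare the input names with the output name. The article does not say that the program performs such a pre-check, so the Python sketch below is only an illustration, and `collides_with_output` is a hypothetical name.

```python
def collides_with_output(input_names, output_name):
    """Return True when any input file has the same name as the output file.

    The comparison ignores case because Windows file names are
    case-insensitive.
    """
    target = output_name.lower()
    return any(name.lower() == target for name in input_names)
```

For example, a second run over a folder that still holds `combined PDF file.pdf` from the first run would be flagged when the same combined file name is entered again.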

When the program finishes, it displays this Operation Completed dialog:

Operation-Completed.jpg
The operational statistics are stored in a plain text file in the source folder. The file name of this results file is "Operational_Statistics_YY-MM-DD_HH.MM.SS.txt", where the date/time is the beginning time of the run. Since seconds (SS) are in the file name, it is not possible to have a duplicate file (so there's no issue with respect to overwriting a file).
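The file name pattern can be reproduced with a date format string. The Python sketch below follows the "YY-MM-DD_HH.MM.SS" pattern exactly as quoted above; `stats_file_name` is a hypothetical name.

```python
from datetime import datetime


def stats_file_name(run_start):
    """Build the results file name from the beginning time of the run."""
    return run_start.strftime("Operational_Statistics_%y-%m-%d_%H.%M.%S.txt")


# Typical use: capture the start time once, right before processing begins.
name = stats_file_name(datetime.now())
```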

The results file contains this: 

Operational Statistics from Combine-Merge-PDFs
Name of merged file: combined PDF file.pdf
Root folder: D:\0tempD\test combine\
Sort order: By file name
Folders processed: Root folder and subfolders
Number of folders processed: 11
Beginning date and time: 2014-08-26_01.52.39
Ending date and time: 2014-08-26_01.52.42
Elapsed time (minutes:seconds): 0:3

The elapsed time measurement begins after the parameters are entered so that it measures just the processing time, not including the time spent waiting for user input.
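The elapsed time line in the sample results ("0:3") shows minutes and seconds without zero padding. A Python sketch that produces the same format follows; `format_elapsed` is a hypothetical name, and this is not the author's code.

```python
from datetime import datetime


def format_elapsed(start, end):
    """Format the time between two datetimes as minutes:seconds, unpadded."""
    total_seconds = int((end - start).total_seconds())
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds}"
```

Using the beginning and ending times from the sample results file (01.52.39 and 01.52.42 on 2014-08-26), the function returns `0:3`.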

If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe