How to Combine-Merge PDF Files in Many Subfolders

Joe Winograd, Developer
CERTIFIED EXPERT
50+ years in computers
EE FELLOW 2017 — first ever recipient of Fellow award
MVE 2015,2016,2018
CERTIFIED GOLD EXPERT
DISTINGUISHED EXPERT
Edited by: Andrew Leniart
Article Update 13-March-2020: I removed the source code. The article that remains should act as a "design roadmap" for members who want to write the code in the programming language of their choice. If you are interested in discussing the program further, please contact me via the EE message system.

INTRODUCTION

This article presents a solution to a question asked here at Experts Exchange. The situation is that there's a large number of subfolders (400 in the original question), each of which has a number of PDF files (two in the original question). The goal is to combine/merge the PDF files in each subfolder (in ascending date order) into a single PDF file, storing the combined file in each subfolder. The source PDF files in each subfolder may have any file names and the user should be able to specify the file name of the combined file.

REQUIRED SOFTWARE

The method presented in this article requires AutoHotkey, an excellent (free!) programming/scripting language. The quick explanation for installing AutoHotkey is to visit its website. A more comprehensive explanation is to read my EE article, AutoHotkey - Getting Started. After installation, AutoHotkey will own the AHK file type, supporting the solution discussed in the remainder of this article.

The program utilizes another excellent (free!) piece of software, PDF Toolkit (PDFtk). It comes in both command line and GUI versions. The command line version is called PDFtk Server. Don't be misled by "Server" in the name. I don't know why they called it that, but it's just an executable (with a supporting DLL) that runs on XP, Vista, W7, and W8. That is, it does not have to run on a "server" OS.

In order to implement the solution in this article, you must have AutoHotkey and PDFtk on your computer (downloads are available at the links above). The solution should work on XP, Vista, W7, and W8 (32-bit and 64-bit), but I did all of the development and testing on W7/64-bit, so that is the only certified platform.

RUNNING THE PROGRAM

Download the source code for the program, which is attached to this article in a plain text file called Combine-Merge-PDF-files-20140826.ahk. Once AutoHotkey is installed, it owns the AHK file type, so simply double-click the downloaded source code file in Windows/File Explorer (or whatever file manager you prefer) to run it. You may also compile the source code into a stand-alone/no-install executable by right-clicking the source code file and selecting Compile Script:



After compiling it, simply run the EXE file that the compiler created.

SOLUTION DESCRIPTION

The solution works on any number of subfolders and any number of PDF files in each subfolder (it ignores non-PDF files). It provides an option to combine/merge the files sorted in three ways — by file name, by modified date ascending (oldest first), by modified date descending (newest first).

The remainder of this section discusses the solution in detail by going through the user interface, showing the screenshots from various executions of the program (all screenshots are from a W7/64-bit system).

The first step is to check for the installation of PDFtk Server by looking for pdftk.exe in its default locations. If it isn't found, you will see this error dialog:


If you installed it in a non-default location, click OK and you will get a browse-for-file dialog:


Navigate to pdftk.exe and select it.
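For readers writing their own version, the check for PDFtk Server might look like the minimal Python sketch below. It is not the original AutoHotkey code. The two install paths and the function name find_pdftk are assumptions for illustration; adjust the paths to match your installation. The function returns None when nothing is found, so the caller can show an error and offer a browse-for-file dialog.

```python
import shutil
from pathlib import Path

# Assumed default install locations (not taken from the article).
DEFAULT_PDFTK_PATHS = [
    r"C:\Program Files (x86)\PDFtk Server\bin\pdftk.exe",
    r"C:\Program Files\PDFtk Server\bin\pdftk.exe",
]


def find_pdftk(candidates=DEFAULT_PDFTK_PATHS):
    """Return the path to the PDFtk executable, or None if it is not found."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    # Fall back to searching the folders listed in PATH.
    return shutil.which("pdftk")
```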

The next step is to select the root folder:


You may navigate to it or type/paste it in. The program looks for an ending backslash on the path name and appends one if it is missing (in other words, it works whether or not you include the ending backslash in the path). It then checks that you entered a folder and that the folder exists. If either check fails, it gives you the opportunity to try again or exit the program:



Note: whether or not a folder can be reported as null with the browse-for-folder dialog depends on the operating system, so the program checks for it.
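The root-folder checks described above can be sketched in Python roughly as follows. The function names normalize_root and root_is_usable are hypothetical, and treating a whitespace-only entry as "nothing entered" is an assumption.

```python
import os


def normalize_root(path):
    """Return the path ending in a backslash, or None if nothing was entered."""
    path = path.strip()
    if not path:
        return None  # empty entry: the caller should ask again or exit
    if not path.endswith("\\"):
        path += "\\"
    return path


def root_is_usable(path):
    """True only when a folder was entered and it exists on disk."""
    return path is not None and os.path.isdir(path)
```

A caller would loop on these two checks, offering "try again" or "exit" each time root_is_usable returns False.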

Now it's time to enter the parameters for the run:


In the top box, enter the name for the combined/merged file (without the .PDF file type). You must enter a name, otherwise it shows this:


The program then checks for characters that are invalid in a file name and displays this dialog if it finds any:


Once the file name is valid, it appends .pdf as the file type. For example, if you enter

combined PDF file

in the box, then the name of each combined/merged file will be

combined PDF file.pdf
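A minimal Python sketch of the file name checks follows. The invalid-character set is the standard list of characters Windows rejects in file names; the function names and error messages are hypothetical, and raising an exception stands in for the dialogs shown above.

```python
# Characters Windows does not allow in a file name.
INVALID_FILENAME_CHARS = set('\\/:*?"<>|')


def invalid_chars_in(name):
    """Return the invalid characters found in name, sorted, without duplicates."""
    return sorted(set(name) & INVALID_FILENAME_CHARS)


def build_output_name(name):
    """Validate the entered name and append the .pdf file type."""
    name = name.strip()
    if not name:
        raise ValueError("A file name is required.")
    bad = invalid_chars_in(name)
    if bad:
        raise ValueError("Invalid characters in file name: " + " ".join(bad))
    return name + ".pdf"
```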

Then select a radio button for the order in which to combine/merge the files (default is By file name). Finally, select a radio button for which folders to process: subfolders only (the default), the root folder only, or the root folder and subfolders.
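The two radio-button groups map naturally onto two small functions. The Python sketch below is one possible design, not the original program's code. The option strings ("name", "date_asc", "date_desc", "root", "subfolders", "both") and the function names are hypothetical, and sorting names case-insensitively is an assumption.

```python
import os


def list_pdfs(folder, order="name"):
    """Return the PDF files in one folder, sorted by the chosen order."""
    if order not in ("name", "date_asc", "date_desc"):
        raise ValueError(f"unknown sort order: {order}")
    # Non-PDF files and nested folders are ignored.
    pdfs = [entry.path for entry in os.scandir(folder)
            if entry.is_file() and entry.name.lower().endswith(".pdf")]
    if order == "name":
        return sorted(pdfs, key=lambda p: os.path.basename(p).lower())
    # Modified date: oldest first for date_asc, newest first for date_desc.
    return sorted(pdfs, key=os.path.getmtime, reverse=(order == "date_desc"))


def select_folders(root, scope="subfolders"):
    """Return the folders to process: "root", "subfolders", or "both"."""
    # os.walk yields the root itself first, then every nested subfolder.
    all_folders = [current for current, _dirs, _files in os.walk(root)]
    if scope == "root":
        return all_folders[:1]
    if scope == "subfolders":
        return all_folders[1:]
    return all_folders
```

Because os.walk descends through every level, the "subfolders" and "both" options cover nested folders to an unlimited depth.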

The program now processes the selected folders (if subfolders are selected, they are processed to an unlimited depth). It calls PDFtk to combine all of the PDF files in each subfolder into a single PDF file in each subfolder, with a file name as described above. During processing, it displays a green progress bar that moves to the right so that you know it is working, not hanging. The progress bar also displays the name of the current subfolder being processed and the percentage completion:


The percentage completion is based on the number of folders (not on time or number of PDFs).
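The call to PDFtk Server uses its cat operation, which takes the input files, the word cat, the word output, and the output file name. The Python sketch below shows one way to build and run that command and to compute a folder-based percentage. The function names are hypothetical, and the original program's exact command line is not shown in this article.

```python
import subprocess


def build_pdftk_command(pdftk_exe, input_pdfs, output_pdf):
    """Build: pdftk in1.pdf in2.pdf ... cat output combined.pdf"""
    return [pdftk_exe, *input_pdfs, "cat", "output", output_pdf]


def merge_folder(pdftk_exe, input_pdfs, output_pdf):
    """Run PDFtk and return its exit code (0 means success)."""
    command = build_pdftk_command(pdftk_exe, input_pdfs, output_pdf)
    return subprocess.run(command).returncode


def percent_complete(folders_done, folders_total):
    """Whole-number percentage based on folder count, not time or PDF count."""
    if folders_total == 0:
        return 100
    return 100 * folders_done // folders_total
```

Passing the command as a list (rather than one string) lets subprocess handle paths that contain spaces, such as "combined PDF file.pdf".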

If a call to PDFtk results in an error code, you will see this dialog:


It shows you the folder causing the error so you may investigate that folder to determine the problem. The most common reason for this error is that an input file has the same name as the output file — PDFtk does not allow this. This would happen if you ran the program a second time, giving it the same combined file name without having first deleted (or moved) the combined file from the previous run.
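An implementation could also detect this collision before calling PDFtk. The Python sketch below is an assumption about how that might be done, not a description of the original program, which reports the error after the fact. The function name output_collides is hypothetical, and the comparison ignores case because Windows file names are case-insensitive.

```python
import os


def output_collides(input_pdfs, output_pdf):
    """True if any input file is the same file as the intended output."""
    target = os.path.abspath(output_pdf).lower()
    return any(os.path.abspath(p).lower() == target for p in input_pdfs)
```

When it returns True, the caller can decide whether to skip the folder, warn the user, or leave the earlier combined file out of the input list.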

When the program finishes, it displays this Operation Completed dialog:


The operational statistics are stored in a plain text file in the source folder. The file name of this results file is "Operational_Statistics_YY-MM-DD_HH.MM.SS.txt", where the date/time is the beginning time of the run. Since seconds (SS) are in the file name, two runs would have to begin in the same second to produce a duplicate name, so overwriting an earlier results file is not a practical concern.
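The results file name can be produced with a single date-format call. This Python sketch follows the pattern quoted above (two-digit year, periods between the time fields); the function name stats_file_name is hypothetical.

```python
from datetime import datetime


def stats_file_name(start):
    """Build the results file name from the run's beginning date and time."""
    return start.strftime("Operational_Statistics_%y-%m-%d_%H.%M.%S.txt")
```

In a real run, start would be datetime.now() captured once the parameters have been entered.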

The results file contains this:

Operational Statistics from Combine-Merge-PDFs
Name of merged file: combined PDF file.pdf
Root folder: D:\0tempD\test combine\
Sort order: By file name
Folders processed: Root folder and subfolders
Number of folders processed: 11
Beginning date and time: 2014-08-26_01.52.39
Ending date and time: 2014-08-26_01.52.42
Elapsed time (minutes:seconds): 0:3

The elapsed time measurement begins after the parameters are entered so that it measures just the processing time, not including the time spent waiting for user input.
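The timing can be sketched in Python as follows. The function names are hypothetical; the seconds are deliberately not zero-padded so that the output matches the "0:3" shown in the sample results file.

```python
import time


def format_elapsed(seconds):
    """Format a duration as minutes:seconds, e.g. 3 seconds becomes 0:3."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs}"


def run_timed(work):
    """Time only the processing step, which is passed in as a callable."""
    start = time.monotonic()  # started after the user has entered parameters
    work()
    return format_elapsed(time.monotonic() - start)
```

Starting the clock inside run_timed, rather than at program launch, keeps time spent in the dialogs out of the measurement.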

If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe

Comments (25)

Joe Winograd (Author) commented:
Well, that's good news. In addition to PDFtk, you can use all the Xpdf tools, as well as the NirSoft utilities. Great stuff!

Also, using the AutoHotkey compiler on your own computer (which is installed as part of a standard AutoHotkey installation), you can compile the program described in this article (How to Combine-Merge PDF Files in Many Subfolders) into a stand-alone/no-install EXE file which can then be run on your client's machines. Indeed, any AutoHotkey program can be compiled into a stand-alone/no-install EXE file with the AutoHotkey compiler — see the first screenshot in this article. Regards, Joe
Dear sir,
Kind Attn. Mr. Joe Winograd
Requesting you to kindly re-attach the removed source code as I am desperately looking for this particular solution for merging several pdf files in to single pdf from multiple subfolders. This is what I need pl. re-upload. If the software is updated with some added features requesting you to pl. re-upload it soon.

Regards
Pinakin
Joe Winograd (Author) commented:
Hi Pinakin,
I am not yet ready to re-attach the source code. However, I did receive your email and may be able to help you in another way, since you have an immediate need. I'll reply to your email soon. Regards, Joe
Hi

Will the "Combine-Merge-PDF-files-20140826.ahk"  file not be attached again?
Joe Winograd (Author) commented:
Hi Centex,
I've decided not to post the full program. I'll be rewriting the article as a "design roadmap" with some crucial code snippets, such as how to call PDFtk Server, but will not be posting the complete source code. Regards, Joe
