E=mc2

Combine PDFs into one file in multiple folders

I have 1 folder called PDFFILES, and in that folder there are many subfolders.
Each subfolder might contain 1, 2, or 3 PDFs.

I want to run a script that will look in the main PDFFILES folder.
Then it needs to look into each subfolder, and if it finds more than 1 PDF in a subfolder, I would like them all to be joined into 1 PDF file per subfolder.

Can this be done?
Many thanks for all your assistance.
Lee W, MVP

It can be, but not DIRECTLY in a vbscript as far as I can tell (without potentially purchasing third party software).  At one client, I have a vbscript that launches a free command line tool to do this.  The vbscript locates the files I need, copies them to a working directory after renaming so I can assemble in a specific order, and assembles them into a single PDF.  Your circumstance may be easier if the files are all in the same directory and ALL need to be combined.
E=mc2

ASKER

Thanks kindly.  And to clarify, in each subfolder there would be files that start with the same 5 digits, such as 12345Q.pdf, 12345A.pdf, and perhaps even 12345C.pdf, and I want them all combined into one file called 12345.pdf.

What software do you recommend, even if I need to purchase it?
Bill Prew

PDFtk is a great free / low cost option for merge and split type manipulations and has command line support.

You could do all of what you want from a fairly simple BAT script if you wanted to go that route?

https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/


»bp
PDFtk is what I've been using as the command line tool to combine things with the vbScript.

Just to give you an idea, here's an approach in BAT script, pretty simple, you just have to edit the SET statements near the top...

@echo off
setlocal EnableDelayedExpansion

rem Define location of files and folders
set BaseDir=B:\EE\EE29072521\PDFFILES
set DestDir=B:\EE\EE29072521\PDFMERGE
set PDFtk=C:\_pf\PDFtk\bin\pdftk.exe

rem Look at each subfolder in the base source folder
for /d %%D in ("%BaseDir%\*.*") do (

    rem Look at the PDF files in this folder and get a name from one
    for %%F in ("%%~D\*.pdf") do (
        set Name=%%~nF
    )

    rem Remove the rightmost character for the name to get the name for the merged PDF file
    set Name=!Name:~0,-1!

    rem Merge all PDF files into one and store in the destination folder
    "%PDFtk%" "%%~D\*.pdf" cat output "%DestDir%\!Name!.pdf"
)



»bp
SOLUTION
Joe Winograd
This solution is only available to members.
E=mc2

ASKER

Thanks. I would like to try CAPTAIN for sure.. thanks,
E=mc2

ASKER

Thanks Bill.  I tried your script, but I have a few questions. It's not merging the files at all, and I wanted to know if the PDFFILES and the PDFMERGE folders need to be different?
Can the D in "%%~D\*.pdf" be changed to Z?

It worked as expected in a test here, did you get any errors?

> Thanks Bill.  I tried your script, but I have a few questions. It's not merging the files at all, and I wanted to know if the PDFFILES and the PDFMERGE folders need to be different?

No, they don't need to be different, I set it up that way to be flexible.

> Can the D in "%%~D\*.pdf" be changed to Z?

Why do you want to change that? It's just the reference to the loop variable %%D; it doesn't relate to the actual file name.


»bp
Out of my office now on my mobile. Will reply properly when I return, probably in an hour or two.
E=mc2

ASKER

Hi Bill, the script does not work then. It says it can't find the path. It looks like it identified the files in each folder, but the part right after that is not working.  Could it be the pdftk path that you specified?

Well, you would need to change that PDFtk path to be wherever you installed the software, which I'm sure is different than where I did...


»bp
E=mc2

ASKER

Hi Bill. OK, I changed the path, but now it takes the combined file and puts it in the destination folder, not in the original subfolder the files came from, and the original files are still there; they are not deleted.

Okay, try this adjustment.  Test carefully!

@echo off
setlocal EnableDelayedExpansion

rem Define location of files and folders
set BaseDir=B:\EE\EE29072521\PDFFILES
set PDFtk=C:\_pf\PDFtk\bin\pdftk.exe

rem Look at each subfolder in the base source folder
for /d %%D in ("%BaseDir%\*.*") do (

    rem Look at the PDF files in this folder and get a name from one
    for %%F in ("%%~D\*.pdf") do (
        set Name=%%~nF
    )

    rem Remove the rightmost character for the name to get the name for the merged PDF file
    set Name=!Name:~0,-1!

    rem Merge all PDF files into one and store it in the same subfolder
    "%PDFtk%" "%%~D\*.pdf" cat output "%%~D\!Name!.pdf"

    rem Delete all but the merged PDF file (it is renamed out of the way first)
    ren "%%~D\!Name!.pdf" "!Name!.xxx"
    del /q "%%~D\*.pdf"
    ren "%%~D\!Name!.xxx" "!Name!.pdf"
)



»bp
> the original files are still there they are not deleted

I've had many users ask for this feature in my programs that combine files and I always recommend against it strongly. The reason is that if the combining/merging process goes haywire, you don't want to delete the source files. My advice is always to wait until you know that the combining process has been 100% successful before deleting the source files, which means doing the deletions in a separate step after the combining process. All of that said, since you really seem to want a "Delete the source files after they are combined" option, I put it in the GUI with this dialog:

User generated image
Note that I made the No button the default so that an accidental Enter key won't select the "Delete the source files after they are combined" option. I added it to the CLI via a K (Keep) or D (Delete) option as the second parameter on the command line, moving the Source and Destination folders to the third and fourth parameters.

Also, I added the Recurse Subfolders feature to the GUI with this dialog:

User generated image
So, you'll be able to use the GUI for your purposes if you feel more comfortable with that (rather than the CLI). Regards, Joe
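
Joe's advice above (delete the source files only after the merge is known to have succeeded) could be sketched roughly as follows. This is not CAPTAIN's code and not Bill's script; it is a minimal illustration in Python rather than BAT, chosen only because the check is easier to show there. The function name `merge_then_delete`, the `.tmp` suffix, and the "non-empty output file" check are assumptions made for the example. The only PDFtk syntax used is the `cat output` form that already appears in this thread.

```python
import subprocess
from pathlib import Path


def merge_then_delete(pdftk, sources, output, run=subprocess.run):
    """Merge `sources` into `output`; delete the sources only if the merge succeeded.

    `run` is injectable so the logic can be exercised without PDFtk installed.
    Returns True when the merge succeeded and the sources were removed.
    """
    sources = [Path(s) for s in sources]
    output = Path(output)
    # Write to a temporary name first so the result never collides with a source file.
    tmp = output.with_suffix(".tmp")

    # Step 1: merge, using the "cat output" form shown in the scripts in this thread.
    result = run([str(pdftk), *map(str, sources), "cat", "output", str(tmp)])

    # Step 2: verify before touching anything (assumed check: exit code 0, non-empty file).
    if result.returncode != 0 or not tmp.exists() or tmp.stat().st_size == 0:
        tmp.unlink(missing_ok=True)
        return False  # sources are left untouched

    # Step 3: only now remove the sources and move the merged file into place.
    for src in sources:
        src.unlink()
    tmp.replace(output)
    return True
```

A call might look like `merge_then_delete(r"C:\_pf\PDFtk\bin\pdftk.exe", ["12345A.pdf", "12345B.pdf"], "12345.pdf")`. A stricter check (for example, opening the merged file and comparing page counts) would be safer still, but that is beyond this sketch.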
E=mc2

ASKER

Thanks Joe. I don't have the CAPTAIN program to try yet though, however thanks for the screen shots.
E=mc2

ASKER

Thanks Bill.  I tried the script, however sometimes a subfolder will contain only one file, with no letter from A to Z appended to the end of the 5 digits, like 12345.pdf. When the script finds a file like this, it takes off the last digit.
Only if it finds two files that start with the same 5 digits within the same subfolder, and those files have a letter at the end, should it join the two files and remove the letter at the end.
So in a practical scenario, let's say in one subfolder it finds 98765A.pdf and 98765B.pdf and also 98765.pdf. It should join all three into 98765.pdf and delete the other files.

You're welcome, 100questions (???). I'm not ready for distribution to everyone on the Internet (yet), so I'll write to you in the EE Message system to make arrangements for you to get it. I need to do some additional testing/QA, especially of the new features, create an installer and upload it, and update the Quick Start Guide, but I should be able to do all that in the next few hours. Regards, Joe

Okay, give this small change a try.

@echo off
setlocal EnableDelayedExpansion

rem Define location of files and folders
set BaseDir=B:\EE\EE29072521\PDFFILES
set PDFtk=C:\_pf\PDFtk\bin\pdftk.exe

rem Look at each subfolder in the base source folder
for /d %%D in ("%BaseDir%\*.*") do (

    rem Look at the PDF files in this folder and get a name from one
    for %%F in ("%%~D\*.pdf") do (
        set Name=%%~nF
    )

    rem Keep the first 5 characters of the name to get the name for the merged PDF file
    set Name=!Name:~0,5!

    rem Merge all PDF files into one and store it as a temporary file in the same subfolder
    "%PDFtk%" "%%~D\*.pdf" cat output "%%~D\!Name!.tmp"

    rem Delete all but the merged file, then give it its final name
    del /q "%%~D\*.pdf"
    ren "%%~D\!Name!.tmp" "!Name!.pdf"
)



»bp
Hi 100questions,
I just sent you a PM via the EE Message system. Looking forward to hearing back from you. Regards, Joe
E=mc2

ASKER

Bill, thanks for this change, however now it combines everything it finds in a subfolder, regardless of whether the first 5 digits are the same or not.

All of my scripts so far have assumed that only one set of files to be combined exists in each subfolder.  Are you saying now that is not true, and that there could be:

11111.pdf
22222.pdf
22222A.pdf
33333A.pdf
33333B.pdf

all in the same subfolder, and you want three different PDF's created?

11111.pdf
22222.pdf
33333.pdf


»bp
> all in the same subfolder, and you want three different PDF's created?
>
> 11111.pdf
> 22222.pdf
> 33333.pdf

As you can see from the Quick Start Guide that I posted, that's what CAPTAIN does, so I hope the answer to Bill's question is Yes. :)
E=mc2

ASKER

Hi Bill, yes exactly..
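
The requirement just confirmed (several sets of files per subfolder, each set sharing its first 5 characters) could be sketched roughly as follows. The accepted solution's text is not shown in this thread, so this is not Bill's script; it is only an illustration of the grouping step, written in Python rather than BAT because the grouping is easier to show there. The names `group_by_prefix` and `merge_plan` are hypothetical, and the sketch only plans the merges; it does not call PDFtk itself.

```python
from collections import defaultdict
from pathlib import Path


def group_by_prefix(folder, prefix_len=5):
    """Map each prefix (first `prefix_len` characters) to the sorted PDFs that share it."""
    groups = defaultdict(list)
    for pdf in sorted(Path(folder).glob("*.pdf")):
        groups[pdf.stem[:prefix_len]].append(pdf)
    return dict(groups)


def merge_plan(base_dir, prefix_len=5):
    """Yield (sources, output) pairs for every group in every subfolder of `base_dir`.

    A lone file that already has the target name (e.g. 11111.pdf) is skipped,
    since there is nothing to merge or rename.
    """
    for sub in sorted(p for p in Path(base_dir).iterdir() if p.is_dir()):
        for prefix, files in group_by_prefix(sub, prefix_len).items():
            output = sub / f"{prefix}.pdf"
            if files != [output]:
                yield files, output
```

For the example listing above, the plan would contain two entries: 22222.pdf plus 22222A.pdf into 22222.pdf, and 33333A.pdf plus 33333B.pdf into 33333.pdf, with 11111.pdf left alone. Each pair could then be passed to a PDFtk `cat output` call like the one in the earlier scripts, writing to a temporary name first because the output name can coincide with one of the sources.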
ASKER CERTIFIED SOLUTION
This solution is only available to members.
> Tested and working solution provided.

Yes, deserves to be a solution. Haven't tested it myself, but have no doubts that it works — 100% trust Bill's comment on that.

> Since Joe's beta test solution was done outside the question

I don't know what "outside the question" means. I am simply recommending a software package that needs to be downloaded and installed, same as I and other members have done thousands of times here at EE. My list of recommended software products includes AutoHotkey, GIMP, GraphicsMagick, ImageMagick, IrfanView, the NirSoft utilities, PaperPort, PDFtk, Power PDF, the Xpdf utilities — the list is huge. This one, CAPTAIN, happens to be my own product. I don't want to publish the download site publicly, but am happy to provide the download link to any EE member who requests it via PM. I don't think that is any more "outside the question" than providing a download link to other software packages — the only difference is in public exposure.

CAPTAIN does exactly what was requested in this question and has been confirmed in production usage by many users, as well as my own internal testing and QA prior to releasing it to users over a several year period. As such, I am objecting to Bill's close. I would be happy to see his #a42399975 post as the Accepted Solution, but believe that my #a42395610 post (the one with the CAPTAIN Quick Start Guide attached) is worthy as an Assisted Solution.

Btw, CAPTAIN is not "beta test" (Bill's words). It is in production usage by many users. I do currently have an enhanced beta version of CAPTAIN that is capable of combining PDF files or TIFF files, but that's a post for another day. :)  Regards, Joe
I recommend closing this in a different way from the previous suggestion, as explained in my #a42415506 post.