• Status: Solved

Merge PDF files

My computer has Adobe Acrobat XI Standard.

I have over 5,000 files in a folder, and I want to merge the files that share the same prefix in the filename. For example:

12345-12.pdf
12345-12 Additional.pdf
12345-12 Support.pdf

K11200235.pdf
K11200235 Support.pdf


12009-23.pdf


The result: the first 3 would be merged into 1 file:

12345-12.pdf

The next 2 would be merged into 1 file:

K11200235.pdf

 The last file would get copied or merged by itself -> 12009-23.pdf

and they would be in a destination folder other than the folder with the 5,000 .pdf files.

I found some code, but I don't know why it's not working. I haven't scripted before, but I have run a macro or two. Please help.


Sub MergeFiles()
 Set fso = CreateObject("Scripting.FileSystemObject")
 sFolder = "C:\test\"
 Set oFolder = fso.GetFolder(sFolder)

 bFirstDoc = True

 If oFolder.Files.Count < 2 Then
 MsgBox "You need to have at least two PDF files in the same folder to merge."
 Call fso.CopyFile(oFolder.Files.Name, oFolder & "\Results")
 Exit Sub
 End If

 Set AcroApp = CreateObject("AcroExch.App")
 Set oMainDoc = CreateObject("AcroExch.PDDoc")
 Set oTempDoc = CreateObject("AcroExch.PDDoc")

 For Each oFile In oFolder.Files
 If LCase(Left(oFile.Name, 8)) = ".pdf" Then

 If bFirstDoc Then
 bFirstDoc = False

 oMainDoc.Open sFolder & "\" & oFiles.Name
 Else


 oTempDoc.Open sFolder & "\" & oFiles.Name
 oMainDoc.InsertPages oMainDoc.GetNumPages - 1, oTempDoc, 0, oTempDoc.GetNumPages, False
 oTempDoc.Close
 End If

 End If
 Next
 
 oMainDoc.Save 1,sFolder & "\Output.pdf"
 oMainDoc.Close
 MsgBox "Done! See Output.pdf file."

AcroExchapp.exit
Set AcroExch.App = Nothing
Set oMainDoc = Nothing
Set oTempDoc = Nothing

 End Sub
Asked by: lynmke

Joe Winograd, EE MVE 2015 & 2016, Developer, commented:
I can't help you with that script, but I may be able to modify an article and program that I wrote here at EE called How To Combine-Merge-Append a Large Batch of TIFF Files. That program combines TIFF files based on the file prefixes, just as you're looking for, utilizing a program called IrfanView, with its "/multitif" option. It so happens that a recent release of IrfanView (Version 4.36, released 27-Jun-2013) introduced a new option called "/multipdf", which performs the same function as "/multitif", but with PDF files (latest release of IrfanView is 4.37, released 16-Dec-2013). Let me know if this approach interests you and I'll start looking into it. If it doesn't interest you, I'm sure some other expert will jump in to help with your code, but VB is not an area of my expertise. Regards, Joe

lynmke (Author) commented:
Thanks Joe. Apparently the script above should work (at least after tweaking), since I have Acrobat Standard.

Joe Winograd, EE MVE 2015 & 2016, Developer, commented:
OK, but one question for you. I notice that the number of lead-in characters varies in your files. For example, these have 8 lead-in characters:

12345-12.pdf
12345-12 Additional.pdf
12345-12 Support.pdf

12009-23.pdf

But these have 9 lead-in characters:

K11200235.pdf
K11200235 Support.pdf

Now, consider these two files:

K1120023.pdf
K11200235.pdf

With an 8-character lead-in, these two files would be combined into the same file; with a 9-character lead-in, they would come out of the process as two separate files. How would you handle these two files? Thanks, Joe
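
To make the 8-versus-9 question concrete, here is a minimal sketch in Python (not VBScript, and not part of either script in this thread). The helper name `group_by_fixed_prefix` is hypothetical; it only groups file names and does not touch any PDF.

```python
from collections import defaultdict

def group_by_fixed_prefix(names, length):
    """Group file names by the first `length` characters of the name without ".pdf"."""
    groups = defaultdict(list)
    for name in names:
        # Drop the extension so it never becomes part of the lead-in.
        stem = name[:-4] if name.lower().endswith(".pdf") else name
        groups[stem[:length]].append(name)
    return dict(groups)

files = ["K1120023.pdf", "K11200235.pdf"]
print(group_by_fixed_prefix(files, 8))  # both names fall under one 8-character key
print(group_by_fixed_prefix(files, 9))  # each name gets its own key
```

With a fixed length, the outcome depends entirely on the number chosen, which is why the question has to be settled before any merging code is written.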

lynmke (Author) commented:
Hi Joe,

Yes, if the file is as above then they would be separate files.

However, I double-checked, and I have only 3 filename formats to merge, as below (there is always a space between the numbers and the text in the first two examples):

K11200235 Support.pdf
K11200235 Additional.pdf

or

12-1234 Support.pdf
13-2311 Additional.pdf


or

RD9876_DCE.pdf
RD9876_GRF.pdf

Sorry for the confusion.

Joe Winograd, EE MVE 2015 & 2016, Developer, commented:
Here's the problem. Suppose you have:

RD9876_DCE.pdf
RD9876_DRF.pdf

Do these get merged into a single file called <RD9876_D.pdf>, or a single file called <RD9876_.pdf>, or a single file called <RD9876.pdf>, etc., or do they come out of the process as <RD9876_DCE.pdf> and <RD9876_DRF.pdf>?

You could say that an underscore is a separator, which would resolve the example above, but how about this  example:

RD9876DCE.pdf
RD9876DRF.pdf

Do these get merged into a single file called <RD9876D.pdf>, or a single file called <RD9876.pdf>, or a single file called <RD987.pdf>, or a single file called <RD98.pdf>, etc., or do they come out of the process as <RD9876DCE.pdf> and <RD9876DRF.pdf>?

Seems to me that declaring the number of lead-in characters to match is critical. Otherwise, ambiguities like the above could exist. Regards, Joe
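
One way to picture the separator idea is the following Python sketch. It assumes that a space or an underscore ends the prefix, which is an assumption drawn from the sample names in this thread rather than a confirmed rule, and the function name `prefix_of` is hypothetical.

```python
import re

def prefix_of(name):
    """Return the text before the first space or underscore, or the whole
    name (without ".pdf") when neither separator is present."""
    stem = name[:-4] if name.lower().endswith(".pdf") else name
    return re.split(r"[ _]", stem, maxsplit=1)[0]

for name in ["RD9876_DCE.pdf", "RD9876_DRF.pdf", "RD9876DCE.pdf", "K11200235 Support.pdf"]:
    print(name, "->", prefix_of(name))
```

Names with a separator resolve cleanly to their prefix, while a name such as RD9876DCE.pdf has nothing to split on and stays whole. That second case is exactly the ambiguity described above: without a separator, only a declared number of lead-in characters can decide where the prefix ends.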

lynmke (Author) commented:
Yes Joe, declaring the number of lead-in characters is critical. In this case, only these prefixes matter:

K11200235

12-1234

RD9876

I am also running a different script that I found. It is almost running, but I get an error at line 44 -> Type mismatch: 'UBound' (see below).

Set fso = CreateObject("Scripting.FileSystemObject")

sFolder = "C:\test\"
dFolder = "C:\test\final"
 Set oFolder = fso.GetFolder(sFolder)
Dim file_group

'Sort the list in the Array name.
 'listArray = SortedFiles(oFolder)
 'listArray = SortedFiles(sFolder)
 file_names = SortedFiles(sFolder)
 
'msgbox "file_names : " & file_names(1)
 
'listArray = Quicksort(file_names, 1, oFolder.Files.Count)
 listArray = Quick_Sort(file_names, 1, oFolder.Files.Count - 1)
 
'msgbox "testa " & listArray(0) & " testb " & listArray(1)
 
f_filename = ""
 l_filename = ""
 'file_group(0) = ""
 'msgbox uBound(listArray)
 For x = 0 To UBound(listArray)
 f_filename = listArray(x)
 i = x + 1
MsgBox "listArray " & listArray(i)
 Do While InStr(1, listArray(i), f_filename, 1) > 0
ReDim Preserve file_group(i)
 file_group(i) = listArray(i)
 i = i + 1
 MsgBox "Step1"
 Loop
 x = i
MergePDFFiles (file_group)
 
ReDim file_group(0)
 
Next
MsgBox "Done"
 
Function MergePDFFiles(ByRef pdf_files)
bFirstDoc = True
recs = UBound(pdf_files)
 If recs < 2 Then
'If oFolder.Files.Count < 2 Then
' MsgBox "needed 2 pdf."
Set oMainDoc = CreateObject("AcroExch.PDDoc")
oMainDoc.Open sFolder & "\" & f_filename & ".pdf" 'oFile.Name
oMainDoc.Save 1, dFolder & f_filename & ".pdf"
oMainDoc.Close
Exit Function
End If
'For Each oFile In oFolder.Files
For i = 0 To UBound(pdf_files)
 MsgBox "MergePDFFiles"
 If bFirstDoc Then
bFirstDoc = False
Set oMainDoc = CreateObject("AcroExch.PDDoc")
oMainDoc.Open sFolder & "\" & f_filename & ".pdf" 'oFile.Name
Else
Set oTempDoc = CreateObject("AcroExch.PDDoc")
oTempDoc.Open sFolder & "\" & pdf_files(i) & ".pdf"
oMainDoc.InsertPages oMainDoc.GetNumPages - 1, oTempDoc, 0, oTempDoc.GetNumPages, False
oTempDoc.Close
End If
Next

oMainDoc.Save 1, dFolder & f_filename & ".pdf"
oMainDoc.Close
oTempDoc.Close
 'MsgBox "ok"

End Function

' Return an array containing the names of the
 ' files in the directory sorted alphabetically.
 Function SortedFiles(dir_path)
 Dim file_names
 Set fso = CreateObject("Scripting.FileSystemObject")

' Get the FSO Folder (directory) object.
 Set fso_folder = fso.GetFolder(dir_path)
 
' Make the list of names.
 ReDim file_names(fso_folder.Files.Count)
'msgbox "filecount " & fso_folder.Files.Count
i = 0
 For Each fso_file In fso_folder.Files
 'MsgBox "SortFiles"
 file_names(i) = Mid(fso_file.Name, 1, Len(fso_file.Name) - 4) 'File name minus the extension.
 i = i + 1
 ntemp = file_names(i)
 'MsgBox i & " " & ntemp
Next 'fso_file
 
' Sort the list of files.
 'Quick_sort file_names, 1, fso_folder.Files.Count
 
' Return the sorted list.
 SortedFiles = file_names
 
End Function
 
Function Quick_Sort(ByRef SortArray, ByRef First, ByRef Last)
 'Dim Low As Long, High As Long
 'Dim Temp As Variant, List_Separator As Variant
 Dim List_Separator
Low = First
 High = Last
 'msgbox "QuickSorta " & SortArray(0) & "QuickSortb " & SortArray(1)
 List_Separator = SortArray((First + Last) / 2)
 Do
 Do While (SortArray(Low) < List_Separator)
 Low = Low + 1
 Loop
 Do While (SortArray(High) > List_Separator)
 High = High - 1
 Loop
 If (Low <= High) Then
 Temp = SortArray(Low)
 SortArray(Low) = SortArray(High)
 SortArray(High) = Temp
 Low = Low + 1
 High = High - 1
 End If
 Loop While (Low <= High)
 If (First < High) Then Quick_Sort SortArray, First, High
 If (Low < Last) Then Quick_Sort SortArray, Low, Last

'msgbox "ArrayCount: " & UBound(SortArray)
'For i = 0 To UBound(SortArray)
 ' msgbox "fortest: " & SortArray(i)
 'Next

'Return the sorted list
 Quick_Sort = SortArray

End Function

Joe Winograd, EE MVE 2015 & 2016, Developer, commented:
I'm thinking of a more general program that would work for lots of folks with a similar, if not identical, situation. In your case, you're saying that

12-1234

is the prefix (although I suspect that's a typo and you really meant to say 12-12345). Someone else might think the prefix is

12

Thus, you would say that the number of characters needed to match in the case above is 7 (or 8, depending on the typo), while someone else might say it's 2.

Sorry, can't help with the VB Script...outside of my expertise. To get more of the right experts in the mix, you may want to change your Topics to:

VB Script
Visual Basic Classic
Visual Basic.NET

Regards, Joe

lynmke (Author) commented:
Thanks Joe,

I hope someone with VB knowledge can help check the code. Somehow I need to add a condition that picks out the first characters of the .pdf filename.

Joe Winograd, EE MVE 2015 & 2016, Developer, commented:
In this case, only this prefixes matter.
K11200235
12-1234
RD9876
Are you saying that these three are the only prefixes you have?

lynmke (Author) commented:
Hi Joe,

No, the "K11200235" filename can also be just numbers "11200235".

I figured out that I could use a check like "If fileName.Contains(phrase) Then Return True" for each filename format before the first UBound loop in the script above, but how should I frame this condition?

Thanks!

Joe Winograd, EE MVE 2015 & 2016, Developer, commented:
I haven't heard back from you on my previous question, but if those are the only three prefixes that matter, I have a 3-line solution for you. The PDF Toolkit (PDFtk) is an excellent (free!) product that I've been using for many years. It has numerous features to manipulate PDFs and comes in both command line and GUI versions. The command line version is called PDFtk Server and may be downloaded here:
http://www.pdflabs.com/tools/pdftk-server/

Don't be misled by "Server" in the name. I don't know why they called it that, but it's just an executable (pdftk.exe, with a supporting DLL, libiconv2.dll) that runs on XP, Vista, W7, and W8 (it does not have to run on a "server" OS...it also runs on Mac, but I've never used it on that).

Here's the 3-line solution for you using the PDFtk command line:

pdftk D:\FolderIn\K11200235*.pdf cat output D:\FolderOut\K11200235.pdf
pdftk D:\FolderIn\12-1234*.pdf cat output D:\FolderOut\12-1234.pdf
pdftk D:\FolderIn\RD9876*.pdf cat output D:\FolderOut\RD9876.pdf

The "cat" operation "catenates" (joins/merges/combines) the input files into the output file. So those three lines will do it for you if those are the only three prefixes. If there are others, you can add a line for each one with the appropriate wildcarded file name for the input files (prefix*.pdf) and the appropriate file name for the combined output file (prefix.pdf). If that works for you, great. However, I was thinking of a more general solution that would help lots of other folks with a similar problem, but where it's not an easy task to specify all of the possible input prefixes.

Btw, If you'd like to see the full syntax for the PDFtk command line and some usage examples, here are the links:
http://www.pdflabs.com/docs/pdftk-man-page/
http://www.pdflabs.com/docs/pdftk-cli-examples/

If PDFtk doesn't work for you, then I hope a VB expert comes along soon. :)  Regards, Joe
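
If the list of prefixes grows, the one-line-per-prefix commands above could be generated rather than typed. This is a minimal Python sketch under stated assumptions: the function name `pdftk_commands` is hypothetical, the folder paths are the example paths from the commands above, and the prefixes contain no spaces (a prefix or path with spaces would need quoting). It only builds the command text and does not run pdftk.

```python
def pdftk_commands(prefixes, folder_in="D:\\FolderIn", folder_out="D:\\FolderOut"):
    """Build one 'pdftk <in>\\<prefix>*.pdf cat output <out>\\<prefix>.pdf' line per prefix."""
    return [
        f"pdftk {folder_in}\\{p}*.pdf cat output {folder_out}\\{p}.pdf"
        for p in prefixes
    ]

# Print the lines so they can be pasted into a batch file and reviewed before running.
for line in pdftk_commands(["K11200235", "12-1234", "RD9876"]):
    print(line)
```

Writing the lines to a batch file first keeps the merge step reviewable, since the destination names are visible before anything is combined.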

Joe Winograd, EE MVE 2015 & 2016, Developer, commented:
It took me a while to write that last response and I had a browser tab open with the question, so I didn't see your reply until after I hit submit. Anyway, you could certainly add this line:

pdftk D:\FolderIn\11200235*.pdf cat output D:\FolderOut\11200235.pdf

Or are you saying that the <11200235*.pdf> files should wind up in the same combined file as the <K11200235*.pdf> files? If so, should the combined file be called <K11200235.pdf> or <11200235.pdf>?

lynmke (Author) commented:
Well, I have to use something like Left(filename, 8), so that the condition compares the first 8 or 9 characters without naming specific prefixes. I have 5,000 PDF files with different number and character format combinations.

K11200235.pdf is a different file from 11200235.pdf, so these two stay separate. However, the files

K11200235 Support.pdf
K11200235 Additional.pdf
K11200235.pdf


will be merged as K11200235.pdf, and so on.

P.S. Thanks. I'll be offline shortly; it's 1300 hrs Eastern.

lynmke (Author) commented:
Any ideas?

Joe Winograd, EE MVE 2015 & 2016, Developer, commented:
Let's forget about programming language for a moment and define the exact specifications. You can't fix the VB code (or code in any language) if you don't know what you're trying to fix it to do!

Let's take your most recent example:

K11200235 Support.pdf
K11200235 Additional.pdf
K11200235.pdf

will be merged into:

K11200235.pdf

Since you have 5,000 PDF files with different character/number format combinations, it's possible that you could have these files:

K112002356 Support.pdf
K112002356 Additional.pdf
K112002356.pdf

I presume these would get merged into:

K112002356.pdf

But, since they share the same 9 lead-in characters with K11200235, you'd have to be careful about not merging them into:

K11200235.pdf

Are you willing to ignore this problem? If not, it's very tricky; if so, here's a possible description for the exact specifications of the program:

(1) Sort the file names alphabetically ascending, but giving special treatment to the dot/period ("."), because it sorts after space, comma, hyphen, and others, but before digits, at-sign, and others. In other words, the sorted list should be:

K112002356.pdf
K112002356 Additional.pdf
K112002356 Support.pdf

But if you sort ascending the full file name, including file type, it would be this (since space sorts before dot):

K112002356 Additional.pdf
K112002356 Support.pdf
K112002356.pdf

(2) Loop through the alphabetical list looking at all of the characters right before the ".pdf" (call this Prefix) and count the number of characters (call this Prefix_Count). For example, in the file:

K11200235.pdf

The Prefix is K11200235 and the Prefix_Count is 9.

(3) Each time the program looks at a file in the list, it compares the Prefix of the current file with the Prefix of the previous file. Since they are sorted alphabetically, a match on Prefix_Count characters means the current file should be merged with the previous file, and so on, until it finds a Prefix that doesn't match, which then starts a new file. The glitch, of course, is that Prefix K11200235 would match Prefix K112002356 for Prefix_Count (9) characters, which is why I asked if you're willing to ignore this problem. The only way I can think of to solve this nasty problem is to handle the longer Prefix_Count files first. In other words, if you process K112002356 (Prefix_Count of 10) before K11200235, it will work. That's what I meant by tricky – you'd have to sort the file list by Prefix alphabetically ascending and then sort by Prefix_Count descending within that.
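
Steps (1) to (3) can be sketched in Python as follows. This is an illustration under stated assumptions, not the program described above: the names `stem` and `group_sorted` are hypothetical, sorting is done on the name without ".pdf" so the bare file comes first, and the `require_space` option is a swapped-in alternative to the "longer Prefix_Count first" idea (it only accepts a match when a space follows the Prefix).

```python
def stem(name):
    """File name without the ".pdf" extension."""
    return name[:-4] if name.lower().endswith(".pdf") else name

def group_sorted(names, require_space=False):
    """Walk the names sorted by stem; a file joins the current group when its
    stem starts with the stem of the file that opened the group."""
    groups = []  # list of (leading stem, [file names])
    for name in sorted(names, key=stem):
        s = stem(name)
        if groups:
            leader = groups[-1][0]
            follows = s[len(leader):]  # what comes after the matched Prefix
            if s.startswith(leader) and (not require_space or follows.startswith(" ")):
                groups[-1][1].append(name)
                continue
        groups.append((s, [name]))
    return groups

names = [
    "K11200235 Support.pdf", "K11200235.pdf", "K112002356.pdf",
    "K112002356 Additional.pdf", "K11200235 Additional.pdf",
]
print(group_sorted(names))                      # plain Prefix match
print(group_sorted(names, require_space=True))  # Prefix must be followed by a space
```

The plain match shows the glitch: every K112002356 file is absorbed into the K11200235 group. Requiring a space after the Prefix keeps the two groups apart, but it relies on a bare file such as K11200235.pdf opening each group and on space being the only separator, so underscore-style names would need a different grouping key.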

Regards, Joe

lynmke (Author) commented:
Thanks Joe!

Joe Winograd, EE MVE 2015 & 2016, Developer, commented:
You're welcome! Good luck with the project. Regards, Joe

bobmarish commented:
Read the blog post at the following link for an easy way to split or merge PDF files as required: http://pdfutility.blogspot.com/2013/11/manage-your-large-sized-pdf-documents.html

lynmke (Author) commented:
Thanks bobmarish