lynmke
asked on
Merge PDF files
My computer has Adobe Acrobat XI Standard.
I have over 5,000 files in a folder and I want to merge the files that have the same prefix in the file name, e.g.:
12345-12.pdf
12345-12 Additional.pdf
12345-12 Support.pdf
K11200235.pdf
K11200235 Support.pdf
12009-23.pdf
The results: the first 3 would be merged into 1 file:
12345-12.pdf
The next 2 would be merged into 1 file:
K11200235.pdf
The last file would be copied (or "merged" by itself) -> 12009-23.pdf
All of the output files would go in a destination folder other than the folder with the 5,000 .pdf files.
I found some code, but I don't know why it's not working. I haven't scripted before, but I have run a macro or two. Please help.
' Merges every PDF in sFolder into a single Output.pdf (no prefix grouping).
' Requires Adobe Acrobat (not Reader) for the AcroExch COM objects.
Sub MergeFiles()
    Set fso = CreateObject("Scripting.FileSystemObject")
    sFolder = "C:\test\"   ' already ends with "\"
    Set oFolder = fso.GetFolder(sFolder)
    bFirstDoc = True
    If oFolder.Files.Count < 2 Then
        MsgBox "You need to have at least two PDF files in the same folder to merge."
        ' Copy the single file into the Results subfolder (which must already exist).
        For Each oFile In oFolder.Files
            fso.CopyFile oFile.Path, sFolder & "Results\"
        Next
        Exit Sub
    End If
    Set AcroApp = CreateObject("AcroExch.App")
    Set oMainDoc = CreateObject("AcroExch.PDDoc")
    Set oTempDoc = CreateObject("AcroExch.PDDoc")
    For Each oFile In oFolder.Files
        ' Test the last 4 characters (the extension), not the first 8.
        If LCase(Right(oFile.Name, 4)) = ".pdf" Then
            If bFirstDoc Then
                bFirstDoc = False
                oMainDoc.Open sFolder & oFile.Name
            Else
                oTempDoc.Open sFolder & oFile.Name
                ' Insert after the last page (page numbers are 0-based).
                oMainDoc.InsertPages oMainDoc.GetNumPages - 1, oTempDoc, 0, oTempDoc.GetNumPages, False
                oTempDoc.Close
            End If
        End If
    Next
    oMainDoc.Save 1, sFolder & "Output.pdf"   ' 1 = PDSaveFull
    oMainDoc.Close
    MsgBox "Done! See Output.pdf file."
    AcroApp.Exit
    Set AcroApp = Nothing
    Set oMainDoc = Nothing
    Set oTempDoc = Nothing
End Sub

' In a .vbs file the Sub has to be called explicitly; leave this line out in a VBA module.
MergeFiles
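The script above merges every PDF in the folder into one Output.pdf and has no notion of prefixes. The grouping the question describes can be sketched in Python, used here only to show the logic and not as a drop-in replacement for the VBScript. The sketch assumes that the prefix is everything before the first space. That rule fits the six example names but has not been confirmed for the rest of the 5,000 files. `stem` and `group_by_prefix` are hypothetical helpers:

```python
from collections import defaultdict

def stem(name: str) -> str:
    """File name without a trailing .pdf extension (case-insensitive)."""
    return name[:-4] if name.lower().endswith(".pdf") else name

def group_by_prefix(filenames):
    """Group PDF names by the text before the first space (assumed rule)."""
    groups = defaultdict(list)
    for name in filenames:
        prefix = stem(name).split(" ", 1)[0]
        groups[prefix].append(name)
    # Sorting by stem puts the base file ("12345-12") before "12345-12 Additional".
    return {p: sorted(names, key=stem) for p, names in groups.items()}

files = ["12345-12.pdf", "12345-12 Additional.pdf", "12345-12 Support.pdf",
         "K11200235.pdf", "K11200235 Support.pdf", "12009-23.pdf"]
for prefix, members in group_by_prefix(files).items():
    print(f"{prefix}.pdf <- {members}")
```

The base file is the first entry in each group, so its pages would come first in the merged output. A group with a single member, such as 12009-23, corresponds to the "copied by itself" case.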
I can't help you with that script, but I may be able to modify an article and program that I wrote here at EE called How To Combine-Merge-Append a Large Batch of TIFF Files. That program combines TIFF files based on the file prefixes, just as you're looking for, utilizing a program called IrfanView, with its "/multitif" option. It so happens that a recent release of IrfanView (Version 4.36, released 27-Jun-2013) introduced a new option called "/multipdf", which performs the same function as "/multitif", but with PDF files (latest release of IrfanView is 4.37, released 16-Dec-2013). Let me know if this approach interests you and I'll start looking into it. If it doesn't interest you, I'm sure some other expert will jump in to help with your code, but VB is not an area of my expertise. Regards, Joe
ASKER
Thanks Joe. Apparently the script above should work, since I have the Standard version of Acrobat (at least after some tweaking).
OK, but one question for you. I notice that the number of lead-in characters varies in your files. For example, these have 8 lead-in characters:
12345-12.pdf
12345-12 Additional.pdf
12345-12 Support.pdf
12009-23.pdf
But these have 9 lead-in characters:
K11200235.pdf
K11200235 Support.pdf
Now, consider these two files:
K1120023.pdf
K11200235.pdf
With an 8-character lead-in, these two files would be combined into the same file; with a 9-character lead-in, they would come out of the process as two separate files. How would you handle these two files? Thanks, Joe
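A small Python sketch makes the difference between the two lead-in lengths concrete. It is illustrative only, and `group_by_leadin` is a hypothetical helper that groups names by their first n characters, ignoring the .pdf extension:

```python
from collections import defaultdict

def group_by_leadin(filenames, n):
    """Group PDF names by the first n characters of the name (extension excluded)."""
    groups = defaultdict(list)
    for name in filenames:
        stem = name[:-4] if name.lower().endswith(".pdf") else name
        groups[stem[:n]].append(name)
    return dict(groups)

pair = ["K1120023.pdf", "K11200235.pdf"]
print(group_by_leadin(pair, 8))  # {'K1120023': ['K1120023.pdf', 'K11200235.pdf']}
print(group_by_leadin(pair, 9))  # {'K1120023': ['K1120023.pdf'], 'K11200235': ['K11200235.pdf']}
```

With n = 9, the 8-character names also break apart. "12345-12 Support" has the 9-character lead-in "12345-12 " (with the trailing space), while "12345-12" has only 8 characters. A single fixed n therefore cannot serve both formats.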
ASKER
Hi Joe,
Yes, if the files are as above, they would be separate files.
However, I double-checked and I have only 3 formats to merge, as below (there is always a space between the number and the text in the first two examples):
K11200235 Support.pdf
K11200235 Additional.pdf
or
12-1234 Support.pdf
13-2311 Additional.pdf
or
RD9876_DCE.pdf
RD9876_GRF.pdf
Sorry for the confusion.
Here's the problem. Suppose you have:
RD9876_DCE.pdf
RD9876_DRF.pdf
Do these get merged into a single file called <RD9876_D.pdf>, or a single file called <RD9876_.pdf>, or a single file called <RD9876.pdf>, etc., or do they come out of the process as <RD9876_DCE.pdf> and <RD9876_DRF.pdf>?
You could say that an underscore is a separator, which would resolve the example above, but how about this example:
RD9876DCE.pdf
RD9876DRF.pdf
Do these get merged into a single file called <RD9876D.pdf>, or a single file called <RD9876.pdf>, or a single file called <RD987.pdf>, or a single file called <RD98.pdf>, etc., or do they come out of the process as <RD9876DCE.pdf> and <RD9876DRF.pdf>?
Seems to me that declaring the number of lead-in characters to match is critical. Otherwise, ambiguities like the above could exist. Regards, Joe
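Joe's point about separators can be checked with a short Python sketch. It is illustrative only, and `separator_key` is a hypothetical helper that takes the text before the first space or underscore. That rule settles the RD9876_DCE/RD9876_DRF case. Names with no separator at all keep their whole name as the key, though, so RD9876DCE and RD9876DRF could only be merged by declaring a lead-in length:

```python
import re

def separator_key(stem):
    """Text before the first space or underscore; the whole stem if there is neither."""
    return re.split(r"[ _]", stem, maxsplit=1)[0]

for s in ["RD9876_DCE", "RD9876_DRF", "RD9876DCE", "RD9876DRF"]:
    print(s, "->", separator_key(s))
# RD9876_DCE -> RD9876
# RD9876_DRF -> RD9876
# RD9876DCE -> RD9876DCE
# RD9876DRF -> RD9876DRF
```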
ASKER
Yes Joe, declaring the number of lead-in characters is critical. In this case, only these prefixes matter:
K11200235
12-1234
RD9876
I am also running a different script that I found. It almost runs, but I'm getting an error on line 44 -> Type mismatch: 'UBound' (see below).
' Merges each base PDF with the PDFs whose names continue it with a space or
' underscore, e.g. "K11200235.pdf" + "K11200235 Additional.pdf" +
' "K11200235 Support.pdf" -> C:\test\final\K11200235.pdf.
' Requires Adobe Acrobat (not Reader) for the AcroExch COM objects.
Set fso = CreateObject("Scripting.FileSystemObject")
sFolder = "C:\test\"
dFolder = "C:\test\final\"   ' note the trailing "\"
If Not fso.FolderExists(dFolder) Then fso.CreateFolder dFolder
Dim file_group

' PDF names without the ".pdf" extension, sorted so that each base name
' comes directly before "base Additional", "base Support", ...
listArray = SortedFiles(sFolder)
If UBound(listArray) > 0 Then Quick_Sort listArray, 0, UBound(listArray)

x = 0
Do While x <= UBound(listArray)
    f_filename = listArray(x)
    ' Always start the group as a real array, so UBound() inside
    ' MergePDFFiles can no longer raise "Type mismatch".
    ReDim file_group(0)
    file_group(0) = f_filename
    i = x + 1
    Do While i <= UBound(listArray)
        If Not IsSameGroup(f_filename, listArray(i)) Then Exit Do
        ReDim Preserve file_group(UBound(file_group) + 1)
        file_group(UBound(file_group)) = listArray(i)
        i = i + 1
    Loop
    MergePDFFiles file_group, f_filename
    x = i   ' continue with the first file after this group
Loop
MsgBox "Done"

' True if name is base_name followed by a space or an underscore.
Function IsSameGroup(base_name, name)
    Dim sep
    sep = Mid(name, Len(base_name) + 1, 1)
    IsSameGroup = (Left(name, Len(base_name)) = base_name) And (sep = " " Or sep = "_")
End Function

' Merge the PDFs named in pdf_files (no extension) into dFolder\out_name.pdf.
' A one-file group is simply saved to dFolder, i.e. copied.
Sub MergePDFFiles(pdf_files, out_name)
    Dim oMainDoc, oTempDoc, i   ' local, so the caller's i is not overwritten
    Set oMainDoc = CreateObject("AcroExch.PDDoc")
    oMainDoc.Open sFolder & pdf_files(0) & ".pdf"
    For i = 1 To UBound(pdf_files)
        Set oTempDoc = CreateObject("AcroExch.PDDoc")
        oTempDoc.Open sFolder & pdf_files(i) & ".pdf"
        ' Insert after the last page (page numbers are 0-based).
        oMainDoc.InsertPages oMainDoc.GetNumPages - 1, oTempDoc, 0, oTempDoc.GetNumPages, False
        oTempDoc.Close
    Next
    oMainDoc.Save 1, dFolder & out_name & ".pdf"   ' 1 = PDSaveFull
    oMainDoc.Close
End Sub

' Return the names of the PDF files in dir_path, without the ".pdf" extension.
' (Sorting is done separately by Quick_Sort.)
Function SortedFiles(dir_path)
    Dim file_names, n, fso_folder, fso_file
    Set fso_folder = CreateObject("Scripting.FileSystemObject").GetFolder(dir_path)
    ReDim file_names(fso_folder.Files.Count)
    n = 0
    For Each fso_file In fso_folder.Files
        If LCase(Right(fso_file.Name, 4)) = ".pdf" Then
            file_names(n) = Left(fso_file.Name, Len(fso_file.Name) - 4)
            n = n + 1
        End If
    Next
    ReDim Preserve file_names(n - 1)   ' drop the unused slots
    SortedFiles = file_names
End Function

' In-place quicksort of SortArray(First..Last).
Sub Quick_Sort(SortArray, First, Last)
    Dim Low, High, Temp, List_Separator   ' local to each recursive call
    Low = First
    High = Last
    List_Separator = SortArray((First + Last) \ 2)
    Do
        Do While SortArray(Low) < List_Separator
            Low = Low + 1
        Loop
        Do While SortArray(High) > List_Separator
            High = High - 1
        Loop
        If Low <= High Then
            Temp = SortArray(Low)
            SortArray(Low) = SortArray(High)
            SortArray(High) = Temp
            Low = Low + 1
            High = High - 1
        End If
    Loop While Low <= High
    If First < High Then Quick_Sort SortArray, First, High
    If Low < Last Then Quick_Sort SortArray, Low, Last
End Sub
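The script above builds each group from the first name in sorted order plus the names that follow it. That logic can be mirrored in Python for illustration. `group_sorted` is a hypothetical helper, and the "space or underscore after the base name" test is an assumption based on the formats listed in this thread. The sketch also exposes a limitation of the approach: a group only forms when a base file such as K11200235.pdf exists. RD9876_DCE and RD9876_GRF have no RD9876.pdf, so they end up in separate groups:

```python
def group_sorted(stems):
    """Sort the names, then attach each following name that continues the
    current base name with a space or underscore."""
    names = sorted(stems)  # character-code order (VBScript's default string comparison for ASCII)
    groups, x = [], 0
    while x < len(names):
        base = names[x]
        group = [base]
        i = x + 1
        # Keep extending while the next name is base + " " or "_" + something.
        while (i < len(names) and names[i].startswith(base)
               and names[i][len(base):len(base) + 1] in (" ", "_")):
            group.append(names[i])
            i += 1
        groups.append(group)
        x = i  # resume after the last member of this group
    return groups

stems = ["K11200235 Support", "K11200235", "K11200235 Additional",
         "RD9876_DCE", "RD9876_GRF", "11200235"]
for g in group_sorted(stems):
    print(g)
```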
I'm thinking of a more general program that would work for lots of folks with a similar, if not identical, situation. In your case, you're saying that
12-1234
is the prefix (although I suspect that's a typo and you really meant to say 12-12345). Someone else might think the prefix is
12
Thus, you would say that the number of characters needed to match in the case above is 7 (or 8, depending on the typo), while someone else might say it's 2.
Sorry, can't help with the VB Script...outside of my expertise. To get more of the right experts in the mix, you may want to change your Topics to:
VB Script
Visual Basic Classic
Visual Basic.NET
Regards, Joe
ASKER
Thanks Joe,
I hope someone with VB knowledge can help check the code. Somehow I need to add a condition that picks out the first characters of the .pdf file name.
"In this case, only these prefixes matter." Are you saying that these three are the only prefixes you have?
K11200235
12-1234
RD9876
ASKER
Hi Joe,
No, the "K11200235" file name can also be just numbers, e.g. "11200235".
I figured I could use something like "If fileName.Contains(phrase) Then Return True" for each file-name format before the first UBound loop in the script above, but how should I frame this condition?
Thanks!
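`Contains` is a VB.NET `String` method, and VBScript has no such method. Its nearest VBScript equivalent, `InStr(...) > 0` (used in the script above), has a weakness: it finds the phrase anywhere in the name, so 11200235 would also match K11200235 Support. A prefix test avoids that. In VBScript it would be written as `Left(name, Len(base)) = base`. Below is a Python sketch of the two tests, for illustration only; both helpers are hypothetical, and the space-or-underscore separator is an assumption taken from the formats listed in this thread:

```python
def contains_match(base, name):
    """Matches base anywhere in name, like Contains() or InStr() > 0."""
    return base in name

def prefix_match(base, name):
    """Matches only base itself, or base followed by a space or underscore."""
    return name == base or (name.startswith(base) and name[len(base)] in " _")

print(contains_match("11200235", "K11200235 Support"))  # True: wrong group
print(prefix_match("11200235", "K11200235 Support"))    # False
print(prefix_match("K11200235", "K11200235 Support"))   # True
print(prefix_match("K1120023", "K11200235"))            # False: "5" is not a separator
```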
I haven't heard back from you on my previous question, but if those are the only three prefixes that matter, I have a 3-line solution for you. The PDF Toolkit (PDFtk) is an excellent (free!) product that I've been using for many years. It has numerous features to manipulate PDFs and comes in both command line and GUI versions. The command line version is called PDFtk Server and may be downloaded here:
http://www.pdflabs.com/tools/pdftk-server/
Don't be misled by "Server" in the name. I don't know why they called it that, but it's just an executable (pdftk.exe, with a supporting DLL, libiconv2.dll) that runs on XP, Vista, W7, and W8 (it does not have to run on a "server" OS...it also runs on Mac, but I've never used it on that).
Here's the 3-line solution for you using the PDFtk command line:
pdftk D:\FolderIn\K11200235*.pdf cat output D:\FolderOut\K11200235.pdf
pdftk D:\FolderIn\12-1234*.pdf cat output D:\FolderOut\12-1234.pdf
pdftk D:\FolderIn\RD9876*.pdf cat output D:\FolderOut\RD9876.pdf
The "cat" operation "catenates" (joins/merges/combines) the input files into the output file. So those three lines will do it for you if those are the only three prefixes. If there are others, you can add a line for each one with the appropriate wildcarded file name for the input files (prefix*.pdf) and the appropriate file name for the combined output file (prefix.pdf). If that works for you, great. However, I was thinking of a more general solution that would help lots of other folks with a similar problem, but where it's not an easy task to specify all of the possible input prefixes.
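With thousands of files, the pdftk lines can be generated rather than typed one prefix at a time. The Python sketch below is illustrative only. `stem` and `pdftk_commands` are hypothetical helpers, not part of PDFtk. The sketch assumes the "text before the first space or underscore" rule inferred from the formats listed earlier in this thread, plus the D:\FolderIn and D:\FolderOut folders from the example above. It uses only the `pdftk <inputs> cat output <file>` form shown in that example:

```python
import re
from collections import defaultdict
from pathlib import PureWindowsPath

def stem(name):
    return name[:-4] if name.lower().endswith(".pdf") else name

def pdftk_commands(filenames, in_dir, out_dir):
    """Build one 'pdftk <inputs> cat output <prefix>.pdf' argument list per prefix."""
    groups = defaultdict(list)
    for name in sorted(filenames, key=stem):  # base file first within each group
        prefix = re.split(r"[ _]", stem(name), maxsplit=1)[0]
        groups[prefix].append(name)
    commands = []
    for prefix, members in groups.items():
        inputs = [str(PureWindowsPath(in_dir, m)) for m in members]
        output = str(PureWindowsPath(out_dir, prefix + ".pdf"))
        commands.append(["pdftk", *inputs, "cat", "output", output])
    return commands

files = ["K11200235 Support.pdf", "K11200235.pdf", "RD9876_DCE.pdf", "RD9876_GRF.pdf"]
for cmd in pdftk_commands(files, r"D:\FolderIn", r"D:\FolderOut"):
    print(cmd)
```

On a machine with pdftk.exe on the PATH, each list could be passed to `subprocess.run(cmd, check=True)`. Passing a list avoids quoting problems with names that contain spaces, such as "K11200235 Support.pdf". Listing the input files explicitly instead of using a wildcard such as K11200235*.pdf also keeps a file like K112002350.pdf out of the K11200235 group.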
Btw, if you'd like to see the full syntax for the PDFtk command line and some usage examples, here are the links:
http://www.pdflabs.com/docs/pdftk-man-page/
http://www.pdflabs.com/docs/pdftk-cli-examples/
If PDFtk doesn't work for you, then I hope a VB expert comes along soon. :) Regards, Joe
It took me a while to write that last response and I had a browser tab open with the question, so I didn't see your reply until after I hit submit. Anyway, you could certainly add this line:
pdftk D:\FolderIn\11200235*.pdf cat output D:\FolderOut\11200235.pdf
Or are you saying that the <11200235*.pdf> files should wind up in the same combined file as the <K11200235*.pdf> files? If so, should the combined file be called <K11200235.pdf> or <11200235.pdf>?
ASKER
Well, I have to use something like Left(filename, 8) so that the condition compares the first 8 or 9 characters without naming specific prefixes. I have 5,000 PDF files with many different combinations of numbers and characters.
K11200235.pdf is a different file from 11200235.pdf, so those two stay separate. However, the files
K11200235 Support.pdf
K11200235 Additional.pdf
K11200235.pdf
will be merged as K11200235.pdf, and so on.
P.S. Thanks. I'll be offline shortly; it's 1300 hrs Eastern.
ASKER
Any ideas?
ASKER CERTIFIED SOLUTION
ASKER
Thanks Joe!
You're welcome! Good luck with the project. Regards, Joe
Read the blog post at the following link: http://pdfutility.blogspot.com/2013/11/manage-your-large-sized-pdf-documents.html to learn about an easy way to split and merge PDF document files as your requirements dictate.
ASKER
Thanks bobmarish