lynmke

Merge PDF files

My computer has Adobe Acrobat XI Standard.

I have over 5,000 files in a folder, and I want to merge the files that share the same prefix in their filenames. For example:

12345-12.pdf
12345-12 Additional.pdf
12345-12 Support.pdf

K11200235.pdf
K11200235 Support.pdf


12009-23.pdf


The Results: The first 3 would get merged into 1 file

12345-12.pdf

The next 2 would get merged into 1 file:

K11200235.pdf

 The last file would get copied or merged by itself -> 12009-23.pdf

and they would be in a destination folder other than the folder with the 5,000 .pdf files.

I found some code, but I don't know why it's not working. I haven't scripted before, though I have run a macro or two. Please help.


Sub MergeFiles()
    Set fso = CreateObject("Scripting.FileSystemObject")
    sFolder = "C:\test\"
    Set oFolder = fso.GetFolder(sFolder)

    bFirstDoc = True

    If oFolder.Files.Count < 2 Then
        MsgBox "You need to have at least two PDF files in the same folder to merge."
        Call fso.CopyFile(oFolder.Files.Name, oFolder & "\Results")
        Exit Sub
    End If

    Set AcroApp = CreateObject("AcroExch.App")
    Set oMainDoc = CreateObject("AcroExch.PDDoc")
    Set oTempDoc = CreateObject("AcroExch.PDDoc")

    For Each oFile In oFolder.Files
        If LCase(Left(oFile.Name, 8)) = ".pdf" Then

            If bFirstDoc Then
                bFirstDoc = False

                oMainDoc.Open sFolder & "\" & oFiles.Name
            Else

                oTempDoc.Open sFolder & "\" & oFiles.Name
                oMainDoc.InsertPages oMainDoc.GetNumPages - 1, oTempDoc, 0, oTempDoc.GetNumPages, False
                oTempDoc.Close
            End If

        End If
    Next

    oMainDoc.Save 1, sFolder & "\Output.pdf"
    oMainDoc.Close
    MsgBox "Done! See Output.pdf file."

    AcroExchapp.exit
    Set AcroExch.App = Nothing
    Set oMainDoc = Nothing
    Set oTempDoc = Nothing

End Sub
Joe Winograd

I can't help you with that script, but I may be able to modify an article and program that I wrote here at EE called How To Combine-Merge-Append a Large Batch of TIFF Files. That program combines TIFF files based on the file prefixes, just as you're looking for, utilizing a program called IrfanView, with its "/multitif" option. It so happens that a recent release of IrfanView (Version 4.36, released 27-Jun-2013) introduced a new option called "/multipdf", which performs the same function as "/multitif", but with PDF files (latest release of IrfanView is 4.37, released 16-Dec-2013). Let me know if this approach interests you and I'll start looking into it. If it doesn't interest you, I'm sure some other expert will jump in to help with your code, but VB is not an area of my expertise. Regards, Joe
lynmke

ASKER

Thanks Joe. The script above should apparently work, since I have Acrobat Standard (at least after some tweaking).
Joe Winograd

OK, but one question for you. I notice that the number of lead-in characters varies in your files. For example, these have 8 lead-in characters:

12345-12.pdf
12345-12 Additional.pdf
12345-12 Support.pdf

12009-23.pdf

But these have 9 lead-in characters:

K11200235.pdf
K11200235 Support.pdf

Now, consider these two files:

K1120023.pdf
K11200235.pdf

With an 8-character lead-in, these two files would be combined into the same file; with a 9-character lead-in, they would come out of the process as two separate files. How would you handle these two files? Thanks, Joe
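
To make the lead-in question concrete, here is a minimal Python sketch (not part of the original thread) that groups names by a fixed number of leading characters. The function name `group_by_lead_in` and the sample list are illustrative assumptions; the code only compares names and does not touch any PDF files.

```python
from collections import defaultdict


def group_by_lead_in(filenames, lead_in_length):
    """Group file names by their first `lead_in_length` characters.

    The ".pdf" extension is stripped before comparing, so a short name
    such as "K1120023.pdf" is never matched on its extension.
    """
    groups = defaultdict(list)
    for name in sorted(filenames):
        stem = name[:-4] if name.lower().endswith(".pdf") else name
        groups[stem[:lead_in_length]].append(name)
    return dict(groups)


# Hypothetical sample based on the file names discussed above.
files = ["K1120023.pdf", "K11200235.pdf", "K11200235 Support.pdf"]

by_8 = group_by_lead_in(files, 8)  # everything lands in one group
by_9 = group_by_lead_in(files, 9)  # "K1120023.pdf" stays on its own

for key, names in by_9.items():
    print(key, "<-", names)
```

With a lead-in of 8, all three names share the key `K1120023`; with a lead-in of 9, `K1120023.pdf` forms its own group, which is the ambiguity described above.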
lynmke

ASKER

Hi Joe,

Yes, if the files are as above, then they would be separate files.

However, I double-checked, and I have only 3 types of formats to merge, as shown below (there is always a space between the numbers and the text in the first two examples):

K11200235 Support.pdf
K11200235 Additional.pdf

or

12-1234 Support.pdf
13-2311 Additional.pdf


or

RD9876_DCE.pdf
RD9876_GRF.pdf

Sorry for the confusion.
Joe Winograd

Here's the problem. Suppose you have:

RD9876_DCE.pdf
RD9876_DRF.pdf

Do these get merged into a single file called <RD9876_D.pdf>, or a single file called <RD9876_.pdf>, or a single file called <RD9876.pdf>, etc., or do they come out of the process as <RD9876_DCE.pdf> and <RD9876_DRF.pdf>?

You could say that an underscore is a separator, which would resolve the example above, but how about this example:

RD9876DCE.pdf
RD9876DRF.pdf

Do these get merged into a single file called <RD9876D.pdf>, or a single file called <RD9876.pdf>, or a single file called <RD987.pdf>, or a single file called <RD98.pdf>, etc., or do they come out of the process as <RD9876DCE.pdf> and <RD9876DRF.pdf>?

Seems to me that declaring the number of lead-in characters to match is critical. Otherwise, ambiguities like the above could exist. Regards, Joe
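
As an alternative to a fixed character count, the separator idea mentioned above can be sketched in Python. This is only an illustration under the assumption that a space or an underscore always marks the end of the prefix; `prefix_of` is a hypothetical helper, not something from the thread.

```python
import re


def prefix_of(filename):
    """Return the part of the name before the first space or underscore.

    Assumption: a space or underscore always separates the prefix from
    the descriptive suffix. Names without a separator are treated as
    being all prefix.
    """
    stem = filename[:-4] if filename.lower().endswith(".pdf") else filename
    return re.split(r"[ _]", stem, maxsplit=1)[0]


for name in ["K11200235 Support.pdf", "12-1234 Support.pdf",
             "RD9876_DCE.pdf", "RD9876DCE.pdf"]:
    print(name, "->", prefix_of(name))
```

Note that `RD9876DCE.pdf` and `RD9876DRF.pdf` would each keep their full name as the prefix under this rule, so they would not be merged. Without a separator, only an explicit lead-in length can resolve that case.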
lynmke

ASKER

Yes Joe, declaring the number of lead-in characters is critical. In this case, only these prefixes matter:

K11200235

12-1234

RD9876

I am also running a different script that I found. It's almost running, but I'm getting an error in line 44 -> Type mismatch: 'UBound' (see below).

Set fso = CreateObject("Scripting.FileSystemObject")

sFolder = "C:\test\"
dFolder = "C:\test\final"
Set oFolder = fso.GetFolder(sFolder)
Dim file_group

'Sort the list in the Array name.
'listArray = SortedFiles(oFolder)
'listArray = SortedFiles(sFolder)
file_names = SortedFiles(sFolder)

'msgbox "file_names : " & file_names(1)

'listArray = Quicksort(file_names, 1, oFolder.Files.Count)
listArray = Quick_Sort(file_names, 1, oFolder.Files.Count - 1)

'msgbox "testa " & listArray(0) & " testb " & listArray(1)

f_filename = ""
l_filename = ""
'file_group(0) = ""
'msgbox uBound(listArray)
For x = 0 To UBound(listArray)
    f_filename = listArray(x)
    i = x + 1
    MsgBox "listArray " & listArray(i)
    Do While InStr(1, listArray(i), f_filename, 1) > 0
        ReDim Preserve file_group(i)
        file_group(i) = listArray(i)
        i = i + 1
        MsgBox "Step1"
    Loop
    x = i
    MergePDFFiles (file_group)

    ReDim file_group(0)

Next
MsgBox "Done"

Function MergePDFFiles(ByRef pdf_files)
    bFirstDoc = True
    recs = UBound(pdf_files)
    If recs < 2 Then
        'If oFolder.Files.Count < 2 Then
        ' MsgBox "needed 2 pdf."
        Set oMainDoc = CreateObject("AcroExch.PDDoc")
        oMainDoc.Open sFolder & "\" & f_filename & ".pdf" 'oFile.Name
        oMainDoc.Save 1, dFolder & f_filename & ".pdf"
        oMainDoc.Close
        Exit Function
    End If
    'For Each oFile In oFolder.Files
    For i = 0 To UBound(pdf_files)
        MsgBox "MergePDFFiles"
        If bFirstDoc Then
            bFirstDoc = False
            Set oMainDoc = CreateObject("AcroExch.PDDoc")
            oMainDoc.Open sFolder & "\" & f_filename & ".pdf" 'oFile.Name
        Else
            Set oTempDoc = CreateObject("AcroExch.PDDoc")
            oTempDoc.Open sFolder & "\" & pdf_files(i) & ".pdf"
            oMainDoc.InsertPages oMainDoc.GetNumPages - 1, oTempDoc, 0, oTempDoc.GetNumPages, False
            oTempDoc.Close
        End If
    Next

    oMainDoc.Save 1, dFolder & f_filename & ".pdf"
    oMainDoc.Close
    oTempDoc.Close
    'MsgBox "ok"

End Function

' Return an array containing the names of the
' files in the directory sorted alphabetically.
Function SortedFiles(dir_path)
    Dim file_names
    Set fso = CreateObject("Scripting.FileSystemObject")

    ' Get the FSO Folder (directory) object.
    Set fso_folder = fso.GetFolder(dir_path)

    ' Make the list of names.
    ReDim file_names(fso_folder.Files.Count)
    'msgbox "filecount " & fso_folder.Files.Count
    i = 0
    For Each fso_file In fso_folder.Files
        'MsgBox "SortFiles"
        file_names(i) = Mid(fso_file.Name, 1, Len(fso_file.Name) - 4) 'File name minus the extension.
        i = i + 1
        ntemp = file_names(i)
        'MsgBox i & " " & ntemp
    Next 'fso_file

    ' Sort the list of files.
    'Quick_sort file_names, 1, fso_folder.Files.Count

    ' Return the sorted list.
    SortedFiles = file_names

End Function

Function Quick_Sort(ByRef SortArray, ByRef First, ByRef Last)
    'Dim Low As Long, High As Long
    'Dim Temp As Variant, List_Separator As Variant
    Dim List_Separator
    Low = First
    High = Last
    'msgbox "QuickSorta " & SortArray(0) & "QuickSortb " & SortArray(1)
    List_Separator = SortArray((First + Last) / 2)
    Do
        Do While (SortArray(Low) < List_Separator)
            Low = Low + 1
        Loop
        Do While (SortArray(High) > List_Separator)
            High = High - 1
        Loop
        If (Low <= High) Then
            Temp = SortArray(Low)
            SortArray(Low) = SortArray(High)
            SortArray(High) = Temp
            Low = Low + 1
            High = High - 1
        End If
    Loop While (Low <= High)
    If (First < High) Then Quick_Sort SortArray, First, High
    If (Low < Last) Then Quick_Sort SortArray, Low, Last

    'msgbox "ArrayCount: " & UBound(SortArray)
    'For i = 0 To UBound(SortArray)
    ' msgbox "fortest: " & SortArray(i)
    'Next

    'Return the sorted list
    Quick_Sort = SortArray

End Function
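
For readers following the script above, here is a small Python sketch of the pass it appears to be attempting: sort the extension-less names, then collect each run of names that begin with the current base name. It is an illustration, not a fix for the VBScript, and `group_sorted_run` is a hypothetical name. It also assumes that the bare file (for example `12345-12`) exists and therefore sorts ahead of its suffixed companions.

One plausible reading of the "Type mismatch: 'UBound'" error is that `file_group` is declared with a plain `Dim` and only becomes an array inside the `Do While` loop, so whenever that loop never runs, `MergePDFFiles` receives a value that is not an array. That is an inference from the listing and is not confirmed anywhere in the thread. In the sketch, every group is created as a list from the start, so the equivalent situation cannot arise.

```python
def group_sorted_run(stems):
    """Group sorted, extension-less names into runs sharing a base name.

    Assumption: the shortest name of each family (the base) exists, so
    after sorting it appears immediately before its longer variants.
    """
    ordered = sorted(stems)
    groups = []
    i = 0
    while i < len(ordered):
        base = ordered[i]
        group = [base]  # always a list, even for a single file
        j = i + 1
        # Collect the following names that start with the base name.
        while j < len(ordered) and ordered[j].startswith(base):
            group.append(ordered[j])
            j += 1
        groups.append(group)
        i = j  # continue after the run just collected
    return groups


# Hypothetical sample taken from the names in the original question.
stems = ["12345-12 Support", "K11200235", "12345-12",
         "12009-23", "K11200235 Support", "12345-12 Additional"]
groups = group_sorted_run(stems)
for group in groups:
    print(group)
```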
Joe Winograd

I'm thinking of a more general program that would work for lots of folks with a similar, if not identical, situation. In your case, you're saying that

12-1234

is the prefix (although I suspect that's a typo and you really meant to say 12-12345). Someone else might think the prefix is

12

Thus, you would say that the number of characters needed to match in the case above is 7 (or 8, depending on the typo), while someone else might say it's 2.

Sorry, can't help with the VB Script...outside of my expertise. To get more of the right experts in the mix, you may want to change your Topics to:

VB Script
Visual Basic Classic
Visual Basic.NET

Regards, Joe
lynmke

ASKER

Thanks Joe,

I hope someone with VB knowledge can help check the code. Somehow I need to add a condition that picks out the leading characters of the .pdf filename.
In this case, only this prefixes matter.
K11200235
12-1234
RD9876
Joe Winograd

Are you saying that these three are the only prefixes you have?
lynmke

ASKER

Hi Joe,

No, the "K11200235" filename can also be just numbers "11200235".

I figured that I could use something like: If fileName.Contains(phrase) Then Return True, for each filename format, before the first UBound loop in the script above. But how should I frame this condition?

Thanks!
Joe Winograd

I haven't heard back from you on my previous question, but if those are the only three prefixes that matter, I have a 3-line solution for you. The PDF Toolkit (PDFtk) is an excellent (free!) product that I've been using for many years. It has numerous features to manipulate PDFs and comes in both command line and GUI versions. The command line version is called PDFtk Server and may be downloaded here:
http://www.pdflabs.com/tools/pdftk-server/

Don't be misled by "Server" in the name. I don't know why they called it that, but it's just an executable (pdftk.exe, with a supporting DLL, libiconv2.dll) that runs on XP, Vista, W7, and W8 (it does not have to run on a "server" OS...it also runs on Mac, but I've never used it on that).

Here's the 3-line solution for you using the PDFtk command line:

pdftk D:\FolderIn\K11200235*.pdf cat output D:\FolderOut\K11200235.pdf
pdftk D:\FolderIn\12-1234*.pdf cat output D:\FolderOut\12-1234.pdf
pdftk D:\FolderIn\RD9876*.pdf cat output D:\FolderOut\RD9876.pdf

The "cat" operation "catenates" (joins/merges/combines) the input files into the output file. So those three lines will do it for you if those are the only three prefixes. If there are others, you can add a line for each one with the appropriate wildcarded file name for the input files (prefix*.pdf) and the appropriate file name for the combined output file (prefix.pdf). If that works for you, great. However, I was thinking of a more general solution that would help lots of other folks with a similar problem, but where it's not an easy task to specify all of the possible input prefixes.
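
For cases where listing every prefix by hand is impractical, the same `cat output` command line could be generated per prefix. The Python sketch below only builds the argument list and does not run anything. `pdftk_command` is a hypothetical helper, the folder names are the ones used in the example above, and explicit input file names are passed instead of a wildcard.

```python
from pathlib import PureWindowsPath


def pdftk_command(prefix, input_names, folder_in, folder_out):
    """Build a 'pdftk <inputs> cat output <prefix>.pdf' argument list.

    Nothing is executed here; the list could later be handed to
    subprocess.run, assuming pdftk is installed and on the PATH.
    """
    inputs = [str(PureWindowsPath(folder_in) / name) for name in input_names]
    output = str(PureWindowsPath(folder_out) / (prefix + ".pdf"))
    return ["pdftk", *inputs, "cat", "output", output]


cmd = pdftk_command(
    "RD9876",
    ["RD9876_DCE.pdf", "RD9876_GRF.pdf"],
    r"D:\FolderIn",
    r"D:\FolderOut",
)
print(" ".join(cmd))
```

Passing the inputs explicitly also fixes the page order of the merged file, which a wildcard leaves to the order in which the names are expanded.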

Btw, if you'd like to see the full syntax for the PDFtk command line and some usage examples, here are the links:
http://www.pdflabs.com/docs/pdftk-man-page/
http://www.pdflabs.com/docs/pdftk-cli-examples/

If PDFtk doesn't work for you, then I hope a VB expert comes along soon. :)  Regards, Joe
Joe Winograd

It took me a while to write that last response and I had a browser tab open with the question, so I didn't see your reply until after I hit submit. Anyway, you could certainly add this line:

pdftk D:\FolderIn\11200235*.pdf cat output D:\FolderOut\11200235.pdf

Or are you saying that the <11200235*.pdf> files should wind up in the same combined file as the <K11200235*.pdf> files? If so, should the combined file be called <K11200235.pdf> or <11200235.pdf>?
lynmke

ASKER

Well, I have to use something like Left(filename, 8), so that the condition compares the first 8 or 9 characters without naming specific prefixes. I have 5,000 PDF files with different number and character format combinations.

K11200235.pdf is a different file from 11200235.pdf, so these two will stay separate. However, the files

K11200235 Support.pdf
K11200235 Additional.pdf
K11200235.pdf


will be merged as K11200235.pdf, and so on.

P.S. Thanks. I'll be offline shortly; it's 1300 hrs Eastern.
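
The requirement as stated at this point (suffixed files merge into the bare prefix name, `K11200235` and `11200235` stay separate, and a lone file is simply copied) can be sketched as a planning step in Python. This is an illustration only: `plan_outputs` is a hypothetical name, the space/underscore separator rule is an assumption, and no PDF is opened or written.

```python
import re
from collections import defaultdict


def plan_outputs(filenames):
    """Map each output name to an action and an ordered list of inputs.

    Assumption: the prefix ends at the first space or underscore.
    The bare "<prefix>.pdf" file, if present, is placed first so that
    it becomes the start of the merged document.
    """
    groups = defaultdict(list)
    for name in filenames:
        stem = name[:-4] if name.lower().endswith(".pdf") else name
        prefix = re.split(r"[ _]", stem, maxsplit=1)[0]
        groups[prefix].append(name)

    plan = {}
    for prefix, names in groups.items():
        base = prefix + ".pdf"
        # False sorts before True, so the base file comes first.
        ordered = sorted(
            names, key=lambda n: (n.lower() != base.lower(), n.lower())
        )
        action = "copy" if len(ordered) == 1 else "merge"
        plan[base] = (action, ordered)
    return plan


plan = plan_outputs([
    "K11200235 Support.pdf", "K11200235 Additional.pdf",
    "K11200235.pdf", "11200235.pdf", "12009-23.pdf",
])
for output, (action, inputs) in plan.items():
    print(action, inputs, "->", output)
```

Keeping the plan separate from the merge itself means the grouping can be checked against all 5,000 names before any tool (Acrobat, PDFtk, or another) is asked to write files.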
lynmke

ASKER

Any ideas?
ASKER CERTIFIED SOLUTION
Joe Winograd

This solution is only available to members.
lynmke

ASKER

Thanks Joe!
Joe Winograd

You're welcome! Good luck with the project. Regards, Joe
bobmarish

Read the blog post at the following link for an easy way to split or merge PDF files as required: http://pdfutility.blogspot.com/2013/11/manage-your-large-sized-pdf-documents.html
lynmke

ASKER

Thanks bobmarish