Solved

Merge PDF files

Posted on 2014-01-13
Last Modified: 2014-02-20
My computer has Adobe Acrobat XI Standard.

I have over 5,000 files in a folder and I want to merge the files that share the same prefix in the filename. For example:

12345-12.pdf
12345-12 Additional.pdf
12345-12 Support.pdf

K11200235.pdf
K11200235 Support.pdf


12009-23.pdf


The results: the first 3 would get merged into 1 file:

12345-12.pdf

The next 2 would get merged into 1 file:

K11200235.pdf

The last file would get copied or merged by itself -> 12009-23.pdf

and they would be in a destination folder other than the folder with the 5,000 .pdf files.

I found some code, but I don't know why it's not working. I haven't scripted before, but I have run a macro or two. Please help.


Sub MergeFiles()
    Set fso = CreateObject("Scripting.FileSystemObject")
    sFolder = "C:\test\"    ' already ends with a backslash, so none is added below
    Set oFolder = fso.GetFolder(sFolder)

    bFirstDoc = True

    If oFolder.Files.Count < 2 Then
        MsgBox "You need to have at least two PDF files in the same folder to merge."
        ' The Files collection has no Name property; copy each file by its own path.
        ' The Results subfolder must already exist.
        For Each oFile In oFolder.Files
            fso.CopyFile oFile.Path, sFolder & "Results\"
        Next
        Exit Sub
    End If

    Set AcroApp = CreateObject("AcroExch.App")
    Set oMainDoc = CreateObject("AcroExch.PDDoc")
    Set oTempDoc = CreateObject("AcroExch.PDDoc")

    For Each oFile In oFolder.Files
        ' The extension is the last 4 characters of the name, not the first 8.
        If LCase(Right(oFile.Name, 4)) = ".pdf" Then

            If bFirstDoc Then
                bFirstDoc = False
                ' The loop variable is oFile (oFiles was never defined).
                oMainDoc.Open sFolder & oFile.Name
            Else
                oTempDoc.Open sFolder & oFile.Name
                ' Append every page of oTempDoc after the last page of oMainDoc.
                oMainDoc.InsertPages oMainDoc.GetNumPages - 1, oTempDoc, 0, oTempDoc.GetNumPages, False
                oTempDoc.Close
            End If

        End If
    Next

    oMainDoc.Save 1, sFolder & "Output.pdf"    ' 1 = full save (PDSaveFull)
    oMainDoc.Close
    MsgBox "Done! See Output.pdf file."

    ' Use the variable that holds the application object.
    AcroApp.Exit
    Set AcroApp = Nothing
    Set oMainDoc = Nothing
    Set oTempDoc = Nothing

End Sub
Question by:lynmke
19 Comments
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 39778280
I can't help you with that script, but I may be able to modify an article and program that I wrote here at EE called How To Combine-Merge-Append a Large Batch of TIFF Files. That program combines TIFF files based on the file prefixes, just as you're looking for, utilizing a program called IrfanView, with its "/multitif" option. It so happens that a recent release of IrfanView (Version 4.36, released 27-Jun-2013) introduced a new option called "/multipdf", which performs the same function as "/multitif", but with PDF files (latest release of IrfanView is 4.37, released 16-Dec-2013). Let me know if this approach interests you and I'll start looking into it. If it doesn't interest you, I'm sure some other expert will jump in to help with your code, but VB is not an area of my expertise. Regards, Joe
 

Author Comment

by:lynmke
ID: 39779026
Thanks Joe. Apparently the script above should work (at least after tweaking), since I have Acrobat Standard.
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 39779367
OK, but one question for you. I notice that the number of lead-in characters varies in your files. For example, these have 8 lead-in characters:

12345-12.pdf
12345-12 Additional.pdf
12345-12 Support.pdf

12009-23.pdf

But these have 9 lead-in characters:

K11200235.pdf
K11200235 Support.pdf

Now, consider these two files:

K1120023.pdf
K11200235.pdf

With an 8-character lead-in, these two files would be combined into the same file; with a 9-character lead-in, they would come out of the process as two separate files. How would you handle these two files? Thanks, Joe
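
The effect of the lead-in length can be checked with a short sketch. Python is used here purely for illustration (the scripts in this thread are VB), and `group_by_lead_in` is a hypothetical helper, not part of any script posted here. It groups names by their first n characters, the same idea as Left(name, n) in VB.

```python
from collections import defaultdict

def group_by_lead_in(file_names, n):
    """Group file names by their first n characters (like Left(name, n) in VB)."""
    groups = defaultdict(list)
    for name in file_names:
        groups[name[:n]].append(name)
    return dict(groups)

files = ["K1120023.pdf", "K11200235.pdf"]

# 8 lead-in characters: both names start with "K1120023", so one group.
print(group_by_lead_in(files, 8))

# 9 lead-in characters: "K1120023." and "K11200235" differ, so two groups.
print(group_by_lead_in(files, 9))
```

Note that with 9 characters the shorter name's key includes the dot of ".pdf", which is one more reason a fixed length is fragile when the prefixes vary in length.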
 

Author Comment

by:lynmke
ID: 39779447
Hi Joe,

Yes, if the files are as above then they would be separate files.

However, I double-checked and I have only 3 types of formats to merge, as below (there is always a space between the numbers and the text in the first two examples):

K11200235 Support.pdf
K11200235 Additional.pdf

or

12-1234 Support.pdf
13-2311 Additional.pdf


or

RD9876_DCE.pdf
RD9876_GRF.pdf

Sorry for the confusion.
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 39779539
Here's the problem. Suppose you have:

RD9876_DCE.pdf
RD9876_DRF.pdf

Do these get merged into a single file called <RD9876_D.pdf>, or a single file called <RD9876_.pdf>, or a single file called <RD9876.pdf>, etc., or do they come out of the process as <RD9876_DCE.pdf> and <RD9876_DRF.pdf>?

You could say that an underscore is a separator, which would resolve the example above, but how about this  example:

RD9876DCE.pdf
RD9876DRF.pdf

Do these get merged into a single file called <RD9876D.pdf>, or a single file called <RD9876.pdf>, or a single file called <RD987.pdf>, or a single file called <RD98.pdf>, etc., or do they come out of the process as <RD9876DCE.pdf> and <RD9876DRF.pdf>?

Seems to me that declaring the number of lead-in characters to match is critical. Otherwise, ambiguities like the above could exist. Regards, Joe
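
One way to make the separator idea concrete is the sketch below. It is in Python for illustration only, and it rests on an assumption that nobody in the thread has confirmed as the final rule: a space or an underscore ends the prefix, and nothing else does. `prefix_of` and `group_by_prefix` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumption: only a space or an underscore marks the end of the prefix.
SEPARATORS = re.compile(r"[ _]")

def prefix_of(file_name):
    """Return the part of the name before the first separator, without '.pdf'."""
    stem = file_name[:-4] if file_name.lower().endswith(".pdf") else file_name
    return SEPARATORS.split(stem, maxsplit=1)[0]

def group_by_prefix(file_names):
    """Map each prefix to the list of file names that carry it."""
    groups = defaultdict(list)
    for name in file_names:
        groups[prefix_of(name)].append(name)
    return dict(groups)

files = [
    "K11200235.pdf", "K11200235 Support.pdf",
    "RD9876_DCE.pdf", "RD9876_DRF.pdf",
    "RD9876DCE.pdf", "RD9876DRF.pdf",
]
for prefix, members in group_by_prefix(files).items():
    print(prefix, "<-", members)
```

Under that rule the underscore files merge as RD9876, while RD9876DCE.pdf and RD9876DRF.pdf have no separator and each stay on their own. The ambiguity does not go away; the rule simply picks one answer.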
 

Author Comment

by:lynmke
ID: 39779611
Yes Joe, declaring the number of lead-in characters is critical. In this case, only these prefixes matter:

K11200235

12-1234

RD9876

I am also running a different script that I found. It's almost running, but I'm getting an error in line 44 -> Type mismatch: 'UBound' (see below).

Set fso = CreateObject("Scripting.FileSystemObject")

sFolder = "C:\test\"
dFolder = "C:\test\final"
Set oFolder = fso.GetFolder(sFolder)
Dim file_group

'Sort the list in the Array name.
'listArray = SortedFiles(oFolder)
'listArray = SortedFiles(sFolder)
file_names = SortedFiles(sFolder)

'msgbox "file_names : " & file_names(1)

'listArray = Quicksort(file_names, 1, oFolder.Files.Count)
listArray = Quick_Sort(file_names, 1, oFolder.Files.Count - 1)

'msgbox "testa " & listArray(0) & " testb " & listArray(1)

f_filename = ""
l_filename = ""
'file_group(0) = ""
'msgbox uBound(listArray)
For x = 0 To UBound(listArray)
    f_filename = listArray(x)
    i = x + 1
    MsgBox "listArray " & listArray(i)
    Do While InStr(1, listArray(i), f_filename, 1) > 0
        ReDim Preserve file_group(i)
        file_group(i) = listArray(i)
        i = i + 1
        MsgBox "Step1"
    Loop
    x = i
    MergePDFFiles (file_group)

    ReDim file_group(0)

Next
MsgBox "Done"

Function MergePDFFiles(ByRef pdf_files)
    bFirstDoc = True
    recs = UBound(pdf_files)
    If recs < 2 Then
        'If oFolder.Files.Count < 2 Then
        ' MsgBox "needed 2 pdf."
        Set oMainDoc = CreateObject("AcroExch.PDDoc")
        oMainDoc.Open sFolder & "\" & f_filename & ".pdf" 'oFile.Name
        oMainDoc.Save 1, dFolder & f_filename & ".pdf"
        oMainDoc.Close
        Exit Function
    End If
    'For Each oFile In oFolder.Files
    For i = 0 To UBound(pdf_files)
        MsgBox "MergePDFFiles"
        If bFirstDoc Then
            bFirstDoc = False
            Set oMainDoc = CreateObject("AcroExch.PDDoc")
            oMainDoc.Open sFolder & "\" & f_filename & ".pdf" 'oFile.Name
        Else
            Set oTempDoc = CreateObject("AcroExch.PDDoc")
            oTempDoc.Open sFolder & "\" & pdf_files(i) & ".pdf"
            oMainDoc.InsertPages oMainDoc.GetNumPages - 1, oTempDoc, 0, oTempDoc.GetNumPages, False
            oTempDoc.Close
        End If
    Next

    oMainDoc.Save 1, dFolder & f_filename & ".pdf"
    oMainDoc.Close
    oTempDoc.Close
    'MsgBox "ok"

End Function

' Return an array containing the names of the
' files in the directory sorted alphabetically.
Function SortedFiles(dir_path)
    Dim file_names
    Set fso = CreateObject("Scripting.FileSystemObject")

    ' Get the FSO Folder (directory) object.
    Set fso_folder = fso.GetFolder(dir_path)

    ' Make the list of names.
    ReDim file_names(fso_folder.Files.Count)
    'msgbox "filecount " & fso_folder.Files.Count
    i = 0
    For Each fso_file In fso_folder.Files
        'MsgBox "SortFiles"
        file_names(i) = Mid(fso_file.Name, 1, Len(fso_file.Name) - 4) 'File name minus the extension.
        i = i + 1
        ntemp = file_names(i)
        'MsgBox i & " " & ntemp
    Next 'fso_file

    ' Sort the list of files.
    'Quick_sort file_names, 1, fso_folder.Files.Count

    ' Return the sorted list.
    SortedFiles = file_names

End Function

Function Quick_Sort(ByRef SortArray, ByRef First, ByRef Last)
    'Dim Low As Long, High As Long
    'Dim Temp As Variant, List_Separator As Variant
    Dim List_Separator
    Low = First
    High = Last
    'msgbox "QuickSorta " & SortArray(0) & "QuickSortb " & SortArray(1)
    List_Separator = SortArray((First + Last) / 2)
    Do
        Do While (SortArray(Low) < List_Separator)
            Low = Low + 1
        Loop
        Do While (SortArray(High) > List_Separator)
            High = High - 1
        Loop
        If (Low <= High) Then
            Temp = SortArray(Low)
            SortArray(Low) = SortArray(High)
            SortArray(High) = Temp
            Low = Low + 1
            High = High - 1
        End If
    Loop While (Low <= High)
    If (First < High) Then Quick_Sort SortArray, First, High
    If (Low < Last) Then Quick_Sort SortArray, Low, Last

    'msgbox "ArrayCount: " & UBound(SortArray)
    'For i = 0 To UBound(SortArray)
    ' msgbox "fortest: " & SortArray(i)
    'Next

    'Return the sorted list
    Quick_Sort = SortArray

End Function
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 39779673
I'm thinking of a more general program that would work for lots of folks with a similar, if not identical, situation. In your case, you're saying that

12-1234

is the prefix (although I suspect that's a typo and you really meant to say 12-12345). Someone else might think the prefix is

12

Thus, you would say that the number of characters needed to match in the case above is 7 (or 8, depending on the typo), while someone else might say it's 2.

Sorry, can't help with the VB Script...outside of my expertise. To get more of the right experts in the mix, you may want to change your Topics to:

VB Script
Visual Basic Classic
Visual Basic.NET

Regards, Joe
 

Author Comment

by:lynmke
ID: 39790213
Thanks Joe,

I hope someone with VB knowledge can help check the code. Somehow I need to add a condition that selects the first characters of the .pdf filename.
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 39790225
You wrote: "In this case, only these prefixes matter:
K11200235
12-1234
RD9876"
Are you saying that these three are the only prefixes you have?

Author Comment

by:lynmke
ID: 39790239
Hi Joe,

No, the "K11200235" filename can also be just numbers "11200235".

I figured I could use something like "If fileName.Contains(phrase) Then Return True" for each filename format before the first UBound loop in the script above, but how should I frame this condition?

Thanks!
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 39790241
I haven't heard back from you on my previous question, but if those are the only three prefixes that matter, I have a 3-line solution for you. The PDF Toolkit (PDFtk) is an excellent (free!) product that I've been using for many years. It has numerous features to manipulate PDFs and comes in both command line and GUI versions. The command line version is called PDFtk Server and may be downloaded here:
http://www.pdflabs.com/tools/pdftk-server/

Don't be misled by "Server" in the name. I don't know why they called it that, but it's just an executable (pdftk.exe, with a supporting DLL, libiconv2.dll) that runs on XP, Vista, W7, and W8 (it does not have to run on a "server" OS...it also runs on Mac, but I've never used it on that).

Here's the 3-line solution for you using the PDFtk command line:

pdftk D:\FolderIn\K11200235*.pdf cat output D:\FolderOut\K11200235.pdf
pdftk D:\FolderIn\12-1234*.pdf cat output D:\FolderOut\12-1234.pdf
pdftk D:\FolderIn\RD9876*.pdf cat output D:\FolderOut\RD9876.pdf

The "cat" operation "catenates" (joins/merges/combines) the input files into the output file. So those three lines will do it for you if those are the only three prefixes. If there are others, you can add a line for each one with the appropriate wildcarded file name for the input files (prefix*.pdf) and the appropriate file name for the combined output file (prefix.pdf). If that works for you, great. However, I was thinking of a more general solution that would help lots of other folks with a similar problem, but where it's not an easy task to specify all of the possible input prefixes.

Btw, if you'd like to see the full syntax for the PDFtk command line and some usage examples, here are the links:
http://www.pdflabs.com/docs/pdftk-man-page/
http://www.pdflabs.com/docs/pdftk-cli-examples/

If PDFtk doesn't work for you, then I hope a VB expert comes along soon. :)  Regards, Joe
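
For more than a handful of prefixes, the same command lines could be generated rather than typed. The sketch below is Python for illustration only. `pdftk_command` and `merge_all` are hypothetical helpers, the folder names are the example ones from the commands above, and the list of prefixes still has to come from somewhere. It also assumes pdftk expands the `*` wildcard itself, as the command lines above rely on.

```python
import subprocess
from pathlib import PureWindowsPath

def pdftk_command(prefix, folder_in, folder_out):
    """Argument list for one 'cat' merge, mirroring the command lines above."""
    return [
        "pdftk",
        str(PureWindowsPath(folder_in) / (prefix + "*.pdf")),
        "cat", "output",
        str(PureWindowsPath(folder_out) / (prefix + ".pdf")),
    ]

def merge_all(prefixes, folder_in, folder_out, dry_run=True):
    """Print each command (dry run) or actually run pdftk for each prefix."""
    for prefix in prefixes:
        cmd = pdftk_command(prefix, folder_in, folder_out)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

# Dry run: only prints the three command lines, nothing is merged.
merge_all(["K11200235", "12-1234", "RD9876"], r"D:\FolderIn", r"D:\FolderOut")
```

With `dry_run=False` the commands would be executed, which requires pdftk to be installed and on the PATH. Keep in mind that a wildcard such as K11200235*.pdf also matches any longer name that begins with the same characters.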
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 39790244
It took me a while to write that last response and I had a browser tab open with the question, so I didn't see your reply until after I hit submit. Anyway, you could certainly add this line:

pdftk D:\FolderIn\11200235*.pdf cat output D:\FolderOut\11200235.pdf

Or are you saying that the <11200235*.pdf> files should wind up in the same combined file as the <K11200235*.pdf> files? If so, should the combined file be called <K11200235.pdf> or <11200235.pdf>?
 

Author Comment

by:lynmke
ID: 39790275
Well, I have to use something like Left(filename, 8), so that the condition compares the first 8 or 9 characters without naming specific prefixes. I have 5,000 PDF files with different number and character format combinations.

K11200235.pdf is a different file from 11200235.pdf, so these two will stay separate. However, the files

K11200235 Support.pdf
K11200235 Additional.pdf
K11200235.pdf


will be merged as K11200235.pdf, and so on.

P.S. Thanks. I'll be offline shortly; it's 1300 hrs Eastern.
 

Author Comment

by:lynmke
ID: 39790714
Any ideas?
 
LVL 51

Accepted Solution

by:
Joe Winograd, EE MVE earned 500 total points
ID: 39791161
Let's forget about programming language for a moment and define the exact specifications. You can't fix the VB code (or code in any language) if you don't know what you're trying to fix it to do!

Let's take your most recent example:

K11200235 Support.pdf
K11200235 Additional.pdf
K11200235.pdf

will be merged into:

K11200235.pdf

Since you have 5,000 PDF files with different character/number format combinations, it's possible that you could have these files:

K112002356 Support.pdf
K112002356 Additional.pdf
K112002356.pdf

I presume these would get merged into:

K112002356.pdf

But, since they share the same 9 lead-in characters with K11200235, you'd have to be careful about not merging them into:

K11200235.pdf

Are you willing to ignore this problem? If not, it's very tricky; if so, here's a possible description for the exact specifications of the program:

(1) Sort the file names alphabetically ascending, but giving special treatment to the dot/period ("."), because it sorts after space, comma, hyphen, and others, but before digits, at-sign, and others. In other words, the sorted list should be:

K112002356.pdf
K112002356 Additional.pdf
K112002356 Support.pdf

But if you sort ascending the full file name, including file type, it would be this (since space sorts before dot):

K112002356 Additional.pdf
K112002356 Support.pdf
K112002356.pdf

(2) Loop through the alphabetical list looking at all of the characters right before the ".pdf" (call this Prefix) and count the number of characters (call this Prefix_Count). For example, in the file:

K11200235.pdf

The Prefix is K11200235 and the Prefix_Count is 9.

(3) Each time the program looks at a file in the list, it compares the Prefix of the current file with the Prefix of the previous file. Since they are sorted alphabetically, a match on Prefix_Count characters means the current file should be merged with the previous file, and so on, until it finds a Prefix that doesn't match, which then starts a new file. The glitch, of course, is that Prefix K11200235 would match Prefix K112002356 for Prefix_Count (9) characters, which is why I asked if you're willing to ignore this problem. The only way I can think of to solve this nasty problem is to handle the longer Prefix_Count files first. In other words, if you process K112002356 (Prefix_Count of 10) before K11200235, it will work. That's what I meant by tricky – you'd have to sort the file list by Prefix alphabetically ascending and then sort by Prefix_Count descending within that.
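
Steps (1) to (3) can be sketched as follows, in Python for illustration only; `stem` and `group_sorted` are hypothetical names. The sketch sorts on the name without ".pdf", which is one simple way to get the order described in step (1), and it deliberately does not attempt the longer-prefix-first fix, so it reproduces the glitch described in step (3).

```python
def stem(file_name):
    """File name without its '.pdf' extension."""
    return file_name[:-4] if file_name.lower().endswith(".pdf") else file_name

def group_sorted(file_names):
    """Sort by stem, then start a new group whenever the current stem
    does not begin with the Prefix of the group being built."""
    groups = []  # list of (Prefix, [member file names])
    for name in sorted(file_names, key=stem):
        if groups and stem(name).startswith(groups[-1][0]):
            groups[-1][1].append(name)
        else:
            groups.append((stem(name), [name]))
    return groups

demo = ["K112002356 Support.pdf", "K112002356.pdf", "K112002356 Additional.pdf"]

print(sorted(demo))            # full names: the bare K112002356.pdf sorts last
print(sorted(demo, key=stem))  # by stem: the bare K112002356.pdf sorts first
print(group_sorted(demo))      # one group with Prefix K112002356

# The glitch: K112002356 begins with K11200235, so everything lands in one group.
glitch = demo + ["K11200235.pdf", "K11200235 Support.pdf"]
print(group_sorted(glitch))
```

The last call puts all five files under the Prefix K11200235, which is exactly the wrong merge warned about above.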

Regards, Joe
 

Author Closing Comment

by:lynmke
ID: 39824008
Thanks Joe!
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
ID: 39824084
You're welcome! Good luck with the project. Regards, Joe
 

Expert Comment

by:bobmarish
ID: 39837925
Read the blog post at the following link for an easy way to split and merge PDF files as needed: http://pdfutility.blogspot.com/2013/11/manage-your-large-sized-pdf-documents.html
 

Author Comment

by:lynmke
ID: 39875787
Thanks bobmarish