How do I merge hundreds of thousands of TIFFs into multiple PDFs?

Folks,

I have hundreds of thousands of TIFFs that I need to merge into multiple PDFs (i.e. lots of scanned pages that have to be merged to recreate their original documents). I have an Excel spreadsheet listing the TIFFs' names and locations and indicating which PDF file each belongs in (e.g. TIFFs 1-10 are the 1st PDF, 11-75 the 2nd, etc.).

I was happily running a VBA macro using PDFCreator to do this. Unfortunately, PDFCreator appears somewhat temperamental...
 - after much mucking about, I'm still not absolutely confident that pages will maintain their original sequence.
 - for no reason I could ever identify, PDFCreator started producing huge PDFs with all pages in landscape. Each time this happened I had to uninstall and reinstall it.

I tried using ABBYY (FineReader 8 Pro). It's good for merging all the TIFFs in a folder into a single PDF, but I would have to select each folder manually, and the volumes are simply too great for this.

I then experimented with PDFTK (having converted the individual TIFFs into single-page PDFs). The problem was that there were too many files to specify on the command line, and using wildcards doesn't guarantee that the page sequence will be correct.

So ...
(1) Is there a bullet-proof way to safely control PDFCreator from VBA?
(2) Is there a better/safer alternative using any mixture of the following...
  - Acrobat 8 Standard.
  - ABBYY FineReader 8 Professional Edition.
  - PDFTK.
  - Excel 2007.
  - Windows Scripting.
  - I could probably get access to OmniPage (a recent full version, but I don't know the number). Not my preferred solution, as I don't have a license, so I'd have to use a colleague's PC after hours.

I'm running XP SP2. The TIFFs are currently in a small number of humongous folders, but I'd have no problem moving them so that each document's TIFFs were in their own sub-folder.

Many Thanks,
Brian.
redmondbAsked:
 
Mechanic_KharkovCommented:
"The only issue would be in recognising that the run had completed (so I could start the batch again for the next group of files). This would require me to recognise that the progress dialogue's Fast Visible Text had changed to the following...
        Close
        Process completed
        The following errors occurred:
I've never used FVT before, so could you suggest, please, some AutoIt code that would detect this?"

I don't have Acrobat, but I created a small stub application to experiment with. It displays its command-line parameter and, after about 2.5 seconds, changes the text on the form. The script below runs that stub application.

The .au3 script is also included in the attached archive.

File WaitForDialogTextChange.zip (206 KB) uploaded
Your Download-Link #1:http://rapidshare.de/files/46780014/WaitForDialogTextChange.zip.html


; title/class of the stub application's main window
$Win = "[TITLE:Stub Application; CLASS:TMainForm]"

For $i = 1 To 3

	$Filename = "filename_#" & String($i) ; compose a fake file name

	; ShellExecute sets @error on failure; its success return value differs between AutoIt versions
	ShellExecute("StubAppWasteTime.exe", $Filename)
	If @error Then Exit

	; initial wait for the window to appear and become active
	WinWaitActive($Win)
	; any extra work with this window could go here
	; ...

	ToolTip("Start wait for text")

	; poll the window text until the desired text appears or the window disappears
	Do
		Sleep(100)
		$Text = WinGetText($Win, "")
	Until StringInStr($Text, "Process completed") <> 0 Or Not WinExists($Win)

	If Not WinExists($Win) Then Exit ; window not found

	ToolTip("") ; clear the tooltip

	;MsgBox(0, "Text read was:", $Text)

	; then click the OK button
	ControlClick($Win, "", "[CLASS:TButton; INSTANCE:2]")

	ToolTip("Start wait for window to close")

	; wait for the window to be destroyed
	While WinExists($Win)
		Sleep(100)
	WEnd

	ToolTip("")

Next

 
techhealthCommented:
I have no experience with ABBYY, but from what you describe it would be the best choice, since it works as expected when dealing with TIFFs in a single folder. All you then need is a script/VBA macro to put related TIFFs into separate folders. The script would read the Excel file, create the necessary folders, and put the related TIFFs into each one. You can then either invoke ABBYY (is it command-line capable?) on each folder from the same script to create the PDFs, or use a separate script for that to make debugging easier. The script can also handle any post-processing, e.g. moving the PDFs to some other location.
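
As a rough sketch of the folder-sorting step described above (not code from the thread), the following Python plans where each TIFF would go. It assumes the spreadsheet has been exported to (TIFF path, PDF name) pairs in page order; the function name `plan_moves`, the one-folder-per-PDF layout and the six-digit file names are all hypothetical choices. Zero-padded sequence numbers are used so that alphabetical order inside each folder matches the intended page order.

```python
from pathlib import PureWindowsPath


def plan_moves(rows, dest_root):
    """Plan one sub-folder per target PDF, keeping spreadsheet order.

    rows      -- iterable of (tiff_path, pdf_name) pairs, already in page order
    dest_root -- folder under which one sub-folder per PDF is planned
    Returns a list of (source_path, destination_path) string pairs.
    Nothing is moved here; the caller decides how to act on the plan.
    """
    counters = {}  # pages planned so far for each destination folder
    moves = []
    for tiff_path, pdf_name in rows:
        # The folder is named after the PDF without its extension.
        folder = PureWindowsPath(dest_root) / PureWindowsPath(pdf_name).stem
        counters[folder] = counters.get(folder, 0) + 1
        # Zero-padded name: sorting by name reproduces the page sequence.
        new_name = "%06d.tif" % counters[folder]
        moves.append((tiff_path, str(folder / new_name)))
    return moves


# Hypothetical sample rows standing in for the exported spreadsheet.
sample_rows = [
    (r"C:\scans\a17.tif", "doc1.pdf"),
    (r"C:\scans\a03.tif", "doc1.pdf"),
    (r"C:\scans\b01.tif", "doc2.pdf"),
]
for source, destination in plan_moves(sample_rows, r"D:\sorted"):
    print(source, "->", destination)
```

The plan is kept separate from the actual file operations so that it can be inspected (or written back to the spreadsheet) before anything is moved; the moves themselves could then be done with `os.makedirs` and `shutil.move`.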
 
redmondbAuthor Commented:
Thanks, techhealth, but I'm afraid (my version of) ABBYY doesn't have that kind of command-line processing.

Regards,
Brian
 
Mechanic_KharkovCommented:
AutoIt is an easy-to-learn scripting tool that can press buttons inside any application (even one with no automation support), enter text in dialog boxes, etc.
http://www.autoitscript.com/autoit3/index.shtml

It's easy to understand and nice to use, and it lets you quickly automate any of your favorite software. Just try it.
 
Karl Heinz KremerCommented:
Use iText, the library that was used to create pdftk. All you need is somebody who knows how to program in Java. There are enough examples available online that you can create an application that merges all the files.

Another option would be to run pdftk in batches: run it on a limited number of files (so that you can specify all of them on the command line). You will end up with a number of files that each have, let's say, 100 pages. In the second round you merge 100 of those files together, and then you add a third round to come up with the final document.
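
A minimal sketch of that batching idea in Python, under some assumptions: the pages are already single-page PDFs listed in the correct order, and the intermediate file names (`part_<round>_<n>.pdf`) and the function name are hypothetical. The function only builds the `pdftk ... cat output ...` argument lists; it does not run anything.

```python
def chunks(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_batched_merge(pages, output, batch_size=100):
    """Build pdftk argument lists that merge `pages` in rounds.

    Each round merges up to `batch_size` inputs into one intermediate
    file; rounds repeat until one command can produce `output`.
    Input order is preserved at every round.
    """
    if not pages:
        raise ValueError("no input files")
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    commands = []
    current = list(pages)
    round_no = 0
    while len(current) > batch_size:
        merged = []
        for n, group in enumerate(chunks(current, batch_size)):
            part = "part_%d_%04d.pdf" % (round_no, n)  # hypothetical name
            commands.append(["pdftk"] + group + ["cat", "output", part])
            merged.append(part)
        current = merged
        round_no += 1
    # Final round: everything left fits in a single command.
    commands.append(["pdftk"] + current + ["cat", "output", output])
    return commands


# Example: 250 hypothetical single-page PDFs merged 100 at a time.
example_pages = ["p%04d.pdf" % i for i in range(1, 251)]
for command in plan_batched_merge(example_pages, "doc.pdf"):
    print(command[0], "...", len(command) - 4, "inputs ->", command[-1])
```

Each argument list could then be passed to `subprocess.run(command, check=True)` in sequence, with the intermediate files deleted once the final document exists.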
 
techhealthCommented:
Ever checked out the SDK from Adobe?  I think that has some nice tools you can use, including command-line tools.  But I haven't looked at it for long so not sure.  Will try to find some more details...
 
Karl Heinz KremerCommented:
The SDK does not contain any tools that would be useful in this case. The SDK gives you the tools to create an application that you could use to merge these files, but without programming, it does not help the asker.
 
Mechanic_KharkovCommented:
If Windows scripting doesn't scare you, why not try AutoIt's scripting engine, which is very similar to VB? I gave the link above. The tool is free but powerful. You could write your own script to control any of the software listed above through its user interface (sending keystrokes or even clicking mouse buttons at desired positions). In the script you can manage your files as needed and loop over the filenames to process in whatever order you want. So just read the samples, and you'll like it.
 
redmondbAuthor Commented:
Folks,

First of all, many thanks to all for the suggestions, and apologies for my delay in responding.

Mechanic_Kharkov, ironically, not only is AutoIt a tool I've used for quite a while, but I actually used it in my PDFCreator attempt (to cope with an annoying Excel DDE time-out message). While AutoIt can be hit and miss for a complicated series of dialogues, your suggestion reminded me that ABBYY's Automation functionality allows the creation of a batch job which prompts for a list of input files and then automatically carries out all the remaining steps. So an AutoIt script to run an ABBYY batch job would be straightforward, the only non-trivial bit being the processing of the File Open dialogue, which I've successfully done before. The only issue would be in recognising that the run had completed (so I could start the batch again for the next group of files). This would require me to recognise that the progress dialogue's Fast Visible Text had changed to the following...
        Close
        Process completed
        The following errors occurred:
I've never used FVT before, so could you suggest, please, some AutoIt code that would detect this?

techhealth, best wishes on the search, but khkremer's comment doesn't sound encouraging, so I'd be concerned that it would be a waste of your time. (FWIW, this kind of solution was my ideal, but it was my failure to find a way to do it that led me here in the first place.)

khkremer...
 - thanks for the warning to us about the SDK.
 - I've never written anything in Java, so that would very much be a last resort for me.
 - Sorry, I perhaps didn't make my needs clear. The aim isn't to produce a single super-PDF, but rather to create a number of them with varying numbers of pages (from 20 to more than 1000 pages). "Iterative" running of PDFTK might still be a possibility, but I was surprised to see you mention passing as many as a hundred files per run, as I never thought that the command line could be that long. However, from a bit of googling, you're dead right; in fact the limit seems to be 8k. If I take one document's files and rename them (1.tif, 2.tif, etc.) I could PDFTK more than a thousand files in a single run. I'll do a test over the weekend to see that PDFTK is happy with this and also to get an idea of the % of files larger than that.

Regards to all,
Brian.
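
The command-line arithmetic in the post above can be checked with a short Python sketch. It assumes the 8k limit is the 8191-character cmd.exe limit, that the inputs are single-page PDFs named 1.pdf, 2.pdf, ... in the current folder, and that the output is called out.pdf; the function names are hypothetical.

```python
CMD_LIMIT = 8191  # assumed command-line length limit (cmd.exe on XP)


def build_command(count, ext=".pdf", output="out.pdf"):
    """Return the pdftk command string for files 1..count."""
    names = ["%d%s" % (i, ext) for i in range(1, count + 1)]
    return " ".join(["pdftk"] + names + ["cat", "output", output])


def max_files_per_run(limit=CMD_LIMIT, ext=".pdf", output="out.pdf"):
    """Largest count whose full command still fits within `limit`."""
    # Fixed part: "pdftk" plus " cat output <output>".
    used = len("pdftk") + len(" cat output " + output)
    count = 0
    while True:
        # Each extra file costs its name plus one separating space.
        cost = len("%d%s" % (count + 1, ext)) + 1
        if used + cost > limit:
            return count
        used += cost
        count += 1


n = max_files_per_run()
print(n, "files fit;", "command length =", len(build_command(n)))
```

Longer file names or full paths reduce the count quickly, which is why renaming to short sequential names (or running from inside the document's folder) matters here.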
 
techhealthCommented:
I took a brief look at SDK and realized this is how you use it in your scenario: run JavaScript inside Acrobat.  Acrobat is a full featured JavaScript host, which has no problem dealing with the file system or other external resources.  SDK provides the documentation on the JavaScript API/object model/methods to carry out  tasks.  They even have code examples on combining files in different formats into one PDF file.

You already have Acrobat, and the SDK can be downloaded (the documentation can be viewed online too), so the only prerequisite is JavaScript. If you're relatively well versed in JavaScript, you should be able to pick it up pretty quickly.
 
SStoryCommented:
Below is my code for calling the free GNU library for creating PDFs

I installed that library from:
http://sourceforge.net/projects/gnuwin32/files/tiff/

The version I installed at the time was:
tiff-win32-3.6.1-2.exe
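    ' Requires: Imports System.Diagnostics
    ' PDF_DOC_PATH, MULTIPAGE_TIFF_DOC_PATH, Status() and bErrors are defined elsewhere in the class.
    Private Function CreatePDFFromTiff() As Boolean
        Dim OutputPath As String
        Dim args As String
        Dim psi As ProcessStartInfo
        Dim P As Process
        Try
            ' Quote both paths so that spaces in them survive the command line
            OutputPath = Chr(34) & PDF_DOC_PATH & Chr(34)
            args = "-o " & OutputPath & " " & Chr(34) & MULTIPAGE_TIFF_DOC_PATH & Chr(34)
            psi = New ProcessStartInfo("c:\program files\gnuwin32\bin\tiff2pdf.exe", args)
            psi.UseShellExecute = False ' CreateNoWindow is ignored when the shell starts the process
            psi.CreateNoWindow = True
            P = System.Diagnostics.Process.Start(psi)
            P.WaitForExit()
            ' Only report success if tiff2pdf itself reported success
            If P.ExitCode <> 0 Then
                Status("ERROR creating PDF: tiff2pdf exit code " & P.ExitCode)
                bErrors = True
                Return False
            End If
            Status(OutputPath & " PDF file was created")
            Return True
        Catch ex As Exception
            Status("ERROR creating PDF: " & ex.Message)
            bErrors = True
            Return False
        End Try
    End Function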
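
The function above converts one multi-page TIFF, so the single-page scans would first have to be combined. A hedged Python sketch of one way to do that: it assumes the same GnuWin32 install folder also contains `tiffcp.exe` (the libtiff tool that concatenates TIFFs, which is not mentioned in the thread), and the function name is hypothetical. It only builds the two argument lists; it does not run them.

```python
GNUWIN32_BIN = r"c:\program files\gnuwin32\bin"  # install folder used in the post


def tiff_to_pdf_commands(tiff_pages, multipage_tiff, pdf_path,
                         bin_dir=GNUWIN32_BIN):
    """Build the two command lines for one document.

    1. tiffcp   page1.tif ... pageN.tif  multipage.tif
    2. tiff2pdf -o document.pdf multipage.tif
    Page order in the PDF follows the order of `tiff_pages`.
    """
    if not tiff_pages:
        raise ValueError("no TIFF pages given")
    tiffcp = bin_dir + r"\tiffcp.exe"      # assumed to sit next to tiff2pdf.exe
    tiff2pdf = bin_dir + r"\tiff2pdf.exe"
    combine = [tiffcp] + list(tiff_pages) + [multipage_tiff]
    convert = [tiff2pdf, "-o", pdf_path, multipage_tiff]
    return [combine, convert]


# Hypothetical file names for a three-page document.
for command in tiff_to_pdf_commands(
        ["000001.tif", "000002.tif", "000003.tif"], "doc1.tif", "doc1.pdf"):
    print(command)
```

Passing each list to `subprocess.run(command, check=True)` would execute the step and raise if the tool reports failure. For documents with very many pages, the command-line length limit discussed earlier in the thread applies to the `tiffcp` call as well.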

 
redmondbAuthor Commented:
First of all, my apologies to all concerned that I lost track of this question and only came across it when I found the "Pending Closure" message.

I don't know if this is possible, but ideally I'd like to increase the points on this to 1000 and split it between Mechanic_Kharkov and SStory. Is this possible?

Thanks,
redmondb
 
redmondbAuthor Commented:
please see my previous comment.

Regards,
redmondb
 
redmondbAuthor Commented:
Sorry for the multiple posts - the site apparently doesn't support Opera for submitting Objections.

Regards,
redmondb
 
redmondbAuthor Commented:
Thanks, Vee-Mod. Apologies again for losing this.

Regards,
redmondb
Question has a verified solution.
