Solved

How do I merge hundreds of thousands of TIFFs into multiple PDFs?

Posted on 2009-04-04
Last Modified: 2012-05-06
Folks,

I have hundreds of thousands of TIFFs that I need to merge into multiple PDFs (i.e. lots of scanned pages that have to be merged to recreate their original documents). I have an Excel spreadsheet listing the TIFFs' names and locations and also indicating which PDF file they belong in (e.g. TIFFs 1-10 are the 1st PDF, 11-75 the 2nd, etc.).

I was happily running a VBA macro using PDFCreator to do this. Unfortunately, PDFCreator appears somewhat temperamental...
 - after much mucking about I'm still not absolutely confident that pages will maintain their original sequence.
 - for no reason I could ever identify, PDFCreator started producing huge PDFs with all pages in landscape. Each time this happened I had to uninstall and reinstall it.

I tried using ABBYY (FineReader 8 Pro). It's good for merging all the TIFFs in a folder into a single PDF, but I would have to select each folder manually and the volumes are simply too great for this.

I then experimented with PDFTK (having converted individual TIFFs into single-page PDFs). The problem with this was that there were too many files to specify on the command line, and using wildcards doesn't guarantee that the page sequence will be correct.

So ...
(1) Is there a bullet-proof way to safely control PDFCreator from VBA?
(2) Is there a better/safer alternative using any mixture of the following...
  - Acrobat 8 Standard.
  - ABBYY FineReader 8 Professional Edition.
  - PDFTK.
  - Excel 2007.
  - Windows Scripting.
  - I could probably get access to OmniPage (a recent full version, but I don't know the number). Not my preferred solution, as I don't have a license, so I'd have to use a colleague's PC after hours.

I'm running XP SP2. The TIFFs are currently in a small number of humungous folders, but I'd have no problem in moving them so that each document's TIFFs were in their own sub-folder.

Many Thanks,
Brian.
Question by:redmondb
19 Comments
 
LVL 11

Expert Comment

by:techhealth
ID: 24069326
I have no experience with ABBYY, but from what you describe it would be the best choice, since it works as expected when dealing with TIFFs in a single folder. Then all you need to do is run a script/VBA macro to put related TIFFs into separate folders. The script would read the Excel file, create the necessary folders, and put the related TIFFs into each one. You can then either invoke ABBYY from the same script (is it command-line capable?) on each folder to create the PDFs, or use a separate script for that, which makes debugging easier. You can also use the script for any post-processing, e.g., moving the PDFs to some other location.
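
The sorting step described above could be sketched as follows, in Python rather than VBA. This is only an illustration under stated assumptions: the spreadsheet is assumed to have been exported to CSV with two hypothetical columns, `tiff_path` and `pdf_name`, and the rows are assumed to be in page order already. The function name and the folder layout are made up for the example.

```python
import csv
import shutil
from collections import defaultdict
from pathlib import Path


def sort_tiffs_into_folders(csv_path, out_root):
    """Copy each TIFF into a sub-folder named after its target PDF.

    Assumes a CSV export of the spreadsheet with the hypothetical columns
    'tiff_path' and 'pdf_name', with rows already in page order.
    """
    groups = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            groups[row["pdf_name"]].append(Path(row["tiff_path"]))

    out_root = Path(out_root)
    for pdf_name, pages in groups.items():
        folder = out_root / Path(pdf_name).stem
        folder.mkdir(parents=True, exist_ok=True)
        for number, source in enumerate(pages, start=1):
            # Zero-padded names keep page order under a plain alphabetical sort.
            target = folder / f"{number:06d}{source.suffix}"
            shutil.copy2(source, target)
    # Report how many pages each document received.
    return {name: len(pages) for name, pages in groups.items()}
```

Copying rather than moving leaves the original folders untouched until the PDFs have been checked, and the zero-padded file names mean a tool that reads a folder alphabetically should see the pages in spreadsheet order.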
 
LVL 26

Author Comment

by:redmondb
ID: 24069352
Thanks, techhealth, but I'm afraid (my version of) ABBYY doesn't have that kind of command-line processing.

Regards,
Brian
 
LVL 5

Expert Comment

by:Mechanic_Kharkov
ID: 24069947
AutoIt is an easy-to-learn scripting tool that can press buttons inside any application (even one with no automation support), enter text in dialog boxes, and so on.
http://www.autoitscript.com/autoit3/index.shtml

It is easy to understand and pleasant to use, and it lets you quickly automate any of your favorite software. Give it a try.
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 24070983
Use iText, the library that was used to create pdftk. All you need is somebody who knows how to program in Java. There are enough examples available online that you can create an application that can merge all the files.

Another option would be to run pdftk in batches: run it on a limited number of files (so that you can specify all of them on the command line). You will end up with a number of files that each have, say, 100 pages. In the second round you merge 100 of those files together, and then you add a third round to come up with the final document.
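
A rough sketch of that batching idea in Python. It only plans the pdftk command lines and does not run them. The intermediate file names (`round1_0001.pdf` and so on) and the function name are hypothetical; the `cat output` form is pdftk's usual syntax for concatenating files.

```python
def plan_batched_merge(files, output, batch_size=100):
    """Return pdftk command lines that merge `files` in several rounds.

    Intermediate names such as 'round1_0001.pdf' are hypothetical.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    commands = []
    current = list(files)
    round_no = 0
    # Keep merging in batches until everything fits in a single command.
    while len(current) > batch_size:
        round_no += 1
        merged = []
        for index in range(0, len(current), batch_size):
            batch = current[index:index + batch_size]
            target = f"round{round_no}_{index // batch_size + 1:04d}.pdf"
            commands.append(["pdftk", *batch, "cat", "output", target])
            merged.append(target)
        current = merged
    commands.append(["pdftk", *current, "cat", "output", output])
    return commands
```

Because each batch is a consecutive slice of the input list and the intermediate files are merged in the same order, the page sequence of the final document follows the order of `files`.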
 
LVL 11

Expert Comment

by:techhealth
ID: 24071810
Ever checked out the SDK from Adobe? I think it has some nice tools you can use, including command-line tools. But I haven't looked at it for a long time, so I'm not sure. Will try to find some more details...
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 24073291
The SDK does not contain any tools that would be useful in this case. The SDK gives you the tools to create an application that you could use to merge these files, but without programming, it does not help the asker.
 
LVL 5

Expert Comment

by:Mechanic_Kharkov
ID: 24074681
If Windows scripting doesn't scare you, why not try AutoIt's scripting engine, which is very similar to VB? I posted the link above. The tool is free but powerful. You could write your own script to control any of the software listed above through its user interface (sending keystrokes or even clicking mouse buttons at specific positions). In the script you can manage your files as needed and loop over the filenames to process in whatever order you want. So just read the samples, and you'll like it.
 
LVL 26

Author Comment

by:redmondb
ID: 24094003
Folks,

First of all, many thanks to all for the suggestions and apologies for my delay in responding.

Mechanic_Kharkov, ironically, not only is AutoIt a tool I've used for quite a while, but I actually used it in my PDFCreator attempt (to cope with an annoying Excel DDE time-out message). While AutoIt can be hit and miss for a complicated series of dialogues, your suggestion reminded me that ABBYY's Automation functionality allows the creation of a batch job which prompts for a list of input files and then automatically carries out all the remaining steps. So an AutoIt script to run an ABBYY batch job would be straightforward - the only non-trivial bit being the processing of the File Open dialogue, which I've successfully done before. The only issue would be in recognising that the run had completed (so I could start the batch again for the next group of files). This would require me to recognise that the progress dialogue's Fast Visible Text had changed to the following...
        Close
        Process completed
        The following errors occurred:
I've never used FVT before, so could you suggest, please, some AutoIt code that would detect this?

techhealth, best wishes on the search, but khkremer's comment doesn't sound encouraging, so I'd be concerned that it would be a waste of your time. (FWIW, this kind of solution was my ideal, but it was my failure to find a way to do it that led me here in the first place.)

khkremer...
 - thanks for the warning to us about the SDK.
 - I've never written anything in Java, so that would very much be a last resort for me.
 - Sorry, I perhaps didn't make my needs clear. The aim isn't to produce a single super-PDF, but rather to create a number of them with varying numbers of pages (from 20 to more than 1000 pages). "Iterative" running of PDFTK might still be a possibility, but I was surprised to see you mention passing as many as a hundred files per run, as I never thought that the command line could be that long. However, from a bit of googling, you're dead right; in fact the limit seems to be 8k. If I take one document's files and rename them (1.tif, 2.tif, etc.) I could PDFTK more than a thousand files in a single run. I'll do a test over the weekend to see that PDFTK is happy with this and also to get an idea of the % of files larger than that.

Regards to all,
Brian.
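
Brian's estimate above can be sanity-checked with a few lines of Python. This is a back-of-the-envelope sketch, not a statement about pdftk itself: it assumes a limit of 8191 characters (the figure usually quoted for cmd.exe, in line with the "8k" above), files named `1.pdf`, `2.pdf`, ... in the current directory, and a hypothetical output name `out.pdf`.

```python
def max_files_per_run(limit=8191, ext=".pdf", output="out.pdf"):
    """Count how many files named 1<ext>, 2<ext>, ... fit in one pdftk call."""
    # Fixed part of the command: program name, operation and output file.
    used = len(f"pdftk cat output {output}")
    count = 0
    while True:
        cost = len(f"{count + 1}{ext}") + 1  # file name plus one separating space
        if used + cost > limit:
            return count
        used += cost
        count += 1


def build_command(count, ext=".pdf", output="out.pdf"):
    """Build the full command line for files 1..count, for checking its length."""
    names = " ".join(f"{n}{ext}" for n in range(1, count + 1))
    return f"pdftk {names} cat output {output}"
```

Under these assumptions the count comes out only slightly above one thousand, so there is little headroom: longer file names or full paths on the command line would reduce it quickly.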
 
LVL 11

Expert Comment

by:techhealth
ID: 24097029
I took a brief look at the SDK and realized this is how you would use it in your scenario: run JavaScript inside Acrobat. Acrobat is a full-featured JavaScript host, which has no problem dealing with the file system or other external resources. The SDK provides the documentation on the JavaScript API/object model/methods to carry out tasks. They even have code examples on combining files in different formats into one PDF file.

You already have Acrobat, and the SDK can be downloaded (the documentation can be viewed online too), so the only prerequisite is JavaScript. If you're relatively well versed in JavaScript, you should be able to pick it up pretty quickly.
 
LVL 5

Accepted Solution

by:
Mechanic_Kharkov earned 250 total points
ID: 24169429
"The only issue would be in recognising that the run had completed (so I could start the batch again for the next group of files). This would require me to recognise that the progress dialogue's Fast Visible Text had changed to the following...
        Close
        Process completed
        The following errors occurred:
I've never used FVT before, so could you suggest, please, some AutoIt code that would detect this?"

I don't have Acrobat, but I have created a little app to play with. It shows its command-line parameter and, after about 2.5 seconds, changes the text on the form. The script below runs that stub application.

The .au3 script is also included in the attached archive.

File WaitForDialogTextChange.zip (206 KB) uploaded
Your Download-Link #1:http://rapidshare.de/files/46780014/WaitForDialogTextChange.zip.html


; Window descriptor of the stub application, reused by every call below
$Win = "[TITLE:Stub Application; CLASS:TMainForm]"

For $i = 1 To 3
	$Filename = "filename_#" & String($i) ; compose fake name

	; ShellExecute returns 0 and sets @error on failure
	If ShellExecute("StubAppWasteTime.exe", $Filename) = 0 Then Exit

	; initial wait for window init
	WinWaitActive($Win)
	; Here possibly some extra work with this window
	; ...

	ToolTip("Start wait for text")

	; now wait for desired text in window
	Do
		Sleep(100)
		$Text = WinGetText($Win)
		$NotFound = @error ; capture @error before the next call resets it
	Until $NotFound Or StringInStr($Text, "Process completed")

	If $NotFound Then Exit ; window not found

	ToolTip("") ; clear ToolTip

	;MsgBox(0, "Text read was:", $Text)

	; then click Ok button
	ControlClick($Win, "", "[CLASS:TButton; INSTANCE:2]")

	ToolTip("Start wait for window to close")

	; wait for window destroying
	While WinExists($Win)
		Sleep(100)
	WEnd

	ToolTip("")
Next

 
LVL 25

Assisted Solution

by:SStory
SStory earned 250 total points
ID: 30615533
Below is my code for calling the free GNU library for creating PDFs.

I installed that library from:
http://sourceforge.net/projects/gnuwin32/files/tiff/

The version I installed at the time was:
tiff-win32-3.6.1-2.exe
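    ' Requires "Imports System.Diagnostics" at the top of the file.
    ' PDF_DOC_PATH, MULTIPAGE_TIFF_DOC_PATH, Status and bErrors are assumed
    ' to be members defined elsewhere in the class.
    Private Function CreatePDFFromTiff() As Boolean
        Dim OutputPath As String
        Dim args As String
        Dim psi As ProcessStartInfo
        Dim P As Process
        Try
            ' Quote both paths so that spaces in them survive the command line
            OutputPath = Chr(34) & PDF_DOC_PATH & Chr(34)
            args = "-o " & OutputPath & " " & Chr(34) & MULTIPAGE_TIFF_DOC_PATH & Chr(34)
            psi = New ProcessStartInfo("c:\program files\gnuwin32\bin\tiff2pdf.exe", args)
            psi.UseShellExecute = False ' CreateNoWindow only takes effect without the shell
            psi.CreateNoWindow = True
            P = System.Diagnostics.Process.Start(psi)
            P.WaitForExit()
            ' Only report success if tiff2pdf itself exited cleanly
            If P.ExitCode <> 0 Then
                Status("ERROR creating PDF: tiff2pdf exit code " & P.ExitCode)
                bErrors = True
                Return False
            End If
            Status(OutputPath & " PDF file was created")
            Return True
        Catch ex As Exception
            Status("ERROR creating PDF: " & ex.Message)
            bErrors = True
            Return False
        End Try
    End Function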
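
The function above converts one multi-page TIFF, so single-page scans would first have to be combined. A minimal Python sketch of that two-step flow follows, assuming the libtiff command-line tools `tiffcp` and `tiff2pdf` are both installed and on the PATH (whether `tiffcp` ships in the same GnuWin32 package is an assumption here). The function names are hypothetical.

```python
import subprocess


def build_tiff_to_pdf_commands(tiff_pages, multipage_tiff, pdf_path):
    """Return the two command lines: combine the pages, then convert to PDF."""
    if not tiff_pages:
        raise ValueError("at least one TIFF page is required")
    # tiffcp takes the source files in order, followed by the destination file.
    combine = ["tiffcp", *tiff_pages, multipage_tiff]
    # Same tiff2pdf call as above: -o names the output PDF.
    convert = ["tiff2pdf", "-o", pdf_path, multipage_tiff]
    return [combine, convert]


def run_commands(commands):
    """Run each command in order, raising if one exits with an error."""
    for command in commands:
        subprocess.run(command, check=True)
```

Passing the arguments as a list avoids quoting problems with spaces in paths. For documents with many hundreds of pages the `tiffcp` call would run into the same command-line length limit discussed earlier in the thread, so it might need to be applied in batches.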

 
LVL 26

Author Comment

by:redmondb
ID: 32969466
First of all, my apologies to all concerned that I lost track of this question and only came across it when I found the "Pending Closure" message.

I don't know if this is possible, but ideally I'd like to increase the points on this to 1000 and split it between Mechanic_Kharkov and SStory. Is this possible?

Thanks,
redmondb
 
LVL 26

Author Comment

by:redmondb
ID: 32969498
please see my previous comment.

Regards,
redmondb
 
LVL 26

Author Comment

by:redmondb
ID: 32969512
Sorry for the multiple posts - the site apparently doesn't support Opera for submitting Objections.

Regards,
redmondb
 
LVL 26

Author Comment

by:redmondb
ID: 32983289
Thanks, Vee-Mod. Apologies again for losing this.

Regards,
redmondb