Solved

How do I merge hundreds of thousands of TIFFs into multiple PDFs?

Posted on 2009-04-04
Last Modified: 2012-05-06
Folks,

I have hundreds of thousands of TIFFs that I need to merge into multiple PDFs (i.e. lots of scanned pages that have to be merged to recreate their original documents). I have an Excel spreadsheet listing the TIFFs' names and locations and also indicating which PDF file each belongs in (e.g. TIFFs 1-10 are the 1st PDF, 11-75 the 2nd, etc.).

I was happily running a VBA macro using PDFCreator to do this. Unfortunately, PDFCreator appears somewhat temperamental...
 - after much mucking about I'm still not absolutely confident that pages will maintain their original sequence.
 - for no reason I could ever identify, PDFCreator started producing huge PDFs with all pages in landscape. Each time this happened I had to uninstall and reinstall it.

I tried using ABBYY (FineReader 8 Pro). It's good for merging all the TIFFs in a folder into a single PDF, but I would have to select each folder manually and the volumes are simply too great for this.

I then experimented with PDFTK (having converted individual TIFFs into single-page PDFs). The problem with this was that there were too many files to specify on the command line, and using wildcards doesn't guarantee that the page sequence will be correct.

So ...
(1) Is there a bullet-proof way to safely control PDFCreator from VBA?
(2) Is there a better/safer alternative using any mixture of the following...
  - Acrobat 8 Standard.
  - ABBYY FineReader 8 Professional Edition.
  - PDFTK.
  - Excel 2007.
  - Windows Scripting.
  - I could probably get access to Omnipage (a recent full version, but I don't know the number). Not my preferred solution, as I don't have a license, so I'd have to use a colleague's PC after hours.

I'm running XP SP2. The TIFFs are currently in a small number of humungous folders, but I'd have no problem in moving them so that each document's TIFFs were in their own sub-folder.

Many Thanks,
Brian.
Question by:redmondb
19 Comments
 
LVL 11

Expert Comment

by:techhealth
ID: 24069326
I have no experience with ABBYY, but from what you describe it would be the best choice, since it works as expected when dealing with TIFFs in a single folder.  Then all you need to do is run a script/VBA macro to put related TIFFs into separate folders.  The script would read the Excel file, create the necessary folders, and put the related TIFFs into each one.  You can then either have the same script invoke ABBYY (is it command-line capable?) on each folder to create the PDFs, or use a separate script for that, which makes debugging easier.  You can also use the script for any kind of post-processing, e.g. moving the PDFs to some other location.
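
A minimal sketch of the folder-sorting step described above, written in Python rather than VBA. It assumes the spreadsheet has been exported to CSV with two hypothetical columns, `tiff_path` and `pdf_name`, with rows already in page order; the column names and the `stage_tiffs` helper are illustrative and not from this thread. Each copy gets a zero-padded sequence prefix so that sorting by name inside a folder reproduces the page order.

```python
import csv
import shutil
from collections import defaultdict
from pathlib import Path


def stage_tiffs(csv_path, out_root):
    """Copy each TIFF into a per-document folder, preserving spreadsheet order.

    Assumes a CSV export with hypothetical columns 'tiff_path' and 'pdf_name'.
    Returns a dict mapping pdf_name -> list of staged file paths in page order.
    """
    groups = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            groups[row["pdf_name"]].append(Path(row["tiff_path"]))

    staged = {}
    for pdf_name, sources in groups.items():
        folder = Path(out_root) / Path(pdf_name).stem
        folder.mkdir(parents=True, exist_ok=True)
        width = len(str(len(sources)))  # digits needed for the last page number
        staged[pdf_name] = []
        for page, source in enumerate(sources, start=1):
            # Zero-padded prefix keeps name order identical to page order.
            target = folder / f"{page:0{width}d}_{source.name}"
            shutil.copy2(source, target)
            staged[pdf_name].append(target)
    return staged
```

Copying rather than moving leaves the original folders untouched, so a bad run can simply be deleted and repeated.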
0
 
LVL 26

Author Comment

by:redmondb
ID: 24069352
Thanks, techhealth, but I'm afraid (my version of) ABBYY doesn't have that kind of command-line processing.

Regards,
Brian
0
 
LVL 5

Expert Comment

by:Mechanic_Kharkov
ID: 24069947
AutoIt is an easy-to-learn scripting tool that can press buttons inside any application (even one with no automation support), enter text in dialog boxes, etc.
http://www.autoitscript.com/autoit3/index.shtml

It's easy to understand and nice to use, and it lets you quickly automate any of your favorite software. Just try it.
0
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 24070983
Use iText, the library that was used to create pdftk. All you need is somebody who knows how to program in Java. There are enough examples available online that you can create an application that merges all the files.

Another option would be to run pdftk in batches: run it on a limited number of files (so that you can specify all of them on the command line). You will end up with a number of files that each have, let's say, 100 pages. In the second round you merge 100 of those files together, and then you add a third round to come up with the final document.
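
The batching idea can be sketched as follows, in Python and as a planning step only: it builds the command lines and does not run them. The `pdftk <inputs> cat output <file>` form is pdftk's concatenation syntax; the batch size of 100, the `plan_merge` helper and the intermediate file names are assumptions for illustration.

```python
def plan_merge(inputs, final_output, batch_size=100):
    """Return pdftk commands (as argument lists) that merge `inputs` in order,
    never passing more than `batch_size` files to a single call."""
    if not inputs:
        raise ValueError("need at least one input file")
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    commands = []
    current = list(inputs)
    round_no = 1
    # Keep merging in rounds until what is left fits in one command.
    while len(current) > batch_size:
        merged = []
        for start in range(0, len(current), batch_size):
            chunk = current[start:start + batch_size]
            # Hypothetical intermediate name; zero-padded to keep order obvious.
            out = f"round{round_no}_{start // batch_size:05d}.pdf"
            commands.append(["pdftk", *chunk, "cat", "output", out])
            merged.append(out)
        current = merged
        round_no += 1
    commands.append(["pdftk", *current, "cat", "output", final_output])
    return commands
```

Because every chunk is taken from the list in sequence and the intermediate files are merged in the same sequence, the page order never depends on wildcard expansion.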
0
 
LVL 11

Expert Comment

by:techhealth
ID: 24071810
Ever checked out the SDK from Adobe?  I think that has some nice tools you can use, including command-line tools.  But I haven't looked at it in a long time, so I'm not sure.  Will try to find some more details...
0
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 24073291
The SDK does not contain any tools that would be useful in this case. The SDK gives you the tools to create an application that you could use to merge these files, but without programming, it does not help the asker.
0
 
LVL 5

Expert Comment

by:Mechanic_Kharkov
ID: 24074681
If Windows scripting doesn't put you off, why not try AutoIt's scripting engine, which is very similar to VB? I gave the link above. The tool is free but powerful. You could write your own script to control any of the software listed above through its user interface (sending keystrokes or even clicking mouse buttons at desired positions). In the script you can manage your files as you need and loop over the filenames to process in whatever way you like. So just read the samples, and you'll like it.
0
 
LVL 26

Author Comment

by:redmondb
ID: 24094003
Folks,

First of all, many thanks to all for the suggestions and apologies for my delay in responding.

Mechanic_Kharkov, ironically, not only is AutoIt a tool I've used for quite a while, but I actually used it in my PDFCreator attempt (to cope with an annoying Excel DDE time-out message). While AutoIt can be hit and miss for a complicated series of dialogues, as soon as I saw your suggestion, it reminded me that ABBYY's Automation functionality allows the creation of a batch job which prompts for a list of input files and then automatically carries out all the remaining steps. So an AutoIt script to run an ABBYY batch job would be straightforward - the only non-trivial bit being the processing of the File Open dialogue, which I've successfully done before.  The only issue would be in recognising that the run had completed (so I could start the batch again for the next group of files). This would require me to recognise that the progress dialogue's Fast Visible Text had changed to the following...
        Close
        Process completed
        The following errors occurred:
I've never used FVT before, so could you suggest, please, some AutoIt code that would detect this?

techhealth, best wishes on the search, but khkremer's comment doesn't sound encouraging so I'd be concerned that it would be a waste of your time. (FWIW, this kind of solution was my ideal, but it was my failure to find a way to do it that led me here in the first place.)

khkremer...
 - thanks for the warning to us about the SDK.
 - I've never written anything in Java, so that would very much be a last resort for me.
 - Sorry, I perhaps didn't make my needs clear. The aim isn't to produce a single super-PDF, but rather to create a number of them with varying numbers of pages (from 20 to more than 1000 pages). "Iterative" running of PDFTK might still be a possibility, but I was surprised to see you mention passing as many as a hundred files per run as I never thought that the command line could be that long. However, from a bit of googling, you're dead right; in fact the limit seems to be 8k. If I take one document's files and rename them (1.tif, 2.tif, etc.) I could PDFTK more than a thousand files on a single run. I'll do a test over the weekend to see that PDFTK is happy with this and also to get an idea of the % of files larger than that.
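
The "more than a thousand files" estimate can be checked by adding up the characters, as in this Python sketch. It assumes the 8,191-character limit that Microsoft documents for cmd.exe on Windows XP and later, inputs already converted to single-page PDFs named `1.pdf`, `2.pdf` and so on in the current folder, and a hypothetical output name; apart from the "8k" figure, none of these specifics come from the thread.

```python
CMD_LIMIT = 8191  # assumed cmd.exe maximum command-line length


def command_length(file_count, output="out.pdf", ext=".pdf"):
    """Length of 'pdftk 1.pdf 2.pdf ... N.pdf cat output out.pdf'."""
    parts = ["pdftk"]
    parts += [f"{n}{ext}" for n in range(1, file_count + 1)]
    parts += ["cat", "output", output]
    return len(" ".join(parts))


def max_files(limit=CMD_LIMIT, **kwargs):
    """Largest N whose command line still fits within `limit` characters."""
    count = 0
    while command_length(count + 1, **kwargs) <= limit:
        count += 1
    return count
```

Under these assumptions a 1,000-file command comes to 7,917 characters, so a little over a thousand files fit in one run. Any folder path in front of the names would reduce that figure considerably.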

Regards to all,
Brian.
0
 
LVL 11

Expert Comment

by:techhealth
ID: 24097029
I took a brief look at the SDK and realized how you would use it in your scenario: run JavaScript inside Acrobat.  Acrobat is a full-featured JavaScript host, which has no problem dealing with the file system or other external resources.  The SDK provides the documentation on the JavaScript API/object model/methods needed to carry out tasks.  There are even code examples on combining files in different formats into one PDF file.

You already have Acrobat, and the SDK can be downloaded (the documentation can be viewed online too), so the only prerequisite is JavaScript.   If you're relatively well versed in JavaScript, you should be able to pick it up pretty quickly.
0
 
LVL 5

Accepted Solution

by:
Mechanic_Kharkov earned 1000 total points
ID: 24169429
"The only issue would be in recognising that the run had completed (so I could start the batch again for the next group of files). This would require me to recognise that the progress dialogue's Fast Visible Text had changed to the following...
        Close
        Process completed
        The following errors occurred:
I've never used FVT before, so could you suggest, please, some AutoIt code that would detect this?"

I don't have Acrobat, but I have created a little stub app to play with. It shows its command-line parameter and, after about 2.5 seconds, changes the text on the form. The script below runs that stub application.

The .au3 script is also included in the attached archive.

File WaitForDialogTextChange.zip (206 KB) uploaded
Your Download-Link #1:http://rapidshare.de/files/46780014/WaitForDialogTextChange.zip.html


; window specification of the stub application, used in every call below
Local $Win = "[TITLE:Stub Application; CLASS:TMainForm]"
Local $Filename, $Text

For $i = 1 To 3

	$Filename = "filename_#" & String($i) ; compose fake name

	; 0 means failure; the success value differs between AutoIt releases
	If ShellExecute("StubAppWasteTime.exe", $Filename) = 0 Then Exit

	; initial wait for the window to appear
	WinWaitActive($Win)
	; possibly some extra work with this window here
	; ...

	ToolTip("Start wait for text")

	; poll the window text until the completion message appears
	; or the window goes away
	Do
		Sleep(100)
		$Text = WinGetText($Win, "")
	Until StringInStr($Text, "Process completed") <> 0 Or Not WinExists($Win)

	If Not WinExists($Win) Then Exit ; window not found

	ToolTip("") ; clear ToolTip

	;MsgBox(0, "Text read was:", $Text)

	; then click the OK button
	ControlClick($Win, "", "[CLASS:TButton; INSTANCE:2]")

	ToolTip("Start wait for window to close")

	; wait for the window to be destroyed
	While WinExists($Win)
		Sleep(100)
	WEnd

	ToolTip("")

Next

0
 
LVL 25

Assisted Solution

by:SStory
SStory earned 1000 total points
ID: 30615533
Below is my code for calling the free GNU library for creating PDFs

I installed that library from:
http://sourceforge.net/projects/gnuwin32/files/tiff/

The version I installed at the time was:
tiff-win32-3.6.1-2.exe
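    ' Requires "Imports System.Diagnostics" at the top of the file.
    ' PDF_DOC_PATH, MULTIPAGE_TIFF_DOC_PATH, Status() and bErrors are
    ' presumably members defined elsewhere in the class.
    Private Function CreatePDFFromTiff() As Boolean
        Dim OutputPath As String
        Dim args As String
        Dim psi As ProcessStartInfo
        Try
            ' Quote both paths so that spaces in them survive the command line
            OutputPath = Chr(34) & PDF_DOC_PATH & Chr(34)
            args = "-o " & OutputPath & " " & Chr(34) & MULTIPAGE_TIFF_DOC_PATH & Chr(34)
            psi = New ProcessStartInfo("c:\program files\gnuwin32\bin\tiff2pdf.exe", args)
            psi.UseShellExecute = False ' CreateNoWindow is ignored when the shell is used
            psi.CreateNoWindow = True
            Using P As Process = Process.Start(psi)
                P.WaitForExit()
                ' Treat a non-zero exit code as a failed conversion
                If P.ExitCode <> 0 Then
                    Status("ERROR creating PDF: tiff2pdf exit code " & P.ExitCode)
                    bErrors = True
                    Return False
                End If
            End Using
            Status(OutputPath & " PDF file was created")
            Return True
        Catch ex As Exception
            Status("ERROR creating PDF: " & ex.Message)
            bErrors = True
            Return False
        End Try
    End Function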
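
Note that the function above takes one multi-page TIFF as its input. For single-page scans, libtiff's `tiffcp` can first concatenate the pages into a multi-page TIFF, which `tiff2pdf -o` then converts. A rough Python sketch of that two-step pipeline follows; it assumes `tiffcp` is installed alongside `tiff2pdf`, the `tiff_to_pdf_commands` helper and the temporary file name are hypothetical, and the commands are only built, not run.

```python
def tiff_to_pdf_commands(tiff_pages, pdf_path, combined_tiff="combined.tif"):
    """Build the two commands that turn ordered single-page TIFFs into one PDF.

    Step 1: tiffcp concatenates the pages into one multi-page TIFF.
    Step 2: tiff2pdf converts that TIFF to PDF (same -o flag as the VB code).
    """
    if not tiff_pages:
        raise ValueError("need at least one TIFF page")
    combine = ["tiffcp", *tiff_pages, combined_tiff]
    convert = ["tiff2pdf", "-o", pdf_path, combined_tiff]
    return [combine, convert]
```

Each returned list could be handed to `subprocess.run(cmd, check=True)`. For documents with very many pages the `tiffcp` step runs into the same command-line length limit discussed earlier in the thread.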

0
 
LVL 26

Author Comment

by:redmondb
ID: 32969466
First of all, my apologies to all concerned that I lost track of this question and only came across it when I found the "Pending Closure" message.

Ideally I'd like to increase the points on this to 1000 and split them between Mechanic_Kharkov and SStory. Is this possible?

Thanks,
redmondb
0
 
LVL 26

Author Comment

by:redmondb
ID: 32969498
please see my previous comment.

Regards,
redmondb
0
 
LVL 26

Author Comment

by:redmondb
ID: 32969512
Sorry for the multiple posts - the site apparently doesn't support Opera for submitting Objections.

Regards,
redmondb
0
 
LVL 26

Author Comment

by:redmondb
ID: 32983289
Thanks, Vee-Mod. Apologies again for losing this.

Regards,
redmondb
0

Question has a verified solution.
