redmondb

How do I merge hundreds of thousands of TIFFs into multiple PDFs?

Folks,

I have hundreds of thousands of TIFFs that I need to merge into multiple PDFs (i.e. lots of scanned pages that have to be merged to recreate their original documents). I have an Excel spreadsheet listing the TIFFs' names and locations and also indicating which PDF file they belong in (e.g. TIFFs 1-10 are the 1st PDF, 11-75 the 2nd, etc.).

I was happily running a VBA macro using PDFCreator to do this. Unfortunately, PDFCreator appears somewhat temperamental...
 - after much mucking about I'm still not absolutely confident that pages will maintain their original sequence.
 - for no reason I could ever identify, PDFCreator started producing huge PDFs with all pages in landscape. Each time this happened I had to uninstall and reinstall it.

I tried using ABBYY (FineReader 8 Pro). It's good for merging all the TIFFs in a folder into a single PDF, but I would have to select each folder manually and the volumes are simply too great for this.

I then experimented with PDFTK (having converted individual TIFFs into single-page PDFs). The problem with this was that there were too many files to specify on the command line, and using wildcards doesn't guarantee that the page sequence will be correct.

So ...
(1) Is there a bullet-proof way to safely control PDFCreator from VBA?
(2) Is there a better/safer alternative using any mixture of the following...
  - Acrobat 8 Standard.
  - ABBYY FineReader 8 Professional Edition.
  - PDFTK.
  - Excel 2007.
  - Windows Scripting.
  - I could probably get access to OmniPage (a recent full version, but I don't know the version number). Not my preferred solution, as I don't have a license, so I'd have to use a colleague's PC after hours.

I'm running XP SP2. The TIFFs are currently in a small number of humungous folders, but I'd have no problem moving them so that each document's TIFFs were in their own sub-folder.

Many Thanks,
Brian.
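
The spreadsheet described in the question (one row per TIFF, with a column saying which PDF it belongs to) can be turned into an ordered page list per document before any PDF tool is involved. The following is a minimal Python sketch of that step, assuming the sheet has been exported to CSV with the hypothetical column headers `tiff_path` and `pdf_name`, and that the row order is the intended page order.

```python
import csv
import io
from collections import OrderedDict


def group_pages(rows):
    """Map each target PDF name to its TIFF paths, keeping spreadsheet order.

    `rows` is any iterable of dicts with the (assumed) keys
    'tiff_path' and 'pdf_name'.
    """
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row["pdf_name"], []).append(row["tiff_path"])
    return groups


def load_groups(csv_text):
    """Parse CSV text whose header line is 'tiff_path,pdf_name' (an assumption)."""
    return group_pages(csv.DictReader(io.StringIO(csv_text)))


# Tiny made-up sample standing in for the real spreadsheet export.
SAMPLE = (
    "tiff_path,pdf_name\n"
    "scans/0001.tif,doc1.pdf\n"
    "scans/0002.tif,doc1.pdf\n"
    "scans/0003.tif,doc2.pdf\n"
)

sample_groups = load_groups(SAMPLE)
for pdf_name, tiffs in sample_groups.items():
    print(pdf_name, len(tiffs), "pages")
```

Because the page order comes from the spreadsheet rather than from directory listings or wildcards, whatever tool does the merging can be handed an explicit, ordered list.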
techhealth

I have no experience with ABBYY, but from what you described ABBYY would be the best choice, since it works as expected when dealing with TIFFs in a single folder.  Then all you need to do is run a script/VBA macro to put related TIFFs into separate folders.  The script would read the Excel file, create the necessary list of folders, and put related TIFFs into each folder.  Then you can either invoke ABBYY from the same script (is it command-line capable?) on each folder to create the PDFs, or have a separate script do that for easier debugging.  You can also use the script to do any kind of post-processing, e.g., moving the PDFs to some other location.
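
The folder-staging step suggested above could look roughly like this in Python. It is only a sketch under assumptions: the input is a mapping from each PDF name to its TIFF paths in page order, the function name `stage_documents` is made up for illustration, and files are copied rather than moved. Pages are renamed to zero-padded sequence numbers so that an alphabetical listing of each folder matches the intended page order.

```python
import shutil
from pathlib import Path


def stage_documents(groups, dest_root, copy=shutil.copy2):
    """Copy each document's TIFFs into its own sub-folder under dest_root.

    groups: mapping of PDF name -> list of TIFF paths in page order.
    Returns a mapping of PDF name -> list of staged file paths.
    """
    dest_root = Path(dest_root)
    staged = {}
    for pdf_name, tiffs in groups.items():
        folder = dest_root / Path(pdf_name).stem
        folder.mkdir(parents=True, exist_ok=True)
        # Pad to at least 4 digits so names sort in page order.
        width = max(4, len(str(len(tiffs))))
        targets = []
        for page_no, src in enumerate(tiffs, start=1):
            target = folder / f"{page_no:0{width}d}{Path(src).suffix}"
            copy(src, target)
            targets.append(target)
        staged[pdf_name] = targets
    return staged
```

Copying instead of moving leaves the original folders untouched, which makes it easy to rerun the script if the spreadsheet turns out to contain mistakes.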

ASKER

Thanks, techhealth, but I'm afraid (my version of) ABBYY doesn't have that kind of command-line processing.

Regards,
Brian
AutoIt is an easy-to-learn scripting tool that can press buttons inside any application (even one with no automation abilities), enter text in dialog boxes, etc.
http://www.autoitscript.com/autoit3/index.shtml

It's easy to understand and nice to use, and it can quickly automate any of your favorite software. Just try it.
Use iText, the library that was used to create pdftk. All you need is somebody who knows how to program in Java. There are enough examples available online that you can create an application that can merge all the files.

Another option would be to run pdftk in batches: run it on a limited number of files (so that you can specify all of them on the command line). You will end up with a number of files that each have, let's say, 100 pages. In the second go-around you merge 100 of those files together, and then you add a third round to come up with the final document.
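
The batching idea above can be sketched as a small planner that only builds the pdftk command lines and does not run them. It relies on pdftk's `cat ... output` operation; the intermediate file names (`round1_0001.pdf` and so on), the function names, and the default batch size of 100 are assumptions for illustration.

```python
def chunk(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_merge(pages, final_pdf, batch_size=100):
    """Return pdftk argument lists that merge `pages` in order, in rounds.

    Each round merges up to `batch_size` inputs into an intermediate PDF;
    rounds repeat until the remaining inputs fit into one final command.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    commands = []
    current = list(pages)
    round_no = 0
    while len(current) > batch_size:
        round_no += 1
        next_round = []
        for idx, batch in enumerate(chunk(current, batch_size), start=1):
            out = f"round{round_no}_{idx:04d}.pdf"  # hypothetical temp name
            commands.append(["pdftk", *batch, "cat", "output", out])
            next_round.append(out)
        current = next_round
    commands.append(["pdftk", *current, "cat", "output", final_pdf])
    return commands
```

Each returned list could then be handed to something like `subprocess.run(cmd, check=True)`. Since every input is named explicitly and batches are consecutive slices, the page order of the original list is preserved through every round.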
Ever checked out the SDK from Adobe?  I think that has some nice tools you can use, including command-line tools.  But I haven't looked at it for long so not sure.  Will try to find some more details...
The SDK does not contain any tools that would be useful in this case. The SDK gives you the tools to create an application that you could use to merge these files, but without programming, it does not help the asker.
If Windows scripting doesn't scare you, why not try AutoIt's scripting engine, which is very similar to VB? I posted the link above. The tool is free but powerful. You could write your own script to control any of the software listed above through its user interface (sending keystrokes or even clicking mouse buttons at desired positions). In the script you can manage your files as you need, and arrange the filenames to process with whatever loops you like. So just read the samples, and you'll like it.
Folks,

First of all, many thanks to all for the suggestions and apologies for my delay in responding.

Mechanic_Kharkov, ironically, not only is AutoIt a tool I've used for quite a while, but I actually used it in my PDFCreator attempt (to cope with an annoying Excel DDE time-out message). While AutoIt can be hit and miss for a complicated series of dialogues, as soon as I saw your suggestion, it reminded me that ABBYY's Automation functionality allows the creation of a batch job which prompts for a list of input files and then automatically carries out all the remaining steps. So an AutoIt script to run an ABBYY batch job would be straightforward - the only non-trivial bit being the processing of the File Open dialogue, which I've successfully done before.  The only issue would be in recognising that the run had completed (so I could start the batch again for the next group of files). This would require me to recognise that the progress dialogue's Fast Visible Text had changed to the following...
        Close
        Process completed
        The following errors occurred:
I've never used FVT before, so could you suggest, please, some AutoIt code that would detect this?
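
The detection being asked for boils down to polling the dialogue's text until a marker string appears or a timeout expires. The sketch below shows that logic in Python rather than AutoIt, with a hypothetical `read_text` callable standing in for whatever returns the window's visible text. In AutoIt itself, built-in functions such as `WinGetText` or `WinWait` with its text parameter would presumably fill that role, but that mapping is an assumption, not tested code.

```python
import time


def wait_for_text(read_text, marker, timeout=3600.0, interval=1.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll read_text() until `marker` appears in it or `timeout` seconds pass.

    read_text: zero-argument callable returning the dialogue's current
               visible text (hypothetical; supplied by the caller).
    Returns True if the marker was seen, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if marker in read_text():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would pass `"Process completed"` as the marker and start the next batch only when the function returns True. Treating a timeout as a failure, rather than waiting forever, keeps a hung run from stalling the whole job overnight.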

techhealth, best wishes on the search, but khkremer's comment doesn't sound encouraging so I'd be concerned that it would be a waste of your time. (FWIW, this kind of solution was my ideal, but it was my failure to find a way to do it that led me here in the first place.)

khkremer...
 - thanks for the warning to us about the SDK.
 - I've never written anything in Java, so that would very much be a last resort for me.
 - Sorry, I perhaps didn't make my needs clear. The aim isn't to produce a single super-PDF, but rather to create a number of them with varying numbers of pages (from 20 to more than 1000 pages). "Iterative" running of PDFTK might still be a possibility, but I was surprised to see you mention passing as many as a hundred files per run, as I never thought that the command line could be that long. However, from a bit of googling, you're dead right; in fact the limit seems to be 8k. If I take one document's files and rename them (1.tif, 2.tif, etc.) I could PDFTK more than a thousand files in a single run. I'll do a test over the weekend to see that PDFTK is happy with this and also to get an idea of the % of files larger than that.

Regards to all,
Brian.
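
The "8k" estimate above can be checked with a little arithmetic before running any tests. This Python sketch computes the length of a pdftk command line for files renamed 1.pdf, 2.pdf, and so on, and compares it with 8191 characters, which is taken here as the cmd.exe limit on XP (an assumption based on the figure quoted in the thread). The output name `out.pdf` and the function names are placeholders.

```python
CMD_LIMIT = 8191  # assumed maximum command-line length for cmd.exe


def command_length(n_files, ext=".pdf", tool="pdftk",
                   tail="cat output out.pdf"):
    """Length of 'pdftk 1.pdf 2.pdf ... N.pdf cat output out.pdf'."""
    names = [f"{i}{ext}" for i in range(1, n_files + 1)]
    return len(" ".join([tool, *names, tail]))


def fits(n_files, limit=CMD_LIMIT):
    """True if a single command naming n_files inputs stays within limit."""
    return command_length(n_files) <= limit


for n in (100, 500, 1000, 1200):
    print(n, command_length(n), fits(n))
```

Any directory prefix on the file names adds to every argument, so the estimate only holds if pdftk is run from inside the folder that contains the renamed files.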
I took a brief look at SDK and realized this is how you use it in your scenario: run JavaScript inside Acrobat.  Acrobat is a full featured JavaScript host, which has no problem dealing with the file system or other external resources.  SDK provides the documentation on the JavaScript API/object model/methods to carry out  tasks.  They even have code examples on combining files in different formats into one PDF file.

You already have Acrobat, and the SDK can be downloaded (the documentation can be viewed online too), so the only prerequisite is JavaScript.   If you're relatively well versed in JavaScript, you should be able to pick it up pretty quickly.
ASKER CERTIFIED SOLUTION
Mechanic_Kharkov
SOLUTION
First of all, my apologies to all concerned that I lost track of this question and only came across it when I found the "Pending Closure" message.

I don't know if this is possible, but ideally I'd like to increase the points on this to 1000 and split it between Mechanic_Kharkov and SStory. Is this possible?

Thanks,
redmondb
please see my previous comment.

Regards,
redmondb
Sorry for the multiple posts - the site apparently doesn't support Opera for submitting Objections.

Regards,
redmondb
Thanks, Vee-Mod. Apologies again for losing this.

Regards,
redmondb