<

Batch Conversion of PDF, TIFF, and Other Image Formats via Command Line Interface to PDF, PDF Searchable, and TIFF with Power PDF Advanced

Published on
53,952 Points
9,352 Views
6 Endorsements
Last Modified:
Approved
Joe Winograd, Fellow&MVE
50+ years in computer industry. Everything from development to sales. CIO. Document imaging. EE MVE 2015, EE MVE 2016, EE FELLOW 2017.
This article explains how to perform batch conversion of PDF, TIFF, and other image file formats into PDF, PDF Searchable, and TIFF files via a command line interface, using Nuance's latest document imaging software — Power PDF Advanced.

Power PDF is the newest product from the Document Imaging division of Nuance Communications. It is available in two editions — Power PDF Standard and Power PDF Advanced. Nuance offers a free 30-day trial of the Advanced edition, available for download here:

http://www.nuance.com/for-business/imaging-solutions/document-conversion/power-pdf-converter/free-trial/index.htm


For whatever reason, Nuance does not offer a free trial of the Standard edition. Everything in this article about Version 1.0 is based on my experience with the free trial of the Advanced edition running in W7 Pro 64-bit; everything about Version 1.1 is based on the licensed Power PDF Advanced, also on W7 Pro 64-bit.


Disclaimer before going further: I have no affiliation with Nuance and no financial interest in it whatsoever. I am simply a happy user/customer of several of its document imaging products, including OmniPage, PaperPort, and PDF Converter (the latter is being replaced by Power PDF).


Update on 22-Feb-2015: The initial submission of this article was about version 1.0 of Power PDF Advanced. Nuance recently released version 1.1, but since EE members may still have 1.0, I decided not to delete the information about 1.0, but instead added a section at the bottom of the article about 1.1.

End of update.


Update on 18-Jun-2015: Although undocumented in the Help text, I just discovered that the input files may be other image file types, including GIF, JPG, and PNG (the Help text documents only PDF and TIF).

End of update.


Update on 23-Aug-2017: Since my previous article update about version 1.1, Nuance released 1.2 in February 2016, 2.0 in June 2016, and 2.1 in May 2017. Some key points about these versions:


(1) The list of OCR languages is identical in all five releases (1.0, 1.1, 1.2, 2.0, 2.1).


(2) The Help text in both 1.2 and 2.0 is identical to the Help text in 1.1, which is shown below in the 1.1 section of the article.


(3) The Help text in 2.1 has three differences from 1.1/1.2/2.0, namely:

• the -T output type parameter has an additional choice: TXT (thanks to member Amine BEN DHIAB for reporting this in the comments section below on 16-Jun-2017)

• the -V version parameter for PDF output has five additional choices: X1A, X3, X4, X4P, PDFE

• there's a new -Y option for PDF output, which means convert to gray

In terms of big differences between the Advanced edition of 1.x and 2.x, here's a Nuance article discussing it:
What's new in Power PDF Advanced 2.0?

End of update.


The Graphical User Interface (GUI) of Power PDF Advanced (PPA) utilizes the Ribbon interface, made popular by Microsoft in Office 2007 and continued in all Office releases since then. The Home section of PPA's Ribbon looks like this:


PPA Home Ribbon

The Advanced Processing section of the Ribbon has a Batch Converter function:

 

PPA Batch Converter GUI

This is an excellent feature that provides for batch conversion of PDF and TIFF files into PDF, Searchable PDF (via built-in OCR), TIFF, and Multipage TIFF. But as I was putting the product through its paces during the 30-day trial, I wondered if there is a Command Line Interface (CLI) for batch conversion. Nothing is documented in the offline or online help about a CLI for the batch converter, and Nuance hasn't (yet) released a user guide/manual for the product. So I took a look at the installation folder "c:\Program Files (x86)\Nuance\Power PDF\" to see if anything jumped out with respect to a CLI for batch conversion, and sure enough, found "BatchConverter.exe" and "BatchConverter.com" executables. At first I thought they were there to support the [Batch>Batch Controls>Batch Converter] item shown on the Ribbon (in the screenshot above), but then I decided to run "BatchConverter" in a Command Prompt in the off-chance that it provides a CLI. Jackpot! I gave it the standard "/?" parameter for help and, lo and behold, this came back (turns out that "-?" also works):


(See important note after this Help text — spoiler alert: batchconverter.exe is wrong — batchconverter.com is correct.)


Usage:


batchconverter.exe -I{inputfile} -O{outputfile} -T{output type} [-A{PDF/A format}] [-R] [-Q] [-P{page number}] [-S{colorspace}] [-D{resolution}] [-Cm{cmpmono}] [-Cg{cmpgray}] [-Cc{cmpcolor}] [-L{lang1, lang2, lang3}] [-J{char}] [-K] [-Ao] [-As] [-?] [-F{logfile}]


[JW Comment: The actual help output uses the less-than and greater-than symbols in the syntax, but they cause display problems in the article, so I changed them above to the opening-brace and closing-brace symbols.]


-I  input file full path. * can be used for filename (*.pdf, *.tif)

 -O  output file or folder full path.

 -T  output type: TIF|PDF|PDFS

 -A  PDF/A format: 1A|1B|2A|2B|2U|3A|3B|3U. Default:Non-PDF/A

 -R  recursive processing, subfolders under the input will be processed too. The input folder structure will be created under the output folder.

 -Q  quiet processing, no password dialogs will be showed thus password protected pdf files will not be converted successfully.

 -P  process single page

 -?  Shows usage info and the list of languages + 3letter language identifiers.

 Tiff output specific parameters

 -S  colorspace: auto|mono|gray|rgb. Default: auto.

 -D  resolution: auto|72|96|150|200|300|600|1200|2400. Default: auto.

 -Cm  image compression for monochrome images: none|g3|g4|lzw. Default: g4.

 -Cg  image compression for grayscale images: none|LZW|JpegMin|JpegLow|JpegMed|JpegHigh|JpegMax. Default: LZW.

 -Cc  image compression for color images: none|LZW|JpegMin|JpegLow|JpegMed|JpegHigh|JpegMax. Default: LZW.

 Searchable PDF output specific parameters

 -L  Language(s) for OCR. Comma delimited list of 3letter language identifiers (-Leng,ger,fre,nor). If not presented, auto is used.

 -J  Reject char. Default: ~.

 -K  If presented, conversion keeps original images.

 -Ao  If presented, pages are oriented and straightened automatically.


That's not the end of the help output — it continues with a list of 123 languages for OCR support. To avoid scrolling for many screens in this article, I removed the list of languages from the help output above and placed it in the attached text file called "Power-PDF-Advanced-OCR-Languages.txt".


Important Note:   BatchConverter.exe is the Batch Converter GUI, while BatchConverter.com is the Batch Converter CLI. If you are in a Command Prompt in the folder containing both files and enter the command BatchConverter (that is, without the .com or .exe extension) the CLI will run and work as expected. However, if you enter the command BatchConverter.exe, it will not run the CLI. It won't even recognize the /? or -? parameter. The syntax in the help output shown in the "Usage:" section above is incorrect! It says the command is batchconverter.exe, but it should say batchconverter.com.


I then ran numerous tests and it worked flawlessly! Here's an example that converts an image-only PDF to a searchable PDF (-TPDFS), keeping the images in the output file (-K), and using English for the OCR (-Leng):


batchconverter -Id:\testin\testimage.pdf -Od:\testout\testsearch.pdf -TPDFS -K -Leng


All of the parameters provide extremely useful features, but in my opinion, these stand out:

 

 Creation of a PDF Searchable file (the -TPDFS option), with an additional option (-K) that keeps the images.

 

 Processing subfolders to an unlimited depth and automatically replicating the input folder structure under the output folder, called recursive processing (the -R option).

 

 A wide variety of compression methods for TIFF files, with separate methods for monochrome (-Cm), grayscale (-Cg), and color (-Cc) images.

 

 An amazing choice of languages for OCR (the -L option).

As mentioned above, I was not able to find any documentation on the CLI, except for the output from the -? parameter. Everything I know about the Batch Converter CLI is from the command itself — and experimentation with it. So if anyone reading this article learns anything interesting about it, please post it here for all to see.


In conclusion, PPA's CLI provides capabilities that cannot be achieved in the GUI. It may be called from batch files, scripts, and custom programs. I'm looking forward to using it and I hope this article documenting it helps some other EE members — and if you do find this article to be helpful, please click the big green Vote this article as helpful button below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe


Update for Version 1.1


This is the Help text for Version 1.1 (see important note after this output) :


Usage:

 batchconverter.com -I{inputfile} -O{outputfile} -T{output type} [-R] [-Q] [-P{page number}]
 [-S{colorspace}] [-D{resolution}] [-Cm{cmpmono}] [-Cg{cmpgray}] [-Cc{cmpcolor}] [-V{version}]
 [-W] [-G] [-E{sequence name}] [-L{lang1, lang2, lang3}] [-J{char}] [-K] [-Ao] [-As] [-?] [-F{logfile}]


[JW Comment: The actual help output uses the less-than and greater-than symbols in the syntax, but they cause display problems in the article, so I changed them above to the opening-brace and closing-brace symbols.]


-I  input file full path. * can be used for filename (*.pdf, *.tif)

 -O  output file or folder full path.

 -T  output type: TIF|PDF|PDFS

 -R  recursive processing, subfolders under the input will be processed too. The input folder structure will be created under the output folder.

 -Q  quiet processing, no password dialogs will be showed thus password prote cted pdf files will not be converted successfully.

 -P  process single page

 -?  Shows usage info and the list of languages + 3letter language identifiers.

 Tiff output specific parameters

 -S  colorspace: auto|mono|gray|rgb. Default: auto.

 -D  resolution: auto|72|96|150|200|300|600|1200|2400. Default: auto.

 -Cm  image compression for monochrome images: none|g3|g4|lzw. Default: g4.

 -Cg  image compression for grayscale images: none|LZW|JpegMin|JpegLow|JpegMed|JpegHigh|JpegMax. Default: LZW.

 -Cc  image compression for color images: none|LZW|JpegMin|JpegLow|JpegMed|JpegHigh|JpegMax. Default: LZW.

 PDF output specific parameters

 -V  Version: 1A|1B|2A|2B|2U|3A|3B|3U|1.3|1.4|1.5|1.6|1.7|auto

 -W  Optimize for web viewing

 -G  Remove tags

 -E  Execute sequence

 Searchable PDF output specific parameters

 -L  Language(s) for OCR. Comma delimited list of 3letter language identifiers (-Leng,ger,fre,nor). If not presented, auto is used.

 -J  Reject char. Default: ~.

 -K  If presented, conversion keeps original images.

 -Ao  If presented, pages are oriented and straightened automatically.


Important Note:  Although not stated in the Help output above (which shows only PDF and TIF), I discovered through experimentation that the input file type may also be GIF, JPG, JPEG, PNG, and TIFF. In other words, the -I (Input) parameter should say this:


-I  input file full path. * can be used for filename (*.gif, *.jpg, *.jpeg *.pdf, *.png, *.tif, *.tiff)


I tried BMP, but that did not work. If readers discover other file types that work, please post a comment to let us know.


As with 1.0, the 1.1 Help output includes the long list of languages, which I omitted above. I compared the 1.0 and 1.1 language lists — they are identical.


Note that the syntax error documented in the 1.0 portion of the article has been fixed in 1.1, that is, the Usage output now shows the correct  batchconverter.com (rather than the incorrect  batchconverter.exe of 1.0).


One very important bug that is relevant to this article has been fixed in 1.1, namely, the -P parameter (to process a single page) in conjunction with an output type of PDF or PDFS. This did not work in 1.0 — it created an output PDF file with all of the pages of the input file rather than just the one page specified (it worked if the output type was TIF). The -P parameter now works for all output types in 1.1.


The Release Notes for 1.1, including a list of new features, a list of known issues, and a list of bug fixes, are in a PDF file called Nuance Power PDF 1.1 Release Notes in the Files section of this wiki.


If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe


Power-PDF-Advanced-OCR-Languages.txt

 

6
Comment
  • 9
  • 2
  • 2
  • +3
17 Comments
 
 

Administrative Comment

by:lherrou
Joe, in the new system, there is "Status" and "State". They are different. It's more complicated than it appears on the surface.
0
 

Expert Comment

by:Edward Quicksall
I tried "C:\Program Files (x86)\Nuance\Power PDF\batchconverter.exe" -I * -O ..\..\ -T PDF but nothing happens. Executing just "C:\Program Files (x86)\Nuance\Power PDF\batchconverter.exe" and filling in the parameters does work. I have also tried "C:\Program Files (x86)\Nuance\Power PDF\batchconverter.exe" -I g:\EPSCAN\001\* -O g:\ -T PDF (with and without spaces after the dash tags.
0
 
LVL 59

Author Comment

by:Joe Winograd, Fellow&MVE
Hi Edward,
Please read my article carefully. You should not be executing batchconverter.exe — should be batchconverter.com (see the Important Note  after the V1.0 help output). Regards, Joe
0
Introducing Cloud Class® training courses

Tech changes fast. You can learn faster. That’s why we’re bringing professional training courses to Experts Exchange. With a subscription, you can access all the Cloud Class® courses to expand your education, prep for certifications, and get top-notch instructions.

 

Expert Comment

by:Edward Quicksall
Thanks, "C:\Program Files (x86)\Nuance\Power PDF\batchconverter.com" -Ig:\EPSCAN\001\*.pdf -Og:\ -TPDF works just fine. Sorry I didn't notice .com.
0
 
LVL 59

Author Comment

by:Joe Winograd, Fellow&MVE
You're welcome, Edward. I'm glad that it's working for you now. If you take a moment to click the thumbs-up icon at the end of the article, I'll appreciate it. Regards, Joe
1
 
LVL 59

Author Comment

by:Joe Winograd, Fellow&MVE
Hi Edward,
Sorry to bug you, but that was the wrong thumbs-up icon. That one was to endorse that particular comment, rather than the article. Please look at the end of the article (not the end of the comments) and you'll see a thumbs-up icon that currently has a 5 in it — hopefully, your endorsement will be 6. :)  Thanks, Joe
0
 

Expert Comment

by:Amine BEN DHIAB
Excellent article, very usefull
I add, output type can be TXT also
1
 
LVL 59

Author Comment

by:Joe Winograd, Fellow&MVE
Hi Amine,
Thanks for joining EE today, reading my article, and providing such nice compliments — I appreciate it!

I tried the output type of TXT and it did not work for me (I'm running Power PDF Advanced Version 2.0 — haven't upgrade to 2.1 yet). Here are the results of my attempt to make the output type TXT:

C:\Program Files (x86)\Nuance\Power PDF>batchconverter.com -Ic:\temp\TestIn.pdf -Oc:\temp\TestOut.txt -TTXT
Unknown output format: TXT
No output format specified.

Open in new window

Perhaps I'm not understanding what you mean — are you doing something different from that?
Regards, Joe
0
 

Expert Comment

by:Amine BEN DHIAB
Hello Joe,

Yes, I used this syntax but with version 2.1

Thanks
1
 
LVL 59

Author Comment

by:Joe Winograd, Fellow&MVE
Thanks, Amine, that's good to know. Upgrading to 2.1 is on my to-do list. When I've done the upgrade, I'll try it again and post back here with the results. Regards, Joe
0
 
LVL 59

Author Comment

by:Joe Winograd, Fellow&MVE
Hi Amine,
I just upgraded to Power PDF Advanced Version 2.1 and am happy to confirm your comment! Version 2.1 does, indeed, allow conversion to text files by using the -TTXT command line option. Thank you very much for making me aware of this — I really appreciate it! Feedback like yours from members helps to improve articles, making them far more valuable to the EE community. My last update to this article was about Version 1.1, but there have been three upgrades since then — 1.2, 2.0, and 2.1. I plan to update this article with additional information about these newer versions after I've had more experience with the latest V2.1. Thanks again, Amine, for the excellent feedback. Regards, Joe
0
 

Expert Comment

by:Willem van der Plas
Dear mr. Winograd, hello Joe,
I have been seeking desperately for a way to batch compress thousands of PDF files in my library, which has many folders and subfolders and sub-subfolders, until now in vain. So your article is great news for me.
My question: is the batchconverter.com "trick" also for the PowerPDF standard version, or for the advanced version alone?
Thanks for helping me out, so I can decide which version I buy!
Regards
Willem
0
 
LVL 59

Author Comment

by:Joe Winograd, Fellow&MVE
Hi Willem,
Thanks for joining EE a week ago and reading my article — I appreciate that! I apologize for not responding sooner. I'm very glad to hear that my article is great news for you.

I have only the Advanced edition, so I can't test this for you, but based on the documentation, my understanding is that the command line batchconverter.com is available only in the Advanced edition. The Standard edition does have some batch capabilities, but not that one. Here's a link to the Nuance site comparing the features of the editions:
Power PDF Standard and Power PDF Advanced Comparison Chart

Also, there's a very good price now at Amazon for the Advanced Edition — just $89.00:
https://www.amazon.com/Power-PDF-Advanced-2-0-Retail/dp/B01LWC9TKP

That's for version 2.0, but there's a free upgrade to 2.1. At that price, I recommend Advanced — it is actually less expensive than Standard, which is still $99.99 at Amazon (and at the Nuance store).

Be careful of lower prices — they may be for version 1.x, which cannot be upgraded to 2.x. Regards, Joe
0
 

Expert Comment

by:Willem van der Plas
Thanks for your outstanding help, Joe!
All the best from Europe,
Willem
0
 
LVL 59

Author Comment

by:Joe Winograd, Fellow&MVE
You're very welcome, Willem. Unfortunately, I don't see that great deal on PPA2.0 at the Amazon UK site and don't know if you'll find it any Amazon EU site. Also, no deal on it at the Nuance UK site. Both sites have it at £139.99 — list price. :(  Good luck shopping! Cheers, Joe
0
 

Expert Comment

by:Chris S
Hello,

i tried to use the command line but i always get the error message "File open error" although I have write to all directories stated in the command?! It always says "Converting H:\TEST.HTML TO H:\TEST.PDF" which I think means that the Syntax itself is correct but either the Input or Output file cannot be opened?
0
 
LVL 59

Author Comment

by:Joe Winograd, Fellow&MVE
Hi Chris,
First, thanks for joining Experts Exchange today and reading my article — I appreciate it!

The input file cannot be an HTML file type. The Help output, even in the latest version 2.1 of Power PDF Advanced, still says this:

-I input file full path. * can be used for filename (*.pdf, *.tif)

As I mentioned in the article, I discovered through experimentation that the input file type may also be GIF, JPG, JPEG, PNG, and TIFF. But I just tried HTML in both v2.0 and v2.1, and can confirm that the input file may not be HTML. As you saw yourself, it gives an error message that says "File open error." My advice is to open the HTML file in whatever web browser you prefer and then print it to whatever PDF print driver you prefer. Once you have the PDF file, run Power PDF again, this time using the PDF file as the input instead of the HTML file. Of course, you may not even need to do that if you're happy with the file from the PDF printer. Regards, Joe
0

Featured Post

Cloud Class® Course: Ruby Fundamentals

This course will introduce you to Ruby, as well as teach you about classes, methods, variables, data structures, loops, enumerable methods, and finishing touches.

Join & Write a Comment

This Micro Tutorial will teach you how to add a cinematic look to any film or video out there. There are very few simple steps that you will follow to do so. This will be demonstrated using Adobe Premiere Pro CS6.
In this video, viewers are given an introduction to using the Windows 10 Snipping Tool, how to quickly locate it when it's needed and also how make it always available with a single click of a mouse button, by pinning it to the Desktop Task Bar. Int…

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month