Editor's Choice: This article has been selected by our editors as an exceptional contribution.

Batch Conversion of PDF, TIFF, and Other Image Formats via Command Line Interface to PDF, PDF Searchable, and TIFF with Power PDF Advanced

Joe WinogradDeveloper
CERTIFIED EXPERT
50+ years in computers
Development•Sales
CIO•Document Imaging
EE — FELLOW 2017
MVE 2015,2016,2018
RENOWNED 2018,2019
CERTIFIED GOLD 2020
Published:
Updated:
Edited by: Andrew Leniart
This article explains how to perform batch conversion of PDF, TIFF, and other image file formats into PDF, PDF Searchable, and TIFF files via a command line interface, using Power PDF Advanced (originally from the Document Imaging Division of Nuance, now from Kofax).

Power PDF was from the Document Imaging division of Nuance Communications, now from Kofax (see 3-Dec-2019 article update below). It is available in two editions — Power PDF Standard and Power PDF Advanced. Kofax offers a free 30-day trial of the Advanced edition, available for download here:


https://www.kofax.com/Products/power-pdf/advanced-free-trial


And the Standard edition, available for download here:


https://www.kofax.com/Products/power-pdf/standard-free-trial


Everything in this article about Version 1.0 is based on my experience with the free trial of the Advanced edition running in W7 Pro 64-bit; everything about Version 1.1 and later is based on the licensed Power PDF Advanced, also on W7 Pro 64-bit.


Disclaimer before going further: I have no affiliation with Kofax or Nuance and no financial interest in them whatsoever. I am simply a happy user/customer of several of the document imaging products, including OmniPage, PaperPort, and PDF Converter (the latter is being replaced by Power PDF).


Update on 22-Feb-2015: The initial submission of this article was about version 1.0 of Power PDF Advanced. Nuance recently released version 1.1, but since EE members may still have 1.0, I decided not to delete the information about 1.0, but instead added a section at the bottom of the article about 1.1.

End of update.


Update on 18-Jun-2015: Although undocumented in the Help text, I just discovered that the input files may be other image file types, including GIF, JPG, and PNG (the Help text documents only PDF and TIF).

End of update.


Update on 23-Aug-2017: Since my previous article update about version 1.1, Nuance released 1.2 in February 2016, 2.0 in June 2016, and 2.1 in May 2017. Some key points about these versions:


(1) The list of OCR languages is identical in all five releases (1.0, 1.1, 1.2, 2.0, 2.1).


(2) The Help text in both 1.2 and 2.0 is identical to the Help text in 1.1, which is shown below in the 1.1 section of the article.


(3) The Help text in 2.1 has three differences from 1.1/1.2/2.0, namely:

• the -T output type parameter has an additional choice: TXT (thanks to member Amine BEN DHIAB for reporting this in the comments section below on 16-Jun-2017)

• the -V version parameter for PDF output has five additional choices: X1A, X3, X4, X4P, PDFE

• there's a new -Y option for PDF output, which means convert to gray

In terms of big differences between the Advanced edition of 1.x and 2.x, here's a Nuance article discussing it:
What's new in Power PDF Advanced 2.0?

End of update.


Update on 9-Aug-2018: Since my previous article update about versions 1.2, 2.0, and 2.1, Nuance released Version 3.0 on 5-Jun-2018. As with my prior article updates, I did not delete information about the earlier versions, instead adding a section at the bottom of the article about 3.0.

End of update.


Update on 3-Dec-2019: The Nuance Document Imaging Division, which is responsible for Power PDF, was acquired by Kofax. The press release announcing it is here:

Kofax Announces the Closing of its Acquisition of Nuance Document Imaging

Kofax recently released the first version of Power PDF under its stewardship — 3.1. As with my prior article updates, I did not delete information about the earlier versions, instead adding a section at the bottom of the article about 3.1.

End of update.


The Graphical User Interface (GUI) of Power PDF Advanced (PPA) utilizes the Ribbon interface, made popular by Microsoft in Office 2007 and continued in all Office releases since then. The Home section of PPA's Ribbon looks like this:


PPA Home Ribbon

The Advanced Processing section of the Ribbon has a Batch Converter function:

 

PPA Batch Converter GUI

This is an excellent feature that provides for batch conversion of PDF and TIFF files into PDF, Searchable PDF (via built-in OCR), TIFF, and Multipage TIFF. But as I was putting the product through its paces during the 30-day trial, I wondered if there is a Command Line Interface (CLI) for batch conversion. Nothing is documented in the offline or online help about a CLI for the batch converter, and Nuance hasn't (yet) released a user guide/manual for the product. So I took a look at the installation folder "c:\Program Files (x86)\Nuance\Power PDF\" to see if anything jumped out with respect to a CLI for batch conversion, and sure enough, found "BatchConverter.exe" and "BatchConverter.com" executables. At first I thought they were there to support the [Batch>Batch Controls>Batch Converter] item shown on the Ribbon (in the screenshot above), but then I decided to run "BatchConverter" in a Command Prompt in the off-chance that it provides a CLI. Jackpot! I gave it the standard "/?" parameter for help and, lo and behold, this came back (turns out that "-?" also works):


(See important note after this Help text — spoiler alert: batchconverter.exe is wrong — batchconverter.com is correct.)


Usage:


batchconverter.exe -I<inputfile> -O<outputfile> -T<output type>
[-A<PDF/A format>] [-R] [-Q] [-P<page number>]
[-S<colorspace>] [-D<resolution>] [-Cm<cmpmono>]
[-Cg<cmpgray>] [-Cc<cmpcolor>] [-L<lang1, lang2, lang3>]
[-J<char>] [-K] [-Ao] [-As] [-?] [-F<logfile>]


-I  input file full path. * can be used for filename (*.pdf, *.tif)

 -O  output file or folder full path.

 -T  output type: TIF|PDF|PDFS

 -A  PDF/A format: 1A|1B|2A|2B|2U|3A|3B|3U. Default:Non-PDF/A

 -R  recursive processing, subfolders under the input will be processed too. The input folder structure will be created under the output folder.

 -Q  quiet processing, no password dialogs will be showed thus password protected pdf files will not be converted successfully.

 -P  process single page

 -?  Shows usage info and the list of languages + 3letter language identifiers.

 Tiff output specific parameters

 -S  colorspace: auto|mono|gray|rgb. Default: auto.

 -D  resolution: auto|72|96|150|200|300|600|1200|2400. Default: auto.

 -Cm  image compression for monochrome images: none|g3|g4|lzw. Default: g4.

 -Cg  image compression for grayscale images: none|LZW|JpegMin|JpegLow|JpegMed|JpegHigh|JpegMax. Default: LZW.

 -Cc  image compression for color images: none|LZW|JpegMin|JpegLow|JpegMed|JpegHigh|JpegMax. Default: LZW.

 Searchable PDF output specific parameters

 -L  Language(s) for OCR. Comma delimited list of 3letter language identifiers (-Leng,ger,fre,nor). If not presented, auto is used.

 -J  Reject char. Default: ~.

 -K  If presented, conversion keeps original images.

 -Ao  If presented, pages are oriented and straightened automatically.


That's not the end of the help output — it continues with a list of 123 languages for OCR support. To avoid scrolling for many screens in this article, I removed the list of languages from the help output above and placed it in the attached text file called "Power-PDF-Advanced-OCR-Languages.txt".


Important Note:   BatchConverter.exe is the Batch Converter GUI, while BatchConverter.com is the Batch Converter CLI. If you are in a Command Prompt in the folder containing both files and enter the command BatchConverter (that is, without the .com or .exe extension) the CLI will run and work as expected. However, if you enter the command BatchConverter.exe, it will not run the CLI. It won't even recognize the /? or -? parameter. The syntax in the help output shown in the "Usage:" section above is incorrect! It says the command is batchconverter.exe, but it should say batchconverter.com.


I then ran numerous tests and it worked flawlessly! Here's an example that converts an image-only PDF to a searchable PDF (-TPDFS), keeping the images in the output file (-K), and using English for the OCR (-Leng):


batchconverter -Id:\testin\testimage.pdf -Od:\testout\testsearch.pdf -TPDFS -K -Leng


All the parameters provide extremely useful features, but in my opinion, these stand out:

 

 Creation of a PDF Searchable file (the -TPDFS option), with an additional option (-K) that keeps the images.

 

 Processing subfolders to an unlimited depth and automatically replicating the input folder structure under the output folder, called recursive processing (the -R option).

 

 A wide variety of compression methods for TIFF files, with separate methods for monochrome (-Cm), grayscale (-Cg), and color (-Cc) images.

 

 An amazing choice of languages for OCR (the -L option).

As mentioned above, I was not able to find any documentation on the CLI, except for the output from the -? parameter. Everything I know about the Batch Converter CLI is from the command itself — and experimentation with it. So if anyone reading this article learns anything interesting about it, please post it here for all to see.


In conclusion, PPA's CLI provides capabilities that cannot be achieved in the GUI. It may be called from batch files, scripts, and custom programs. I'm looking forward to using it and I hope this article documenting it helps some other EE members. If you do find this article to be helpful, please endorse it by clicking the thumbs-up icon at the end of the article. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe


Update for Version 1.1

This is the Help text for Version 1.1 (see important note after this output) :


Usage:


batchconverter.com -I<inputfile> -O<outputfile> -T<output type>
[-R] [-Q] [-P<page number>] [-S<colorspace>]
[-D<resolution>] [-Cm<cmpmono>] [-Cg<cmpgray>]
[-Cc<cmpcolor>] [-V<version>] [-W] [-G] [-E<sequence name>]
[-L<lang1, lang2, lang3>] [-J<char>] [-K] [-Ao] [-As] [-?] [-F<logfile>]


-I  input file full path. * can be used for filename (*.pdf, *.tif)

 -O  output file or folder full path.

 -T  output type: TIF|PDF|PDFS

 -R  recursive processing, subfolders under the input will be processed too. The input folder structure will be created under the output folder.

 -Q  quiet processing, no password dialogs will be showed thus password prote cted pdf files will not be converted successfully.

 -P  process single page

 -?  Shows usage info and the list of languages + 3letter language identifiers.

 Tiff output specific parameters

 -S  colorspace: auto|mono|gray|rgb. Default: auto.

 -D  resolution: auto|72|96|150|200|300|600|1200|2400. Default: auto.

 -Cm  image compression for monochrome images: none|g3|g4|lzw. Default: g4.

 -Cg  image compression for grayscale images: none|LZW|JpegMin|JpegLow|JpegMed|JpegHigh|JpegMax. Default: LZW.

 -Cc  image compression for color images: none|LZW|JpegMin|JpegLow|JpegMed|JpegHigh|JpegMax. Default: LZW.

 PDF output specific parameters

 -V  Version: 1A|1B|2A|2B|2U|3A|3B|3U|1.3|1.4|1.5|1.6|1.7|auto

 -W  Optimize for web viewing

 -G  Remove tags

 -E  Execute sequence

 Searchable PDF output specific parameters

 -L  Language(s) for OCR. Comma delimited list of 3letter language identifiers (-Leng,ger,fre,nor). If not presented, auto is used.

 -J  Reject char. Default: ~.

 -K  If presented, conversion keeps original images.

 -Ao  If presented, pages are oriented and straightened automatically.


Important Note:  Although not stated in the Help output above (which shows only PDF and TIF), I discovered through experimentation that the input file type may also be GIF, JPG, JPEG, PNG, and TIFF. In other words, the -I (Input) parameter should say this:


-I  input file full path. * can be used for filename (*.gif, *.jpg, *.jpeg *.pdf, *.png, *.tif, *.tiff)


I tried BMP, but that did not work. If readers discover other file types that work, please post a comment to let us know.


As with 1.0, the 1.1 Help output includes the long list of languages, which I omitted above. I compared the 1.0 and 1.1 language lists — they are identical.


Note that the syntax error documented in the 1.0 portion of the article has been fixed in 1.1, that is, the Usage output now shows the correct  batchconverter.com (rather than the incorrect  batchconverter.exe of 1.0).


One very important bug that is relevant to this article has been fixed in 1.1, namely, the -P parameter (to process a single page) in conjunction with an output type of PDF or PDFS. This did not work in 1.0 — it created an output PDF file with all the pages of the input file rather than just the one page specified (it worked if the output type was TIF). The -P parameter now works for all output types in 1.1.


The Release Notes for 1.1, including a list of new features, a list of known issues, and a list of bug fixes, are in a PDF file called Nuance Power PDF 1.1 Release Notes in the Files section of this wiki.


Update for Version 3.0

This is the Help text for Version 3.0:


Usage:

batchconverter.com -I<inputfile> -O<outputfile> -T<output type>
[-R] [-Q] [-P<page number>] [-S<colorspace>]
[-D<resolution>] [-Cm<cmpmono>] [-Cg<cmpgray>]
[-Cc<cmpcolor>] [-V<version>] [-W] [-G] [-E<sequence name>]
[-L<lang1, lang2, lang3>] [-J<char>] [-K] [-Ao] [-?] [-F<logfile>]

-I      input file full path. * can be used for filename (*.pdf, *.tif)

-O      output file or folder full path.

-T      output type: TIF|PDF|PDFS|TXT|MTIF

-R      recursive processing, subfolders under the input will be processed too. The input folder structure will be created under the output folder.

-Q      quiet processing, no password dialogs will be showed thus password protected pdf files will not be converted successfully.

-P      process single page

-?      Shows usage info and the list of languages + 3letter language identifiers.

Tiff output specific parameters

-S      colorspace: auto|mono|gray|rgb. Default: auto.

-D      resolution: auto|72|96|150|200|300|600|1200|2400. Default: auto.

-Cm     image compression for monochrome images: none|g3|g4|lzw. Default: g4.

-Cg     image compression for grayscale images: none|LZW|JpegMin|JpegLow|JpegMed|JpegHigh|JpegMax. Default: LZW.

-Cc     image compression for color images: none|LZW|JpegMin|JpegLow|JpegMed|JpegHigh|JpegMax. Default: LZW.

PDF output specific parameters

-V      Version: 1A|1B|2A|2B|2U|3A|3B|3U|1.3|1.4|1.5|1.6|1.7|2.0|X1A|X3|X4|X4P|PDFE|auto

-W      Optimize for web viewing

-G      Remove tags

-E      Execute sequence

-Y      Convert to gray

Searchable PDF output specific parameters

-L      Language(s) for OCR. Comma delimited list of 3letter language identifiers (-Leng,ger,fre,nor). If not presented, auto is used.

-J      Reject char. Default: ~.

-K      If presented, conversion keeps original images.

-Ao     If presented, pages are oriented and straightened automatically.


The list of OCR languages has not changed — it is identical to all previous 1.x and 2.x versions.


Update for Version 3.1

This is the Help text for Version 3.1:


Usage:

batchconverter.com -I<inputfile> -O<outputfile> -T<output type>
[-R] [-Q] [-P<page number>] [-S<colorspace>]
[-D<resolution>] [-Cm<cmpmono>] [-Cg<cmpgray>]
[-Cc<cmpcolor>] [-V<version>] [-W] [-G] [-E<sequence name>]
[-L<lang1, lang2, lang3>] [-J<char>] [-K] [-Ao] [-?] [-F<logfile>]

-I      input file full path. * can be used for filename (*.pdf, *.tif)

-O      output file or folder full path.

-T      output type: TIF|PDF|PDFS|TXT|MTIF

-R      recursive processing, subfolders under the input will be processed too. The input folder structure will be created under the output folder.

-Q      quiet processing, no password dialogs will be showed thus password protected pdf files will not be converted successfully.

-P      process single page

-?      Shows usage info and the list of languages + 3letter language identifiers.

Tiff output specific parameters

-S      colorspace: auto|mono|gray|rgb. Default: auto.

-D      resolution: auto|72|96|150|200|300|600|1200|2400. Default: auto.

-Cm     image compression for monochrome images: none|g3|g4|lzw. Default: g4.

-Cg     image compression for grayscale images: none|LZW|JpegMin|JpegLow|JpegMed|JpegHigh|JpegMax. Default: LZW.

-Cc     image compression for color images: none|LZW|JpegMin|JpegLow|JpegMed|JpegHigh|JpegMax. Default: LZW.

PDF output specific parameters

-V      Version: 1A|1B|2A|2B|2U|3A|3B|3U|1.3|1.4|1.5|1.6|1.7|2.0|X1A|X3|X4|X4P|PDFE|auto

-W      Optimize for web viewing

-G      Remove tags

-E      Execute sequence

-Y      Convert to gray

Searchable PDF output specific parameters

-L      Language(s) for OCR. Comma delimited list of 3letter language identifiers (-Leng,ger,fre,nor). If not presented, auto is used.

-J      Reject char. Default: ~.

-K      If presented, conversion keeps original images.

-Ao     If presented, pages are oriented and straightened automatically.


The list of OCR languages has not changed — it is identical to all previous 1.x, 2.x, and 3.x versions.


If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe


Power-PDF-Advanced-OCR-Languages.txt


8
13,691 Views
Joe WinogradDeveloper
CERTIFIED EXPERT
50+ years in computers
Development•Sales
CIO•Document Imaging
EE — FELLOW 2017
MVE 2015,2016,2018
RENOWNED 2018,2019
CERTIFIED GOLD 2020

Comments (21)

Joe WinogradDeveloper
CERTIFIED EXPERT
Fellow
Most Valuable Expert 2018

Author

Commented:
Hi Chris,
First, thanks for joining Experts Exchange today and reading my article — I appreciate it!

The input file cannot be an HTML file type. The Help output, even in the latest version 2.1 of Power PDF Advanced, still says this:

-I input file full path. * can be used for filename (*.pdf, *.tif)

As I mentioned in the article, I discovered through experimentation that the input file type may also be GIF, JPG, JPEG, PNG, and TIFF. But I just tried HTML in both v2.0 and v2.1, and can confirm that the input file may not be HTML. As you saw yourself, it gives an error message that says "File open error." My advice is to open the HTML file in whatever web browser you prefer and then print it to whatever PDF print driver you prefer. Once you have the PDF file, run Power PDF again, this time using the PDF file as the input instead of the HTML file. Of course, you may not even need to do that if you're happy with the file from the PDF printer. Regards, Joe
Great article - thanks!

Just tried it in version 2.1 and it works fine.

I used:

C:\Program Files (x86)\Nuance\Power PDF 21>batchconverter -I"D:\ScanSoft Documents\The_Boys\BEN\Buy & Sell\*.pdf" -O"D:\_ScanOut\The_Boys\BEN\Buy & Sell" -TPDFS -K -R -Ao -Leng -Fd:\_ScanOut\log.txt

So it works with spaces and 'odd' characters like '& - ampersand' and can create a log file too - which if it exists is appended too on each run.
Joe WinogradDeveloper
CERTIFIED EXPERT
Fellow
Most Valuable Expert 2018

Author

Commented:
Hi David,
Thanks for the compliment...much appreciated! I'm glad to hear that it works for you in V2.1. Thanks for sharing the command line that you used...very helpful to see working examples like that. Yes, the logfile is a nice feature. Btw, the reason that it works with the spaces is that you enclosed the file name in quotes, which is something that you have to do with Windows command line params when there are spaces in the file name. As far as the "odd" characters (like ampersand), it will work with any valid Windows file name. The only "odd" characters that are not allowed in folder/file names (other than in the drive/path and as wildcards for a few of them) are these:

< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)

If you take a moment to endorse the article by clicking the thumbs-up icon at the end of the article (not the one underneath this comment), I'll appreciate it. Regards, Joe
Done! - I've found some of your other stuff on Paperport v. useful in the past, so kudos to the guru!!
Joe WinogradDeveloper
CERTIFIED EXPERT
Fellow
Most Valuable Expert 2018

Author

Commented:
Thanks, David, I appreciate the compliments and the article endorsements. Regards, Joe

View More

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.

Get access with a 7-day free trial.
Continue Growing Your Skills and Your Career
  • Interact with leading experts on your specific technology problems.
  • Receive the guidance of experienced professionals.
  • Learn from troubleshooting others have experienced.
  • Gain knowledge from a library of courses, all included.