Joe Winograd
PaperPort is a popular document imaging/management product from Nuance Communications. It is in widespread use by both individuals and businesses.

The current version of PaperPort is 14. The previous version was 12 (yes, Nuance got superstitious and skipped 13). Both of these most recent versions come in two editions, Professional and Standard. All four products — PP12 Standard, PP12 Professional, PP14 Standard, PP14 Professional — have the ability to create a searchable PDF file without any other software needing to be installed. PP12 was the first release that could do this (and it was carried forward into PP14).

Prior PaperPort releases require Nuance's OmniPage (a separately priced OCR product) to be installed in order to create a searchable PDF file, which PaperPort calls a PDF Searchable Image file because it contains both the raster image and the text created by OCR. PP12 and PP14 can create a PDF Searchable Image file on their own because they contain the OmniPage OCR engine under the covers, via the OmniPage Capture Software Development Kit (CSDK).
Sidebar on PaperPort Version: If you are running PP12.0, I recommend that you upgrade (free!) to PP12.1. This EE article explains how to do it:
PaperPort 12 - Free Upgrade to Version 12.1
If you are running PP14.0, PP14.1, or PP14.2, I recommend that you upgrade (free!) to PP14.5 (there was not a public release for either 14.3 or 14.4). This EE article explains how to do it:
PaperPort 14 - Free Upgrade to Version 14.5
End of Sidebar

There are three ways to create a PDF Searchable Image file in PP12 and PP14 — scanning, converting (via Save As), and printing (to the PaperPort Image Printer):

(1) To scan directly to a PDF Searchable Image file, create a Scanning Profile in the Scan or Get Photo pane, click on the Output tab, and select PDF Searchable Image in the File type drop-down:


(2) To convert a file to a PDF Searchable Image file, right-click the item on the PaperPort Desktop, click Save As... from the context menu, and select PDF Searchable Image in the Save as type drop-down:


(3) To print to a PDF Searchable Image file, print to the PaperPort Image Printer in any Windows program. For example, here's the print dialog for printing a TIFF file from MS Paint to the PP Image Printer, thereby creating a PDF Searchable Image file:


However, you must first configure the PaperPort Image Printer to an output type of PDF Searchable Image file, not PDF Image. To do this in PP12 and PP14, click the Desktop menu, then the Desktop Options button on the ribbon, then the Item tab, and select PDF Searchable Image in the PaperPort Image Printer file type drop-down:


It is important to note that printing to the PP Image Printer creates a raster image (bitmap/graphic), which then has to go through the OCR process in order to create text. If your source document already has text, such as a typical web page or Word file, this is generally not the right technique for creating a PDF: there's no reason to go from text to an image and then back to text again via OCR.

The better technique is to print to a PDF print driver that goes from the source text straight to text in the PDF file, creating what's known as a PDF Normal file. PaperPort installs such a driver that has had various names over the years, including DocuCom, Nuance PDF, and ScanSoft PDF Create.

This driver is in addition to, and different from, the PP Image Printer. It is similar to other PDF print drivers that create a PDF Normal file (straight text-to-text, i.e., no OCR), such as Adobe PDF (Distiller), part of an Adobe Acrobat installation, as well as many free ones, including Bullzip, CutePDF Writer, doPDF, Foxit Reader PDF Printer (part of the Foxit Reader install), Nitro PDF Creator (part of the Nitro Reader install), PDFCreator, PDF-XChange Printer (part of the PDF-XChange Editor install), and PrimoPDF.

In summary, when scanning paper, you must scan to an image and have PaperPort invoke OCR to create a PDF Searchable Image file (which it does automatically via a Scanning Profile). Likewise, when converting an image-only file, such as a BMP, JPG, PNG, [image-only] PDF, or TIFF, to a PDF Searchable Image file, you must also have PaperPort invoke OCR to create it (which it does automatically via Save As). But when printing to a PDF file, you should print to the PP Image Printer only if the source document is a raster image (bitmap/graphic); if it isn't, then it's better to print to one of the other PDF print drivers mentioned above.
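For readers who like to see a rule of thumb written out, the summary above can be sketched as a small decision function. This is only an illustration of the reasoning, not anything PaperPort exposes: the `Source` enum, the `choose_technique` function, and the returned strings are all hypothetical names made up for this sketch.

```python
from enum import Enum


class Source(Enum):
    """Hypothetical categories for what you are starting from."""
    PAPER = "paper"                      # a physical page on the scanner
    IMAGE_ONLY_FILE = "image-only file"  # BMP, JPG, PNG, image-only PDF, TIFF
    TEXT_DOCUMENT = "text document"      # e.g. a typical web page or Word file


# Labels for the routes described in the article (strings are illustrative).
SCAN = "Scan with a Scanning Profile set to PDF Searchable Image"
SAVE_AS = "Save As with type PDF Searchable Image"
IMAGE_PRINTER = "Print to the PaperPort Image Printer (set to PDF Searchable Image)"
PDF_NORMAL = "Print to a PDF Normal print driver (no OCR needed)"


def choose_technique(source: Source, printing: bool = False) -> str:
    """Return the route the summary suggests for a given starting point."""
    if source is Source.PAPER:
        # Paper must become an image first, then go through OCR.
        return SCAN
    if source is Source.IMAGE_ONLY_FILE:
        # A raster image still needs OCR, whether converted or printed.
        return IMAGE_PRINTER if printing else SAVE_AS
    # The source already has text: skip the text -> image -> text round trip.
    return PDF_NORMAL


if __name__ == "__main__":
    for src in Source:
        print(f"{src.value}: {choose_technique(src, printing=True)}")
```

The only branch that avoids OCR is the last one, which is the point of the summary: OCR is for sources that have no text yet.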

Two important variables that affect OCR accuracy are Mode (Black&White, Grayscale, Color) and Resolution (DPI, dots per inch). For typical business documents, I recommend B&W (monochrome/1-bit) and 300 DPI. This generally results in a reasonable file size and accurate OCR. On rare occasions, I'll use B&W and 400 or 600 DPI, but in many cases, 600 DPI (counterintuitively) results in less accurate OCR. On other rare occasions, I'll use Grayscale (8-bit) and either 200 or 300 DPI, which sometimes results in more accurate OCR. To learn more about Mode and Resolution when scanning, I recommend Wayne Fulton's excellent site, A few scanning tips. In particular, look at the section that discusses OCR, Scanning Line art.
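To get a feel for why Mode and Resolution matter so much for file size, here is a rough back-of-the-envelope calculation of the raw (uncompressed) image data for one page. It is a sketch under stated assumptions: a US Letter page (8.5 x 11 inches), 24 bits per pixel for Color (the article gives bit depths only for B&W and Grayscale), and no allowance for row padding or for the compression that real PDF and TIFF files apply. The function name `uncompressed_bytes` is hypothetical, and actual files will be much smaller than these figures.

```python
def uncompressed_bytes(width_in: float, height_in: float,
                       dpi: int, bits_per_pixel: int) -> int:
    """Raw image data for one page, ignoring row padding and compression."""
    width_px = round(width_in * dpi)    # pixels across
    height_px = round(height_in * dpi)  # pixels down
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: US Letter, 8.5 x 11 inches.
LETTER = (8.5, 11)

bw_300 = uncompressed_bytes(*LETTER, dpi=300, bits_per_pixel=1)     # 1,051,875
gray_300 = uncompressed_bytes(*LETTER, dpi=300, bits_per_pixel=8)   # 8,415,000
color_300 = uncompressed_bytes(*LETTER, dpi=300, bits_per_pixel=24) # 25,245,000 (24-bit is an assumption)
bw_600 = uncompressed_bytes(*LETTER, dpi=600, bits_per_pixel=1)     # 4,207,500

if __name__ == "__main__":
    for label, size in [("B&W 300 DPI", bw_300), ("Grayscale 300 DPI", gray_300),
                        ("Color 300 DPI", color_300), ("B&W 600 DPI", bw_600)]:
        print(f"{label}: {size:,} bytes")
```

Before compression, Grayscale carries eight times the data of B&W at the same resolution, and doubling the DPI quadruples the pixel count, which is one reason B&W at 300 DPI is a sensible default for typical business documents.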

In PaperPort, you may set the Mode and Resolution in all three methods for creating a PDF Searchable Image:





That's it! Three easy ways to create PDF Searchable Image files in PaperPort 12 and PaperPort 14.

