
PaperPort - How To Create Searchable PDF Files

Joe Winograd, EE MVE 2015&2016
50+ yrs in computer industry. Everything from programming to sales. OS kernel dev on mainframes. CIO. Document imaging. EE MVE 2015 & 2016.
PaperPort is a popular document imaging/management product from Nuance Communications. It is in widespread use by both individuals and businesses.

The current version of PaperPort is 14. The previous version was 12 (yes, Nuance got superstitious and skipped 13). Both of these most recent versions come in two editions, Professional and Standard. All four products — PP12 Standard, PP12 Professional, PP14 Standard, PP14 Professional — have the ability to create a searchable PDF file without any other software needing to be installed. PP12 was the first release that could do this (and it was carried forward into PP14).

Earlier PaperPort releases require Nuance's OmniPage (a separately priced OCR product) to be installed in order to create a searchable PDF file, which PaperPort calls a PDF Searchable Image file because it contains both the raster image and the text created by OCR. PP12 and PP14 can create a PDF Searchable Image file on their own because they contain the OmniPage OCR engine under the covers, via the OmniPage Capture Software Development Kit (CSDK).
 
Sidebar on PaperPort Version: If you are running PP12.0, I recommend that you upgrade (free!) to PP12.1. This EE article explains how to do it:
PaperPort 12 - Free Upgrade to Version 12.1
If you are running PP14.0, PP14.1, or PP14.2, I recommend that you upgrade (free!) to PP14.5 (there was not a public release for either 14.3 or 14.4). This EE article explains how to do it:
PaperPort 14 - Free Upgrade to Version 14.5
End of Sidebar

There are three ways to create a PDF Searchable Image file in PP12 and PP14 — scanning, converting (via Save As), and printing (to the PaperPort Image Printer):

(1) To scan directly to a PDF Searchable Image file, create a Scanning Profile in the Scan or Get Photo pane, click on the Output tab, and select PDF Searchable Image in the File type drop-down:


Scan-to-PDF-Searchable-Image.jpg

(2) To convert a file to a PDF Searchable Image file, right-click the item on the PaperPort Desktop, click Save As... from the context menu, and select PDF Searchable Image in the Save as type drop-down:


Save-As-to-PDF-Searchable-Image.jpg

(3) To print to a PDF Searchable Image file, print to the PaperPort Image Printer in any Windows program. For example, here's the print dialog for printing a TIFF file from MS Paint to the PP Image Printer, thereby creating a PDF Searchable Image file:


Print-to-PDF-Searchable-Image.jpg

However, you must first configure the PaperPort Image Printer to an output type of PDF Searchable Image file, not PDF Image. To do this in PP12 and PP14, click the Desktop menu, then the Desktop Options button on the ribbon, then the Item tab, and select PDF Searchable Image in the PaperPort Image Printer file type drop-down:


Configure-PP-Image-Printer-to-PDF-Search

It is important to note that printing to the PP Image Printer creates a raster image (bitmap/graphic), which then has to go through the OCR process in order to create text. If your source document already has text, such as a typical web page or Word file, this is generally not the right technique for creating a PDF: there's no reason to go from text to an image and then back to text again via OCR.

The better technique is to print to a PDF print driver that goes from the source text straight to text in the PDF file, creating what's known as a PDF Normal file. PaperPort installs such a driver that has had various names over the years, including DocuCom, Nuance PDF, and ScanSoft PDF Create.

These drivers are in addition to, and different from, the PP Image Printer. They are similar to other PDF print drivers that create a PDF Normal file (straight text-to-text, i.e., no OCR), such as Adobe PDF (Distiller), which is part of an Adobe Acrobat installation, as well as many free ones, including Bullzip, CutePDF Writer, doPDF, Foxit Reader PDF Printer (part of the Foxit Reader install), Nitro PDF Creator (part of the Nitro Reader install), PDFCreator, PDF-XChange Printer (part of the PDF-XChange Editor install), and PrimoPDF.

In summary, when scanning paper, you must scan to an image and have PaperPort invoke OCR to create a PDF Searchable Image file (which it does automatically via a Scanning Profile). Likewise, when converting an image-only file, such as a BMP, JPG, PNG, [image-only] PDF, or TIFF, to a PDF Searchable Image file, you must also have PaperPort invoke OCR to create it (which it does automatically via Save As). But when printing to a PDF file, you should print to the PP Image Printer only if the source document is a raster image (bitmap/graphic); if it isn't, then it's better to print to one of the other PDF print drivers mentioned above.
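The decision rule in the summary above can be sketched as a small Python helper. This is only an illustration, not anything PaperPort provides: the function name, the return strings, and the pdf_is_image_only flag are all hypothetical, and the extension list simply mirrors the image-only formats named above (plus the common .jpeg and .tif spellings, which are an assumption).

```python
from pathlib import Path

# Image-only formats named in the article; ".jpeg" and ".tif" are assumed aliases.
RASTER_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}

# Hypothetical labels for the three outcomes described in the summary.
SCAN = "scan with a Scanning Profile set to PDF Searchable Image"
OCR_IMAGE = "convert via Save As, or print to the PaperPort Image Printer"
PDF_NORMAL = "print to a PDF Normal print driver (no OCR needed)"


def recommend_method(source: str, pdf_is_image_only: bool = False) -> str:
    """Suggest how to produce a searchable PDF for a given source.

    source is either the literal string "paper" or a file name.
    pdf_is_image_only must be supplied by the caller for PDF sources,
    because the file name alone cannot reveal whether a PDF has text.
    """
    if source == "paper":
        return SCAN

    ext = Path(source).suffix.lower()
    if ext in RASTER_EXTENSIONS or (ext == ".pdf" and pdf_is_image_only):
        # Raster input: OCR is the only way to obtain text.
        return OCR_IMAGE

    # Anything else is treated as text-bearing (web page, Word file, ...).
    # The article does not discuss PDFs that already contain text; this
    # sketch simply lets them fall through to this branch.
    return PDF_NORMAL


if __name__ == "__main__":
    for item in ("paper", "receipt.tiff", "letter.docx"):
        print(f"{item}: {recommend_method(item)}")
```

The helper only encodes the rule of thumb; deciding whether an existing PDF is image-only still has to be done by looking at the file.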

Two important variables that affect OCR accuracy are Mode (Black&White, Grayscale, Color) and Resolution (DPI - dots per inch). For typical business documents, I recommend B&W (monochrome/1-bit) and 300 DPI. This generally results in a reasonable file size and accurate OCR. On rare occasions, I'll use B&W and 400 or 600 DPI, but in many cases, 600 DPI (counterintuitively) results in less accurate OCR. On other rare occasions, I'll use Grayscale (8-bit) and either 200 or 300 DPI, which sometimes results in more accurate OCR. To learn more about Mode and Resolution when scanning, I recommend Wayne Fulton's excellent site, A few scanning tips. In particular, look at the section that discusses OCR, Scanning Line art.
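To get a feel for how Mode and Resolution drive file size, here is a rough Python sketch that computes the uncompressed size of one scanned page. The assumptions are mine, not PaperPort's: a US Letter page (8.5 x 11 inches), no row padding, and no compression. Real PDF files will differ, since whatever compression is applied changes the numbers, but the relative proportions are still instructive.

```python
def uncompressed_bytes(width_in: float, height_in: float,
                       dpi: int, bits_per_pixel: int) -> int:
    """Raw raster size in bytes for one page, ignoring padding and compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # (label, dpi, bits per pixel) for the settings mentioned in the article.
    settings = [
        ("B&W 300 DPI", 300, 1),
        ("B&W 600 DPI", 600, 1),
        ("Grayscale 200 DPI", 200, 8),
        ("Grayscale 300 DPI", 300, 8),
    ]
    for label, dpi, bpp in settings:
        size = uncompressed_bytes(8.5, 11, dpi, bpp)
        print(f"{label}: {size:,} bytes")
```

Under these assumptions, B&W at 300 DPI comes to 1,051,875 bytes per page. Doubling the resolution to 600 DPI quadruples that, and 8-bit Grayscale at 300 DPI is eight times larger, which is one reason B&W at 300 DPI is a sensible default for typical business documents.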

In PaperPort, you may set the Mode and Resolution in all three methods for creating a PDF Searchable Image:

Scanning

Mode-Resolution-scanning.jpg
Converting

Mode-Resolution-converting.jpg
Printing

Mode-Resolution-printing.jpg

That's it! Three easy ways to create PDF Searchable Image files in PaperPort 12 and PaperPort 14.

If you find this article to be helpful, please click the thumbs-up icon below. This lets me know what is valuable for EE members and provides direction for future articles. Thanks very much! Regards, Joe
6 Comments
 

Expert Comment

by:Jony Green
If you would rather not install OCR software on your computer, you can try this free online OCR tool.

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Jony,

What can you tell us about that company? Do you work for it?

I checked the domain at URLVoid, which says that it was created just eight days ago:

online-code domain report
Of course, every website had its first day of existence, so that isn't necessarily a bad thing, but I'd like to hear your thoughts on the company/site.

A web search for "free online OCR" turns up several hits of established companies, such as:
http://finereaderonline.com/

The ABBYY FineReader software is excellent OCR and this website from the ABBYY folks uses the same OCR engine. Furthermore, the URLVoid domain report shows that it was first registered nearly eight years ago and has no reported safety issues. It does, however, have a monthly page limitation.

Another example is Online OCR:
http://www.onlineocr.net/

The URLVoid domain report for it shows that it was first registered more than seven years ago and has no reported safety issues.

Those are just two examples. There are many others.

In the interest of informing (and protecting) our EE members, I'm looking forward to hearing back from you about this new company/site. Regards, Joe
 

Expert Comment

by:Don Green
Wow. I reinstalled PaperPort 14 on a new computer build, so I knew that printing to PaperPort can create a searchable PDF, but spent about an hour stumbling around trying to figure out / remember how to do that. Of course, now I feel a little stupid because it seemed obvious once I followed your instructions. Somewhere along the way I had a setting so that printing to PaperPort created a horribly OCR'd document, then replaced the actual document with gibberish text. It's stopped doing that, and I don't want to go back and figure out how I made it do that and how I made it stop. But, people who find your article are lucky people. Nuance was going to charge me $10 for this "simple" answer since I purchased more than $90. I don't know how many people your article helps, but I know it's a beautifully done article that left me feeling enormous gratitude.

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Don,
You're very welcome. And thanks to you for joining EE today (welcome aboard!), as well as reading and endorsing my article — I really appreciate it! I'm glad you found it helpful. Regards, Joe
 

Expert Comment

by:Serg __
Any ideas how to make the fonts vectorized in the searchable .pdf? I am asking this question because I would not like to install a pirated Adobe Acrobat to convert one pdf book into a pdf book with vectorized fonts. What I got from PaperPort did not meet my expectations. The fonts got blurry. I expected them to get clean and vectorized, to be able to zoom in without those annoying pixels.

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Serg,
Thank you for joining Experts Exchange this week and reading my article.

> Any ideas how to make the fonts vectorized in the searchable .pdf?

I do not have great expertise in font technology and am not aware of any way to control the font settings when PaperPort creates PDF Searchable Image files via the methods discussed in this article.

> I am asking this question because I would not like to install a pirated Adobe Acrobat to convert one pdf book into a pdf book with vectorized fonts.

I find that a strange comment — why would you even consider installing pirated software? We do not condone that here at Experts Exchange and, in fact, the Experts Exchange Terms of Use strictly prohibit any posting related to such activities (under Section 6, Code of Conduct). If you know that Adobe Acrobat will solve your font issue, and it is for only one PDF book, then I recommend purchasing just one month of Adobe Acrobat DC. For around 25 bucks, you'll avoid pirating software ($22.99 for one month of Acrobat Standard DC or $24.99 for one month of Acrobat Pro DC).

> What I got from PaperPort did not meet my expectations. The fonts got blurry.

It's likely that the fonts are blurry only when viewing the image layer. If you view just the text layer, the fonts should be fine. For example, I printed the first page of this article with the PaperPort Image Printer in B&W at 300 DPI to a PDF Image (not PDF Searchable Image). The whole page is attached as a PDF, but here's what it looks like:

font in image
The fonts, indeed, are blurry, because that's a view of the image (in Adobe Acrobat). I then used Nuance's Power PDF to convert to a searchable PDF, but told it not to keep the images. The whole page for that is also attached as a PDF, but here's the same small sample as shown above:

font in non-image
The fonts look great, because that's a view of the text (in Adobe Acrobat), since there is no image layer in the PDF.

> I expected them to get clean and vectorized, to be able to zoom in without those annoying pixels.

The fonts are fine in the text, as shown above. They get pixelated only when viewing the image layer. Another way to observe this is to Copy the text from the PDF Searchable Image file (created by PaperPort via one of the methods explained in this article) and then Paste it into a text-capable product, such as Notepad or Word — the fonts will, of course, appear fine. Regards, Joe
image-only-PaperPort-PDF-Image.pdf
text-only-Power-PDF-searchable-do-no.pdf