OCR

Optical character recognition (OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text. It is widely used as a form of data entry from printed paper data records, including passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any suitable documentation. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech, key data and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.

PaperPort 14.5 Patch 1 update is often not detected or downloaded automatically. This article provides direct download links to solve the problem for retail (non-bundled) versions of the Standard and Professional editions, as well as the Professional edition in Nuance's own OmniPage Ultimate bundle.

Expert Comment

by:FRANCO SOTOMAYOR
I am considering the possibility of upgrading from PP9Pro to PP14.5Pro but after reading about all the problems with this software, I am looking for other more compatible options with Windows 10.   You need to be a guru or geek to be able to install and get a program to work because it has not been duly nor properly customized for us non-techies. This is really scary to say the least!
FrancoS

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Franco,
First, thanks for joining Experts Exchange today and reading my article — welcome aboard!
> upgrading from PP9Pro to PP14.5Pro
I used PP9Pro for many years — was one of the best PaperPort releases ever! I was reluctant to "upgrade" from it, but eventually went to PP10 (terrible version), then 11, 12, and 14 (as I'm sure you know, Nuance was superstitious and did not have a PP13). I'm now running PP14.5Pro/Patch1 and am happy with it. My production machines are still W7, but I do have PP14.5Pro/Patch1 in a W10 sandbox where it runs fine, although it is not in heavy use there. For PaperPort users on W10, I suggest reading another one of my EE articles and following the Tips in it:
PaperPort 14 in Windows 10 - A First Look
> looking for other more compatible options with Windows 10
I can't recommend anything from personal experience because, as I mentioned above, my production computers are still W7 and I have relatively little experience with other such products on W10. However, I've heard some users in the PaperPort community say that they're now successfully using Lucion's FileCenter and Tracker's PDF-XChange PRO in W10, so you may want to take a look at those. I've done three 5-minute video Micro Tutorials here at EE on the free version of PDF-XChange:

How to rotate pages in a PDF with free software
How to OCR pages in a PDF with free software
How to password-protect a PDF with free software

Also, Nuance has another product in the PDF space called Power PDF, which comes in both Standard and Advanced editions. I've published an article and two 5-minute video Micro Tutorials here at EE on the Advanced edition:

Batch Conversion of PDF, TIFF, and Other Image Formats via Command Line Interface to PDF, PDF Searchable, and TIFF with Power PDF Advanced
Bates Stamping/Numbering of PDF Files with Power PDF Advanced
Convert Scanned Image-Only PDF Files to PDF Searchable Image Files via OCR with Power PDF Advanced
> You need to be a guru or geek to be able to install and get a program to work because it has not been duly nor properly customized for us non-techies.
I hear you on that, although it may be a bit of an overstatement. I don't think that you have to be a "guru" or "geek" to download and install PP14.5 and Patch1. There are certainly equally challenging issues for "non-techies" with other products, including Windows itself and Microsoft Office.
> This is really scary to say the least!
I don't think that it's any scarier than other stuff that we do on our Windows-based computers all the time — I hold my breath on every Windows update. :)  I know that some folks who use W10 in production were terrified of the Version 1709 update (aka the Fall Creators Update) — talk about scary! :)

In any case, good luck with your efforts. Please post back here with your experiences. Regards, Joe

PaperPort is among the most important applications that I run on my Windows computers. I use it every day, for nearly all of my document and photo scanning, as well as most of my document and photo imaging, including OCR via its built-in OmniPage capabilities.

Disclaimer before going further: I have no affiliation with Nuance and no financial interest in it whatsoever. I am simply a happy user/customer.

I've been using PaperPort for around 20 years on every version of Windows since Windows 95. With the Windows 10 release date coming up in two days, I thought it would be worthwhile to document my experience with PaperPort on the Windows 10 Technical Preview, including some tips for successful deployment on W10.

First, my experience with the various builds along the way: I did not install PaperPort on the initial Windows 10 Technical Preview of Build 9841, released on 30-Sep-2014. But I installed on every build after that, from 9860 through the current 10240. The platform is physical hardware, not a virtual machine. It is a relatively old laptop with mediocre specs by today's standards:

Intel Core2 Duo T9300 2.50GHz
4GB RAM DDR2 PC5300
Samsung SSD 840 EVO 250GB (with the read performance firmware upgrade

Author Comment

by:Joe Winograd, EE MVE 2015&2016
You're welcome, Dana. I'm hoping for a PP15 (or maybe just PP14.6) that Nuance certifies and officially supports for W10. It's even possible that Nuance would certify PP14.5 for W10. I'll post here as soon as I learn anything official from Nuance about PP in W10. In the meantime, there's been some discussion about it in the Google PaperPort Group (and its PaperPort wiki). You may want to check in on that. Regards, Joe

Author Comment

by:Joe Winograd, EE MVE 2015&2016
In a comment above, I mentioned that the Patch 1 update did not work with the PaperPort Professional 14.5 that is included with Nuance's own OmniPage Ultimate bundle. I am pleased to report that Nuance has finally fixed this, although there is still a glitch in the process such that, in some cases, the Common Software Update Manager does not perform the update, even though it detects its existence. The solution, once again, is to get the installer from a direct download link. I just published an article here at EE explaining the method:
How to install the Patch 1 update for the PaperPort Professional 14.5 bundled with OmniPage Ultimate

The article also shows a list of the support tickets that Patch 1 fixes. Regards, Joe
In a previously published article here at Experts Exchange, I explained how to achieve duplex (double-sided) scanning in Nuance's PaperPort software with a hardware-capable duplex scanner, that is, a scanner which has an Automatic Document Feeder (ADF) capable of scanning both sides of a document. A recent question here at EE prompted me to write this additional article, which explains how to achieve duplex scanning in PaperPort with a simplex scanner, that is, a scanner whose ADF is capable of scanning only the front side of a document.

As with the previous article, this one applies to the three most recent versions of PaperPort, i.e., 11, 12, and 14 — yes, Nuance got superstitious and did not release a version 13.

Here are the steps to achieve duplex scanning in PaperPort (either Standard or Professional) with a simplex scanner:
 
  • Click the Scan Settings button on the Ribbon in PP12 and PP14, or the Scan or Get Photo icon on the toolbar in PP11. You will now have the Scan or Get Photo pane:

Scan-or-Get-Photo.jpg 
  • Select a Scanner and a Scanning Profile.
 
  • Tick the Show Capture Assistant box.
 
  • Place the document in the (simplex) ADF and click the Scan button.
 
  • In PaperPort Standard, you will get this:

front-side-PP-Std.jpg 
  • In PaperPort Professional, you will get this:

front-side-PP-Pro.jpg 
  • Remove the document from the output tray, turn it over so that the last page is on the top, place it in the ADF, and click the Scan Other Side button.
 
  • In PaperPort Standard, you will get this:

after-Scan-Other-Side-PP-Std.jpg 
  • In
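
The page ordering behind the Scan Other Side step can be sketched in a few lines of Python. This is only an illustration of the idea, not how PaperPort implements it: the function name and the page labels are made up for the example, and it assumes the stack was turned over as described above, so the back of the last sheet is scanned first.

```python
def interleave_duplex(fronts, backs):
    """Merge two simplex passes into reading order.

    fronts: front sides in the order scanned (first sheet first).
    backs:  back sides in the order scanned after turning the stack over,
            so the back of the LAST sheet comes first (assumption).
    """
    if len(fronts) != len(backs):
        raise ValueError("both passes must contain the same number of sheets")
    merged = []
    # Reverse the second pass so it lines up sheet by sheet with the first.
    for front, back in zip(fronts, reversed(backs)):
        merged.append(front)
        merged.append(back)
    return merged


# Hypothetical three-sheet document: pages 1, 3, 5 are fronts; 6, 4, 2 are backs.
print(interleave_duplex(["p1", "p3", "p5"], ["p6", "p4", "p2"]))
```

If the stack were fed without turning it over, the backs would arrive in the wrong order and the reversal step would scramble the result, which is why the turn-over instruction matters.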
PaperPort is a popular document imaging/management product from Nuance Communications. It is in widespread use by both individuals and businesses.

The current version of PaperPort is 14. The previous version was 12 (yes, Nuance got superstitious and skipped 13). Both of these most recent versions come in two editions, Professional and Standard. All four products — PP12 Standard, PP12 Professional, PP14 Standard, PP14 Professional — have the ability to create a searchable PDF file without any other software needing to be installed. PP12 was the first release that could do this (and it was carried forward into PP14).

Prior PaperPort releases require Nuance's OmniPage (a separately priced OCR product) to be installed in order to create a searchable PDF file, which PaperPort calls a PDF Searchable Image file (because it contains both the raster image and the text created by OCR). The reason that PP12 and PP14 can create a PDF Searchable Image file on their own is that they contain the OmniPage OCR engine under the covers, via the OmniPage Capture Software Development Kit (CSDK).
 
Sidebar on PaperPort Version: If you are running PP12.0, I recommend that you upgrade (free!) to PP12.1. This EE article explains how to do it:
PaperPort 12 - Free Upgrade to Version 12.1
If you are running PP14.0, PP14.1, or PP14.2, I recommend that you upgrade (free!) to PP14.5 (there was not a public release for either 14.3 or 14.4). This EE article explains how to do it:

Expert Comment

by:Serg __
Any ideas how to make the fonts vectorized in the searchable .pdf? I am asking this question because I would not like to install a pirated Adobe Acrobat to convert one pdf book into a pdf book with vectorized fonts. What I got from PaperPort did not meet my expectations. The fonts got blurry. I expected them to get clean and vectorized, to be able to zoom in without those annoying pixels.

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Serg,
Thank you for joining Experts Exchange this week and reading my article.

> Any ideas how to make the fonts vectorized in the searchable .pdf?

I do not have great expertise in font technology and am not aware of any way to control the font settings when PaperPort creates PDF Searchable Image files via the methods discussed in this article.

> I am asking this question because I would not like to install a pirated Adobe Acrobat to convert one pdf book into a pdf book with vectorized fonts.

I find that a strange comment — why would you even consider installing pirated software? We do not condone that here at Experts Exchange and, in fact, the Experts Exchange Terms of Use strictly prohibit any posting related to such activities (under Section 6, Code of Conduct). If you know that Adobe Acrobat will solve your font issue, and it is for only one PDF book, then I recommend purchasing just one month of Adobe Acrobat DC. For around 25 bucks, you'll avoid pirating software ($22.99 for one month of Acrobat Standard DC or $24.99 for one month of Acrobat Pro DC).

> What I got from PaperPort did not meet my expectations. the fonts got blurry.

It's likely that the fonts are blurry only when viewing the image layer. If you view just the text layer, the fonts should be fine. For example, I printed the first page of this article with the PaperPort Image Printer in B&W at 300 DPI to a PDF Image (not PDF Searchable Image). The whole page is attached as a PDF, but here's what it looks like:

font in image
The fonts, indeed, are blurry, because that's a view of the image (in Adobe Acrobat). I then used Nuance's Power PDF to convert to a searchable PDF, but told it not to keep the images. The whole page for that is also attached as a PDF, but here's the same small sample as shown above:

font in non-image
The fonts look great, because that's a view of the text (in Adobe Acrobat), since there is no image layer in the PDF.

> I expected them to get clean and vectorized, to be able to zoom in without those annoying pixels.

The fonts are fine in the text, as shown above. They get pixelated only when viewing the image layer. Another way to observe this is to Copy the text from the PDF Searchable Image file (created by PaperPort via one of the methods explained in this article) and then Paste it into a text-capable product, such as Notepad or Word — the fonts will, of course, appear fine. Regards, Joe
image-only-PaperPort-PDF-Image.pdf
text-only-Power-PDF-searchable-do-no.pdf
PaperPort
I. Introduction

In a previous article (now deprecated), I discussed how to upgrade — at no cost for licensed users — Nuance's PaperPort Version 11 (hereafter, PP11) and PaperPort Version 12 (PP12) to the latest "point" releases, namely, 11.2 and 12.1. At the time of that article's publication, PP11 and PP12 were the two latest versions. Now the latest version is PP14 (yes, Nuance was superstitious and skipped 13), and its latest "point" release is 14.5.

I decided that adding PP14 to the previous article would result in a long, unwieldy article. In addition, a user of one version is not going to be concerned about the other two versions, so I decided to create three separate articles for PP11, PP12, and PP14 users. This is the PP14 one.

The earlier point releases of PP14 — 14.0, 14.1, 14.2 (there was not a public release for either 14.3 or 14.4) — are known to have bugs that were fixed in 14.5. This article provides links to 14.5, as well as other useful information on upgrading.

II. Comparison of Standard and Professional Editions

For PP14, there are two consumer editions – Standard and Professional. The feature comparison matrix is available in the Files section of this PaperPort wiki:
http://sites.google.com/site/wikipaperport/files

Here is a direct link to the PDF:
Comparison Matrix of PP14 Standard and PP14 Professional

III. Links to Downloads

The links are to a direct download

Expert Comment

by:npercival
Solved the slow launching issue. PP does not like custom scaling. Just set it to 125 percent and PP opens quickly. Found the suggestion on Google.
My PDF issue was solved by changing the file type on my scanner, which is in my HP all-in-one 8600.

down to slow launching of Fax from HP when I drag a doc to the send to bar.

Jim

Author Comment

by:Joe Winograd, EE MVE 2015&2016
> solved the slow launching issue

Good to know that solved it in your case, and it may certainly work for some other folks, but keep in mind that there are many reasons for it, and a solution in one case may not work in another case (the classic "YMMV" principle).

> PP does not like custom scaling. Just set it to 125 percent and PP opens quickly.

Do you mean the scaling in Windows, i.e., Control Panel>Display, where the choices are Smaller (100%), Medium (125%), Larger (150%)? Or do you mean the Custom Resolution feature in PP's Options>PDF Rendering Resolution? Or do you mean something else?

> My PDF issue was solved by changing the file type on my scanner which is in my HP all in one 8600.

I take that to mean you use the native TWAIN scanner dialog of the 8600, i.e., you ticked the "Display scanner dialog box" in PP's Scan or Get Photo pane. That brings up the scanner's TWAIN dialog where you can set the output file type (e.g., PDF) along with other options. I'm glad that works for you, but I much prefer to set all of the scanning options in PP's scanning profiles so that I get one-click scanning. I have the "Display scanner dialog box" un-ticked, so I don't have to change the file type (or any other options) in the scanner's native dialog. A single click on the Scan button in PP results in scanning with all of the presets in the profile, such as file type (e.g., PDF Searchable Image), resolution (e.g., 300), mode (e.g., B&W), source (e.g., duplex ADF), etc.

> slow launching of Fax from HP when I drag a doc to the send to bar

Probably an HP Fax issue, as the PP14 Send to Bar launches apps quickly (at least, that's been my experience, for both standard apps and custom apps). Regards, Joe
I. Introduction

In a previous article (now deprecated), I discussed how to upgrade — at no cost for licensed users — Nuance's PaperPort Version 11 (hereafter, PP11) and PaperPort Version 12 (PP12) to the latest "point" releases, namely, 11.2 and 12.1. At the time of that article's publication, PP11 and PP12 were the two latest versions. Now the latest version is PP14 (yes, Nuance was superstitious and skipped 13), and its latest "point" release is 14.5.

I decided that adding PP14 to the previous article would result in a long, unwieldy article. In addition, a user of one version is not going to be concerned about the other two versions, so I decided to create three separate articles for PP11, PP12, and PP14 users. This is the PP12 one.

The earlier point release of PP12 — 12.0 — is known to have bugs that were fixed in 12.1. The links in the previous article for 12.1 no longer work. This new article provides working links for 12.1, as well as other useful information on upgrading.

II. Comparison of Standard and Professional Editions

For PP12, there are two consumer editions – Standard and Professional. The feature comparison matrix is available in the Files section of this PaperPort wiki:
http://sites.google.com/site/wikipaperport/files

Here is a direct link to the PDF:
Comparison Matrix of PP12 Standard and PP12 Professional

III. New Links to Downloads

The new links are to a direct download
I. Introduction

In a previous article (now deprecated), I discussed how to upgrade — at no cost for licensed users — Nuance's PaperPort Version 11 (hereafter, PP11) and PaperPort Version 12 (PP12) to the latest "point" releases, namely, 11.2 and 12.1. At the time of that article's publication, PP11 and PP12 were the two latest versions. Now the latest version is PP14 (yes, Nuance was superstitious and skipped 13), and its latest "point" release is 14.5.

I decided that adding PP14 to the previous article would result in a long, unwieldy article. In addition, a user of one version is not going to be concerned about the other two versions, so I decided to create three separate articles for PP11, PP12, and PP14 users. This is the PP11 one.

The earlier point releases of PP11 — 11.0 and 11.1 — are known to have bugs that were fixed in 11.2. Although the links in the previous article for 11.2 still work, Nuance informed me that they may soon stop working. This new article provides working links for 11.2 that Nuance says will continue to work after the other ones have been taken down. This article also provides other useful information on upgrading.

II. Comparison of Standard and Professional Editions

For PP11, there are two consumer editions – Standard and Professional. The feature comparison matrix is available in the Files section of this PaperPort wiki:
http://sites.google.com/site/wikipaperport/files

Here is a direct link to the PDF:

Expert Comment

by:Becky Hanlon
The update wants a serial number, which I do not have as I lost my software CD that came with my MFC-7340 Brother printer.  Is there a way to install the update without this?

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Becky,
Thank you for joining Experts Exchange today and reading my article — welcome aboard! The PaperPort software that comes with Brother MFCs is an "SE" version, which stands for Special Edition. It is a trimmed-down version that Brother OEMs from Nuance for bundling with their devices and it is not considered to be a commercial/retail edition. This means that updates of the software, such as the 11.2 upgrade discussed in this article, will likely not apply to those bundled SE versions. So even if you had the serial number, it is unlikely that the 11.2 upgrade would work on it.

The other issue is that PP11 is more than 10 years old. As noted in this article, Vista is the latest Windows on which PP11 is supported. My suggestion is to purchase a retail copy of the latest version of PaperPort, which is 14. It is currently $31.61 at Amazon:
https://www.amazon.com/dp/B005CELKLM

That's the standard edition, not Professional, but it's probably more functional than the SE version that was bundled with your Brother MFC. You may want to wait for a better price, as I've seen it at Amazon for less. The download (or disk) is going to be version 14.0, but you may upgrade it for free to version 14.5, because it is a retail version. This comment that I posted at an EE question a couple of months ago explains the upgrade process, referring to several other articles that I've published here at EE:
https://www.experts-exchange.com/questions/29057949/Window-10-version.html#a42302130

Interesting to note that it was just $19.20 at Amazon back then. Once again, welcome to Experts Exchange! Regards, Joe
PaperPort
This article discusses the PaperPort 14 Scanner Connection Tool, which Nuance provides at no charge in order to fix scanning problems in Windows 8. Furthermore, users of PaperPort 14 in Windows 7 and Windows 10 have reported that the tool works in those versions of Windows, too.

Expert Comment

by:Gregory Parsons
This did not help. Since the last auto-update in Windows 10, PaperPort Professional 14.5 won't recognize one of my two scanners. Neither your site nor the manufacturer's latest drivers have been of any help, which is especially baffling as Nuance is the software bundled with the scanner. The Xerox DocuMate 152i is no longer listed as a compatible scanner.

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Gregory,

First, I see that you joined EE today — welcome aboard! I'll do my best to try to help you.

Do you have Patch 1 installed? If not, I recommend doing that, as PP14.5/Patch1 is the only Windows 10-compliant version of PaperPort. This EE article explains how to get it (at no cost):
How to install the Patch 1 update for PaperPort 14.5

Reinstall the Scanner Connection Tool after installing Patch 1. If it still doesn't work, read this other EE article:
PaperPort 14 in Windows 10 - A First Look

Perhaps some of the Tips in there will help.

Of course, having a driver that works in your version of Windows (including bit level — 32-bit or 64-bit) is critical. I checked the Xerox website for your DocuMate 152i and it shows that it has drivers for W10, including ISIS, TWAIN, and WIA. You may download them from here:
http://www.xeroxscanners.com/en/us/products/drivers.asp?PN=97-0084-00U

Note this comment at that site: "You must uninstall your current driver and OneTouch software to install an updated driver."

I much prefer ISIS drivers over TWAIN and WIA, and PP14.5 fully supports ISIS, so try that first (nothing wrong with TWAIN and WIA, so try them, too). Also, reinstall the Scanner Connection Tool after reinstalling the scanning drivers. Regards, Joe
Power PDF Advanced
This article explains how to perform batch conversion of PDF, TIFF, and other image file formats into PDF, PDF Searchable, and TIFF files via a command line interface, using Nuance's latest document imaging software — Power PDF Advanced.

Expert Comment

by:Chris S
Hello,

I tried to use the command line, but I always get the error message "File open error", although I have write access to all directories stated in the command. It always says "Converting H:\TEST.HTML TO H:\TEST.PDF", which I think means that the syntax itself is correct but either the input or output file cannot be opened?

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Chris,
First, thanks for joining Experts Exchange today and reading my article — I appreciate it!

The input file cannot be an HTML file type. The Help output, even in the latest version 2.1 of Power PDF Advanced, still says this:

-I input file full path. * can be used for filename (*.pdf, *.tif)

As I mentioned in the article, I discovered through experimentation that the input file type may also be GIF, JPG, JPEG, PNG, and TIFF. But I just tried HTML in both v2.0 and v2.1, and can confirm that the input file may not be HTML. As you saw yourself, it gives an error message that says "File open error." My advice is to open the HTML file in whatever web browser you prefer and then print it to whatever PDF print driver you prefer. Once you have the PDF file, run Power PDF again, this time using the PDF file as the input instead of the HTML file. Of course, you may not even need to do that if you're happy with the file from the PDF printer. Regards, Joe
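
For readers who script their batch runs, a pre-flight check along these lines may save a failed conversion. It is a rough Python sketch, not part of Power PDF: the function name is hypothetical, and the accepted extensions are simply the ones listed in the comment above (PDF and TIF from the Help text, plus GIF, JPG, JPEG, PNG, and TIFF found by experimentation).

```python
from pathlib import PureWindowsPath

# Extensions taken from the comment above; anything else (e.g. HTML) is
# expected to fail with "File open error".
ACCEPTED_INPUT_TYPES = {".pdf", ".tif", ".tiff", ".gif", ".jpg", ".jpeg", ".png"}


def split_by_input_type(paths):
    """Return (accepted, rejected) lists of Windows paths, judged by extension only."""
    accepted, rejected = [], []
    for path in map(PureWindowsPath, paths):
        if path.suffix.lower() in ACCEPTED_INPUT_TYPES:
            accepted.append(path)
        else:
            rejected.append(path)
    return accepted, rejected


ok, bad = split_by_input_type([r"H:\TEST.HTML", r"H:\SCAN.TIF", r"H:\report.pdf"])
print("convert:", [p.name for p in ok])
print("print to PDF first:", [p.name for p in bad])
```

The check looks only at the file name, so a mislabeled file would still get through; it is meant as a quick filter before calling the converter, nothing more.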
Update 21-May-2015: I temporarily removed the source code to make major changes to the program. Regards, Joe

INTRODUCTION

This article presents a solution to a question asked here at Experts Exchange. The situation is that there's a large number of subfolders (400 in the original question), each of which has a number of PDF files (two in the original question). The goal is to combine/merge the PDF files in each subfolder (in ascending date order) into a single PDF file, storing the combined file in each subfolder. The source PDF files in each subfolder may have any file names and the user should be able to specify the file name of the combined file.

REQUIRED SOFTWARE

The method presented in this article requires AutoHotkey, an excellent (free!) programming/scripting language. The quick explanation for installing AutoHotkey is to visit its website. A more comprehensive explanation is to read my EE article, AutoHotkey - Getting Started. After installation, AutoHotkey will own the AHK file type, supporting the solution discussed in the remainder of this article.

The program utilizes another excellent (free!) piece of software — PDF Toolkit (PDFtk). It comes in both command line and GUI versions. The command line version is called PDFtk Server
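
As a rough illustration of the approach described in the Introduction, here is a short sketch in Python rather than AutoHotkey (the article's program is not reproduced here). It only builds the PDFtk command lines and does not run them. Several things are assumptions made for the example: "date order" is taken to mean modification time, `combined.pdf` is a placeholder output name, and `pdftk` is assumed to be on the PATH. The `cat` and `output` keywords are PDFtk's own syntax for concatenating files.

```python
import os
from pathlib import Path


def build_merge_commands(root, output_name="combined.pdf", pdftk="pdftk"):
    """Build one PDFtk 'cat' command per subfolder of root.

    Source PDFs in each subfolder are ordered by ascending modification
    time (an assumption about what "date order" means).
    """
    commands = []
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        sources = [
            f for f in sub.iterdir()
            if f.is_file()
            and f.suffix.lower() == ".pdf"
            and f.name != output_name  # do not merge a previous result into itself
        ]
        if len(sources) < 2:
            continue  # nothing to combine in this subfolder
        sources.sort(key=lambda f: (os.path.getmtime(f), f.name))
        commands.append(
            [pdftk, *map(str, sources), "cat", "output", str(sub / output_name)]
        )
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the commands look right. Building the list first makes a dry run easy on a folder tree with hundreds of subfolders.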

Expert Comment

by:Centex Aps
Hi

Will the "Combine-Merge-PDF-files-20140826.ahk"  file not be attached again?

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Centex,
I've decided not to post the full program. I'll be rewriting the article as a "design roadmap" with some crucial code snippets, such as how to call PDFtk Server, but will not be posting the complete source code. Regards, Joe

The standard (non-Professional) edition of PaperPort from Nuance Communications (previously known as ScanSoft) is limited to five Scanning Profiles, but in a previous article, I discussed how to overcome this limitation. The technique presented in that article may also be used to address an issue that I've been asked many times by PaperPort users, namely, how to reorder the Scanning Profiles in the Scan or Get Photo pane.

For users with many Scanning Profiles, it is desirable to order the list such that the more frequently used ones are at the top. Unfortunately, PaperPort 12, the previous release, offers no ability to rearrange the order of the Scanning Profiles. PaperPort 14, the current release (yes, Nuance got superstitious and skipped 13), added a little-known, undocumented feature that helps: drag-and-drop of the Scanning Profiles. However, it works poorly, in my opinion, as I find it difficult to drop the profile exactly where I want it.

Also, in both PP12 and PP14, there is no ability to search for a Scanning Profile, so the user must scroll through the list to find the desired profile. This article presents an approach for reordering the list, using the same method presented in my previous article, PaperPort - How To Achieve More Than Five Scanning Profiles in the Standard Edition.

All of the screenshots in this article are from PaperPort Professional 14
PaperPort is a popular document imaging/management product from Nuance Communications, previously known as ScanSoft. PaperPort is in widespread use by both individuals and businesses.

The current version of PaperPort is 14. The previous version was 12. Yes, Nuance got superstitious and skipped 13. Both of these most recent versions come in two editions, Professional and Standard, although the Nuance folks do not call it Standard – they simply leave Professional off the name, i.e., PaperPort 12 and PaperPort Professional 12; PaperPort 14 and PaperPort Professional 14. In this article, I refer to them as PP-Std and PP-Pro, and all such references are valid for versions 12 and 14.

There are numerous differences between PP-Std and PP-Pro. The comparison matrices may be seen in the Files section at this PaperPort wiki in these files:

Comparison Matrix of PP12 Standard and PP12 Professional.pdf
Comparison Matrix of PP14 Standard and PP14 Professional.pdf

As shown in the documents above, one of the differences between PP-Std and PP-Pro is that the former allows only five Scanning Profiles to be created, while the latter allows an unlimited number. However, it turns out that PP-Std will properly handle an unlimited number of Scanning Profiles. The problem is that it won't let you create them. This is easy to overcome by creating the file containing the Scanning Profiles outside of PP-Std. This article describes two ways to do it.


Expert Comment

by:mapline
Hi Joe
Great suggestion. Two comments:
1. My PP 14.5 Std stores the file in C:\ProgramData\Nuance\PaperPort\14\Profiles.xml (Windows 10).
2. Notepad++ is a great free app for viewing/editing XML files.
Many thanks

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Michael,
Sorry I'm just replying to your 25-Mar-2016 comment now. I don't recollect seeing it when it first came in and only just now saw it when I received a notification that you endorsed the article today — btw, thanks for that!

> My PP 14.5 std stores file in C:\ProgramData\Nuance\PaperPort\14\Profiles.xml (Windows 10)

You will also find it at C:\Users\All Users\Nuance\PaperPort\14\Profiles.xml in W10. That's because C:\Users\All Users\ points to C:\ProgramData\. In other words, C:\ProgramData\ is the "real" folder and C:\Users\All Users\ is simply a pointer to it — technically known as a junction or symbolic link. So if you look at C:\ProgramData\ and C:\Users\All Users\ in your file manager, they'll show the identical contents, because they are one-and-the-same folder.
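
Anyone who wants to confirm this on their own machine could use a quick check like the following Python sketch. The helper name is made up for the example; `os.path.samefile` is a standard-library call that follows links before comparing, so it should report whether two paths end up at the same folder.

```python
import os


def is_same_folder(path_a, path_b):
    """Return True if both paths exist and resolve to the same folder on disk."""
    try:
        return os.path.samefile(path_a, path_b)
    except OSError:
        # One of the paths is missing or not accessible.
        return False


# On a W10 machine these are expected to be the same folder; on any other
# system the paths simply will not exist and the check reports False.
print(is_same_folder(r"C:\ProgramData", r"C:\Users\All Users"))
```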

> Notepad++ great free app for viewing/editing xml files.

I have Notepad++ installed and agree that it is a great free app, although I use it only for test purposes, since I do all of my text editing with my fav text editor that I've been using forever. But thanks for the tip to our readers! Regards, Joe
PaperPort is a popular document management/imaging product from Nuance Communications. It is in widespread use by both individuals and businesses. The current version of PaperPort is 14 (previous version was 12 – Nuance got superstitious and skipped 13). This Article documents how PP14 finally solved a nasty duplex scanning problem that has plagued PaperPort since the introduction of the Blank page is job separator capability in PP10.

The problem is that a blank back side of a page will act as a job separator during a duplex scan. This is extremely bad, since most double-sided documents have some single-sided pages, and they will terminate the document – not what you want! It makes the Blank page is job separator capability practically worthless for users doing duplex scanning. In other words, if you are using a duplex scanner and a page in the stack is not blank on the front, but is blank on the back, this should not be considered as a separator page. In the case of duplex scanning, a page should be blank on both sides in order for it to be treated as a separator page. Otherwise, you'll get what should be a single document broken into separate PaperPort items if that document happens to have some single-sided and some double-sided pages.
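The rule described above (in a duplex scan, only a sheet that is blank on both sides should act as a separator) can be sketched in a few lines of Python. This is an illustration of the logic only, not how PaperPort implements it; the function name and the `(front_is_blank, back_is_blank)` representation are assumptions made for the example.

```python
from typing import List, Tuple

# One scanned sheet: (front_is_blank, back_is_blank). Hypothetical representation.
Sheet = Tuple[bool, bool]


def split_duplex_stack(sheets: List[Sheet]) -> List[List[int]]:
    """Group sheet indexes into documents.

    A sheet ends the current document only when BOTH sides are blank.
    A sheet with a blank back but a printed front stays in the document.
    """
    documents: List[List[int]] = []
    current: List[int] = []
    for index, (front_blank, back_blank) in enumerate(sheets):
        if front_blank and back_blank:
            # True separator sheet: close the current document, drop the sheet.
            if current:
                documents.append(current)
                current = []
            continue
        current.append(index)
    if current:
        documents.append(current)
    return documents
```

For a stack of a double-sided sheet, a single-sided sheet, a fully blank sheet, and another single-sided sheet, this keeps the first two sheets together as one document and puts the last sheet in a second document.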

This "bug" (Nuance called it a "feature" when I reported it) existed in PP10, PP11, and PP12 (as mentioned above, there was no PP13). Nuance finally fixed it in PP14 with the addition of a new sub-option in the Settings for a
Expert Comment

by:donjud
Joe,
I realize this is a bit off topic, but you have been so helpful that I wanted to ask you first. I am also new to Experts Exchange and wasn't sure if I should start a new Topic.
The folder we have added to PaperPort is on our server so that it can be accessed and modified by multiple users in our office. The problem we have run into is that when someone changes or adds a document and adds it to the All-in-One search index, the new document is still not indexed on the other computers, so each user has to re-index the folder in order to find it using the All-in-One (AIO) search. We have set PaperPort to index every night, but we would like the document to be indexed on all of the computers as soon as the changes are made.
Is there a setting that would automatically index all incoming or modified documents in the PaperPort folder, or is there a way to run PaperPort on the server? We have spent several days trying to come up with a solution to this problem and haven't had any luck.
Thanks,
J.D.
LVL 57

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi J.D.,
I was working on a (lengthy!) reply to your same question in the message system, which I just sent (before seeing this). Regards, Joe
Update 21-May-2015: I temporarily removed the source code and the code snippets to make major changes to the program. Regards, Joe

INTRODUCTION

This Article is a follow-up to the Article entitled How To Rename-Move a Batch of PDF Files Based on Contents of the Files, recently published here at Experts Exchange.

I considered adding the new feature (splitting a single document into multiple documents) to that Article and program, but concluded that it is a significant enough enhancement to warrant a new Article and program.

PREVIOUS ARTICLE

To understand this Article, it will be helpful to read the previous Article, but to get things going here right away, here's a summary of the previous problem and solution.

There is a large batch of PDF files, all with cryptic names, such as [D123456.PDF]. Inside each file on the first line of the first page (always starting at a fixed column and running to the end of the line) is a human-friendly identifier for the file, such as [John Smith]. The requirement is to loop through all of the files in a specified folder in an automated fashion, changing the file names from, for example,

D123456.PDF

to

D123456 John Smith.PDF

That is, add the identifier from the first line of the first page to the file name.

NEW REQUIREMENT

Following publication of the previous Article and the program that implements the solution, the Original Poster (OP) of the question that prompted the Article
Expert Comment

by:Member_2_7970298
how do I obtain a copy of the autohotkey script?
LVL 57

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi New Member,

When I removed the source code last year from six articles that I published here at EE, my intention was that the removal be temporary. I began a project to rewrite all of the programs in my portfolio in order to generalize them for a broader audience and to have a standard user interface, including both a GUI (graphical user interface) and, where it makes sense, a CLI (command line interface). It wound up being a much larger effort than I anticipated, and I'm still not ready to post or distribute the source code for this program (or any of the other five published at EE — and I don't know when or even if that will be, for a variety of reasons).

I have created customized versions of these various programs for EE members who became clients of mine. I provided licenses for the run-time programs (the executables, i.e., the compiled EXE files) for an agreed-upon fee, but I did not provide the source code. I did this previously when EE had the "Hire Me" button, but that no longer exists. The mechanism now at EE for such work is the new Gigs feature, if that interests you.

Regards, Joe
Update 21-May-2015: I temporarily removed the source code and the code snippets to make major changes to the program. Regards, Joe

A recent question here at Experts Exchange piqued my interest, so I decided to provide a thorough solution and publish this Article about it. The Original Poster (OP) of the question has approximately one thousand PDF files containing 7-character sequential alphanumeric file names (and, of course, all of the file extensions are PDF). Although the OP did not state this, it is likely that the sequential alphanumerics represent unique identifiers for his customers, perhaps customer numbers. The alphanumeric file name is cryptic, in no way identifiable with the customer, so the OP would like the file name to contain the customer name in addition to the number. For example, a file might be named:

D123456.PDF

The OP would like this file to be renamed:

D123456 John Smith.PDF

The customer name always begins in column 16 on the first line of the first page in the PDF file (and runs to the end of the line). The OP wants an automated way to rename the thousand PDF files, based on the customer name in the contents of each file – in essence, a batch/mass rename. The program documented in this Article (and provided in source code) performs this function.
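The renaming rule itself is simple enough to sketch. The Python below is only an illustration of the rule as described above, not the AutoHotkey program from this Article; the function names are hypothetical, and it assumes the first line of the first page has already been extracted as plain text. It does not handle characters that are illegal in Windows file names.

```python
from pathlib import Path


def customer_name(first_line: str, column: int = 16) -> str:
    """Return the text from the given 1-based column to the end of the line."""
    return first_line[column - 1:].strip()


def new_file_name(old_name: str, first_line: str, column: int = 16) -> str:
    """Build 'D123456 John Smith.PDF' from 'D123456.PDF' and the first line."""
    path = Path(old_name)
    name = customer_name(first_line, column)
    if not name:
        # Nothing usable on the first line: leave the file name unchanged.
        return old_name
    return f"{path.stem} {name}{path.suffix}"
```

With a first line of "Customer Name: John Smith" (where the name starts in column 16), `new_file_name("D123456.PDF", ...)` yields "D123456 John Smith.PDF".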

Two excellent freeware products are needed for this solution – the AutoHotkey scripting language (the program is written in this) and the Xpdf package to convert the PDF …
Expert Comment

by:Member_2_7970298
Joe,
Thanks for your response.
I appreciate your comments & issues.
I would greatly appreciate it if you could see your way clear to send me your original AutoHotKey script.
I'm trying to learn more about AutoHotKey scripts and especially how it interfaces with Xpdf's pdftotext.exe
Thanks
LVL 57

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Member_2_7970298 (???),

I received your email at my personal email address, which I'll respond to in a moment. I already responded to your post at the AHK forum, which led you to this article, and then to my Split-Rename-Move article. Instead of three different communication venues (EE, AHK, email), let's continue this discussion via just email.

That said, a quick note on your comments: the Tutorials forum and the Scripts and Functions forum at the AHK boards are the way to go "to learn more about AutoHotKey scripts" (as well as the Tutorial at the AHK docs site).

There's not much to learn about "how it interfaces with Xpdf's pdftotext.exe" — the RunWait command is it. Here's an actual call from one of my programs:

RunWait,%pdftotextEXE% -f 1 -l 1 -raw "%FullFileNameCurrent%" "%DestinationFolder%%FileNameCurrentTXT%"


I'm sure from the names of the variables you can figure out what that line does. Also, I gave you links at the AHK forum to my two 5-minute EE video Micro Tutorials that should help you with learning about how to use the pdftotext.exe tool:
Xpdf - Command Line Utility for PDF Files
Xpdf - Convert PDF Files to Plain Text Files

If you haven't viewed them yet, I think you'll find them to be a worthwhile expenditure of 10 minutes. Regards, Joe
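For readers who script in Python rather than AutoHotkey, the same pdftotext call can be sketched as follows. This is a rough equivalent of the RunWait line quoted above, not code from the original program; the function names and the example paths are hypothetical. The switches are the ones from that line: `-f 1 -l 1` restricts the conversion to the first page, and `-raw` asks for the text in content-stream order.

```python
import subprocess
from typing import List


def pdftotext_first_page_args(exe: str, pdf_path: str, txt_path: str) -> List[str]:
    """Build the argument list for converting only page 1 of a PDF to text."""
    return [exe, "-f", "1", "-l", "1", "-raw", pdf_path, txt_path]


def extract_first_page(exe: str, pdf_path: str, txt_path: str) -> None:
    """Run pdftotext and wait for it to finish, as RunWait does."""
    # Passing a list avoids quoting problems with spaces in paths.
    subprocess.run(pdftotext_first_page_args(exe, pdf_path, txt_path), check=True)
```

A call such as `extract_first_page(r"C:\xpdf\pdftotext.exe", "D123456.PDF", "D123456.txt")` would write the first page's text to the TXT file, assuming pdftotext.exe lives at that (made-up) location.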
Update 21-May-2015: I temporarily removed the source code and the code snippets to make major changes to the program. Regards, Joe

INTRODUCTION

The inspiration for this Article was a fascinating question here at Experts Exchange on combining TIFF files. Since it is in an area of extreme interest to me (Document Imaging) and since the solution involves two of my all-time favorite freeware products – IrfanView for the TIFF image processing and AutoHotkey for the scripting – I decided to publish the solution as an Article, with a lot more detail put into it than a typical response to a question.

INSTALLATION INSTRUCTIONS

The original poster (OP) of the problem (KHMaddox) said he has no programming experience at all, so I made the solution suitable for such a user. All you have to be capable of doing is download and install the two freeware products, IrfanView and AutoHotkey, and then run the script attached to this Article, as follows:

(1) Install AutoHotkey – http://ahkscript.org (also, see my EE article: AutoHotkey - Getting Started)

Click the Download button at the page above, save the install file, and run it.

(2) Install IrfanView – http://www.irfanview.com/
Expert Comment

by:Nathan Emch
Your script sounds like exactly what we need, but I do not see it available for download anywhere on this page.  Am I missing something, or did your "Combine-TIFF.ahk" script get pulled from this page?
LVL 57

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Nathan,
Yes, I pulled the script from the article. My initial intention was to re-post the code, but I have decided not to do that. Instead, I am going to rewrite this article, and a similar article on merging PDF files, as "roadmap design" articles, with enough information to help folks write their own programs/scripts. However, I have enhanced the program over the years into what I now call MergeTIFF™. As you can see in the comments above from May of this year, Kit Maddox and Deacon Aspinwall had great results with MergeTIFF. But, as also mentioned above, I'm not yet ready to expose the program publicly on the Internet, so I'll write you a PM in the EE Message System to discuss this further, as I did with both Kit and Deacon. Regards, Joe
This article is about duplex scanning in Nuance's PaperPort software with a hardware-capable duplex scanner. It is not about the Scan Other Side feature in the Capture Assistant that allows a simplex scanner to achieve double-sided scanning. If you are interested in the latter, see my Experts Exchange article, How to Perform Duplex Scanning with a Simplex Scanner in PaperPort Versions 11, 12, 14. But this article is strictly about how to get duplex scanners to work in PaperPort, more specifically, how to achieve automatic/one-click duplex scanning, that is, duplex scanning with neither the Display scanner dialog box nor the Show Capture Assistant box checked. This article applies to the three most recent versions of PaperPort, i.e., 11, 12, and 14 – yes, Nuance got superstitious and did not release a version 13.

The first step in any scanning issue is to download the latest-and-greatest drivers from your scanner manufacturer's website. My experience over the years is that PaperPort is very sensitive to the TWAIN/WIA drivers, and I've seen many scanning problems fixed by installing the latest drivers.

The problem that I'm addressing in this article is the lack of a Duplex ADF choice in the Source field drop-down in the SET tab of a Scanning Profile. For example, here's what it may look like when the existence of a duplex scanner is not recognized:

Duplex scanner not properly set up
The first approach to fixing this is to run through Advanced setup
Expert Comment

by:Ed Burwell
Joe,

This is GREAT!!!  THANKS!

I am using PaperPort 11 SE+ and out of nowhere I am getting this DocuCom PDF Trial Watermark on my PDF files. I've been scanning for years and all of a sudden, there it is. Have you seen this? Do you know what causes it? I have a Brother MFP, by the way.

Thanks!

Ed
LVL 57

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Hi Ed,
First, thanks for joining EE today and reading my article — I appreciate it! I'm sorry to hear that you've run into the dreaded "DocuCom Watermark" glitch. This problem has been in the PaperPort (and PDF Converter) world for a long time. Two complicating factors in your situation are (1) the "SE" editions are non-retail (usually trimmed down) versions that are bundled with scanners (in your case, a Brother MFC), and solutions for the retail products often don't apply to the SE products; and (2) PP11 is very old — more than 10 years! All of that said, here are a few ideas for you:

(1) Read this PaperPort knowledgebase article:
DocuCom PDF Trial watermark appears when opening PDF files

(2) Uninstall and reinstall PP11SE from the media that came with your Brother MFC. Btw, which model MFC is it? Also, what version and bit level of Windows are you using?

(3) If you're not on Version 11.2, try to upgrade to it. The earlier point releases of PP11 — 11.0 and 11.1 — are known to have bugs that were fixed in 11.2. Another one of my EE articles explains how to upgrade to 11.2 at no cost:
PaperPort 11 - Free Upgrade to Version 11.2

However, because you have an SE version, this may not work, but it's worth a try. If it doesn't work, reinstall PP11SE from the media that came with your Brother MFC.

(4) Because it is such a common problem, there has been a lot of discussion about it in the PaperPort community. I participate in a Google Group devoted to PaperPort (and Nuance's other document imaging products), and a search there for "docucom watermark" turns up many hits. Here's the URL for that search:
https://groups.google.com/forum/#!searchin/paperport/docucom$20watermark%7Csort:relevance

You'll need to join the group (free, of course) to get the results (as a non-member, it will say, "You must be signed in as a member of this group to view and participate in it.").

(5) If PaperPort is important to you, buy the retail edition of the latest version, PP14. It is inexpensive these days — just $30 at Amazon for the Standard edition:
https://www.amazon.com/dp/B005CELKLM

That will be version 14.0, but another one of my EE articles explains how to upgrade to 14.5 (the latest release) at no cost:
PaperPort 14 - Free Upgrade to Version 14.5

After doing that, another one of my EE articles explains how to install the Patch 1 update to PP14.5 at no cost:
How to install the Patch 1 update for PaperPort 14.5

PP14.5/Patch1 is the latest-and-greatest, and is the only PaperPort version that is Windows 10-compliant.

If you have any scanning problems after that, I recommend installing the PP14 scanner connection tool. Another one of my EE articles explains how to do that at no cost:
PaperPort 14 Scanner Connection Tool - Fix Scanning Problems in PaperPort 14

Regards, Joe
Update 13-December-2014: Article Deprecated. The links in this article for the PaperPort 12 upgrades no longer work and Nuance informed me that the links for the PaperPort 11 ones may soon stop working. However, Nuance provided new links for them, as well as links for the latest version, PaperPort 14 (there was no PaperPort 13). The new links are to direct downloads of the upgrades (PP11.2, PP12.1, PP14.5), rather than to a Download Request Form, as with the previous links. In addition, Nuance provided links to a "Remover Tool" for all three versions. Lastly, there are "Standard versus Professional" feature comparison matrices for all three versions (this article shows the one only for PP12). To deal with such a substantial number of changes, I decided to deprecate this article. I also decided that adding PP14 information to both PP11 and PP12 would result in a long, unwieldy article. In addition, a user of one version is not going to be concerned about the other two versions, so I published three separate articles for PP11, PP12, and PP14 users. You will find them here:

PaperPort 11 - Free Upgrade to Version 11.2
PaperPort 12 - Free Upgrade to Version 12.1
PaperPort 14 - Free Upgrade to Version 14.5
LVL 57

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Just a quick comment to point out that shortly after this article was published, Nuance did, indeed, release version 14 of PaperPort, confirming their superstitious behavior of skipping 13 (the latest release of PaperPort is version 14.5). Regards, Joe