OCR

Optical character recognition (OCR) is the mechanical or electronic conversion of images of typed, handwritten, or printed text into machine-encoded text. It is widely used as a form of data entry from printed paper records, including passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any suitable documentation. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech, key data and text mining. OCR is a field of research in pattern recognition, artificial intelligence, and computer vision.


PaperPort XP Compatibility Mode
Nuance's PaperPort may display this error message: PaperPort appears to be running Windows XP Compatibility Mode which may result in errors. We recommend disabling Compatibility Mode for the PaprPort.exe program, see Technote 6629. This article provides a possible solution to the problem.

Expert Comment by: Andrew Leniart (LVL 17)
@Walter
I tried repairing then uninstalling, reinstalling Power PDF Standard. Still getting "DLL not found" when I attempt scanning.

I don't mean to interject here, but I recently had this *exact* problem, albeit when trying to re-install a free PDF writing utility for a client of mine. Whenever he tried to create a PDF, same thing - "DLL not found". After much frustration (and a couple of uninstalls/re-installs) I tracked the problem down to a Windows registry corruption, whereby even uninstalling the app completely and reinstalling it didn't resolve the issue.

I ended up installing the PDF writer on one of my own VMs, where it worked correctly, then tracked and exported the registry entries from that working VM and imported them into my client's machine. All DLL-related problems instantly disappeared, because I'd already verified that the necessary DLLs were where they were supposed to be. Your issue may be totally different, but I just thought I'd throw that in as a possible cause.

@Joe - I hope you don't mind me chiming in here, just thought it may be another scenario you may like to consider.

Regards, Andrew

Author Comment by: Joe Winograd, Fellow&MVE (LVL 59)
Walter,
A few more thoughts:

(1) Since you're on W10, make sure you install Patch 1 after installing PP14.5, as PP14.5/Patch1 is the only W10-compliant version of PaperPort. Also, see if any of the Tips in my PaperPort 14 in Windows 10 - A First Look article are things that you haven't tried yet.

(2) Maybe there's a corruption in your user profile that is causing the grief to PaperPort and/or Power PDF. Try creating a new user profile.

(3) This is certainly not the DLL problem, but it would be a good idea to get the latest W10/64-bit drivers from the Brother site for your MFC-J6710DW.

(4) I sent an email to my contacts at Nuance with a link to this thread. They may not want to post publicly about it, but, with their permission, I'll share whatever they say here.

Regards, Joe

Edit: Hi Andrew,
I saw your post after hitting Submit on mine. I don't mind your chiming in here...indeed, I encourage it! Collaboration with the many bright folks here at Experts Exchange often leads to a solution that eludes an individual. In this case, your idea is certainly a good one... thanks for chiming in! Regards, Joe

PaperPort 14.5 Patch 1 update is often not detected or downloaded automatically. This article provides direct download links to solve the problem for retail (non-bundled) versions of the Standard and Professional editions, as well as the Professional edition in Nuance's own OmniPage Ultimate bundle.

Expert Comment by: Frank Tonis
Thanks, Joe. Patch 1 installed effortlessly! Now I am working on finding and installing Canon ScanGear for my new machine. On my old PC, this software made the dialog box in PaperPort much more useful. Have you heard of this? Silly question, eh?

My best,

Frank

Author Comment by: Joe Winograd, Fellow&MVE (LVL 59)
That's great news, Frank! I'm very glad to hear that you now have PaperPort Pro 14.5 with Patch 1 installed.

I (and all my clients) have never had a Canon scanner, so I can't speak from personal experience, but I do know that ScanGear is Canon's TWAIN-compatible and WIA-compatible scanning driver. You should be able to download it from the Drivers&Downloads section of the Canon support website. Make sure that you download the version that supports your particular scanner model and your version of Windows, including its bit level. After installing it, PaperPort should automatically detect its existence and offer its TWAIN and WIA drivers in the Scan or Get Photo pane (when you click the Desktop menu, then the Scan Settings button on the ribbon). But that's just a guess on my part, since I've never had ScanGear installed. If you need further help on it, click the big, blue "Ask a Question" button at the top of this page (or any EE page), fill in the simple form, and submit it. There are likely many other EE members who have Canon scanners and are more knowledgeable than I am about ScanGear. Regards, Joe
PaperPort is among the most important applications that I run on my Windows computers. I use it every day, for nearly all of my document and photo scanning, as well as most of my document and photo imaging, including OCR via its built-in OmniPage capabilities.

Disclaimer before going further: I have no affiliation with Nuance and no financial interest in it whatsoever. I am simply a happy user/customer.

I've been using PaperPort for around 20 years on every version of Windows since Windows 95. With the Windows 10 release date coming up in two days, I thought it would be worthwhile to document my experience with PaperPort on the Windows 10 Technical Preview, including some tips for successful deployment on W10.

First, my experience with the various builds along the way: I did not install PaperPort on the initial Windows 10 Technical Preview of Build 9841, released on 30-Sep-2014. But I installed on every build after that, from 9860 through the current 10240. The platform is physical hardware, not a virtual machine. It is a relatively old laptop with mediocre specs by today's standards:

Intel Core2 Duo T9300 2.50GHz
4GB RAM DDR2 PC5300
Samsung SSD 840 EVO 250GB (with the read performance firmware upgrade

Author Comment by: Joe Winograd, Fellow&MVE (LVL 59)
You're welcome, Dana. I'm hoping for a PP15 (or maybe just PP14.6) that Nuance certifies and officially supports for W10. It's even possible that Nuance would certify PP14.5 for W10. I'll post here as soon as I learn anything official from Nuance about PP in W10. In the meantime, there's been some discussion about it in the Google PaperPort Group (and its PaperPort wiki). You may want to check in on that. Regards, Joe

Author Comment by: Joe Winograd, Fellow&MVE (LVL 59)
In a comment above, I mentioned that the Patch 1 update did not work with the PaperPort Professional 14.5 that is included with Nuance's own OmniPage Ultimate bundle. I am pleased to report that Nuance has finally fixed this, although there is still a glitch in the process such that, in some cases, the Common Software Update Manager does not perform the update, even though it detects its existence. The solution, once again, is to get the installer from a direct download link. I just published an article here at EE explaining the method:
How to install the Patch 1 update for the PaperPort Professional 14.5 bundled with OmniPage Ultimate

The article also shows a list of the support tickets that Patch 1 fixes. Regards, Joe
In a previously published article here at Experts Exchange, I explained how to achieve duplex (double-sided) scanning in Nuance's PaperPort software with a hardware-capable duplex scanner, that is, a scanner which has an Automatic Document Feeder (ADF) capable of scanning both sides of a document. A recent question here at EE prompted me to write this additional article, which explains how to achieve duplex scanning in PaperPort with a simplex scanner, that is, a scanner whose ADF is capable of scanning only the front side of a document.

As with the previous article, this one applies to the three most recent versions of PaperPort, i.e., 11, 12, and 14 — yes, Nuance got superstitious and did not release a version 13.

Here are the steps to achieve duplex scanning in PaperPort (either Standard or Professional) with a simplex scanner:
 
  • Click the Scan Settings button on the Ribbon in PP12 and PP14, or the Scan or Get Photo icon on the toolbar in PP11. You will now have the Scan or Get Photo pane:

Scan-or-Get-Photo.jpg 
  • Select a Scanner and a Scanning Profile.
 
  • Tick the Show Capture Assistant box.
 
  • Place the document in the (simplex) ADF and click the Scan button.
 
  • In PaperPort Standard, you will get this:

front-side-PP-Std.jpg 
  • In PaperPort Professional, you will get this:

front-side-PP-Pro.jpg 
  • Remove the document from the output tray, turn it over so that the last page is on the top, place it in the ADF, and click the Scan Other Side button.
 
  • In PaperPort Standard, you will get this:

after-Scan-Other-Side-PP-Std.jpg 
  • In
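
The front-then-back procedure in the steps listed here yields two passes that have to be interleaved into reading order. PaperPort does this itself when you click Scan Other Side; what follows is only a minimal Python sketch of the idea, assuming the back sides arrive in reverse order because the stack was turned over so that the last page is on top. The function name and page labels are hypothetical and are not part of PaperPort.

```python
def interleave_duplex(fronts, backs_as_scanned):
    """Merge a front-side pass and a back-side pass into reading order.

    fronts:            pages from the first pass, e.g. pages 1, 3, 5
    backs_as_scanned:  pages from the second pass, in the order the ADF
                       fed them after the stack was turned over (6, 4, 2)
    """
    if len(fronts) != len(backs_as_scanned):
        raise ValueError("both passes must contain the same number of pages")

    # Turning the stack over reverses the sheet order, so undo that first.
    backs = list(reversed(backs_as_scanned))

    merged = []
    for front, back in zip(fronts, backs):
        merged.extend([front, back])
    return merged


fronts = ["p1", "p3", "p5"]
backs_as_scanned = ["p6", "p4", "p2"]
print(interleave_duplex(fronts, backs_as_scanned))
# prints ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```

The length check matters in practice: if the ADF double-feeds or skips a sheet on either pass, the two lists no longer line up and every later page would be paired with the wrong side.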
PaperPort is a popular document imaging/management product from Nuance Communications. It is in widespread use by both individuals and businesses.

The current version of PaperPort is 14. The previous version was 12 (yes, Nuance got superstitious and skipped 13). Both of these most recent versions come in two editions, Professional and Standard. All four products — PP12 Standard, PP12 Professional, PP14 Standard, PP14 Professional — have the ability to create a searchable PDF file without any other software needing to be installed. PP12 was the first release that could do this (and it was carried forward into PP14).

Prior PaperPort releases require Nuance's OmniPage (a separately priced OCR product) to be installed in order to create a searchable PDF file, which PaperPort calls a PDF Searchable Image file (because it contains both the raster image and the text created by OCR). The reason that PP12 and PP14 can create a PDF Searchable Image file is that each contains the OmniPage OCR engine under the covers, via the OmniPage Capture Software Development Kit (CSDK).
 
Sidebar on PaperPort Version: If you are running PP12.0, I recommend that you upgrade (free!) to PP12.1. This EE article explains how to do it:
PaperPort 12 - Free Upgrade to Version 12.1
If you are running PP14.0, PP14.1, or PP14.2, I recommend that you upgrade (free!) to PP14.5 (there was not a public release for either 14.3 or 14.4). This EE article explains how to do it:

Expert Comment by: Serg __
Any ideas how to make the fonts vectorized in the searchable .pdf? I am asking this question because I would not like to install a pirated Adobe Acrobat to convert one pdf book into a pdf book with vectorized fonts. What I got from PaperPort did not meet my expectations. The fonts got blurry. I expected them to get clean and vectorized, to be able to zoom in without those annoying pixels.

Author Comment by: Joe Winograd, Fellow&MVE (LVL 59)
Hi Serg,
Thank you for joining Experts Exchange this week and reading my article.

> Any ideas how to make the fonts vectorized in the searchable .pdf?

I do not have great expertise in font technology and am not aware of any way to control the font settings when PaperPort creates PDF Searchable Image files via the methods discussed in this article.

> I am asking this question because I would not like to install a pirated Adobe Acrobat to convert one pdf book into a pdf book with vectorized fonts.

I find that a strange comment — why would you even consider installing pirated software? We do not condone that here at Experts Exchange and, in fact, the Experts Exchange Terms of Use strictly prohibit any posting related to such activities (under Section 6, Code of Conduct). If you know that Adobe Acrobat will solve your font issue, and it is for only one PDF book, then I recommend purchasing just one month of Adobe Acrobat DC. For around 25 bucks, you'll avoid pirating software ($22.99 for one month of Acrobat Standard DC or $24.99 for one month of Acrobat Pro DC).

> What I got from PaperPort did not meet my expectations. the fonts got blurry.

It's likely that the fonts are blurry only when viewing the image layer. If you view just the text layer, the fonts should be fine. For example, I printed the first page of this article with the PaperPort Image Printer in B&W at 300 DPI to a PDF Image (not PDF Searchable Image). The whole page is attached as a PDF, but here's what it looks like:

font in image
The fonts, indeed, are blurry, because that's a view of the image (in Adobe Acrobat). I then used Nuance's Power PDF to convert to a searchable PDF, but told it not to keep the images. The whole page for that is also attached as a PDF, but here's the same small sample as shown above:

font in non-image
The fonts look great, because that's a view of the text (in Adobe Acrobat), since there is no image layer in the PDF.

> I expected them to get clean and vectorized, to be able to zoom in without those annoying pixels.

The fonts are fine in the text, as shown above. They get pixelated only when viewing the image layer. Another way to observe this is to Copy the text from the PDF Searchable Image file (created by PaperPort via one of the methods explained in this article) and then Paste it into a text-capable product, such as Notepad or Word — the fonts will, of course, appear fine. Regards, Joe
image-only-PaperPort-PDF-Image.pdf
text-only-Power-PDF-searchable-do-no.pdf
PaperPort
I. Introduction

In a previous article (now deprecated), I discussed how to upgrade — at no cost for licensed users — Nuance's PaperPort Version 11 (hereafter, PP11) and PaperPort Version 12 (PP12) to the latest "point" releases, namely, 11.2 and 12.1. At the time of that article's publication, PP11 and PP12 were the two latest versions. Now the latest version is PP14 (yes, Nuance was superstitious and skipped 13), and its latest "point" release is 14.5.

I decided that adding PP14 to the previous article would result in a long, unwieldy article. In addition, a user of one version is not going to be concerned about the other two versions, so I decided to create three separate articles for PP11, PP12, and PP14 users. This is the PP14 one.

The earlier point releases of PP14 — 14.0, 14.1, 14.2 (there was not a public release for either 14.3 or 14.4) — are known to have bugs that were fixed in 14.5. This article provides links to 14.5, as well as other useful information on upgrading.

II. Comparison of Standard and Professional Editions

For PP14, there are two consumer editions – Standard and Professional. The feature comparison matrix is available in the Files section of this PaperPort wiki:
http://sites.google.com/site/wikipaperport/files

Here is a direct link to the PDF:
Comparison Matrix of PP14 Standard and PP14 Professional

III. Links to Downloads

The links are to a direct download

Expert Comment by: Robert Hanson
Dude, you saved my life. I had almost given up getting Paperport back up and running on Windows 10. Your procedure and tools worked!!

Thank you so much.

Author Comment by: Joe Winograd, Fellow&MVE (LVL 59)
You're very welcome, Robert, and my thanks back to you for joining Experts Exchange today, reading my article, and providing such great feedback. I'm glad to hear that you now have PaperPort running in W10. If you take a moment to endorse the article by clicking the thumbs-up icon at the end of the article (not the one underneath this comment), I'll appreciate it. Regards, Joe
I. Introduction

In a previous article (now deprecated), I discussed how to upgrade — at no cost for licensed users — Nuance's PaperPort Version 11 (hereafter, PP11) and PaperPort Version 12 (PP12) to the latest "point" releases, namely, 11.2 and 12.1. At the time of that article's publication, PP11 and PP12 were the two latest versions. Now the latest version is PP14 (yes, Nuance was superstitious and skipped 13), and its latest "point" release is 14.5.

I decided that adding PP14 to the previous article would result in a long, unwieldy article. In addition, a user of one version is not going to be concerned about the other two versions, so I decided to create three separate articles for PP11, PP12, and PP14 users. This is the PP12 one.

The earlier point release of PP12 — 12.0 — is known to have bugs that were fixed in 12.1. The links in the previous article for 12.1 no longer work. This new article provides working links for 12.1, as well as other useful information on upgrading.

II. Comparison of Standard and Professional Editions

For PP12, there are two consumer editions – Standard and Professional. The feature comparison matrix is available in the Files section of this PaperPort wiki:
http://sites.google.com/site/wikipaperport/files

Here is a direct link to the PDF:
Comparison Matrix of PP12 Standard and PP12 Professional

III. New Links to Downloads

The new links are to a direct download
I. Introduction

In a previous article (now deprecated), I discussed how to upgrade — at no cost for licensed users — Nuance's PaperPort Version 11 (hereafter, PP11) and PaperPort Version 12 (PP12) to the latest "point" releases, namely, 11.2 and 12.1. At the time of that article's publication, PP11 and PP12 were the two latest versions. Now the latest version is PP14 (yes, Nuance was superstitious and skipped 13), and its latest "point" release is 14.5.

I decided that adding PP14 to the previous article would result in a long, unwieldy article. In addition, a user of one version is not going to be concerned about the other two versions, so I decided to create three separate articles for PP11, PP12, and PP14 users. This is the PP11 one.

The earlier point releases of PP11 — 11.0 and 11.1 — are known to have bugs that were fixed in 11.2. Although the links in the previous article for 11.2 still work, Nuance informed me that they may soon stop working. This new article provides working links for 11.2 that Nuance says will continue to work after the other ones have been taken down. This article also provides other useful information on upgrading.

II. Comparison of Standard and Professional Editions

For PP11, there are two consumer editions – Standard and Professional. The feature comparison matrix is available in the Files section of this PaperPort wiki:
http://sites.google.com/site/wikipaperport/files

Here is a direct link to the PDF:

Expert Comment by: Becky Hanlon
The update wants a serial number, which I do not have as I lost my software CD that came with my MFC-7340 Brother printer.  Is there a way to install the update without this?

Author Comment by: Joe Winograd, Fellow&MVE (LVL 59)
Hi Becky,
Thank you for joining Experts Exchange today and reading my article — welcome aboard! The PaperPort software that comes with Brother MFCs is an "SE" version, which stands for Special Edition. It is a trimmed-down version that Brother OEMs from Nuance for bundling with their devices and it is not considered to be a commercial/retail edition. This means that updates of the software, such as the 11.2 upgrade discussed in this article, will likely not apply to those bundled SE versions. So even if you had the serial number, it is unlikely that the 11.2 upgrade would work on it.

The other issue is that PP11 is more than 10 years old. As noted in this article, Vista is the latest Windows on which PP11 is supported. My suggestion is to purchase a retail copy of the latest version of PaperPort, which is 14. It is currently $31.61 at Amazon:
https://www.amazon.com/dp/B005CELKLM

That's the standard edition, not Professional, but it's probably more functional than the SE version that was bundled with your Brother MFC. You may want to wait for a better price, as I've seen it at Amazon for less. The download (or disk) is going to be version 14.0, but you may upgrade it for free to version 14.5, because it is a retail version. This comment that I posted at an EE question a couple of months ago explains the upgrade process, referring to several other articles that I've published here at EE:
https://www.experts-exchange.com/questions/29057949/Window-10-version.html#a42302130

Interesting to note that it was just $19.20 at Amazon back then. Once again, welcome to Experts Exchange! Regards, Joe
PaperPort
This article discusses the PaperPort 14 Scanner Connection Tool, which Nuance provides at no charge in order to fix scanning problems in Windows 8. Furthermore, users of PaperPort 14 in Windows 7 and Windows 10 have reported that the tool works in those versions of Windows, too.

Expert Comment by: Tina Stark
I have been running Paperport 12 for years.  Two days ago I updated Windows 7 and now PaperPort 12 will NOT run.  I have reinstalled, I had a tech look at it.  Not happening.  So if I purchase Paperport 14 will it run?  I don't have the time to play with this program, I just need my scanner working.  Thank you

Author Comment by: Joe Winograd, Fellow&MVE (LVL 59)
Hi Tina,
Thanks for joining Experts Exchange today and reading my article — Welcome Aboard!

I can't say if PaperPort 14 will run on your system. Of course, it should run — but PaperPort 12 should run on W7, too. So there may be some other problem unrelated to PaperPort that is causing the scanner not to work. One test is to see if another program can scan. Try the built-in Paint app in W7 — click the drop-down, then "From scanner or camera" (or try any other app that you prefer, such as Acrobat, IrfanView, paint.net, etc.).

The most common culprit when a scanner doesn't work is the scanning drivers (ISIS, TWAIN, WIA/WIA2). The first thing I suggest is to install the latest drivers from your scanner manufacturer's website, being sure to get the correct bit level for your version of W7, i.e., 32-bit or 64-bit. After installing the latest scanning drivers, try PaperPort 12 again, although I suggest upgrading (at no charge) to PP12.1, as explained in my EE article:

PaperPort 12 - Free Upgrade to Version 12.1

That said, I do recommend purchasing PaperPort 14. Both the standard edition and Professional edition are reasonably priced at Amazon these days. It is version 14.0, but you may upgrade to PP14.5 (at no charge), as explained in my EE article:

PaperPort 14 - Free Upgrade to Version 14.5

Then install the Patch 1 update and you'll be on the latest release, which works fine in W7 (and PP14.5/Patch1 is the only W10-compliant version of PaperPort). Installation of the Patch 1 update is explained in my EE article:

How to install the Patch 1 update for PaperPort 14.5

After that, install the PaperPort 14 Scanner Connection Tool, as described in this article. Regards, Joe
Power PDF Advanced
This article explains how to perform batch conversion of PDF, TIFF, and other image file formats into PDF, PDF Searchable, and TIFF files via a command line interface, using Nuance's latest document imaging software — Power PDF Advanced.

Expert Comment by: Chris S
Hello,

I tried to use the command line, but I always get the error message "File open error", although I have write access to all directories stated in the command. It always says "Converting H:\TEST.HTML TO H:\TEST.PDF", which I think means that the syntax itself is correct but either the input or the output file cannot be opened?

Author Comment by: Joe Winograd, Fellow&MVE (LVL 59)
Hi Chris,
First, thanks for joining Experts Exchange today and reading my article — I appreciate it!

The input file cannot be an HTML file type. The Help output, even in the latest version 2.1 of Power PDF Advanced, still says this:

-I input file full path. * can be used for filename (*.pdf, *.tif)

As I mentioned in the article, I discovered through experimentation that the input file type may also be GIF, JPG, JPEG, PNG, and TIFF. But I just tried HTML in both v2.0 and v2.1, and can confirm that the input file may not be HTML. As you saw yourself, it gives an error message that says "File open error." My advice is to open the HTML file in whatever web browser you prefer and then print it to whatever PDF print driver you prefer. Once you have the PDF file, run Power PDF again, this time using the PDF file as the input instead of the HTML file. Of course, you may not even need to do that if you're happy with the file from the PDF printer. Regards, Joe
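
Since an unsupported input type only surfaces as "File open error", it can help to check the extension before launching a conversion. Here is a minimal Python sketch of such a pre-check. The extension list is taken from this comment (PDF and TIF from the Help output, plus GIF, JPG, JPEG, PNG, and TIFF found by experimentation); the function and constant names are hypothetical, and the sketch does not call Power PDF at all.

```python
from pathlib import PureWindowsPath

# Input types reported to work in this thread; anything else (such as
# HTML) is assumed to fail with "File open error".
SUPPORTED_INPUT_EXTENSIONS = {
    ".pdf", ".tif", ".tiff", ".gif", ".jpg", ".jpeg", ".png",
}


def is_supported_input(path):
    """Return True if the file extension is one of the known-good input types."""
    # PureWindowsPath parses drive letters and backslashes on any OS.
    return PureWindowsPath(path).suffix.lower() in SUPPORTED_INPUT_EXTENSIONS


for candidate in (r"H:\TEST.HTML", r"H:\TEST.PDF"):
    verdict = "ok" if is_supported_input(candidate) else "convert to PDF first"
    print(candidate, "->", verdict)
```

The check is case-insensitive because the file in the question was named TEST.HTML in upper case.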

Update 21-May-2015: I temporarily removed the source code to make major changes to the program. Regards, Joe

INTRODUCTION

This article presents a solution to a question asked here at Experts Exchange. The situation is that there's a large number of subfolders (400 in the original question), each of which has a number of PDF files (two in the original question). The goal is to combine/merge the PDF files in each subfolder (in ascending date order) into a single PDF file, storing the combined file in each subfolder. The source PDF files in each subfolder may have any file names and the user should be able to specify the file name of the combined file.

REQUIRED SOFTWARE

The method presented in this article requires AutoHotkey, an excellent (free!) programming/scripting language. The quick explanation for installing AutoHotkey is to visit its website. A more comprehensive explanation is to read my EE article, AutoHotkey - Getting Started. After installation, AutoHotkey will own the AHK file type, supporting the solution discussed in the remainder of this article.

The program utilizes another excellent (free!) piece of software — PDF Toolkit (PDFtk). It comes in both command line and GUI versions. The command line version is called PDFtk Server
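
The article's own program is written in AutoHotkey and its source is not shown here, so the following is only a rough Python sketch of the same idea, not the author's code. It assumes that "ascending date order" means file modification time, that the `pdftk` executable is on the PATH, and that folders with fewer than two PDFs can be skipped. The function name and the default output file name are hypothetical. The only PDFtk syntax relied on is the `cat` operation followed by `output`.

```python
from pathlib import Path


def build_merge_commands(root, output_name="combined.pdf", pdftk="pdftk"):
    """Return one PDFtk command line (as an argument list) per subfolder."""
    commands = []
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        pdfs = [
            f for f in folder.iterdir()
            if f.is_file()
            and f.suffix.lower() == ".pdf"
            and f.name != output_name  # do not merge an earlier result
        ]
        if len(pdfs) < 2:
            continue  # nothing to combine in this subfolder

        # Oldest first; the file name breaks ties between equal timestamps.
        pdfs.sort(key=lambda f: (f.stat().st_mtime, f.name))

        commands.append(
            [pdftk, *map(str, pdfs), "cat", "output", str(folder / output_name)]
        )
    return commands
```

Each returned list can be handed to `subprocess.run(command, check=True)`. Building the commands separately from running them makes it easy to print and review all of them before touching several hundred subfolders.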

Expert Comment by: Centex Aps
Hi

Will the "Combine-Merge-PDF-files-20140826.ahk"  file not be attached again?

Author Comment by: Joe Winograd, Fellow&MVE (LVL 59)
Hi Centex,
I've decided not to post the full program. I'll be rewriting the article as a "design roadmap" with some crucial code snippets, such as how to call PDFtk Server, but will not be posting the complete source code. Regards, Joe
The standard (non-Professional) edition of PaperPort from Nuance Communications (previously known as ScanSoft) is limited to five Scanning Profiles, but in a previous article, I discussed how to overcome this limitation. The technique presented in that article may also be used to address an issue that I've been asked many times by PaperPort users, namely, how to reorder the Scanning Profiles in the Scan or Get Photo pane.

For users with many Scanning Profiles, it is desirable to order the list such that the more frequently used ones are at the top. Unfortunately, PaperPort 12, the previous release, offers no ability to rearrange the order of the Scanning Profiles. PaperPort 14, the current release (yes, Nuance got superstitious and skipped 13), added a little-known, undocumented feature that helps: drag-and-drop of the Scanning Profiles. However, it works poorly, in my opinion, as I find it difficult to drop the profile exactly where I want it.

Also, in both PP12 and PP14, there is no ability to search for a Scanning Profile, so the user must scroll through the list to find the desired profile. This article presents an approach for reordering the list, using the same method presented in my previous article, PaperPort - How To Achieve More Than Five Scanning Profiles in the Standard Edition.

All of the screenshots in this article are from PaperPort Professional 14
PaperPort is a popular document imaging/management product from Nuance Communications, previously known as ScanSoft. PaperPort is in widespread use by both individuals and businesses.

The current version of PaperPort is 14. The previous version was 12. Yes, Nuance got superstitious and skipped 13. Both of these most recent versions come in two editions, Professional and Standard, although the Nuance folks do not call it Standard – they simply leave Professional off the name, i.e., PaperPort 12 and PaperPort Professional 12; PaperPort 14 and PaperPort Professional 14. In this article, I refer to them as PP-Std and PP-Pro, and all such references are valid for versions 12 and 14.

There are numerous differences between PP-Std and PP-Pro. The comparison matrices may be seen in the Files section at this PaperPort wiki in these files:

Comparison Matrix of PP12 Standard and PP12 Professional.pdf
Comparison Matrix of PP14 Standard and PP14 Professional.pdf

As shown in the documents above, one of the differences between PP-Std and PP-Pro is that the former allows only five Scanning Profiles to be created, while the latter allows an unlimited number. However, it turns out that PP-Std will properly handle an unlimited number of Scanning Profiles. The problem is that it won't let you create them. This is easy to overcome by creating the file containing the Scanning Profiles outside of PP-Std. This article describes two ways to do it.


Expert Comment by: mapline
Hi Joe
Great suggestion. Two comments:
1. My PP 14.5 std stores file in C:\ProgramData\Nuance\PaperPort\14\Profiles.xml (Windows 10)
2. Notepad++ great free app for viewing/editing xml files.
Many thanks

Author Comment by: Joe Winograd, Fellow&MVE (LVL 59)
Hi Michael,
Sorry I'm just replying to your 25-Mar-2016 comment now. I don't recollect seeing it when it first came in and only just now saw it when I received a notification that you endorsed the article today — btw, thanks for that!

> My PP 14.5 std stores file in C:\ProgramData\Nuance\PaperPort\14\Profiles.xml (Windows 10)

You will also find it at C:\Users\All Users\Nuance\PaperPort\14\Profiles.xml in W10. That's because C:\Users\All Users\ points to C:\ProgramData\. In other words, C:\ProgramData\ is the "real" folder and C:\Users\All Users\ is simply a pointer to it — technically known as a junction or symbolic link. So if you look at C:\ProgramData\ and C:\Users\All Users\ in your file manager, they'll show the identical contents, because they are one-and-the-same folder.
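
If you'd like to confirm that two paths really are one-and-the-same folder, here is a minimal Python sketch. It relies on `os.path.samefile`, which compares the underlying file system objects after links are resolved. I am assuming a reasonably recent Python on Windows, where junctions and symbolic links are followed; the helper name is hypothetical.

```python
import os


def same_folder(path_a, path_b):
    """Return True if both paths resolve to the same directory on disk."""
    try:
        return os.path.samefile(path_a, path_b)
    except OSError:
        # A path that is missing or not accessible cannot be compared.
        return False


print(same_folder(r"C:\ProgramData", r"C:\Users\All Users"))
```

On a system where the link exists and is accessible, this should report that the two paths match; anywhere else it simply prints False rather than raising an error.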

> Notepad++ great free app for viewing/editing xml files.

I have Notepad++ installed and agree that it is a great free app, although I use it only for test purposes, since I do all of my text editing with my fav text editor that I've been using forever. But thanks for the tip to our readers! Regards, Joe
PaperPort is a popular document management/imaging product from Nuance Communications. It is in widespread use by both individuals and businesses. The current version of PaperPort is 14 (previous version was 12 – Nuance got superstitious and skipped 13). This Article documents how PP14 finally solved a nasty duplex scanning problem that has plagued PaperPort since the introduction of the Blank page is job separator capability in PP10.

The problem is that a blank back side of a page will act as a job separator during a duplex scan. This is extremely bad, since most double-sided documents have some single-sided pages, and they will terminate the document – not what you want! It makes the Blank page is job separator capability practically worthless for users doing duplex scanning. In other words, if you are using a duplex scanner and a page in the stack is not blank on the front, but is blank on the back, this should not be considered as a separator page. In the case of duplex scanning, a page should be blank on both sides in order for it to be treated as a separator page. Otherwise, you'll get what should be a single document broken into separate PaperPort items if that document happens to have some single-sided and some double-sided pages.
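
The rule described above can be stated as a small piece of logic. The following Python sketch is only an illustration of that rule, not PaperPort's implementation: each sheet is modeled as a pair of hypothetical flags (front is blank, back is blank), and in duplex mode a sheet counts as a job separator only when both flags are set.

```python
def split_jobs(sheets, duplex=True):
    """Group sheet indexes into documents, dropping separator sheets.

    sheets: list of (front_blank, back_blank) booleans, one pair per sheet.
    """
    documents = []
    current = []
    for index, (front_blank, back_blank) in enumerate(sheets):
        # Duplex: both sides must be blank. Simplex: only the front exists.
        is_separator = (front_blank and back_blank) if duplex else front_blank
        if is_separator:
            if current:
                documents.append(current)
                current = []
        else:
            current.append(index)
    if current:
        documents.append(current)
    return documents


sheets = [
    (False, False),  # sheet 0: printed on both sides
    (False, True),   # sheet 1: single-sided page, blank back
    (True, True),    # sheet 2: blank on both sides, a real separator
    (False, False),  # sheet 3: start of the next document
]
print(split_jobs(sheets))
# prints [[0, 1], [3]]
```

Under the older behavior, the blank back of sheet 1 would have ended the first document early; requiring both sides to be blank keeps sheets 0 and 1 together.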

This "bug" (Nuance called it a "feature" when I reported it) existed in PP10, PP11, and PP12 (as mentioned above, there was no PP13). Nuance finally fixed it in PP14 with the addition of a new sub-option in the Settings for a

Expert Comment

by:donjud
Joe,
I realize this is a bit off topic, but you have been so helpful that I wanted to ask you first. I am also new to Experts Exchange and wasn't sure whether I should start a new topic.
The folder we have added to PaperPort is on our server so that it can be accessed and modified by multiple users in our office. The problem we have run into is that when someone changes a document or adds a new one and adds it to the All-in-One search index, the new document is still not indexed on the other computers, so each user has to re-index the folder in order to find it using the All-in-One (AIO) search. We have set PaperPort to index every night, but we would like the document to be indexed on all of the computers as soon as the changes are made.
Is there a setting that would automatically index all incoming or modified documents in the PaperPort folder, or is there a way to run PaperPort on the server? We have spent several days trying to come up with a solution to this problem and haven’t had any luck.
Thanks,
J.D.

Author Comment

by:Joe Winograd, Fellow&MVE
Hi J.D.,
I was working on a (lengthy!) reply to your same question in the message system, which I just sent (before seeing this). Regards, Joe
Update 21-May-2015: I temporarily removed the source code and the code snippets to make major changes to the program. Regards, Joe

INTRODUCTION

This Article is a follow-up to the Article entitled How To Rename-Move a Batch of PDF Files Based on Contents of the Files, recently published here at Experts Exchange.

I considered adding the new feature (splitting a single document into multiple documents) to that Article and program, but concluded that it is a significant enough enhancement to warrant a new Article and program.

PREVIOUS ARTICLE

To understand this Article, it will be helpful to read the previous Article, but to get things going here right away, here's a summary of the previous problem and solution.

There is a large batch of PDF files, all with cryptic names, such as [D123456.PDF]. Inside each file on the first line of the first page (always starting at a fixed column and running to the end of the line) is a human-friendly identifier for the file, such as [John Smith]. The requirement is to loop through all of the files in a specified folder in an automated fashion, changing the file names from, for example,

D123456.PDF

to

D123456 John Smith.PDF

That is, add the identifier from the first line of the first page to the file name.
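The batch loop summarized above can be sketched in a few lines of Python. This is not the author's AutoHotkey program (the source code was removed from the Article); plan_renames and the get_identifier callback are hypothetical names, and the sketch only builds a list of old and new paths without renaming anything.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def plan_renames(
    folder: Path, get_identifier: Callable[[Path], str]
) -> List[Tuple[Path, Path]]:
    """Return (old_path, new_path) pairs for every PDF in the folder.

    get_identifier is a caller-supplied function that returns the
    human-friendly identifier found inside a given PDF file.
    """
    plan: List[Tuple[Path, Path]] = []
    for pdf in sorted(folder.iterdir()):
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue  # ignore sub-folders and non-PDF files
        identifier = get_identifier(pdf).strip()
        if not identifier:
            continue  # nothing usable found, so leave the file name alone
        new_name = f"{pdf.stem} {identifier}{pdf.suffix}"
        plan.append((pdf, pdf.with_name(new_name)))
    return plan
```

Building the plan first makes it easy to review the proposed names before calling old_path.rename(new_path) on each pair.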

NEW REQUIREMENT

Following publication of the previous Article and the program that implements the solution, the Original Poster (OP) of the question that prompted the Article

Expert Comment

by:Member_2_7970298
how do I obtain a copy of the autohotkey script?

Author Comment

by:Joe Winograd, Fellow&MVE
Hi New Member,

When I removed the source code last year from six articles that I published here at EE, my intention was that the removal be temporary. I began a project to rewrite all of the programs in my portfolio in order to generalize them for a broader audience and to have a standard user interface, including both a GUI (graphical user interface) and, where it makes sense, a CLI (command line interface). It wound up being a much larger effort than I anticipated, and I'm still not ready to post or distribute the source code for this program (or any of the other five published at EE — and I don't know when or even if that will be, for a variety of reasons).

I have created customized versions of these various programs for EE members who became clients of mine. I provided licenses for the run-time programs (the executables, i.e., the compiled EXE files) for an agreed-upon fee, but I did not provide the source code. I did this previously when EE had the "Hire Me" button, but that no longer exists. The mechanism now at EE for such work is the new Gigs feature, if that interests you.

Regards, Joe
Update 21-May-2015: I temporarily removed the source code and the code snippets to make major changes to the program. Regards, Joe

A recent question here at Experts Exchange piqued my interest, so I decided to provide a thorough solution and publish this Article about it. The Original Poster (OP) of the question has approximately one thousand PDF files containing 7-character sequential alphanumeric file names (and, of course, all of the file extensions are PDF). Although the OP did not state this, it is likely that the sequential alphanumerics represent unique identifiers for his customers, perhaps customer numbers. The alphanumeric file name is cryptic, in no way identifiable with the customer, so the OP would like the file name to contain the customer name in addition to the number. For example, a file might be named:

D123456.PDF

The OP would like this file to be renamed:

D123456 John Smith.PDF

The customer name always begins in column 16 on the first line of the first page in the PDF file (and runs to the end of the line). The OP wants an automated way to rename the thousand PDF files, based on the customer name in the contents of each file – in essence, a batch/mass rename. The program documented in this Article (and provided in source code) performs this function.
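As a rough illustration of the column-16 rule, here is a Python sketch that works on text already extracted from the first page. It assumes that columns are counted from 1 and that the text arrives as a plain string; the function names customer_name_from_text and renamed_file are hypothetical and are not taken from the program in this Article.

```python
import os


def customer_name_from_text(page_text: str, start_column: int = 16) -> str:
    """Return the text from start_column (1-based) to the end of the first line."""
    lines = page_text.splitlines()
    first_line = lines[0] if lines else ""
    # Column 16 counted from 1 is index 15 counted from 0.
    return first_line[start_column - 1:].strip()


def renamed_file(file_name: str, customer: str) -> str:
    """Insert the customer name between the base name and the extension."""
    stem, extension = os.path.splitext(file_name)
    return f"{stem} {customer}{extension}"
```

For example, if the first line carries a 15-character label followed by John Smith, then customer_name_from_text returns the name and renamed_file("D123456.PDF", name) gives the target file name.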

Two excellent freeware products are needed for this solution – the AutoHotkey scripting language (the program is written in this) and the Xpdf package to convert the PDF …

Expert Comment

by:Member_2_7970298
Joe,
Thanks for your response.
I appreciate your comments & issues.
I would greatly appreciate it if you could see your way clear to send me your original AutoHotKey script.
I'm trying to learn more about AutoHotKey scripts and especially how it interfaces with Xpdf's pdftotext.exe
Thanks

Author Comment

by:Joe Winograd, Fellow&MVE
Hi Member_2_7970298 (???),

I received your email at my personal email address, which I'll respond to in a moment. I already responded to your post at the AHK forum, which led you to this article, and then to my Split-Rename-Move article. Instead of three different communication venues (EE, AHK, email), let's continue this discussion via just email.

That said, a quick message about your comments is that the Tutorials forum and the Scripts and Functions forum at the AHK boards are the way to go "to learn more about AutoHotKey scripts" (as well as the Tutorial at the AHK docs site).

There's not much to learn about "how it interfaces with Xpdf's pdftotext.exe" — the RunWait command is it. Here's an actual call from one of my programs:

RunWait,%pdftotextEXE% -f 1 -l 1 -raw "%FullFileNameCurrent%" "%DestinationFolder%%FileNameCurrentTXT%"


I'm sure from the names of the variables you can figure out what that line does. Also, I gave you links at the AHK forum to my two 5-minute EE video Micro Tutorials that should help you with learning about how to use the pdftotext.exe tool:
Xpdf - Command Line Utility for PDF Files
Xpdf - Convert PDF Files to Plain Text Files

If you haven't viewed them yet, I think you'll find them to be a worthwhile expenditure of 10 minutes. Regards, Joe
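For readers who don't use AutoHotkey, the RunWait call quoted above has a close equivalent in other languages: start pdftotext, pass the same switches, and wait for it to finish. Here is a Python sketch under that assumption. The function names and the three parameters are hypothetical; the -f 1 -l 1 -raw switches are copied from the call above.

```python
import subprocess
from pathlib import Path
from typing import List


def pdftotext_command(pdftotext_exe: str, pdf_path: Path, txt_path: Path) -> List[str]:
    """Build the argument list: first page only (-f 1 -l 1), raw text order (-raw)."""
    return [
        pdftotext_exe,
        "-f", "1",   # first page to convert
        "-l", "1",   # last page to convert
        "-raw",      # keep text in content-stream order
        str(pdf_path),
        str(txt_path),
    ]


def extract_first_page(pdftotext_exe: str, pdf_path: Path, txt_path: Path) -> None:
    """Run pdftotext and wait for it to finish, as RunWait does in AutoHotkey."""
    # check=True raises CalledProcessError if pdftotext exits with an error code.
    subprocess.run(pdftotext_command(pdftotext_exe, pdf_path, txt_path), check=True)
```

Passing the arguments as a list means that file names containing spaces don't need the manual quoting that the AutoHotkey command line uses.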
Update 21-May-2015: I temporarily removed the source code and the code snippets to make major changes to the program. Regards, Joe

INTRODUCTION

The inspiration for this Article was a fascinating question here at Experts Exchange on combining TIFF files. Since it is in an area of extreme interest to me (Document Imaging) and since the solution involves two of my all-time favorite freeware products – IrfanView for the TIFF image processing and AutoHotkey for the scripting – I decided to publish the solution as an Article, with a lot more detail put into it than a typical response to a question.

INSTALLATION INSTRUCTIONS

The original poster (OP) of the problem (KHMaddox) said he has no programming experience at all, so I made the solution suitable for such a user. All you have to do is download and install the two freeware products, IrfanView and AutoHotkey, and then run the script attached to this Article, as follows:

(1) Install AutoHotkey – http://ahkscript.org (also, see my EE article: AutoHotkey - Getting Started)

Click the Download button at the page above, save the install file, and run it.

(2) Install IrfanView – http://www.irfanview.com/

Expert Comment

by:Nathan Emch
Your script sounds like exactly what we need, but I do not see it available for download anywhere on this page.  Am I missing something, or did your "Combine-TIFF.ahk" script get pulled from this page?

Author Comment

by:Joe Winograd, Fellow&MVE
Hi Nathan,
Yes, I pulled the script from the article. My initial intention was to re-post the code, but I have decided not to do that. Instead, I am going to rewrite this article, and a similar article on merging PDF files, as "roadmap design" articles, with enough information to help folks write their own programs/scripts. However, I have enhanced the program over the years into what I now call MergeTIFF™. As you can see in the comments above from May of this year, Kit Maddox and Deacon Aspinwall had great results with MergeTIFF. But, as also mentioned above, I'm not yet ready to expose the program publicly on the Internet, so I'll write you a PM in the EE Message System to discuss this further, as I did with both Kit and Deacon. Regards, Joe
This article is about duplex scanning in Nuance's PaperPort software with a hardware-capable duplex scanner. It is not about the Scan Other Side feature in the Capture Assistant that allows a simplex scanner to achieve double-sided scanning. If you are interested in the latter, see my Experts Exchange article, How to Perform Duplex Scanning with a Simplex Scanner in PaperPort Versions 11, 12, 14. But this article is strictly about how to get duplex scanners to work in PaperPort, more specifically, how to achieve automatic/one-click duplex scanning, that is, duplex scanning with neither the Display scanner dialog box nor the Show Capture Assistant box checked. This article applies to the three most recent versions of PaperPort, i.e., 11, 12, and 14 – yes, Nuance got superstitious and did not release a version 13.

The first step in any scanning issue is to download the latest-and-greatest drivers from your scanner manufacturer's website. My experience over the years is that PaperPort is very sensitive to the TWAIN/WIA drivers, and I've seen many scanning problems fixed by installing the latest drivers.

The problem that I'm addressing in this article is the lack of a Duplex ADF choice in the Source field drop-down in the SET tab of a Scanning Profile. For example, here's what it may look like when the existence of a duplex scanner is not recognized:

Duplex scanner not properly set up
The first approach to fixing this is to run through Advanced setup

Author Comment

by:Joe Winograd, Fellow&MVE
You're very welcome, Cori, and thanks back at you for joining Experts Exchange today and reading my article. I'm glad to hear that you now have duplex scanning working in PaperPort with your Brother PDS-5000 duplex scanner. If you take a moment to endorse the article by clicking the thumbs-up icon at the end of it (which currently says 6), I'll appreciate it. Welcome to Experts Exchange! Regards, Joe

Author Comment

by:Joe Winograd, Fellow&MVE
Hi Cori,
Thanks for endorsing the article — much appreciated! Cheers, Joe
Update 13-December-2014: Article Deprecated. The links in this article for the PaperPort 12 upgrades no longer work and Nuance informed me that the links for the PaperPort 11 ones may soon stop working. However, Nuance provided new links for them, as well as links for the latest version, PaperPort 14 (there was no PaperPort 13). The new links are to direct downloads of the upgrades (PP11.2, PP12.1, PP14.5), rather than to a Download Request Form, as with the previous links. In addition, Nuance provided links to a "Remover Tool" for all three versions. Lastly, there are "Standard versus Professional" feature comparison matrices for all three versions (this article shows the one only for PP12). To deal with such a substantial number of changes, I decided to deprecate this article. I also decided that adding PP14 information to both PP11 and PP12 would result in a long, unwieldy article. In addition, a user of one version is not going to be concerned about the other two versions, so I published three separate articles for PP11, PP12, and PP14 users. You will find them here:

PaperPort 11 - Free Upgrade to Version 11.2
PaperPort 12 - Free Upgrade to Version 12.1
PaperPort 14 - Free Upgrade to Version 14.5

Author Comment

by:Joe Winograd, Fellow&MVE
Just a quick comment to point out that shortly after this article was published, Nuance did, indeed, release version 14 of PaperPort, confirming their superstitious behavior of skipping 13 (the latest release of PaperPort is version 14.5). Regards, Joe