I have a new client that is a document scanning company.
They have some Kodak i1440 and Kodak i4200 scanners and they are using Kodak Capture Pro Software.
This is my first time dealing with this type of software, and it seems like they have a production log-jam in processing the documents after they are scanned.
Almost all of their customers are demanding that the scans be "searchable" PDF files. This definitely adds to the processing time, since the images need to be at a higher resolution.
Is this Capture Pro Software slow by nature? Is there a different program that we should try instead? Or is this just the nature of the business?
Joe Winograd - EE Fellow & MVE Developer Commented:
Images do not have to be at a very high DPI in order to achieve accurate OCR. For typical business documents, they also do not have to be color (24-bit/32-bit) or grayscale (8-bit). For good OCR, 300 DPI, black-and-white (1-bit) scans are usually adequate. In fact, scanning at a higher DPI, such as 600, can actually decrease OCR accuracy. I suggest you take a look at Wayne Fulton's excellent site, "A few scanning tips":

In particular, take a look at his OCR tips:

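To put rough numbers on the DPI and bit-depth point, here is a small Python sketch that computes the uncompressed size of one scanned page in a few modes. The US Letter page size and the helper name `raw_scan_bytes` are assumptions made for illustration. Real files are compressed and rows may be padded, so treat these as ballpark figures for how much pixel data the software has to move and process, not as actual file sizes.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed size in bytes of one scanned page.

    Ignores compression and per-row padding, so this is only an estimate.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: US Letter, 8.5 x 11 inches.
modes = [
    ("300 DPI, 1-bit black-and-white", 300, 1),
    ("300 DPI, 8-bit grayscale", 300, 8),
    ("300 DPI, 24-bit color", 300, 24),
    ("600 DPI, 24-bit color", 600, 24),
]

for label, dpi, bits in modes:
    size = raw_scan_bytes(8.5, 11, dpi, bits)
    print(f"{label}: {size / 1_000_000:.1f} MB")
```

Under these assumptions, a 300 DPI black-and-white page is about 1.1 MB of raw pixel data, the same page in 24-bit color is about 25.2 MB, and 600 DPI color is about 101 MB.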
One possible reason for the slow performance is that they are scanning in color. But the other reason is what you hit upon – it is the nature of OCR to take time. A faster processor and more memory will help, but I have not observed a big difference in performance among the popular desktop imaging/OCR packages. If you want to try a package other than Kodak Capture Pro and do some performance comparisons, take a look at Nuance's PaperPort:

It supports any scanner with ISIS, TWAIN, or WIA drivers, and the Kodak i1440 and i4200 both have ISIS, TWAIN, and WIA drivers. PaperPort can scan directly to a PDF searchable image file by automatically invoking its built-in OCR engine, which is based on Nuance's OmniPage, an excellent OCR package:

Keep in mind that the above link is for the full OmniPage OCR package, which is not what I'm recommending, although if you want a more robust OCR package, it's worth a look – along with ABBYY FineReader, another excellent OCR package:

But I'm suggesting the PaperPort product, which calls the OmniPage SDK under the covers in order to create a PDF searchable image file (one that has the bitmap image from the scan as well as a layer of text from the OCR process). All of that said, I don't want to get your hopes up – as you surmised, and as I said earlier, it's the nature of OCR. Regards, Joe
