OCR Printing of Old Typewritten Documents

How can I scan 60 to 90 year old typewritten manuscripts into my Windows 8.1 computer, using ABBYY FineReader 12 and an Epson 4630 printer/scanner, and get decent character recognition, which I am not getting now? Perhaps "character recognition" is the wrong term, because the scanned document is a duplicate of the original, but the final product is awful. I am using 300 dpi, PDF, and grayscale as recommended. I have not used an OCR program before; I recently purchased the ABBYY software and the Epson 4630 specifically to convert multiple unpublished manuscripts into printable documents. After many, many hours, I am getting nowhere!
Joe Winograd, Fellow & MVE, Developer, commented:


You are using almost the correct term: instead of "character recognition", it is "optical character recognition". That is what the "OCR" in your title stands for. But "OCR Printing" is not correct; you want something like "OCR when scanning".

OK, with terminology out of the way, let's talk about scanning 60-90 year old docs. Regardless of age, the accuracy of OCR depends heavily on the quality of the document. I would venture to guess that your 60-90 year old typewritten manuscripts are not of good quality.

ABBYY FineReader is an excellent OCR package – you have the latest V12 (I have V11). Your setting of 300DPI/grayscale (8-bit) is fine, although I do almost all scanning at 300DPI/black&white (monochrome/1-bit), which for most typical docs leads to accurate OCR. Occasionally, I'll use 300DPI/grayscale, and even less frequently, 600DPI/black&white.
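
To put those three scan settings side by side, here is a rough sketch of how much raw image data each one produces per page. The US Letter page size (8.5 x 11 inches) and the function name `raw_scan_bytes` are assumptions for illustration only; the figures are uncompressed sizes that ignore any row padding, not what ends up in a saved PDF.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page (row padding ignored)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size in inches (US Letter); not stated in the thread.
PAGE = (8.5, 11)

# The three settings mentioned above: (DPI, bits per pixel).
modes = {
    "300DPI/grayscale": (300, 8),
    "300DPI/black&white": (300, 1),
    "600DPI/black&white": (600, 1),
}

sizes = {name: raw_scan_bytes(*PAGE, dpi, bits) for name, (dpi, bits) in modes.items()}

for name, size in sizes.items():
    print(f"{name}: {size:,} bytes")
```

Grayscale keeps eight times as much data per pixel as black&white at the same resolution, which is why it can preserve faint strokes that a 1-bit scan drops.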

My guess (and it's just a guess) is that your old docs are too faded/light for good accuracy, although the problem may be the quality of your scanner. It is almost surely not a problem with ABBYY FineReader.
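
To illustrate why faded originals can hurt accuracy, here is a minimal sketch of what happens when light gray "ink" is reduced to black&white with a fixed cutoff, and how stretching the contrast first can rescue it. The pixel values, the cutoff of 128, and the function names are all hypothetical; this is not how the Epson driver or ABBYY FineReader processes images internally.

```python
def stretch_contrast(pixels):
    """Linearly rescale gray values so the darkest becomes 0 and the lightest 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # flat image: nothing to stretch
        return list(pixels)
    return [(v - lo) * 255 // (hi - lo) for v in pixels]


def to_monochrome(pixels, threshold=128):
    """Map each gray value to 1-bit: 0 (black) below the threshold, 1 (white) otherwise."""
    return [0 if v < threshold else 1 for v in pixels]


# Hypothetical gray values (0 = black, 255 = white):
# faded ink, a lighter stroke edge, and the paper background.
faded = [150, 190, 230]

direct = to_monochrome(faded)                      # fixed cutoff on the raw scan
enhanced = to_monochrome(stretch_contrast(faded))  # stretch contrast first

print(direct)
print(enhanced)
```

With the raw values, every pixel lands on the white side of the cutoff and the text vanishes; after stretching, the ink pixels fall below it. That is one reason a grayscale scan, or a scanner setting that darkens the image, can be worth trying on faded pages.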

I suspect your manuscripts have private/sensitive information, but if you can find one innocuous page and post it as a PDF, I'll try to OCR it with many software packages that I have. It would be good to post three scanned versions: 300DPI/black&white, 300DPI/grayscale, 600DPI/black&white. Regards, Joe

One other point. You said, "for the purpose of converting multiple unpublished manuscripts into printable documents." To be clear, the real purpose of OCR is to create text so that documents are searchable. If all you want is to be able to print them, you don't need OCR. You could scan to an image-only PDF (with just a raster image/bitmap/graphic) and print that. Regards, Joe

To add to the above: try scanning one document on different scanners, put the scans on a USB stick, and then OCR them on your PC to see if there are differences.
Some scanner brands include software that enhances the image; you could search for one that has it, or contact the main sellers like HP.

You can also use software like this: http://www.stoik.com/products/business-solutions/sdk/document-image-enhancement-sdk.php
Joe Winograd, Fellow & MVE, Developer, commented:


I just realized in my last post that I didn't explain how to do an image-only scan in ABBYY FineReader. When you do the Save as PDF Document after scanning, click the Options button and you'll see this:

[Screenshot: ABBYY Save image only]
The default is Text under the page image, so you'll need to select Page image only to avoid the OCR step (all three Text... choices will result in OCR being done). That screenshot is from FR11, but I suspect that it's the same, or very similar, in FR12. Regards, Joe
You can also have them treated in a print shop, and ask for a warranty on the printing (ask for examples).


Question has a verified solution.

