Patrick O'Dea

OCR - How good is it?

Hi all,

I am a software consultant and may have an interesting new project.

It may involve OCR (Optical Character Recognition).

Perhaps I am not even using the right jargon!

Here is my query:

My client receives an A4 paper report once a month from a major (old-fashioned) government body.

Ideally, they would like to transfer this paper report into a meaningful spreadsheet.

Generally, if the print quality is good, is the above possible?

I should add one other point. While today's OCR is very good, it is NOT 100% accurate. There are always issues like the number "0" and the uppercase "O"; the number "1" and the lowercase "l"; and words like "modern", where the "r" and the "n" can be nearly touching in a proportional font, causing the OCR to read it as the word "modem".
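For a spreadsheet, the 0/O and 1/l confusions described above can be partly caught after OCR, because a column you know is numeric should contain only digits. The sketch below is a hypothetical post-processing step, not a feature of any particular OCR product: the function name `normalize_numeric_cell` and its character mapping are assumptions for illustration. It swaps the usual letter look-alikes for digits and flags every cell it had to change, or that still isn't a valid number, so a person can check it.

```python
import re

# Hypothetical mapping of letters that OCR commonly confuses with digits.
# Only apply it to cells the caller already knows should be numeric.
CONFUSABLE_DIGITS = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})

# Accepts plain numbers, optionally with thousands separators and decimals,
# e.g. "250", "12345", "1,234.50", "-7.5".
NUMBER_PATTERN = re.compile(r"-?\d{1,3}(,\d{3})*(\.\d+)?|-?\d+(\.\d+)?")


def normalize_numeric_cell(raw: str) -> tuple[str, bool]:
    """Return (cleaned_text, needs_review) for a cell that should hold a number."""
    stripped = raw.strip()
    cleaned = stripped.translate(CONFUSABLE_DIGITS)
    changed = cleaned != stripped
    valid = NUMBER_PATTERN.fullmatch(cleaned) is not None
    # Any cell we had to alter, or that still isn't a number, goes to a human.
    return cleaned, changed or not valid


if __name__ == "__main__":
    for cell in ["250", "1O5", "l,2OO.50", "12a"]:
        print(cell, "->", normalize_numeric_cell(cell))
```

This only narrows the problem. It cannot catch "modern" read as "modem", since both are real words, and a digit misread as another digit (say "8" as "3") passes straight through, so the flagged cells are a minimum for review rather than a guarantee.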

When creating searchable PDF files (a primary use of OCR these days), most users are willing to live with the occasional OCR error. But if you're creating spreadsheets where you expect the data to be 100% accurate, OCR alone won't do it.

I like to quip that the good news of OCR is that it's 99% accurate, and the bad news of OCR is that it's 99% accurate. :) This is why some folks, in some situations, use heads-down data entry instead of, or in conjunction with, OCR.

Regards,
Joe
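To put the 99% quip into numbers, here is a back-of-the-envelope calculation. It assumes, purely for illustration, 99% per-character accuracy and that each character is misread independently of the others; real OCR errors tend to cluster (smudged lines, small fonts), so treat the results as rough. The function names and the 2,000-character page and 8-character cell sizes are hypothetical.

```python
def expected_errors(char_accuracy: float, total_chars: int) -> float:
    """Expected number of misread characters across total_chars characters."""
    return (1.0 - char_accuracy) * total_chars


def cell_error_rate(char_accuracy: float, chars_per_cell: int) -> float:
    """Probability that a cell has at least one misread character,
    assuming every character is independently read correctly with char_accuracy."""
    return 1.0 - char_accuracy ** chars_per_cell


if __name__ == "__main__":
    # Assumed page of 2,000 characters at 99% per-character accuracy.
    print(f"{expected_errors(0.99, 2000):.1f}")  # prints 20.0
    # Assumed 8-character numeric cell, e.g. "1,234.50".
    print(f"{cell_error_rate(0.99, 8):.1%}")  # prints 7.7%
```

Under those assumptions a 2,000-character page averages about 20 misread characters, and roughly 1 in 13 eight-character cells contains at least one error, which is why unchecked OCR output falls short when the spreadsheet must be exact.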

Thanks, Joe,

I was going to wait to see if anybody else had a contribution to make.
However, yours is so comprehensive that there is no need for any more.

Thanks again!

You're very welcome. Good luck on the project!

Regards,
Joe