Avatar of Patrick O'Dea
Patrick O'Dea


Will This "OCR" into Excel - How accurately?

Hi,

This is a follow-up to a query I made yesterday.

See attachment.

Bear in mind I have no OCR/scanning software.

However, does anyone know if I could use OCR to transfer a 20-page PDF into Excel?

See my attachment, which is in the exact PDF format. I accept that some manual tidying might be involved.

I guess what I am asking is: roughly how long would it take to convert 20 pages like my attachment into accurate Excel data?

How accurate would the Excel data be?

I accept that there is no absolute answer!
Avatar of chrismanncalgavin

The issue you may have is that there is a background / coating on the paper, which always throws OCR software off and produces inaccurate results. Ideally you want a white background and clear print.

We use a program called ABBYY FineReader, which does a pretty good job and should also allow Excel exports. It's not too expensive.

If you can get an original copy with sharp print on a white background, and it is not a "copy of a copy" that has faded with age, you have a good chance.

As to how accurate it would be, I would certainly want to give it a thorough check afterwards, as I find OCR is NEVER 100% accurate, however good the program.
Fixing the errors it introduces would usually be quicker than typing the data manually, though.
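
To put rough numbers on "never 100% accurate", here is a minimal back-of-envelope sketch in Python. It assumes each character is misread independently with the same probability, which real OCR errors do not strictly follow, and every figure in it (rows per page, columns, characters per cell, accuracy levels) is a hypothetical placeholder rather than a measurement of any OCR package.

```python
def expected_cells_to_fix(cells: int, chars_per_cell: int, char_accuracy: float) -> float:
    """Expected number of cells that contain at least one misread character.

    Assumes independent per-character errors (a simplification).
    """
    if not 0.0 <= char_accuracy <= 1.0:
        raise ValueError("char_accuracy must be between 0 and 1")
    # A cell is correct only if every one of its characters is correct.
    p_cell_ok = char_accuracy ** chars_per_cell
    return cells * (1.0 - p_cell_ok)


if __name__ == "__main__":
    # Hypothetical layout: 20 pages, 40 rows x 6 columns per page, 8 characters per cell.
    cells = 20 * 40 * 6
    for accuracy in (0.999, 0.99, 0.95):
        to_fix = expected_cells_to_fix(cells, chars_per_cell=8, char_accuracy=accuracy)
        print(f"{accuracy:.1%} per character: about {to_fix:.0f} of {cells} cells to check")
```

The point of the sketch is only that small drops in per-character accuracy multiply quickly across a whole cell, which is why a clean source document matters so much.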
Avatar of Patrick O'Dea


Thanks Chris,
Actually, the background shading is probably less of a problem (it is an iPhone photo that was forwarded to me).

The source of my data will actually be original A4 paper (I think).
Presumably this will increase my accuracy.
Yes, sounds OK then. You can always play with your scanner and adjust the contrast settings to improve things a bit.
The only way to know is to try one page first and see, or to find someone with OCR software who can test it before you commit to buying software and a scanner.

Make sure you scan at 300 dpi in greyscale or black and white; that should give good quality.
It is also worth getting a scanner with an automatic feeder if you plan to do more in the future, as it makes copying much faster.
If you had a high-quality scan, I could run it through our OCR here to see if it works, just as a test.
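
As an illustration of why greyscale or black-and-white scanning with good contrast helps, here is a minimal sketch of the idea in plain Python. It is not how ABBYY FineReader or any scanner driver works internally; the fixed threshold of 128 and the sample pixel values are assumptions chosen only to show a tinted background being pushed to white while dark print stays black.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def to_greyscale(pixel: Pixel) -> int:
    """Convert one RGB pixel to a grey level using the common BT.601 luma weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarise(image: List[List[Pixel]], threshold: int = 128) -> List[List[int]]:
    """Turn an RGB image into pure black (0) and white (255) pixels.

    Anything at least as light as the threshold becomes white, so a pale
    tinted background disappears and only the darker print remains.
    """
    return [
        [255 if to_greyscale(p) >= threshold else 0 for p in row]
        for row in image
    ]


if __name__ == "__main__":
    # Hypothetical 1 x 3 strip: tinted paper, dark print, tinted paper.
    strip = [[(180, 190, 210), (40, 40, 50), (180, 190, 210)]]
    print(binarise(strip))
```

A faded "copy of a copy" is the case where this breaks down: the print and the background end up at similar grey levels, so no single threshold separates them cleanly.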
Thanks for the offer,

I am expecting an email in the next few minutes, hopefully.
There will be a one page attachment which I will load onto EE.

It would be great for me if you could magically convert it to Excel!

(My bottom line here is that I am trying to reconcile two sets of data. One is already in Excel and the other is on paper. So if I could get the paper one into Excel, that would be a giant leap forward. I accept that the accuracy would not be 100%. I also believe that the paper may be grey or blueish, which may not help.)
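
Once both data sets are in a spreadsheet, the reconciliation itself has to tolerate OCR misreads such as "O" for "0". A minimal sketch of one way to do that, using fuzzy matching from Python's standard `difflib` module, is below. The reference numbers, the `reconcile` helper, and the 0.8 similarity cutoff are all hypothetical; the real key column and a sensible cutoff depend on the actual data.

```python
import difflib
from typing import Dict, List, Optional


def reconcile(excel_keys: List[str], ocr_keys: List[str],
              cutoff: float = 0.8) -> Dict[str, Optional[str]]:
    """Map each OCR'd key to its closest Excel key, or None if nothing is close enough."""
    matches: Dict[str, Optional[str]] = {}
    for key in ocr_keys:
        # get_close_matches returns the best candidates at or above the cutoff, best first.
        close = difflib.get_close_matches(key, excel_keys, n=1, cutoff=cutoff)
        matches[key] = close[0] if close else None
    return matches


if __name__ == "__main__":
    # Hypothetical reference numbers; the second list mimics typical OCR misreads.
    excel_keys = ["INV-10234", "INV-10235", "INV-10240"]
    ocr_keys = ["INV-1O234", "INV-10235", "XX-99999"]
    for ocr_key, excel_key in reconcile(excel_keys, ocr_keys).items():
        print(f"{ocr_key!r} -> {excel_key!r}")
```

Keys that come back as `None` are the ones to check by hand against the paper original, and any fuzzy match on a numeric field still deserves a manual look because a single wrong digit changes the value.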

Actually, I have just received the document.
However, the quality is disappointingly poor.

This may be because it was sent by fax, possibly an old one.

See attached.  Can this be OCR'd??
Is it worth you trying?

If this is no use then I would expect to have an original in my hand in a few days.

Thanks again.
Another version - possibly better - but not great.

I am probably wasting my time with this poor-quality fax.

I will close this call and perhaps raise a new one in a few days.

I should have an original one by then.

You have been very helpful!
Avatar of Joe Winograd
Hi Dewsbury,
Me again. :)   This whole thread took place while I was still sleeping here in the USA Central time zone, so that's why I didn't offer any help this time. Sorry about that. But I'm in agreement with the conclusion reached, and when you do get a good quality document, I'll make the same offer as @chrismanncalgavin and run it through some OCR packages. As I mentioned in our previous thread, I have ABBYY FineReader (an older version bundled with a scanner), Nuance OmniPage Pro 18 (the latest version), and Nuance PaperPort Pro 14 (also the latest version, which has the OmniPage OCR engine built into it). As I also mentioned in our previous thread, all three packages can go straight to Excel.

Btw, I did run your PDF and JPG through the latest versions of OmniPage and PaperPort. As predicted, the results are terrible, and, as @chrismanncalgavin stated, that's because the quality of the source document is so poor. Looking forward to seeing a high quality version of it. Regards, Joe