  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 594

General question about OCR

I need to scan/convert PDF files to a database so that I can do word searches within the PDFs. I used OmniPage Standard 18 to do the OCR, but it doesn't do a good job, and I have to fix every page I import, which is ridiculous.

What can you guys recommend for this?
Asked by: AquaJ9
 
Paul MacDonald (Director, Information Systems) commented:
How old is the software?  No OCR software is perfect, but newer versions are surprisingly good.  

If you can, change the settings in the software to improve accuracy.  This may slow your scan/conversion time but should reduce the need to manipulate every page.
 
Anthony Russo commented:
It's not instant but Evernote has really good OCR.

www.evernote.com
 
AquaJ9 (author) commented:
OmniPage 18 is the newest version and is rated number one on all the sites.
 
Paul MacDonald (Director, Information Systems) commented:
Yes, OmniPage is one of the best for sure. I'm not sure what else to say. Make sure the scans are clean and give the software time to do its best.
 
Joe Winograd (Fellow & MVE, Developer) commented:
I've been using OmniPage for many years with very good results. In addition, I use PaperPort (from the same company, Nuance). The OCR that is built into PaperPort utilizes the OmniPage SDK to create PDF Searchable Image files (PDFs with both an image and a layer of text created by OCR) and, again, the OCR produces very accurate results. So my suspicion is that either the source documents are of bad quality or you are scanning with parameters that are not good for OCR. Here are a few tips that may help with these issues:

(1) My first attempt with any OCR software is always black & white (1-bit), 300 DPI, brightness (threshold) in the middle (50%).

(2) If the results aren't good with that, try adjusting the brightness up and down. For OCR accuracy, it's rare that anything is better than 300 DPI. There's a temptation to want to crank up the DPI, but this can actually hurt OCR rather than help.

(3) In an unexpected twist, recent versions of OmniPage do better with grayscale (8-bit) scanning than with B&W (1-bit). Images will be much larger, but the OCR accuracy may be much better. The thing about OmniPage's grayscale mode is that it sets an automatic brightness level that handles poor quality source docs (and shading) very well. With disk space cheap these days and file size no longer the issue that it once was, this is worth a shot.
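
To make tips (1) through (3) a bit more concrete, here is a rough Python sketch of what a 1-bit brightness threshold does to a shaded page, and how much larger an 8-bit page is before compression. It is only an illustration: the function names, the sample pixel values, and the way "brightness" maps to a cutoff are my own assumptions, not how OmniPage or any particular scanner driver works internally.

```python
# Illustrative sketch only. All names and the brightness-to-cutoff mapping
# are hypothetical; this is not OmniPage's or any scanner driver's API.

def binarize(gray_pixels, brightness=0.5):
    """Convert 8-bit gray values (0 = black, 255 = white) to 1-bit values.

    Assumption: raising brightness lowers the cutoff below which a pixel
    counts as ink, so more of the page comes out white.
    Returns 0 for black (ink) and 1 for white (paper).
    """
    cutoff = int(255 * (1.0 - brightness))
    return [0 if p < cutoff else 1 for p in gray_pixels]


def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# One made-up scan line: dark text (30) on a shaded gray background (110).
row = [110, 30, 110, 30, 30, 110]

print(binarize(row, 0.5))  # shading falls below the cutoff: all black, text lost
print(binarize(row, 0.7))  # brighter setting: background turns white, text stays

# Letter-size page at 300 DPI, 1-bit versus 8-bit grayscale (uncompressed).
print(raw_page_bytes(8.5, 11, 300, 1))
print(raw_page_bytes(8.5, 11, 300, 8))
```

In this toy example the 50% setting turns the shaded background solid black, while a brighter setting separates the text from the shading. The uncompressed grayscale page is 8 times the size of the 1-bit page; real files are usually compressed, so the actual ratio will differ.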

In summary, experiment with various scan settings to see if you can improve OmniPage's accuracy. But if you still can't get OmniPage to work well for you and you want to try another program, ABBYY FineReader is a good one, and in the same price range as OmniPage:
http://www.abbyy.com/

Regards, Joe
 
AquaJ9 (author) commented:
Perfect answer. The OmniPage OCR now works amazingly after changing the settings as described in this post. The best tip was increasing the brightness, awesome tip. Thank you so much.
Question has a verified solution.
