Solved

General question about OCR

Posted on 2012-04-10
Last Modified: 2012-04-14
I need to scan/convert PDF files to a database so that I can do word searches within the PDFs. I used OmniPage Standard 18 to do the OCR, but it doesn't do a good job and I have to fix every page I import, which is ridiculous.

What can you guys recommend for this?
Question by:AquaJ9
6 Comments
 
LVL 34

Expert Comment

by:Paul MacDonald
ID: 37828848
How old is the software?  No OCR software is perfect, but newer versions are surprisingly good.  

If you can, change the settings in the software to improve accuracy.  This may slow your scan/conversion time but should reduce the need to manipulate every page.
 
LVL 17

Expert Comment

by:Anthony Russo
ID: 37828854
It's not instant but Evernote has really good OCR.

www.evernote.com
 

Author Comment

by:AquaJ9
ID: 37828859
OmniPage 18 is the newest version and is rated number one on all the sites.
 
LVL 34

Expert Comment

by:Paul MacDonald
ID: 37829113
Yes, OmniPage is one of the best for sure. I'm not sure what else to say. Make sure the scans are clean and give the software time to do its best.
 
LVL 55

Accepted Solution

by:Joe Winograd, EE MVE 2015&2016 earned 2000 total points
ID: 37833273
I've been using OmniPage for many years with very good results. In addition, I use PaperPort (from the same company, Nuance). The OCR that is built into PaperPort utilizes the OmniPage SDK to create PDF Searchable Image files (PDFs with both an image and a layer of text created by OCR) and, again, the OCR produces very accurate results. So my suspicion is that either the source documents are of bad quality or you are scanning with parameters that are not good for OCR. Here are a few tips that may help with these issues:

(1) My first attempt with any OCR software is always black&white (1-bit), 300 DPI, brightness (threshold) in the middle (50%).

(2) If the results aren't good with that, try adjusting the brightness up and down. For OCR accuracy, it's rare that anything is better than 300 DPI. There's a temptation to want to crank up the DPI, but this can actually hurt OCR rather than help.

(3) In an unexpected twist, recent versions of OmniPage do better with grayscale (8-bit) scanning than with B&W (1-bit). Images will be much larger, but the OCR accuracy may be much better. The thing about OmniPage's grayscale mode is that it sets an automatic brightness level that handles poor quality source docs (and shading) very well. With disk space cheap these days and file size no longer the issue that it once was, this is worth a shot.
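
To make tips (1) to (3) a bit more concrete, here is a rough Python sketch of what a 1-bit scan with a brightness threshold does to pixel values, and how much larger an 8-bit page is than a 1-bit page at 300 DPI. This is only an illustration, not how OmniPage or any scanner driver actually works: the function names and sample pixel values are made up, the sizes are for uncompressed images of a US Letter page, and the direction of the threshold here (higher means more pixels turn black) may be the opposite of the brightness slider in your scanning software.

```python
# Illustrative sketch only: not OmniPage's or any scanner driver's actual algorithm.
# Function names and sample pixel values are hypothetical.


def binarize(gray_pixels, threshold=0.5):
    """Map 8-bit grayscale values (0=black, 255=white) to 1-bit (0=black, 1=white).

    threshold is the brightness setting as a fraction (0.5 = the 50% midpoint).
    """
    cutoff = threshold * 255
    return [1 if value >= cutoff else 0 for value in gray_pixels]


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw (uncompressed) image size in bytes for a page scanned at the given DPI."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# A faint stroke (value 140) on a slightly shaded background (value 200).
row = [200, 200, 140, 140, 200]
at_middle = binarize(row, threshold=0.5)  # cutoff 127.5: the faint stroke turns white
adjusted = binarize(row, threshold=0.6)   # cutoff 153: the faint stroke turns black

# US Letter page (8.5 x 11 inches) at 300 DPI, 1-bit versus 8-bit.
bw_size = uncompressed_bytes(8.5, 11, 300, 1)
gray_size = uncompressed_bytes(8.5, 11, 300, 8)

print(at_middle, adjusted)
print(bw_size, gray_size)
```

With the middle threshold the faint stroke disappears into the background, which is the kind of loss that adjusting the brightness, or scanning in grayscale and letting the OCR software decide, can avoid. The grayscale page is eight times the raw size of the 1-bit page.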

In summary, experiment with various scan settings to see if you can improve OmniPage's accuracy. But if you still can't get OmniPage to work well for you and you want to try another program, ABBYY FineReader is a good one, and in the same price range as OmniPage:
http://www.abbyy.com/
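
If you want to compare scan settings by more than eyeballing, one possible approach is to scan the same page with each setting, hand-correct one copy of the text, and score every OCR result against it. The sketch below uses Python's standard `difflib` for the scoring. The setting labels and the sample strings are made up purely for illustration; they are not real OCR output and say nothing about which setting will win on your documents.

```python
from difflib import SequenceMatcher


def similarity(ocr_text, reference_text):
    """Return a 0..1 similarity ratio between OCR output and hand-checked text."""
    return SequenceMatcher(None, ocr_text, reference_text).ratio()


def best_setting(results, reference_text):
    """Pick the setting label whose OCR output is closest to the reference."""
    return max(results, key=lambda label: similarity(results[label], reference_text))


# Hypothetical sample data: a hand-corrected line and two invented OCR results.
reference = "Invoice total: 1,250.00"
results = {
    "bw_300dpi_50pct": "lnvoice tota1: 1,25O.OO",
    "gray_300dpi_auto": "Invoice total: 1,250.00",
}

for label, text in results.items():
    print(label, round(similarity(text, reference), 3))

winner = best_setting(results, reference)
print("closest to reference:", winner)
```

Scoring a few representative pages this way gives you a number to compare for each setting before you commit to rescanning everything.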

Regards, Joe
 

Author Closing Comment

by:AquaJ9
ID: 37845888
Perfect answer. The OmniPage OCR now works amazingly well after changing the settings as described in this post. The best tip was increasing the brightness, awesome tip. Thank you so much.
Question has a verified solution.