OCR

522

Solutions

1K

Contributors

Optical character recognition (OCR) is the mechanical or electronic conversion of images of typed, handwritten, or printed text into machine-encoded text. It is widely used as a form of data entry from printed paper records, including passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any suitable documentation. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech, key data and text mining. OCR is a field of research in pattern recognition, artificial intelligence, and computer vision.


What to do when PaperPort crashes, hangs, or fails to start
Please read the paragraph below before following the instructions in the video — there are important caveats in the paragraph that I did not mention in the video.

If your PaperPort 12 or PaperPort 14 is failing to start, or crashing, or hanging, it may be because of corrupt metadata (likely) or corrupt data files, such as bad PDFs (much less likely, but possible). This video Micro Tutorial shows how to use a utility called CheckPPFolders that ships with all releases of PaperPort 12 and PaperPort 14. CheckPPFolders is able to remove all PaperPort metadata, as well as identify problem files that may be causing PaperPort to crash, hang, or fail to start. PaperPort will rebuild the metadata, but there are two caveats. First, Folder Color and Folder Notes are in the MaxDesk.ini files, so you will lose those — and there's no easy way to retain the colors and notes. Thus, if you make heavy use of Folder Color and Folder Notes, you may want to uncheck them in the metadata cleaner dialog (see the third checkbox in the last screenshot in Step 2 below), especially since it's unlikely for those metadata files to be the culprit. Second, rebuilding all of the metadata is fast, except for the SearchVerity folders, which are the indexes for All-in-One Search. Rebuilding those can take a very long time, so you may want to try not removing them, at least the first time that you run CheckPPFolders (see the fourth checkbox in the last screenshot in Step 2 below).

1. Find the CheckPPFolders.exe file

1

Hello,

What factors determine the quality and accuracy of OCR (optical character recognition) and is there much variability among different OCR software?

If there is variability in software, what applications are best (both free and purchased)?

Thanks
0
Extract text from images? I have many images. I searched and found some online converters, but they don't work for me because I have 10,000 images, so I need a bulk tool. Can someone help me with this? Thank you.
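
One possible way to approach a bulk job like this is to drive the Tesseract command-line tool from a small script. The sketch below is only an illustration under stated assumptions: it assumes the open-source `tesseract` executable is installed and on the PATH, and the function names, folder layout, and list of image extensions are hypothetical. Failures are collected rather than aborting the run, so a few bad files do not stop the other images from being processed.

```python
import shutil
import subprocess
from pathlib import Path

# Extensions treated as images; adjust to match the actual collection (assumption).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def build_commands(image_dir, out_dir, lang="eng"):
    """Return one tesseract argument list per image found in image_dir."""
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    commands = []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() in IMAGE_SUFFIXES:
            # tesseract appends ".txt" to the output base name by itself.
            # Note: two images sharing a stem (a.png, a.jpg) would overwrite each other.
            output_base = out_dir / image.stem
            commands.append(["tesseract", str(image), str(output_base), "-l", lang])
    return commands


def run_batch(image_dir, out_dir, lang="eng"):
    """Run OCR on every image and return the paths of the images that failed."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for cmd in build_commands(image_dir, out_dir, lang):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(cmd[1])
    return failed
```

Calling `run_batch("images", "text_out")` would write one `.txt` file per image into `text_out` and return the list of images that could not be processed.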
0
Due to a recent hard drive replacement, I've lost the scanner tool and Sharpdesk desktop software for scanning from our AR-M257 to my desktop unit. Is there current software that I can download and install? I cannot find any disks in house for a reinstall.
0
I have a client that has no electronic backup files for hard copies of booklets that require periodic changes. In fact, the problem is that we don't know what format the originals were created in. Some of these books have been around for years. We have an old Xerox scanner using 'Document and Scan MakeReady' version 3.0.0.16 software. We don't have the CD. It doesn't appear to be available, and I assume it's really old. It is running on a Windows 2000 Pro PC. The value of this existing software is its ability to allow edits to a scanned document without changing the original format. After the edits, the document can be printed before it has to be saved to a file format such as PDF or DOC. In other words, it allows you to cut and paste changes without altering the original format.

Does anybody have a low-cost solution providing the above functionality? Is there OCR software out there that will allow you to make changes prior to committing to a file format?
0
Here is the Script courtesy of Brooks Duncan at Document Snap and MacSparky.

I have run it on about 500 files and it mostly runs OK but it has failed about 30 times. Do you think that the delay needs to be increased? Is there anywhere to look for why it failed?


tell application "PDFpenPro"
      open theFile as alias
      tell document 1
            ocr
            repeat while performing ocr
                  delay 1
            end repeat
            delay 1
            close with saving
      end tell
end tell
0
On a Mac, is there a way in Hazel to tell whether a file has already been OCR'd?
0
Hello and Good Morning Everyone,

          From a previously closed post, I found out I can scan a text document straight into MS Word by using either ABBYY FineReader or PaperPort.  At this point, I am interested in knowing which program would work best for achieving this goal.  Any shared thoughts, suggestions, or tips will be greatly appreciated.

          Thank you.

           George
0
I'm trying to get OCR working using YAGF. I read this is what Google used to scan books. I tried scanning a cookbook page and it gave me nothing. Then I scanned a page from my mom's Harlequin romance book, so there were no pictures. That didn't work either. Any guesses? This is on Ubuntu.
0
Is it possible to program the Raspberry Pi camera to do OCR?
I need it to recognize a list of characters and, when one is seen, write the info to a file.
I need the following info with each entry:
Captured data
Date and time (down to the second)
Location (can be GPS or user entry)

And once a day (or at user request), upload the data to an online database.
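
The logging half of this request can be sketched independently of the camera and OCR engine. The example below is a minimal illustration, assuming the OCR step has already produced a text string; the function names, the CSV layout, and the field names are hypothetical, and the daily upload is left out because the target database is not specified.

```python
import csv
from datetime import datetime
from pathlib import Path

# Column order of the log file (hypothetical layout).
FIELDS = ["captured", "timestamp", "location"]


def filter_wanted(ocr_text, wanted):
    """Return the entries of the watch list that appear in the OCR text."""
    return [item for item in wanted if item in ocr_text]


def make_record(captured, location, when=None):
    """Build one log entry with a timestamp accurate to the second."""
    when = when or datetime.now()
    return {
        "captured": captured,
        "timestamp": when.strftime("%Y-%m-%d %H:%M:%S"),
        "location": location,  # GPS fix or user-entered text
    }


def append_record(log_path, record):
    """Append a record to the CSV log, writing the header if the file is new."""
    log_path = Path(log_path)
    is_new = not log_path.exists()
    with log_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(record)
```

A capture loop would call `filter_wanted` on each OCR result and pass every hit through `make_record` and `append_record`; the resulting CSV file is then a convenient unit to upload once a day.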
0

How to password-protect a PDF with free software
This video Micro Tutorial shows how to password-protect PDF files with free software. Many software products can do this, such as Adobe Acrobat (but not Adobe Reader), Nuance PaperPort, and Nuance Power PDF, but they are not free products. This video explains how to do it with excellent, free software called PDF-XChange Editor from Tracker Software Products.

1. Download PDF-XChange Editor


Visit the PDF-XChange Editor section of the Tracker Software Products website:

http://www.tracker-software.com/product/pdf-xchange-editor

Click the white-on-green Download button for either product. It doesn't matter if you download PDF-XChange Editor or PDF-XChange Editor Plus, since you'll be selecting the Free Version when you install.

Step1

2. Run downloaded installer


Run the downloaded installer and select Free Version (unless, of course, you want more features and decide to purchase the Pro or Plus Version).

Step2

3. Open a non-secured PDF file in PDF-XChange Editor


Run PDF-XChange Editor and open a PDF file that does not currently have password protection on it.

Step3

4. Open Security section of Document Properties


Click File menu.

Click Document Properties.

Click Security category.

Step4

5. Open Password Security Settings dialog


Click Security Method drop-down.

Click Password Security.

Step5

6. Fill in Password Security Settings dialog


In Options section, select Compatibility from the drop-down and what you want encrypted via the radio buttons.

In Document Passwords section, enter password to open PDF and password to change permission settings.

In Permissions section, set Printing Allowed and Changing Allowed choices via the drop-downs; enable/disable content copying and
2
 
Expert Comment by: Stephen Kairys
OK, maybe there's a bug in the software. After I click YES to confirm, the program, on its own, reprompts for the password.
password problem
Thanks,
0
 
Author Comment by: Joe Winograd, EE MVE 2015&2016
Ah, now I see! Here's what's happening. There are two types of passwords for PDFs — Owner Password and User Password. The User Password is what's needed to open the file. The Owner Password is what's needed to set permissions/restrictions (and it may also be used to open the file). Your PDF file has an Owner Password on it — do you know what it is? If you open the file with the User Password, you will get the prompt that you posted for the Owner Password when trying to change security (or when changing any permissions/restrictions). If you open the file with the Owner Password, you will not get a prompt for the Owner Password when trying to change security (or when changing any permissions/restrictions). Note that you have a choice when opening the file of entering either the User Password or the Owner Password:

enter user or owner password
Regards, Joe
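
The behavior described in this comment can be summarized as a small decision model. The sketch below is purely illustrative: it is not the API of PDF-XChange Editor or of any PDF library, and the class and function names are hypothetical. It only encodes the rule stated above, namely that changing security settings prompts for the Owner Password unless the file was opened with it.

```python
from dataclasses import dataclass


@dataclass
class ProtectedPdf:
    """Toy model of a PDF carrying both password types (hypothetical)."""
    user_password: str
    owner_password: str

    def role_for(self, password):
        """Return 'owner', 'user', or None when the password opens nothing."""
        if password == self.owner_password:
            return "owner"
        if password == self.user_password:
            return "user"
        return None


def needs_owner_prompt(role):
    """A security or permissions change prompts unless the file was opened as owner."""
    return role != "owner"
```

With `pdf = ProtectedPdf("open-me", "admin-me")`, opening with `"open-me"` gives the role `"user"`, so a later security change prompts for the Owner Password; opening with `"admin-me"` does not.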
0
How do I upgrade from PaperPort 12.1 to 14.5?
0
I included OpenCV and Tesseract OCR in Visual Studio:
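// Must be defined before any CRT header is included to take effect.
#define _CRT_SECURE_NO_WARNINGS
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <baseapi.h>
#include <allheaders.h>
#include <iostream>
#include <vector>
#include <fstream>
using namespace cv;
using namespace std;
tesseract::TessBaseAPI ocr;

int main()
{
   // Load the image in color and stop early if it cannot be read.
   Mat input = imread("C:\\eurotext.tif", 1);
   if (input.empty()) {
      cerr << "Could not read the input image" << endl;
      return 1;
   }
   cvtColor(input, input, CV_BGR2GRAY);

   // Init returns non-zero when the language data cannot be loaded.
   if (ocr.Init(NULL, "eng", tesseract::OEM_TESSERACT_ONLY) != 0) {
      cerr << "Could not initialize Tesseract" << endl;
      return 1;
   }

   ocr.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);
   // One byte per pixel after the grayscale conversion.
   ocr.SetImage(input.data, input.cols, input.rows, 1, input.step);
   char* text = ocr.GetUTF8Text();
   cout << "Text:" << endl;
   cout << text << endl;
   cout << "Confidence: " << ocr.MeanTextConf() << endl << endl;

   // GetUTF8Text allocates the buffer with new[]; the caller must free it.
   delete[] text;
   ocr.End();
   return 0;
}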


The build succeeded, but when running I get:

erreur_run.PNG
and


erreur_run2.PNG
0
I tried to add Tesseract OCR to Visual Studio 2010.
The build succeeded, but when I run the program there is an error, 0xc0150002:

0xc0150002.PNG
I tried to find the missing DLL with Dependency Walker. It shows:

dependency.PNG
and

Error: The Side-by-Side configuration information for "c:\users\eouerten\documents\visual studio 2010\projects\tess_open\debug\LIBTESSERACT302D.DLL" contains errors. The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail (14001).
Error: The Side-by-Side configuration information for "c:\users\eouerten\documents\visual studio 2010\projects\tess_open\debug\LIBLEPT168D.DLL" contains errors. The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail (14001).
Error: Modules with different CPU types were found.
Warning: At least one module has an unresolved import due to a missing export function in a delay-load dependent module.


0
I included OpenCV and Tesseract OCR in Visual Studio 2010:
#include<opencv2\core\core.hpp>
#include<opencv2\highgui\highgui.hpp>
#include "opencv2/imgproc/imgproc.hpp"
#include<tesseract\baseapi.h>
#include<leptonica\allheaders.h>
#include<iostream>
#include <vector>
#include <fstream>
#define _CRT_SECURE_NO_WARNINGS
using namespace cv;
using namespace std;
tesseract::TessBaseAPI ocr;

int main()
{
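   // Backslashes must be escaped in C++ string literals. Note that this path
   // names a folder; imread needs the path of an image file.
   Mat input = imread("C:\\Program Files (x86)\\Tesseract-OCR");
   cvtColor( input, input, CV_BGR2GRAY );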

  ocr.Init(NULL, "eng", tesseract::OEM_TESSERACT_ONLY);
 
  ocr.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);
  ocr.SetImage(input.data, input.cols, input.rows, 1, input.step);
  char* text = ocr.GetUTF8Text();
  cout << "Text:" << endl;
  cout << text << endl;
  cout << "Confidence: " << ocr.MeanTextConf() << endl << endl;
  

}



When I built it:

 c:\program files (x86)\tesseract-ocr\include\leptonica\environ.h(277): warning C4005: 'snprintf' : macro redefinition
1>          c:\program files (x86)\tesseract-ocr\include\tesseract\platform.h(33) : see previous definition of 'snprintf'
1>c:\program files (x86)\tesseract-ocr\include\leptonica\pix.h(169): warning C4305: 'initializing' : truncation from 'double' to 'const l_float32'
1>c:\program files (x86)\tesseract-ocr\include\leptonica\pix.h(171): warning C4305: 'initializing' : truncation from 'double' to 'const l_float32'
1>tessopen.obj : warning LNK4075: ignoring '/EDITANDCONTINUE' due to '/INCREMENTAL:NO' specification
1>  tess_open.vcxproj -> C:\Users\eouerten\documents\visual studio 2010\Projects\tess_open\Debug\tess_open.exe
1>FinalizeBuildStatus:
1>  Deleting file "Debug\tess_open.unsuccessfulbuild".
1>
1>Build succeeded.
1>
1>Time Elapsed 00:00:02.93
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========


And when running:
0xc0150002.PNG
0
Can a Fujitsu ScanSnap iX500 use OCR to NAME a file (PDF) after a value found when scanning? For example, if we have a stack of invoices, all formatted the same with the Invoice Number in the same location, can we scan those and ask ScanSnap to name each individual scanned page based on the value of the Invoice Number (e.g., Inv123.pdf, Inv456.pdf, Inv789.pdf)?

Alternatively, I am fairly sure that we can make a PDF SEARCHABLE so that we could SEARCH within each document for a given invoice number (i.e., search ALL PDFs that contain INV456). The only problem is that this would be slower than simply eyeing down a list of filenames.

Best way to scan and file similar documents?

Thank you!
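
Independently of what the ScanSnap software itself supports, the naming step can be sketched as a post-processing script that runs on the recognized text of each page. The example below is a rough sketch under assumptions: it assumes OCR text is already available for each scan and that the invoice number is printed with an `INV` prefix, as in the examples above. The pattern and function name are hypothetical.

```python
import re

# Matches "INV456", "Inv-456", "INV # 456" and similar (assumed invoice layout).
INVOICE_PATTERN = re.compile(r"\bINV[\s\-#:]*(\d+)\b", re.IGNORECASE)


def filename_for(ocr_text, fallback="unnamed"):
    """Derive a PDF filename from the first invoice number found in the text."""
    match = INVOICE_PATTERN.search(ocr_text)
    if match is None:
        # No invoice number recognized: keep the file but flag it for review.
        return f"{fallback}.pdf"
    return f"Inv{match.group(1)}.pdf"
```

Pages where no number is found fall back to a placeholder name, which makes OCR misses easy to spot in the output folder.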
0
- Numerous PDFs in a network folder, or to be pulled into the solution via a network scanner
- Need to read the bar code and extract the 5 pieces of data for indexing. OR, OCR portions of the page with the same data as in the bar code.
- Use this data to store the document for search and retrieval later - methods may vary. Would like documents placed into folders by the date in the bar code.
- Some sort of compression or load into a database is preferred to keep file size down.
- Windows or Linux based
- OpenSource only: I want to get my hands dirty with it.

Any ideas?
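
As a starting point for the filing step, here is a minimal sketch of how decoded bar code data could be turned into a date-based folder path. Everything about the payload is an assumption: the post does not say what the five pieces of data are or how they are encoded, so the field names, the `|` separator, and the ISO date format below are hypothetical. Decoding the bar code itself is left to whichever open-source reader is chosen.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Hypothetical names for the five values carried in the bar code.
FIELD_NAMES = ["doc_id", "date", "customer", "doc_type", "page_count"]


def parse_payload(payload, sep="|"):
    """Split a decoded bar code string into the five named index fields."""
    parts = [part.strip() for part in payload.split(sep)]
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


def target_path(root, fields, original_name):
    """Build root/YYYY/MM/DD/original_name from the date in the bar code."""
    day = datetime.strptime(fields["date"], "%Y-%m-%d")
    return PurePosixPath(root) / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}" / original_name
```

The parsed dictionary can double as the index record for later search, while `target_path` decides where the PDF is stored.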
0
I had to reinstall PaperPort 14 but when I try to open it I get an error stating that it has stopped working. How do I eliminate this problem?
0
Nuance is offering PaperPort Professional 14 for $59.99. The info still shows no support for Windows 10. Will your 14.5 upgrade work on this one?
0

I have tried a simple OCR program from YouTube. I am getting a 'type expected' error near Graphics. Please help. I am using Visual Studio 2012.

Imports Emgu.CV
Imports Emgu.Util
Imports Emgu.CV.OCR
Imports Emgu.CV.Structure
Public Class Form1
    Dim OCRz As Tesseract = New Tesseract("tessdata", "eng", Tesseract.OcrEngineMode.OEM_DEFAULT)
    Dim pic As Bitmap = New Bitmap(270, 100)
    Dim gfx As Graphics = Graphics.FromImage(pic)



    Private Sub Timer1_Tick(sender As Object, e As EventArgs) Handles Timer1.Tick
        gfx.CopyFromScreen(New Point(Me.Location.X + PictureBox1.Location.X + 4, PictureBox1.Location.Y + 30), New Point(0, 0), pic.Size)
        PictureBox1.Image = pic
        PictureBox1.Image = Nothing
    End Sub
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        OCRz.Recognize(New Image(Of Bgr, Byte)(pic))
        RichTextBox1.Text = OCRz.GetText
    End Sub
End Class
0
How to add page numbers to a PDF with Adobe Acrobat XI Pro
In a recent question here at Experts Exchange, a member asked how to add page numbers to a PDF file using Adobe Acrobat XI Pro. This short video Micro Tutorial shows how to do it.

1. Click the Tools button


That will expose the Tools pane.

Step1

2. Click the Pages arrow


That will expand the Pages section.

Step2

3. Click the Header & Footer drop-down


That will show three menu choices.

Step3

4. Click the Add Header & Footer... menu item


You will now have the Add Header and Footer dialog.

Step4

5. Select the Page Number format


Click the Page Number and Date Format... link.

Step5

6. Select the font for the page number



Step6

7. Set other options


There are several other features in the dialog, including Appearance Options, Margin sizes, and Page Range Options.

8. Select the location for the page number


Click in one of these six boxes: Left Header Text, Center Header Text, Right Header Text, Left Footer Text, Center Footer Text, Right Footer Text.

Step8

9. Add the page numbers


Click the Insert Page Number button and then click OK. Note that it's also possible to insert a Date (and format it, too).

Step9
That's it! You now have page numbers in your PDF file. Remember to Save the file or do a Save As if you don't want to overwrite the original PDF.

If you find this video to be helpful, please click the thumbs-up icon below. Thank you for watching!
2
 
Administrative Comment by: Kyle Santos
Congratulations.  Your video has been Accepted and is now published on Experts Exchange.  Feel free to share this video by selecting the social sharing icons to your left.
0
Hi Experts

Could you explain how to configure a "price reader device" like those used in supermarkets, for example?

img_leitor

I guess that, to be used, it has to take a formatted file with barcodes and prices, and maybe the file format is defined by the device manufacturer. Once the device has read this file (by USB, for example), the price reader could be put into use.

Is that right?

Could you clarify?

Thanks in advance
0
Does anyone know of a way to increase the font size of PDF so that a 5pt document prints out in 10-12 pt font?  The attached page is 1 page of a 476 page document.

I tried converting it to Word and Excel, but those aren't viable options when you think of the time it takes to OCR the document. Also, the output is horrible, and there are so many possible mistakes that it doesn't bear thinking about as an option.

I don't think there's a way to do it. Not even zooming with a copier works.
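
The arithmetic behind the request can be made explicit. Because text in a PDF scales with the page, going from 5 pt to 10-12 pt means enlarging each page by a factor of 2 to 2.4. The sketch below works this out; the US Letter page size is an assumption (the post does not state the page size), and the function names are hypothetical.

```python
import math


def scale_for(current_pt, target_pt):
    """Magnification needed to turn the current font size into the target size."""
    return target_pt / current_pt


def scaled_page(width_in, height_in, factor):
    """Page dimensions in inches after uniform magnification."""
    return (width_in * factor, height_in * factor)


def tiles_needed(factor):
    """Sheets of the original size needed to tile one enlarged page (margins ignored)."""
    return math.ceil(factor) ** 2


factor = scale_for(5, 10)                 # 2.0, i.e. 200% magnification
page = scaled_page(8.5, 11, factor)       # assumed US Letter source page
sheets = 476 * tiles_needed(factor)       # whole document, tiled on same-size sheets
```

At 200%, an assumed Letter page becomes 17 x 22 inches, so the document would need either large-format paper or four same-size sheets per original page.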
0
Hi Experts,

I'm looking for an easy-to-use Windows OCR application that can convert small screen captures to text.

Here are a few sample screen captures.

sample screen capture
sample screen capture
sample screen capture
Regards,
Leigh
0
How can I convert an unsearchable PDF file into one that is?
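
One open-source route is to add a hidden text layer with OCRmyPDF, a command-line tool built on Tesseract. The sketch below assumes the `ocrmypdf` executable is installed and on the PATH; the wrapper function names are hypothetical. The `--skip-text` option tells the tool to leave pages that already contain text untouched.

```python
import shutil
import subprocess


def build_ocr_command(src, dst, language="eng"):
    """Argument list that adds a searchable text layer to src and writes dst."""
    return ["ocrmypdf", "--skip-text", "-l", language, str(src), str(dst)]


def make_searchable(src, dst, language="eng"):
    """Run OCRmyPDF and raise if the tool is missing or reports an error."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf executable not found on PATH")
    subprocess.run(build_ocr_command(src, dst, language), check=True)
```

Calling `make_searchable("scan.pdf", "scan_searchable.pdf")` writes a new file and leaves the original untouched.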
0
