OCR

516

Solutions

1K

Contributors

Optical character recognition (OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text. It is widely used as a form of data entry from printed paper data records, including passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any other suitable documentation. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech, key data and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.

Here is the script, courtesy of Brooks Duncan at Document Snap and MacSparky.

I have run it on about 500 files. It mostly runs OK, but it has failed about 30 times. Do you think the delay needs to be increased? Is there anywhere I can look to find out why it failed?


tell application "PDFpenPro"
      open theFile as alias
      tell document 1
            ocr
            repeat while performing ocr
                  delay 1
            end repeat
            delay 1
            close with saving
      end tell
end tell

On a Mac, is there a way in Hazel to tell whether a file has been OCR'd?

Hello and good morning, everyone,

From a previously closed post, I learned that I can scan a text document straight into MS Word using either ABBYY FineReader or PaperPort. I would now like to know which program works best for this. Any thoughts, suggestions, or tips will be greatly appreciated.

Thank you.

George

I'm trying to get OCR working using YAGF. I read that this is what Google used to scan books. I tried scanning a cookbook page and it gave me nothing. Then I scanned a page from my mom's Harlequin romance, which has no pictures. That didn't work either. Any guesses? This is on Ubuntu.

Is it possible to program the Raspberry Pi camera to do OCR?
I need it to recognize a list of characters and, when one is seen, write the information to a file.
Each entry needs the following information:
- Captured data
- Date and time (down to the second)
- Location (can be GPS or user entry)

And once a day (or at user request), upload the data to an online database.
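One way to picture the per-entry record described above is the sketch below. It is only an illustration under stated assumptions: the `Capture` struct, the CSV layout, the UTC timestamp format, and all function names are hypothetical, and the OCR step itself (camera capture and character recognition) is left out entirely. Calling `append_capture("captures.csv", entry)` after each recognition would build up the file that gets uploaded once a day.

```cpp
#include <ctime>
#include <fstream>
#include <string>

// Hypothetical record layout: one entry per recognized string.
struct Capture {
    std::string text;      // characters recognized by the OCR step
    std::time_t when;      // capture time, one-second resolution
    std::string location;  // GPS fix or user-entered place name
};

// Format a timestamp as UTC, e.g. 2017-07-13T10:00:00Z (assumed format).
std::string format_utc(std::time_t t) {
    const std::tm* utc = std::gmtime(&t);
    if (utc == nullptr) return "";
    char buf[32];
    std::strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", utc);
    return buf;
}

// Wrap a field in double quotes and double any embedded quotes,
// so commas inside the field do not break the CSV row.
std::string csv_quote(const std::string& s) {
    std::string out = "\"";
    for (char c : s) {
        if (c == '"') out += "\"\"";
        else out += c;
    }
    out += '"';
    return out;
}

// Build one CSV row: captured data, date and time, location.
std::string to_csv_line(const Capture& c) {
    return csv_quote(c.text) + "," + format_utc(c.when) + "," + csv_quote(c.location);
}

// Append the row to a log file; returns false if the file cannot be written.
bool append_capture(const std::string& path, const Capture& c) {
    std::ofstream out(path, std::ios::app);
    if (!out) return false;
    out << to_csv_line(c) << '\n';
    return static_cast<bool>(out);
}
```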

How to password-protect a PDF with free software
This video Micro Tutorial shows how to password-protect PDF files with free software. Many software products can do this, such as Adobe Acrobat (but not Adobe Reader), Nuance PaperPort, and Nuance Power PDF, but they are not free products. This video explains how to do it with excellent, free software called PDF-XChange Editor from Tracker Software Products.

1. Download PDF-XChange Editor

Visit the PDF-XChange Editor section of the Tracker Software Products website:

http://www.tracker-software.com/product/pdf-xchange-editor

Click the white-on-green Download button for either product. It doesn't matter if you download PDF-XChange Editor or PDF-XChange Editor Plus, since you'll be selecting the Free Version when you install.

2. Run downloaded installer

Run the downloaded installer and select Free Version (unless, of course, you want more features and decide to purchase the Pro or Plus Version).

3. Open a non-secured PDF file in PDF-XChange Editor

Run PDF-XChange Editor and open a PDF file that does not currently have password protection on it.

4. Open Security section of Document Properties

Click File menu.

Click Document Properties.

Click Security category.

5. Open Password Security Settings dialog

Click Security Method drop-down.

Click Password Security.

6. Fill in Password Security Settings dialog

In Options section, select Compatibility from the drop-down and what you want encrypted via the radio buttons.

In Document Passwords section, enter password to open PDF and password to change permission settings.

In Permissions section, set Printing Allowed and Changing Allowed choices via the drop-downs; enable/disable content copying and

How do I upgrade from PP 12.1 to 14.5?

I included OpenCV and Tesseract OCR in Visual Studio:
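// Must be defined before any header is included to take effect.
#define _CRT_SECURE_NO_WARNINGS
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <baseapi.h>
#include <allheaders.h>
#include <iostream>

using namespace cv;
using namespace std;

int main()
{
    // Load the image as 3-channel BGR and make sure it was actually read.
    Mat input = imread("C:\\eurotext.tif", 1);
    if (input.empty()) {
        cerr << "Could not read C:\\eurotext.tif" << endl;
        return 1;
    }
    cvtColor(input, input, CV_BGR2GRAY);

    tesseract::TessBaseAPI ocr;
    // Init returns 0 on success. With a NULL datapath, Tesseract looks for
    // tessdata via TESSDATA_PREFIX or its built-in default location.
    if (ocr.Init(NULL, "eng", tesseract::OEM_TESSERACT_ONLY) != 0) {
        cerr << "Could not initialize Tesseract" << endl;
        return 1;
    }
    ocr.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);
    // 1 byte per pixel (grayscale); step is the number of bytes per row.
    ocr.SetImage(input.data, input.cols, input.rows, 1, static_cast<int>(input.step));

    char* text = ocr.GetUTF8Text();
    cout << "Text:" << endl;
    cout << text << endl;
    cout << "Confidence: " << ocr.MeanTextConf() << endl << endl;

    delete[] text;  // GetUTF8Text allocates the buffer with new[]
    ocr.End();
    return 0;
}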

The build succeeded, but running it produces the errors shown in erreur_run.PNG and erreur_run2.PNG.

I tried to add Tesseract OCR to Visual Studio 2010.
The build succeeds, but when I run it I get error 0xc0150002 (see 0xc0150002.PNG).
I tried to find the missing DLL with Dependency Walker. It shows the output in dependency.PNG and these messages:

Error: The Side-by-Side configuration information for "c:\users\eouerten\documents\visual studio 2010\projects\tess_open\debug\LIBTESSERACT302D.DLL" contains errors. The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail (14001).
Error: The Side-by-Side configuration information for "c:\users\eouerten\documents\visual studio 2010\projects\tess_open\debug\LIBLEPT168D.DLL" contains errors. The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail (14001).
Error: Modules with different CPU types were found.
Warning: At least one module has an unresolved import due to a missing export function in a delay-load dependent module.

I included OpenCV and Tesseract OCR in Visual Studio 2010:
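// Must be defined before any header is included to take effect.
#define _CRT_SECURE_NO_WARNINGS
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <iostream>

using namespace cv;
using namespace std;

int main()
{
    // Backslashes in a C++ string literal must be escaped. Note that this path
    // names the Tesseract-OCR install folder, not an image file, so imread
    // returns an empty Mat; it needs to point at an actual image.
    const char* imagePath = "C:\\Program Files (x86)\\Tesseract-OCR";
    Mat input = imread(imagePath);
    if (input.empty()) {
        cerr << "Could not read image: " << imagePath << endl;
        return 1;
    }
    cvtColor(input, input, CV_BGR2GRAY);

    tesseract::TessBaseAPI ocr;
    // Init returns 0 on success.
    if (ocr.Init(NULL, "eng", tesseract::OEM_TESSERACT_ONLY) != 0) {
        cerr << "Could not initialize Tesseract" << endl;
        return 1;
    }
    ocr.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);
    // 1 byte per pixel (grayscale); step is the number of bytes per row.
    ocr.SetImage(input.data, input.cols, input.rows, 1, static_cast<int>(input.step));

    char* text = ocr.GetUTF8Text();
    cout << "Text:" << endl;
    cout << text << endl;
    cout << "Confidence: " << ocr.MeanTextConf() << endl << endl;

    delete[] text;  // GetUTF8Text allocates the buffer with new[]
    ocr.End();
    return 0;
}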

When I built it:

 c:\program files (x86)\tesseract-ocr\include\leptonica\environ.h(277): warning C4005: 'snprintf' : macro redefinition
1>          c:\program files (x86)\tesseract-ocr\include\tesseract\platform.h(33) : see previous definition of 'snprintf'
1>c:\program files (x86)\tesseract-ocr\include\leptonica\pix.h(169): warning C4305: 'initializing' : truncation from 'double' to 'const l_float32'
1>c:\program files (x86)\tesseract-ocr\include\leptonica\pix.h(171): warning C4305: 'initializing' : truncation from 'double' to 'const l_float32'
1>tessopen.obj : warning LNK4075: ignoring '/EDITANDCONTINUE' due to '/INCREMENTAL:NO' specification
1>  tess_open.vcxproj -> C:\Users\eouerten\documents\visual studio 2010\Projects\tess_open\Debug\tess_open.exe
1>FinalizeBuildStatus:
1>  Deleting file "Debug\tess_open.unsuccessfulbuild".
1>
1>Build succeeded.
1>
1>Time Elapsed 00:00:02.93
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========


And when running, I get the error shown in 0xc0150002.PNG.

Can a Fujitsu ScanSnap iX500 use OCR to name a file (PDF) after a value found when scanning? For example, if we have a stack of invoices, all formatted the same with the invoice number in the same location, can we scan them and have ScanSnap name each scanned page after its invoice number (e.g., Inv123.pdf, Inv456.pdf, Inv789.pdf)?

Alternatively, I am more confident that we can make a PDF searchable, so that we could search all PDFs for a given invoice number (e.g., find every PDF that contains INV456). The only problem is that this would be slower than simply scanning a list of filenames by eye.

What is the best way to scan and file similar documents?

Thank you!
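Whether ScanSnap's own software can do this is a separate question, but the naming step itself is simple once OCR text for a page is available. A minimal sketch, assuming the OCR output contains a label such as "Invoice Number: 456" (the label wording, the regular expression, and the function name `invoice_filename` are assumptions for illustration, not ScanSnap features):

```cpp
#include <regex>
#include <string>

// Look for an invoice number in OCR'd page text and turn it into a file name
// in the Inv123.pdf style from the question. Returns an empty string when no
// invoice number is found, so the caller can fall back to a default name.
std::string invoice_filename(const std::string& ocr_text) {
    // Assumed label forms: "Invoice Number", "Invoice No." or "Invoice #",
    // optionally followed by a colon, then the digits to capture.
    static const std::regex pattern(
        R"(Invoice\s*(?:Number|No\.?|#)\s*:?\s*(\d+))",
        std::regex::ECMAScript | std::regex::icase);

    std::smatch match;
    if (!std::regex_search(ocr_text, match, pattern)) {
        return "";
    }
    return "Inv" + match[1].str() + ".pdf";
}
```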

- Numerous PDFs in a network folder, or to be pulled into the solution via a network scanner
- Need to read the bar code and extract the 5 pieces of data for indexing, or OCR the portions of the page that carry the same data as the bar code
- Use this data to store the document for later search and retrieval (methods may vary). Would like documents placed into folders by the date in the bar code
- Some sort of compression, or loading into a database, is preferred to keep file size down
- Windows or Linux based
- Open source only: I want to get my hands dirty with it

Any ideas?
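Filing by date comes down to turning the date decoded from the bar code into a folder path. A minimal sketch, assuming the date has already been decoded and arrives as a `YYYY-MM-DD` string (the date format, the year/month/day folder layout, and the name `folder_for_date` are assumptions; the post does not say how the bar code encodes its five fields):

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Map a date string to a folder such as archive/2017/07/13.
// Returns an empty string if the date is not in the assumed YYYY-MM-DD form,
// so badly decoded bar codes can be routed to a review folder instead.
std::string folder_for_date(const std::string& root, const std::string& date) {
    if (date.size() != 10 || date[4] != '-' || date[7] != '-') {
        return "";
    }
    for (std::size_t i = 0; i < date.size(); ++i) {
        if (i == 4 || i == 7) continue;  // skip the two separators
        if (!std::isdigit(static_cast<unsigned char>(date[i]))) {
            return "";
        }
    }
    return root + "/" + date.substr(0, 4) + "/" + date.substr(5, 2) + "/" + date.substr(8, 2);
}
```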

I had to reinstall PaperPort 14, but when I try to open it I get an error stating that it has stopped working. How do I eliminate this problem?

Nuance is offering PaperPort Professional 14 for $59.99. The info still shows no support for Windows 10. Will your 14.5 upgrade work on this one?

I tried a simple OCR program from YouTube, but I am getting a 'Type expected' error near Graphics. Please help. I am using Visual Studio 2012.

Imports Emgu.CV
Imports Emgu.Util
Imports Emgu.CV.OCR
Imports Emgu.CV.Structure
Public Class Form1
    Dim OCRz As Tesseract = New Tesseract("tessdata", "eng", Tesseract.OcrEngineMode.OEM_DEFAULT)
    Dim pic As Bitmap = New Bitmap(270, 100)
    Dim gfx As Graphics = Graphics.FromImage(pic)



    Private Sub Timer1_Tick(sender As Object, e As EventArgs) Handles Timer1.Tick
        gfx.CopyFromScreen(New Point(Me.Location.X + PictureBox1.Location.X + 4, PictureBox1.Location.Y + 30), New Point(0, 0), pic.Size)
        PictureBox1.Image = pic
        PictureBox1.Image = Nothing
    End Sub
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        OCRz.Recognize(New Image(Of Bgr, Byte)(pic))
        RichTextBox1.Text = OCRz.GetText
    End Sub
End Class

How to add page numbers to a PDF with Adobe Acrobat XI Pro
In a recent question here at Experts Exchange, a member asked how to add page numbers to a PDF file using Adobe Acrobat XI Pro. This short video Micro Tutorial shows how to do it.

1. Click the Tools button

That will expose the Tools pane.

2. Click the Pages arrow

That will expand the Pages section.

3. Click the Header & Footer drop-down

That will show three menu choices.

4. Click the Add Header & Footer... menu item

You will now have the Add Header and Footer dialog.

5. Select the Page Number format

Click the Page Number and Date Format... link.

6. Select the font for the page number

7. Set other options

There are several other features in the dialog, including Appearance Options, Margin sizes, and Page Range Options.

8. Select the location for the page number

Click in one of these six boxes: Left Header Text, Center Header Text, Right Header Text, Left Footer Text, Center Footer Text, Right Footer Text.

9. Add the page numbers

Click the Insert Page Number button and then click OK. Note that it's also possible to insert a Date (and format it, too).

That's it! You now have page numbers in your PDF file. Remember to Save the file or do a Save As if you don't want to overwrite the original PDF.

If you find this video to be helpful, please click the thumbs-up icon below. Thank you for watching!

Hi Experts

Could you explain how to configure a "price reader" device, such as the ones used in supermarkets?

My guess is that, before it can be used, the device has to load a formatted file of barcodes and prices, and that the file format may be defined by the device manufacturer. Once the device has read this file (over USB, for example), it would be ready for use.

Is that right?

Could you clarify?

Thanks in advance

Does anyone know of a way to increase the font size of a PDF so that a 5 pt document prints out in a 10-12 pt font? The attached page is 1 page of a 476-page document.

I tried converting it to Word and Excel, but those aren’t viable options when you consider the time it takes to OCR the document. Also, the output is horrible, and there are so many possible mistakes that it doesn’t bear thinking about as an option.

I don’t think there’s a way to do it; not even zooming with a copier works.

Hi Experts,

I'm looking for an easy-to-use Windows OCR application that can convert small screen captures to text.

Here are a few sample screen captures (three images).

Regards,
Leigh

How can I convert an unsearchable PDF file into one that is searchable?

Hi

I'm looking for a solution that will OCR and index scanned documents for later retrieval by searching for keywords or numbers.

Are there any industry-standard programs for this sort of thing?

Thanks

Has anyone used this software?  I'd appreciate any comments anyone can offer - I can't find a single review on the internet, which is odd.

Thanks!

Hello,
I am trying to convert a document from .ocr to Excel. I tried using Adobe Pro, but it did not work very well.
wordHS.docx

I am trying to install the software, but it keeps freezing. It does nothing when I press the Next button. Can anyone please help me out?
screenshot.jpg, screenshot2.jpg, screenshot3.jpg, pic1.jpg, pic2.jpg
http://www.neat.com/helpcenter/download-neat-scanner-drivers/

I have a Google document with some embedded images that contain text. Is there a way to convert the Google Doc so that all the images are converted to text and all I have is a text document?