
OCR

532

Solutions

1K

Contributors

Optical character recognition (OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text. It is widely used as a form of data entry from printed paper data records, including passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static-data, or any suitable documentation. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed on-line, and used in machine processes such as machine translation, text-to-speech, key data and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.


I am considering making a site that can auto-analyze a certain type of uploaded report, and instantly display the results as a PDF. There are various steps involved in the creation of the PDF and I want a feeling for the effort and technology needed for each step.

There are three different steps I will discuss here to see where I can use WordPress plugins and where I need to customize the functionality.

The uploaded report would be a merchant's monthly credit card statement, like the following snippet.

Statement Example
1) So, for the first of three steps, I need a WordPress OCR plug-in. Are there many options for that? Is the angle of the text a problem? I can not guarantee neatness. (I added the underlining to make it easier for me to read)

I imagine allowing an authorized user to upload a report. I need this plug-in to convert images to some form of digital data, like a PDF or a CSV file.

2) I need a way to analyze that data, and wonder if there is a configurable WordPress plugin for this? It will query the items by the Description, then use the numeric values in the Number, Amount and Total columns for mathematical computations. There will be some mathematical steps performed on some of the data as it generates the output for the report.

The results should go into some format, like a CSV file.

3) I need a report tool which can import the data results from Step #2 and apply them to various pre-designed fields in the final pre-designed …
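
Step 2 above (query rows by Description, then do arithmetic on the Number and Amount columns and write the result as CSV) can be sketched without any plug-in. This is only a rough illustration in Python, assuming the OCR step already produced clean CSV text with those column names; the sample rows and the `summarize` and `to_csv` helpers are made up for the example.

```python
import csv
import io
from decimal import Decimal

# Hypothetical OCR output; the rows and values are invented for illustration.
SAMPLE_CSV = """Description,Number,Amount,Total
VISA SALES,10,250.00,250.00
MASTERCARD SALES,4,100.50,100.50
VISA REFUNDS,1,-20.00,-20.00
"""


def summarize(csv_text, keyword):
    """Sum Number and Amount for rows whose Description contains keyword."""
    count = 0
    amount = Decimal("0")
    for row in csv.DictReader(io.StringIO(csv_text)):
        if keyword.lower() in row["Description"].lower():
            count += int(row["Number"])
            # Decimal avoids binary floating-point rounding on money values.
            amount += Decimal(row["Amount"])
    return {"keyword": keyword, "number": count, "amount": amount}


def to_csv(results):
    """Serialize the summary rows as CSV text for the report step."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["keyword", "number", "amount"])
    writer.writeheader()
    writer.writerows(results)
    return out.getvalue()


visa = summarize(SAMPLE_CSV, "visa")
print(to_csv([visa]))
```

Real OCR output from a skewed scan will likely need cleanup (stray characters, misread digits) before the numeric conversions succeed, so a production version would validate each cell first.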
0

I need a tool I can use to digitize a report, like the one attached here...

Report
Will this kind of report get a 100% successful conversion rate?

Eventually, I need the tool to be part of my website, but I have not, as yet, chosen my back-end technology. For now, a simple Mac based tool is fine, just so I can hand convert a report that I can start to use in my programming of the back-end.

Windows is okay, if there are limited Mac FREE versions.

I do have Office 365 (Mac) if there is a tool in there which I can use.

I am also interested in hearing what "plug-ins" can work when I deploy this to my website, for online OCR conversions.
0
Security/privacy-related question. Can text be detected in, say, .jpg, .bmp, etc. file formats? I know text within PDFs can be detected with OCR. When I say "detected" I mean with use of a SIEM, DLP or other event-driven software. I am not referring to steganography or obfuscation of text in any way, just simple text detection in a JPG or BMP format. Much thanks.
0
I had this question after viewing PDFTK - filling this PDF but got an error.

After talking with the others on this project, we decided it's ok to have the final PDF as read-only and it doesn't need to be editable.

I ran the commands below. I can populate the PDF but not the checkbox. Joe (if you're reading this), is this because of the LiveCycle issue that the checkboxes don't get checked? If it is, I got approval to buy LiveCycle. I'll get it and see what's going on.

I tried "No", "On" and "1" for the value, but I don't see the checkbox checked.

1. i-765 is the orig file

2. notsigned.pdf is the file I QPDF-ed to get rid of the password error message

3. Ran this pdftk.exe notsigned.pdf fill_form i-765.txt output OutputFilled.pdf

4. outputfilled.pdf is the populated PDF.
i-765.pdf
notsigned.pdf
OutputFilled.pdf
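
One thing that may be worth checking for the checkbox: a checkbox usually accepts only its own state names, and those differ from form to form. pdftk can list them with `pdftk notsigned.pdf dump_data_fields`. The Python sketch below parses that kind of output and reports the state names for each button field. The sample text and field names are invented, and the record layout shown is an assumption about what pdftk prints, so compare it against the real output for this form.

```python
# Hypothetical excerpt of `pdftk notsigned.pdf dump_data_fields` output.
# The field names are invented; the real form will use different ones.
SAMPLE_DUMP = """---
FieldType: Text
FieldName: form1.LastName
FieldFlags: 0
---
FieldType: Button
FieldName: form1.CheckBox1
FieldFlags: 0
FieldStateOption: Off
FieldStateOption: Y
"""


def parse_records(dump_text):
    """Split the dump into records, one dict of key -> list of values each."""
    records, current = [], {}
    for line in dump_text.splitlines():
        if line.strip() == "---":
            if current:
                records.append(current)
            current = {}
        elif ": " in line:
            key, value = line.split(": ", 1)
            current.setdefault(key, []).append(value)
    if current:
        records.append(current)
    return records


def checkbox_states(dump_text):
    """Map each Button field name to the state names it lists."""
    return {
        rec["FieldName"][0]: rec.get("FieldStateOption", [])
        for rec in parse_records(dump_text)
        if rec.get("FieldType") == ["Button"] and "FieldName" in rec
    }


print(checkbox_states(SAMPLE_DUMP))
```

If the real dump lists a state other than "No", "On" or "1" for the checkbox, that state name is the value to try in the FDF. Whether this works on a LiveCycle form is not something the sketch can confirm.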
0
Hi, do you have any shortcut for rejecting special characters in a textbox?
I'm using this manual code:

Public Class Form1
    Private Sub TextBox1_TextChanged(sender As Object, e As EventArgs) Handles TextBox1.TextChanged
        ' Check the control's contents via .Text; .ToString describes the control itself.
        If TextBox1.Text.Contains("`") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("~") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("!") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("@") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("#") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("$") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("%") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("^") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("*") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("&") Then
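
The usual shortcut is to keep the forbidden characters in one collection and test against it once, instead of one branch per character. Here is a minimal sketch of that idea in Python; the character list contains only the characters visible in the post, and `find_invalid` is a made-up helper name. In VB.NET the same idea could probably be written as a single `TextBox1.Text.IndexOfAny(...)` check.

```python
# Only the characters shown in the post; extend as needed.
INVALID_CHARS = set("`~!@#$%^*&")


def find_invalid(text):
    """Return the invalid characters found in text, in order of first appearance."""
    seen = []
    for ch in text:
        if ch in INVALID_CHARS and ch not in seen:
            seen.append(ch)
    return seen


for sample in ("hello", "a@b#@"):
    bad = find_invalid(sample)
    if bad:
        print(f"{sample!r} contains invalid character(s): {''.join(bad)}")
    else:
        print(f"{sample!r} is fine")
```

Returning the offending characters, rather than just True or False, lets the message tell the user what to remove.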


0
I'm hoping there's a solution for this problem.

1. I have the FDF that I want to use to populate a PDF. It's attached. It's i-765.txt

2. I have the PDF file. I got it from the INS site. It's attached here and called i-765.pdf

3. I ran this command
   pdftk.exe i-765.pdf fill_form i-765.txt output Output.pdf

but got this error

Error: Failed to open PDF file:
   i-765.pdf
   OWNER PASSWORD REQUIRED, but not given (or incorrect)
Errors encountered.  No output created.
Done.  Input errors, so no output created.


4. I googled the error and came across a solution saying I need to run qpdf to get rid of the error.

    a. I downloaded QPDF.
    b.  It got installed in folder:  C:\Users\bwa\Downloads\qpdf-7.0.0-bin-mingw32\qpdf-7.0.0\bin
    c. I copied i-765.pdf to that folder
    d. Ran this command
qpdf --decrypt i-765.pdf decrypted765.pdf


   e. Now I have decrypted765.pdf. I open it and get this message. I click OK on it and the PDF is read-only.
        Message I get
   f. I ran this command to get rid of the message
pdftk decrypted765.pdf cat output i-765notsigned.pdf


   g. I open i-765notsigned.pdf and there's no error and the fields are editable.
    However,  I noticed this: The functionality of the …
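
The manual steps above (decrypt with qpdf, rewrite with pdftk, then fill the form) could be chained in a small script once they work by hand. The Python sketch below only assembles the same three command lines used in this thread and runs them if the tools are on PATH. The function names and intermediate file names are placeholders, and it does nothing about the read-only or checkbox problems.

```python
import shutil
import subprocess


def build_commands(src_pdf, fdf, out_pdf,
                   decrypted="decrypted.pdf", unsigned="notsigned.pdf"):
    """Return the qpdf/pdftk command lines in the order described above."""
    return [
        ["qpdf", "--decrypt", src_pdf, decrypted],
        ["pdftk", decrypted, "cat", "output", unsigned],
        ["pdftk", unsigned, "fill_form", fdf, "output", out_pdf],
    ]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not on PATH")
        subprocess.run(cmd, check=True)


commands = build_commands("i-765.pdf", "i-765.txt", "OutputFilled.pdf")
for cmd in commands:
    print(" ".join(cmd))
# Call run_all(commands) only on a machine with qpdf and pdftk installed.
```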
0
I need OCR software that runs on a Mac, with the scanned images coming from my HP LaserJet MFP M127fw.

What options do I have?

Thanks
0
Hello Experts,

I need an OCR (Optical Character Recognition) App to scan business card and directly update into outlook/exchange address book.

Thank you very much in advance.
0
I'm using ABBYY OCR https://www.abbyy.com/en-us/ to read PDF files and parse some fields from them, and then, using C# code, I store the data in the database.

This is what I want to do:

I have some PDF files but they're forms. As it is now, we download the form and fill the form. For example, fill first name, last name, address, etc. Then we scan it or fax it whatever.

I want to read the entire PDF file and display it in a web page. Along with the image (if it exists).  Then, user fills the form online. For example, read the PDF file using OCR and display the same exact file on a web page, then save the entire form, print, etc.

Is this doable?

Edit: Maybe I should have the same fields the PDF form has on a web page. User fills the fields and somehow I plug those field values into the PDF?

Edit: I've attached a sample PDF form I'd like to process.
i-765.pdf
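
For the second idea (collect the values on a web page, then plug them into the PDF), one possible route is to write the submitted values to an FDF file and let a tool such as pdftk merge it into the form. The Python sketch below builds a minimal FDF from a dictionary. The field names are invented, the real ones must match the names inside the PDF, and this simple version assumes plain ASCII values.

```python
def fdf_escape(value):
    """Escape backslashes and parentheses for a PDF literal string."""
    return value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_fdf(fields):
    """Build a minimal FDF document from a {field name: value} mapping."""
    entries = "\n".join(
        f"<< /T ({fdf_escape(name)}) /V ({fdf_escape(value)}) >>"
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n"
        f"{entries}\n"
        "] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )


# Hypothetical field names, as if posted from a web form.
fdf_text = build_fdf({"FirstName": "Ann", "Address": "12 Main St (Apt 4)"})
print(fdf_text)
```

With this approach the web page only needs ordinary input fields, and no OCR is involved because the values never exist as an image.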
0
What to do when PaperPort crashes, hangs, or fails to start
Please read the paragraph below before following the instructions in the video — there are important caveats in the paragraph that I did not mention in the video.

If your PaperPort 12 or PaperPort 14 is failing to start, or crashing, or hanging, it may be because of corrupt metadata (likely) or corrupt data files, such as bad PDFs (much less likely, but possible). This video Micro Tutorial shows how to use a utility called CheckPPFolders that ships with all releases of PaperPort 12 and PaperPort 14. CheckPPFolders is able to remove all PaperPort metadata, as well as identify problem files that may be causing PaperPort to crash, hang, or fail to start. PaperPort will rebuild the metadata, but there are two caveats. First, Folder Color and Folder Notes are in the MaxDesk.ini files, so you will lose those — and there's no easy way to retain the colors and notes. Thus, if you make heavy use of Folder Color and Folder Notes, you may want to uncheck them in the metadata cleaner dialog (see the third checkbox in the last screenshot in Step 2 below), especially since it's unlikely for those metadata files to be the culprit. Second, rebuilding all of the metadata is fast, except for the SearchVerity folders, which are the indexes for All-in-One Search. Rebuilding those can take a very long time, so you may want to try not removing them, at least the first time that you run CheckPPFolders (see the fourth checkbox in the last screenshot in Step 2 below).

1. Find the CheckPPFolders.exe file

1

Hello,

What factors determine the quality and accuracy of OCR (optical character recognition) and is there much variability among different OCR software?

If there is variability in software, what applications are best (both free and purchased)?

Thanks
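
One way to compare OCR packages objectively is to run each on the same page, type up the correct text by hand, and compute the character error rate (CER): the edit distance between the OCR output and the correct text, divided by the length of the correct text. Below is a small Python sketch of that measurement. It is only one possible metric, and the sample strings are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, ocr_output):
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


# Invented example: the OCR engine misread the letter "i" as the digit "1".
print(character_error_rate("invoice", "invo1ce"))
```

Running the same pages through each candidate program and comparing the rates gives a number to weigh against price.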
0
Extract text from images? I have many images. I searched and found some online converters, but they don't work for me because I have 10,000 images, so I need a bulk tool. Can someone help me with this? Thank you.
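
For that many files, a script around a command-line OCR engine is one option. The Python sketch below assumes the Tesseract command-line program is installed and uses its basic `tesseract <image> <output base>` form, which writes a .txt file next to each image. The helper names and the list of image extensions are assumptions for the example.

```python
import shutil
import subprocess
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def find_images(folder):
    """Return all image files under folder, including subfolders, sorted."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_command(image_path):
    """Build the tesseract call; output goes to <image name without suffix>.txt."""
    return ["tesseract", str(image_path), str(image_path.with_suffix(""))]


def ocr_folder(folder):
    """OCR every image under folder and return the ones that failed."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract is not on PATH")
    failed = []
    for image in find_images(folder):
        result = subprocess.run(ocr_command(image), capture_output=True)
        if result.returncode != 0:
            failed.append(image)
    return failed
```

Calling `ocr_folder("C:/scans")` (a hypothetical path) would process everything under that folder. Collecting the failures in a list makes it possible to rerun only those files instead of all 10,000.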
0
Due to a recent hard drive replacement, I've lost the scanner tool and Sharpdesk desktop software for scanning from our AR-M257 to my desktop unit. Is there current software that I can download and install? I cannot find any disks in house for a reinstall.
0
I have a client that has no electronic backup files for hard copies of booklets that require periodic changes. In fact, the problem is we don't know what format the original was created in. Some of these books have been around for years. We have an old Xerox scanner utilizing 'Document and Scan MakeReady' version 3.0.0.16 software. We don't have the CD. It doesn't appear to be available; I assume it's really old. It is running on a Windows 2000 Pro PC. The value of this existing software is its ability to allow edits to a received scan document without changing the original format. At this point, after edits, it can be printed prior to having to save it to a file format, like PDF or DOC. In other words, it allows you to cut/paste changes without altering the original format.

Does anybody have a low-cost solution providing the above functionality? Is there an OCR product out there that will allow you to make changes prior to committing to a file format?
0
Here is the Script courtesy of Brooks Duncan at Document Snap and MacSparky.

I have run it on about 500 files and it mostly runs OK but it has failed about 30 times. Do you think that the delay needs to be increased? Is there anywhere to look for why it failed?


tell application "PDFpenPro"
      open theFile as alias
      tell document 1
            ocr
            repeat while performing ocr
                  delay 1
            end repeat
            delay 1
            close with saving
      end tell
end tell
0
On a Mac, is there a way in Hazel to tell if a file has been OCR’d?
0
Hello and Good Morning Everyone,

          From a previously closed post, I found out I can scan a text document straight into MS Word by using either ABBYY FineReader or PaperPort. At this point, I am interested in knowing which program would work best for achieving this goal. Any shared thoughts, suggestions, or tips will be greatly appreciated.

          Thank you.

           George
0
I'm trying to get OCR working using YAGF. I read this is what Google used to scan books. I tried scanning a cookbook page and it gave me nothing. Then I scanned a page from my mom's Harlequin romance book, so there were no pictures. That didn't work either. Any guesses? This is on Ubuntu.
0
Is it possible to program the Raspberry Pi camera to do OCR?
I need it to recognize a list of characters and, when seen, write the info to a file.
I need the following info with each entry:
Captured data
Date and time (down to the second)
Location (can be GPS or user entry)

And once a day (or at user request), upload the data to an online database.
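
The logging half of this is straightforward once something has produced the recognized text. Here is a minimal Python sketch that appends one entry (captured data, timestamp to the second, location) to a CSV file. The OCR and camera parts are left out, and the file name, field names and sample values are all invented for the example.

```python
import csv
import os
from datetime import datetime

FIELDS = ["captured", "timestamp", "location"]


def make_entry(captured, location, now=None):
    """Build one log record; the timestamp is precise to the second."""
    now = now or datetime.now()
    return {
        "captured": captured,
        "timestamp": now.isoformat(timespec="seconds"),
        "location": location,
    }


def append_entry(path, entry):
    """Append the record to a CSV log, writing the header for a new file."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(entry)


# Hypothetical values standing in for real OCR output and a user-entered location.
print(make_entry("ABC123", "user entry: gate 2", datetime(2018, 1, 15, 8, 30, 5)))
```

The daily upload could then be a scheduled job (for example cron) that sends the CSV file to the database, which keeps the capture loop independent of the network.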
0

How to password-protect a PDF with free software
This video Micro Tutorial shows how to password-protect PDF files with free software. Many software products can do this, such as Adobe Acrobat (but not Adobe Reader), Nuance PaperPort, and Nuance Power PDF, but they are not free products. This video explains how to do it with excellent, free software called PDF-XChange Editor from Tracker Software Products.

1. Download PDF-XChange Editor


Visit the PDF-XChange Editor section of the Tracker Software Products website:

http://www.tracker-software.com/product/pdf-xchange-editor

Click the white-on-green Download button for either product. It doesn't matter if you download PDF-XChange Editor or PDF-XChange Editor Plus, since you'll be selecting the Free Version when you install.

Step1

2. Run downloaded installer


Run the downloaded installer and select Free Version (unless, of course, you want more features and decide to purchase the Pro or Plus Version).

Step2

3. Open a non-secured PDF file in PDF-XChange Editor


Run PDF-XChange Editor and open a PDF file that does not currently have password protection on it.

Step3

4. Open Security section of Document Properties


Click File menu.

Click Document Properties.

Click Security category.

Step4

5. Open Password Security Settings dialog


Click Security Method drop-down.

Click Password Security.

Step5

6. Fill in Password Security Settings dialog


In Options section, select Compatibility from the drop-down and what you want encrypted via the radio buttons.

In Document Passwords section, enter password to open PDF and password to change permission settings.

In Permissions section, set Printing Allowed and Changing Allowed choices via the drop-downs; enable/disable content copying and
2
 
LVL 4

Expert Comment

by:Stephen Kairys
OK, maybe there's a bug in the software. After I click YES to confirm, the program, on its own, reprompts for the password.
password problem
Thanks.
0
 
LVL 57

Author Comment

by:Joe Winograd, EE MVE 2015&2016
Ah, now I see! Here's what's happening. There are two types of passwords for PDFs — Owner Password and User Password. The User Password is what's needed to open the file. The Owner Password is what's needed to set permissions/restrictions (and it may also be used to open the file). Your PDF file has an Owner Password on it — do you know what it is? If you open the file with the User Password, you will get the prompt that you posted for the Owner Password when trying to change security (or when changing any permissions/restrictions). If you open the file with the Owner Password, you will not get a prompt for the Owner Password when trying to change security (or when changing any permissions/restrictions). Note that you have a choice when opening the file of entering either the User Password or the Owner Password:

enter user or owner password
Regards, Joe
0
How do I upgrade from PaperPort 12.1 to 14.5?
0
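I included OpenCV and Tesseract OCR in Visual Studio:
#define _CRT_SECURE_NO_WARNINGS  // must come before the includes to have any effect
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <baseapi.h>
#include <allheaders.h>
#include <iostream>
#include <vector>
#include <fstream>
using namespace cv;
using namespace std;
tesseract::TessBaseAPI ocr;

int main()
{
    // Load the image in color and stop if it could not be read.
    Mat input = imread("C:\\eurotext.tif", 1);
    if (input.empty()) {
        cerr << "Could not read the input image" << endl;
        return 1;
    }
    cvtColor(input, input, CV_BGR2GRAY);

    // Init returns 0 on success.
    if (ocr.Init(NULL, "eng", tesseract::OEM_TESSERACT_ONLY) != 0) {
        cerr << "Could not initialize Tesseract" << endl;
        return 1;
    }

    ocr.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);
    // Grayscale: 1 byte per pixel; input.step is the number of bytes per row.
    ocr.SetImage(input.data, input.cols, input.rows, 1, static_cast<int>(input.step));
    char* text = ocr.GetUTF8Text();
    cout << "Text:" << endl;
    cout << text << endl;
    cout << "Confidence: " << ocr.MeanTextConf() << endl << endl;

    delete[] text;  // the caller owns the buffer returned by GetUTF8Text
    ocr.End();
    return 0;
}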


The build succeeded, but when running I get:

erreur_run.PNG
and


erreur_run2.PNG
0
I tried to add Tesseract OCR to Visual Studio 2010.
The build succeeded, but when I run it there is an error 0xc0150002:

0xc0150002.PNG
I tried to find the missing DLL with Dependency Walker. It shows:

dependency.PNG
and

Error: The Side-by-Side configuration information for "c:\users\eouerten\documents\visual studio 2010\projects\tess_open\debug\LIBTESSERACT302D.DLL" contains errors. The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail (14001).
Error: The Side-by-Side configuration information for "c:\users\eouerten\documents\visual studio 2010\projects\tess_open\debug\LIBLEPT168D.DLL" contains errors. The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail (14001).
Error: Modules with different CPU types were found.
Warning: At least one module has an unresolved import due to a missing export function in a delay-load dependent module.
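
The "Modules with different CPU types were found" line suggests that 32-bit and 64-bit binaries are being mixed. One way to check each DLL is to read the machine field from its PE header. The Python sketch below does that with only the standard library. The function names are made up, and it reports only the two common machine types by name.

```python
import struct

MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def pe_machine(data):
    """Return the machine type recorded in the PE header of an EXE or DLL."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # The 4-byte offset of the PE header is stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The 2-byte machine field follows the PE signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, hex(machine))


def pe_machine_of_file(path):
    """Convenience wrapper that reads the file from disk."""
    with open(path, "rb") as handle:
        return pe_machine(handle.read())
```

Running `pe_machine_of_file` on tess_open.exe and on each Tesseract, Leptonica and OpenCV DLL it loads should give the same answer for all of them. If it does not, the mismatched library needs to be replaced with a build for the matching platform. This check is separate from the side-by-side errors, which it does not diagnose.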


0
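I included OpenCV and Tesseract OCR in Visual Studio 2010:
#define _CRT_SECURE_NO_WARNINGS  // must come before the includes to have any effect
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <iostream>
#include <vector>
#include <fstream>
using namespace cv;
using namespace std;
tesseract::TessBaseAPI ocr;

int main()
{
    // Backslashes in a string literal must be escaped.
    // This path is a folder, not an image file, so imread returns an empty Mat.
    Mat input = imread("C:\\Program Files (x86)\\Tesseract-OCR");
    if (input.empty()) {
        cerr << "Could not read the input image" << endl;
        return 1;
    }
    cvtColor(input, input, CV_BGR2GRAY);

    // Init returns 0 on success.
    if (ocr.Init(NULL, "eng", tesseract::OEM_TESSERACT_ONLY) != 0) {
        cerr << "Could not initialize Tesseract" << endl;
        return 1;
    }

    ocr.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);
    // Grayscale: 1 byte per pixel; input.step is the number of bytes per row.
    ocr.SetImage(input.data, input.cols, input.rows, 1, static_cast<int>(input.step));
    char* text = ocr.GetUTF8Text();
    cout << "Text:" << endl;
    cout << text << endl;
    cout << "Confidence: " << ocr.MeanTextConf() << endl << endl;

    delete[] text;  // the caller owns the buffer returned by GetUTF8Text
    ocr.End();
    return 0;
}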



When I built it:

 c:\program files (x86)\tesseract-ocr\include\leptonica\environ.h(277): warning C4005: 'snprintf' : macro redefinition
1>          c:\program files (x86)\tesseract-ocr\include\tesseract\platform.h(33) : see previous definition of 'snprintf'
1>c:\program files (x86)\tesseract-ocr\include\leptonica\pix.h(169): warning C4305: 'initializing' : truncation from 'double' to 'const l_float32'
1>c:\program files (x86)\tesseract-ocr\include\leptonica\pix.h(171): warning C4305: 'initializing' : truncation from 'double' to 'const l_float32'
1>tessopen.obj : warning LNK4075: ignoring '/EDITANDCONTINUE' due to '/INCREMENTAL:NO' specification
1>  tess_open.vcxproj -> C:\Users\eouerten\documents\visual studio 2010\Projects\tess_open\Debug\tess_open.exe
1>FinalizeBuildStatus:
1>  Deleting file "Debug\tess_open.unsuccessfulbuild".
1>
1>Build succeeded.
1>
1>Time Elapsed 00:00:02.93
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========


And when running:
0xc0150002.PNG
0
Can a Fujitsu ScanSnap iX500 use OCR to specifically NAME a file (PDF) as the value found when scanning? For example, if we have a stack of invoices, all formatted the same with the Invoice Number in the same location, can we scan those and ask ScanSnap to name each individual scan page based on the value of the Invoice Number? (i.e., Inv123.pdf, Inv456.pdf, Inv789.pdf)

Alternatively, I am more sure that we can make a PDF SEARCHABLE so that we could SEARCH within each document for a given invoice number (i.e., search ALL PDFs that contain INV456). The only problem is that it would be slower than simply eyeing down a list of filenames.

Best way to scan and file similar documents?

Thank you!
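
If the scanner software cannot do the naming itself, a fallback is to make the PDFs searchable and then rename them with a script that pulls the invoice number out of the recognized text. The Python sketch below shows only the extraction and naming step. It assumes the text of each page is already available as a string and that the number appears after a label such as "Invoice Number:"; both the label and the sample text are assumptions.

```python
import re

# Assumed label formats: "Invoice Number: 456", "Invoice No. 456", "Invoice # 456".
INVOICE_PATTERN = re.compile(
    r"Invoice\s+(?:Number|No\.?|#)\s*:?\s*(\d+)", re.IGNORECASE
)


def filename_for(page_text):
    """Return a name like Inv456.pdf, or None if no invoice number is found."""
    match = INVOICE_PATTERN.search(page_text)
    if match is None:
        return None
    return f"Inv{match.group(1)}.pdf"


# Invented sample of OCR'd text from one scanned page.
sample = "ACME Supplies\nInvoice Number: 456\nTotal due: 10.00"
print(filename_for(sample))
```

Pages where `filename_for` returns None are worth setting aside for manual naming, since OCR can misread digits even on cleanly formatted invoices.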
0
