OCR

532

Solutions

1K

Contributors

Optical character recognition (OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text. It is widely used as a form of data entry from printed paper data records, including passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any suitable documentation. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech, key data and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.

I am considering making a site that can auto-analyze a certain type of uploaded report, and instantly display the results as a PDF. There are various steps involved in the creation of the PDF and I want a feeling for the effort and technology needed for each step.

There are three different steps I will discuss here to see where I can use WordPress plugins and where I need to customize the functionality.

The uploaded report would be a merchant's monthly credit card statement, like the following snippet:

Statement Example
1) So, for the first of three steps, I need a WordPress OCR plug-in. Are there many options for that? Is the angle of the text a problem? I cannot guarantee neatness. (I added the underlining to make it easier for me to read.)

I imagine allowing an authorized user to upload a report, and I need this plug-in to convert images to some form of digital data, like a PDF or a CSV file.

2) I need a way to analyze that data, and I wonder if there is a configurable WordPress plugin for this. It will query the items by the Description, then use the numeric values in the Number, Amount and Total columns for mathematical computations as it generates the output for the report.

The results should go into some format, like a CSV file.

3) I need a report tool which can import the data results from Step #2 and apply them to various pre-designed fields in the final pre-designed …
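Step 2 above could be sketched in Python along these lines. This is only a minimal sketch, assuming the OCR step produces CSV text with Description, Number, Amount and Total columns (the column names come from the question; the sample rows and the `summarize`, `money` and `to_csv` helpers are hypothetical).

```python
import csv
import io
from decimal import Decimal


def money(text):
    """Convert OCR'd currency text such as '$1,250.00' to a Decimal."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def summarize(csv_text, wanted_descriptions):
    """Sum Number, Amount and Total for rows whose Description is wanted."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        desc = row["Description"].strip()
        if desc not in wanted_descriptions:
            continue
        entry = totals.setdefault(
            desc, {"Number": 0, "Amount": Decimal("0"), "Total": Decimal("0")}
        )
        entry["Number"] += int(row["Number"])
        entry["Amount"] += money(row["Amount"])
        entry["Total"] += money(row["Total"])
    return totals


def to_csv(totals):
    """Write the summary back out as CSV text for the report step."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Description", "Number", "Amount", "Total"])
    for desc in sorted(totals):
        t = totals[desc]
        writer.writerow([desc, t["Number"], t["Amount"], t["Total"]])
    return out.getvalue()


# Made-up sample data standing in for the OCR output.
SAMPLE = """Description,Number,Amount,Total
VISA SALES,12,"$1,250.00",$31.25
MASTERCARD SALES,5,$400.00,$10.40
VISA SALES,3,$99.50,$2.49
"""

if __name__ == "__main__":
    print(to_csv(summarize(SAMPLE, {"VISA SALES"})))
```

Using `Decimal` rather than floats keeps the currency arithmetic exact, which matters once the figures end up in a report.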
0
I need a tool I can use to digitize a report, like the one attached here...

Report
Will this kind of report get a 100% successful conversion rate?

Eventually, I need the tool to be part of my website, but I have not, as yet, chosen my back-end technology. For now, a simple Mac based tool is fine, just so I can hand convert a report that I can start to use in my programming of the back-end.

Windows is okay if the free Mac options are limited.

I do have Office 365 (Mac) if there is a tool in there which I can use.

I am also interested in hearing what "plug-ins" can work when I deploy this to my website, for online OCR conversions.
0
Security/privacy-related question. Can text be detected in .jpg, .bmp and similar image file formats? I know text within PDFs can be detected with OCR. By "detected" I mean by a SIEM, DLP or other event-driven software. I am not referring to steganography or obfuscation of text in any way, just simple text detection in a JPG or BMP file. Many thanks.
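In principle this is a two-stage pipeline: run OCR on the image, then match the recognized text against the same rules used for ordinary text. A rough Python sketch follows, assuming the Tesseract command-line tool is installed and on the PATH; the two patterns and the function names are hypothetical and only illustrate the idea.

```python
import re
import subprocess

# Hypothetical patterns a DLP rule might look for; adjust to your policy.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def ocr_image(path):
    """Return the text Tesseract recognizes in an image file.

    Assumes the tesseract CLI is on PATH; 'stdout' as the output base
    makes it print the text instead of writing a file.
    """
    result = subprocess.run(
        ["tesseract", path, "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_sensitive(text):
    """Return {pattern name: [matches]} for every pattern that matched."""
    hits = {}
    for name, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits
```

A call such as `find_sensitive(ocr_image("scan.jpg"))` would then feed an alert. Recognition quality depends heavily on image resolution and noise, so a miss does not prove the image is clean.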
0
I had this question after viewing "PDFTK - filling this PDF but got an error".

After talking with the others on this project, we decided it's ok to have the final PDF as read-only and it doesn't need to be editable.

I ran the commands below. I can populate the PDF but not the checkbox. Joe (if you're reading this), is this because of the LiveCycle issue that the checkboxes don't get checked? If it is, I got approval to buy LiveCycle. I'll get it and see what's going on.

I tried "No", "On" and "1" for the value, but I don't see the checkbox checked.

1. i-765.pdf is the original file.

2. notsigned.pdf is the file I ran through QPDF to get rid of the password error message.

3. I ran this: pdftk.exe notsigned.pdf fill_form i-765.txt output OutputFilled.pdf

4. OutputFilled.pdf is the populated PDF.
i-765.pdf
notsigned.pdf
OutputFilled.pdf
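One thing worth checking for the checkbox: as far as I know, pdftk only ticks a checkbox when the value in the FDF matches one of that field's own state names, which `pdftk notsigned.pdf dump_data_fields` lists as `FieldStateOption` lines. Below is a minimal Python sketch that parses that output. The sample text and the field names in it are made up for illustration, and the record layout is an assumption based on typical pdftk output.

```python
def checkbox_states(dump_text):
    """Map each Button field name to its list of FieldStateOption values."""
    states = {}
    record = {}

    def flush():
        # Keep only button fields (checkboxes and radio buttons).
        if record.get("FieldType") == ["Button"] and "FieldName" in record:
            states[record["FieldName"][0]] = record.get("FieldStateOption", [])

    for line in dump_text.splitlines():
        line = line.strip()
        if line == "---":  # assumed record separator in the dump
            flush()
            record = {}
        elif ": " in line:
            key, value = line.split(": ", 1)
            record.setdefault(key, []).append(value)
    flush()  # the last record has no trailing separator
    return states


# Hypothetical sample of dump_data_fields output.
SAMPLE = """---
FieldType: Text
FieldName: form1[0].LastName[0]
FieldFlags: 0
---
FieldType: Button
FieldName: form1[0].CheckBox1[0]
FieldFlags: 0
FieldStateOption: Off
FieldStateOption: Yes
"""

if __name__ == "__main__":
    for name, options in checkbox_states(SAMPLE).items():
        print(name, options)
```

If the real dump showed `Yes` as the non-Off state, then `Yes` would be the value to try in i-765.txt instead of "No", "On" or "1".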
0
Hi, is there a shortcut for rejecting special characters in a textbox?
I'm using this manual code:

Public Class Form1
    Private Sub TextBox1_TextChanged(sender As Object, e As EventArgs) Handles TextBox1.TextChanged
        ' Test the textbox contents via .Text; TextBox1.ToString returns a description of the control.
        If TextBox1.Text.Contains("`") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("~") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("!") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("@") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("#") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("$") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("%") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("^") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("*") Then
            MsgBox("Text contains invalid character(s)", vbInformation, "Invalid!")
        ElseIf TextBox1.Text.Contains("&") Then
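The chain of checks above can collapse into a single membership test against a set of invalid characters. Here is the idea sketched in Python rather than VB.NET, purely as an illustration; the character list is taken from the excerpt above and the function names are hypothetical. In VB.NET the same effect should be achievable with something like `String.IndexOfAny`.

```python
# Characters rejected by the original chain of ElseIf checks.
INVALID_CHARS = set("`~!@#$%^*&")


def find_invalid(text):
    """Return the invalid characters present in text, in order of first appearance."""
    seen = []
    for ch in text:
        if ch in INVALID_CHARS and ch not in seen:
            seen.append(ch)
    return seen


def is_valid(text):
    """True when text contains none of the invalid characters."""
    return not find_invalid(text)


if __name__ == "__main__":
    print(find_invalid("a@b#c@"))
    print(is_valid("hello world"))
```

Returning the offending characters, rather than just a yes/no, lets the message box say exactly what needs to be removed.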

0
I'm hoping there's a solution for this problem.

1. I have the FDF that I want to use to populate a PDF. It's attached. It's i-765.txt

2. I have the PDF file. I got it from the INS site. It's attached here and called i-765.pdf

3. I ran this command
   pdftk.exe i-765.pdf fill_form i-765.txt output Output.pdf

but got this error

Error: Failed to open PDF file:
   i-765.pdf
   OWNER PASSWORD REQUIRED, but not given (or incorrect)
Errors encountered.  No output created.
Done.  Input errors, so no output created.


4. I googled the error and came across a solution saying I need to run qpdf to get rid of the error.

    a. I downloaded QPDF.
    b. It got installed in this folder: C:\Users\bwa\Downloads\qpdf-7.0.0-bin-mingw32\qpdf-7.0.0\bin
    c. I copied i-765.pdf to that folder.
    d. I ran this command:
qpdf --decrypt i-765.pdf decrypted765.pdf

    e. Now I have decrypted765.pdf. I open it and get a message (screenshot: "Message I get"). I click OK on it and the PDF is read-only.
    f. I ran this command to get rid of the message:
pdftk decrypted765.pdf cat output i-765notsigned.pdf

    g. I open i-765notsigned.pdf and there's no error and the fields are editable.
    However,  I noticed this: The functionality of the …
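The decrypt-then-fill sequence described in this question can be scripted so the steps always run in the same order. A minimal Python sketch follows, assuming `qpdf` and `pdftk` are both on the PATH; the `build_commands` and `run_all` helpers and the default intermediate file name are hypothetical, and the extra `cat` step from (f) is left out.

```python
import subprocess


def build_commands(source_pdf, fdf_file, output_pdf, decrypted_pdf="decrypted.pdf"):
    """Return the qpdf and pdftk command lines as argument lists."""
    return [
        # Remove the owner-password restriction so pdftk can open the file.
        ["qpdf", "--decrypt", source_pdf, decrypted_pdf],
        # Fill the form fields from the FDF data.
        ["pdftk", decrypted_pdf, "fill_form", fdf_file, "output", output_pdf],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    for cmd in build_commands("i-765.pdf", "i-765.txt", "Output.pdf"):
        print(" ".join(cmd))
```

Building the argument lists separately from running them makes it easy to print and inspect the exact commands before anything touches the files.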
0
I need OCR software that runs on a Mac, with the scanned images coming from my HP LaserJet MFP M127fw.

What options do I have?

Thanks
0
Hello Experts,

I need an OCR (Optical Character Recognition) app to scan business cards and add them directly to the Outlook/Exchange address book.

Thank you very much in advance.
0
I'm using ABBYY OCR https://www.abbyy.com/en-us/ to read PDF files and parse some fields from them, and then, using C# code, I store the data in the database.

This is what I want to do:

I have some PDF files, but they're forms. As it is now, we download the form and fill it in: for example, first name, last name, address, etc. Then we scan it or fax it.

I want to read the entire PDF file and display it in a web page. Along with the image (if it exists).  Then, user fills the form online. For example, read the PDF file using OCR and display the same exact file on a web page, then save the entire form, print, etc.

Is this doable?

Edit: Maybe I should have the same fields the PDF form has on a web page. User fills the fields and somehow I plug those field values into the PDF?

Edit: I've attached a sample PDF form I'd like to process.
i-765.pdf
0
Hello,

What factors determine the quality and accuracy of OCR (optical character recognition) and is there much variability among different OCR software?

If there is variability in software, what applications are best (both free and purchased)?

Thanks
0
Extract Text From Images?

I have many images. I searched and found some online converters, but they don't work for me because I have 10,000 images, so I need a bulk tool. Can someone help me with this? Thank you.
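For a batch of that size, one option is to drive a local OCR engine from a script instead of an online converter. Below is a rough Python sketch, assuming the Tesseract command-line tool is installed and on the PATH; the folder layout, the list of image extensions and the helper names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Assumed set of image types to pick up.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def build_jobs(src_dir, out_dir):
    """Return one tesseract command per image found in src_dir."""
    jobs = []
    for image in sorted(Path(src_dir).iterdir()):
        if image.suffix.lower() in IMAGE_SUFFIXES:
            # tesseract appends ".txt" to the output base name itself.
            jobs.append(["tesseract", str(image), str(Path(out_dir) / image.stem)])
    return jobs


def run_jobs(jobs, workers=4):
    """Run the commands in parallel and return the commands that failed."""
    def run(cmd):
        return cmd, subprocess.run(cmd, capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [cmd for cmd, code in pool.map(run, jobs) if code != 0]
```

Calling `run_jobs(build_jobs("images", "text"))` would leave one .txt file per image in the `text` folder and return the failures for a retry. Two images with the same base name (say a.jpg and a.png) would overwrite each other's output, so a real run would need unique names.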
0
Due to a recent hard drive replacement, I've lost the scanner tool and Sharpdesk desktop software for scanning from our AR-M257 to my desktop unit. Is there current software that I can download and install? I cannot find any disks in house for a reinstall.
0
I have a client that has no electronic backup files for hard copies of booklets that require periodic changes. In fact, the problem is we don't know what format the originals were created in. Some of these books have been around for years. We have an old Xerox scanner using 'Document and Scan MakeReady' version 3.0.0.16 software. We don't have the CD, and it doesn't appear to be available; I assume it's really old. It is running on a Windows 2000 Pro PC. The value of this existing software is its ability to allow edits to a scanned document without changing the original format. After the edits, the document can be printed before having to save it to a file format like PDF or DOC. In other words, it allows you to cut and paste changes without altering the original format.

Does anybody have a low-cost solution providing the above functionality? Is there OCR software out there that will allow you to make changes prior to committing to a file format?
0
Here is the Script courtesy of Brooks Duncan at Document Snap and MacSparky.

I have run it on about 500 files and it mostly runs OK but it has failed about 30 times. Do you think that the delay needs to be increased? Is there anywhere to look for why it failed?


tell application "PDFpenPro"
      open theFile as alias
      tell document 1
            ocr
            repeat while performing ocr
                  delay 1
            end repeat
            delay 1
            close with saving
      end tell
end tell
0
On a Mac, is there a way in Hazel to tell if a file has been OCR’d?
0
Hello and Good Morning Everyone,

          From a previously closed post, I found out I can scan a text document straight into MS Word by using either ABBYY FineReader or PaperPort.  At this point, I am interested in knowing which program would work best for achieving this goal.  Any shared thoughts, suggestions, or tips will be greatly appreciated.

          Thank you.

           George
0
I'm trying to get OCR working using YAGF. I read this is what Google used to scan books. I tried scanning a cookbook page and it gave me nothing. Then I scanned a page from my mom's Harlequin romance book, which has no pictures. That didn't work either. Any guesses? This is on Ubuntu.
0
How do I upgrade from PP 12.1 to 14.5?
0
Can a Fujitsu ScanSnap iX500 use OCR to specifically NAME a file (PDF) as the value found when scanning?  For example, if we have a stack of invoices, all formatted the same with the Invoice Number in the same location, can we scan those and ask ScanSnap to name each individual scanned page based on the value of the Invoice Number?  (I.e. Inv123.pdf, Inv456.pdf, Inv789.pdf)

Alternatively, I am fairly sure that we can make a PDF SEARCHABLE so that we could SEARCH within each document for a given invoice number (i.e. search ALL PDFs that contain INV456).  The only problem is that this would be slower than simply eyeing down a list of filenames.

Best way to scan and file similar documents?

Thank you!
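Whatever tool performs the OCR, the renaming step itself comes down to finding the invoice number in the recognized text and building a filename from it. A minimal Python sketch follows, assuming the invoices carry a label such as "Invoice Number: 456"; the pattern, the fallback name and the `filename_for` helper are hypothetical and would need adjusting to the real layout.

```python
import re

# Assumed label format; adjust the pattern to match the real invoices.
INVOICE_RE = re.compile(r"Invoice\s*(?:Number|No\.?|#)\s*:?\s*(\d+)", re.IGNORECASE)


def filename_for(ocr_text, fallback="unmatched"):
    """Return 'Inv<number>.pdf' from OCR text, or '<fallback>.pdf' if no number is found."""
    match = INVOICE_RE.search(ocr_text)
    if match is None:
        return f"{fallback}.pdf"
    return f"Inv{match.group(1)}.pdf"


if __name__ == "__main__":
    print(filename_for("ACME Corp\nInvoice Number: 456\nTotal $10"))
```

Pages where OCR misses the number fall back to a fixed name, so they can be collected and renamed by hand instead of being silently misfiled.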
0
- Numerous PDFs in a network folder, or to be pulled into the solution via a network scanner
- Need to read the bar code and extract the 5 pieces of data for indexing. OR, OCR portions of the page with the same data as in the bar code.
- Use this data to store the document for search and retrieval later - methods may vary. Would like documents placed into folders by the date in the bar code.
- Some sort of compression or load into a database is preferred to keep file size down.
- Windows or Linux based
- OpenSource only: I want to get my hands dirty with it.

Any ideas?
0
I had to reinstall PaperPort 14 but when I try to open it I get an error stating that it has stopped working. How do I eliminate this problem?
0
Nuance is offering Paperport Professional 14 for $59.99.  Info still shows no support for W10.  Will your 14.5 upgrade work on this one?
0
I have tried a simple OCR program from YouTube. I am getting a 'type expected' error near Graphics. Please help. I am using Visual Studio 2012.

Imports Emgu.CV
Imports Emgu.Util
Imports Emgu.CV.OCR
Imports Emgu.CV.Structure
Public Class Form1
    Dim OCRz As Tesseract = New Tesseract("tessdata", "eng", Tesseract.OcrEngineMode.OEM_DEFAULT)
    Dim pic As Bitmap = New Bitmap(270, 100)
    Dim gfx As Graphics = Graphics.FromImage(pic)



    Private Sub Timer1_Tick(sender As Object, e As EventArgs) Handles Timer1.Tick
        gfx.CopyFromScreen(New Point(Me.Location.X + PictureBox1.Location.X + 4, PictureBox1.Location.Y + 30), New Point(0, 0), pic.Size)
        PictureBox1.Image = pic
        PictureBox1.Image = Nothing
    End Sub
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        OCRz.Recognize(New Image(Of Bgr, Byte)(pic))
        RichTextBox1.Text = OCRz.GetText
    End Sub
0
Hi Experts

Could you explain how to configure a "price reader device", as used in supermarkets for example?

img_leitor

I guess that, before it can be used, it has to load a formatted file of barcodes and prices, and that the file format is perhaps defined by the device manufacturer. Once the device has read this file (over USB, for example), the reader can be put to use.

Is that right?

Could you clarify?

Thanks in advance
0
Does anyone know of a way to increase the font size of a PDF so that a 5pt document prints out in 10-12pt font?  The attached page is 1 page of a 476-page document.

I tried converting it to Word and Excel, but those aren’t viable options when you think of the time it takes to OCR the doc.  Also, the output is horrible, and there are so many possible mistakes that it doesn’t bear thinking about as an option.

I don’t think there’s a way to do it, not even zooming with a copier works.
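The arithmetic behind plain scaling may help frame the options: enlarging the text by some factor enlarges the whole page by the same factor. A small Python sketch of that calculation follows, assuming US Letter pages (8.5 x 11 in), which the question does not state; the `enlargement` helper is hypothetical.

```python
import math


def enlargement(source_pt, target_pt, page_w_in, page_h_in):
    """Return (scale factor, enlarged page size in inches, sheets of the original size needed to tile it)."""
    scale = target_pt / source_pt
    new_size = (page_w_in * scale, page_h_in * scale)
    tiles = math.ceil(scale) ** 2  # ignores margins and overlap between tiles
    return scale, new_size, tiles


if __name__ == "__main__":
    # 5pt text printed at 10pt on an assumed Letter page.
    print(enlargement(5, 10, 8.5, 11))
```

Going from 5pt to 10pt doubles each page to 17 x 22 in, or four Letter sheets per original page. For the 476-page document that is 476 x 4 = 1,904 sheets, before allowing for margins.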
0
