OCR

565

Solutions

1K

Contributors

Optical character recognition (OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text. It is widely used as a form of data entry from printed paper data records, including passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any suitable documentation. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech, key data and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.


I have a computer running Windows 10. The DVD drive will not open or play a DVD or CD. I can burn a DVD or CD in the drive, and the burn completes. But when I try to play the disc, it does not play. If I put that disc in another computer, it plays fine. Also, if I open the disc on the computer in question, I can see the files, but no matter which file I click on, it will not play. Any ideas?
1

Hi,

We are using Node.js and calling the library https://github.com/naptha/tesseract.js#tesseractjs

We call worker.recognize(path2png, language) inside an async function to OCR a PNG.

const { TesseractWorker } = require("tesseract.js");

// Run OCR on a PNG file and return the recognized text.
async function readPNG(path2png, language) {
  const worker = new TesseractWorker();
  try {
    const result = await worker.recognize(path2png, language);
    return result.text;
  } catch (error) {
    console.error("************************** error=", error);
  }
}

Tesseract crashes, and we would expect the failure to land in catch (error), but it does not. Instead, we get the following output and no callback.

contains_unichar_id(unichar_id):Error:Assert failed:in file /src/src/ccutil/unicharset.h, line 502
trap!
trap!
abort("trap!"). Build with -s ASSERTIONS=1 for more info.
abort("trap!"). Build with -s ASSERTIONS=1 for more info.

/home/diego/NetBeansProjects/FromGitHub/tmp/localsearch_triage/node_modules/tesseract.js-core/tesseract-core.js:8
var Module=typeof TesseractCoreWASM!=="undefined"?TesseractCoreWASM:{};var moduleOverrides={};var key;for(key in Module){if(Module.hasOwnProperty(key)){moduleOverrides[key]=Module[key]}}Module["arguments"]=[];Module["thisProgram"]="./this.program";Module["quit"]=(function(status,toThrow){throw toThrow});Module["preRun"]=[];Module["postRun"]=[];var ENVIRONMENT_IS_WEB=false;var ENVIRONMENT_IS_WORKER=false;var ENVIRONMENT_IS_NODE=false;var ENVIRONMENT_IS_SHELL=false;ENVIRONMENT_IS_WEB=typeof window==="object";ENVIRONMENT_IS_WORKER=typeof 
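
If the symptom is that the recognition promise never settles after the abort (an assumption, not something the log above confirms), one possible mitigation is to race the call against a timer so the surrounding try/catch at least sees a rejection. The sketch below is generic: `withTimeout` is a hypothetical helper name, and the timeout value is arbitrary. It does not prevent the crash inside Tesseract itself.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever promise settles first wins; the timer is always cleared afterwards
  // so a pending timeout does not keep the Node.js process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Inside readPNG this could be used as `await withTimeout(worker.recognize(path2png, language), 60000, "OCR")`, so that a hung recognition ends up in the existing catch block.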

0
Is there a scanning or OCR program that can convert signatures (first name, last name) into Excel? I have a PDF that has first and last names as signatures, and I want them in Excel so I can sort them.
0
I found some free online OCR converters, but they limit the number of pages that can be converted.
0
I have the attached document in a foreign language. Is there any
free online translation service where I could upload the whole
document and get back the equivalent document in English?

I guess some sort of OCR is needed; my OCR may not work
well with a foreign language.
builData_classificationFrenchy2018.pdf
0
I want to scan a document and use OCR so I can edit it in Word 2016.
0
We have an order management system that is based on VB6, runs in Access 2003. It currently has code (we wrote all of the program) which sends outgoing faxes from the system to a Castell Faxpress box we have. For incoming faxes, we have a home brewed OCR system built which converts all incoming faxes to a copy/pastable PDF.

Any ideas on what is out there nowadays that could replace either or both of these? The boxes this stuff runs on need to be replaced, and instead of doing that we want to at least step into the 2010s.
0
Hi,

Can someone suggest a free .NET OCR library, or a paid one at a reasonable price?

Best regards
0
At my previous workplace, the Canon IR ADV 4251 offered OCR scanning to PDF,
i.e. the resulting soft-copy PDF was text-searchable.

However, at my current workplace, the same model does not offer this feature.

Is this a plug-in, an OS upgrade, or some sort of additional add-on we
have to buy? Can anyone point me to articles or a manual that mention this?

If it's only a software upgrade, can you point me to the specific version and where
to get the software? If it's just a feature to turn on, I would appreciate
instructions on how to do this.
0
Hi Experts,

Can anyone recommend good scanning software that has OCR capability, maintains the format of the document, works with any scanner with a sheet feeder, and produces a PDF?

I have Brother Control Center, but its OCR produces a simple text file, and I need to preserve the formatting.
Thanks
0

Image Magick C# Library throwing exception

I suspect this is an easy one for you to help me solve.

I am trying to run a Visual Studio project that works on my friend's Windows PC, but it throws a path/library exception in my Visual Studio Community 2015, where Windows is running on my Mac via Parallels.

I verify the file exists, but then I get the following exception...

Message = "PDFDelegateFailed `The system cannot find the file specified.\r\n' @ error/pdf.c/ReadPDFImage/793"

[Image: exception details]

And here is the code that throws it:

[Image: code that throws the exception]
0
Our Helpdesk installed Adobe Acrobat XI Pro and Adobe Forms Central
on my laptop when I requested software that could do OCR and
convert PDF to an editable MS Word document.

However, they don't know how to use it. Does anyone have a quick guide on
how to do PDF to Word conversion (with OCR) with these tools?
0
I have some sample code that uses Tesseract-OCR.
Right now the code just opens an image file and extracts what text it can.

I have a PictureBox.
I have a TextBox and a Button.
The OCR button loads the image from the open-file dialog into the PictureBox and extracts what text it can.

I want to type text in the TextBox and, if Tesseract can find it in the image, highlight the text it finds on the image in the PictureBox.
The OCR button does a pretty good job of extracting what text it can.
I will include a sample image I have been working with.
Some sample code would be great.
Thanks for all comments and help.
123.tif
Tesseract-OCR.rar
0
Need a Windows Forms User Interface to enable OCR User Error Checking

I am in the process of writing a Windows Forms application (in C#) that will use an external OCR module to perform OCR on a PDF containing scanned financial documents. So, I expect there to be errors. But, I want to provide the scanned results to the user in a format where the results can be cross-checked by the user.

Clearly, typing over with the corrected value is key.

What is the easiest way to do this in a Windows Forms program?

Various forms will have different values, so I want this as generic as possible.

Shall I just display the whole block of data as a multi-line text input field?

Any other ideas?

Thanks
0
Adobe Acrobat Reader DC (free)

I click on one PDF and it allows me to add/edit text.

But with another PDF, I click "Edit" and it asks me to sign in (pay) for the paid version.

Could it be the type of PDF, OCR versus non-OCR?

I just want to write text over the lines. I don't want to use Microsoft Paint with a screenshot.
0
I have a very large set of assorted PDF files. They contain searchable text, but the text is filled with errors. If I save a page from the PDF as an image, and use my OCR software on it, I get a much better result. So, I would like to re-OCR all of the files — but first I need to "flatten" all of the text objects in the PDFs, since none of my OCR tools will overwrite any existing text.

I do have Acrobat X and XI Pro, and I've tried using a batch action to strip the text and rerun OCR, but anytime the program encounters an error, it interrupts the process with a dialog box. I searched for a way to prevent this, but there does not appear to be one.

So, the way I see it, I need one of three things:

1. A way to force Acrobat to skip over errors in batch actions and process the remaining files. I could swear you used to be able to do this.

2. A batch OCR tool, free or paid, that will remove and replace all existing text objects.

3. A tool to batch-flatten all text objects in a large set of PDFs (so I can then run them through OCR). I've found software that looks related, but everything seems to either delete text, which I don't want; or else it flattens form fields, images, etc. but does not mention text.

Any of the three of these would solve my problem. I'm open to other suggestions too, of course -- any advice is greatly appreciated!
0
Hi,

Can EE recommend OCR software? We receive files as PDFs; some are editable PDFs, and some are scanned images. We think that if we could capture the data into Excel, it would save us a lot of time. Thanks
0
I have a bunch of scanned (and OCR'ed) PDF and Word files. I need freeware tools that could search them for strings of text (both case-sensitive and case-insensitive) using AND and OR operators.

I would appreciate a few free tool suggestions.
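
For readers who would rather script this than install a tool, the matching logic itself is small once the text has been extracted from each file (the extraction step is not shown and depends on the tool used). The following is only a sketch; `matchesQuery` and its query shape are hypothetical, chosen for illustration.

```javascript
// Hypothetical helper. The query is an array of OR-groups; each group is an
// array of terms that must ALL appear in the text.
// Example: [["invoice", "2018"], ["receipt"]] means (invoice AND 2018) OR receipt.
function matchesQuery(text, orGroups, caseSensitive = false) {
  // For a case-insensitive search, lower-case both the text and the terms.
  const fold = (s) => (caseSensitive ? s : s.toLowerCase());
  const haystack = fold(text);
  return orGroups.some((andTerms) =>
    andTerms.every((term) => haystack.includes(fold(term)))
  );
}
```

Calling `matchesQuery(text, [["invoice", "2018"], ["receipt"]])` on the extracted text of each file, and keeping the files for which it returns true, gives a basic AND/OR search.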
0
My customer has been using Sharpdesk to do OCR conversion from PDF to Word so they can then edit the Word document. Sharpdesk is kind of ugly, especially since they no longer have Sharp copiers. Is there a simple way to convert PDFs to Word so that the Word document will be editable?

They have tried opening PDFs with Word, but some of the PDFs are pure graphics, so the OCR is important.
0

Hi Experts,

With a document scanning project, what is "Searchable PDF"?

I am using Brother Control Center. I believe that when I scan to PDF the pages are stored as images, whereas when I use OCR they are converted to a plain text format. Is that right?
0
Need to search find closest match in array of strings

I have a static list of about 500 strings containing things like:

VS Credit Voucher Proc-CR Trans 2
VS Credit Voucher Proc-OB Prepaid Trans 2

but am reading from OCR and get the strings from the faxed reports looking like:

VS Credit Voucher Proc-CR Trans 2
VS Crect Voucher Proc-OBPrepaid Trar 2

I need to do a lookup for the best match for each as it appears in the static list.

And of course, there needs to be a threshold where NO MATCH is a possibility.

How shall I store the static list? How can I do a search in the list that is resource efficient?

I would sort that list of 500, clearly. But what are the mechanics of the lookup?

I am writing a C# Win Forms (64 bit) application and could include a database, if I could include that into my EXE, to avoid a distinct installation step.

What search algorithm?
 
Thanks.
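
One common approach to this kind of lookup, sketched here in JavaScript rather than C#, is to compute the Levenshtein edit distance between the OCR string and every entry and accept the closest entry only if it is within a threshold. With about 500 short strings a plain linear scan is cheap, so the list can simply live in an in-memory array with no sorting or database. The helper name `bestMatch` and the 0.25 threshold are assumptions for illustration; the threshold would need tuning against real faxed reports.

```javascript
// Levenshtein edit distance, computed with two rows of the dynamic-programming table.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: returns the closest candidate, or null for NO MATCH.
// maxRatio is an assumed threshold on (edit distance / longer string length).
function bestMatch(ocrText, candidates, maxRatio = 0.25) {
  // Ignore case and collapse whitespace, since OCR often drops or adds spaces.
  const normalize = (s) => s.toLowerCase().replace(/\s+/g, " ").trim();
  const needle = normalize(ocrText);
  let best = null;
  let bestRatio = Infinity;
  for (const candidate of candidates) {
    const target = normalize(candidate);
    const ratio =
      levenshtein(needle, target) / Math.max(needle.length, target.length, 1);
    if (ratio < bestRatio) {
      best = candidate;
      bestRatio = ratio;
    }
  }
  return bestRatio <= maxRatio ? best : null;
}
```

Normalizing by the longer string's length keeps the threshold meaningful for both short and long entries; an absolute distance cutoff would be an equally reasonable choice.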
0
Baby steps with PDFtoText for OCR

What steps are the first for me to take as I create a proof of concept that will be:

- a C# Winforms program
- uses the PDFtoText library for OCR

Are there any demo programs I can review? Should I just dive in?

Thanks
0
how to programmatically isolate PDF from image scan versus an original PDF?

I have a folder filled with PDFs, most of which are scanned copies. But I need a way to pull out the original versions.

I do not want to deal with OCR software and need originals.

Is there a tool which can do this parsing to find originals?

Thanks
0
Hi

I have PDF files from a scanner, and I'd like to bulk rename all of them based on OCR data. Can someone point me to software that can bulk rename based on OCR entries?

Regards,

CK
0
We want to develop an inventory application for a client.  They use Surface tablets to take handwritten notes which they later transcribe manually into Excel sheets.  We have an old application in Access which is close to what they want, but what it lacks is OCR.  We'd like for the client to be able to write their notes directly into a field.

From all that I've read and researched so far, it seems that Access 2016 does not have OCR capabilities, nor could I find add-ins which provide it.  The best I've been able to find has either been the OneNote API (writing notes in OneNote, linking to them in Access), or libraries for WPF (which would entail writing the app from scratch).

Has anyone seen this done before? Any suggestions for accomplishing our goal?
0
