
OCR

550 Solutions, 1K Contributors

Optical character recognition (OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text. It is widely used as a form of data entry from printed paper data records, including passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any suitable documentation. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech, key data extraction and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.


How to put a date-time stamp on a PDF file with free software - Foxit Reader
I previously published an Experts Exchange video Micro Tutorial that describes how to scan documents to a PDF file using an excellent, free product called Foxit Reader:

How to scan to a PDF file with free software

N.B.: As with any "free" software, there may be restrictions, which are always specified in the software's licensing agreement, typically known as the End-User License Agreement (EULA). I encourage you to read the entire EULA of this product to be certain that you are in license compliance.

This new video Micro Tutorial shows where to download the free Foxit Reader and explains how to use it to place a date-time stamp on a PDF file.

1. Download and Install the Free Version of Foxit Reader


Visit the website for Foxit Reader at Foxit Software:

https://www.foxitsoftware.com/pdf-reader/

Select the language and O.S. from the drop-downs, then click the big, red Download button.

After downloading, run the installer.


2. Run Foxit Reader


The installer creates a Foxit Reader program group with a shortcut to the Foxit Reader program.

Click the shortcut to run Foxit Reader.


3. Put a pre-defined date-time stamp on the PDF


After opening a PDF file, click the Comment menu.

Click the drop-down on the Stamp button.

Click one of the five pre-defined Dynamic Stamps, all of which have a date-time stamp.

Position the mouse wherever you want the stamp and click to place it.


4. Create a custom date-time stamp


Click the drop-down on the Create button.

Click Create Custom Dynamic Stamp.

Select a Stamp Template and fill in the options in the dialog box.

Click Add.

Click OK.

Expert Comment by Andrew Leniart:
Another great "Winograd Micro Tutorial" :)

Good stuff Joe, should be highly useful to point askers to.

Endorsed!

Author Comment by Joe Winograd, Fellow & MVE:
Hi Andrew,
Thank you for the kind words and the endorsement — I really appreciate both! Regards, Joe

I have some sample code that uses Tesseract-OCR.
Right now the code just opens an image file and extracts whatever text it can.

I have a PictureBox, a TextBox and a button.
The OCR button loads the image selected in the OpenFileDialog into the PictureBox and extracts whatever text it can.

I want to type text in the TextBox and, if Tesseract can find that text in the image, highlight it on the image in the PictureBox.
The OCR button does a pretty good job of extracting text.
I will include a sample image I've been working with.
Some sample code would be great.
Thanks for all comments and help.
Attachments: 123.tif, Tesseract-OCR.rar
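One way the lookup could work is sketched below. It is written in JavaScript rather than C#, and it assumes you have Tesseract's word-level TSV output, which the `tesseract` command line produces with its `tsv` output config. The sketch searches the recognized words for the typed text and returns their pixel boxes. Your C# code would then draw rectangles at those coordinates on the PictureBox. The function names and sample rows are made up for illustration.

```javascript
// Sketch: locate OCR'd words matching a query in Tesseract's TSV output, e.g. from
//   tesseract 123.tif out tsv
// TSV header: level page_num block_num par_num line_num word_num left top width height conf text

// Lowercase and strip punctuation so "Number:" matches "number".
const norm = (s) => s.toLowerCase().replace(/[^\p{L}\p{N}]/gu, "");

function parseTesseractTsv(tsv) {
  const lines = tsv.trim().split(/\r?\n/);
  const header = lines[0].split("\t");
  return lines.slice(1).map((line) => {
    const cols = line.split("\t");
    const row = {};
    header.forEach((name, i) => { row[name] = cols[i] ?? ""; });
    return row;
  });
}

// Returns pixel boxes for every run of consecutive words on one line that matches the
// query word for word (level 5 rows are words in Tesseract's layout hierarchy).
function findWordBoxes(tsv, query) {
  const targets = query.trim().split(/\s+/).map(norm).filter(Boolean);
  if (targets.length === 0) return [];
  const words = parseTesseractTsv(tsv).filter((r) => r.level === "5" && norm(r.text) !== "");
  const sameLine = (a, b) =>
    ["page_num", "block_num", "par_num", "line_num"].every((k) => a[k] === b[k]);
  const boxes = [];
  for (let i = 0; i + targets.length <= words.length; i++) {
    const run = words.slice(i, i + targets.length);
    if (run.every((w, k) => sameLine(w, run[0]) && norm(w.text) === targets[k])) {
      for (const w of run) {
        boxes.push({ left: +w.left, top: +w.top, width: +w.width, height: +w.height, text: w.text });
      }
    }
  }
  return boxes;
}

// Usage with hand-made rows (not real output from the attached 123.tif).
const sample = [
  "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext",
  "5\t1\t1\t1\t1\t1\t36\t92\t60\t20\t96.1\tInvoice",
  "5\t1\t1\t1\t1\t2\t104\t92\t80\t20\t95.3\tNumber:",
  "5\t1\t1\t1\t1\t3\t190\t92\t70\t20\t91.0\t12345",
].join("\n");
console.log(findWordBoxes(sample, "invoice number"));
```

If the PictureBox uses SizeMode = Zoom, scale the boxes by the same factor as the displayed image before drawing them.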
Need a Windows Forms User Interface to enable OCR User Error Checking

I am in the process of writing a Windows Forms application (in C#) that will use an external OCR module to perform OCR on a PDF containing scanned financial documents, so I expect there to be errors. I want to present the scanned results to the user in a format where the user can cross-check them.

Clearly, the key feature is letting the user type over a wrong value with the corrected one.

What is the easiest way to do this in a Windows Forms program?

Various forms will have different values, so I want this as generic as possible.

Shall I just display the whole block of data as a multi-line text input field?

Any other ideas?

Thanks
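A single multi-line text box is one option. A more generic layout is a two-column grid (field name, value) where the user types over the value and low-confidence fields are highlighted. Below is a data-side sketch in JavaScript; binding it to a WinForms DataGridView isn't shown. The `{ name, value, confidence }` field shape and the threshold of 80 are assumptions, because each OCR module reports confidence differently.

```javascript
// Hypothetical field shape: { name, value, confidence } with confidence on a 0-100 scale,
// e.g. the lowest word confidence the OCR module reported for that field.
function buildReviewRows(fields, threshold = 80) {
  return fields.map((f) => ({
    name: f.name,
    ocrValue: f.value,
    correctedValue: f.value, // the column the user types over
    needsReview: f.confidence < threshold || f.value.trim() === "",
  }));
}

// Lists only the values the user actually changed (useful for auditing OCR quality).
function changedFields(rows) {
  return rows
    .filter((r) => r.correctedValue !== r.ocrValue)
    .map((r) => ({ name: r.name, from: r.ocrValue, to: r.correctedValue }));
}

// Usage with made-up values: the letter O was misread in place of a zero.
const rows = buildReviewRows([
  { name: "Account", value: "12-3456", confidence: 97 },
  { name: "Balance", value: "1,2O4.50", confidence: 61 },
]);
rows[1].correctedValue = "1,204.50"; // simulated user correction
console.log(rows.map((r) => r.needsReview)); // [ false, true ]
console.log(changedFields(rows));
```

Because every form reduces to a list of name/value pairs, the same grid works for different document types.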
Adobe Acrobat Reader DC (free)

When I open one PDF, it lets me add and edit text, but when I click "Edit" on another PDF, it asks me to sign in and pay for the paid version.

Could it be the type of PDF (OCR versus non-OCR)?

I just want to write text over the lines. I don't want to use Microsoft Paint with a screenshot.

I have a very large set of assorted PDF files. They contain searchable text, but the text is filled with errors. If I save a page from the PDF as an image and run my OCR software on it, I get a much better result. So I would like to re-OCR all of the files, but first I need to "flatten" all of the text objects in the PDFs, since none of my OCR tools will overwrite existing text.

I do have Acrobat X and XI Pro, and I've tried using a batch action to strip the text and rerun OCR, but anytime the program encounters an error, it interrupts the process with a dialog box. I searched for a way to prevent this, but there does not appear to be one.

So, the way I see it, I need one of three things:

1. A way to force Acrobat to skip over errors in batch actions and process the remaining files. I could swear you used to be able to do this.

2. A batch OCR tool, free or paid, that will remove and replace all existing text objects.

3. A tool to batch-flatten all text objects in a large set of PDFs (so I can then run them through OCR). I've found software that looks related, but everything I've found either deletes the text, which I don't want, or flattens form fields, images, etc. without mentioning text.

Any of these three would solve my problem. I'm open to other suggestions too, of course; any advice is greatly appreciated!

Hi,

Can EE recommend OCR software? We receive files in PDF format; some are editable PDFs and some are scanned images. If we could capture the data into Excel, that would save us a lot of time. Thanks

I have a bunch of scanned (and OCR'ed) PDF and Word files. I need freeware tools that can search for strings of text, both case-sensitive and case-insensitive, using AND and OR operators.

Appreciate a few free tools
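Whichever tool you choose, the AND/OR and case options come down to something like the JavaScript sketch below. It assumes the text has already been extracted from each PDF or Word file; extraction isn't shown. The query shape (`all` = AND, `any` = OR, `caseSensitive`) is made up for illustration.

```javascript
// Hypothetical query shape: every phrase in `all` must appear (AND), and at least one
// phrase in `any` must appear (OR) when `any` is non-empty.
function matchesQuery(text, { all = [], any = [], caseSensitive = false } = {}) {
  const norm = (s) => (caseSensitive ? s : s.toLowerCase());
  const haystack = norm(text);
  const has = (phrase) => haystack.includes(norm(phrase));
  return all.every(has) && (any.length === 0 || any.some(has));
}

// docs: [{ name, text }] where text was extracted from each file beforehand.
function searchDocuments(docs, query) {
  return docs.filter((d) => matchesQuery(d.text, query)).map((d) => d.name);
}

// Usage with made-up documents.
const docs = [
  { name: "lease.pdf", text: "Lease Agreement between ACME Corp and Tenant" },
  { name: "memo.docx", text: "Internal memo about acme invoices" },
];
console.log(searchDocuments(docs, { all: ["acme"], any: ["lease", "invoice"] }));
// → [ 'lease.pdf', 'memo.docx' ]
console.log(searchDocuments(docs, { all: ["ACME"], caseSensitive: true }));
// → [ 'lease.pdf' ]
```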
How to scan to a PDF file with free software - Foxit Reader
I've published three five-minute Experts Exchange video Micro Tutorials that describe terrific features in an excellent, free PDF product called PDF-XChange Editor:

How to rotate pages in a PDF with free software
How to OCR pages in a PDF with free software
How to password-protect a PDF with free software

PDF-XChange Editor has many other features in its free version, but, unfortunately, it cannot do scanning — you must purchase one of its non-free versions to get scanning functionality. Fortunately, there's another excellent, free PDF product that can perform scanning — Foxit Reader. However, the free Foxit Reader cannot do OCR, so you'll want to keep the free PDF-XChange Editor for its OCR capability, and add Foxit Reader for its scanning capability. The combination of the two products will allow you to create searchable PDFs (aka PDF Searchable Image files) with your scanner, utilizing free software.

N.B.: As with any "free" software, there may be restrictions, which are always specified in the software's licensing agreement, typically known as the End-User License Agreement (EULA). I encourage you to read the entire EULA of these products to be certain that you are in license compliance.

In order to scan, Foxit Reader requires an …

Expert Comment by Basem Khawaja:
Joe, if I say you are a genius, it would be an understatement. God bless you, my friend :)

Author Comment by Joe Winograd, Fellow & MVE:
Hi Basem,
Thank you for the kind words and the video endorsement...both very much appreciated! Regards, Joe

My customer has been using Sharpdesk to do OCR conversion from PDF to Word so they can then edit the Word document. Sharpdesk is kind of ugly, especially since they no longer have Sharp copiers. Is there a simple way to convert PDFs to Word so that the Word document will be editable?

They have tried opening the PDFs in Word, but some of the PDFs are pure graphics, so OCR is important.

Hi Experts,

In a document scanning project, what is a "Searchable PDF"?

I am using Brother Control Center. I believe that when it scans to PDF, the pages are stored as images, but when I use OCR, they are converted to plain text. Is that right?

Need to find the closest match in an array of strings

I have a static list of about 500 strings containing things like:

VS Credit Voucher Proc-CR Trans 2
VS Credit Voucher Proc-OB Prepaid Trans 2

but I am reading the strings via OCR from faxed reports, and they look like:

VS Credit Voucher Proc-CR Trans 2
VS Crect Voucher Proc-OBPrepaid Trar 2

For each OCR'd string, I need to look up the best match in the static list.

And of course, there needs to be a threshold where NO MATCH is a possibility.

How shall I store the static list? How can I do a search in the list that is resource efficient?

I would sort that list of 500, clearly. But what are the mechanics of the lookup?

I am writing a C# WinForms (64-bit) application and could use a database, if I can embed it in my EXE to avoid a separate installation step.

What search algorithm?
 
Thanks.
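With only about 500 candidates, a brute-force comparison using edit distance is probably fast enough. At roughly 40 characters per string, that is on the order of a million cheap operations per lookup. So neither sorting nor a database is needed for the lookup itself; sorting helps exact or prefix lookups, not "closest" ones. The JavaScript sketch below scores each candidate by Levenshtein distance normalized by the longer string's length, and returns no match when the best score falls below a threshold. The 0.8 threshold is an arbitrary starting point to tune against real faxes.

```javascript
// Classic dynamic-programming Levenshtein distance (insert, delete, substitute = 1 each),
// keeping only two rows of the table in memory.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Case-insensitive, with runs of whitespace collapsed to one space.
const normalize = (s) => s.toLowerCase().replace(/\s+/g, " ").trim();

// Returns { match, score } for the most similar candidate, or null ("NO MATCH") when the
// best similarity, 1 - distance / longerLength, is below the threshold.
function bestMatch(ocrText, candidates, threshold = 0.8) {
  const q = normalize(ocrText);
  let best = null;
  for (const c of candidates) {
    const n = normalize(c);
    const longer = Math.max(q.length, n.length) || 1;
    const score = 1 - levenshtein(q, n) / longer;
    if (!best || score > best.score) best = { match: c, score };
  }
  return best && best.score >= threshold ? best : null;
}

// Usage with the strings from the question.
const known = [
  "VS Credit Voucher Proc-CR Trans 2",
  "VS Credit Voucher Proc-OB Prepaid Trans 2",
];
console.log(bestMatch("VS Crect Voucher Proc-OBPrepaid Trar 2", known)?.match);
// → VS Credit Voucher Proc-OB Prepaid Trans 2
console.log(bestMatch("Total Fees", known)); // → null
```

The static list can simply live in the program as a string array or an embedded resource file.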
Baby steps with PDFtoText for OCR

What are the first steps I should take to create a proof of concept that:

- is a C# WinForms program
- uses the PDFtoText library for OCR

Are there any demo programs I can review? Should I just dive in?

Thanks

How to programmatically distinguish a PDF created from an image scan from an original PDF?

I have a folder full of PDFs, most of which are scanned copies, but I need a way to pull out the original versions.

I do not want to deal with OCR software and need originals.

Is there a tool which can do this parsing to find originals?

Thanks

PaperPort installer detected previous installation
You did a proper uninstallation of PaperPort. You even ran the official PP14 Remover Tool. But when you try to reinstall PaperPort, you get the dialog box above, which you can't get past. There is simply no way to install PaperPort! This article presents a solution that has worked for many PP users.

PaperPort Splash Screen
Sometimes PaperPort will not even open. It displays the splash screen (above) and exits, or it may show an "Application Crash" dialog before exiting (sometimes with a dump, sometimes not). There are many reasons for this problem. This article discusses several of them and offers possible solutions.

Hi

I have PDF files from a scanner, and I'd like to bulk-rename all of them based on their OCR data. Can someone point me to software that can bulk-rename files based on OCR entries?

Regards,

CK
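Suppose your OCR tool can save each file's recognized text somewhere a script can read it. The renaming step could then be sketched in JavaScript (Node.js) as below. The PO-number regular expression is a hypothetical pattern that will need adjusting to the actual documents. Duplicates get a `_2`, `_3` suffix, and files with no recognizable number are left alone.

```javascript
const path = require("path");
const fs = require("fs");

// Hypothetical pattern: "PO", "P.O." or "Purchase Order", an optional "No."/"Number"/"#",
// then an identifier of at least 4 letters, digits or hyphens.
const PO_PATTERN =
  /\b(?:P\.?\s?O\b\.?|Purchase\s+Order\b)\s*(?:No\.?|Number|#)?\s*[:#]?\s*([A-Z0-9][A-Z0-9-]{3,})/i;

function extractPoNumber(ocrText) {
  const m = PO_PATTERN.exec(ocrText);
  return m ? m[1].toUpperCase() : null;
}

// files: [{ file, ocrText }]. Returns a plan; `to` is null when no PO number was found.
function planRenames(files) {
  const used = new Set();
  return files.map(({ file, ocrText }) => {
    const po = extractPoNumber(ocrText);
    if (!po) return { from: file, to: null };
    const dir = path.dirname(file);
    const ext = path.extname(file);
    let name = `PO_${po}${ext}`;
    for (let n = 2; used.has(path.join(dir, name)); n++) name = `PO_${po}_${n}${ext}`;
    const to = path.join(dir, name);
    used.add(to);
    return { from: file, to };
  });
}

// Applies a reviewed plan, skipping targets that already exist on disk.
function applyRenames(plan) {
  for (const { from, to } of plan) {
    if (to && !fs.existsSync(to)) fs.renameSync(from, to);
  }
}
```

Printing the plan and checking it before calling `applyRenames` is a cheap safeguard against a pattern that matches the wrong text.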
PaperPort Splash Screen
Sometimes PaperPort will not even open. It displays the splash screen (above) and exits, or it may show an "Application Crash" dialog before exiting. There are many reasons for this, but one recent cause that has reached epidemic levels is an issue with Firefox. This article offers a solution.

Expert Comment by Ronny Powell:
It worked as described. Thanks to all involved.

Author Comment by Joe Winograd, Fellow & MVE:
Hi Ronny,
Thanks for joining Experts Exchange today, reading my article, and letting us know that the fix worked for you...very glad to hear that. I'll appreciate it if you take a moment to endorse the article by clicking the thumbs-up icon at the end of it (has 30 in it now). Welcome to EE! Regards, Joe

We want to develop an inventory application for a client. They use Surface tablets to take handwritten notes, which they later transcribe manually into Excel sheets. We have an old Access application that is close to what they want, but it lacks OCR. We'd like the client to be able to write their notes directly into a field.

From all that I've read and researched so far, it seems that Access 2016 does not have OCR capabilities, nor could I find add-ins that provide it. The best options I've found are the OneNote API (writing notes in OneNote and linking to them from Access) or libraries for WPF (which would mean writing the app from scratch).

Has anyone done this before? Any suggestions for accomplishing our goal?

As part of a news research project, I need to download a series of pages from a site so I can perform OCR on them.
The site uses PHP and JavaScript, which I'm not really familiar with. I have tried downloading the page image to OCR it, but for every document I only get page 1, never the following pages.

The page has a button for moving between pages, and its code in the browser's inspector is:

<a id="pag_seguinte" class="muda_pag botaoLinha setaDireita" href="http://casacomum.org/cc/visualizador.php?pasta=06337.058.13733&pag=2" title="pg. +1" style="visibility: visible;"></a>

 
Can anyone help me step through the pages?
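The `href` in the "next page" link suggests that each page is addressed by the `pag` query parameter. One approach is therefore to generate the viewer URL for every page and fetch the pages in turn. Below is a Node.js sketch that assumes you know the page count. The viewer pages are HTML, so the image URL still has to be picked out of each page (not shown). If the image is loaded by JavaScript, a headless browser may be needed instead. Please check the site's terms of use and keep the request rate low.

```javascript
// Builds one viewer URL per page by rewriting the `pag` query parameter of the
// "next page" link. pageCount is an input you supply; it is not read from the site.
function pageUrls(nextLinkHref, pageCount) {
  const urls = [];
  for (let p = 1; p <= pageCount; p++) {
    const u = new URL(nextLinkHref);
    u.searchParams.set("pag", String(p));
    urls.push(u.toString());
  }
  return urls;
}

// Hypothetical follow-up for Node.js 18+ (global fetch): download each viewer page's HTML
// one request at a time and hand it to a `save` callback you provide.
async function downloadPages(urls, save) {
  for (const url of urls) {
    const res = await fetch(url);
    if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
    save(url, await res.text());
  }
}

console.log(pageUrls("http://casacomum.org/cc/visualizador.php?pasta=06337.058.13733&pag=2", 3));
```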

We need to start digitizing some of our paper processes.

Basically, we'd like to print invoices and packing slips to PDF (or scan them) and automatically rename the files based on the PO number and/or order number (using OCR?). Bonus if the documents are automatically printed after the PDF copy is saved.

What can do this? Or is this even possible?

Thanks in advance!

Standalone open-source or commercial software that uses Google OCR

Assume I have bought valid Google Vision API credentials. I would like to know whether any standalone open-source or commercial client is available that is already integrated with the Google Vision API and offers other features as well.

Basically, I want to convert images to text (bulk conversion, etc.) via an application.

Thanks.
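Whether or not a ready-made client fits, the REST call behind bulk conversion is small. This Node.js 18+ sketch groups base64-encoded images into `images:annotate` request bodies with the `DOCUMENT_TEXT_DETECTION` feature, and sends each body with an API key. The default batch size of 16 reflects the per-request image limit in Google's documentation as I recall it, so confirm it against the current quotas. Per-image errors (the `error` field in each response) are not handled here.

```javascript
// Groups base64-encoded images into request bodies for the images:annotate endpoint.
function buildAnnotateBodies(base64Images, batchSize = 16, featureType = "DOCUMENT_TEXT_DETECTION") {
  const bodies = [];
  for (let i = 0; i < base64Images.length; i += batchSize) {
    bodies.push({
      requests: base64Images.slice(i, i + batchSize).map((content) => ({
        image: { content },
        features: [{ type: featureType }],
      })),
    });
  }
  return bodies;
}

// Sends one body using an API key and returns the recognized text per image
// ("" when the response has no fullTextAnnotation).
async function annotate(body, apiKey) {
  const res = await fetch(
    `https://vision.googleapis.com/v1/images:annotate?key=${encodeURIComponent(apiKey)}`,
    { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(body) }
  );
  if (!res.ok) throw new Error(`Vision API error: HTTP ${res.status}`);
  const json = await res.json();
  return json.responses.map((r) => (r.fullTextAnnotation ? r.fullTextAnnotation.text : ""));
}

// Example (file names are placeholders):
//   const images = ["page1.png", "page2.png"].map((f) => require("fs").readFileSync(f).toString("base64"));
//   for (const body of buildAnnotateBodies(images)) console.log(await annotate(body, process.env.VISION_KEY));
```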
What's your favorite PDF Editor and why?

PDFelement 6 Pro - I've now been using this PDF editor for several months, and, amazingly, I've found its OCR and editing capabilities consistently outperform the genuine Adobe Acrobat X Pro that I also own and paid several hundred dollars for. I also own a license for PDF-XChange Editor Pro, and though I find it great for some things I like doing with my PDFs, like highlighting parts of invoices I send out to clients, its capabilities don't come close to PDFelement's.

I'm curious which PDF editor(s) other community users use. Having pretty much mastered what I like doing to PDFs with the three I own, I now want to look at others that folks recommend. Please reply here and let me know :)

Thanks..

I need help creating traineddata for Tesseract. We need to train it to recognize car VINs. Tesseract v4 is installed on our Linux server, and I created a set of 12 pictures to work with.
In the end, it will be used from PHP on a website. Everything is complete, except that recognition is not accurate enough using only the English traineddata.

The existing documentation is too confusing for us.

I am looking for more specific instructions, or someone we could hire for a few hours to help with this.

Our traineddata is located in /usr/share/tesseract/tessdata/
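Whatever the training data, OCR'd VINs can be sanity-checked before they are accepted. VINs never contain the letters I, O or Q, so those are almost certainly misread 1s and 0s. North American VINs also carry a check digit in position 9. Below is a JavaScript sketch of both checks; porting it to PHP is straightforward. VINs from many other markets don't use the check digit, so a failed check on those isn't proof of an OCR error.

```javascript
// Letter values from the North American VIN check-digit scheme (I, O and Q are never used).
const VIN_VALUES = {
  A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7, H: 8,
  J: 1, K: 2, L: 3, M: 4, N: 5, P: 7, R: 9,
  S: 2, T: 3, U: 4, V: 5, W: 6, X: 7, Y: 8, Z: 9,
};
// Position weights; position 9 (the check digit itself) has weight 0.
const VIN_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2];

// Drops spaces/punctuation and maps I -> 1, O and Q -> 0, since a VIN cannot contain them.
function normalizeVin(raw) {
  return raw.toUpperCase().replace(/[^A-Z0-9]/g, "").replace(/I/g, "1").replace(/[OQ]/g, "0");
}

// True when the 17-character VIN's weighted sum mod 11 matches position 9 (10 is written "X").
function isValidVin(raw) {
  const vin = normalizeVin(raw);
  if (vin.length !== 17) return false;
  let sum = 0;
  for (let i = 0; i < 17; i++) {
    const ch = vin[i];
    sum += (/\d/.test(ch) ? Number(ch) : VIN_VALUES[ch]) * VIN_WEIGHTS[i];
  }
  const remainder = sum % 11;
  return vin[8] === (remainder === 10 ? "X" : String(remainder));
}

console.log(isValidVin("1M8GDM9AXKP042788")); // true
console.log(isValidVin("1M8GDM9AXKPO42788")); // true (the O is read as 0)
console.log(isValidVin("1M8GDM9AXKP042789")); // false
```

A VIN that fails the check can be sent back for manual entry instead of being saved.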
Which OCR engine, commercial or open source, gives the most accurate, highest-quality results?

I'm trying to make a website compliant with the Americans with Disabilities Act (ADA). We want visually impaired people to be able to use OCR software more effectively on our college websites. Here is the site:
https://devallauth.dcccd.net/Pages/default.aspx
The OCR reads the page fine. But when someone enters a search in the site's Google Custom Search, the site presents the results in an overlay div, and the OCR does not read the search results in that overlay; it keeps reading the original page elements. The overlay div is added by the Google Custom Search code.
I then tried adding some code (in a test version of the site) to set the focus to the overlay div, but the OCR still didn't read the search results. Then I noticed that when I pressed the Tab or Enter key, the OCR would start reading the data in the overlay div, so I added some JavaScript to force a Tab keystroke, but that didn't work. I think that if I can simulate a Tab or Enter keystroke the right way, the OCR would start reading the overlay div. Here is the code I have for setting the focus on the overlay div wrapper and then forcing an Enter key press:
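    // Make the overlay wrapper focusable from script (but not via Tab), then focus it.
    visibleWrapper.setAttribute('tabindex', '-1');
    visibleWrapper.focus();
    // Trigger a synthetic keyup for Enter (keyCode 13) on the wrapper; `var` avoids an implicit global.
    var keyUp = jQuery.Event('keyup', { keyCode: 13 });
    jQuery(visibleWrapper).trigger(keyUp);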


Maybe I should trigger the key event from the <body> or the window? Is there a better route? Any suggestions are appreciated. Thanks.
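This sketch assumes the "OCR" here is a screen reader. Tools such as JAWS and NVDA read the page's DOM rather than performing OCR. On that assumption, a synthetic keystroke is unlikely to help: jQuery's `trigger` only runs JavaScript handlers and does not produce a real key press that assistive technology reacts to. A more common approach is to mark the results container as a labeled live region, and then move focus to it once the results are rendered. `visibleWrapper` comes from the question; the function names and label text are hypothetical. Behavior differs between screen readers, so test with the ones your users have.

```javascript
// Sketch: expose the Google Custom Search overlay to screen readers without fake keystrokes.
// `wrapper` is the overlay element (visibleWrapper in the question).

// Call as early as possible (ideally before results are inserted): a live region only
// announces changes that happen after it has been marked as live.
function makeResultsAccessible(wrapper, label = "Search results") {
  wrapper.setAttribute("role", "region");
  wrapper.setAttribute("aria-label", label);
  wrapper.setAttribute("aria-live", "polite"); // announce content changes when the reader is idle
  wrapper.setAttribute("tabindex", "-1");      // focusable from script, but not in the Tab order
}

// Call after the results have been rendered into the wrapper.
function focusResults(wrapper) {
  wrapper.focus(); // screen readers typically announce the newly focused, labeled region
  if (typeof wrapper.scrollIntoView === "function") wrapper.scrollIntoView();
}

// In the page, for example:
//   makeResultsAccessible(visibleWrapper);
//   ...once the results are shown...
//   focusResults(visibleWrapper);
```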