OCR

544

Solutions

1K

Contributors

Optical character recognition (OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text. It is widely used as a form of data entry from printed paper data records, including passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any suitable documentation. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech, key data and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.


My customer has been using Sharpdesk to do OCR conversion from PDF to Word so they can then edit the Word document. Sharpdesk is kind of ugly, especially since they no longer have Sharp copiers. Is there a simple way to convert PDFs to Word so that the Word document will be editable?

They have tried opening PDFs with Word, but some of the PDFs are pure graphics, so OCR is important.

Hi Experts,

With a document scanning project, what is "Searchable PDF"?

I am using Brother Control Center, and I believe that when scanning into PDF, the pages are treated as images. But when I use OCR, are they converted to plain text?
Need to find the closest match in an array of strings

I have a static list of about 500 strings containing things like:

VS Credit Voucher Proc-CR Trans 2
VS Credit Voucher Proc-OB Prepaid Trans 2

but am reading from OCR and get the strings from the faxed reports looking like:

VS Credit Voucher Proc-CR Trans 2
VS Crect Voucher Proc-OBPrepaid Trar 2

I need to look up the best match for each one in the static list.

And of course, there needs to be a threshold where NO MATCH is a possibility.

How shall I store the static list? How can I do a search in the list that is resource efficient?

I would sort that list of 500, clearly. But what are the mechanics of the lookup?

I am writing a C# WinForms (64-bit) application and could include a database, if I could embed it in my EXE to avoid a separate installation step.

What search algorithm?
 
Thanks.
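A minimal sketch of one common approach, written in JavaScript (the same logic ports directly to C#): normalize both strings, score every entry in the static list by Levenshtein edit distance, and return the best entry only if its similarity clears a threshold. With roughly 500 entries, a plain in-memory array is enough. A linear scan costs about 500 × (length × length) character comparisons per lookup, so no database is needed. The `minSimilarity` default of 0.8 is an assumption; tune it against real faxed samples.

```javascript
// Dynamic-programming Levenshtein distance (insert, delete, substitute each
// cost 1), keeping only two rows so memory is O(length of b).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// OCR often drops or adds spaces and changes case, so ignore both.
function normalize(s) {
  return s.toLowerCase().replace(/\s+/g, "");
}

// Returns { match, score } for the most similar entry, or null ("NO MATCH")
// when even the best candidate scores below minSimilarity (hypothetical 0.8).
function bestMatch(ocrText, candidates, minSimilarity = 0.8) {
  const needle = normalize(ocrText);
  let best = null;
  let bestScore = -1;
  for (const candidate of candidates) {
    const hay = normalize(candidate);
    const maxLen = Math.max(needle.length, hay.length) || 1;
    const score = 1 - levenshtein(needle, hay) / maxLen; // 1.0 = identical
    if (score > bestScore) {
      bestScore = score;
      best = candidate;
    }
  }
  return bestScore >= minSimilarity ? { match: best, score: bestScore } : null;
}
```

With the two list entries from the question, `bestMatch("VS Crect Voucher Proc-OBPrepaid Trar 2", list)` picks the "Proc-OB Prepaid" entry. Sorting the list is not required for this approach, because every entry is scored anyway.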
Baby steps with PDFtoText for OCR

What first steps should I take to create a proof of concept that:

- is a C# WinForms program
- uses the PDFtoText library for OCR

Are there any demo programs I can review? Should I just dive in?

Thanks
How to programmatically distinguish a PDF made from an image scan from an original PDF?

I have a folder filled with PDFs, most of which are scanned copies, but I need a way to pull out the original versions.

I do not want to deal with OCR software and need originals.

Is there a tool which can do this parsing to find originals?

Thanks
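One heuristic, sketched under stated assumptions, is to extract each file's text layer with Poppler's `pdftotext` and treat files with almost no extractable text as scans. `pdftotext` must be installed and on the PATH, and the 50-characters-per-page cutoff is a hypothetical starting point. A scan that has already been OCR'd (a "searchable PDF") carries a text layer and would pass as an original, so those files need an extra check, for example inspecting the fonts with Poppler's `pdffonts`.

```javascript
const { execFileSync } = require("child_process");

// Heuristic: scans without an OCR layer yield (almost) no extractable text.
// pdftotext ends every page with a form feed (\f), so counting form feeds
// gives the page count.
function looksScanned(extractedText, minCharsPerPage = 50) {
  const pageCount = Math.max(1, (extractedText.match(/\f/g) || []).length);
  const visibleChars = extractedText.replace(/\s/g, "").length; // \s also matches \f
  return visibleChars / pageCount < minCharsPerPage;
}

// Runs Poppler's pdftotext; "-" as the output file sends the text to stdout.
function classifyPdf(pdfPath) {
  const text = execFileSync("pdftotext", [pdfPath, "-"], { encoding: "utf8" });
  return looksScanned(text) ? "scanned" : "original";
}
```

To separate the two sets, call `classifyPdf` on each file in the folder and move the ones reported as "original".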
PaperPort installer detected previous installation
You did a proper uninstallation of PaperPort. You even ran the official PP14 Remover Tool. But when you try to reinstall PaperPort, you get the dialog box above, which you can't get past. There is simply no way to install PaperPort! This article presents a solution that has worked for many PP users.
PaperPort Splash Screen
Sometimes PaperPort will not even open. It displays the splash screen (above) and exits, or it may show an "Application Crash" dialog before exiting (sometimes with a dump, sometimes not). There are many reasons for this problem. This article discusses several of them and offers possible solutions.
Hi

I have PDF files from a scanner and I'd like to bulk rename all of them based on their OCR data. Can someone recommend software that can bulk rename files based on OCR content?

Regards,

CK
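Whatever tool performs the OCR, the renaming step itself is simple. Here is a minimal sketch, assuming the OCR text for each file is already available as a string (for example from `pdftotext` run on a searchable PDF). It takes the first non-empty line, strips characters Windows forbids in filenames, and adds a counter when two files would get the same name. Using the first line is an assumption; in practice you would pick whichever field identifies the document.

```javascript
// Turn OCR text into a safe, unique .pdf filename. `taken` holds lowercase
// names already used (Windows filenames are case-insensitive).
function filenameFromOcr(ocrText, taken = new Set(), maxLength = 60) {
  const firstLine =
    ocrText.split(/\r?\n/).map((l) => l.trim()).find((l) => l.length > 0) || "";
  let base = firstLine
    .replace(/[<>:"\/\\|?*\x00-\x1F]/g, "") // characters Windows forbids
    .replace(/\s+/g, " ")
    .trim()
    .slice(0, maxLength)
    .replace(/[. ]+$/, ""); // Windows also rejects trailing dots/spaces
  if (base === "") base = "untitled";

  let name = `${base}.pdf`;
  for (let n = 2; taken.has(name.toLowerCase()); n++) {
    name = `${base} (${n}).pdf`;
  }
  taken.add(name.toLowerCase());
  return name;
}
```

The actual rename would then be `fs.renameSync(oldPath, path.join(dir, newName))` for each file.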
PaperPort Splash Screen
Sometimes PaperPort will not even open. It displays the splash screen (above) and exits, or it may show an "Application Crash" dialog before exiting. There are many reasons for this, but a recent cause that has reached epidemic levels is an issue with Firefox. This article offers a solution.

Expert Comment by Michaela Kähne
Wow, incredible!
It works.
I would never have thought of a Mozilla Firefox update, as at the same time I had a couple of Windows 7 updates...
Thank you all a lot for publishing and sharing. I was really stuck in despair trying to find the reason the program kept closing down.

Author Comment by Joe Winograd, Fellow & MVE
Hi Michaela,
You're very welcome...and thanks to you for joining Experts Exchange today, reading my article, and letting me know that it works for you...I'm happy to hear that! Yes, this is a strange one...still don't know why the presence of "Mozilla Firefox" in that field makes PaperPort close down, but it does. I'm glad that your despair is over. :) Welcome aboard to Experts Exchange! Cheers, Joe
We want to develop an inventory application for a client.  They use Surface tablets to take handwritten notes which they later transcribe manually into Excel sheets.  We have an old application in Access which is close to what they want, but what it lacks is OCR.  We'd like for the client to be able to write their notes directly into a field.

From all that I've read and researched so far, it seems that Access 2016 does not have OCR capabilities, nor could I find add-ins which provide it.  The best I've been able to find has either been the OneNote API (writing notes in OneNote, linking to them in Access), or libraries for WPF (which would entail writing the app from scratch).

Has anyone done this before? Any suggestions for accomplishing our goal?

As part of a news research project I need to download a series of pages from a site to perform OCR on them.
The site uses PHP and JavaScript, which I am not really familiar with. I have tried to download the images in order to OCR them, but every document only shows page 1, not the following pages.

The page has a button to move between pages, and inspecting it shows this code:

<a id="pag_seguinte" class="muda_pag botaoLinha setaDireita" href="http://casacomum.org/cc/visualizador.php?pasta=06337.058.13733&pag=2" title="pg. +1" style="visibility: visible;"></a>

 
Can anyone help me move through the pages?
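The "next page" link only changes the `pag` query parameter in the URL, so the page URLs can be generated directly. This is a minimal sketch, assuming pages are numbered consecutively from 1 and the page count is known. The snippet does not show whether each URL returns the page image itself or an HTML viewer that loads the image separately, so the downloader returns the raw response and its content type for inspection. It pauses between requests to avoid hammering the site; check the site's terms of use first.

```javascript
// Build the URL for every page by rewriting the "pag" query parameter,
// keeping the rest of the URL (e.g. "pasta") unchanged.
function pageUrls(anyPageUrl, pageCount) {
  const urls = [];
  for (let page = 1; page <= pageCount; page++) {
    const url = new URL(anyPageUrl);
    url.searchParams.set("pag", String(page));
    urls.push(url.toString());
  }
  return urls;
}

// Fetch each page in turn (Node.js 18+ has a global fetch), pausing between
// requests, and keep the raw bytes plus content type of each response.
async function downloadPages(urls, delayMs = 1000) {
  const results = [];
  for (const url of urls) {
    const res = await fetch(url);
    if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
    results.push({
      url,
      contentType: res.headers.get("content-type"),
      data: Buffer.from(await res.arrayBuffer()),
    });
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return results;
}
```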
We need to start digitizing some of our paper processes.

Basically, we'd like to print invoices and pack slips to PDF (or scan them), while automatically renaming the files based on PO number and/or order number (based on OCR?). Bonus if they automatically print after saving the PDF copy.

What can do this? Or is this even possible??

Thanks in advance!
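If the invoices and pack slips are OCR'd (or printed to PDF with a text layer), the PO and order numbers can usually be pulled out with regular expressions and used as the filename. In this sketch, the label formats it recognizes ("PO #", "P.O.", "Purchase Order No.", "Order No:") are assumptions. Adjust them to match the real documents.

```javascript
// Hypothetical label patterns; each captures a 4+ character code that
// contains at least one digit.
const PO_PATTERN =
  /\b(?:P\.?\s?O\b\.?|Purchase\s+Order)\s*(?:No\.?|Number|#)?\s*[:#]?\s*((?=[A-Z0-9-]*\d)[A-Z0-9-]{4,})/i;
// The lookbehind stops "Purchase Order" from also counting as an order number.
const ORDER_PATTERN =
  /(?<!Purchase\s+)\bOrder\s*(?:No\.?|Number|#)?\s*[:#]?\s*((?=[A-Z0-9-]*\d)[A-Z0-9-]{4,})/i;

function extractOrderNumbers(ocrText) {
  const po = PO_PATTERN.exec(ocrText);
  const order = ORDER_PATTERN.exec(ocrText);
  return { po: po ? po[1] : null, order: order ? order[1] : null };
}

// Returns e.g. "PO-4500123_Order-SO-98765.pdf", or null when neither number
// was found (leave that file for manual review).
function filenameFor(ocrText) {
  const { po, order } = extractOrderNumbers(ocrText);
  if (!po && !order) return null;
  return [po && `PO-${po}`, order && `Order-${order}`].filter(Boolean).join("_") + ".pdf";
}
```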
Standalone open-source or commercial software that uses Google OCR

Assume I have bought valid Google Vision API credentials. I would like to know whether any standalone open-source or commercial client is available that is already integrated with the Google Vision API and has other features as well.

Basically, I want to convert images to text (bulk conversion, etc.) via an application.

Thanks.
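For a sense of what such a client does under the hood, here is a minimal sketch of a bulk call to the Cloud Vision REST endpoint `images:annotate` with an API key. It requests `DOCUMENT_TEXT_DETECTION` for each image and reads back `fullTextAnnotation.text`. It assumes Node.js 18+ (for the built-in `fetch`) and key-based authentication. The API limits how many images one request may contain, so check Google's current quotas before batching large folders.

```javascript
const fs = require("fs");

// One request entry per image; the image bytes are sent base64-encoded.
function buildAnnotateRequest(imageBuffers, featureType = "DOCUMENT_TEXT_DETECTION") {
  return {
    requests: imageBuffers.map((buf) => ({
      image: { content: buf.toString("base64") },
      features: [{ type: featureType }],
    })),
  };
}

// Sends the images and returns the recognized text for each, in order
// ("" when the API found no text in an image).
async function ocrImages(paths, apiKey) {
  const body = buildAnnotateRequest(paths.map((p) => fs.readFileSync(p)));
  const url = `https://vision.googleapis.com/v1/images:annotate?key=${encodeURIComponent(apiKey)}`;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Vision API error: HTTP ${res.status}`);
  const json = await res.json();
  return json.responses.map((r) => (r.fullTextAnnotation ? r.fullTextAnnotation.text : ""));
}
```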
What's your favorite PDF Editor and why?

PDFelement 6 Pro - I've now been using this PDF editor for several months and amazingly, I've found its OCR and editing capabilities to consistently outperform the genuine Adobe Acrobat X Pro that I also paid several hundred dollars for and own. I also own a license to PDF-XChange Editor Pro, and though I find it great for some things I like doing with my PDFs, like highlighting parts of invoices I send out to clients, its capabilities don't come close to the PDFelement product.

I'm curious what PDF editor(s) other community users use. Having pretty much mastered what I like doing to PDFs with the three I own, I now want to have a look at others that folks recommend. Please reply here and let me know :)

Thanks..
I need help creating traineddata for Tesseract. We need to train it for car VINs. Tesseract v4 is installed on our Linux server, and I created a set of 12 pictures to work with.
In the end, it's to be used in PHP on a website. Everything is completed, except that it's not accurate enough using only the English traineddata.

The existing documentation is too confusing for us.

I am looking for more specific instructions or someone we could hire for a few hours to help with this.

Our traineddata is located in /usr/share/tesseract/tessdata/
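Independently of training, VIN output can be validated after OCR. VINs never use the letters I, O or Q. In North American VINs, the 9th character is a check digit computed from the other 16: a weighted sum modulo 11, with a remainder of 10 written as X. This minimal sketch assumes North American VINs. It maps the impossible letters to their look-alike digits, then accepts a read only if the check digit matches. It is written in JavaScript, but the logic ports directly to PHP.

```javascript
// Transliteration values for VIN letters (I, O and Q are never used).
const VIN_LETTER_VALUES = {
  A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7, H: 8,
  J: 1, K: 2, L: 3, M: 4, N: 5, P: 7, R: 9,
  S: 2, T: 3, U: 4, V: 5, W: 6, X: 7, Y: 8, Z: 9,
};
// Position weights; position 9 (the check digit itself) has weight 0.
const VIN_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2];

// Clean raw OCR output: uppercase, drop spaces/punctuation, and replace
// letters that cannot appear in a VIN with the digits they resemble.
function normalizeVin(raw) {
  return raw
    .toUpperCase()
    .replace(/[^A-Z0-9]/g, "")
    .replace(/[OQ]/g, "0")
    .replace(/I/g, "1");
}

// True when vin is 17 legal characters and its check digit matches.
function isValidVin(vin) {
  if (!/^[A-HJ-NPR-Z0-9]{17}$/.test(vin)) return false;
  let sum = 0;
  for (let i = 0; i < 17; i++) {
    const c = vin[i];
    const value = c >= "0" && c <= "9" ? Number(c) : VIN_LETTER_VALUES[c];
    sum += value * VIN_WEIGHTS[i];
  }
  const remainder = sum % 11;
  const expected = remainder === 10 ? "X" : String(remainder);
  return vin[8] === expected;
}
```

A read that fails the check can be retried (another photo or another crop) instead of being saved, which catches many single-character OCR errors.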
Which OCR engine, commercial or open source, gives the highest accuracy for very high-quality results?
I'm trying to make a website compliant with the Americans with Disabilities Act. We want visually impaired people to be able to use OCR software more effectively with our college websites. Here is the site:
https://devallauth.dcccd.net/Pages/default.aspx
The OCR reads the page fine. But when someone enters a search using the site's Google Custom Search, the site presents the results in an overlay div, and the OCR does not read the search results in that div; it continues to read the original page elements. The overlay div is added by the Google Custom Search code.
I then tried adding some code (in a test version of the site) to set the focus to the overlay div, but the OCR still didn't read the search results. Then I noticed that when I hit the Tab key or the Enter key, the OCR would start reading the data in the overlay div, so I added some JavaScript to force a Tab keystroke, but that didn't work. I think that if I can simulate a Tab or Enter keystroke the right way, the OCR will start reading the overlay div. Here is the code I have for setting the focus on the overlay div wrapper and then forcing an Enter key press:
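				// Make the overlay wrapper focusable by script and move focus to it.
				visibleWrapper.setAttribute('tabindex', -1);
				visibleWrapper.focus();
				// Declared with var (it was an implicit global). This runs only handlers
				// bound through jQuery; it is not a real key press.
				var keyUp = jQuery.Event("keyup", { keyCode: 13 });
				jQuery(visibleWrapper).trigger(keyUp);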


Maybe I should trigger the key event from the <body> or the window? Is there a better route? Any suggestions are appreciated. Thanks.
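The behaviour described here, where content is read only after a real Tab or Enter, matches how screen readers work. A synthetic jQuery event only runs handlers bound through jQuery, so assistive software never sees it. A common alternative is an ARIA live region: an element with `aria-live="polite"` whose text changes are announced automatically. In the minimal sketch below, the `visually-hidden` CSS class and the way the result count is obtained are assumptions. The region must already exist in the page before its text changes.

```javascript
// Create once, at page load: an initially empty live region. Screen readers
// announce later changes to its text without the user moving focus.
function createLiveRegion(doc) {
  const region = doc.createElement("div");
  region.setAttribute("aria-live", "polite");
  region.setAttribute("role", "status");
  // Hypothetical CSS class that hides the element visually but keeps it
  // available to assistive technology (i.e. not display:none).
  region.className = "visually-hidden";
  doc.body.appendChild(region);
  return region;
}

// Call after the search overlay has rendered. resultCount is assumed to be
// computed elsewhere (e.g. by counting result elements in the overlay).
function announceResults(region, wrapper, resultCount) {
  region.textContent = `${resultCount} search results loaded.`;
  // Also move focus into the overlay so reading continues from there.
  wrapper.setAttribute("tabindex", "-1");
  wrapper.focus();
}

// In the page: const liveRegion = createLiveRegion(document);  (at load)
// then, once results render: announceResults(liveRegion, visibleWrapper, count);
```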
PaperPort XP Compatibility Mode
Nuance's PaperPort may display this error message: PaperPort appears to be running Windows XP Compatibility Mode which may result in errors. We recommend disabling Compatibility Mode for the PaperPort.exe program, see Technote 6629. This article provides a possible solution to the problem.

Expert Comment by Andrew Leniart
@Walter
I tried repairing then uninstalling, reinstalling Power PDF Standard. Still getting "DLL not found" when I attempt scanning.

I don't mean to interject here, but I recently had this *exact* problem, albeit when trying to re-install a free PDF writing utility for a client of mine. Whenever he tried to create a PDF, same thing - "DLL not found". After much frustration (and a couple of uninstalls/re-installs) I tracked the problem down to a Windows registry corruption, whereby even uninstalling the app completely and reinstalling it didn't resolve the issue.

I ended up installing the PDF writer on one of my own VMs, where it worked correctly, then tracked and exported the registry entries from my working VM and imported them into my client's machine. All DLL-related problems instantly disappeared, because I'd already verified, of course, that the necessary DLLs were where they were supposed to be. Your issue may be totally different, but I just thought I'd throw that in as a possible cause.

@Joe - I hope you don't mind me chiming in here, just thought it may be another scenario you may like to consider.

Regards, Andrew

Author Comment by Joe Winograd, Fellow & MVE
Walter,
A few more thoughts:

(1) Since you're on W10, make sure you install Patch 1 after installing PP14.5, as PP14.5/Patch1 is the only W10-compliant version of PaperPort. Also, see if any of the Tips in my PaperPort 14 in Windows 10 - A First Look article are things that you haven't tried yet.

(2) Maybe there's a corruption in your user profile that is causing the grief to PaperPort and/or Power PDF. Try creating a new user profile.

(3) This is certainly not the DLL problem, but it would be a good idea to get the latest W10/64-bit drivers from the Brother site for your MFC-J6710DW.

(4) I sent an email to my contacts at Nuance with a link to this thread. They may not want to post publicly about it, but, with their permission, I'll share whatever they say here.

Regards, Joe

Edit: Hi Andrew,
I saw your post after hitting Submit on mine. I don't mind your chiming in here...indeed, I encourage it! Collaboration with the many bright folks here at Experts Exchange often leads to a solution that eludes an individual. In this case, your idea is certainly a good one... thanks for chiming in! Regards, Joe
Does anyone have an MS Word/editable version of the CIS hardening guides?
If not, I'd appreciate it if someone could OCR them to Word (I have a problem attaching the PDF files here), as a number of the free online converters limit the number of pages that can be converted, and Boxoft doesn't seem to work well on my PC.

Trying to build a mini PC that is capable of doing OCR with a limited list of characters (17 total). We are currently working on OCR, but a QR code could be an option.
It will need to scan for input and save the data (along with location, time and a few other pieces of data) to an online database via Wi-Fi (not hard-wired).
We have done some testing with a Pi, but it does not have the processing power needed. Results are 20% to 44% accurate.
So I am looking for suggestions on a mini computer we could test that could handle everything with 99.9% accuracy.
Device requirements:
1) Small, about 4" x 3"
2) Onboard Wi-Fi
3) Attachable mini camera for data scanning
4) Attachable mini touchscreen (4" x 3")
5) Onboard GPS
We are looking at the LattePanda or the ODROID-XU4, but I need some feedback on whether either of them will work or whether there are other options.
We have a multitude of files we need to OCR, some are images, others PDF. I can find tools which can do these one at a time, but need something that could do hundreds if you point the software in the direction of a folder full. I do not trust online converters as the docs may contain sensitive information. Please let me know of anything that may meet our criteria.
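A local option that keeps documents off the internet is to drive the open-source Tesseract command-line tool over a folder. This minimal sketch assumes `tesseract` is installed and on the PATH; `tesseract input outputbase pdf` writes a searchable PDF for each image. Tesseract does not read PDF input, so existing PDFs would first need rasterizing (for example with Poppler's `pdftoppm`). Alternatively, OCRmyPDF is an existing open-source tool that handles PDF input and batch use directly.

```javascript
const fs = require("fs");
const path = require("path");
const { execFileSync } = require("child_process");

const IMAGE_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"]);

// List the image files in inputDir and decide where each output goes.
function planBatch(inputDir, outputDir) {
  return fs
    .readdirSync(inputDir)
    .filter((name) => IMAGE_EXTENSIONS.has(path.extname(name).toLowerCase()))
    .sort()
    .map((name) => ({
      input: path.join(inputDir, name),
      outputBase: path.join(outputDir, path.parse(name).name),
    }));
}

// OCR every image; the "pdf" config makes Tesseract write outputBase + ".pdf".
function runBatch(inputDir, outputDir) {
  fs.mkdirSync(outputDir, { recursive: true });
  for (const job of planBatch(inputDir, outputDir)) {
    execFileSync("tesseract", [job.input, job.outputBase, "pdf"], { stdio: "inherit" });
  }
}
```

Pointing `runBatch` at a folder of hundreds of scans processes them one after another on the local machine.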
I am considering making a site that can auto-analyze a certain type of uploaded report, and instantly display the results as a PDF. There are various steps involved in the creation of the PDF and I want a feeling for the effort and technology needed for each step.

There are three different steps I will discuss here to see where I can use WordPress plugins and where I need to customize the functionality.

The uploaded report would be a merchant's monthly credit card statement, like the following snippet..

Statement Example
1) So, for the first of three steps, I need a WordPress OCR plug-in. Are there many options for that? Is the angle of the text a problem? I cannot guarantee neatness. (I added the underlining to make it easier for me to read.)

I imagine allowing an authorized user to upload a report, and I need this plug-in to convert images to some form of digital data, like a PDF or a CSV file.

2) I need a way to analyze that data, and I wonder whether there is a configurable WordPress plugin for this. It will query the items by the Description, then use the numeric values in the Number, Amount and Total columns for mathematical computations. There will be some mathematical steps performed on some of the data as it generates the output for the report.

The results should go into some format, like a CSV file.

3) I need a report tool which can import the data results from Step #2 and apply them to various pre-designed fields in the final pre-designed …
I need a tool I can use to digitize a report, like the one attached here...

Report
Will this kind of report get a 100% successful conversion rate?

Eventually, I need the tool to be part of my website, but I have not, as yet, chosen my back-end technology. For now, a simple Mac based tool is fine, just so I can hand convert a report that I can start to use in my programming of the back-end.

Windows is okay, if the free Mac versions are limited.

I do have Office 365 (Mac) if there is a tool in there which I can use.

I am also interested in hearing what "plug-ins" can work when I deploy this to my website, for online OCR conversions.
Security/privacy-related question. Can text be detected in, say, .jpg, .bmp, etc. file formats? I know text within PDFs can be detected with OCR. When I say "detected", I mean with the use of a SIEM, DLP or other event-driven software. I am not referring to steganography or obfuscation of text in any way, just simple text detection in a JPG or BMP file. Much thanks.
I had this question after viewing PDFTK - filling this PDF but got an error.

After talking with the others on this project, we decided it's ok to have the final PDF as read-only and it doesn't need to be editable.

I ran the commands below. I can populate the PDF but not the checkbox. Joe (if you're reading this)...is this because of the LiveCycle issue that the checkboxes don't get checked? If it is, I got approval to buy LiveCycle. I'll get it and see what's going on.

I tried "No", "On" and "1" as the value, but I don't see the checkbox checked.

1. i-765 is the orig file

2. notsigned.pdf is the file I QPDF-ed to get rid of the password error message

3. Ran this pdftk.exe notsigned.pdf fill_form i-765.txt output OutputFilled.pdf

4. outputfilled.pdf is the populated PDF.
i-765.pdf
notsigned.pdf
OutputFilled.pdf
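A checkbox is only checked when the fill value exactly matches one of its "on" states. That value varies from form to form, so guesses like "Yes", "On" or "1" often fail. Running `pdftk notsigned.pdf dump_data_fields` lists each field with its `FieldStateOption` values. The minimal sketch below parses that output and reports the non-`Off` states of each button field (checkboxes and radio buttons); the field names in the test data are made up. Because the I-765 is a LiveCycle (XFA) form, its XFA data may also override the filled AcroForm values. pdftk's documentation describes a `drop_xfa` output option for that case.

```javascript
// Parse the text printed by `pdftk <file> dump_data_fields`: records are
// separated by "---" lines and each line is "Key: value".
function parseDumpDataFields(dump) {
  return dump
    .split(/^---[ \t]*\r?$/m)
    .map((record) => {
      const field = { stateOptions: [] };
      for (const line of record.split(/\r?\n/)) {
        const m = /^(\w+):\s?(.*)$/.exec(line);
        if (!m) continue;
        const [, key, value] = m;
        if (key === "FieldStateOption") field.stateOptions.push(value);
        else field[key] = value;
      }
      return field;
    })
    .filter((field) => field.FieldName !== undefined);
}

// Button fields are "on" for any state other than "Off"; that exact string
// is what the fill data must supply for the box to appear checked.
function checkboxOnValues(fields) {
  const onValues = {};
  for (const field of fields) {
    if (field.FieldType !== "Button") continue;
    const states = field.stateOptions.filter((s) => s !== "Off");
    if (states.length > 0) onValues[field.FieldName] = states;
  }
  return onValues;
}
```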