tesseract OCR problems scanning images

Asked by tel2. Last Modified: 2019-10-06
Hi tesseract OCR experts,

I’ve just installed tesseract on my Raspberry Pi running Linux (Raspbian) and I’m trying to extract text from PNG screenshots taken on my phone.  (I have hundreds of these screenshots, all in the same size & format, taken over the last year using the LeafSpy Lite app, for the Nissan LEAF EV, and I'll be extracting text from all of them.)

The problem I have is, some of the text is not being extracted.

When I run this command:
$ tesseract sample1.png sample1
It produces sample1.txt (attached), which includes plenty of useful figures, but it excludes:
-      “11.84V” near the bottom left (nice to have this voltage figure, but not vital), and
-      “32.0%” at the bottom (I really need this SOC figure).

I tried feeding tesseract a negative (created with IrfanView on Windows) of the image, in case it was a black/white issue, but that gave the same output.
I also tried cropping the 11.84V and 32.0% figures out to TIF files (see sample1_voltage.tif & sample1_soc.tif attached, also created with IrfanView on Windows) and running those through tesseract. That:
-      failed for the 11.84V (see empty sample1_voltage.txt attached), but
-      worked for the 32.0% (see sample1_soc.txt attached).

I know bash and Perl scripting.  I don’t know Python.  It is installed, so it could be used if necessary provided someone else writes the code, but it's not my preference.
ImageMagick is also installed, in case I need to use it for cropping or whatever.

I haven’t found anything useful in the tesseract documentation yet.  If I could get it to look at specific rectangles, with something like the setRectangle command, that might be simpler, but I don’t see how to use that from the command line (the reference I found for it seems to be for the R language).
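To make the "specific rectangle" idea concrete, here is a minimal bash sketch of one way it might be done from the command line: crop the rectangle with ImageMagick, then OCR the crop as a single line of text. The function name ocr_region and the geometry in the example call are hypothetical (the real coordinates would have to be measured on one screenshot). It assumes the Tesseract 3.x option spelling -psm (newer versions use --psm), and the optional character whitelist is only a guess at what might help with numeric fields.

```shell
#!/usr/bin/env bash
# Sketch only: OCR one fixed rectangle of a screenshot.
# ocr_region is a hypothetical helper name, not part of tesseract.

# Usage: ocr_region IMAGE GEOMETRY [WHITELIST]
# GEOMETRY uses ImageMagick's WIDTHxHEIGHT+X+Y form.
ocr_region() {
    local image=$1 geometry=$2 whitelist=${3:-}
    local tmpdir status
    tmpdir=$(mktemp -d) || return 1

    # Crop the rectangle; +repage drops the original canvas offset.
    if ! convert "$image" -crop "$geometry" +repage "$tmpdir/region.png"; then
        rm -rf "$tmpdir"
        return 1
    fi

    # -psm 7 asks tesseract to treat the image as a single text line.
    # "stdout" as the output base prints the text instead of writing a file.
    if [ -n "$whitelist" ]; then
        tesseract "$tmpdir/region.png" stdout -psm 7 \
            -c "tessedit_char_whitelist=$whitelist"
    else
        tesseract "$tmpdir/region.png" stdout -psm 7
    fi
    status=$?

    rm -rf "$tmpdir"
    return $status
}

# Example call (placeholder geometry, not measured from sample1.png):
# ocr_region sample1.png 300x80+390+1800 '0123456789.%'
```

Because every screenshot has the same size and format, one measured geometry per field should be reusable across all the files.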

Any suggestions on how to get the 11.84V and 32.0% figures extracted from files like sample1.png in a fully automated way?

I guess I could crop the 32.0% with ImageMagick, or do a batch crop via IrfanView on Windows.  (I’d prefer to do it all from Linux so it’s all in one place.)  Then I could feed that plus the original file through tesseract and combine the resulting .txt outputs.  But cropping doesn’t seem to work for the 11.84V, so I’m not sure how to get that.
Any better ideas?
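The crop-and-combine plan could be sketched in bash roughly as follows. This is an untested outline under several assumptions: the two geometries are placeholders, the function names are made up for illustration, -psm is the Tesseract 3.x spelling, and the 300% enlargement is only a commonly suggested remedy for small text that may or may not rescue the 11.84V field.

```shell
#!/usr/bin/env bash
# Sketch only: full-page OCR plus two cropped fields, combined into one .txt.
# Placeholder geometries (WIDTHxHEIGHT+X+Y); measure the real ones once.
VOLTAGE_GEOMETRY=${VOLTAGE_GEOMETRY:-200x60+20+1700}
SOC_GEOMETRY=${SOC_GEOMETRY:-300x80+390+1800}

# ocr_crop IMAGE GEOMETRY: crop, enlarge 3x, OCR as one line, print the text.
ocr_crop() {
    local tmpdir status
    tmpdir=$(mktemp -d) || return 1
    convert "$1" -crop "$2" +repage -resize 300% "$tmpdir/crop.png" &&
        tesseract "$tmpdir/crop.png" stdout -psm 7 2>/dev/null
    status=$?
    rm -rf "$tmpdir"
    return $status
}

# process_screenshot IMAGE.png: write IMAGE.txt from the whole page,
# then append the two cropped fields as labelled lines.
process_screenshot() {
    local png=$1 base=${1%.png}
    tesseract "$png" "$base" 2>/dev/null || return 1
    {
        printf 'VOLTAGE: %s\n' "$(ocr_crop "$png" "$VOLTAGE_GEOMETRY")"
        printf 'SOC: %s\n' "$(ocr_crop "$png" "$SOC_GEOMETRY")"
    } >> "$base.txt"
}

# Batch usage over every screenshot in the current directory:
# for f in *.png; do process_screenshot "$f"; done
```

Labelling the appended lines keeps them easy to pick out later with grep or Perl, and an empty value after a label shows which files still fail.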

Before anyone puts in a lot of effort with this, please pass your plan by me first, so you don't waste time going down a path that I'm not keen on using.


Here’s what happened when I ran the commands:

$ tesseract sample1.png sample1
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Detected 23 diacritics

$ tesseract sample1_soc.tif sample1_soc
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Page 1

$ tesseract sample1_voltage.tif sample1_voltage
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Page 1
Empty page!!
Empty page!!



Here’s my version info:
$ tesseract -v
tesseract 3.04.01
 leptonica-1.74.1
  libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.1) : libpng 1.6.28 : libtiff 4.0.8 : zlib 1.2.8 : libwebp 0.5.2 : libopenjp2 2.1.2

$ uname -a
Linux raspberrypi 4.19.66-v7+ #1253 SMP Thu Aug 15 11:49:46 BST 2019 armv7l GNU/Linux

Thanks.
tel2
sample1.png
sample1.txt
sample1_voltage.tif
sample1_voltage.txt
sample1_soc.tif
sample1_soc.txt