
Remove punch hole marks from an image with Java

Hello,

There are some commercial (and expensive) tools out there that remove punch hole marks from scanned documents.

I need to automate that and do it in Java so that I end up with a clean scanned page.

I am trying to find algorithms or free libraries for this, that is, for image enhancement in Java.

At this point I have learned how to auto-deskew an image, but I also need to remove the punch holes.

The goal is to have a Java app that runs a batch operation on scanned pages and removes punch holes if they exist.

Any ideas or suggestions would be extremely helpful.

Thanks.

Try JMagick, which is a Java wrapper for ImageMagick:
http://www.jmagick.org/lenya/jmagick/live/index.html

Author

Commented:
I've looked at it, but it doesn't have what I need. JAI seems more complete than that, and I still could not figure out an algorithm or find a class that would remove or fill in the holes.
CERTIFIED EXPERT
Top Expert 2016

Commented:
Any chance of a sample image as attachment?

Author

Commented:
CEHJ,

Here is an image to illustrate the problem. I will also attach a sample one. It shows a page with the holes and then the treated version without them.

Author

Commented:
Here is a sample image. Note that aside from the holes, there are black margin marks that need to be removed automatically as well.

CERTIFIED EXPERT
Top Expert 2016

Commented:
Two things spring to mind

a. copy the right margin to the left (or a white rectangle)
b. try OCR on the whole thing
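
A minimal sketch of suggestion (a), in case it helps. It assumes the holes and marks sit inside a left-hand strip whose width you already know; the class name `MarginCleaner`, its method names, and the choice to mirror the right margin (so the page edge stays at the edge) are my own assumptions for illustration, not part of any library.

```java
import java.awt.image.BufferedImage;

/** Sketch of suggestion (a): hide marks in the left margin of a scanned page. */
final class MarginCleaner {

    /** Paints the leftmost stripWidth columns white. */
    static void whitenLeftMargin(BufferedImage page, int stripWidth) {
        int strip = Math.min(stripWidth, page.getWidth());
        for (int y = 0; y < page.getHeight(); y++) {
            for (int x = 0; x < strip; x++) {
                page.setRGB(x, y, 0xFFFFFFFF); // opaque white
            }
        }
    }

    /** Overwrites the left margin with a mirrored copy of the right margin. */
    static void copyRightMarginToLeft(BufferedImage page, int stripWidth) {
        int w = page.getWidth();
        int strip = Math.min(stripWidth, w / 2); // never let the two strips overlap
        for (int y = 0; y < page.getHeight(); y++) {
            for (int x = 0; x < strip; x++) {
                // Column x on the left takes its pixels from column w-1-x on the right.
                page.setRGB(x, y, page.getRGB(w - 1 - x, y));
            }
        }
    }
}
```

Load and save the pages with `javax.imageio.ImageIO` around these calls. Note that this blanks anything in the strip, including text that runs into the margin, so it only suits pages with a clean margin.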

Author

Commented:
Some OCR engines will not work if the holes are present, although Tesseract will. But we want to clean up the image, not OCR it. There are some algorithms for this that I am exploring, but I was hoping to find some solutions here as well.
CERTIFIED EXPERT

Commented:
Most image enhancement is based around finding the right type of filter to apply to the image, so it removes the noise and leaves behind the original image.

There are lots of examples of this sort of thing in Java here:
http://java.sun.com/products/java-media/jai/forDevelopers/jai1_0_1guide-unc/Image-enhance.doc.html

However in your case you're looking to remove a very specific mark rather than general noise, so I think you're going to want to write an algorithm that identifies that mark directly.  That seems pretty simple - something like summing the value of the neighboring 10x10 pixels around each pixel in the image.  Then the peak values of those summations are likely the centers of your holes - which you then target explicitly and erase.

You could further bias it to add in the distance of the pixel from the center of the page, so circles close to the edge score higher than ones in the center.

The size of the box (the 10x10 I gave) should be targeted to approximately the size of the holes you expect - the better that fit the better the algorithm should work.
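
Here is a rough sketch of the box-sum idea above, not a tested solution. It assumes the holes scan as dark blobs, so it sums "darkness" (255 minus brightness) rather than raw pixel values, and it uses a summed-area table so each box sum costs four lookups. The class `PunchHoleFinder`, its method names, and the `edgeWeight` parameter are hypothetical names made up for this example.

```java
import java.awt.image.BufferedImage;

/** Sketch: find the darkest box-sized window, biased away from the page centre. */
final class PunchHoleFinder {

    /** Returns {x, y} of the centre of the best-scoring box-by-box window. */
    static int[] findDarkestWindow(BufferedImage img, int box, double edgeWeight) {
        int w = img.getWidth(), h = img.getHeight();
        // Summed-area table of darkness, padded with a zero row and column.
        long[][] sat = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int lum = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
                sat[y + 1][x + 1] = (255 - lum) + sat[y][x + 1] + sat[y + 1][x] - sat[y][x];
            }
        }
        double halfDiagonal = Math.hypot(w / 2.0, h / 2.0);
        double bestScore = -1;
        int[] best = {-1, -1};
        for (int y = 0; y + box <= h; y++) {
            for (int x = 0; x + box <= w; x++) {
                long sum = sat[y + box][x + box] - sat[y][x + box] - sat[y + box][x] + sat[y][x];
                int cx = x + box / 2, cy = y + box / 2;
                // Distance from the page centre, scaled to 0 (centre) .. 1 (corner).
                double edge = Math.hypot(cx - w / 2.0, cy - h / 2.0) / halfDiagonal;
                double score = sum * (1.0 + edgeWeight * edge);
                if (score > bestScore) {
                    bestScore = score;
                    best = new int[] {cx, cy};
                }
            }
        }
        return best;
    }

    /** Paints a white disc over a detected hole. */
    static void erase(BufferedImage img, int cx, int cy, int radius) {
        for (int y = Math.max(0, cy - radius); y <= Math.min(img.getHeight() - 1, cy + radius); y++) {
            for (int x = Math.max(0, cx - radius); x <= Math.min(img.getWidth() - 1, cx + radius); x++) {
                int dx = x - cx, dy = y - cy;
                if (dx * dx + dy * dy <= radius * radius) {
                    img.setRGB(x, y, 0xFFFFFFFF); // opaque white
                }
            }
        }
    }
}
```

To handle several holes, call `findDarkestWindow`, `erase` the hit, and repeat until the best score falls below a threshold you pick for your scans. A real page would also need a check that the hit is roughly round, since a bold heading or a logo can score as high as a hole.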

Doug

Author

Commented:
Doug,

First of all, thanks for your comment. I am familiar with that page. The name of the algorithm is actually Houch Circle which searches for circles on a binary image. So I have been reading about it and working on its implementation.

Any other suggestions for removing borders, etc. are quite welcome.

Author

Commented:
Sorry for the typo. Hough circle.
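
For anyone following along, this is a minimal sketch of a Hough circle search for a single known radius. It is not the implementation I am working on, and it assumes you already have a binary edge map of the page (for example from thresholding followed by an edge detector). The class name `HoughCircleSketch` and the method `findCircle` are made up for the example.

```java
/** Sketch: Hough transform for circles of one known radius on a binary edge map. */
final class HoughCircleSketch {

    /** Returns {centreX, centreY, votes} for the strongest circle; edges[y][x] is true on edge pixels. */
    static int[] findCircle(boolean[][] edges, int radius) {
        int h = edges.length, w = edges[0].length;
        // Roughly two samples per pixel of circumference, with a floor for tiny radii.
        int steps = Math.max(90, (int) Math.ceil(4 * Math.PI * radius));
        double[] dx = new double[steps], dy = new double[steps];
        for (int i = 0; i < steps; i++) {
            double t = 2 * Math.PI * i / steps;
            dx[i] = radius * Math.cos(t);
            dy[i] = radius * Math.sin(t);
        }
        // Every edge pixel votes for all centres that lie one radius away from it.
        int[][] acc = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!edges[y][x]) continue;
                for (int i = 0; i < steps; i++) {
                    int a = (int) Math.round(x - dx[i]);
                    int b = (int) Math.round(y - dy[i]);
                    if (a >= 0 && a < w && b >= 0 && b < h) acc[b][a]++;
                }
            }
        }
        // The accumulator cell with the most votes is the most likely centre.
        int[] best = {0, 0, 0};
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (acc[y][x] > best[2]) best = new int[] {x, y, acc[y][x]};
            }
        }
        return best;
    }
}
```

Since punch holes have a standard size, the radius in pixels can be estimated from the scan resolution, which avoids searching over radii. Comparing the returned vote count against a threshold tells you whether a page has a hole at all.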
Commented:
Carlos,

Take a look at the Leptonica image processing library.  It is in C, but ...

-- Michael

Author

Commented:
It does not quite fix the problem.
