How to compare 2 images for similarity?

Hi All,

I have 100,000 TIFF images in a folder. There are many duplicate images with different names. I want to compare the images to find the duplicates.

Could somebody help me with that?
I heard that there are utilities that compare files in binary form...

jaipur07 Asked:
 
Richard Quadling, Senior Software Developer, Commented:
Hi jaipur07,


Depending upon your OS, there are many options.

As these are binary files, comparing them directly would normally report a daft number of differences.

Another option is to use a program that does the following ...

1 - Get a hash value for each file.
2 - If the hash value already exists in the internal array then this suggests a duplicate image.

I work with PHP.

Using PHP5, this could be accomplished with ...

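<?php
// Map each MD5 hash to the list of files that produced it.
$a_hash = array();
foreach (new DirectoryIterator('/your/folder/here/') as $o_FILE)
 {
 // Skip '.', '..' and subdirectories.
 if (!$o_FILE->isFile()) continue;
 // Append rather than assign, so files sharing a hash are collected together.
 $a_hash[md5_file($o_FILE->getPathname())][] = $o_FILE->getPathname();
 }
// Any hash with more than one file is a set of byte-identical duplicates.
foreach ($a_hash as $s_hash => $a_filenames)
 {
 if (count($a_filenames) > 1)
  {
  print_r($a_filenames);
  }
 }
?>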


Regards,

Richard Quadling.
 
jaipur07 (Author) Commented:
Thanks Richard

I would appreciate it if you could point me to something in Java.
 
Richard Quadling, Senior Software Developer, Commented:
Ah. Not my strong point at all. I've not done Java.

But getting this script running on Windows would take around 2 minutes.

Maybe creating a pointer question in the Java section to this one would be of use.

Watch out for anyone saying you have to compare every file with every other file.

You don't.

The md5 hash is good enough to identify exact duplicates (files whose bytes are identical).

So you only need to pass through the files once.
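
Since a Java version was asked for, here is a minimal sketch of the same single-pass idea: hash every file once and group the paths by hash. The class name DuplicateFinder and its method names are made up for illustration, and it only looks at regular files directly inside the folder (no recursion), so treat it as a starting point rather than a finished tool.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the names are illustrative only.
class DuplicateFinder {

    // Hex-encoded MD5 of a file's bytes, streamed so large TIFFs are not loaded whole.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // One pass over the folder: hash -> all files with that hash.
    static Map<String, List<Path>> groupByHash(Path dir)
            throws IOException, NoSuchAlgorithmException {
        Map<String, List<Path>> groups = new HashMap<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    groups.computeIfAbsent(md5Hex(p), k -> new ArrayList<>()).add(p);
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (List<Path> files : groupByHash(dir).values()) {
            if (files.size() > 1) {
                System.out.println(files); // a set of byte-identical files
            }
        }
    }
}
```

Run it with the folder as the only argument. Each printed list is a group of files with the same md5, i.e. the same bytes under different names.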
 
PaulCaswell Commented:
Hi jaipur07,

The quickest way, if the images aren't too big, would be to use WinZip to zip them all up. Then open the archive and configure WinZip to display the CRC of the files. If the files are really identical then the CRCs should match.

Alternatively, I have published a freeware tool you could use if you wish. It can calculate CRC32 or SHA1 signatures for the files. Let me know if that is something you are interested in and I'll post a link and a command line for you.
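
If Java is preferred, the CRC32 value that zip tools display can probably also be computed directly with the standard library. The sketch below is not the freeware tool mentioned above; FileCrc is a made-up name. Bear in mind that a CRC32 is only 32 bits, so among 100,000 files two different files could plausibly share a value by chance; a matching CRC is best confirmed with a stronger hash or a byte-by-byte comparison.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.CRC32;

// Hypothetical helper class; the names are illustrative only.
class FileCrc {

    // CRC32 of a file's bytes, read in chunks.
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                crc.update(buffer, 0, n);
            }
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        // Print one "crc  filename" line per file given on the command line.
        for (String name : args) {
            System.out.printf("%08x  %s%n", crc32(Paths.get(name)), name);
        }
    }
}
```

Sorting the output makes files with equal CRCs end up next to each other.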

Paul
 
CarlosMMartins Commented:
For Windows you can find a lot of duplicate image finders...
The first one I got on Google was: http://www.snapfiles.com/get/imagecomparer.html

RQuadling's approach would work great if you only have *exact* duplicate images.
However, in most cases people have resized the images or modified them slightly (a website text layer added, or something similar), and these programs can still match such similar images.

I tried some over 5 years ago, and they worked OK, although it can be a time-consuming task.

If you want to do it yourself, you'd need a more complex algorithm: rescaling, calculating similarity for different areas of the image, etc...
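
To give a rough idea of what such an algorithm could look like, here is a sketch of one simple, well-known approach, often called an "average hash". It is not what the tools mentioned above necessarily do. The image is divided into 8 x 8 areas (which has much the same effect as rescaling it to 8 x 8 pixels), each area becomes one bit depending on whether it is brighter than the image average, and two images are compared by counting the bits that differ. The class name AverageHash and the grid size are assumptions for illustration, and any cut-off for "similar enough" would have to be tuned on your own images. ImageIO should be able to read TIFF out of the box from Java 9 onwards; older versions need a plug-in.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper class; the names and the grid size are illustrative only.
class AverageHash {
    static final int GRID = 8; // 8 x 8 areas give a 64-bit fingerprint

    // Mean brightness (0..255) of each area; a stand-in for rescaling to 8 x 8.
    static double[] cellMeans(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        double[] means = new double[GRID * GRID];
        for (int gy = 0; gy < GRID; gy++) {
            for (int gx = 0; gx < GRID; gx++) {
                int x0 = gx * w / GRID, x1 = Math.max(x0 + 1, (gx + 1) * w / GRID);
                int y0 = gy * h / GRID, y1 = Math.max(y0 + 1, (gy + 1) * h / GRID);
                double sum = 0;
                for (int y = y0; y < y1; y++) {
                    for (int x = x0; x < x1; x++) {
                        int rgb = img.getRGB(x, y);
                        sum += (((rgb >> 16) & 0xff) + ((rgb >> 8) & 0xff) + (rgb & 0xff)) / 3.0;
                    }
                }
                means[gy * GRID + gx] = sum / ((x1 - x0) * (y1 - y0));
            }
        }
        return means;
    }

    // One bit per area: set when the area is brighter than the image average.
    static long fingerprint(BufferedImage img) {
        double[] means = cellMeans(img);
        double total = 0;
        for (double m : means) total += m;
        double average = total / means.length;
        long bits = 0L;
        for (int i = 0; i < means.length; i++) {
            if (means[i] > average) bits |= 1L << i;
        }
        return bits;
    }

    // Hamming distance: number of areas where the two fingerprints disagree.
    static int distance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) throws IOException {
        BufferedImage first = ImageIO.read(new File(args[0]));
        BufferedImage second = ImageIO.read(new File(args[1]));
        if (first == null || second == null) {
            System.err.println("No ImageIO reader available for one of the files");
            return;
        }
        System.out.println("distance = " + distance(fingerprint(first), fingerprint(second)));
    }
}
```

A distance of 0 means the two fingerprints agree in every area, and larger values mean the images differ in more areas. For 100,000 images you would compute each fingerprint once and then compare the 64-bit values, which is far cheaper than comparing the pixels of every pair.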
 
Richard Quadling, Senior Software Developer, Commented:
Yes. I agree that an md5 would only provide a match where the BINARY is identical. It would NOT make any allowance for the content of the image.

 
Jim P. Commented:
You may want to check out Beyond Compare from http://www.scootersoftware.com -- they have a plug-in for comparing images. I'm not sure whether you can automate it.
 
hephalump Commented:
Norton SystemWorks 2000 had a utility which found identical files.
You can limit the search to a particular directory or path, and it then finds all duplicate files there.
I haven't tried it on a folder of 100k images, but it did work for me: it found files that I had backed up and that were duplicates.
 
Jim P. Commented:
No objections.
Question has a verified solution.