Solved

How to compare 2 images for similarity?

Posted on 2006-06-07
Last Modified: 2008-03-10
Hi All,

I have 100,000 TIFF images in a folder. Many of them are duplicates saved under different names. I want to compare the images to find the duplicates.

Could somebody help me in that?
I heard that there are utilities to compare in binary form...

Question by:jaipur07
9 Comments
 
LVL 40

Accepted Solution

by:
Richard Quadling earned 400 total points
ID: 16850376
Hi jaipur07,


Depending upon your OS, there are many options.

As these are binary files, comparing them would normally result in a daft amount of differences.

Another option is to use a program that does the following ...

1 - Get a hash value for each file.
2 - If the hash value already exists in the internal array then this suggests a duplicate image.

I work with PHP.

Using PHP5, this could be accomplished with ...

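<?php
// Map each MD5 hash to the list of files that produce it.
$a_hash = array();
foreach (new DirectoryIterator('/your/folder/here/') as $o_FILE)
 {
 // Skip '.', '..' and sub-directories; only regular files can be hashed.
 if (!$o_FILE->isFile())
  {
  continue;
  }
 $a_hash[md5_file($o_FILE->getPathname())][] = $o_FILE->getPathname();
 }
// A hash with more than one filename indicates a set of identical files.
foreach ($a_hash as $s_hash => $a_filenames)
 {
 if (count($a_filenames) > 1)
  {
  print_r($a_filenames);
  }
 }
?>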


Regards,

Richard Quadling.
 

Author Comment

by:jaipur07
ID: 16850400
Thanks Richard

I would appreciate it if you could point me to something in Java.
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 16850420
Ah. Not my strong point at all. I've not done Java.

But getting this script running on Windows would take around 2 minutes.

Maybe creating a pointer question in the Java section to this one would be of use.

Watch out for anyone saying you have to compare every file with every other file.

You don't.

The MD5 hash is good enough to detect exact (byte-for-byte) duplicates.

So you only need to pass through the files once.
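
For reference, a minimal Java sketch of the same single-pass idea might look like the following. It is not from the original script: the class name `Md5DuplicateFinder` and the methods `md5Hex` and `groupByHash` are hypothetical names chosen for illustration, and the folder path is a placeholder. It only finds files whose bytes are identical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name; a sketch of the hash-and-group approach.
class Md5DuplicateFinder {

    // Hex-encoded MD5 of a file, read in chunks so large TIFFs are not loaded whole.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // One pass over the folder: maps each hash to every file that has it.
    static Map<String, List<Path>> groupByHash(Path folder)
            throws IOException, NoSuchAlgorithmException {
        Map<String, List<Path>> byHash = new HashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    byHash.computeIfAbsent(md5Hex(file), k -> new ArrayList<>()).add(file);
                }
            }
        }
        return byHash;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder folder, as in the PHP script; pass the real one as an argument.
        Path folder = Paths.get(args.length > 0 ? args[0] : "/your/folder/here/");
        for (List<Path> group : groupByHash(folder).values()) {
            if (group.size() > 1) {
                System.out.println(group);
            }
        }
    }
}
```

Each printed list is one set of files with identical contents. The sketch does not descend into sub-folders.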
 
LVL 16

Assisted Solution

by:PaulCaswell
PaulCaswell earned 400 total points
ID: 16850732
Hi jaipur07,

The quickest way, if the images aren't too big, would be to use WinZip to zip them all up. Then open the archive and configure WinZip to display the CRC of the files. If the files are really identical then the CRCs will match.

Alternatively, I have published a freeware tool you could use if you wish. It can calculate CRC32 or SHA1 signatures for the files. Let me know if that is something you are interested in and I'll post a link and a command line for you.

Paul
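
As a rough illustration of the CRC idea in Java, the sketch below computes a file's CRC32 with the standard `java.util.zip.CRC32` class, which implements the same checksum that zip archives store per entry. The class name `Crc32Check` and the method names are hypothetical. Because a CRC32 is only 32 bits wide, two different files can occasionally share one, so the sketch assumes a match should be confirmed by comparing the actual bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical class name; a sketch of checksum-then-confirm comparison.
class Crc32Check {

    // CRC32 of a file's contents, read in chunks.
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                crc.update(buffer, 0, n);
            }
        }
        return crc.getValue();
    }

    // A size or CRC mismatch rules a pair out cheaply; a match is then confirmed.
    static boolean sameContent(Path a, Path b) throws IOException {
        if (Files.size(a) != Files.size(b) || crc32(a) != crc32(b)) {
            return false;
        }
        // Reads both files fully: acceptable for a sketch, not for very large files.
        return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }
}
```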
 
LVL 11

Assisted Solution

by:CarlosMMartins
CarlosMMartins earned 400 total points
ID: 16850734
For Windows you can find a lot of duplicate image finders.
The first one I got on Google was: http://www.snapfiles.com/get/imagecomparer.html

RQuadling's approach would work great if you only have *exact* duplicate images.
However, in most cases people have resized the images or modified them slightly (a website text layer added, or something similar), and these programs still let you match such similar images.

I tried some over 5 years ago, and they worked OK, although it could be a time-consuming task.

If you want to do it yourself, you'd need a more complex algorithm: rescaling, calculating similarity for different areas of the image, etc.
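
One simple way to sketch the "rescale, then compare" idea is an average hash, a technique swapped in here for illustration and not something the programs above are known to use. The image is shrunk to 8x8 grayscale, each pixel is compared against the mean brightness to give a 64-bit fingerprint, and two fingerprints are compared by counting differing bits. The class name `AverageHash` and its methods are hypothetical, and any distance threshold would have to be tuned on real data.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical class name; a sketch of an average-hash fingerprint.
class AverageHash {

    // 64-bit fingerprint: shrink to 8x8 grayscale, set a bit where a pixel is brighter than the mean.
    static long hash(BufferedImage image) {
        BufferedImage small = new BufferedImage(8, 8, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = small.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(image, 0, 0, 8, 8, null);
        g.dispose();

        int[] gray = new int[64];
        int sum = 0;
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 8; x++) {
                gray[y * 8 + x] = small.getRaster().getSample(x, y, 0);
                sum += gray[y * 8 + x];
            }
        }
        int mean = sum / 64;
        long bits = 0L;
        for (int i = 0; i < 64; i++) {
            if (gray[i] > mean) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Number of differing bits; small values suggest visually similar images.
    static int distance(long a, long b) {
        return Long.bitCount(a ^ b);
    }
}
```

Images could be loaded with `javax.imageio.ImageIO.read`, which can read TIFF files from Java 9 onwards. Unlike the hash of the file bytes, this fingerprint does not let you skip pairwise comparison entirely, since similar images give close but not necessarily equal values.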
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 16850751
Yes. I agree that an md5 would only provide a match where the BINARY is identical. It would NOT make any allowance for the content of the image.

 
LVL 38

Assisted Solution

by:Jim P.
Jim P. earned 400 total points
ID: 16854116
You may want to check out Beyond Compare from http://www.scootersoftware.com. They have a plug-in for comparing images. Not sure if you can automate it.
 
LVL 1

Assisted Solution

by:hephalump
hephalump earned 400 total points
ID: 16919784
Norton SystemWorks 2000 had a utility which found identical files.
You can limit the search to a particular directory or path, and it then finds all duplicate files there.
I haven't tried it on a folder of 100k images, but it did work: it found files that I had backed up and that were duplicates.
 
LVL 38

Expert Comment

by:Jim P.
ID: 17371907
No objections.