Looking for duplicate file finder / remover software

We're looking for software that can find duplicate files of all types, either locally or remotely (network share). The app should be able to find files not just by name but by "signature" (something like an MD5 hash). It should be able to find duplicates even if the filenames do not match but the contents are the same (that's where other, more "intelligent" comparison mechanisms come into play). It has to work on Windows XP/2003 and should be robust. I would like to avoid Java-based solutions, as they tend to be slow and require Java, which is not available on every system.

Please don't post the top 5 Google search results - I can use Google myself quite nicely. This will need to run on XP/2K3; a free solution would be ideal. Any commercial software should be reasonably priced and preferably from an established company, not some fly-by-night operation that won't be there tomorrow to support the product.

Thanks in advance!
Gary CaseRetiredCommented:
Duplicate File Finder [Available here:  http://brooksyounce.byethost13.com/ ] will do what you want ... it finds all duplicates regardless of the filenames, dates, etc.   It WILL, of course, potentially take a very long time if you have it set to search a very large set of files ... but it does the job.   I recently ran it overnight to identify duplicates across three 750GB drives => I didn't time the search (I was gone all of the next day) ... but it definitely took a long time.   I repeated the search with the "Fast Search" option checked, and it was MUCH faster ... and found the same set of duplicates, even though the "Fast Search" option has a "less accurate" caveat by it.

It works fine on local drives; external drives; network drives; etc.

CynepMeHAuthor Commented:
Thanks for the suggestion, but it didn't work out - it is too limited in features and a bit on the sluggish side.
I think at this point I'm just about ready to give up on "free" solutions. I've tried several and they were worth exactly what I paid for them. I think I should focus on commercial solutions, but they need to be comprehensive in terms of features. If there are any commercial products you folks are familiar with, please post.

Gary CaseRetiredCommented:
Any tool that has to examine every file regardless of the name, date, or any other identifying feature will be "sluggish" with a large # of files.   The tool I suggested works MUCH faster if you check the "Fast Search" option ... and is still very good at finding duplicates (even though it does warn you that this is "less accurate").    Even commercial tools will be "sluggish" with the requirements you've noted to find the duplications without any limitations on the search parameters.
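
To illustrate why a quicker mode can be "less accurate", here is a minimal Python sketch of one possible shortcut: fingerprint each file by its size plus a hash of only its first block. This is an assumption about how such modes can work in general, not a description of how Duplicate File Finder implements "Fast Search"; `SAMPLE_BYTES`, `quick_key`, and `fast_candidates` are hypothetical names.

```python
import hashlib
import os
from collections import defaultdict

SAMPLE_BYTES = 64 * 1024  # hypothetical sample size, not taken from any real tool


def quick_key(path):
    """Return (size, MD5 of the first SAMPLE_BYTES) as a cheap fingerprint."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(SAMPLE_BYTES)
    return size, hashlib.md5(head).hexdigest()


def fast_candidates(roots):
    """Group files under the given roots by quick_key; keep groups of 2 or more."""
    groups = defaultdict(list)
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    groups[quick_key(path)].append(path)
                except OSError:
                    continue  # unreadable file: skip it
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Two files of equal size that differ only after the sampled block would be reported together, so groups found this way are candidates that still need a full comparison before anything is deleted.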

CynepMeHAuthor Commented:
Gary, understood - I think performance is not a critical requirement, as long as the job is done. The issue is that we may have a number of files that are named differently but whose contents are the same. As an example, User A may save a .Net Framework 2.0 file as "dotnetfx.exe". User B may save the same file as "dotnetframework.exe" and User C may have saved it as "MSKBXXXXXX.EXE". I want to be able to identify these types of duplicates as well (MD5 hash?).

So far, I haven't seen within this software any way to identify such instances - maybe I'm missing something.
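
For reference, the kind of content-based matching I have in mind could be sketched in Python roughly as follows. This is only an illustration of grouping files by an MD5 hash of their contents, not how any of the tools mentioned here work; `file_md5` and `group_by_content` are made-up names.

```python
import hashlib
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_content(paths):
    """Map each content hash to the paths sharing it, keeping only duplicates."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_md5(path)].append(path)
    return {h: found for h, found in groups.items() if len(found) > 1}
```

Because the filename never enters the hash, "dotnetfx.exe", "dotnetframework.exe" and "MSKBXXXXXX.EXE" would land in the same group if their bytes are identical.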
Gary CaseRetiredCommented:
As long as the location of those files is within the search paths you set, it will identify them as duplicates in that case.

For example, create a file called AAAAA.111 (it can be anything ... just copy some other file and rename it; create a new doc; etc.).   Store it at a known location ... say C:\TestFolder\AAAAA.111

Now copy the file somewhere ... perhaps to D:\MoreStuff\ ... and rename the copy (say to BBBBB.222), so the file at D:\MoreStuff\BBBBB.222  is now a duplicate of C:\TestFolder\AAAAA.111

Now copy the file somewhere else ... perhaps to a mapped network drive K:\DistantStuff\ ... and rename it yet again, perhaps to CCCCC.333, so the file K:\DistantStuff\CCCCC.333 is yet another duplicate of the same file.

When you run Duplicate File Finder, the first thing you need to do is use the "Add Path" button to set the search paths for the duplicates.  It will find ANY duplicates within those search paths, regardless of their names, creation dates, etc. => if they're duplicates, they'll be identified.

In the example above, if you clicked "Add Path" and selected C:\TestFolder as a path; then clicked "Add Path" and selected K:\DistantStuff (or if the network location isn't mapped you can simply "point" to the network location); then clicked on Start Search, it would find AAAAA.111 and CCCCC.333 as duplicates, but would not find BBBBB.222 because you didn't include D:\MoreStuff in the search path.   The search would be quicker if you checked the "Fast Search" box before you clicked on Start Search.   Note you could also simply set the paths to C:, D:, and K: and it would find all 3 of these duplicates ... but this would take a LONG time as it would check EVERY file on all three drives => but it would indeed find all instances of duplicates (which is what you asked for).
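
The effect of the search paths in this example can be reproduced with a short Python sketch. It is a rough stand-in under stated assumptions, not the tool's actual algorithm: `duplicates_under` and `sha256_of` are hypothetical names, the search paths are assumed not to overlap, and files are first bucketed by size so that only same-size files get hashed.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path):
    """Hash the full contents of a file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicates_under(search_paths):
    """Return groups of identical files found under the given search paths only."""
    by_size = defaultdict(list)
    for root in search_paths:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        for path in same_size:
            by_hash[sha256_of(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Calling `duplicates_under` with only the first and third folders would report AAAAA.111 and CCCCC.333 but never BBBBB.222, since files outside the listed paths are never visited.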
CynepMeHAuthor Commented:
Gary, thanks for your suggestions. I want to see what else may surface, although it is beginning to look like we may have to go with commercial software after all, as I am told we'll need to produce reports too.

I'm currently looking at Quest and NTP Software tools - we'll see how much they cost and what they can do.

If you know of any commercial products that can provide more features and benefits, please advise.

Gary CaseRetiredCommented:
The product I suggested would work fine ==> as I noted, it finds "... duplicates even if the filenames do not match ...";  works "... on Windows XP/2003 ...";  is reasonably "... robust ..." ; is not "... java-based ...";  and is free ("... free solution would be ideal ...").   It is, as the author noted, a bit "... sluggish ..." => but any product that's not index-based and has to search the entire path will take a bit of time.

Bottom line:  Duplicate File Finder would certainly seem to have been a solution that would resolve the question.   The fact that the asker elected to go a different path doesn't negate that solution.
Question has a verified solution.
