Duplicate File Finder Where File Names are Different...

Looking for a Windows file compare utility, similar to Ultra Compare or Beyond Compare, or even Directory Opus, but here's the problem. I have a folder containing several hundred or more files. I believe these files may already exist somewhere else on my computer in another folder, and I need to determine whether the files in the one folder are duplicates. To complicate things, while these files may be duplicates, a file may occur in multiple locations under a different name. For example, I have a file named IMG00535.jpg, but it may exist somewhere else as "SM0110-180906-IMG00535.jpg". In summary, I want to find all files, and their locations, where a portion of the file name matches one of the short names in a specified folder.

Essentially, I have a folder containing photos downloaded from a camera. I believe they have been downloaded before; however, it is customary for me to rename the image files to include a Project ID and the date taken, in addition to the original name given by the camera. I'm not interested in searching within the folder containing the list of short names. The bottom line is to look for duplicate files where the file names do not fully match, but perhaps the date, size, or hash do.
Steve Meyer, System Analyst and Developer, Asked:
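The matching described in the question (a short camera name like IMG00535 contained inside a longer renamed file, confirmed by size and hash) can be sketched in Python. This is a minimal illustration, not taken from any of the tools discussed below; the function names and folder layout are made up for the example:

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk=1 << 20):
    """Hash a file in chunks so large photos are not loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def find_renamed_duplicates(master_dir, search_dirs):
    """For each file in master_dir, find files elsewhere whose name
    contains the master file's stem (e.g. "IMG00535") and whose size
    and SHA-256 hash also match."""
    matches = []
    for master in Path(master_dir).iterdir():
        if not master.is_file():
            continue
        stem = master.stem              # "IMG00535" from "IMG00535.jpg"
        size = master.stat().st_size
        master_hash = None              # hash lazily, only on a size match
        for root in search_dirs:
            for cand in Path(root).rglob(f"*{stem}*"):
                if not cand.is_file() or cand.stat().st_size != size:
                    continue
                if master_hash is None:
                    master_hash = sha256_of(master)
                if sha256_of(cand) == master_hash:
                    matches.append((master, cand))
    return matches
```

Filtering by size before hashing keeps the scan cheap: only candidates that could possibly be duplicates ever get read in full.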
Thomas U Commented:
Hi Steve

If it's ok to buy a tool, I recommend https://www.jam-software.com/treesize/duplicate_search.shtml?language=EN
I bought it because I have the exact same problem with my users: they store files twice, rename them, etc. etc.

It searches for all files and compares their hashes.

Or you can say you're not interested in the hash, only whether the file name is the same, or only the file date, etc.
It can even replace a duplicate with a symbolic link to the original file instead of deleting it, so the users don't notice anything and you save hard-drive space.

If you want to code something, PowerShell works: search for all files, compare their hashes (Get-FileHash is built in), and report which ones are the same. If you want to go that way, I'll need to check my notes to come up with the code. I needed this once and wrote a PowerShell script.

regards
Thomas
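A minimal sketch of the hash-comparison approach Thomas describes, written here in Python rather than PowerShell for illustration: group files by size first, hash only the potential collisions, and report groups sharing a SHA-256 digest.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def hash_file(path, chunk=1 << 20):
    """Stream a file through SHA-256 without loading it whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def duplicate_groups(root):
    """Return lists of files under root that share a SHA-256 hash.
    Grouping by size first avoids hashing files that cannot match."""
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)
    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:          # unique size => cannot be a duplicate
            continue
        for p in same_size:
            by_hash[hash_file(p)].append(p)
    return [g for g in by_hash.values() if len(g) > 1]
```

The PowerShell equivalent would use `Get-ChildItem -Recurse` piped to `Get-FileHash` and `Group-Object Hash`; the logic is the same.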
Owen Rubin, Consultant, Commented:
Thomas makes a good suggestion. I have used that tool many times, and it works fairly well. The odds of a hash collision are very tiny, so it does an excellent job of finding duplicates with different attributes. I did not know about the symbolic link option, though, Thomas. Thanks for pointing that out.
Thomas U Commented:
My users work a lot with "promotional videos": they download them, copy them to another folder, and want to send them to a different user, even on the same network, instead of making a link to them... damn, they even copy them from the network drive to their desktops to have them at hand faster and anytime... so I have a loooot of duplicates to deal with ;)
TreeSize Professional saved my ass once. A partition on the server holding all our data was running out of space at 2 TB and could not be extended because it was an MBR partition. So I thought, well, maybe I can delete old data. I bought TreeSize, ran a duplicate search with hashes, and could delete almost 800 GB of duplicate data / create symbolic links... I hate my users ;-)
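The replace-with-a-symbolic-link step that reclaims that space can be sketched in a few lines of Python; this is an illustrative snippet, not how TreeSize does it internally. Note that on Windows, creating symlinks may require administrator rights or Developer Mode.

```python
from pathlib import Path

def replace_with_symlink(duplicate, original):
    """Delete `duplicate` and replace it with a symbolic link to
    `original`, so the old path still resolves but the bytes are
    stored only once."""
    duplicate = Path(duplicate)
    original = Path(original)
    duplicate.unlink()                  # remove the redundant copy
    duplicate.symlink_to(original)      # leave a link in its place
```

Applications and users that open the old path keep working; only the duplicate's storage is freed.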
Steve Meyer, System Analyst and Developer (Author) Commented:
Thanks, guys, I'm going to check out TreeSize. Anyone know of any limitations in the trial version, like a maximum number of files, etc.?

Any other suggestions?
Thomas U Commented:
The page says it's a 30-day trial without ANY limitation in functionality whatsoever... so I believe them ;)
Senior IT System Engineer, IT Professional, Commented:
Thomas,
Wow, that's cool. If you can share the script here, that would be greatly appreciated.

Why not use Windows Server 2012 R2 or above for file-server deduplication?
Thomas U Commented:
Hi Senior IT System Engineer (please change your name ;)

I used this one and changed some things to fit my needs:
https://stackoverflow.com/questions/44358602/trying-to-compare-hashes-and-delete-files-with-same-hash-in-powershell
It's slow and messy, but it works.

Yes, if I could move my file server to 2012 R2 and use deduplication in no time, I would've done that already ;). But it's a task still on my list.
Steve Meyer, System Analyst and Developer (Author) Commented:
Here's an update. I downloaded TreeSize, but I also found a couple of reviews and an application from KeyMetric called Duplicate File Detective. Very nice. It does it all, with lots of options for selecting folders and drives and for what to base the file comparison on: name, date, size, and/or hash, etc. Results can be deleted, archived, or replaced with links. It allows me to designate one folder as the master file list, then finds all matching files in all other selected folders.

I needed to get rid of a particular cloud service, so I downloaded about 12,000 files from that cloud to a folder on my PC and then deleted them from the cloud. Using DFD, I selected the downloaded folder as the master and scanned the other selected folders on the PC for duplicates. I then locked the folder trees containing duplicates that I wanted to keep, marked the remaining files for deletion, and sent them to a temporary archive folder. I based my comparison on matching file type and file size using one of six available hash algorithms (I used SHA256; the others are CRC32, ADLER32, MD5, SHA1, and SHA512). Pretty slick.

I am now evaluating TreeSize for comparison. It also appears to be pretty slick, and they both cost about the same. I will let you know the results. Anyone ever use Duplicate File Detective?
nobus Commented:
I tried such software once and found out one must be very cautious when deleting: I deleted some folders that were not meant to be deleted...
Not so easy if you have hundreds of folders.
Steve Meyer, System Analyst and Developer (Author) Commented:
Yes indeed, one must be careful not to accidentally delete the wrong files. The solution here is to archive your deletions to another drive or memory stick.
Owen Rubin, Consultant, Commented:
Or just do a full backup before deleting anything.
Steve Meyer, System Analyst and Developer (Author) Commented:
I have been evaluating TreeSize. It looks like this program does everything but shine your shoes. However, I have not been able to configure it to safely search based on a master list or folder of files; that's not to say it can't do this. In Duplicate File Detective (DFD), this is accomplished by prioritizing and locking folder trees that are to be searched but left untouched (no files will be deleted in those folders). These are what I call my master set of files. In this way, based on the locked folders, only duplicate files in unlocked folders will be purged. This took no time to configure and run in DFD.

Now that I have learned something, I need to correct my previous explanation. Disregard that confusing post. Let's try this: to avoid permanently deleting cloud files that might not be on my computer, I downloaded all of my cloud files to my computer, then, using DFD, selected and locked the folders containing what I call my master files. DFD then found the duplicates in the downloaded cloud folder and archived them, leaving me to decide what to do with the remaining (orphaned) cloud files. This was easily configured in DFD.

I am pretty sure TreeSize can do this, but there are so many options and settings that I haven't yet been able to reproduce the same task. However, because TreeSize has so many other useful file-management features (besides deduplication), I am going to give credit to the experts, even though I found my own solution. Thanks, folks.

Also, I was lucky when I searched for a coupon and found DFD for $34 on BitsDuJour (discounts good for a day). Both TreeSize and DFD are normally around $50, except TreeSize does have a $25 version that doesn't include the full feature set.