How to find and eliminate duplicate files on your Windows systems.

Ed Covney
Retired USN in '88. Then IT & s/w dev. Fully retired in 2015. Now practicing math skills, long neglected, and learning VBA (to demo math).
Finding and deleting duplicate (picture) files can be a time-consuming task. My wife and I, our three kids and their families all share one dilemma: managing our pictures. Between desktops, laptops, phones, tablets, and cameras, we've accumulated and shared tens of thousands of pictures over the last decade or two. In an effort to begin managing them, finding and deleting duplicates seemed like a pretty good start. Easier said than done.

About 8 or 9 years ago, I wrote a program that enumerates all files of a given type (extension) on any drive (or in any folder). I later added a section that, once all the file names are "listed", computes an MD5 hash of each file. As each file is hashed, the program writes 4 pieces of data to a tab (or comma) delimited text file:
32-character hexadecimal hash,  file size,  file name,  full file name.
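
For readers who want to see the idea in code, here is a minimal Python sketch of the same enumerate-then-hash approach. It is not the DupeFF source (that program is written in Delphi); the function names `md5_of_file` and `write_hash_report` and the column order are assumptions made for illustration only.

```python
import csv
import hashlib
import os

def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hash of a file as 32 hexadecimal characters."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_hash_report(root, extension, report_path, delimiter="\t"):
    """Walk root (including sub-folders) and write one report row per matching file.

    Each row holds: hash, file size, file name, full file name.
    Returns the number of rows written.
    """
    rows = 0
    suffix = "." + extension.lower()
    with open(report_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter=delimiter)
        for folder, _subfolders, names in os.walk(root):
            for name in names:
                if not name.lower().endswith(suffix):
                    continue
                full = os.path.join(folder, name)
                writer.writerow([md5_of_file(full), os.path.getsize(full), name, full])
                rows += 1
    return rows
```

Calling `write_hash_report(r"C:\Pictures", "jpg", "report.txtx")` would, under these assumptions, produce a tab-delimited report for every .jpg file under that (hypothetical) folder.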

Video Steps

1. Select Drive

When you run DupeFF.exe, all available drive letters will be listed down the left side - choose one.

2. Select a Folder

You can search entire drives, or select a specific folder. Whatever you select will be searched entirely including all sub-folders.

3. Select a File Type

You can search for any file type. If you want to select ALL files (all types), click the last type listed, "*". To select any other type, enter the extension where the "*" is. To select any Excel spreadsheet type, enter "xl*" (it will find xls, xlsm, xlsb, etc.).
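
The wildcard behavior described above can be sketched in Python with the standard `fnmatch` module. This only illustrates how a pattern such as "xl*" selects extensions; the helper name `extension_matches` is hypothetical, and DupeFF's own matching rules may differ in detail.

```python
import os
from fnmatch import fnmatch

def extension_matches(file_name, pattern):
    """Return True if the file's extension matches a wildcard pattern such as 'xl*'."""
    # splitext returns ('budget', '.xlsm'); drop the leading dot before matching.
    extension = os.path.splitext(file_name)[1].lstrip(".")
    # Lower-case both sides so the match ignores case, as Windows file names do.
    return fnmatch(extension.lower(), pattern.lower())
```

With this sketch, `extension_matches("budget.xlsm", "xl*")` is true and `extension_matches("notes.txt", "xl*")` is false, while the pattern "*" accepts every file.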

4. Click the #4 Enumerate File List button

The program will list all the files it finds based on the criteria you provided in steps 1 through 3. By default, the final report text file will be tab delimited. If you prefer commas, click the "Commas" check box. Also by default, only the first 64K characters of each file are actually hashed. If you want the entire file hashed, check "Hash Full File Content" instead.
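
To make the trade-off between the two hashing options concrete, here is a small Python sketch. It assumes "64K" means 65,536 bytes; the name `md5_hash` and its `full_content` parameter are illustrative and are not taken from DupeFF.

```python
import hashlib

PREFIX_BYTES = 64 * 1024  # assumed meaning of "64K"

def md5_hash(path, full_content=False):
    """Hash only the first 64K of a file by default, or the whole file on request."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        if full_content:
            # Full mode: feed every chunk of the file to the hash.
            for chunk in iter(lambda: handle.read(PREFIX_BYTES), b""):
                digest.update(chunk)
        else:
            # Prefix mode: a single read of at most 64K.
            digest.update(handle.read(PREFIX_BYTES))
    return digest.hexdigest()
```

Prefix hashing is much faster on large files, but two files that share the same first 64K and differ later will get the same prefix hash. Comparing file sizes as well, and previewing before deleting, helps guard against that.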

5. Hash all enumerated files

Click the #5 "Hash Enumerated File List" button. When complete, it creates a "txtx" text file. Note the extra "x". You probably won't have the "txtx" extension associated with a program, and the file can be very large. If it's a small file, let Notepad open it; otherwise open it with WordPad or Word. You can also import it directly into Excel, but wherever you open it, the goal is to copy its full contents to the clipboard.

6. Open DupeFF.xlsm

Once it is open, review the information on the "Instructions" tab. By following the recommended steps, you'll be able to preview duplicates, and once you're convinced the duplicates are REALLY duplicates, you can easily delete even thousands of them in seconds.
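
The spreadsheet's internals are not described here, but the underlying idea of spotting duplicates in the report can be sketched in Python: group the rows by hash and file size, and keep only the groups with more than one file. The function name `find_duplicate_groups` and the four-column layout (hash, size, name, full name) are assumptions for illustration. The sketch only lists candidates and deletes nothing.

```python
import csv
from collections import defaultdict

def find_duplicate_groups(report_lines, delimiter="\t"):
    """Group report rows by (hash, size) and return the groups holding 2+ files."""
    groups = defaultdict(list)
    for row in csv.reader(report_lines, delimiter=delimiter):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        file_hash, size, _name, full_name = row[:4]
        groups[(file_hash, size)].append(full_name)
    # Only groups with more than one path are duplicate candidates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Passing an open report file (or any list of report lines) to `find_duplicate_groups` returns lists of full file names that share both hash and size, which can then be previewed before anything is removed.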

EE allows me to attach the spreadsheet "DupeFF.xlsm" that I use in the video, but not the program. For those who'd like a copy of the DupeFF.exe program or its Delphi XE2 source code, please contact me directly:
Author:Ed Covney
