Need Script or Macro to Identify and Move Duplicate Files (Based on Size and Date)

I am hosting my domain on a shared Hostgator Linux Server.
I have about 10,000 emails of which 20 percent are duplicates.

I have created a cPanel backup,
downloaded the domain.tar.gz file, and
unzipped it to my Windows 10 file system.

The emails I need are in one folder.
Each email is in plain text format.

In Windows 10 File Explorer, when you sort the files by size, files of the same size usually have date stamps within a minute (or a few minutes) of each other.

When you open two such files and look at the headers, the Return-Path bounce numbers are identical, e.g., "bounce-mc.us5_12385835.1708185".

I am trying to find a script, macro, or automated solution that will help me find and move duplicate text files.

Perhaps a simple workaround would be to identify suspected duplicate files that reside in folder Z (using the above mentioned algorithm), then move one of them to folder A, the other to folder B. That alone might be sufficient. I can use Beyond Compare to confirm results.
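The size-and-date workaround described above could be sketched roughly as follows in Python, which runs on both Windows and Linux. This is only an illustration under assumptions: the function names, the 300-second window, and the use of the file modification time as the "date stamp" are choices made here, not part of the question, and it assumes extracting the backup preserved the original timestamps.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def find_suspected_pairs(folder, max_gap_seconds=300):
    """Return (first, second) pairs of files that have the same size and
    modification times no more than max_gap_seconds apart (assumed window)."""
    by_size = defaultdict(list)
    for p in Path(folder).iterdir():
        if p.is_file():
            st = p.stat()
            by_size[st.st_size].append((st.st_mtime, p))

    pairs = []
    for entries in by_size.values():
        # Order same-size files by time so close neighbours end up adjacent.
        entries.sort(key=lambda e: (e[0], e[1].name))
        i = 0
        while i + 1 < len(entries):
            (t1, p1), (t2, p2) = entries[i], entries[i + 1]
            if t2 - t1 <= max_gap_seconds:
                pairs.append((p1, p2))
                i += 2  # both files are used up by this pair
            else:
                i += 1
    return pairs


def move_pairs(pairs, folder_a, folder_b):
    """Move the first file of each pair to folder_a and the second to folder_b."""
    folder_a, folder_b = Path(folder_a), Path(folder_b)
    folder_a.mkdir(parents=True, exist_ok=True)
    folder_b.mkdir(parents=True, exist_ok=True)
    for first, second in pairs:
        shutil.move(str(first), str(folder_a / first.name))
        shutil.move(str(second), str(folder_b / second.name))
```

Calling find_suspected_pairs on folder Z and passing the result to move_pairs with folders A and B would leave the unpaired files in Z, so A and B can then be checked against each other in Beyond Compare. A group of three or more same-size files is only paired two at a time, so leftovers stay in Z for manual review.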

Adding a more sophisticated text compare on the headers might give full confirmation.
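One possible way to do that header comparison is to parse only the headers of each file and group files whose chosen header values match. The sketch below uses Python's standard email parser; the function names and the choice of Return-Path plus Message-ID as the comparison key are assumptions made for illustration, so adjust the header list to whatever proves reliable for this mailbox.

```python
from collections import defaultdict
from email.parser import BytesHeaderParser
from pathlib import Path


def header_key(path, names=("Return-Path", "Message-ID")):
    """Return the values of the chosen headers as a tuple.
    Header lookup is case-insensitive; the first occurrence is used."""
    with open(path, "rb") as fp:
        msg = BytesHeaderParser().parse(fp)
    return tuple(str(msg.get(name, "")).strip() for name in names)


def group_by_headers(folder, names=("Return-Path", "Message-ID")):
    """Return lists of files in folder that share the same header key."""
    groups = defaultdict(list)
    for p in sorted(Path(folder).iterdir()):
        if p.is_file():
            key = header_key(p, names)
            if any(key):  # ignore files where none of the headers were found
                groups[key].append(p)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Each returned list holds files that agree on every chosen header, which can serve as confirmation for pairs already suspected from size and date. Comparing headers rather than whole files tolerates differences in the body or in delivery headers such as Received.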

I prefer working on the Windows machine, but if you can only suggest an automated solution in Linux, I could build a Linux machine to accomplish the task.

In case it's relevant, the headers look like this:

Return-Path: <>
Received: from
	by with LMTP id gIXMHuz3yFt+ig8AnVq7BA
	for <>; Thu, 18 Oct 2018 16:15:24 -0500
Return-path: <>
Delivery-date: Thu, 18 Oct 2018 16:15:24 -0500
Received: from ([]:23373)
	by with esmtp (Exim 4.91)
	(envelope-from <>)
	id 1gDFdX-004GnL-KP
	for; Thu, 18 Oct 2018 16:15:24 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; s=k1;;
Received: from ( by id hp3rue2ddl4j for <>; Thu, 18 Oct 2018 21:15:04 +0000 (envelope-from <>)
Subject: =?utf-8?Q?The=20world=20wobbles=20and=20Baidu=20sits=20pretty?=
From: =?utf-8?Q?PingWest?= <>
Reply-To:  <>
To: <>
Date: Thu, 18 Oct 2018 21:15:04 +0000
Message-ID: <>
X-Mailer: MailChimp Mailer - **CID62efb20649e75d109c61**
X-Campaign: mailchimp87ff9eecfa738064ccd0c1c28.62efb20649
X-campaignid: mailchimp87ff9eecfa738064ccd0c1c28.62efb20649
X-Report-Abuse: Please report abuse for this campaign here: 
X-MC-User: 87ff9eecfa738064ccd0c1c28
Feedback-ID: 12385835:12385835.1708185:us5:mc
List-ID: 87ff9eecfa738064ccd0c1c28mc list <>
Precedence: bulk
X-Auto-Response-Suppress: OOF, AutoReply
X-Accounttype: pd
List-Unsubscribe-Post: List-Unsubscribe=One-Click
Content-Type: multipart/alternative; boundary="_----------=_MCPart_1118978084"
MIME-Version: 1.0
X-Spam-Status: No, score=-3.2
X-Spam-Score: -31
X-Spam-Bar: ---
X-Spam-Flag: NO


Jerry L, Operations Manager, asked:

Jose Gabriel Ortega Castro, EE Solution Guide/Topic Advisor and CEO, Faru Bonon IT, commented:
Hi Jerry, given the complexity of the question, I think this should be a paid project. However, here's a starting point with the logic:

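$path  = "path where the files are"
$dupes = "path where the duplicates should go"   # placeholder destination, adjust as needed

# Return the bounce ID from the first Return-Path header line,
# e.g. "12385835.1708185" from "bounce-mc.us5_12385835.1708185-..."
function Get-Line($file) {
    $line = Get-Content -LiteralPath $file.FullName |
        Where-Object { $_ -like "Return-Path:*_*" } |
        Select-Object -First 1
    if ($line) { return $line.Split('_')[1].Split('-')[0] }
}

$allfiles = @(Get-ChildItem -Path $path -File -Recurse)
$moved = @{}   # indexes of files that have already been moved

for ($i = 0; $i -lt $allfiles.Count; $i++) {
    if ($moved[$i]) { continue }
    $line1 = Get-Line $allfiles[$i]
    for ($j = $i + 1; $j -lt $allfiles.Count; $j++) {
        # skip files that were already moved or that differ in size
        if ($moved[$j] -or $allfiles[$i].Length -ne $allfiles[$j].Length) { continue }
        $line2 = Get-Line $allfiles[$j]
        # same size and same bounce ID: treat file j as the duplicate and move it
        if ($line1 -and $line1 -eq $line2) {
            Move-Item -LiteralPath $allfiles[$j].FullName -Destination $dupes
            $moved[$j] = $true
        }
    }
}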

Jerry L, Operations Manager (author), commented:
Thank you for looking into this for me.
I'm going to look for ready-built solutions, such as MapiLab Duplicate Remover.