Delete Duplicate Files Based on File Size with Windows Batch

I have a bunch of files in a directory (with subdirectories) whose names are identical except for the last digit. I would like to keep only the version with the largest file size. Some files have no duplicates, and those need to be kept as well.

The files look like this:

111~1.mp4    (1mb)
111~2.mp4    (5mb)
111~3.mp4    (2mb)

222~1.mp4    (3mb)

333~1.mp4    (2mb)
333~2.mp4    (4mb)

444~1.mp4    (1mb)
444~2.mp4    (5mb)
444~3.mp4    (3mb)
444~4.mp4    (7mb)
I would like to keep only the largest version of each:

111~2.mp4    (5mb)

222~1.mp4    (3mb)

333~2.mp4    (4mb)

444~4.mp4    (7mb)
Chris Lopez Asked:

oBdA Commented:
Don't send a batch to do a PowerShell's one-liner.
This is in test mode and will only show the files it would remove. Remove the -WhatIf at the end to run it for real.
gci . *.mp4 | group @{e={$_.Name.Split('~')[0]}} | % {$_.Group | Sort Length -Desc | Select -Skip 1} | del -WhatIf
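For readers less used to PowerShell aliases, the one-liner above can be spelled out step by step. This is only a sketch of the same logic, not a replacement for it; the function name Get-SmallerDuplicate is made up for illustration and does not come from the thread.

```powershell
# Hypothetical helper (name invented here): returns the files that the
# one-liner would delete, without deleting anything itself.
function Get-SmallerDuplicate {
    param([string]$Path = '.')

    # 1. Collect the .mp4 files in the folder (no subfolders) and group them
    #    by the part of the name before the first '~', e.g. "111".
    $groups = Get-ChildItem -Path $Path -Filter *.mp4 |
        Group-Object -Property { $_.Name.Split('~')[0] }

    # 2. In each group, sort largest first and skip that one file;
    #    whatever is left is a smaller duplicate.
    foreach ($group in $groups) {
        $group.Group | Sort-Object -Property Length -Descending | Select-Object -Skip 1
    }
}

# Preview only; remove -WhatIf to actually delete.
Get-SmallerDuplicate -Path . | Remove-Item -WhatIf
```

A group with a single file yields nothing after the skip, so files without duplicates are left alone.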

Chris Lopez (Author) Commented:
Hello, I ran the script and it removes every file in the folder except the single largest one. I want it to keep the largest file from each set of duplicates and delete only the smaller duplicates.
oBdA Commented:
I understand what it is you want to do, but if the script tries to delete every file in the folder, then your naming convention doesn't match what you posted here.
This is what I get with a group of test files matching your names and relative sizes:
PS D:\Temp> gci *.mp4 | select Name, Length | ft -au

Name      Length
----      ------
111~1.mp4    102
111~2.mp4    502
111~3.mp4    202
222~1.mp4    302
333~1.mp4    202
333~2.mp4    402
444~1.mp4    102
444~2.mp4    502
444~3.mp4    302
444~4.mp4    702


PS D:\Temp> gci . *.mp4 | group @{e={$_.Name.Split('~')[0]}} | % {$_.Group | Sort Length -Desc | Select -Skip 1} | del -WhatIf
What if: Performing the operation "Remove File" on target "D:\Temp\111~3.mp4".
What if: Performing the operation "Remove File" on target "D:\Temp\111~1.mp4".
What if: Performing the operation "Remove File" on target "D:\Temp\333~1.mp4".
What if: Performing the operation "Remove File" on target "D:\Temp\444~2.mp4".
What if: Performing the operation "Remove File" on target "D:\Temp\444~3.mp4".
What if: Performing the operation "Remove File" on target "D:\Temp\444~1.mp4".
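A set of dummy files like the ones listed above can be created with a few lines, which makes it possible to try the command without touching real videos. This is a minimal sketch: the scratch folder name dupe-test is an assumption, and the sizes are copied from the listing above.

```powershell
# Scratch folder under the temp directory (folder name is hypothetical).
$testDir = Join-Path ([System.IO.Path]::GetTempPath()) 'dupe-test'
New-Item -ItemType Directory -Path $testDir -Force | Out-Null

# File names and sizes in bytes, taken from the listing above.
$sizes = [ordered]@{
    '111~1.mp4' = 102; '111~2.mp4' = 502; '111~3.mp4' = 202
    '222~1.mp4' = 302
    '333~1.mp4' = 202; '333~2.mp4' = 402
    '444~1.mp4' = 102; '444~2.mp4' = 502; '444~3.mp4' = 302; '444~4.mp4' = 702
}

foreach ($entry in $sizes.GetEnumerator()) {
    # A zero-filled byte array of the wanted length is enough for a size test.
    $bytes = New-Object byte[] $entry.Value
    [System.IO.File]::WriteAllBytes((Join-Path $testDir $entry.Key), $bytes)
}

Get-ChildItem -Path $testDir -Filter *.mp4 | Select-Object Name, Length | Format-Table -AutoSize
```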

Try
gci . *.mp4 | group @{e={$_.BaseName -replace '~\d+\Z'}} | % {$_.Group | Sort Length -Desc | Select -Skip 1} | del -WhatIf

or provide a more detailed description of your naming convention.
Chris Lopez (Author) Commented:
Yeah, I'm sorry, the file names are:

newyork~1517057186~17921502268028203~1.mp4
newyork~1517057186~17921502268028203~2.mp4
newyork~1517057186~17921502268028203~3.mp4
oBdA Commented:
Then the updated script at https:#a42452498 should do the trick.
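The difference between the two grouping keys is easiest to see on one of the real file names. A small illustration, using the base name (without ".mp4") from the comment above; the variable names are only for this example:

```powershell
$baseName = 'newyork~1517057186~17921502268028203~2'

# First script: everything before the FIRST '~'.
# Every file gets the same key, so the whole folder becomes one group.
$bySplit = $baseName.Split('~')[0]
$bySplit    # newyork

# Updated script: strip only the trailing '~' plus digits.
# Files that differ only in the last number share a key.
$byRegex = $baseName -replace '~\d+\Z'
$byRegex    # newyork~1517057186~17921502268028203
```

With the first key, all files in the folder fall into a single "newyork" group, which is why only the one largest file survived.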
Chris Lopez (Author) Commented:
Thanks, it did work. How can I run this on a directory and have it apply to all subdirectories? I tried going up a level in PowerShell, but that didn't work.
oBdA Commented:
Question beforehand to make it more fail-safe: is the "index" at the end always a single digit?
Chris Lopez (Author) Commented:
Yes, it will always be a single-digit number at the end.
oBdA Commented:
This will now process subfolders, and make sure to only handle files that end with a ~ and a single digit; the '.' in the gci command can be replaced with an absolute path as well.
gci C:\Temp *.mp4 -Recurse | ? {$_.BaseName -match '~\d\z'} | group @{e={$_.BaseName -replace '~\d\Z'}} | % {$_.Group | Sort Length -Desc | Select -Skip 1} | del -WhatIf
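One thing to be aware of with -Recurse: the command above groups on the base name only, so two files with the same name prefix in different subfolders end up in the same group. If each subfolder should be handled on its own, the folder path can be made part of the grouping key. The following is a sketch under that assumption, not part of the accepted answer; the function name Get-SmallerDuplicateRecursive is invented for illustration.

```powershell
# Hypothetical helper (name invented here): like the command above, but
# duplicates are only compared within the same folder.
function Get-SmallerDuplicateRecursive {
    param([string]$Path = '.')

    Get-ChildItem -Path $Path -Filter *.mp4 -Recurse |
        Where-Object { $_.BaseName -match '~\d\z' } |
        Group-Object -Property { $_.DirectoryName + '|' + ($_.BaseName -replace '~\d\z') } |
        ForEach-Object { $_.Group | Sort-Object -Property Length -Descending | Select-Object -Skip 1 }
}

# Preview only; remove -WhatIf to actually delete.
Get-SmallerDuplicateRecursive -Path . | Remove-Item -WhatIf
```

The '|' separator is used because it cannot appear in a Windows file or folder name, so keys from different folders cannot collide.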

Chris Lopez (Author) Commented:
Thanks, worked perfectly.
oBdA Commented:
Question answered.