Compare many files by SHA1 hash and delete the duplicates.

Dear expert oBdA

The code below is just an idea. Say there are 200 files under c:\files\: I want to compute the SHA1 hash of each one, compare them all, and if any files have the same hash, delete one of them.

Possible?

Get-ChildItem "C:\files\" -Filter *.* | 
Foreach-Object | New-Object -TypeName System.Security.Cryptography.SHA1CryptoServiceProvider{
$sha1 = New-Object -TypeName System.Security.Cryptography.SHA1CryptoServiceProvider 
$hash1 = [System.BitConverter]::ToString($sha1.ComputeHash([System.IO.File]::ReadAllBytes($_)))
$hash2 = [System.BitConverter]::ToString($sha1.ComputeHash([System.IO.File]::ReadAllBytes($_)))
}
If ($hash1 -eq $hash2) {
	Remove-Item -Path $file  # This wouldn't work well, any better idea?
	$filecheck = 'SAME'
} Else {
	$filecheck = 'DIFFERENT'
}
return $filecheck 

WeTi asked:

Britt Thompson, Sr. Systems Engineer, commented:
You'll want to loop through all the files and store each hash in an array as you go. If the array already contains the hash, store it in a duplicates array instead, which you can then use to find and delete the duplicates.

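$Hashes = @()
$Duplicates = @()
# Create the hasher once instead of once per file
$sha1 = New-Object -TypeName System.Security.Cryptography.SHA1CryptoServiceProvider
Get-ChildItem "C:\files\" -Filter *.* -File |
ForEach-Object {
# Use FullName; a bare $_ may resolve to just the file name
$hash = [System.BitConverter]::ToString($sha1.ComputeHash([System.IO.File]::ReadAllBytes($_.FullName)))
# First occurrence goes into $Hashes; any repeat is recorded as a duplicate
if($Hashes -notcontains $hash){ $Hashes += $hash } else { $Duplicates += $hash }
}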
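
The loop above records duplicate hashes, but not which files they belong to. A minimal sketch of the same idea that keeps the file paths, assuming PowerShell 4.0 or later for Get-FileHash; the function name Get-DuplicateFile and the throwaway demo folder are made up for illustration:

```powershell
# Hypothetical helper: emits the paths of files whose SHA1 hash was already seen.
function Get-DuplicateFile {
    param([Parameter(Mandatory)][string]$Path)

    $seen = @{}   # hash string -> first file found with that hash
    Get-ChildItem -LiteralPath $Path -File | Sort-Object FullName | ForEach-Object {
        $hash = (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA1).Hash
        if ($seen.ContainsKey($hash)) {
            $_.FullName                  # repeat of a known hash: report this file
        } else {
            $seen[$hash] = $_.FullName   # first file with this hash is kept
        }
    }
}

# Demo in a throwaway folder (assumption: any empty scratch directory will do)
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $demo | Out-Null
Set-Content -Path (Join-Path $demo 'a.txt') -Value 'same'
Set-Content -Path (Join-Path $demo 'b.txt') -Value 'same'
Set-Content -Path (Join-Path $demo 'c.txt') -Value 'different'

$duplicates = @(Get-DuplicateFile -Path $demo)

# Preview only; drop -WhatIf to actually delete the duplicates
$duplicates | ForEach-Object { Remove-Item -LiteralPath $_ -WhatIf }

# Clean up the demo folder
Remove-Item -LiteralPath $demo -Recurse -Force
```

A hashtable lookup also avoids rescanning a growing array with -notcontains for every file.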
rastoi, Windows DTS expert, commented:
very similar to this question.
oBdA commented:
Save as Find-DuplicateFile.ps1 or Whatever.ps1.
It has the usual cmdlet support for -Verbose, -WhatIf and -Confirm, so you can test what it would do.
To avoid unnecessary processing time, it'll only create file hashes (which are somewhat costly) for files with the same size.
To see what it would do, without actually deleting anything, run it the first time as .\Find-DuplicateFile.ps1 -Path C:\files -Verbose -WhatIf
To be able to confirm every deletion, drop the -WhatIf.
If you're sure it works correctly for the path given, and you don't want to confirm at all, add -Confirm:$false
[CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='High')]
Param(
	[string]$Path,
	[string]$Filter,
	[switch]$Recurse
)
$sha1 = New-Object -TypeName System.Security.Cryptography.SHA1CryptoServiceProvider
$splat = @{}
'Path', 'Filter', 'Recurse' | ForEach-Object {$splat[$_] = $PSBoundParameters[$_]}
Get-ChildItem @splat -File |
	Group-Object -Property Length |
	Where-Object {$_.Count -gt 1} |
	ForEach-Object {
		$_.Group |
			Select-Object -Property FullName, @{n='Hash'; e={[System.BitConverter]::ToString($sha1.ComputeHash([System.IO.File]::ReadAllBytes($_.FullName)))}} |
			Group-Object -Property Hash |
			Where-Object {$_.Count -gt 1} | ForEach-Object {
				Write-Verbose "===== Duplicates with hash '$($_.Name)' ====="
				$_.Group | Select-Object -ExpandProperty FullName | Write-Verbose
				$keepFile = $_.Group[0].FullName
				$_.Group | Select-Object -Skip 1 -ExpandProperty FullName | ForEach-Object {
					If ($PSCmdlet.ShouldProcess($_, "Delete duplicate of '$($keepFile)'")) {
						Remove-Item -Path $_ -WhatIf:$false -Confirm:$false
					}
				}
			}
	}
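
A side note on the hashing step: ReadAllBytes loads each file completely into memory before hashing. On PowerShell 4.0 or later, Get-FileHash could be swapped in; as far as I know it hashes from a file stream, which should be gentler on large files. This is only a sketch using a temporary file, showing that both approaches yield the same SHA1 value once the dashes from BitConverter are removed:

```powershell
# Temporary sample file (assumption: any readable file works the same way)
$file = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString() + '.txt')
Set-Content -Path $file -Value 'hello'

# Approach used in the script: read all bytes, then hash them
$sha1 = [System.Security.Cryptography.SHA1]::Create()
$viaBytes = [System.BitConverter]::ToString($sha1.ComputeHash([System.IO.File]::ReadAllBytes($file)))

# Alternative: let Get-FileHash read and hash the file
$viaCmdlet = (Get-FileHash -LiteralPath $file -Algorithm SHA1).Hash

# BitConverter separates bytes with dashes; Get-FileHash does not
$sameHash = ($viaBytes -replace '-', '') -eq $viaCmdlet

Remove-Item -LiteralPath $file
```

In the script, that would mean replacing the expression inside the calculated 'Hash' property; the grouping logic stays the same.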


WeTi (Author) commented:
oBdA's solution works well, but I would like to make it run silently, without a confirmation prompt.
oBdA commented:
As I said: just add -Confirm:$false to the command line.
Or in the first script line, replace "ConfirmImpact='High'" with "ConfirmImpact='Low'", but that's absolutely against all "best practice" guidelines. Removing multiple items usually has a heavy impact, so doing so should have to be confirmed explicitly.
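
To see how these switches interact with ShouldProcess without touching any files, here is a small sketch. The function Invoke-RiskyThing is a made-up stand-in that only returns a string:

```powershell
# Hypothetical stand-in for a destructive command; it deletes nothing.
function Invoke-RiskyThing {
    [CmdletBinding(SupportsShouldProcess = $true, ConfirmImpact = 'High')]
    param([string]$Target)

    # ShouldProcess returns $false under -WhatIf, and prompts when the
    # ConfirmImpact is at or above $ConfirmPreference (default: High)
    if ($PSCmdlet.ShouldProcess($Target, 'Pretend to delete')) {
        "processed $Target"
    }
}

# -WhatIf: prints a "What if" message and the guarded block is skipped
$whatIfResult = @(Invoke-RiskyThing -Target 'x' -WhatIf)

# -Confirm:$false: no prompt, the guarded block runs
$noPrompt = @(Invoke-RiskyThing -Target 'x' -Confirm:$false)

# Calling it with neither switch would stop and ask for confirmation:
# Invoke-RiskyThing -Target 'x'
```

With ConfirmImpact set to 'Low' or 'Medium', the last call would run without a prompt under the default $ConfirmPreference, which is why lowering it silences the script but is not recommended for deletions.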
WeTi (Author) commented:
oBdA is right, and his script works for me. Thanks also to Britt and rastoi for the assist.
WeTi (Author) commented:
Now I don't want to use a Param() block; I would like static values for $Path, $Filter and $Recurse. How do I do that? Something like $Path = 'c:\temp\' $splat = @{$Path}?
oBdA commented:
Then integrate it as a function at the beginning of your script, and call it when you need it.
Function Remove-DuplicateFile {
[CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='High')]
Param(
	[string]$Path,
	[string]$Filter,
	[switch]$Recurse
)
	$sha1 = New-Object -TypeName System.Security.Cryptography.SHA1CryptoServiceProvider
	$splat = @{}
	'Path', 'Filter', 'Recurse' | ForEach-Object {$splat[$_] = $PSBoundParameters[$_]}
	Get-ChildItem @splat -File |
		Group-Object -Property Length |
		Where-Object {$_.Count -gt 1} |
		ForEach-Object {
			$_.Group |
				Select-Object -Property FullName, @{n='Hash'; e={[System.BitConverter]::ToString($sha1.ComputeHash([System.IO.File]::ReadAllBytes($_.FullName)))}} |
				Group-Object -Property Hash |
				Where-Object {$_.Count -gt 1} | ForEach-Object {
					Write-Verbose "===== Duplicates with hash '$($_.Name)' ====="
					$_.Group | Select-Object -ExpandProperty FullName | Write-Verbose
					$keepFile = $_.Group[0].FullName
					$_.Group | Select-Object -Skip 1 -ExpandProperty FullName | ForEach-Object {
						If ($PSCmdlet.ShouldProcess($_, "Delete duplicate of '$($keepFile)'")) {
							Remove-Item -Path $_ -WhatIf:$false -Confirm:$false
						}
					}
				}
		}
}

# ... other script ...
Remove-DuplicateFile -Path 'C:\Temp' -Filter *.txt -Recurse -Confirm:$false
# ... other script ...
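
On the splatting part of the question: @{$Path} is not valid hashtable syntax, because each entry needs a key that matches a parameter name. A minimal sketch with fixed values, demonstrated against Get-ChildItem in a throwaway folder (the folder and file names are made up):

```powershell
# Throwaway folder with a subfolder (assumption: stands in for C:\Temp)
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$sub  = Join-Path $demo 'sub'
New-Item -ItemType Directory -Path $sub -Force | Out-Null
Set-Content -Path (Join-Path $demo 'a.txt') -Value 'one'
Set-Content -Path (Join-Path $demo 'b.log') -Value 'two'
Set-Content -Path (Join-Path $sub  'c.txt') -Value 'three'

# Static values: keys are parameter names, a switch takes $true or $false
$splat = @{
    Path    = $demo
    Filter  = '*.txt'
    Recurse = $true
}

# @splat expands to -Path ... -Filter ... -Recurse
$found = @(Get-ChildItem @splat -File)

Remove-Item -LiteralPath $demo -Recurse -Force
```

The same hashtable could be passed to the function as Remove-DuplicateFile @splat -Confirm:$false.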

Chris Lopez commented:
@oBdA

First off, thanks; your help here has helped me as well. I just have another question: would it be possible to only check files against other files in the same folder? I don't need to check all files in all the directories against each other, just the files inside the same directory. For instance, files in folder 1 only need to be checked against other files in folder 1, files in folder 2 only against other files in folder 2, and so on.
oBdA commented:
Chris,
just don't use the -Recurse switch, and it will stay inside the specified directory.
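
If the goal is to cover a whole tree in one run while still comparing files only within their own folder, one possible variation is to group by directory before grouping by hash. This is a sketch, not part of the script above; it assumes PowerShell 4.0 or later for Get-FileHash, and the demo folders are made up:

```powershell
# Throwaway tree: 'one' holds two identical files, 'two' holds a copy of the same content
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$one  = Join-Path $root 'one'
$two  = Join-Path $root 'two'
New-Item -ItemType Directory -Path $one, $two -Force | Out-Null
Set-Content -Path (Join-Path $one 'a.txt') -Value 'same'
Set-Content -Path (Join-Path $one 'b.txt') -Value 'same'
Set-Content -Path (Join-Path $two 'c.txt') -Value 'same'

$duplicates = @(
    Get-ChildItem -LiteralPath $root -File -Recurse |
        Group-Object -Property DirectoryName |      # one group per folder
        ForEach-Object {
            $_.Group |
                Group-Object -Property { (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA1).Hash } |
                Where-Object { $_.Count -gt 1 } |   # same hash within the same folder
                ForEach-Object {
                    # Keep the first file by name, report the rest
                    $_.Group | Sort-Object Name | Select-Object -Skip 1 -ExpandProperty FullName
                }
        }
)

Remove-Item -LiteralPath $root -Recurse -Force
```

Here only b.txt in folder 'one' is reported; c.txt has the same content but lives in a different folder, so it is left alone.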
Powershell


Question has a verified solution.
