PowerShell Script to Compare Files

I have two directory structures: C:\Prototype\ and C:\Deployed\Branch\Customer\Code\
Under the Prototype directory are approximately 150 scripts.
Under the C:\Deployed\Branch\Customer\Code\ directory are over 4,000 scripts.

The original intent was that any time a script in the Prototype directory was deployed to a Customer, under one or more Branch[es], that script would be copied to the correct location in the Deployed directory path. However, we now want to change things around. To make the necessary changes, we need to identify every script under the Deployed path that is the same (line for line) as the script of the same name in the Prototype directory. I need the full path, including the filename, written out to a file.

So, suppose I have the file a1.sql in C:\Prototype\ and copies of a1.sql in both C:\Deployed\Branch_a\Customer_a\Code\ and C:\Deployed\Branch_b\Customer_a\Code\. If the code in C:\Deployed\Branch_a\Customer_a\Code\ is the same as the prototype but the code in C:\Deployed\Branch_b\Customer_a\Code\ is different, I want to write a record that contains "C:\Deployed\Branch_a\Customer_a\Code\a1.sql".

If this sounds confusing, what we want to be left with after cleanup is a structure under C:\Deployed\... that contains only code that has been modified from the original Prototype code. I've played around with this quite a bit and am having trouble with the process to iterate through all the files and also with the code for comparing two files.

The way I envision this is to get a collection of filenames in the Prototype directory, then iterate through them. For each file, recurse through the Deployed directory, and when I find a matching filename, compare the contents of the two files. If they are the same, create a record with the path and filename of the file under the Deployed directory. After I've recursed through all directories, do the same with the next file. I know it can, and likely will, take some time to execute, which is fine. We are not looking for lightning execution, just a way to cut down on the tedium of manually locating and comparing files.
Doug Bishop, Database Developer, asked:

Doug Bishop, Database Developer (Author), commented:
I believe the Compare-Object cmdlet is what I need to compare the contents:
Compare-Object (Get-Content $file1) (Get-Content $file2)
but I'm not 100% sure how to suppress its output and just get back true (equal) or false (not equal).
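
One way to get a plain true/false is to negate whatever Compare-Object returns, since it emits one object per difference and nothing at all when it finds none. The sketch below is only an illustration: the function name Test-NoDifference is hypothetical, and it uses in-memory arrays in place of Get-Content output.

```powershell
# Hypothetical helper (not from the thread): $true when Compare-Object reports no differences.
function Test-NoDifference {
    param([object[]] $Reference, [object[]] $Difference)

    # Compare-Object emits one record per difference and nothing when there are none,
    # so negating its (possibly empty) output gives a plain Boolean.
    -not (Compare-Object -ReferenceObject $Reference -DifferenceObject $Difference)
}

Test-NoDifference @('select 1', 'go') @('select 1', 'go')   # True
Test-NoDifference @('select 1', 'go') @('select 2', 'go')   # False
```

With files, the call would look like Test-NoDifference (Get-Content $file1) (Get-Content $file2). One caveat: Get-Content returns nothing for an empty file, and Compare-Object rejects a null input, so empty files would need separate handling.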
Doug Bishop, Database Developer (Author), commented:
PowerShell can be sooo easy (and fun) if you just take the time to work it out. It seems too logical at times. I was rushed when I posted this, and my initial attempts were made with way too little caffeine in my body :-)

I need to work on it a bit more, but this seems to work:
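# Full paths of the prototype scripts.
$PrototypeFileNames = Get-ChildItem -Path C:\Code\Prototype\*.sql -File | % { $_.FullName }
# Full paths of every other .sql file under C:\Code (the Prototype folder itself is excluded).
$CustomerFileNames = Get-ChildItem -Path C:\Code -File -Recurse -filter *.sql | ?{ $_.fullname -notmatch "\\Code\\Prototype\\?" } | % { $_.FullName }

foreach ($PrototypeFilename in $PrototypeFileNames)
{
	# Read each prototype file once, then reuse its lines for every candidate.
	$PrototypeFileContents = Get-Content $PrototypeFilename
	foreach ($CustomerFilename in $CustomerFileNames)
	{
		# Only compare files that share the prototype's file name.
		if ( $(Split-Path $CustomerFilename -Leaf) -eq $(Split-Path $PrototypeFilename -Leaf) )
		{
			# Compare-Object returns nothing when it finds no differences.
			if ((Compare-Object $PrototypeFileContents $(Get-Content $CustomerFilename)).Count -eq 0)
			{
				Write-Output "File matches Template: $CustomerFilename"
			}
		}
	}
}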

footech commented:
One problem with the above.
A file containing
line 1
line 2

will be detected the same as a file containing
line 2
line 1

A quick test indicates that setting the -SyncWindow parameter of Compare-Object to 0 should make the comparison accurate. I saw your question earlier but didn't have time to post. My initial thought was to compute a hash for each file in C:\Prototype (storing the results), and then compare that against the hash of any file with a matching name.
A start would look something like:
$PrototypeFiles = Get-ChildItem -Path C:\Code\Prototype\*.sql -File | Select FullName,@{n="Hash";e={Get-FileHash $_.FullName -Algorithm sha1| Select -expandProperty Hash}}


This would require PS 4.0+.
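
The hash idea could be carried through to the full task roughly as follows. This is a sketch under stated assumptions, not a tested solution for the real tree: the function name Get-UnmodifiedCopy and its parameters are hypothetical, SHA256 is used in place of sha1, and a hash compares raw bytes, so files that differ only in line endings or encoding would count as different.

```powershell
# Hypothetical helper (not from the thread): lists deployed files whose bytes
# match the prototype file of the same name.
function Get-UnmodifiedCopy {
    param(
        [Parameter(Mandatory)] [string] $PrototypePath,
        [Parameter(Mandatory)] [string] $DeployedPath,
        [string] $Filter = '*.sql'
    )

    # Hash each prototype file once, keyed by file name.
    $prototypeHashes = @{}
    Get-ChildItem -LiteralPath $PrototypePath -Filter $Filter -File | ForEach-Object {
        $prototypeHashes[$_.Name] = (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA256).Hash
    }

    # Walk the deployed tree once; emit full paths whose name and hash both match.
    Get-ChildItem -LiteralPath $DeployedPath -Filter $Filter -File -Recurse |
        Where-Object {
            $prototypeHashes.ContainsKey($_.Name) -and
            (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA256).Hash -eq $prototypeHashes[$_.Name]
        } |
        ForEach-Object { $_.FullName }
}

# Example call (the output file name is an assumption):
# Get-UnmodifiedCopy -PrototypePath C:\Prototype -DeployedPath C:\Deployed | Set-Content C:\Temp\Unmodified.txt
```

Because each prototype is hashed once and the deployed tree is walked once, this avoids re-reading every prototype file for every candidate.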

Doug Bishop, Database Developer (Author), commented:
You are joking about line 1/line 2, right? That is like saying 42=24 :-)
I'm going to have to confirm that one myself. Are you saying that if file 1 contained 1, 2, 3, with each number on its own line, it would match a file 2 that contained 2, 1, 3?
footech commented:
Without the -SyncWindow parameter set right, yes.
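
A small check with in-memory arrays shows the effect being discussed. It is only a sketch; the variable names are made up for illustration.

```powershell
$first  = 'line 1', 'line 2'
$second = 'line 2', 'line 1'

# Default: the same lines in a different order produce no difference records.
$defaultMatch = -not (Compare-Object $first $second)

# -SyncWindow 0 compares position by position, so the swapped lines are reported.
$strictMatch = -not (Compare-Object $first $second -SyncWindow 0)

"Default:       $defaultMatch"   # True
"-SyncWindow 0: $strictMatch"    # False
```

Compare-Object is also case-insensitive unless -CaseSensitive is added, which may matter if "the same" is meant to include letter case.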
Doug Bishop, Database Developer (Author), commented:
footech: I was not aware of the -SyncWindow parameter. I did some reading on it, and I guess it makes sense (to some degree) and doesn't (to another degree). My view (and there are those who have argued the other side of the coin) is that if Compare-Object is comparing objects, 1-2-3 should not match 3-2-1 by default (that is, it should behave like -SyncWindow 0) unless you explicitly set a value for -SyncWindow. But then, from some of the examples I've seen of its use (e.g. comparing XML, AD, etc.), the default makes better sense.

Thanks for the heads up. I appreciate it. I've actually made some minor changes (e.g. removing all blank lines, logging results) and am considering adding -SyncWindow 0, although the differences I am looking for would be actual code differences, and not just lines moved around. 1-2-3 DOES NOT equal 1-2-3-4 or 1-2-4 in any language. I am not looking for the actual differences, only whether or not the files are the same.