
PowerShell: Faster comparison of 2 large strings/files?

I have a working script, but it is way too slow.
The script compares a large text file of SMTP addresses against addresses in Active Directory.

I have done it two ways:
1st method: query AD directly for each email address.
2nd method: run one AD query, put all of the returned ProxyAddresses into a string, then compare each address from the file against that string.
Below is the 2nd method, which seems to be about the same speed as querying AD directly for each check.

I bet you PowerShell gurus know how to speed this up?


 
$Domain = "DC=YOUR,DC=local" # Root context of AD domain.
$SMTPDomain = "Sample.com"  # Looking for users@Sample.com

$SMTPList = get-content "C:\Script Files\TestAddresses.txt" 

$Filter = "(proxyaddresses=*smith@$SMTPDomain*)"     #### Limited to *smith...* for testing ###
$ProxyAddresses = get-qadobject -DontUseDefaultIncludedProperties -Includedproperties proxyaddresses -SearchRoot $domain -Sizelimit 0 -SearchScope Subtree -ldapFilter $Filter | Select proxyaddresses

ForEach ($email in $SMTPList) {
  	$Found = $null
	$Found = ($ProxyAddresses | Out-String | select-string -pattern $Email)
	
	If ($Found -eq $null)
	{
	 $ListNotFound=$ListNotfound+"`n"+$Email
	 Write-Host "Didn't Find: " $Email 
	} 
	Else 
	 { 
	 $ListFound=$ListFound+"`n"+$Email
	 Write-Host "Was Found: " $Email
	 }
}

$ListNotFound | out-file -filepath "C:\Temp\NotFound.txt"
$ListFound | out-file -filepath "C:\Temp\Found.txt"


The script below took 5 seconds to process about 350 addresses in a domain with around 70000 objects.

Replace the values in the Customise Vars section. I assume each line of your input text file contains the full email address you are looking for.

Your LDAP query was slowing things down a lot because it searched for *emailaddress@domain.com* (wildcards on both sides) instead of the exact value smtp:emailaddress@domain.com. An exact match is much cheaper for AD to evaluate than a substring match. Beyond that, I hope I have improved the script a bit as well.

#Customise Vars
$Domain = "LDAP://DC=Domain,DC=com"
$TestAddressesFile = "C:\Temp\TestAddresses.txt"
$FoundFile = "C:\Temp\Found.txt"
$NotFoundFile = "C:\Temp\NotFound.txt"

#Gets Results from AD Query for Users
Function ADSIQueryUsers
{
	param
	(
		$OUPath,
		$LDAPQuery
	)
	
	$SearchOU = [ADSI]$OUPath

	$Searcher = New-Object System.DirectoryServices.DirectorySearcher($SearchOU)
	$Searcher.Filter = $LDAPQuery

	[array]$SearchResults = $Searcher.FindAll()
	
	return $SearchResults
}

$FoundArray = @()
$NotFoundArray = @()

$TestAddresses = gc $TestAddressesFile

foreach ($Address in $TestAddresses)
{
	$LDAPQueryFilterUsers = "(&(objectClass=user)(ProxyAddresses=smtp:$Address))"
	
	# @() guarantees an array, so .Count is reliable even when nothing matches.
	$SearchResultsUsers = @(ADSIQueryUsers $Domain $LDAPQueryFilterUsers)
	
	if ($SearchResultsUsers.Count -eq 1)
	{
		Write-Host "User Found : $Address"
		$FoundArray += $Address
	}
	elseif ($SearchResultsUsers.Count -eq 0)
	{
		Write-Host "Didn't Find : $Address"
		$NotFoundArray += $Address
	}
	else
	{
		Write-Host "Duplicate? : $Address"
	}
}

$FoundArray | Out-File $FoundFile
$NotFoundArray | Out-File $NotFoundFile
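
The script above drops each address straight into the LDAP filter. If the input file might contain characters that are special in LDAP filters, such as *, ( or ), it may be worth escaping them first. A minimal sketch, assuming RFC 4515 style escaping; the function names ConvertTo-LdapFilterValue and New-ProxyAddressFilter are made up for this example and are not part of any module:

```powershell
# Hypothetical helper: escape characters that are special in an LDAP filter.
# The backslash is replaced first so the escapes added afterwards are not
# escaped a second time.
function ConvertTo-LdapFilterValue {
    param([string]$Value)

    return $Value.Replace('\', '\5c').
                  Replace('*', '\2a').
                  Replace('(', '\28').
                  Replace(')', '\29').
                  Replace([string][char]0, '\00')
}

# Hypothetical helper: build an exact-match filter for one SMTP address.
function New-ProxyAddressFilter {
    param([string]$Address)

    $safe = ConvertTo-LdapFilterValue $Address
    return "(proxyAddresses=smtp:$safe)"
}

# Example: an ordinary address passes through unchanged.
New-ProxyAddressFilter 'j.smith@sample.com'
```

The resulting string could be assigned to $LDAPQueryFilterUsers in place of the hand-built filter, with the (objectClass=user) clause added back if only users are wanted.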

Author

Commented:
Thanks. Works great.
After some testing, I guess your method is faster because of the ADSI methods.
For me, from a PC, it runs in about 2.5 minutes against 100,000+ objects, checking 5800+ email addresses from the text file.

I had to modify it a little to look for all mail-enabled objects, not just users, by using (objectClass=*). The speed did not really seem to change.

You also pointed out something else I was unaware of: that you can query proxyaddresses with =smtp:..., like (proxyaddresses=smtp:user@acme.com). I had always used wildcards, such as (proxyaddresses=*user@acme.com*). Although I knew this was a multi-valued field, I didn't realize a single value could be queried like that.

Excellent.

I would still be curious whether there is a way to do the search faster in memory, after importing all of the data from the text file and from AD.
The AD search seems to work pretty fast, but it just seems to me that a memory search SHOULD be faster.

Author

Commented:
Good stuff. Thanks!
Great solution for comparing a text file of values to values that are in Active Directory.
It has to retrieve the values from AD so I expect that is as quick as it will get more or less.

You can experiment with wrapping commands or sections of your script in Measure-Command to benchmark them, e.g.:

Measure-Command {
Your Script Line 1
Your Script Line 2
}

Maybe you can find the biggest bottleneck and look at reducing those sections.
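
As a rough illustration of timing two sections separately, here is a small sketch. The sample addresses and the variable names ($addresses, $concatTime, $arrayTime) are made up for the example; the timings will differ from machine to machine.

```powershell
# Made-up sample data standing in for the real address list.
$addresses = 1..2000 | ForEach-Object { "user$_@sample.com" }

# Section 1: build one big string by repeated concatenation.
$concatTime = Measure-Command {
    $joined = ''
    foreach ($a in $addresses) { $joined = $joined + "`n" + $a }
}

# Section 2: grow an array with += (each += copies the whole array).
$arrayTime = Measure-Command {
    $list = @()
    foreach ($a in $addresses) { $list += $a }
}

# Measure-Command returns a TimeSpan, so the sections can be compared directly.
"String concat : {0:N1} ms" -f $concatTime.TotalMilliseconds
"Array +=      : {0:N1} ms" -f $arrayTime.TotalMilliseconds
```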

I suspect that if you dumped all of your AD fields into memory it would actually perform slower. I find that if I get 60000 users' worth of data from AD and read the object count with $SearchResults.Count, it takes ages because it has to step through the whole list, and it uses a lot of system RAM as well. It takes more than a couple of minutes just to count the objects.

Comparing a list of 5800 objects against a list of 60000 objects would then mean scanning that list many, many times. Single lookups against AD, each of which completes quickly, seem logically quicker to me.
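
If the AD values did end up in memory anyway, the repeated scanning described above could in principle be avoided by putting them into a hash set, so each check is a single lookup instead of a pass over the list. This is only a sketch under that assumption: the sample values stand in for proxyAddresses already retrieved from AD, and it does not address the cost of retrieving them in the first place.

```powershell
# Made-up values standing in for proxyAddresses already pulled from AD.
$adAddresses = @(
    'smtp:jsmith@sample.com',
    'SMTP:Mary.Jones@sample.com',
    'x500:/o=Org/cn=jsmith'
)

# Case-insensitive set holding only the SMTP addresses, without the prefix.
$known = New-Object 'System.Collections.Generic.HashSet[string]' -ArgumentList ([System.StringComparer]::OrdinalIgnoreCase)
foreach ($value in $adAddresses) {
    # -like is case-insensitive, so both smtp: and SMTP: entries match.
    if ($value -like 'smtp:*') { [void]$known.Add($value.Substring(5)) }
}

# Made-up addresses standing in for the lines of the text file.
$toCheck = @('jsmith@sample.com', 'MARY.JONES@sample.com', 'nobody@sample.com')

$found = @()
$notFound = @()
foreach ($address in $toCheck) {
    # Contains() is a hash lookup, not a scan of every AD value.
    if ($known.Contains($address)) { $found += $address }
    else { $notFound += $address }
}

"Found     : $($found -join ', ')"
"Not found : $($notFound -join ', ')"
```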

The only other thing I can think of is to check that the proxyAddresses attribute is indexed in your AD schema. Search for how to index AD attributes; an index may speed up the LDAP queries.