sirbounty

Powershell regex followup

An expert previously helped me devise this function, which extracts certain key/value pairs from an email body that is passed to it.
I ran into a snag today and would like to sort it out.
The ID field should be presented as
ID:BD123456 (where 1-6 are alphanumeric characters).
However, some users add an unnecessary suffix (BD123456-7), which is irrelevant and can safely be ignored.
I believe the function code below already handles that case.
Today's issue arose when an errant trailing colon was placed at the end of this string (ID:BD12346:).
Can this be redesigned to simply locate the ID field, take BD plus the 6 characters that follow it, and ignore anything afterwards?
This is how that field is currently defined:
'ID' =			"$($Prefix)\S{6}(-.)?"


And here's the function code:
           
 $TS = (Get-Culture).NumberFormat.NumberGroupSeparator
	    $DS = (Get-Culture).NumberFormat.NumberDecimalSeparator
	    $reTS = [regex]::Escape($TS)
	    $reDS = [regex]::Escape($DS)
	    $AcceptedSizeUnits = 'kb', 'mb'
	    $DefaultSizeUnit = 'kb'
	    $AcceptedTrue = 'y', 't', '1', 'yes', 'true'
	    $AcceptedFalse = 'n', 'f', '0', 'no', 'false'
	    ## Define all accepted fields here.
	    ## Format: "<Field TYPE>" = "Field Name 1[|Alias 1[|Alias 2[...]]]"[, "Field Name 2[|Alias 1[|Alias 2[...]]]"]
	    $AcceptedFields = New-Object -TypeName PSObject -Property @{
		    'Bool' =		'Hide|Hidden|Members Hidden|Hide Members'
		    'ByteSize' =	'Size|MaxSize|Max Size'
		    'Date' =		'EndDate|End Date|Expiration|Expiration Date|DeletionDate|Deletion Date'
		    'EmpID' =		'Owner|PrimaryOwner|Primary Owner',
						    'Backup|BackupOwner|Backup Owner|SecondaryOwner|Secondary Owner'
		    'ID' =			'ID'
		    'String' =		'Display|DisplayName|Display Name'
	    }
	    ## Define the REs to identify the field TYPEs defined above here
	    $reAcceptedFields = New-Object -TypeName PSObject -Property @{
		    'Bool' =		"($(($AcceptedTrue + $AcceptedFalse) -join '|'))"
		    'ByteSize' =	"(\d+|\d{1,3}($($reTS)\d{3})*)($($reDS)\d+)?(\s*($($AcceptedSizeUnits -join '|')))?"
		    'Date' =		'\d{1,2}/\d{1,2}/(\d{2}|\d{4})'
		    'EmpID' =		'\S+'
		    'ID' =			"$($Prefix)\S{6}(-.)?"
		    'String' =		'.*?'
	    }
	    $AcceptedFieldsList = $AcceptedFields | 
            Get-Member -MemberType NoteProperty | Where-Object {$PSItem.Name} | 
            ForEach-Object {$AcceptedFields.($PSItem.Name)} | ForEach-Object {$PSItem.Split('|')[0]}
	    $Result = "" | Select-Object -Property ($AcceptedFieldsList + "Errors")
	    $Result.Errors = @()
	    $reGetBool =		"(?:\A|\s*)(?<Field>$($AcceptedFields.Bool -join '|'))\s*:\s*(?<Data>$($reAcceptedFields.Bool))(?:\s|\Z)"
	    $reGetByteSize =	"(?:\A|\s*)(?<Field>$($AcceptedFields.ByteSize -join '|'))\s*:\s*(?<Data>$($reAcceptedFields.ByteSize))(?:\s|\Z)"
	    $reGetDate =		"(?:\A|\s*)(?<Field>$($AcceptedFields.Date -join '|'))\s*:\s*(?<Data>$($reAcceptedFields.Date))(?:\s|\Z)"
	    $reGetEmpID =		"(?:\A|\s*)(?<Field>$($AcceptedFields.EmpID -join '|'))\s*:\s*(?<Data>$($reAcceptedFields.EmpID))(?:\s|\Z)"
	    $reGetID =			"(?:\A|\s*)(?<Field>$($AcceptedFields.ID -join '|'))\s*:\s*(?<Data>$($reAcceptedFields.ID))(?:\s|\Z)"
	    $reGetString =		"(?:\A|\s*)(?<Field>$($AcceptedFields.String -join '|'))\s*:\s*(?<Data>$($reAcceptedFields.String))(?:\s*\Z)"
	    $reCatchErrors =	"(?:\A\s*)(?<Field>\S+?)\s*:\s*(?<Data>.*?)(?:\s*\Z)"
	    $LineNumber = 0
        if ($emailbody.Contains('From:')) {
            $emailbody = $emailbody.Substring(0,$emailbody.IndexOf('From:'))}
	    Switch -regex ($emailBody.Replace("`r`n", "`r").Split("`r")) {
		    ".*" { $LineNumber += 1 }
		    $reGetBool {
			    $Field = ($AcceptedFields.Bool | Where-Object {$PSItem.Split('|') -contains $Matches['Field']}).Split('|')[0]
			    $RawData = $Matches['Data']
			    "[$($LineNumber)] Found field '$($Matches['Field'])'$(If ($Matches['Field'] -ne $Field) {" --> Alias for '$($Field)'"}): '$($RawData)'" | Write-Verbose
			    $Result.$Field = $AcceptedTrue -contains $RawData ; Continue
		    }
		    $reGetByteSize {
			    $Field = ($AcceptedFields.ByteSize | Where-Object {$PSItem.Split('|') -contains $Matches['Field']}).Split('|')[0]
			    $RawData = $Matches['Data']
			    "[$($LineNumber)] Found field '$($Matches['Field'])'$(If ($Matches['Field'] -ne $Field) {" --> Alias for '$($Field)'"}): '$($RawData)'" | Write-Verbose
			    $Data = If ($RawData.ToLower().EndsWith('b')) {$RawData} Else {$RawData + $DefaultSizeUnit}
			    $Result.$Field = ([int64]1 * $Data.Replace($TS, '').Replace($DS, '.').Replace(' ', '')/1kb) ; Continue
		    }
            $reGetDate {
	            $Field = ($AcceptedFields.Date | Where-Object {$PSItem.Split('|') -contains $Matches['Field']}).Split('|')[0]
	            $RawData = $Matches['Data']
	            Try {
		            $Result.$Field = ([DateTime]$RawData).ToShortDateString()
		            "[$($LineNumber)] Found field '$($Matches['Field'])'$(If ($Matches['Field'] -ne $Field) {" --> Alias for '$($Field)'"}): '$($RawData)'" | Write-Verbose
		            Continue
	            } Catch { } # Just let it fall through to $reCatchErrors
            }
		    $reGetID {
			    $Field = ($AcceptedFields.ID | Where-Object {$PSItem.Split('|') -contains $Matches['Field']}).Split('|')[0]
			    $RawData = $Matches['Data']
			    "[$($LineNumber)] Found field '$($Matches['Field'])'$(If ($Matches['Field'] -ne $Field) {" --> Alias for '$($Field)'"}): '$($RawData)'" | Write-Verbose
			    $Result.$Field = $RawData.substring(0,8) ; Continue
		    }
		    $reGetEmpID {
			    $Field = ($AcceptedFields.EmpID | Where-Object {$PSItem.Split('|') -contains $Matches['Field']}).Split('|')[0]
			    $RawData = $Matches['Data']
			    "[$($LineNumber)] Found field '$($Matches['Field'])'$(If ($Matches['Field'] -ne $Field) {" --> Alias for '$($Field)'"}): '$($RawData)'" | Write-Verbose
			    $Result.$Field = $RawData ; Continue
		    }
		    $reGetString {
			    $Field = ($AcceptedFields.String | Where-Object {$PSItem.Split('|') -contains $Matches['Field']}).Split('|')[0]
			    $RawData = $Matches['Data']
			    "[$($LineNumber)] Found field '$($Matches['Field'])'$(If ($Matches['Field'] -ne $Field) {" --> Alias for '$($Field)'"}): '$($RawData)'" | Write-Verbose
			    $Result.$Field = $RawData.Replace(([char]8217), "'").replace(([char]8211),"-") ; Continue
		    }
		    $reCatchErrors {
			    $Field = $Matches['Field']
			    $RawData = $Matches['Data']
			    "[$($LineNumber)] Found potential field '$($Field)': '$($RawData)'" | Write-Warning
			    $Result.Errors += New-Object -TypeName PSObject -Property ([ordered]@{
				    'Line' = $LineNumber ; 'Field' = $Field ; 'Data' = $RawData})
		    }
		    Default { $LineNumber += 1 }
	    }
	    $Result


SubSun

Check this:
PS C:\> "ID:BD123456-7" -match "ID:(?<ID>\w{2}\d{6}).*"
True
PS C:\> $Matches.ID
BD123456
PS C:\> "ID:BD123456" -match "ID:(?<ID>\w{2}\d{6}).*"
True
PS C:\> $Matches.ID
BD123456


The regular expression should work. However, if you can provide a sample input and the complete script, I can check and tell you what is wrong in the script.
sirbounty

ASKER

The full code is quite complex and spread across multiple scripts.
This is the entire function though.
Are you suggesting that this line
'ID' =			"$($Prefix)\S{6}(-.)?"
be replaced with this one?
'ID' =			"$($Prefix)\w{2}\d{6}.*"

What is the difference between \S{6} and \w{2}?

\S{6} matches any non-whitespace character exactly 6 times.
Matches: 123456, %$#@#$, TESTST, etc.
\w{2}\d{6} matches any word character exactly 2 times, followed by a digit exactly 6 times.
Matches the format BD123456, AC121211, etc.

What is the value of $($Prefix)?

$Prefix = 'BD'
It should always be BD, but the 6 characters after it could be any alphanumeric...
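
Given that clarification (a fixed BD prefix followed by six alphanumerics), here is a minimal sketch comparing the current \S{6} class with an explicit alphanumeric class. The variable names $loose and $strict are only illustrative, and [A-Za-z0-9] is an assumption about what "alphanumeric" means here (\w would also accept an underscore).

```powershell
# Assumption: the prefix is always 'BD', as stated in the thread.
$Prefix = 'BD'

# \S{6} accepts any six non-whitespace characters, so punctuation such as ':' counts.
$loose  = "$($Prefix)\S{6}"
# [A-Za-z0-9]{6} accepts letters and digits only.
$strict = "$($Prefix)[A-Za-z0-9]{6}"

foreach ($sample in 'BD12346:', 'BD123456:', 'BD3a461f-R') {
    # $Matches[0] holds the text matched by the most recent successful -match.
    $looseHit  = if ($sample -match $loose)  { $Matches[0] } else { '<no match>' }
    $strictHit = if ($sample -match $strict) { $Matches[0] } else { '<no match>' }
    "{0,-12} loose: {1,-10} strict: {2}" -f $sample, $looseHit, $strictHit
}
```

With the loose class, a short ID followed by a colon (the first sample) is accepted with the colon counted as the sixth character. The strict class rejects it, which may or may not be what you want for malformed input.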
EXPERT CERTIFIED SOLUTION
Qlemo
This solution is only available to members.
If it should match any 6 alphanumeric characters that come after BD, then, as Qlemo said, removing (-.) from your expression should work. Are you matching the data directly against this pattern and extracting the match, or are you combining this pattern with other patterns to get the final result?
PS C:\> "BD@%@^@&" -match "BD\S{6}(-.)?"
True
PS C:\> $Matches

Name                           Value
----                           -----
0                              BD@%@^@&


PS C:\> "BD@%@^@&-7" -match "BD\S{6}(-.)?"
True
PS C:\> $Matches

Name                           Value
----                           -----
1                              -7
0                              BD@%@^@&-7


PS C:\> "BD@%@^@&" -match "(BD\S{6})(-.)?"
True
PS C:\> $Matches

Name                           Value
----                           -----
1                              BD@%@^@&
0                              BD@%@^@&


PS C:\> "BD@%@^@&-7" -match "BD\S{6}"
True
PS C:\> $Matches

Name                           Value
----                           -----
0                              BD@%@^@&
PS C:\>
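
On the question of combining patterns: in the posted function the ID pattern is embedded in $reGetID, which ends with (?:\s|\Z). The sketch below rebuilds that wrapper to show how it behaves with a trailing colon. New-IdLineRegex is a hypothetical helper written only for this illustration, and the "relaxed" variant is one possible adjustment, not necessarily the accepted solution.

```powershell
# Assumptions: $Prefix is 'BD'; the wrapper mirrors the $reGetID line from the function above.
$Prefix = 'BD'

function New-IdLineRegex {   # hypothetical helper, for illustration only
    param([string]$DataPattern, [string]$Tail)
    "(?:\A|\s*)(?<Field>ID)\s*:\s*(?<Data>$DataPattern)$Tail"
}

# Original: the data must be followed by whitespace or the end of the string.
$original = New-IdLineRegex -DataPattern "$($Prefix)\S{6}(-.)?" -Tail '(?:\s|\Z)'
# Sketch: capture only prefix + six alphanumerics, and only require that
# no seventh alphanumeric follows; anything else after the ID is ignored.
$relaxed  = New-IdLineRegex -DataPattern "$($Prefix)[A-Za-z0-9]{6}" -Tail '(?![A-Za-z0-9])'

foreach ($line in 'ID:BD123456', 'ID:BD123456-7', 'ID:BD123456:') {
    $o = if ($line -match $original) { $Matches['Data'] } else { '<no match>' }
    $r = if ($line -match $relaxed)  { $Matches['Data'] } else { '<no match>' }
    "{0,-14} original: {1,-11} relaxed: {2}" -f $line, $o, $r
}
```

In this sketch the original wrapper rejects the line with the trailing colon, because ':' is neither whitespace nor the end of the string, while the relaxed tail still yields the eight-character ID.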


I'll try this adjustment today.
I am merely trying to extract the BDxxxxxx reference out of the email body, so long as it follows, in some shape or form, an ID definition.

Could be,
ID:BD123456
ID: BD9d4c61
ID: BD3a461f-R
ID : BD33ab45
etc.
I ultimately want to retrieve only the BD and the 6 characters following it. The -R in the 3rd sample is irrelevant for my needs but is sometimes included, which I think is why the original expert included the (-.) group.
If your current code matches "ID:BD123456" but not "XY:BD123456", then the field name is being checked, and the less restrictive match should work fine. Otherwise, expect wrong matches.
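
To make the field-name check concrete, here is a small sketch that runs the sample lines from this thread plus the "XY:" counterexample. Get-BdId is a hypothetical helper and its pattern is an assumption about the intended rules (field name at the start of the line, optional spaces around the colon, prefix plus six letters or digits).

```powershell
# Hypothetical helper; the name Get-BdId and the pattern are illustrative assumptions.
function Get-BdId {
    param([string]$Line, [string]$Prefix = 'BD')
    # ^\s*ID\s*:\s*    the line must start with the ID field name (spaces around ':' allowed)
    # (?<Data>...)     the prefix plus exactly six letters or digits
    # (?![A-Za-z0-9])  a seventh alphanumeric would mean a longer, different token
    $pattern = "^\s*ID\s*:\s*(?<Data>$([regex]::Escape($Prefix))[A-Za-z0-9]{6})(?![A-Za-z0-9])"
    if ($Line -match $pattern) { $Matches['Data'] } else { $null }
}

$samples = 'ID:BD123456', 'ID: BD9d4c61', 'ID: BD3a461f-R', 'ID : BD33ab45', 'XY:BD123456'
foreach ($s in $samples) {
    $id = Get-BdId -Line $s
    "{0,-16} -> {1}" -f $s, $(if ($null -ne $id) { $id } else { '<no match>' })
}
```

Note that -match is case-insensitive by default, so "id:" and a lowercase "bd" prefix would also be accepted; use -cmatch if that matters.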
ASKER CERTIFIED SOLUTION
This solution is only available to members.
Hey guys - I'm so sorry, I had a family emergency, but I will test this out and get it closed tomorrow.  Again, my apologies.
Thanks guys - again my apologies for the delay.
No problem, Thanks for closing the question!