D B
asked on
Powershell String Parsing with Regular Expression
I had this question after getting PowerShell Regular Expression answered.
This is the third spec change and I've been told the last (who hasn't heard that before?)
The file can have one of two layouts (note 'XYZ' and either 'TERM' or 'DENIED' are static values):
XYZ|ASSOC_ID|USER_ID|TERM
XYZ|ASSOC_ID|USER_ID|DENIE D|SERVER_CLIENT_ROLE
A sample file might be:
XYZ|0034192|ab123456|TERM
XYZ|09681|ab192837|TERM
XYZ|0192847|ab98765|DENIED |TESTSERVE R_GOOGLE_A NYTHING
XYZ|6273|ab11839|DENIED|PR ODSERVER_M ICROSOFT_W HATEVER
Each row will have at least four elements that are pipe delimited. Optionally, there can be an additional element, that is made up of server, client and role, which are each delimited by an underscore character.
The parsing needs two results: 1) an array that contains the third element (USER_ID) and SERVER and CLIENT from the fifth element if the fourth element has the value 'DENIED'. However, if the fourth element has the value 'TERM', using the format from the above referenced question, the output would be a single string, consisting of ONLY USER_ID values, be comma-delimited and each USER_ID enclosed in single quotes .
Given the example above, the array with 'DENIED' would contain:
USER_ID SERVER CLIENT
ab98765 TESTSERVER GOOGLE
ab11839 PRODSERVER MICROSOFT
And the string of values for rows with 'TERM' would contain
'ab123456','ab192837'
I realize this will probably take two passes (one to build the array, one to build the string) and that is fine (and probably simpler to follow and maintain for those of you who want to do the whole thing in one line of code :-) )
This is the third spec change and I've been told the last (who hasn't heard that before?)
The file can have one of two layouts (note 'XYZ' and either 'TERM' or 'DENIED' are static values):
XYZ|ASSOC_ID|USER_ID|TERM
XYZ|ASSOC_ID|USER_ID|DENIE
A sample file might be:
XYZ|0034192|ab123456|TERM
XYZ|09681|ab192837|TERM
XYZ|0192847|ab98765|DENIED
XYZ|6273|ab11839|DENIED|PR
Each row will have at least four elements that are pipe delimited. Optionally, there can be an additional element, that is made up of server, client and role, which are each delimited by an underscore character.
The parsing needs two results: 1) an array that contains the third element (USER_ID) and SERVER and CLIENT from the fifth element if the fourth element has the value 'DENIED'. However, if the fourth element has the value 'TERM', using the format from the above referenced question, the output would be a single string, consisting of ONLY USER_ID values, be comma-delimited and each USER_ID enclosed in single quotes .
Given the example above, the array with 'DENIED' would contain:
USER_ID SERVER CLIENT
ab98765 TESTSERVER GOOGLE
ab11839 PRODSERVER MICROSOFT
And the string of values for rows with 'TERM' would contain
'ab123456','ab192837'
I realize this will probably take two passes (one to build the array, one to build the string) and that is fine (and probably simpler to follow and maintain for those of you who want to do the whole thing in one line of code :-) )
Does your file have a header row? If so, what is it?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Note that aikimark's suggestion above assumes there is no header in the file, but it doesn't matter as long as the third column header is neither TERM nor DENIED.
However, "we batchelors" try not to use something like
However, "we batchelors" try not to use something like
$A=Import-Csv -delimiter "|" -path $TermPathAndFilename -header "F1","F2","USER_ID","TD","F5"
$A | %{
unless there is a need for having direct access to the content beyond for the pipe ;-). It does no harm to compress it to:Import-Csv -delimiter "|" -path $TermPathAndFilename -header "F1","F2","USER_ID","TD","F5" |
%{
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Qlemo: I am a bit confused by your comment. Your second code block is identical to the first; there does not appear to be anything 'compressed.'
Also, your comment, "unless there is a need for having direct access to the content beyond for the pipe." If you are talking about the data I've labeled SERVER, CLIENT, ROLE, it is certainly required (at least SERVER and CLIENT are, as those are elements of the $denied array). Were you thinking of something else?
Also, your comment, "unless there is a need for having direct access to the content beyond for the pipe." If you are talking about the data I've labeled SERVER, CLIENT, ROLE, it is certainly required (at least SERVER and CLIENT are, as those are elements of the $denied array). Were you thinking of something else?
Both code snippets are not the same, the second is not using $A. This can make a difference for larger files. The pipe does not store more than one line at any time; data is processed "on the fly", which is important if you conditionally dismiss a lot of info in the loop.
And since we do not need to have access to the original data outside of the loop, only to the result of the loop, storing the file content in a variable just consumes memory unnecessarily.
You will not see any difference usually, but if you go into larger files (hundreds of MB), there is a significant effect.
My own code example (using switch) is efficient to that regard too, as it uses optimized internal processing for larger files (if compared to get-content and switch, or import-csv and switch in a loop).
As I see it, your current choice of script should be what you think you can understand and expand more easily. RegEx can get complex and confusing, and is difficult to read. The regex used here is still ok ;-).
And since we do not need to have access to the original data outside of the loop, only to the result of the loop, storing the file content in a variable just consumes memory unnecessarily.
You will not see any difference usually, but if you go into larger files (hundreds of MB), there is a significant effect.
My own code example (using switch) is efficient to that regard too, as it uses optimized internal processing for larger files (if compared to get-content and switch, or import-csv and switch in a loop).
As I see it, your current choice of script should be what you think you can understand and expand more easily. RegEx can get complex and confusing, and is difficult to read. The regex used here is still ok ;-).
ASKER
Ok, I understand now (and see the difference). I was only looking at the first line and for some reason my mind didn't see the difference in the 2nd line. Makes sense. I will probably be dealing with a file that has no more than 10-20 rows of data. I am getting somewhat better at regex, but learning it well is learning a language within a language.
I grew up a COBOL/PASCAL/BASIC then VB then SQL Server developer so all these |, %, '.*\|.*\|(?<...' {} and stuff as well as piping and is taking a lot of getting use to. I really wish I had a lot better comprehension than I do. I tend to pick up PowerShell and use it when I have a task that I feel is best accommodated using it, but don't have the luxury of the time required to master it (plus not sure, at my age--referencing COBOL and PASCAL probably gives you an idea--that I really want to) ;-)
I do have one additional question that came up once I started integrating this into my script, and I will be more than happy to create a new question. I have the $DENIED array, that contains USER, SERVER and CLIENT. There is another array that contains CLIENT and DATABASE. Is there an easy way to match CLIENT between the two arrays and add DATABASE to the $DENIED array, so that it contains USER, SERVER, CLIENT, DATABASE?
I grew up a COBOL/PASCAL/BASIC then VB then SQL Server developer so all these |, %, '.*\|.*\|(?<...' {} and stuff as well as piping and is taking a lot of getting use to. I really wish I had a lot better comprehension than I do. I tend to pick up PowerShell and use it when I have a task that I feel is best accommodated using it, but don't have the luxury of the time required to master it (plus not sure, at my age--referencing COBOL and PASCAL probably gives you an idea--that I really want to) ;-)
I do have one additional question that came up once I started integrating this into my script, and I will be more than happy to create a new question. I have the $DENIED array, that contains USER, SERVER and CLIENT. There is another array that contains CLIENT and DATABASE. Is there an easy way to match CLIENT between the two arrays and add DATABASE to the $DENIED array, so that it contains USER, SERVER, CLIENT, DATABASE?
No, there is no easy way. You need to compare or search, and then extend, each single member.
However, if you use the advanced technique of hash tables instead of arrays, at least for the database one, then you can look up the database directly.
That is, you build your database "array" like this:
However, if you use the advanced technique of hash tables instead of arrays, at least for the database one, then you can look up the database directly.
That is, you build your database "array" like this:
$db = @{}
# looping code
$db[$client] = $database
# ...
you can get the DB just by using $db[$client]. If we integrate that into my code (with $db already filled, of course):$Term = ''
$Denied = @()
switch -regex -file c:\temp\ee\Q_28988112.txt
{
'.*\|.*\|(?<ID>.*)\|TERM' { $Term += "'$($matches['ID'])'," }
'.*\|.*\|(?<ID>.*)\|DENIED\|(?<srv>.+)_(?<clt>.+)_.*' { $Denied += [PSCustomObject] @{
user_ID = $Matches['ID']
Server = $Matches['Srv']
Client = $Matches['Clt']
Database = $db[$Matches['Clt']
}
}
}
$Term = $Term.Remove($Term.Length-1)
Write-Host "Term: $Term"
Write-Host "Denied:"
$Denied
Note that you cannot use the Client property, it does not exist at that point in time, so we have to repeat the $Matches['Clt'].
ASKER
I did find Add-Member that allows me to add Database as a new column, but still trying to figure out the matching logic to copy the value of DATABASE from my $ServerClientDatabase array to the $Denied array when I match on CLIENT.
BTW, COBOL/PASCAL/BASIC/VB applies to all three of us, at least age-wise. I never used COBOL, but learned it for fun in my early teen years. Like another dozen programming languages ...
Using PowerShell effectively is a science for itself. The good news is that you can use it very similar to other programming and scripting languages, and then move to more advanced techniques step by step. It is important to keep in mind there are more effective or flexible ways, but that does not mean you have to use them from the beginning or at all ;-).
Using PowerShell effectively is a science for itself. The good news is that you can use it very similar to other programming and scripting languages, and then move to more advanced techniques step by step. It is important to keep in mind there are more effective or flexible ways, but that does not mean you have to use them from the beginning or at all ;-).
Back on topic: If you do not want to change the way you generate your (client, database) array, you'll have to compare "manually":
# Generating test data
$db = @"
client,database
PC1,Db1
PC2,Db2
"@ -split "`n" | convertFrom-CSV
$denied = @"
User_ID,Server,Client
Usr1,Srv1,PC1
Usr2,Srv2,PC2
Usr3,Srv3,PC1
"@ -split "`n" | convertFrom-CSV
foreach ($obj in $denied) {
$curDB = $db | ? { $_.Client -eq $obj.Client }
$obj | Add-Member Noteproperty Database $curDB.Database
}
$Denied
ASKER
For fairness, I've asked a new question. I tried (which I prefer) your initial method of adding it inside the loop that builds the array, but didn't get any results.
ASKER
I'm wondering if the issue might be Database = $db[$Matches['Clt']]
since the $db table contains ctl (actually client_id, but I changed the code accordingly) and the element I am wanting from thethe table is db_name when I match client_id, but I am not seeing that referenced.
I am going to close this question and award points. Feel free to post the solution in the new question.
since the $db table contains ctl (actually client_id, but I changed the code accordingly) and the element I am wanting from thethe table is db_name when I match client_id, but I am not seeing that referenced.
I am going to close this question and award points. Feel free to post the solution in the new question.
ASKER
I am splitting points between aikimark and Qlemo. I originally went with aikimark's solution as, for me, and I believe the rest of the team, it will be easier to understand. However, QLemo's provides some benefits, especially in light of the subsequent question I asked and have linked to, so it may end up being what I implement, with a lot of comments for us non PowerShelll guru types who will have to subsequently maintain it. Unlike most developers, I love commenting my code :-)
I still don't know if your file has headers or not. The Import-CSV statement will be different if it does have headers as will any columnar/field references in the script.
aikimark, as I have said it does not matter here as long as the switch will not have a hit. You provide the headers with import-csv, and so (a) know the header names and (b) get headers, should they be included in the CSV file, as first line of data. Usually that makes a big difference, but we are lucky here.
@Q
If the file did have headers, it is possible that the first row might cause a logic error in the script.
If the file did have headers, it is possible that the first row might cause a logic error in the script.
... which conditions have been described by me.
ASKER
Open in new window
From what PowerShell I am learning, I do think I need another <Col#> designator '\A(?<Col1>.*?)\|(?<Col2>.