Leo Torres
asked on
Powershell to format a file
I need to read a file that looks like below and is turned into columns and rows so i can insert into a SQL table.
Here is a break down:
20 20070426 20070426 83100 msg
PP YYYYMMDD YYYYMMDD Code 501
After the msg if one exists the pattern repeats up to 25 times in 1 line
I have a sample here of raw data, Included an actual RAW file with what I would like the final data to look like.
Formatted.xlsx
Here is a break down:
20 20070426 20070426 83100 msg
PP YYYYMMDD YYYYMMDD Code 501
After the msg if one exists the pattern repeats up to 25 times in 1 line
I have a sample here of raw data, Included an actual RAW file with what I would like the final data to look like.
202007042620070426 83100 501
202007053120070531 99520 501 202007053120070531 53011 202007053120070531 4010 202007053120070531 2724
202007051520070515 4241 202007051520070515 4019 202007051520070515 4439
RAW-sample.txtFormatted.xlsx
ASKER
No not correct. This data is clustered. Not sure If I am making any sense.
OK let me see if I explain it this way.
Suppose you have 5 column which we do in this case. The last column is optional (msg).
Then suppose you have 5 rows. Instead of the data rows being placed vertical as it is in tables and columns. The data in this instance is horizontal.
Here is an illustration. Suppose the numbers represent the column in which the data belongs the location of the numbers represent how they can be organized in the file
1 2 3 4 5
1 2 3 4 5 1 2 3 4 1 2 3 4 5 1 2 3 4
1 2 3 4 1 2 3 4 5 1 2 3 4
1 2 3 4 5 1 2 3 4 5 1 2 3 4
1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
Hope this help preview file
Will test your code on file shortly
No that I have reviewed and tested code. it works up to column 28 in the file but what is happening is that the same columns are repeating themselves after column 28 from position 33 to position 62 in text file. then it repeats again. I believe it can do that up to 12 times is the maximum.
Again think of a table with 3 or x rows instead of the data being vertical the data is horizontal.
OK let me see if I explain it this way.
Suppose you have 5 column which we do in this case. The last column is optional (msg).
Then suppose you have 5 rows. Instead of the data rows being placed vertical as it is in tables and columns. The data in this instance is horizontal.
Here is an illustration. Suppose the numbers represent the column in which the data belongs the location of the numbers represent how they can be organized in the file
1 2 3 4 5
1 2 3 4 5 1 2 3 4 1 2 3 4 5 1 2 3 4
1 2 3 4 1 2 3 4 5 1 2 3 4
1 2 3 4 5 1 2 3 4 5 1 2 3 4
1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
Hope this help preview file
Will test your code on file shortly
No that I have reviewed and tested code. it works up to column 28 in the file but what is happening is that the same columns are repeating themselves after column 28 from position 33 to position 62 in text file. then it repeats again. I believe it can do that up to 12 times is the maximum.
Again think of a table with 3 or x rows instead of the data being vertical the data is horizontal.
ASKER
Code wise this gave me an error when I ran it but its what I am trying to do.
Get-Content C:\Users\ltorres\Documents\Powershell\SampleDataRAPS.txt | select `
@{n='POS' ; e= {-join $_[ 0.. 1]}},
@{n='Start'; e= {-join $_[ 2.. 9]}},
@{n='End' ; e= {-join $_[10..17]}},
@{n='Code' ; e= {-join $_[19..23]}},
@{n='msg' ; e= {-join $_[26..28]}},
@{n='POS' ; e= {-join $_[33.. 34]}},
@{n='Start'; e= {-join $_[35.. 42]}},
@{n='End' ; e= {-join $_[43..50]}},
@{n='Code' ; e= {-join $_[51..55]}},
@{n='msg' ; e= {-join $_[56..58]}}
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
This look really promising. I this data is in one SQL Column and your code seems to be doing the trick I will continue to test.. Thank you for your efforts.
Leo, look at your Formatted.xlsx please, and see why I was wondering about that! It is important to provide correct examples ;-).
The clustered data do not have any relation, they are just clustered for some other reason? Because just "streaming" the content into rows removes any relation, that is important.
It is also important that the "row" length for a single record is the same each time, i.e. optional parts are filled with spaces, but it appears to be that way.
The clustered data do not have any relation, they are just clustered for some other reason? Because just "streaming" the content into rows removes any relation, that is important.
It is also important that the "row" length for a single record is the same each time, i.e. optional parts are filled with spaces, but it appears to be that way.
The more traditional attempt is like
Get-Content Raw-sample.txt | % {
$line = $_
while ($line[0] -ne ' ') {
-join $line[0..29]
if ($line.Length -gt 32) {$line = $line.Remove(0,32)} else { $line = ' ' }
}
} | select `
@{n='POS' ; e= {-join $_[ 0.. 1]}},
@{n='Start'; e= {-join $_[ 2.. 9]}},
@{n='End' ; e= {-join $_[10..17]}},
@{n='Code' ; e= {-join $_[19..23]}},
@{n='msg' ; e= {-join $_[26..28]}} | export-csv -NoType Formatted.csv
Footech, having to use the regex twice does indeed look suspicious.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thank you...
Finally got a chance to look at this again. Here's what I was thinking (code below). I had forgotten that you could reference the groups by name like $_.groups["POS"].value instead of $_.groups[1].value. Get-Member wasn't jogging my memory. I changed the regex just a bit because I noticed some errors.
@Qlemo - I just took a glance at your code in #a40290619 and it's making my head spin. I'm going to have to spend a little time with it so I can understand what's happening. :)
Also, in your last comment, I've never seen this construct before...
$pattern = "(?<POS>[0-9]{2})(?<Start>[0-9]{8})(?<End>[0-9]{8}) (?<Code>[0-9]{4,7}) +(?<Msg>[0-9]{3,7}(?![0-9]))?"
gi .\RAW-sample.txt | Select-String -Pattern $pattern -AllMatches | % {$_.matches} |
% { New-Object PsObject -Property @{
POS = $_.groups["POS"].value
Start = $_.groups["Start"].value
End = $_.groups["End"].value
Code = $_.groups["Code"].value
Msg = $_.groups["Msg"].value
}
} | Select POS,Start,End,Code,Msg |
Export-CSV result.csv -notype
@Qlemo - I just took a glance at your code in #a40290619 and it's making my head spin. I'm going to have to spend a little time with it so I can understand what's happening. :)
Also, in your last comment, I've never seen this construct before...
$match = $pattern.match((get-content RAW-sample.txt))
$(
while ($match.Success)
{
#stuff
}
$match = $match.NextMatch()
}
)
Firstly, the RegEx class parses the source string for each match (instead of the first per line). To get the next match, we have to call .NextMatch() obviously ;-).
The .Match requires a value as parameter. If using a cmdlet like I did, we need to make it a subexpression by enclosing in an additional pair of parens. Hence the strange look of ((get-content RAW-sample.txt)).
Lastly, foreach, if aso. statements do not push results into the pipeline, they are dismissed. Unless you embed them in an subexpression: $(while ... ).
The .Match requires a value as parameter. If using a cmdlet like I did, we need to make it a subexpression by enclosing in an additional pair of parens. Hence the strange look of ((get-content RAW-sample.txt)).
Lastly, foreach, if aso. statements do not push results into the pipeline, they are dismissed. Unless you embed them in an subexpression: $(while ... ).
ASKER
Thank you Qlemo for going above and beyond!
Open in new window