Powershell to format a file

I need to read a file that looks like below and is turned into columns and rows so i can insert into a SQL table.

Here is a break down:
20   20070426     20070426        83100   msg
PP  YYYYMMDD YYYYMMDD      Code     501


After the msg if one exists the pattern repeats up to 25 times in 1 line

I have a sample here of raw data, Included an actual RAW file with what I would like the final data to look like.


202007042620070426 83100  501                                                                                                                                                                                                                                                                                                   
202007053120070531 99520  501   202007053120070531 53011        202007053120070531 4010         202007053120070531 2724                                                                                                                                                                                                         
202007051520070515 4241         202007051520070515 4019         202007051520070515 4439 

Open in new window

RAW-sample.txt
Formatted.xlsx
LVL 8
Leo TorresSQL DeveloperAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

QlemoBatchelor, Developer and EE Topic AdvisorCommented:
The result should not have any of the data after the first msg in each line,  correct?
Get-Content Raw-sample.txt | select `
  @{n='POS'  ; e= {-join $_[ 0.. 1]}},
  @{n='Start'; e= {-join $_[ 2.. 9]}},
  @{n='End'  ; e= {-join $_[10..17]}},
  @{n='Code' ; e= {-join $_[19..23]}},
  @{n='msg'  ; e= {-join $_[26..28]}} | export-csv -NoType Formatted.csv

Open in new window

0
Leo TorresSQL DeveloperAuthor Commented:
No not correct. This data is clustered. Not sure If I am making any sense.

OK let me see if I explain it this way.

Suppose you have 5 column which we do in this case. The last column is optional (msg).

Then suppose you have 5 rows. Instead of the data rows being placed vertical as it is in tables and columns. The data in this instance is horizontal.

Here is an illustration. Suppose the numbers represent the column in which the data belongs the location of the numbers represent how they can be organized in the file


1 2 3 4 5
1 2 3 4 5 1 2 3 4  1 2 3 4 5 1 2 3 4
1 2 3 4 1 2 3 4 5 1 2 3 4  
1 2 3 4 5 1 2 3 4 5 1 2 3 4
1 2 3 4  1 2 3 4  1 2 3 4   1 2 3 4

Hope this help preview file
Will test your code on file shortly

No that I have reviewed and tested code. it works up to column 28 in the file but what is happening is that the same columns are repeating themselves after column 28 from position 33 to position 62 in text file. then it repeats again. I believe it can do that up to 12 times is the maximum.

Again think of a table with 3 or x rows instead of the data being vertical the data is horizontal.
0
Leo TorresSQL DeveloperAuthor Commented:
Code wise this gave me an error when I ran it but its what I am trying to do.

Get-Content C:\Users\ltorres\Documents\Powershell\SampleDataRAPS.txt | select `
  @{n='POS'  ; e= {-join $_[ 0.. 1]}},
  @{n='Start'; e= {-join $_[ 2.. 9]}},
  @{n='End'  ; e= {-join $_[10..17]}},
  @{n='Code' ; e= {-join $_[19..23]}},
  @{n='msg'  ; e= {-join $_[26..28]}},
  @{n='POS'  ; e= {-join $_[33.. 34]}},
  @{n='Start'; e= {-join $_[35.. 42]}},
  @{n='End'  ; e= {-join $_[43..50]}},
  @{n='Code' ; e= {-join $_[51..55]}},
  @{n='msg'  ; e= {-join $_[56..58]}} 
  

Open in new window

0
What were the top attacks of Q1 2018?

The Threat Lab team analyzes data from WatchGuard’s Firebox Feed, internal and partner threat intelligence, and a research honeynet, to provide insightful analysis about the top threats on the Internet. Check out our Q1 2018 report for smart, practical security advice today!

footechCommented:
I have a sneaking suspicion that there's a cleaner way to do this, but I'll have to dig through some old work to try to find out.  Anyway, this seems to give the desired output.
$pattern = "(?<POS>[0-9]{2})(?<Start>[0-9]{8})(?<End>[0-9]{8}) (?<Code>[0-9]{4,7}) {2}(?<Msg>[0-9]{3,7}(?![0-9]))?"
gi .\RAW-sample.txt | Select-String -Pattern $pattern -AllMatches | select -expand matches | select -expand value |
 ? { $_ -match $pattern } |
 % { New-Object PsObject -Property @{
                                POS = $matches.POS
                                Start = $matches.Start
                                End = $matches.End
                                Code = $matches.Code
                                Msg = $matches.Msg
                                }
 } | Select POS,Start,End,Code,Msg |
 Export-CSV result.csv -notype

Open in new window

0
Leo TorresSQL DeveloperAuthor Commented:
This look really promising. I this data is in one SQL Column and your code seems to be doing the trick I will continue to test.. Thank you for your efforts.
0
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
Leo, look at your Formatted.xlsx please, and see why I was wondering about that! It is important to provide correct examples ;-).

The clustered data do not have any relation, they are just clustered for some other reason? Because just "streaming" the content into rows removes any relation, that is important.
It is also important that the "row" length for a single record is the same each time, i.e. optional parts are filled with spaces, but it appears to be that way.
0
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
The more traditional attempt is like
Get-Content Raw-sample.txt | % {
  $line = $_
  while ($line[0] -ne ' ') {
    -join $line[0..29]
    if ($line.Length -gt 32) {$line = $line.Remove(0,32)} else { $line = ' ' }
  }
} | select `
  @{n='POS'  ; e= {-join $_[ 0.. 1]}},
  @{n='Start'; e= {-join $_[ 2.. 9]}},
  @{n='End'  ; e= {-join $_[10..17]}},
  @{n='Code' ; e= {-join $_[19..23]}},
  @{n='msg'  ; e= {-join $_[26..28]}} | export-csv -NoType Formatted.csv

Open in new window

Footech, having to use the regex twice does indeed look suspicious.
0
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
Don't know whether you can call that "more clean":
$pattern = [regex] "(?<POS>[0-9]{2})(?<Start>[0-9]{8})(?<End>[0-9]{8}) (?<Code>[0-9]{4,7}) {2}(?<Msg>[0-9]{3,7}(?![0-9]))?"
$match = $pattern.match((get-content RAW-sample.txt))
$(
  while ($match.Success)
  {
    New-Object PsObject -Property @{
        POS   = $match.Groups['POS'  ].value
        Start = $match.Groups['Start'].value
        End   = $match.Groups['End'  ].value
        Code  = $match.Groups['Code' ].value
        Msg   = $match.Groups['Msg'  ].value
    }
    $match = $match.NextMatch()
  }
) |
 Select POS,Start,End,Code,Msg |
 Export-CSV result.csv -notype

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Leo TorresSQL DeveloperAuthor Commented:
Thank you...
0
footechCommented:
Finally got a chance to look at this again.  Here's what I was thinking (code below).  I had forgotten that you could reference the groups by name like $_.groups["POS"].value instead of $_.groups[1].value.  Get-Member wasn't jogging my memory.  I changed the regex just a bit because I noticed some errors.
$pattern = "(?<POS>[0-9]{2})(?<Start>[0-9]{8})(?<End>[0-9]{8}) (?<Code>[0-9]{4,7}) +(?<Msg>[0-9]{3,7}(?![0-9]))?"
gi .\RAW-sample.txt | Select-String -Pattern $pattern -AllMatches | % {$_.matches} | 
 % { New-Object PsObject -Property @{
                                POS = $_.groups["POS"].value
                                Start = $_.groups["Start"].value
                                End = $_.groups["End"].value
                                Code = $_.groups["Code"].value
                                Msg = $_.groups["Msg"].value
                                }
 } | Select POS,Start,End,Code,Msg |
  Export-CSV result.csv -notype

Open in new window


@Qlemo - I just took a glance at your code in #a40290619 and it's making my head spin.  I'm going to have to spend a little time with it so I can understand what's happening.  :)

Also, in your last comment, I've never seen this construct before...
$match = $pattern.match((get-content RAW-sample.txt))
$(
  while ($match.Success)
  {
    #stuff
    }
    $match = $match.NextMatch()
  }
)

Open in new window

0
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
Firstly, the RegEx class parses the source string for each match (instead of the first per line). To get the next match, we have to call .NextMatch() obviously ;-).

The .Match requires a value as parameter. If using a cmdlet like I did, we need to make it a subexpression by enclosing in an additional pair of parens. Hence the strange look of ((get-content RAW-sample.txt)).

Lastly, foreach, if aso. statements do not push results into the pipeline, they are dismissed. Unless you embed them in an subexpression:  $(while ... ).
0
Leo TorresSQL DeveloperAuthor Commented:
Thank you Qlemo for going above and beyond!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Powershell

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.