Link to home
Start Free TrialLog in
Avatar of Leo Torres
Leo TorresFlag for United States of America

asked on

Powershell to format a file

I need to read a file that looks like below and is turned into columns and rows so i can insert into a SQL table.

Here is a break down:
20   20070426     20070426        83100   msg
PP  YYYYMMDD YYYYMMDD      Code     501


After the msg if one exists the pattern repeats up to 25 times in 1 line

I have a sample here of raw data, Included an actual RAW file with what I would like the final data to look like.


202007042620070426 83100  501                                                                                                                                                                                                                                                                                                   
202007053120070531 99520  501   202007053120070531 53011        202007053120070531 4010         202007053120070531 2724                                                                                                                                                                                                         
202007051520070515 4241         202007051520070515 4019         202007051520070515 4439 

Open in new window

RAW-sample.txt
Formatted.xlsx
Avatar of Qlemo
Qlemo
Flag of Germany image

The result should not have any of the data after the first msg in each line,  correct?
Get-Content Raw-sample.txt | select `
  @{n='POS'  ; e= {-join $_[ 0.. 1]}},
  @{n='Start'; e= {-join $_[ 2.. 9]}},
  @{n='End'  ; e= {-join $_[10..17]}},
  @{n='Code' ; e= {-join $_[19..23]}},
  @{n='msg'  ; e= {-join $_[26..28]}} | export-csv -NoType Formatted.csv

Open in new window

Avatar of Leo Torres

ASKER

No not correct. This data is clustered. Not sure If I am making any sense.

OK let me see if I explain it this way.

Suppose you have 5 column which we do in this case. The last column is optional (msg).

Then suppose you have 5 rows. Instead of the data rows being placed vertical as it is in tables and columns. The data in this instance is horizontal.

Here is an illustration. Suppose the numbers represent the column in which the data belongs the location of the numbers represent how they can be organized in the file


1 2 3 4 5
1 2 3 4 5 1 2 3 4  1 2 3 4 5 1 2 3 4
1 2 3 4 1 2 3 4 5 1 2 3 4  
1 2 3 4 5 1 2 3 4 5 1 2 3 4
1 2 3 4  1 2 3 4  1 2 3 4   1 2 3 4

Hope this help preview file
Will test your code on file shortly

No that I have reviewed and tested code. it works up to column 28 in the file but what is happening is that the same columns are repeating themselves after column 28 from position 33 to position 62 in text file. then it repeats again. I believe it can do that up to 12 times is the maximum.

Again think of a table with 3 or x rows instead of the data being vertical the data is horizontal.
Code wise this gave me an error when I ran it but its what I am trying to do.

Get-Content C:\Users\ltorres\Documents\Powershell\SampleDataRAPS.txt | select `
  @{n='POS'  ; e= {-join $_[ 0.. 1]}},
  @{n='Start'; e= {-join $_[ 2.. 9]}},
  @{n='End'  ; e= {-join $_[10..17]}},
  @{n='Code' ; e= {-join $_[19..23]}},
  @{n='msg'  ; e= {-join $_[26..28]}},
  @{n='POS'  ; e= {-join $_[33.. 34]}},
  @{n='Start'; e= {-join $_[35.. 42]}},
  @{n='End'  ; e= {-join $_[43..50]}},
  @{n='Code' ; e= {-join $_[51..55]}},
  @{n='msg'  ; e= {-join $_[56..58]}} 
  

Open in new window

SOLUTION
Avatar of footech
footech
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
This look really promising. I this data is in one SQL Column and your code seems to be doing the trick I will continue to test.. Thank you for your efforts.
Leo, look at your Formatted.xlsx please, and see why I was wondering about that! It is important to provide correct examples ;-).

The clustered data do not have any relation, they are just clustered for some other reason? Because just "streaming" the content into rows removes any relation, that is important.
It is also important that the "row" length for a single record is the same each time, i.e. optional parts are filled with spaces, but it appears to be that way.
The more traditional attempt is like
Get-Content Raw-sample.txt | % {
  $line = $_
  while ($line[0] -ne ' ') {
    -join $line[0..29]
    if ($line.Length -gt 32) {$line = $line.Remove(0,32)} else { $line = ' ' }
  }
} | select `
  @{n='POS'  ; e= {-join $_[ 0.. 1]}},
  @{n='Start'; e= {-join $_[ 2.. 9]}},
  @{n='End'  ; e= {-join $_[10..17]}},
  @{n='Code' ; e= {-join $_[19..23]}},
  @{n='msg'  ; e= {-join $_[26..28]}} | export-csv -NoType Formatted.csv

Open in new window

Footech, having to use the regex twice does indeed look suspicious.
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thank you...
Finally got a chance to look at this again.  Here's what I was thinking (code below).  I had forgotten that you could reference the groups by name like $_.groups["POS"].value instead of $_.groups[1].value.  Get-Member wasn't jogging my memory.  I changed the regex just a bit because I noticed some errors.
$pattern = "(?<POS>[0-9]{2})(?<Start>[0-9]{8})(?<End>[0-9]{8}) (?<Code>[0-9]{4,7}) +(?<Msg>[0-9]{3,7}(?![0-9]))?"
gi .\RAW-sample.txt | Select-String -Pattern $pattern -AllMatches | % {$_.matches} | 
 % { New-Object PsObject -Property @{
                                POS = $_.groups["POS"].value
                                Start = $_.groups["Start"].value
                                End = $_.groups["End"].value
                                Code = $_.groups["Code"].value
                                Msg = $_.groups["Msg"].value
                                }
 } | Select POS,Start,End,Code,Msg |
  Export-CSV result.csv -notype

Open in new window


@Qlemo - I just took a glance at your code in #a40290619 and it's making my head spin.  I'm going to have to spend a little time with it so I can understand what's happening.  :)

Also, in your last comment, I've never seen this construct before...
$match = $pattern.match((get-content RAW-sample.txt))
$(
  while ($match.Success)
  {
    #stuff
    }
    $match = $match.NextMatch()
  }
)

Open in new window

Firstly, the RegEx class parses the source string for each match (instead of the first per line). To get the next match, we have to call .NextMatch() obviously ;-).

The .Match requires a value as parameter. If using a cmdlet like I did, we need to make it a subexpression by enclosing in an additional pair of parens. Hence the strange look of ((get-content RAW-sample.txt)).

Lastly, foreach, if aso. statements do not push results into the pipeline, they are dismissed. Unless you embed them in an subexpression:  $(while ... ).
Thank you Qlemo for going above and beyond!