We help IT Professionals succeed at work.

Powershell to format a file

253 Views
Last Modified: 2014-09-01
I need to read a file that looks like below and is turned into columns and rows so i can insert into a SQL table.

Here is a break down:
20   20070426     20070426        83100   msg
PP  YYYYMMDD YYYYMMDD      Code     501


After the msg if one exists the pattern repeats up to 25 times in 1 line

I have a sample here of raw data, Included an actual RAW file with what I would like the final data to look like.


202007042620070426 83100  501                                                                                                                                                                                                                                                                                                   
202007053120070531 99520  501   202007053120070531 53011        202007053120070531 4010         202007053120070531 2724                                                                                                                                                                                                         
202007051520070515 4241         202007051520070515 4019         202007051520070515 4439 

Open in new window

RAW-sample.txt
Formatted.xlsx
Comment
Watch Question

Qlemo"Batchelor", Developer and EE Topic Advisor
CERTIFIED EXPERT
Top Expert 2015

Commented:
The result should not have any of the data after the first msg in each line,  correct?
Get-Content Raw-sample.txt | select `
  @{n='POS'  ; e= {-join $_[ 0.. 1]}},
  @{n='Start'; e= {-join $_[ 2.. 9]}},
  @{n='End'  ; e= {-join $_[10..17]}},
  @{n='Code' ; e= {-join $_[19..23]}},
  @{n='msg'  ; e= {-join $_[26..28]}} | export-csv -NoType Formatted.csv

Open in new window

Leo TorresSQL Developer
CERTIFIED EXPERT

Author

Commented:
No not correct. This data is clustered. Not sure If I am making any sense.

OK let me see if I explain it this way.

Suppose you have 5 column which we do in this case. The last column is optional (msg).

Then suppose you have 5 rows. Instead of the data rows being placed vertical as it is in tables and columns. The data in this instance is horizontal.

Here is an illustration. Suppose the numbers represent the column in which the data belongs the location of the numbers represent how they can be organized in the file


1 2 3 4 5
1 2 3 4 5 1 2 3 4  1 2 3 4 5 1 2 3 4
1 2 3 4 1 2 3 4 5 1 2 3 4  
1 2 3 4 5 1 2 3 4 5 1 2 3 4
1 2 3 4  1 2 3 4  1 2 3 4   1 2 3 4

Hope this help preview file
Will test your code on file shortly

No that I have reviewed and tested code. it works up to column 28 in the file but what is happening is that the same columns are repeating themselves after column 28 from position 33 to position 62 in text file. then it repeats again. I believe it can do that up to 12 times is the maximum.

Again think of a table with 3 or x rows instead of the data being vertical the data is horizontal.
Leo TorresSQL Developer
CERTIFIED EXPERT

Author

Commented:
Code wise this gave me an error when I ran it but its what I am trying to do.

Get-Content C:\Users\ltorres\Documents\Powershell\SampleDataRAPS.txt | select `
  @{n='POS'  ; e= {-join $_[ 0.. 1]}},
  @{n='Start'; e= {-join $_[ 2.. 9]}},
  @{n='End'  ; e= {-join $_[10..17]}},
  @{n='Code' ; e= {-join $_[19..23]}},
  @{n='msg'  ; e= {-join $_[26..28]}},
  @{n='POS'  ; e= {-join $_[33.. 34]}},
  @{n='Start'; e= {-join $_[35.. 42]}},
  @{n='End'  ; e= {-join $_[43..50]}},
  @{n='Code' ; e= {-join $_[51..55]}},
  @{n='msg'  ; e= {-join $_[56..58]}} 
  

Open in new window

CERTIFIED EXPERT
Top Expert 2014
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
Leo TorresSQL Developer
CERTIFIED EXPERT

Author

Commented:
This look really promising. I this data is in one SQL Column and your code seems to be doing the trick I will continue to test.. Thank you for your efforts.
Qlemo"Batchelor", Developer and EE Topic Advisor
CERTIFIED EXPERT
Top Expert 2015

Commented:
Leo, look at your Formatted.xlsx please, and see why I was wondering about that! It is important to provide correct examples ;-).

The clustered data do not have any relation, they are just clustered for some other reason? Because just "streaming" the content into rows removes any relation, that is important.
It is also important that the "row" length for a single record is the same each time, i.e. optional parts are filled with spaces, but it appears to be that way.
Qlemo"Batchelor", Developer and EE Topic Advisor
CERTIFIED EXPERT
Top Expert 2015

Commented:
The more traditional attempt is like
Get-Content Raw-sample.txt | % {
  $line = $_
  while ($line[0] -ne ' ') {
    -join $line[0..29]
    if ($line.Length -gt 32) {$line = $line.Remove(0,32)} else { $line = ' ' }
  }
} | select `
  @{n='POS'  ; e= {-join $_[ 0.. 1]}},
  @{n='Start'; e= {-join $_[ 2.. 9]}},
  @{n='End'  ; e= {-join $_[10..17]}},
  @{n='Code' ; e= {-join $_[19..23]}},
  @{n='msg'  ; e= {-join $_[26..28]}} | export-csv -NoType Formatted.csv

Open in new window

Footech, having to use the regex twice does indeed look suspicious.
"Batchelor", Developer and EE Topic Advisor
CERTIFIED EXPERT
Top Expert 2015
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
Leo TorresSQL Developer
CERTIFIED EXPERT

Author

Commented:
Thank you...
CERTIFIED EXPERT
Top Expert 2014

Commented:
Finally got a chance to look at this again.  Here's what I was thinking (code below).  I had forgotten that you could reference the groups by name like $_.groups["POS"].value instead of $_.groups[1].value.  Get-Member wasn't jogging my memory.  I changed the regex just a bit because I noticed some errors.
$pattern = "(?<POS>[0-9]{2})(?<Start>[0-9]{8})(?<End>[0-9]{8}) (?<Code>[0-9]{4,7}) +(?<Msg>[0-9]{3,7}(?![0-9]))?"
gi .\RAW-sample.txt | Select-String -Pattern $pattern -AllMatches | % {$_.matches} | 
 % { New-Object PsObject -Property @{
                                POS = $_.groups["POS"].value
                                Start = $_.groups["Start"].value
                                End = $_.groups["End"].value
                                Code = $_.groups["Code"].value
                                Msg = $_.groups["Msg"].value
                                }
 } | Select POS,Start,End,Code,Msg |
  Export-CSV result.csv -notype

Open in new window


@Qlemo - I just took a glance at your code in #a40290619 and it's making my head spin.  I'm going to have to spend a little time with it so I can understand what's happening.  :)

Also, in your last comment, I've never seen this construct before...
$match = $pattern.match((get-content RAW-sample.txt))
$(
  while ($match.Success)
  {
    #stuff
    }
    $match = $match.NextMatch()
  }
)

Open in new window

Qlemo"Batchelor", Developer and EE Topic Advisor
CERTIFIED EXPERT
Top Expert 2015

Commented:
Firstly, the RegEx class parses the source string for each match (instead of the first per line). To get the next match, we have to call .NextMatch() obviously ;-).

The .Match requires a value as parameter. If using a cmdlet like I did, we need to make it a subexpression by enclosing in an additional pair of parens. Hence the strange look of ((get-content RAW-sample.txt)).

Lastly, foreach, if aso. statements do not push results into the pipeline, they are dismissed. Unless you embed them in an subexpression:  $(while ... ).
Leo TorresSQL Developer
CERTIFIED EXPERT

Author

Commented:
Thank you Qlemo for going above and beyond!
Unlock the solution to this question.
Join our community and discover your potential

Experts Exchange is the only place where you can interact directly with leading experts in the technology field. Become a member today and access the collective knowledge of thousands of technology experts.

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.