Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
?
Solved

Powershell to format a file

Posted on 2014-08-27
12
Medium Priority
?
207 Views
Last Modified: 2014-09-01
I need to read a file that looks like below and is turned into columns and rows so i can insert into a SQL table.

Here is a break down:
20   20070426     20070426        83100   msg
PP  YYYYMMDD YYYYMMDD      Code     501


After the msg if one exists the pattern repeats up to 25 times in 1 line

I have a sample here of raw data, Included an actual RAW file with what I would like the final data to look like.


202007042620070426 83100  501                                                                                                                                                                                                                                                                                                   
202007053120070531 99520  501   202007053120070531 53011        202007053120070531 4010         202007053120070531 2724                                                                                                                                                                                                         
202007051520070515 4241         202007051520070515 4019         202007051520070515 4439 

Open in new window

RAW-sample.txt
Formatted.xlsx
0
Comment
Question by:Leo Torres
  • 5
  • 5
  • 2
12 Comments
 
LVL 72

Expert Comment

by:Qlemo
ID: 40289207
The result should not have any of the data after the first msg in each line,  correct?
Get-Content Raw-sample.txt | select `
  @{n='POS'  ; e= {-join $_[ 0.. 1]}},
  @{n='Start'; e= {-join $_[ 2.. 9]}},
  @{n='End'  ; e= {-join $_[10..17]}},
  @{n='Code' ; e= {-join $_[19..23]}},
  @{n='msg'  ; e= {-join $_[26..28]}} | export-csv -NoType Formatted.csv

Open in new window

0
 
LVL 8

Author Comment

by:Leo Torres
ID: 40289295
No not correct. This data is clustered. Not sure If I am making any sense.

OK let me see if I explain it this way.

Suppose you have 5 column which we do in this case. The last column is optional (msg).

Then suppose you have 5 rows. Instead of the data rows being placed vertical as it is in tables and columns. The data in this instance is horizontal.

Here is an illustration. Suppose the numbers represent the column in which the data belongs the location of the numbers represent how they can be organized in the file


1 2 3 4 5
1 2 3 4 5 1 2 3 4  1 2 3 4 5 1 2 3 4
1 2 3 4 1 2 3 4 5 1 2 3 4  
1 2 3 4 5 1 2 3 4 5 1 2 3 4
1 2 3 4  1 2 3 4  1 2 3 4   1 2 3 4

Hope this help preview file
Will test your code on file shortly

No that I have reviewed and tested code. it works up to column 28 in the file but what is happening is that the same columns are repeating themselves after column 28 from position 33 to position 62 in text file. then it repeats again. I believe it can do that up to 12 times is the maximum.

Again think of a table with 3 or x rows instead of the data being vertical the data is horizontal.
0
 
LVL 8

Author Comment

by:Leo Torres
ID: 40289322
Code wise this gave me an error when I ran it but its what I am trying to do.

Get-Content C:\Users\ltorres\Documents\Powershell\SampleDataRAPS.txt | select `
  @{n='POS'  ; e= {-join $_[ 0.. 1]}},
  @{n='Start'; e= {-join $_[ 2.. 9]}},
  @{n='End'  ; e= {-join $_[10..17]}},
  @{n='Code' ; e= {-join $_[19..23]}},
  @{n='msg'  ; e= {-join $_[26..28]}},
  @{n='POS'  ; e= {-join $_[33.. 34]}},
  @{n='Start'; e= {-join $_[35.. 42]}},
  @{n='End'  ; e= {-join $_[43..50]}},
  @{n='Code' ; e= {-join $_[51..55]}},
  @{n='msg'  ; e= {-join $_[56..58]}} 
  

Open in new window

0
Who's Defending Your Organization from Threats?

Protecting against advanced threats requires an IT dream team – a well-oiled machine of people and solutions working together to defend your organization. Download our resource kit today to learn more about the tools you need to build you IT Dream Team!

 
LVL 41

Assisted Solution

by:footech
footech earned 800 total points
ID: 40290038
I have a sneaking suspicion that there's a cleaner way to do this, but I'll have to dig through some old work to try to find out.  Anyway, this seems to give the desired output.
$pattern = "(?<POS>[0-9]{2})(?<Start>[0-9]{8})(?<End>[0-9]{8}) (?<Code>[0-9]{4,7}) {2}(?<Msg>[0-9]{3,7}(?![0-9]))?"
gi .\RAW-sample.txt | Select-String -Pattern $pattern -AllMatches | select -expand matches | select -expand value |
 ? { $_ -match $pattern } |
 % { New-Object PsObject -Property @{
                                POS = $matches.POS
                                Start = $matches.Start
                                End = $matches.End
                                Code = $matches.Code
                                Msg = $matches.Msg
                                }
 } | Select POS,Start,End,Code,Msg |
 Export-CSV result.csv -notype

Open in new window

0
 
LVL 8

Author Comment

by:Leo Torres
ID: 40290264
This look really promising. I this data is in one SQL Column and your code seems to be doing the trick I will continue to test.. Thank you for your efforts.
0
 
LVL 72

Expert Comment

by:Qlemo
ID: 40290576
Leo, look at your Formatted.xlsx please, and see why I was wondering about that! It is important to provide correct examples ;-).

The clustered data do not have any relation, they are just clustered for some other reason? Because just "streaming" the content into rows removes any relation, that is important.
It is also important that the "row" length for a single record is the same each time, i.e. optional parts are filled with spaces, but it appears to be that way.
0
 
LVL 72

Expert Comment

by:Qlemo
ID: 40290619
The more traditional attempt is like
Get-Content Raw-sample.txt | % {
  $line = $_
  while ($line[0] -ne ' ') {
    -join $line[0..29]
    if ($line.Length -gt 32) {$line = $line.Remove(0,32)} else { $line = ' ' }
  }
} | select `
  @{n='POS'  ; e= {-join $_[ 0.. 1]}},
  @{n='Start'; e= {-join $_[ 2.. 9]}},
  @{n='End'  ; e= {-join $_[10..17]}},
  @{n='Code' ; e= {-join $_[19..23]}},
  @{n='msg'  ; e= {-join $_[26..28]}} | export-csv -NoType Formatted.csv

Open in new window

Footech, having to use the regex twice does indeed look suspicious.
0
 
LVL 72

Accepted Solution

by:
Qlemo earned 1200 total points
ID: 40290694
Don't know whether you can call that "more clean":
$pattern = [regex] "(?<POS>[0-9]{2})(?<Start>[0-9]{8})(?<End>[0-9]{8}) (?<Code>[0-9]{4,7}) {2}(?<Msg>[0-9]{3,7}(?![0-9]))?"
$match = $pattern.match((get-content RAW-sample.txt))
$(
  while ($match.Success)
  {
    New-Object PsObject -Property @{
        POS   = $match.Groups['POS'  ].value
        Start = $match.Groups['Start'].value
        End   = $match.Groups['End'  ].value
        Code  = $match.Groups['Code' ].value
        Msg   = $match.Groups['Msg'  ].value
    }
    $match = $match.NextMatch()
  }
) |
 Select POS,Start,End,Code,Msg |
 Export-CSV result.csv -notype

Open in new window

0
 
LVL 8

Author Closing Comment

by:Leo Torres
ID: 40292957
Thank you...
0
 
LVL 41

Expert Comment

by:footech
ID: 40293127
Finally got a chance to look at this again.  Here's what I was thinking (code below).  I had forgotten that you could reference the groups by name like $_.groups["POS"].value instead of $_.groups[1].value.  Get-Member wasn't jogging my memory.  I changed the regex just a bit because I noticed some errors.
$pattern = "(?<POS>[0-9]{2})(?<Start>[0-9]{8})(?<End>[0-9]{8}) (?<Code>[0-9]{4,7}) +(?<Msg>[0-9]{3,7}(?![0-9]))?"
gi .\RAW-sample.txt | Select-String -Pattern $pattern -AllMatches | % {$_.matches} | 
 % { New-Object PsObject -Property @{
                                POS = $_.groups["POS"].value
                                Start = $_.groups["Start"].value
                                End = $_.groups["End"].value
                                Code = $_.groups["Code"].value
                                Msg = $_.groups["Msg"].value
                                }
 } | Select POS,Start,End,Code,Msg |
  Export-CSV result.csv -notype

Open in new window


@Qlemo - I just took a glance at your code in #a40290619 and it's making my head spin.  I'm going to have to spend a little time with it so I can understand what's happening.  :)

Also, in your last comment, I've never seen this construct before...
$match = $pattern.match((get-content RAW-sample.txt))
$(
  while ($match.Success)
  {
    #stuff
    }
    $match = $match.NextMatch()
  }
)

Open in new window

0
 
LVL 72

Expert Comment

by:Qlemo
ID: 40296501
Firstly, the RegEx class parses the source string for each match (instead of the first per line). To get the next match, we have to call .NextMatch() obviously ;-).

The .Match requires a value as parameter. If using a cmdlet like I did, we need to make it a subexpression by enclosing in an additional pair of parens. Hence the strange look of ((get-content RAW-sample.txt)).

Lastly, foreach, if aso. statements do not push results into the pipeline, they are dismissed. Unless you embed them in an subexpression:  $(while ... ).
0
 
LVL 8

Author Comment

by:Leo Torres
ID: 40297564
Thank you Qlemo for going above and beyond!
0

Featured Post

Who's Defending Your Organization from Threats?

Protecting against advanced threats requires an IT dream team – a well-oiled machine of people and solutions working together to defend your organization. Download our resource kit today to learn more about the tools you need to build you IT Dream Team!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In real business world data are crucial and sometimes data are shared among different information systems. Hence, an agreeable file transfer protocol need to be established.
Transferring FSMO roles is done when an admin wants to split roles between certain Domain Controllers or the Domain Controller holding the Roles has been forcefully demoted using dcpromo / forceremoval
Loops Section Overview
Screencast - Getting to Know the Pipeline

580 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question