
Capturing Data In Large Text File Between A Pattern

I have a very large file that contains a random number of lines (strings) between occurrences of a pattern, for example INS. I need to capture all the lines starting with the pattern "INS" up to, but not including, the next INS. I'm not sure whether Select-String or Where-Object is the proper method, or where to start with this one.

Top Expert 2014

Commented:
What do you want to do with the captured text?  Best to supply a sample file for input (doesn't have to be real, just representative of what to expect), and then a file (or other description) of what the output should be given the input file.

How many instances of the pattern will there be in the file?  Always two?  Or more?
Robert Pollick, Configuration Manager

Author

Commented:
One file could have as many as 300 INS instances. I need to capture all the lines after the initial INS up to the next. I have attached a small sample to explain my issue.
sample.txt
Top Expert 2014

Commented:
Use the regex object from the .Net framework with this pattern:
# \z lets the final block, which has no following INS, match as well
$re = [regex]'(?:^|\n)(INS(?:.|\n)+?)(?=\nINS|\z)'



You can use the object's Matches method to do the parsing you desire, then iterate over the resulting groups/captures.
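As a rough sketch of how that iteration might look, assuming the whole file has been read into one string (for example with Get-Content -Raw): the sample text below is made up, and the pattern is a variant of the one above that tolerates Windows line endings and uses \z so the last block is matched too. Note that any line beginning with the letters INS (not just a bare INS marker) would be treated as a delimiter.

```powershell
# Hypothetical sample standing in for the file contents; in practice this
# might be: $text = Get-Content $file -Raw
$text = "INS*A`nline 2`nline 3`nINS*B`nline 5"

# Variant of the pattern above: \r?\n tolerates Windows line endings,
# and \z allows the final block (no following INS) to match.
$re = [regex]'(?:^|\n)(INS(?:.|\n)*?)(?=\r?\nINS|\z)'

$blocks = foreach ($m in $re.Matches($text)) {
    # Group 1 holds the block itself, without the leading newline
    $m.Groups[1].Value
}

# Each element of $blocks is one multi-line string per INS block
$blocks.Count
```

Reading a very large file as a single string costs memory, so a line-by-line approach may suit bigger inputs better.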
Top Expert 2014

Commented:
I'm still not getting what you want to do with the captured text. In what form should it be passed down the pipeline?
The way I'm reading it, given your sample input file the output would just be the entire file.  Reading your description another way I would just exclude lines that have "INS".  Sorry, it's just not clear to me.
Robert Pollick, Configuration Manager

Author

Commented:
I need to break the entire file up into blocks of data with the "INS" as the delimiter. Not separate files, just one large file but with a start (INS) and finish (capturing all lines up to the next INS) for the accounts in the text file.

INS (start of first account)
second
third
fourth
fifth
INS (start of the next account)

There could be five lines or 20 lines; it all depends on the data that has changed in the users' demographic information.
Top Expert 2014
Commented:
Still not sure I'm getting in what form it needs to be passed down the pipeline, but give this a shot and let me know if it's what you're envisioning.
$file = "c:\temp\somefile.txt"
Get-Content $file | ForEach-Object -Begin { $init = $true } -Process `
{
    If ( $_ -match "^INS" )
    {
        # so we don't output anything on the first match
        If ( -not $init )
        {
            # output the chunk
            $out
            Write-Host "====" -ForegroundColor Yellow  #this can be removed, just helps to visualize the chunks on screen
        }
        $init = $false
        # start a new chunk
        $out = @($_)
    }
    Else
    {
        # building up the chunk
        $out += $_
    }
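} -End {
    # output the final chunk, which has no following INS line to trigger it
    If ( -not $init ) { $out }
}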
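
If each chunk needs to travel down the pipeline as a single object rather than as loose lines, one possible variation is sketched below. Split-OnMarker and its parameters are hypothetical names made up for this illustration, and the sample lines are invented; the unary comma is what keeps each block together as one array.

```powershell
function Split-OnMarker {
    # Hypothetical helper: groups pipeline lines into blocks, each block
    # starting at a line that matches $Pattern.
    param(
        [Parameter(ValueFromPipeline = $true)]
        [string]$Line,
        [string]$Pattern = '^INS'
    )
    begin { $chunk = $null }
    process {
        if ($Line -match $Pattern) {
            # A new marker closes the previous block, if there is one
            if ($null -ne $chunk) { ,$chunk.ToArray() }
            $chunk = New-Object 'System.Collections.Generic.List[string]'
        }
        # Lines before the first marker are ignored
        if ($null -ne $chunk) { $chunk.Add($Line) }
    }
    end {
        # Emit the last block, which has no following marker
        if ($null -ne $chunk) { ,$chunk.ToArray() }
    }
}

# Made-up sample lines; with a real file: Get-Content $file | Split-OnMarker
$sample = 'INS first account', 'second', 'third', 'INS next account', 'fourth'
$blocks = @($sample | Split-OnMarker)
$blocks.Count
```

Each element of $blocks is then a string array holding one account, so something like $blocks[0] -join "`n" would rebuild the text of the first block.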


Robert Pollick, Configuration Manager

Author

Commented:
Works exactly as needed.

Love this place, constantly learning and constantly impressed with the depth of knowledge here.  Thanks again!
Top Expert 2014

Commented:
OK, great!  Glad it's working for you.