  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 342

Large file - search for specific rows and save into a new small file.

Hi, Experts -
I have a large .dat file (5 GB). I need to go through each record, find the rows that contain a specific word, and save those rows into a new file. That new file will be small enough to work with. I think this is doable with PowerShell.

Can someone please give me code for that?

Thanks,
ivan_belal
 
Joseph Daly commented:
I'm sure this can be done with PowerShell; however, the tool I use for tasks like this is wingrep.

http://www.wingrep.com/
 
Dale Harris (Professional Services Engineer) commented:
Try this example:

# Read the file, then append each line matching "Men" plus the line after it.
$TextFile = Get-Content "decofind.txt"
$TextFile | Select-String Men -Context 0,1 >> "FoundWords.txt"



Attached is the Declaration of Independence and results from the text.

Explanation of Context (which you'll need to use)

  -Context <Int32[]>
      Captures the specified number of lines before and after the line with the match. This allows you to view the match in context.

      If you enter one number as the value of this parameter, that number determines the number of lines captured before and after the match. If you enter two numbers as the value, the first number determines the number of lines before the match and the second number determines the number of lines after the match.

      In the default display, lines with a match are indicated by a right angle bracket (ASCII 62) in the first column of the display. Unmarked lines are the context.

      This parameter does not change the number of objects generated by Select-String. Select-String generates one MatchInfo (Microsoft.PowerShell.Commands.MatchInfo) object for each match. The context is stored as an array of strings in the Context property of the object.

      When you pipe the output of a Select-String command to another Select-String command, the receiving command searches only the text in the matched line (the value of the Line property of the MatchInfo object), not the text in the context lines. As a result, the Context parameter is not valid on the receiving Select-String command.

      When the context includes a match, the MatchInfo object for each match includes all of the context lines, but the overlapping lines appear only once in the display.
DecofInd.txt
FoundWords.txt
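
To illustrate the help text above, here is a minimal sketch of reading the context lines from the MatchInfo object instead of relying on the default display. The sample lines and variable names are made up for illustration and are not taken from the attached files.

```powershell
# Hypothetical sample input; any array of strings works.
$lines = 'all Men are created equal', 'with certain unalienable Rights', 'Life, Liberty'

# One MatchInfo per match; -Context 0,1 keeps 0 lines before and 1 line after.
$found = $lines | Select-String -Pattern 'Men' -Context 0,1

foreach ($m in $found)
{
      $m.Line                  # the matching line itself
      $m.Context.PostContext   # array holding the line(s) after the match
}
```

Working with the Line and Context properties avoids having to strip the ">" markers from the formatted output afterwards.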
 
rwskas commented:
When searching large amounts of data like this, you want to adjust the -ReadCount parameter of Get-Content.

Setting it to 0 reads the whole file in one operation instead of sending each line down the pipeline individually, which is time-consuming.

This should provide significantly faster results:

$MyData = Get-Content "C:\path to my file" -ReadCount 0
$MyResults = $MyData |  Select-String "MyString"

 
Dale Harris (Professional Services Engineer) commented:
Do you really want 0 as the readcount?  Why not try 1000?

http://learningpcs.blogspot.com/2012/03/powershell-v2-get-content-readcount.html

DH
 
ivan_belal (Author) commented:
Unfortunately, in each case I get an OutOfMemoryException.
Thanks,
 
rwskas commented:
@DaleHarris
If you are not going to be piping the data to anything, then no, you would not want to set it to 1000. You would then be sending multiple chunks of data through the pipeline. Instead, we just want the entire thing: not every line, not every 1000 lines, the whole thing.

This is the fastest way to save a large amount of data into an array.
 
Dale Harris (Professional Services Engineer) commented:
You might have to break it up into more manageable chunks?  Try to stay well below your RAM level available.
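
One possible reading of "more manageable chunks", sketched under assumptions: let Get-Content hand over up to 1000 lines at a time and filter each chunk before the next one is read, so the whole file is never held in a variable. The function name Copy-MatchingLine, its parameters, and the paths in the example call are hypothetical.

```powershell
# Hypothetical helper: copy the lines containing $Word from $Path to $OutPath.
function Copy-MatchingLine
{
      Param
      (
            [string]$Path,
            [string]$Word,
            [string]$OutPath
      )
      # -ReadCount 1000 sends arrays of up to 1000 lines down the pipeline.
      Get-Content -Path $Path -ReadCount 1000 |
            ForEach-Object {
                  # -match on an array returns only the matching elements;
                  # @() guards against a chunk arriving as a single string.
                  @($_) -match [regex]::Escape($Word)
            } |
            Set-Content -Path $OutPath
}

# Example call (placeholder paths):
# Copy-MatchingLine -Path 'C:\data\big.dat' -Word 'MyString' -OutPath 'C:\data\small.dat'
```

Because the result is piped straight to Set-Content rather than assigned to a variable, memory use should stay roughly proportional to the chunk size, not the file size. The match is case-insensitive, as with Select-String.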
 
Brent Challis (Principal: IT) commented:
Here is some code that I wrote a while ago to process a large file while minimising the impact on memory:
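function Get-NextLine
{
      Param
      (
            [string]$FileName,
            [switch]$Dispose
      )
      # Create the script-scoped reader cache on first use.
      if (-not (Test-Path Variable:Script:StreamReaders))
      {
            $Script:StreamReaders = @{}
      }
      if ($Dispose)
      {
            # Close and forget the reader for this file, if one is open.
            if ($Script:StreamReaders.ContainsKey($FileName))
            {
                  $Script:StreamReaders[$FileName].Dispose()
                  $Script:StreamReaders.Remove($FileName)
            }
            return
      }
      if (-not $Script:StreamReaders.ContainsKey($FileName))
      {
            # Open the file once; later calls reuse the same reader.
            $Script:StreamReaders[$FileName] = New-Object -TypeName System.IO.StreamReader -ArgumentList $FileName
      }
      # ReadLine returns $null at the end of the file.
      return $Script:StreamReaders[$FileName].ReadLine()
}

$line = Get-NextLine c:\data\productreport.txt
while ($null -ne $line)
{
      Write-Host $line
      $line = Get-NextLine c:\data\productreport.txt
}
# Release the file handle when finished.
Get-NextLine c:\data\productreport.txt -Dispose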

You could try replacing the line:
Write-Host $line
with an if statement that writes the line to an output file if appropriate.
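
A sketch of that idea: read one line at a time with a StreamReader and write only the matching lines with a StreamWriter. This is a simplified, swapped-in loop rather than the Get-NextLine function itself; the function name Export-MatchingLine, its parameters, and the paths in the example call are hypothetical.

```powershell
# Hypothetical helper: stream $Path line by line, keeping lines that contain $Word.
function Export-MatchingLine
{
      Param
      (
            [string]$Path,
            [string]$Word,
            [string]$OutPath
      )
      # .NET resolves relative paths against the process directory, so pass full paths.
      $reader = New-Object -TypeName System.IO.StreamReader -ArgumentList $Path
      $writer = New-Object -TypeName System.IO.StreamWriter -ArgumentList $OutPath
      try
      {
            while ($null -ne ($line = $reader.ReadLine()))
            {
                  # The if statement decides whether the line goes to the output file.
                  if ($line.IndexOf($Word, [System.StringComparison]::OrdinalIgnoreCase) -ge 0)
                  {
                        $writer.WriteLine($line)
                  }
            }
      }
      finally
      {
            # Close both files even if an error interrupts the loop.
            $writer.Dispose()
            $reader.Dispose()
      }
}

# Example call (placeholder paths):
# Export-MatchingLine -Path 'C:\data\big.dat' -Word 'MyString' -OutPath 'C:\data\small.dat'
```

Only the current line is held in memory at any time, which is why this approach should avoid the OutOfMemoryException reported earlier in the thread.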