Windows batch & PowerShell: retain specific lines based on strings

Luis Diaz
Hello experts,

I have a file with multiple lines.

I am looking for a script or a regular expression to retain lines which start with the string 2019-11-21 and contain the string "loading file".

Thank you for your help.
Most Valuable Expert 2018
Distinguished Expert 2018

Commented:
Try this:
Select-String -Path .\SomeFile.txt -Pattern '^2019-11-21.*?loading file'

Luis Diaz, IT consultant

Author

Commented:
Thank you. How should I proceed to generate a new file with the result?
Most Valuable Expert 2018
Distinguished Expert 2018

Commented:
Depends - what exactly are you interested in? Just the lines matching, lines and line number, only whether a line exists, ...?
Luis Diaz, IT consultant

Author

Commented:
Only whether lines exist. Thank you for your help.
Most Valuable Expert 2018
Distinguished Expert 2018

Commented:
What's your desired output format then?
csv, just some textual information (if so, please provide a sample of what you'd like to see in case the line was found or not found), ...
Luis Diaz, IT consultant

Author

Commented:
Files to read and to generate in txt format.
Please find attached files.
test.txt: file to read.
result.txt: file to generate

I changed the approach, but the need remains the same: retain lines which start with "D:\" and contain "()".
Thank you for your help.
test.txt
result.txt
Most Valuable Expert 2018
Distinguished Expert 2018

Commented:
Now I'm somewhat confused, sorry.
* The sample file test.txt doesn't contain any line matching this pattern (starting with D:\ and contains "()").
* The result file looks like the expected pattern should be starting with D:\ or contains "(...)"
Please clarify.
Luis Diaz, IT consultant

Author

Commented:
Sorry, my mistake. I expect to read the test.txt file with the following requirement:
 the pattern should match lines starting with D:\ or containing "(...)"
Most Valuable Expert 2018
Distinguished Expert 2018

Commented:
This will match anything either starting with "D:\", or having something enclosed in round brackets anywhere:
Select-String -Path .\test.txt -Pattern "^D:\\|\(.*?\)" | Select-Object -ExpandProperty Line | Set-Content -Path .\result.txt
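
The | in this pattern is an alternation, so a line is kept when either half matches. A minimal sketch of the difference between "either" and "both", using made-up sample lines rather than the attached test.txt:

```powershell
# Hypothetical sample lines (not taken from the attached files).
$lines = 'D:\folder\file.txt', 'C:\other (copy).txt', 'D:\data (1).log', 'plain line'

# Alternation: starts with D:\ OR contains something in round brackets.
$either = @($lines | Where-Object { $_ -match '^D:\\|\(.*?\)' })

# Concatenation: starts with D:\ AND contains round brackets later in the same line.
$both = @($lines | Where-Object { $_ -match '^D:\\.*\(.*?\)' })

"Either: $($either.Count) line(s); both: $($both.Count) line(s)"
```

The second pattern is the one to reach for if both conditions have to hold on the same line.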

Luis Diaz, IT consultant

Author

Commented:
Thank you, I will test it and keep you informed.
Luis Diaz, IT consultant

Author

Commented:
Tested and it works!

Just for my knowledge, and in order to clarify regular expressions, could you please let me know how to handle the following cases:

- Retain lines which start with toto and contain tata in the same line
- Remove lines which start with toto and contain tata in the same line
- Retain lines which finish with toto and contain tata in the same line
- Remove lines which finish with toto and contain tata in the same line
- Retain lines which start with toto and contain tata in another line
- Remove lines which start with toto and contain tata in another line
- Retain lines which finish with toto and contain tata in another line
- Remove lines which finish with toto and contain tata in another line

Additional question:

Do strings containing digits or special characters (such as toto1 or toto\) follow the same regular expression approach?

Thank you for your help.
Most Valuable Expert 2018
Distinguished Expert 2018

Commented:
Retain lines which start with toto and contain tata in the same line:
^toto.*?tata
^ anchors the pattern to the start of the string, . is any character, * is a quantifier meaning zero or more repetitions of the previous character; the ? after the quantifier makes it non-greedy, that is, it matches as few characters as possible.
Retain lines which finish with toto and contain tata in the same line:
tata.*?toto$
$ anchors the pattern to the end of the string.
Retain lines which start with toto and contain tata in another line:
(?ms)^toto.*tata
(?ms) sets Multi-line mode (^ and $ match the beginning and end of each line, not of the input string) and Single-line mode (. matches any character, including \n).
Retain lines which finish with toto and contain tata in another line:
(?ms)toto$.*tata
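
Note that Select-String -Path reads a file one line at a time, so the two (?ms) patterns only have an effect when the whole text is a single string, for example as returned by Get-Content -Raw. A minimal sketch, with an in-memory sample string (made up for illustration) standing in for the file content:

```powershell
# Hypothetical text standing in for the result of: Get-Content -Path .\input.txt -Raw
$text = "toto first`nmiddle line`nlast tata`n"

# Line by line, no single line both starts with toto and contains tata.
$lineHits = @($text -split "`n" | Where-Object { $_ -match '^toto.*?tata' })

# Against the whole string, (?ms) lets . cross line breaks, so the pattern matches.
$wholeHit = $text -match '(?ms)^toto.*tata'

# $Matches[0] holds the matched span, from the toto line through the last tata.
$matchedSpan = $Matches[0]
```

Used this way, the pattern answers whether such a combination exists in the text; the match itself spans everything from the toto line to the last tata. On files with Windows line endings, $ in multi-line mode matches just before \n rather than before \r, so toto$ may need to be written as toto\r?$.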

You can use these regex patterns with the -match operator; for "remove", just use -notmatch instead of -match.
If you're using Select-String, add the -NotMatch switch argument for "Remove".
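
As a minimal sketch of "retain" versus "remove" with both approaches, here applied to a few made-up sample lines held in memory instead of a file:

```powershell
# Hypothetical sample lines.
$lines = 'toto tata', 'toto titi', 'tete'

# Retain: keep only the lines that match the pattern.
$retained = @($lines | Where-Object { $_ -match '^toto.*?tata' })

# Remove: keep only the lines that do NOT match the pattern.
$removed = @($lines | Where-Object { $_ -notmatch '^toto.*?tata' })

# The same "remove" with Select-String; -NotMatch inverts the selection.
$removedViaSelectString = @(
	$lines | Select-String -Pattern '^toto.*?tata' -NotMatch | Select-Object -ExpandProperty Line
)
```

Both "remove" variants should return the same two lines; which one to use is mostly a matter of whether the input is already in memory or comes from a file.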

For details, see here (among other RegEx oriented sites):
Regular Expressions Tutorial > Learn How to Use and Get The Most out of Regular Expressions
https://www.regular-expressions.info/tutorial.html
Luis Diaz, IT consultant

Author

Commented:
Thank you oBdA.
I will test them and keep you informed.
Luis Diaz, IT consultant

Author

Commented:
oBdA,
One question related to the pattern "^D:\\|\(.*?\)":

The aim was to retain everything that starts with D:\ and contains () in between. The pattern works, but
why do we need the double backslash \\ and the |\ characters before the ()?

Thank you for your help.
Most Valuable Expert 2018
Distinguished Expert 2018

Commented:
Because the \ is the escape character in a RegEx, and round brackets have special meanings in a RegEx, too, so they need to be escaped.
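
In other words, \\ is a literal backslash, \( and \) are literal brackets, and the | between the two halves is the alternation ("or") operator. Digits such as the 1 in toto1 are ordinary characters and need no escaping. Rather than escaping by hand, the .NET method [regex]::Escape can be applied to any literal text. A minimal sketch with made-up sample lines:

```powershell
# Hypothetical sample lines.
$lines = 'toto\ one', 'toto1 two', 'tata three'

# Digits are ordinary characters in a regex; no escaping needed.
$withDigit = @($lines | Where-Object { $_ -match '^toto1' })

# A literal backslash must be escaped; [regex]::Escape does this for any literal text.
$pattern = '^' + [regex]::Escape('toto\')
$withBackslash = @($lines | Where-Object { $_ -match $pattern })
```

Escaping only the literal part and adding the anchor afterwards keeps ^ working as a regex anchor.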
Luis Diaz, IT consultant

Author

Commented:
oBdA,

I would like to capitalize on this by implementing a function that accepts any regular expression.
Is it possible to take the following as a reference:
Select-String -Path .\test.txt -Pattern "^D:\\|\(.*?\)" | Select-Object -ExpandProperty Line | Set-Content -Path .\result.txt
and set up a function with the following arguments:
>InputFile: file to read
>OutputFile: result file
>Pattern: pattern related to the regular expression
>Mode: match/not match or contain/not contain

If you have questions, please contact me.
Most Valuable Expert 2018
Distinguished Expert 2018

Commented:
As a general function with any pattern, or as a specialized version for just the four cases outlined above?
Luis Diaz, IT consultant

Author

Commented:
A general version with any pattern, if possible. Thank you.
Most Valuable Expert 2018
Distinguished Expert 2018

Commented:
OutputFile is optional; results will be sent to the pipeline if not specified.
Function Select-MyString {
[CmdletBinding()]
Param(
	[Parameter(Position=0, Mandatory=$true)]
	[String]$Pattern,
	[Parameter(Position=1, Mandatory=$true)]
	[String]$InputFile,
	[Parameter()]
	[Switch]$NotMatch,
	[Parameter()]
	[String]$OutputFile,
	[Parameter()]
	[Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding]$Encoding = 'ASCII'
)
	$splat = @{
		Path = $InputFile
		Pattern = $Pattern
		NotMatch = $NotMatch.ToBool()
		Encoding = $Encoding
	}
	If ($OutputFile) {
		Select-String @splat | Select-Object -ExpandProperty Line | Set-Content -Path $OutputFile -Encoding $Encoding
	} Else {
		Select-String @splat | Select-Object -ExpandProperty Line
	}
}

Luis Diaz, IT consultant

Author

Commented:
Hello oBdA,

Sorry for the delay.
I tested and it works.

I have some questions:
1-What is the purpose of the -NotMatch parameter, and how should I call it?
2-I don't understand why, when the pattern doesn't match any line of the input file, the function writes the same content as the input file to the output file.

Example:

Select-MyString -InputFile .\input.txt -OutputFile .\ouput.txt -Pattern "^tata"


I attached the input and the output file generated.

I was also wondering if we can make the following adjustment:
1-Pass the output folder as a parameter instead of the output file, and automatically generate the output file with the name YYYYMMDD_HHMMSS_output

Thank you for your help.
input.txt
output.txt
Luis Diaz, IT consultant

Author

Commented:
Hello oBdA,

If you need additional information, please let me know.

Regards,
Luis.
Most Valuable Expert 2018
Distinguished Expert 2018
Commented:
Sorry, must have missed the notification.
1-What is the purpose of the -NotMatch parameter, and how should I call it?
Same as in Select-String (it will actually be passed through): Finds text that does not match the specified pattern.
2-I don't understand why, when the pattern doesn't match any line of the input file, the function writes the same content as the input file to the output file.
It doesn't. Note that in the script, the file is called ".\ouput.txt", while you posted a file "output.txt", which is probably the result from an earlier test. Run it again without specifying a target file, and you won't see any output.
PS C:\EE\29165192> . .\Select-MyString.ps1
PS C:\EE\29165192> gc .\input.txt
toto
toto tata
toto tata titi
tete
PS C:\EE\29165192> Select-MyString -InputFile .\input.txt -Pattern "^tata"
PS C:\EE\29165192> Select-MyString -InputFile .\input.txt -Pattern "^tete"
tete
PS C:\EE\29165192>

This version supports:
1. specifying no output file or directory, and the output will be sent to the pipeline.
2. specifying an output file using -OutputFile
3. specifying an output directory where an output file with auto-generated name will be created.
Function Select-MyString {
[CmdletBinding(DefaultParameterSetName='ToPipeline')]
Param(
	[Parameter(Position=0, Mandatory=$true)]
	[String]$Pattern,
	[Parameter(Position=1, Mandatory=$true)]
	[String]$InputFile,
	[Parameter()]
	[Switch]$NotMatch,
	[Parameter(Mandatory=$true, ParameterSetName='ToFile')]
	[ValidateScript({Test-Path -Path ([IO.Path]::GetDirectoryName($_)) -PathType Container})]
	[String]$OutputFile,
	[Parameter(Mandatory=$true, ParameterSetName='ToDirectory')]
	[ValidateScript({Test-Path -Path $_ -PathType Container})]
	[String]$OutputDirectory,
	[Parameter()]
	[Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding]$Encoding = 'ASCII'
)
	$splat = @{
		Path = $InputFile
		Pattern = $Pattern
		NotMatch = $NotMatch.ToBool()
		Encoding = $Encoding
	}
	If ($PSCmdlet.ParameterSetName -eq 'ToPipeline') {
		Select-String @splat | Select-Object -ExpandProperty Line
	} Else {
		If ($PSCmdlet.ParameterSetName -eq 'ToDirectory') {
			$OutputFile = Join-Path -Path $OutputDirectory -ChildPath "$(Get-Date -Format 'yyyyMMdd_HHmmss')_output$([IO.Path]::GetExtension($InputFile))"
		}
		Select-String @splat | Select-Object -ExpandProperty Line | Set-Content -Path $OutputFile -Encoding $Encoding
	}
}
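
The auto-generated file name in the ToDirectory case can be looked at in isolation. A minimal sketch, assuming a hypothetical input path and using the temp folder purely as an example target directory:

```powershell
# Hypothetical input path; only its extension is used.
$inputFile = '.\input.txt'

# yyyyMMdd_HHmmss: four-digit year, month (MM), day, then 24-hour time with minutes (mm).
$stamp = Get-Date -Format 'yyyyMMdd_HHmmss'

# ${stamp} needs braces here; "$stamp_output" would refer to a variable named stamp_output.
$fileName = "${stamp}_output$([IO.Path]::GetExtension($inputFile))"

# Join-Path only builds the path string; nothing is created on disk.
$outputFile = Join-Path -Path ([IO.Path]::GetTempPath()) -ChildPath $fileName
```

Because the stamp has a resolution of one second, two runs within the same second would produce the same file name.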

Luis Diaz, IT consultant

Author

Commented:
Thank you oBdA, I tested and it works.

I will keep the question active for a moment, as I have a last requirement which is related to the question:
https://www.experts-exchange.com/questions/29166752/Powershell-log-file-function.html

Once I have the revised function, I will post it in order to cover the last requirement:

Write to a log file in the following cases:
-output file has been generated (the pattern retains lines from the input file)
-output file hasn't been generated (the pattern doesn't retain any lines from the input file)

Thank you for your help.
Luis Diaz, IT consultant

Author

Commented:
Now that we have the full log function, I was wondering if we can add it to the previous script in order to cover these cases:

-output file has been generated (the pattern retains lines from the input file)
-output file hasn't been generated (the pattern doesn't retain any lines from the input file)

Log function:

$dailyAppend = $True # or $False to omit the yyyyMMdd_ prefix
$currentDir = Split-Path $script:MyInvocation.MyCommand.Path
$logFile = "$($currentDir)\{0}log-{1}.log" -f $(If ($dailyAppend) {Get-Date -Format 'yyyyMMdd_'} Else {''}), [IO.Path]::GetFileNameWithoutExtension($MyInvocation.MyCommand.Name)

Function Write-Log {
	Param([Parameter(ValueFromPipeline=$true)][string]$Message)
	Process {
		 Add-Content -Value "[$(Get-Date -Format 'yyyyMMdd_HHmmss')] $($Message)" -Path $Script:logFile -PassThru |
			Write-Host
	}
}
Write-Log "Example of log function"

Most Valuable Expert 2018
Distinguished Expert 2018
Commented:
Not exactly certain I completely understand you, but Select-MyString will now by default return a boolean based on whether the pattern was found (or not found if -NotMatch was used), and act accordingly.
If you want the actual results (which may be empty) sent to the pipeline instead, you can use the -PassThru argument.
To avoid confusion like the one from above, an existing OutputFile will now be deleted if no results were found.
$dailyAppend = $True # or $False to omit the yyyyMMdd_ prefix
$currentDir = Split-Path $script:MyInvocation.MyCommand.Path
$logFile = "$($currentDir)\{0}log-{1}.log" -f $(If ($dailyAppend) {Get-Date -Format 'yyyyMMdd_'} Else {''}), [IO.Path]::GetFileNameWithoutExtension($MyInvocation.MyCommand.Name)

Function Write-Log {
	Param([Parameter(ValueFromPipeline=$true)][string]$Message)
	Process {
		 Add-Content -Value "[$(Get-Date -Format 'yyyyMMdd_HHmmss')] $($Message)" -Path $Script:logFile -PassThru |
			Write-Host
	}
}

Function Select-MyString {
[CmdletBinding(DefaultParameterSetName='ToPipeline')]
Param(
	[Parameter(Position=0, Mandatory=$true)]
	[String]$Pattern,
	[Parameter(Position=1, Mandatory=$true)]
	[String]$InputFile,
	[Parameter()]
	[Switch]$NotMatch,
	[Parameter(Mandatory=$true, ParameterSetName='ToFile')]
	[ValidateScript({Test-Path -Path ([IO.Path]::GetDirectoryName($_)) -PathType Container})]
	[String]$OutputFile,
	[Parameter(Mandatory=$true, ParameterSetName='ToDirectory')]
	[ValidateScript({Test-Path -Path $_ -PathType Container})]
	[String]$OutputDirectory,
	[Parameter()]
	[Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding]$Encoding = 'ASCII',
	[Parameter()]
	[Switch]$PassThru
)
	$splat = @{
		Path = $InputFile
		Pattern = $Pattern
		NotMatch = $NotMatch.ToBool()
		Encoding = $Encoding
	}
	$result = Select-String @splat | Select-Object -ExpandProperty Line
	If ($PSCmdlet.ParameterSetName -ne 'ToPipeline') {
		If ($PSCmdlet.ParameterSetName -eq 'ToDirectory') {
			$OutputFile = Join-Path -Path $OutputDirectory -ChildPath "$(Get-Date -Format 'yyyyMMdd_HHmmss')_output$([IO.Path]::GetExtension($InputFile))"
		}
		If ($null -ne $result) {
			$result | Set-Content -Path $OutputFile -Encoding $Encoding
		} ElseIf (Test-Path -Path $OutputFile) {
			Remove-Item -Path $OutputFile -Force
		}
	}
	If ($PassThru) {
		$result
	} Else {
		$null -ne $result
	}
}

$found = Select-MyString -InputFile .\input.txt -OutputFile .\output.txt -Pattern "^tata"
If ($found) {
	Write-Log "Pattern found in input file."
} Else {
	Write-Log "Pattern not found in input file."
}

Luis Diaz, IT consultant

Author

Commented:
I tested with -OutputDir instead of -OutputFile, and with patterns that match and don't match, and it works!
