Link to home
Start Free TrialLog in
Avatar of jb428j
jb428j

asked on

How do I remove characters from a text file ifrom each row?

Hello Experts,

I am getting better at scripting in powershell, but I am still a novice and I am having problems finding a solution to this one..

I have a group of .txt files exported from SQL Server and delimited with a pipe - I
I am trying to remove all characters and spaces that are before the first | in each row.

Anyone able to help with this one? I could use Textpad, but that wouldn't help me learn this.

Also,  any resource you have on this would be good too. I didn't find anything in google or I just didn't understand what I saw.


Idea of how the data look:


             12345| File Path would sit here | more information about the data | etc |

I need to remove that ID section leaving the pipe. That first column is fixed in the length but not the data.
Avatar of LindyS
LindyS
Flag of United States of America image

If you're using PowerShell to export the field, you could use the split option.

If you have the same number of pipes you want ignored, it would be something like;

{$_."ObjectName".Split("|")[-4]}

That would remove everything after the fourth pipe symbol from the right.
Avatar of GusGallows
In powershell you can use a combination of the indexof() function and the substring() function. The indexof() function will return the position of the first occurance of the "|" in your string. The substring will return the last so many characters of the string. So you figure out what position the object is in, subtract it from the length of the string, and then return those many characters from the right of the string. The code looks as follows:

#Write the string to a variable
$line = "12345| File Path would sit here | more information about the data | etc |"
#grab the length of the string
$len = $line.length
#find the character position of the first "|"
$PipePos = $line.indexof("|")
#subtract the position of the first character from the length of the string
$retChars = ($len - $PipePos)
#return everything except what is before the first "|"
$return = $line.Substring($len - $retchars, $retChars)
$return

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of GusGallows
GusGallows
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Here are two more approaches to the problem.

 The first is based on the assumption that you have a fixed number of characters that you want to prune from the beginning of each line, the second goes looking for the | symbol.

Each of these could be put in to a ps1 file and the file called with the appropriate filenames, or alternatively, you could put the code in to a function in a profile script.
Param ($SourceFile,$OutputFile)
{
Get-Content $SourceFile | ForEach-Object {$_.SubString(5)} | Out-File $OutputFile
}

Param ($SourceFile,$OutputFile)
{
Get-Content $SourceFile | ForEach-Object {$_.SubString($_.IndexOf('|'))} | Out-File $OutputFile
}

Open in new window

A more complex but more flexible approach would be to define a filter in a profile script that takes the content of a file from the pipeline, prunes each line in turn and outputs the pruned lines to the pipeline to provide flexibility.  In the attached code I have written the filter and provided an example of how it could be used.  To provide greater flexibility there are parameters to define whether you want the left or the right part of the string, what character to use as the delimiter (I have set a default to the comma but in your environment you could change it to the pipe symbol), and a switch to indicate whether or not you want to include the delimiter character.

I have included the switch parameter in the example with a value of true even though I have set the default value of true to demonstrate how a switch can be used.
filter Split-StringAtCharacter
{ 
Param
(
	[char]$CharToSplitAt = ',',
	[switch]$IncludeChar = $true,
	[ValidateSet("Right","Left")]
	[string]$Mode = "right"
)
    try
    {
    	$stringToProcess = $_
        $charPos = $stringToProcess.IndexOf($CharToSplitAt)
        $subString = ""
        switch ($mode)
        {
        	"right" 
        		{
        			if (-not $IncludeChar)
        			{
        				$charPos++
        			}
        			$subString = $stringToProcess.SubString($charPos)
        		}
        	"left" 
        		{
        			if ($IncludeChar)
        			{
        				$charPos++
        			}
        			$subString = $stringToProcess.SubString(0,$charPos)
        		}
        }
        Write-Output $subString
    }
    catch
    {
        Write-Warning "An error occurred processing: $stringToProcess `n$_"
    }
}	        

Get-Content "SampleStrings.txt" | Split-StringAtCharacter -CharToSplitAt '|' -Mode right -IncludeChar:$true | Out-File "SampleStringsOut.txt"
Get-Content "SampleStringsOut.txt"

Open in new window

An improved version of the code more in keeping with how code should be written for processing pipeline data is attached.  The idea of the BEGIN, PROCESS and END blocks is that the cmdlet is designed to be able to initialise data, process each item from the pipeline in turn and then clean up at the conclusion.
function Split-StringAtCharacter
{
Param
(
	[char]$CharToSplitAt = ',',
	[switch]$IncludeChar = $true,
	[ValidateSet("Right","Left")]
	[string]$Mode = "right"
)
	BEGIN 
	{
		#Initialisation code
	}
	
	PROCESS 
	{
		try
	    {
	    	$stringToProcess = $_
	        $charPos = $stringToProcess.IndexOf($CharToSplitAt)
	        $subString = ""
	        switch ($mode)
	        {
	        	"right" 
	        		{
	        			if (-not $IncludeChar)
	        			{
	        				$charPos++
	        			}
	        			$subString = $stringToProcess.SubString($charPos)
	        		}
	        	"left" 
	        		{
	        			if ($IncludeChar)
	        			{
	        				$charPos++
	        			}
	        			$subString = $stringToProcess.SubString(0,$charPos)
	        		}
	        }
	        Write-Output $subString
	    }
	    catch
	    {
	        Write-Warning "An error occurred processing: $stringToProcess `n$_"
	    }
	}
	
	END 
	{
		#Cleanup code
	}
}

Get-Content "SampleStrings.txt" | Split-StringAtCharacter -CharToSplitAt '|' -Mode left -IncludeChar:$true | Out-File "SampleStringsOut.txt"
Get-Content "SampleStringsOut.txt"

Open in new window

Avatar of jb428j
jb428j

ASKER

Sorry for the late response all. Work has been in overload lately. I will do my best to see what works for my particular situation.  Again, thank you all.
Avatar of jb428j

ASKER

This is exactly what I was looking for and the added comments in the script are amazing for helping me understand how it all worked. This script will go along way as I really start to understand these commands so that I may take this information incorporate it into old and new scripts.

Thank you ALL  for your help. I really do appreciate all of your time.