Powershell script to remove any line with extra double quote "

I have a text file that need to import into database, sometimes the import failed due to extra double quote found in the line, so before performing the import how do I use PS script to scan every line and remove unwanted extra doule quote.

as we seen below, between delimiter "$" , we have extra double quote in "Maxis Segar ", this is only example but we have many like this in the text lines.

"$"Limited  Company "Maxis Segar ""$"
motioneyeAsked:
Who is Participating?
 
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
Best if you try yourself on a small file, by removing all replaces but the first, then add the next etc., and view on screen.

  1. -replace '"\$"', '|' : Replace all "$" by |. I assume there will never be a pipe symbol in the data, so we can use that as field delimiter.
    The resulting line will have something like "data|data "quoted inline data"|data"$.
    Note the trailing dollar sign - it should not be there for correct CSV, and we will need to take care of it later.
    We need to escape the dollar sign with a backslash as dollar has a special meaning in regular expressions.
  2. -replace '"', '': Remove the remaining double quotes.
    Result: data|data quoted inline data|data$
  3. -replace '\|', '"$"': Replace the pipe back with the original delimiter "$". Again, pipe has a special regexp meaning and needs to be escaped.
    Result: data"$"data quoted inline data"$"data$
  4. -replace '(.+)\$', '"$1"$': This just adds the missing double quote in front of the trailing dollar sign.
    Result: data"$"data quoted inline data"$"data"$
0
 
LearnctxEngineerCommented:
There's probably a bunch of ways to do this. But here's a quickie.

foreach ($line in $file)
{
	$line -replace '("\$"[\w\s]*)(")?([\w\s]+)?(")?("\$")','$1$3$5'
}

Open in new window

It is matching your specific example patterns though. EG:

"$"Limited  Company "Maxis Segar ""$"
"$"Limited  Company Maxis Segar "$"
"$"Blah Blah "$"
"$"sdfsdfsdf "Blah Blah" "$"
0
 
Benjamin VoglarIT ProCommented:
$test = Get-Content C:\IT\querry.txt

$newtest = $test -replace '""' , '"'

$newtest > c:\it\newquerry.txt
0
WEBINAR: 10 Easy Ways to Lose a Password

Join us on June 27th at 8 am PDT to learn about the methods that hackers use to lift real, working credentials from even the most security-savvy employees. We'll cover the importance of multi-factor authentication and how these solutions can better protect your business!

 
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
Benjamin, it is not about replacing two successive double quotes only - each double quote inside of the delimiters should get removed.

Another way to do it is to remove the delimiters, remove double quotes, then add back delimiters:
Get-Content file.txt -replace '"\$"', '' -replace '"', '' -replace '(.+)', '"$"$1"$"' | Out-File file2.txt

Open in new window

There are no real pros or cons compared to -replace '("\$"[\w\s]*)(")?([\w\s]+)?(")?("\$")','$1$3$5' or similar regular expressions.
0
 
motioneyeAuthor Commented:
Hi,

I'm trying with one of the example script here but getting error as below.

Windows PowerShell
Copyright (C) 2014 Microsoft Corporation. All rights reserved.

PS C:\Users\BDXTY> Get-Content D:\Programs\actual\Folder3\Output\SolidFile.csv -replace '"\$"', '' -replace '"', '' -replace '(.+)', '"$"$1"$"' | Out-File D:\Programs\actual\Folder3\Output\SolidFile.csv
Get-Content : A parameter cannot be found that matches parameter name 'replace'.
At line:1 char:47
+ Get-Content D:\Programs\actual\Folder3\Output\SolidFile.csv -replace '"\$"', '' -replace '"',  ...
+                                               ~~~~~~~~
    + CategoryInfo          : InvalidArgument: (:) [Get-Content], ParameterBindingException
    + FullyQualifiedErrorId : NamedParameterNotFound,Microsoft.PowerShell.Commands.GetContentCommand

PS C:\Users\BDXTY>
0
 
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
Sorry, forgot to copy the parens over to EE:
(Get-Content D:\Programs\actual\Folder3\Output\SolidFile.csv) -replace '"\$"', '' -replace '"', '' -replace '(.+)', '"$"$1"$"' | Out-File D:\Programs\actual\Folder3\Output\SolidFile.csv

Open in new window

Also,  be careful, you are overwriting the original file. Dangerous if in a pipe, which reads the file content while it processes (and replaces) it - leading to an empty output file ... The parens take care of that, as file content is first read in completely before getting processed.
0
 
motioneyeAuthor Commented:
Hi Qlemo.
The script works but somehow it removed all double quote for each column delimeter, from the example below.

"$"Limited  Company "Maxis Segar ""$"

It should be like this

"$"Limited  Company Maxis Segar "$"
0
 
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
With above input it generates the above output. If it does not for you, can you please post an input and output file, as you get it?
0
 
motioneyeAuthor Commented:
Hi Qlemo
You can try with this input file attached here, What I would like this scripts does for me is to preserve all "$"  but only remove those extra double quote within a delimiter.
input.csv
0
 
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
This is the first line (the header) of the input file:
"buildingName"$"c"$"cn"$"co"$"dn"$"employeeNumber"$"employeeType"$"givenName"$"mxsBusinessRegion"$"mxsBusinessRegionAcronym"$"mxsCrossCompanyManager"$"mxsExpectedEndDate"$"mxsGlobalID"$"mxsJobFamily"$"mxsLocationCode"$"mxsOrganizationChart"$"mxsOrganizationChartAcronym"$"mxsStartDate"$"mxsStatus"$"mxsTerminationDate"$"l"$"mail"$"mail"$"mail"$"mail"$"mail"$"mail"$"mail"$"mail"$"mail"$"mail"$"manager"$"managerEmployeeNumber"$"mobile"$"ntUserDomainId"$"o"$"sn"$"telephoneNumber"$"uid"$

Open in new window

and that is something completely different (and unexpected): The delimiter is a dollar sign only, the double quotes are enclosing each field, but embedded quotes are not doubled. Even worse, the header does not contain unique column names. Whoever creates this file: shame on you!

This should work:
(get-content .\input.csv) -replace '"\$"', '|' -replace '"', '' -replace '\|', '"$"' -replace '(.+)\$', '"$1"$' | Out-File .\input.csv

Open in new window

0
 
motioneyeAuthor Commented:
Hi,
One more issues that  puzzled me now, I tried with creating many entries in the text file by simply duplicating the 1st line in text file to multiple entry to be exact 1000 duplicate line. I ran the command again, at this time the size of the text file was double, as shown in the screenshot, any reasons why this happened?
Capture.PNG
0
 
motioneyeAuthor Commented:
Hi Qlemo,
I managed to resolved this, by adding -encoding ASCII, than the file has the same size as the original file.
Btw, don't you mind to explain how this works at every -replace ? what actually it does here ...

-replace '"\$"', '|' -replace '"', '' -replace '\|', '"$"' -replace '(.+)\$', '"$1"$'
0
 
motioneyeAuthor Commented:
Thanks Qlemo, you always be  nice and helping us here :)
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.