Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Powershell script to remove any line with extra double quote "

Posted on 2016-08-19
13
Medium Priority
?
296 Views
Last Modified: 2016-08-29
I have a text file that need to import into database, sometimes the import failed due to extra double quote found in the line, so before performing the import how do I use PS script to scan every line and remove unwanted extra doule quote.

as we seen below, between delimiter "$" , we have extra double quote in "Maxis Segar ", this is only example but we have many like this in the text lines.

"$"Limited  Company "Maxis Segar ""$"
0
Comment
Question by:motioneye
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
13 Comments
 
LVL 17

Expert Comment

by:Learnctx
ID: 41762150
There's probably a bunch of ways to do this. But here's a quickie.

foreach ($line in $file)
{
	$line -replace '("\$"[\w\s]*)(")?([\w\s]+)?(")?("\$")','$1$3$5'
}

Open in new window

It is matching your specific example patterns though. EG:

"$"Limited  Company "Maxis Segar ""$"
"$"Limited  Company Maxis Segar "$"
"$"Blah Blah "$"
"$"sdfsdfsdf "Blah Blah" "$"
0
 
LVL 12

Expert Comment

by:Benjamin Voglar
ID: 41762168
$test = Get-Content C:\IT\querry.txt

$newtest = $test -replace '""' , '"'

$newtest > c:\it\newquerry.txt
0
 
LVL 71

Expert Comment

by:Qlemo
ID: 41762297
Benjamin, it is not about replacing two successive double quotes only - each double quote inside of the delimiters should get removed.

Another way to do it is to remove the delimiters, remove double quotes, then add back delimiters:
Get-Content file.txt -replace '"\$"', '' -replace '"', '' -replace '(.+)', '"$"$1"$"' | Out-File file2.txt

Open in new window

There are no real pros or cons compared to -replace '("\$"[\w\s]*)(")?([\w\s]+)?(")?("\$")','$1$3$5' or similar regular expressions.
0
Are You Ready for GDPR?

With the GDPR deadline set for May 25, 2018, many organizations are ill-prepared due to uncertainty about the criteria for compliance. According to a recent WatchGuard survey, a staggering 37% of respondents don't even know if their organization needs to comply with GDPR. Do you?

 

Author Comment

by:motioneye
ID: 41765051
Hi,

I'm trying with one of the example script here but getting error as below.

Windows PowerShell
Copyright (C) 2014 Microsoft Corporation. All rights reserved.

PS C:\Users\BDXTY> Get-Content D:\Programs\actual\Folder3\Output\SolidFile.csv -replace '"\$"', '' -replace '"', '' -replace '(.+)', '"$"$1"$"' | Out-File D:\Programs\actual\Folder3\Output\SolidFile.csv
Get-Content : A parameter cannot be found that matches parameter name 'replace'.
At line:1 char:47
+ Get-Content D:\Programs\actual\Folder3\Output\SolidFile.csv -replace '"\$"', '' -replace '"',  ...
+                                               ~~~~~~~~
    + CategoryInfo          : InvalidArgument: (:) [Get-Content], ParameterBindingException
    + FullyQualifiedErrorId : NamedParameterNotFound,Microsoft.PowerShell.Commands.GetContentCommand

PS C:\Users\BDXTY>
0
 
LVL 71

Expert Comment

by:Qlemo
ID: 41765126
Sorry, forgot to copy the parens over to EE:
(Get-Content D:\Programs\actual\Folder3\Output\SolidFile.csv) -replace '"\$"', '' -replace '"', '' -replace '(.+)', '"$"$1"$"' | Out-File D:\Programs\actual\Folder3\Output\SolidFile.csv

Open in new window

Also,  be careful, you are overwriting the original file. Dangerous if in a pipe, which reads the file content while it processes (and replaces) it - leading to an empty output file ... The parens take care of that, as file content is first read in completely before getting processed.
0
 

Author Comment

by:motioneye
ID: 41766422
Hi Qlemo.
The script works but somehow it removed all double quote for each column delimeter, from the example below.

"$"Limited  Company "Maxis Segar ""$"

It should be like this

"$"Limited  Company Maxis Segar "$"
0
 
LVL 71

Expert Comment

by:Qlemo
ID: 41766696
With above input it generates the above output. If it does not for you, can you please post an input and output file, as you get it?
0
 

Author Comment

by:motioneye
ID: 41768223
Hi Qlemo
You can try with this input file attached here, What I would like this scripts does for me is to preserve all "$"  but only remove those extra double quote within a delimiter.
input.csv
0
 
LVL 71

Assisted Solution

by:Qlemo
Qlemo earned 2000 total points
ID: 41768393
This is the first line (the header) of the input file:
"buildingName"$"c"$"cn"$"co"$"dn"$"employeeNumber"$"employeeType"$"givenName"$"mxsBusinessRegion"$"mxsBusinessRegionAcronym"$"mxsCrossCompanyManager"$"mxsExpectedEndDate"$"mxsGlobalID"$"mxsJobFamily"$"mxsLocationCode"$"mxsOrganizationChart"$"mxsOrganizationChartAcronym"$"mxsStartDate"$"mxsStatus"$"mxsTerminationDate"$"l"$"mail"$"mail"$"mail"$"mail"$"mail"$"mail"$"mail"$"mail"$"mail"$"mail"$"manager"$"managerEmployeeNumber"$"mobile"$"ntUserDomainId"$"o"$"sn"$"telephoneNumber"$"uid"$

Open in new window

and that is something completely different (and unexpected): The delimiter is a dollar sign only, the double quotes are enclosing each field, but embedded quotes are not doubled. Even worse, the header does not contain unique column names. Whoever creates this file: shame on you!

This should work:
(get-content .\input.csv) -replace '"\$"', '|' -replace '"', '' -replace '\|', '"$"' -replace '(.+)\$', '"$1"$' | Out-File .\input.csv

Open in new window

0
 

Author Comment

by:motioneye
ID: 41769851
Hi,
One more issues that  puzzled me now, I tried with creating many entries in the text file by simply duplicating the 1st line in text file to multiple entry to be exact 1000 duplicate line. I ran the command again, at this time the size of the text file was double, as shown in the screenshot, any reasons why this happened?
Capture.PNG
0
 

Author Comment

by:motioneye
ID: 41769924
Hi Qlemo,
I managed to resolved this, by adding -encoding ASCII, than the file has the same size as the original file.
Btw, don't you mind to explain how this works at every -replace ? what actually it does here ...

-replace '"\$"', '|' -replace '"', '' -replace '\|', '"$"' -replace '(.+)\$', '"$1"$'
0
 
LVL 71

Accepted Solution

by:
Qlemo earned 2000 total points
ID: 41770003
Best if you try yourself on a small file, by removing all replaces but the first, then add the next etc., and view on screen.

  1. -replace '"\$"', '|' : Replace all "$" by |. I assume there will never be a pipe symbol in the data, so we can use that as field delimiter.
    The resulting line will have something like "data|data "quoted inline data"|data"$.
    Note the trailing dollar sign - it should not be there for correct CSV, and we will need to take care of it later.
    We need to escape the dollar sign with a backslash as dollar has a special meaning in regular expressions.
  2. -replace '"', '': Remove the remaining double quotes.
    Result: data|data quoted inline data|data$
  3. -replace '\|', '"$"': Replace the pipe back with the original delimiter "$". Again, pipe has a special regexp meaning and needs to be escaped.
    Result: data"$"data quoted inline data"$"data$
  4. -replace '(.+)\$', '"$1"$': This just adds the missing double quote in front of the trailing dollar sign.
    Result: data"$"data quoted inline data"$"data"$
0
 

Author Closing Comment

by:motioneye
ID: 41775772
Thanks Qlemo, you always be  nice and helping us here :)
0

Featured Post

Are your AD admin tools letting you down?

Managing Active Directory can get complicated.  Often, the native tools for managing AD are just not up to the task.  The largest Active Directory installations in the world have relied on one tool to manage their day-to-day administration tasks: Hyena. Start your trial today.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Previously, on our Nano Server Deployment series, we've created a new nano server image and deployed it on a physical server in part 2. Now we will go through configuration.
In this post we will be converting StringData saved within a text file into a hash table. This can be further used in a PowerShell script for replacing settings that are dynamic in nature from environment to environment.
Exchange organizations may use the Journaling Agent of the Transport Service to archive messages going through Exchange. However, if the Transport Service is integrated with some email content management application (such as an antispam), the admini…
In response to a need for security and privacy, and to continue fostering an environment members can turn to for support, solutions, and education, Experts Exchange has created anonymous question capabilities. This new feature is available to our Pr…

722 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question