Remove special characters from a .CSV file

I want a script to remove all the special characters from a .CSV file. Thanks in advance.
The contents of the file look like this:

WF_CRIT_INCL_CD,Übereinst. erf. (Schreibw. e.),EQ_THIS_NC,German,Common,7,N,Y,Y,All,
WF_CRIT_INCL_CD,K.Üb.einst.mgl.(G/Kl.schr.ign),NE_ALL_NC,German,Common,8,N,Y,Y,All,
WF_CRIT_INCL_CD,Größer als,GREATER_THAN,German,Common,9,N,Y,Y,All,
WF_CRIT_INCL_CD,Kleiner als,LESS_THAN,German,Common,10,N,Y,Y,All,
WF_CRIT_INCL_CD,Zwischen,BETWEEN,German,Common,11,N,Y,Y,All
WF_CRIT_INCL_CD,Nicht zwischen,NOT_BETWEEN,German,Common,12,N,Y,Y,All
WF_CRIT_INCL_CD,Gleich Null,IS_NULL,German,Common,13,N,Y,Y,All
WF_CRIT_INCL_CD,Ungleich Null,IS_NOT_NULL,German,Common,14,N,Y,Y,All
WF_CRIT_TYPE_CD,Prozesseigenschaft,PROPERTY,German,Workflow,1,N,Y,Y,All
WF_CRIT_TYPE_CD,Task-Eigenschaft,TASK_PROPERTY,German,Taskflow,1,N,Y,Y,All
WF_CRIT_TYPE_CD,Business Component,BUSCOMP,German,Common,2,N,Y,Y,All
WF_CRIT_TYPE_CD,Applet,APPLET,German,Common,3,N,Y,Y,All
WF_CRIT_TYPE_CD,Ausdruck,EXPRESSION,German,Common,4,N,Y,Y,All
WF_DATA_TYPE_CD,String,VARCHAR,German,Common,1,N,Y,Y,All
WF_DATA_TYPE_CD,Nu
Asked by: conversekid
 
ozo Commented:
which characters are special?
 
conversekid (Author) Commented:
There are no special characters in this example; I just wanted to show the format. We need to find and remove all special characters, if any.
 
ozo Commented:
what are "special characters"?
 
conversekid (Author) Commented:
Hi ,

The following are special characters:

@"?/>.<,:;""'{[}]|\\+=_-)(*&^%$#@!~`";  Thanks
 
zmo Commented:
Well then:

sed 's/[]@?\/>.<,:;{[}|\\+=_)(*&^%$#!~`"-]//g' file > file.nospecialchar
 
zmo Commented:
With the correct escaping, of course :-S

Or maybe you could just remove all non-word characters, but as I've seen, your file also has non-ASCII characters that should count as valid.
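
One way to sketch the "remove all non-word characters" idea while leaving non-ASCII bytes alone is to invert the match and delete every byte that is not on an allow list. The function name keep_word_chars and the decision to also keep commas, periods, and whitespace are assumptions made for illustration, not something from this thread. The octal range \200-\377 is intended to cover the bytes of UTF-8 encoded letters such as Ü and ö.

```shell
#!/bin/sh
# Hypothetical helper: keep letters, digits, whitespace, commas, periods,
# and all bytes >= 0x80 (non-ASCII); delete everything else.
# LC_ALL=C makes tr work byte by byte, independent of the caller's locale.
keep_word_chars() {
    LC_ALL=C tr -cd '[:alnum:][:space:],.\200-\377'
}

# Example on a line shaped like the sample data.
printf '%s\n' 'WF_CRIT_INCL_CD,Kleiner als,LESS_THAN!' | keep_word_chars
```

Because commas stay on the allow list, the number of CSV fields per row is unchanged.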
 
zmo Commented:
sed 's/[][!@#$%^&*(){}",.><~`\\]//g' file | sed "s/'//g" > file.nospecialchar

would be a solution
 
Brian Utterback (Principle Software Engineer) Commented:
You can do the same thing more efficiently with the tr command:

/usr/bin/tr -d '[:punct:]' < file > file.nospecialchar

This assumes you are using the "C" locale. Run the locale command to check if you are not sure.
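
A minimal sketch of the tr approach with the locale pinned inside the command, so the result does not depend on what the locale command reports. The function name strip_punct is made up for this example.

```shell
#!/bin/sh
# Hypothetical wrapper: delete ASCII punctuation from stdin.
# LC_ALL=C applies only to this tr invocation.
strip_punct() {
    LC_ALL=C tr -d '[:punct:]'
}

# Example on one line from the sample file.
printf '%s\n' 'WF_CRIT_TYPE_CD,Task-Eigenschaft,TASK_PROPERTY,German,Taskflow,1,N,Y,Y,All' | strip_punct
# prints: WFCRITTYPECDTaskEigenschaftTASKPROPERTYGermanTaskflow1NYYAll
```

Note that commas and underscores are punctuation too, so this collapses every row into a single field. That may or may not be what you want for a CSV file.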


 
 
conversekid (Author) Commented:
Hi,

Great!  Is it possible for me to have a small script so that I can change the characters to be removed from the file?
 
zmo Commented:
I'd advise you to use blu's suggestion. You can't change what counts as "punct"uation there, but it will respect your locale.

As for the script below, be careful: some characters need to be backslashed, others do not.

Use the script like this:

script.sh source.csv destination.file

#!/bin/sh
# Usage: script.sh source.csv destination.file
# The first sed deletes the listed characters; the second deletes single quotes.
sed 's/[][!@#$%^&*(){}",.><~`\\]//g' "$1" | sed "s/'//g" > "$2"

 
Brian Utterback (Principle Software Engineer) Commented:
Well, in the case of tr with the character class '[:punct:]', the answer is no, because each locale defines its own set of characters that are recognized as punctuation. However, you can give the characters to the tr command in the same way that zmo did to the sed command. Just replace the [:punct:] part with the list of characters you want to delete.

It can get tricky depending on your shell and the characters you want to remove. Some of the characters may need to be escaped. You may even need to use the octal escape form. See the tr man page.
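
As a rough sketch of replacing [:punct:] with an explicit list: the set below uses octal escapes for the characters that are awkward to quote (\047 is the single quote, \133 and \135 are the square brackets), writes the backslash as \\, and puts the hyphen last so it is not read as a range. The names SPECIAL and remove_chars are invented for this example, and leaving the comma and period out of the set is an assumption made here so that the CSV structure survives; add them if you really want them gone.

```shell
#!/bin/sh
# Characters to delete, in tr(1) set syntax (hypothetical, edit to taste).
# Comma and period are deliberately NOT included.
SPECIAL='@?/><:;{}|+=_()*&^%$#!~`"\\\047\133\135-'

# Hypothetical helper: delete the characters in set $1 from stdin.
remove_chars() {
    LC_ALL=C tr -d "$1"
}

# Example: commas and periods are kept, everything in SPECIAL is removed.
printf '%s\n' 'a@b,c#d.e[f]' | remove_chars "$SPECIAL"
# prints: ab,cd.ef
```

To run it on a file: remove_chars "$SPECIAL" < source.csv > destination.file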
 
conversekid (Author) Commented:
Hi, I definitely agree with blu's suggestion. I am trying to turn it into a script that I can give to users.

zmo, if possible, can you help me understand why we are using sed twice? Also, is it possible to ask the user which characters to remove and then remove those? Only if it's easy to do it that way.
 
zmo Commented:
I used sed twice because, for some weird reason, I couldn't put the escaped single quote character in the first sed.

You can use the "read" command (man read) to get the characters, but be careful with what you accept there, because some characters will need to be escaped and others not.
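
One possible way around the escaping problem, sketched under a few assumptions: convert whatever the user types into octal escapes before handing it to tr, so no character in the input can be misread as a range or an escape. The function names to_octal_set and strip_chars are made up for this example. It is only meant for ASCII characters, and it relies on tr treating an octal-escaped hyphen as a literal, which GNU tr does; other tr implementations are untested here.

```shell
#!/bin/sh
# Hypothetical helper: turn a literal string into a tr(1) set made only
# of octal escapes, e.g. '@-' becomes '\100\055'.
to_octal_set() {
    set_=$(printf '%s' "$1" | od -An -v -t o1 | tr -s ' \n' '\\\\')
    printf '%s' "${set_%\\}"    # drop the trailing backslash left by od's newline
}

# Hypothetical helper: delete the literal characters in $1 from stdin.
strip_chars() {
    LC_ALL=C tr -d "$(to_octal_set "$1")"
}

# Interactive use: script.sh source.csv destination.file
if [ "$#" -eq 2 ]; then
    printf 'Characters to remove: ' >&2
    IFS= read -r chars          # -r keeps backslashes as typed
    strip_chars "$chars" < "$1" > "$2"
fi
```

With this, a user can type something like @#$%- at the prompt without worrying about which characters need a backslash.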
 
conversekid (Author) Commented:
Thanks to Zmo and blu for your quick response..!!!!!!!!!