remove special characters from a .CSV file

Posted on 2008-10-23
Last Modified: 2013-12-27
I want a script to remove all the special characters from a .CSV file. Thanks in advance.
The contents of the file look like this:

WF_CRIT_INCL_CD,Übereinst. erf. (Schreibw. e.),EQ_THIS_NC,German,Common,7,N,Y,Y,All,
WF_CRIT_INCL_CD,K.Üb.einst.mgl.(G/Kl.schr.ign),NE_ALL_NC,German,Common,8,N,Y,Y,All,
WF_CRIT_INCL_CD,Größer als,GREATER_THAN,German,Common,9,N,Y,Y,All,
WF_CRIT_INCL_CD,Kleiner als,LESS_THAN,German,Common,10,N,Y,Y,All,
WF_CRIT_INCL_CD,Zwischen,BETWEEN,German,Common,11,N,Y,Y,All
WF_CRIT_INCL_CD,Nicht zwischen,NOT_BETWEEN,German,Common,12,N,Y,Y,All
WF_CRIT_INCL_CD,Gleich Null,IS_NULL,German,Common,13,N,Y,Y,All
WF_CRIT_INCL_CD,Ungleich Null,IS_NOT_NULL,German,Common,14,N,Y,Y,All
WF_CRIT_TYPE_CD,Prozesseigenschaft,PROPERTY,German,Workflow,1,N,Y,Y,All
WF_CRIT_TYPE_CD,Task-Eigenschaft,TASK_PROPERTY,German,Taskflow,1,N,Y,Y,All
WF_CRIT_TYPE_CD,Business Component,BUSCOMP,German,Common,2,N,Y,Y,All
WF_CRIT_TYPE_CD,Applet,APPLET,German,Common,3,N,Y,Y,All
WF_CRIT_TYPE_CD,Ausdruck,EXPRESSION,German,Common,4,N,Y,Y,All
WF_DATA_TYPE_CD,String,VARCHAR,German,Common,1,N,Y,Y,All
WF_DATA_TYPE_CD,Nu
Question by:conversekid

Assisted Solution

by:ozo
Which characters are special?

Author Comment

by:conversekid
In this example there are no special characters; I just wanted to show the format. We need to find and remove any special characters that do appear.

Assisted Solution

by:ozo
What are "special characters"?

Author Comment

by:conversekid
Hi,

The following are special characters:

@"?/>.<,:;""'{[}]|\\+=_-)(*&^%$#@!~`";

Thanks

Assisted Solution

by:zmo
Well, then:

cat file | sed 's/[@\?\/>.<,:;{[}]|\\+=_-)(*&^%$#@!~`";]//g'  > file.nospecialchar

Assisted Solution

by:zmo
With the correct escaping, of course :-S

Or maybe you could just remove all non-word characters, but I see your file also contains non-ASCII characters that should count as valid.
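
As a sketch of what "correct escaping" can look like, assuming GNU or BSD tr and the exact character list from the question: the awkward characters are written as octal escapes, which tr understands (\133 is [, \135 is ], \134 is a backslash, \047 is the single quote), and the hyphen goes last so it cannot be read as a range. The function name strip_special is made up for this example.

```shell
#!/bin/sh
# strip_special (hypothetical name): delete the asker's special characters
# from stdin and write everything else to stdout.
# Octal escapes used below: \133 = [   \135 = ]   \134 = \   \047 = '
# The trailing '-' is literal because nothing follows it to form a range.
strip_special() {
    tr -d '@"?/><.,:;{}|+=_)(*&^%$#!~`\133\135\134\047-'
}

# Example use with the file names from this thread:
#   strip_special < file > file.nospecialchar
```

Note that the list includes the comma, which is the CSV field separator, so this also merges all fields on a line into one. Letters such as Ü and ö are kept, since none of them is in the set.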

Assisted Solution

by:zmo
sed 's/[][!@#$%^&*(){}",.><~`]//g' file | sed "s/'//g" > file.nospecialchar

would be a solution.

Accepted Solution

by:blu
You can do the same thing more efficiently with the tr command:

/usr/bin/tr -d '[:punct:]' < file > file.nospecialchar

This assumes you are using the "C" locale. Run the locale command to check if you are not sure.
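
To take the user's locale out of the equation, the locale can be pinned for this one command. A minimal sketch, assuming a POSIX shell; strip_punct is an illustrative name. Putting LC_ALL=C in front of the command overrides every other LC_* setting for that tr run only, so [:punct:] means the ASCII punctuation characters regardless of the environment.

```shell
#!/bin/sh
# strip_punct (illustrative name): delete ASCII punctuation from stdin.
# LC_ALL=C applies only to this tr invocation, so [:punct:] is the
# C-locale punctuation set whatever locale the user has configured.
strip_punct() {
    LC_ALL=C tr -d '[:punct:]'
}

# Example use with the file names from this thread:
#   strip_punct < file > file.nospecialchar
```

[:punct:] includes the comma and the underscore, so on the sample data WF_CRIT_INCL_CD,Kleiner als,LESS_THAN becomes WFCRITINCLCDKleiner alsLESSTHAN. If the field separators must survive, list the characters to delete explicitly instead of using the class.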

Author Comment

by:conversekid
Hi,

Great! Is it possible to have a small script that lets me change which characters are removed from the file?

Assisted Solution

by:zmo
But I'd advise you to use blu's suggestion: you can't change what counts as "punct"uation there, but it will respect your locale.

About the script below: be careful, some characters need to be backslash-escaped and others do not.

Use the script below like this:

script.sh source.csv destination.file
#!/bin/sh
# Usage: script.sh source.csv destination.file
sed 's/[][!@#$%^&*(){}",.><~`]//g' "$1" | sed "s/'//g" > "$2"

Assisted Solution

by:blu
Well, in the case of tr with the character class '[:punct:]' the answer is no, because each locale defines its own set of characters that count as punctuation. However, you give the characters to the tr command in the same way that zmo did to the sed command: just replace the [:punct:] part with the list of characters you want to delete.

It can get tricky depending on your shell and the characters you want to remove. Some of the characters may need to be escaped, and you may even need to use their octal escapes. See the tr man page.
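
Following blu's suggestion, the character list can become a parameter so users can change it without editing the script. A sketch under these assumptions: the script name strip_chars.sh and the function strip_chars are hypothetical, and the list is handed to tr unchanged, so tr's own rules still apply (octal escapes like \047, ranges like a-z, classes like [:punct:]). The -- stops tr from treating a list that starts with - as an option.

```shell
#!/bin/sh
# Hypothetical usage: strip_chars.sh 'CHARS' source.csv destination.file
# Example:            strip_chars.sh '@#$%\047' in.csv out.csv

# strip_chars: delete every character in the set $1 from stdin.
strip_chars() {
    tr -d -- "$1"
}

if [ "$#" -eq 3 ]; then
    strip_chars "$1" < "$2" > "$3"
elif [ "$#" -ne 0 ]; then
    echo "usage: $0 CHARS source.csv destination.file" >&2
    exit 1
fi
```

With no arguments the script only defines strip_chars, which makes it easy to reuse the function from another script.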

Author Comment

by:conversekid
Hi, I definitely agree with blu's suggestion. I am trying to turn it into a script that I can give to users.

zmo, could you help me understand why we are using sed twice? And is it possible to ask the user which characters to remove and then remove them? Only if it's easy to do it that way.

Assisted Solution

by:zmo
I used sed twice for an odd reason: I couldn't get an escaped single-quote character into the first sed.
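
The underlying reason is shell quoting rather than sed: inside a single-quoted shell string nothing can be escaped, not even a single quote. A common workaround is to end the quoted string, add an escaped quote \' and start a new quoted string; the shell joins the pieces into one argument, so one sed call is enough. A sketch that deletes []!@#$%^&*(){}",.><~` plus the single quote (strip_one_sed is an illustrative name):

```shell
#!/bin/sh
# strip_one_sed (illustrative name): one sed call that also deletes '.
# The sed script is three adjacent shell words forming a single argument:
#   's/[][!@#$%^&*(){}",.><~`'   then   \'   then   ']//g'
# In the bracket expression the leading ] is literal, as is every
# other character up to the closing ].
strip_one_sed() {
    sed 's/[][!@#$%^&*(){}",.><~`'\'']//g'
}
```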

You can use the "read" command (see man read) to get the characters, but be careful with what you accept there, because some characters will need to be escaped and others will not.
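
A minimal sketch of that idea, assuming a POSIX shell; the function name ask_and_strip and the prompt text are made up. read -r keeps backslashes as typed, and IFS= keeps leading and trailing blanks, so the characters reach tr exactly as the user entered them. tr then applies its own syntax (ranges such as a-z, escapes such as \047), which is where the escaping caveat still bites.

```shell
#!/bin/sh
# ask_and_strip (hypothetical name): prompt for a set of characters,
# then copy file $1 to file $2 with those characters deleted.
ask_and_strip() {
    printf 'Characters to remove: ' >&2   # prompt on stderr, not in the data
    IFS= read -r chars || return 1        # raw line, nothing stripped
    if [ -z "$chars" ]; then
        echo "nothing to remove" >&2
        return 1
    fi
    tr -d -- "$chars" < "$1" > "$2"
}

# Hypothetical usage: script.sh source.csv destination.file
if [ "$#" -eq 2 ]; then
    ask_and_strip "$1" "$2"
fi
```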

Author Comment

by:conversekid
Thanks to zmo and blu for your quick responses!