kweise
asked on
Clean up numbers using sed
Hi, I have csv files with an issue. I need to clean up the numeric fields (not surrounded by quotes) by removing extra spaces, and moving the possible negative sign from the rear to the front. Unfortunately the data is coming from another system that I can't correct. I'm hoping for some sed commands to clean this up. For example, the following two lines:
"10447","857386007 CR","0001","03346004", 6.25-,"","","404672704 004","04-672704",""
"10486","13766","0001","03 403509", 435.00 ,"","","409D32180 001","09-D32180",""
The first, needs the beginning whitespace removed, and the negative sign moved from the rear of the field (6.25-) to the front (-6.25). The second line would just need the beginning and trailing whitespace removed. So basically, remove beginning/trailing whitespace from a field surrounded by commas and not containing quotes. And move the negative sign, if present. Result of these two lines would be:
"10447","857386007 CR","0001","03346004",-6.2 5,"","","4 04672704 004","04-672704",""
"10486","13766","0001","03 403509",43 5.00 ,"","","409D32180 001","09-D32180",""
My sed/regex skills are very minimal at this point, but maybe there's even a better way to correct this? Thanks for any assistance!
"10447","857386007 CR","0001","03346004", 6.25-,"","","404672704 004","04-672704",""
"10486","13766","0001","03
The first, needs the beginning whitespace removed, and the negative sign moved from the rear of the field (6.25-) to the front (-6.25). The second line would just need the beginning and trailing whitespace removed. So basically, remove beginning/trailing whitespace from a field surrounded by commas and not containing quotes. And move the negative sign, if present. Result of these two lines would be:
"10447","857386007 CR","0001","03346004",-6.2
"10486","13766","0001","03
My sed/regex skills are very minimal at this point, but maybe there's even a better way to correct this? Thanks for any assistance!
ASKER
Thanks, but it didn't seem to do anything at all. A diff between the original and the output shows no changes.
Sorry.
Try this first.
Try this first.
sed 's/ *\([-0-9.]*\)\([-]*\)/\2\1/g' filename
Is this working for you?
sed 's/ *\([-0-9.]*\)\([-]\)/\2\1/ g' filename
sed 's/ *\([-0-9.]*\)\([-]\)/\2\1/
ASKER
Thanks, that fixes the lines with the negative numbers, but the positive ones still have whitespace surrounding them.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Sorry missed the second reply, the second statement fixes the spacing problem, and the third fixes the spacing and negative sign on negative numbers only, but leaves spaces around positive numbers.
ASKER
Wow, you're fast. :- )
Which order am I running which statements now?
Which order am I running which statements now?
ASKER
Nevermind, I got it, awesome, thanks!
Any order, doesn't matter.
Make sure it is doing what you want. Then if you run with -i, it would directly change the file
Make sure it is doing what you want. Then if you run with -i, it would directly change the file
ASKER
Great solution and very quick too!
Glad to help:)
ASKER
Arrgh, I didn't notice, it's also removing spaces from the quoted fields also, can't have that. : -)
Ok, wait a minute, lets fix it.
ASKER
Also, so I can learn, can you basically explain how it's working? This is above my sed level, so would be good to understand it!
Ok, try this one:
sed -e 's/\([,]\)[ \t][ \t]*\([0-9-]\)/\1\2/g' -e 's/\([.0-9]*\)\(-\)/\2\1/g' filename
For understanding, this is the starting point.
The
http://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html
The
http://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html
ASKER
Almost got it, now the final issue is the positive numbers have a trailing whitespace. The negative ones are perfect.
I am so sorry.
Try this one
Try this one
sed -e 's/\([,]\)[ \t][ \t]*\([0-9-]\)/\1\2/g' -e 's/\([.0-9]*\)\(-*\)/\2\1/g'
Feel free to ask any more questions. Understanding RegEx is a lot of practice after learning some basics.
ASKER
Please don't be sorry, you've already given me a ton of help on fixing this! :- )
Did that last command work for you on my two test records? It's still doing the same thing for me, 1 space remains at the end of the positive numbers.
Did that last command work for you on my two test records? It's still doing the same thing for me, 1 space remains at the end of the positive numbers.
Actually, to find a rule was kind of hard - a rule that would only pick the right cases.
Anyhow, please try this one and let me know.
Anyhow, please try this one and let me know.
sed -e 's/\([,]\)[ \t][ \t]*\([0-9-]\)/\1\2/g' -e 's/\([.0-9]*\)\(-*\)/\2\1/g' -e 's/\([0-9]\)[ ][ ]*,/\1,/g'
ASKER
No, I see, at some point, I didn't notice when, it started removing some data between dashes. ie, the line
"10486","13766","0001","03 403509", 435.00 ,"","","409D32180 001","09-D32180",""
is now
"10486","13766","0001","03 403509",43 5.00 ,"","","409D32180 001","-D32180","" - Missing "09" before "-D32180"
Another line with more data has been affected too. Somehow it needs to only change data between commas, without quotes (numeric). This might be too difficult for sed :-)
"10827","799608","0001","0 3401001", 165.00 ," "," ","5011400928 SIEU-300809-3 20101208003","01-140-0928 ","SIEU-300809-3"
is now
"10827","799608","0001","0 3401001",1 65.00,""," ","5011400 928 SIEU--3008093 20101208003","-01-1400928" ,"SIEU--30 08093"
"10486","13766","0001","03
is now
"10486","13766","0001","03
Another line with more data has been affected too. Somehow it needs to only change data between commas, without quotes (numeric). This might be too difficult for sed :-)
"10827","799608","0001","0
is now
"10827","799608","0001","0
Sed is not strong enough. I will have to use Perl one liner. Kindly open anther question for that, if you don't mind.
ASKER
ok, submitted under -
Experts Exchange
Programming
Prog Languages
Scripting Languages
Perl
Cleanup CSV file numeric fields
ID:27416423
Thanks!
Experts Exchange
Programming
Prog Languages
Scripting Languages
Perl
Cleanup CSV file numeric fields
ID:27416423
Thanks!
:)
See solution
See solution
Open in new window