Link to home
Start Free TrialLog in
Avatar of kweise
kweise

asked on

Clean up numbers using sed

Hi, I have csv files with an issue. I need to clean up the numeric fields (not surrounded by quotes) by removing extra spaces, and moving the possible negative sign from the rear to the front. Unfortunately the data is coming from another system that I can't correct. I'm hoping for some sed commands to clean this up. For example, the following two lines:

"10447","857386007 CR","0001","03346004",      6.25-,"","","404672704                              004","04-672704",""
"10486","13766","0001","03403509",    435.00 ,"","","409D32180                              001","09-D32180",""

The first, needs the beginning whitespace removed, and the negative sign moved from the rear of the field (6.25-) to the front (-6.25). The second line would just need the beginning and trailing whitespace removed. So basically, remove beginning/trailing whitespace from a field surrounded by commas and not containing quotes. And move the negative sign, if present. Result of these two lines would be:

"10447","857386007 CR","0001","03346004",-6.25,"","","404672704                              004","04-672704",""
"10486","13766","0001","03403509",435.00 ,"","","409D32180                              001","09-D32180",""

My sed/regex skills are very minimal at this point, but maybe there's even a better way to correct this? Thanks for any assistance!
Avatar of farzanj
farzanj
Flag of Canada image

Try
sed 's/  \(*[-0-9.]*\)\([-]*\)/\2\1/g' filename

Open in new window

Avatar of kweise
kweise

ASKER

Thanks, but it didn't seem to do anything at all. A diff between the original and the output shows no changes.
Sorry.

Try this first.


sed 's/  *\([-0-9.]*\)\([-]*\)/\2\1/g' filename

Open in new window

Is this working for you?


sed 's/  *\([-0-9.]*\)\([-]\)/\2\1/g' filename
Avatar of kweise

ASKER

Thanks, that fixes the lines with the negative numbers, but the positive ones still have whitespace surrounding them.
ASKER CERTIFIED SOLUTION
Avatar of farzanj
farzanj
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of kweise

ASKER

Sorry missed the second reply, the second statement fixes the spacing problem, and the third fixes the spacing and negative sign on negative numbers only, but leaves spaces around positive numbers.
Avatar of kweise

ASKER

Wow, you're fast. :- )
Which order am I running which statements now?
Avatar of kweise

ASKER

Nevermind, I got it, awesome, thanks!
Any order, doesn't matter.

Make sure it is doing what you want.  Then if you run with -i, it would directly change the file
Avatar of kweise

ASKER

Great solution and very quick too!
Glad to help:)
Avatar of kweise

ASKER

Arrgh, I didn't notice, it's also removing spaces from the quoted fields also, can't have that. : -)
Ok, wait a minute, lets fix it.
Avatar of kweise

ASKER

Also, so I can learn, can you basically explain how it's working? This is above my sed level, so would be good to understand it!
Ok, try this one:

sed -e 's/\([,]\)[ \t][ \t]*\([0-9-]\)/\1\2/g' -e 's/\([.0-9]*\)\(-\)/\2\1/g' filename

Open in new window

Avatar of kweise

ASKER

Almost got it, now the final issue is the positive numbers have a trailing whitespace. The negative ones are perfect.
I am so sorry.

Try this one
sed -e 's/\([,]\)[ \t][ \t]*\([0-9-]\)/\1\2/g' -e 's/\([.0-9]*\)\(-*\)/\2\1/g'

Open in new window

Feel free to ask any more questions.  Understanding RegEx is a lot of practice after learning some basics.
Avatar of kweise

ASKER

Please don't be sorry, you've already given me a ton of help on fixing this! :- )

Did that last command work for you on my two test records? It's still doing the same thing for me, 1 space remains at the end of the positive numbers.
Actually, to find a rule was kind of hard - a rule that would only pick the right cases.

Anyhow, please try this one and let me know.


sed -e 's/\([,]\)[ \t][ \t]*\([0-9-]\)/\1\2/g' -e 's/\([.0-9]*\)\(-*\)/\2\1/g' -e 's/\([0-9]\)[ ][ ]*,/\1,/g'

Open in new window

Avatar of kweise

ASKER

No, I see, at some point, I didn't notice when, it started removing some data between dashes. ie, the line

"10486","13766","0001","03403509",    435.00 ,"","","409D32180                              001","09-D32180",""
is now
"10486","13766","0001","03403509",435.00 ,"","","409D32180                              001","-D32180","" - Missing "09" before "-D32180"

Another line with more data has been affected too. Somehow it needs to only change data between commas, without quotes (numeric). This might be too difficult for sed :-)

"10827","799608","0001","03401001",    165.00 ," "," ","5011400928 SIEU-300809-3       20101208003","01-140-0928 ","SIEU-300809-3"
is now
"10827","799608","0001","03401001",165.00,"","","5011400928 SIEU--3008093       20101208003","-01-1400928","SIEU--3008093"
Sed is not strong enough. I will have to use Perl one liner.  Kindly open anther question for that, if you don't mind.
Avatar of kweise

ASKER

ok, submitted under -    
    Experts Exchange
    Programming
    Prog Languages
    Scripting Languages
    Perl
    Cleanup CSV file numeric fields

ID:27416423

Thanks!
:)

See solution