• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 262
  • Last Modified:

Clean up numbers using sed

Hi, I have csv files with an issue. I need to clean up the numeric fields (not surrounded by quotes) by removing extra spaces, and moving the possible negative sign from the rear to the front. Unfortunately the data is coming from another system that I can't correct. I'm hoping for some sed commands to clean this up. For example, the following two lines:

"10447","857386007 CR","0001","03346004",      6.25-,"","","404672704                              004","04-672704",""
"10486","13766","0001","03403509",    435.00 ,"","","409D32180                              001","09-D32180",""

The first, needs the beginning whitespace removed, and the negative sign moved from the rear of the field (6.25-) to the front (-6.25). The second line would just need the beginning and trailing whitespace removed. So basically, remove beginning/trailing whitespace from a field surrounded by commas and not containing quotes. And move the negative sign, if present. Result of these two lines would be:

"10447","857386007 CR","0001","03346004",-6.25,"","","404672704                              004","04-672704",""
"10486","13766","0001","03403509",435.00 ,"","","409D32180                              001","09-D32180",""

My sed/regex skills are very minimal at this point, but maybe there's even a better way to correct this? Thanks for any assistance!
0
kweise
Asked:
kweise
  • 14
  • 12
1 Solution
 
farzanjCommented:
Try
sed 's/  \(*[-0-9.]*\)\([-]*\)/\2\1/g' filename

Open in new window

0
 
kweiseAuthor Commented:
Thanks, but it didn't seem to do anything at all. A diff between the original and the output shows no changes.
0
 
farzanjCommented:
Sorry.

Try this first.


sed 's/  *\([-0-9.]*\)\([-]*\)/\2\1/g' filename

Open in new window

0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
farzanjCommented:
Is this working for you?


sed 's/  *\([-0-9.]*\)\([-]\)/\2\1/g' filename
0
 
kweiseAuthor Commented:
Thanks, that fixes the lines with the negative numbers, but the positive ones still have whitespace surrounding them.
0
 
farzanjCommented:
Run this one for those lines

sed 's/  *\([-0-9.]*\)/\1/g' filename
0
 
kweiseAuthor Commented:
Sorry missed the second reply, the second statement fixes the spacing problem, and the third fixes the spacing and negative sign on negative numbers only, but leaves spaces around positive numbers.
0
 
kweiseAuthor Commented:
Wow, you're fast. :- )
Which order am I running which statements now?
0
 
kweiseAuthor Commented:
Nevermind, I got it, awesome, thanks!
0
 
farzanjCommented:
Any order, doesn't matter.

Make sure it is doing what you want.  Then if you run with -i, it would directly change the file
0
 
kweiseAuthor Commented:
Great solution and very quick too!
0
 
farzanjCommented:
Glad to help:)
0
 
kweiseAuthor Commented:
Arrgh, I didn't notice, it's also removing spaces from the quoted fields also, can't have that. : -)
0
 
farzanjCommented:
Ok, wait a minute, lets fix it.
0
 
kweiseAuthor Commented:
Also, so I can learn, can you basically explain how it's working? This is above my sed level, so would be good to understand it!
0
 
farzanjCommented:
Ok, try this one:

sed -e 's/\([,]\)[ \t][ \t]*\([0-9-]\)/\1\2/g' -e 's/\([.0-9]*\)\(-\)/\2\1/g' filename

Open in new window

0
 
farzanjCommented:
For understanding, this is the starting point.
The
http://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html
0
 
kweiseAuthor Commented:
Almost got it, now the final issue is the positive numbers have a trailing whitespace. The negative ones are perfect.
0
 
farzanjCommented:
I am so sorry.

Try this one
sed -e 's/\([,]\)[ \t][ \t]*\([0-9-]\)/\1\2/g' -e 's/\([.0-9]*\)\(-*\)/\2\1/g'

Open in new window

0
 
farzanjCommented:
Feel free to ask any more questions.  Understanding RegEx is a lot of practice after learning some basics.
0
 
kweiseAuthor Commented:
Please don't be sorry, you've already given me a ton of help on fixing this! :- )

Did that last command work for you on my two test records? It's still doing the same thing for me, 1 space remains at the end of the positive numbers.
0
 
farzanjCommented:
Actually, to find a rule was kind of hard - a rule that would only pick the right cases.

Anyhow, please try this one and let me know.


sed -e 's/\([,]\)[ \t][ \t]*\([0-9-]\)/\1\2/g' -e 's/\([.0-9]*\)\(-*\)/\2\1/g' -e 's/\([0-9]\)[ ][ ]*,/\1,/g'

Open in new window

0
 
kweiseAuthor Commented:
No, I see, at some point, I didn't notice when, it started removing some data between dashes. ie, the line

"10486","13766","0001","03403509",    435.00 ,"","","409D32180                              001","09-D32180",""
is now
"10486","13766","0001","03403509",435.00 ,"","","409D32180                              001","-D32180","" - Missing "09" before "-D32180"

Another line with more data has been affected too. Somehow it needs to only change data between commas, without quotes (numeric). This might be too difficult for sed :-)

"10827","799608","0001","03401001",    165.00 ," "," ","5011400928 SIEU-300809-3       20101208003","01-140-0928 ","SIEU-300809-3"
is now
"10827","799608","0001","03401001",165.00,"","","5011400928 SIEU--3008093       20101208003","-01-1400928","SIEU--3008093"
0
 
farzanjCommented:
Sed is not strong enough. I will have to use Perl one liner.  Kindly open anther question for that, if you don't mind.
0
 
kweiseAuthor Commented:
ok, submitted under -    
    Experts Exchange
    Programming
    Prog Languages
    Scripting Languages
    Perl
    Cleanup CSV file numeric fields

ID:27416423

Thanks!
0
 
farzanjCommented:
:)

See solution
0

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 14
  • 12
Tackle projects and never again get stuck behind a technical roadblock.
Join Now