Link to home
Start Free TrialLog in
Avatar of kweise
kweise

asked on

Cleanup CSV file numeric fields

Hi, I have csv files with an issue. I need to clean up the numeric fields (not surrounded by quotes) by removing extra spaces, and moving the possible negative sign from the rear to the front. Unfortunately the data is coming from another system that I can't correct. I'm hoping for some sed commands to clean this up. For example, the following two lines:

"10447","857386007 CR","0001","03346004",      6.25-,"","","404672704                              004","04-672704",""
"10486","13766","0001","03403509",    435.00 ,"","","409D32180                              001","09-D32180",""
"10827","799608","0001","03401001",    165.00 ," "," ","5011400928 SIEU-300809-3       20101208003","01-140-0928 ","SIEU-300809-3"

The first, needs the beginning whitespace removed, and the negative sign moved from the rear of the field (6.25-) to the front (-6.25). The second line would just need the beginning and trailing whitespace removed. So basically, remove beginning/trailing whitespace from a field surrounded by commas and not containing quotes. And move the negative sign, if present. Result of these lines would be:

"10447","857386007 CR","0001","03346004",-6.25,"","","404672704                              004","04-672704",""
"10486","13766","0001","03403509",435.00 ,"","","409D32180                              001","09-D32180",""
"10827","799608","0001","03401001",165.00 ," "," ","5011400928 SIEU-300809-3       20101208003","01-140-0928 ","SIEU-300809-3"

Thanks!
Avatar of farzanj
farzanj
Flag of Canada image

I am sure there will be some modifications required because of the complexity.

Please try this one.


perl -pe 's/([^"])\s+(-?[.-\w]+)(-)?\s*([^"])/$1$2$3$4/g' filename

Open in new window

Try this one too
perl -pe 's/([,])\s+(-?[.-\d]+)(-)?\s*([,])/$1$2$3$4/g' filename

Open in new window

Avatar of kweise
kweise

ASKER

The first one changed
"10560","10471","0001","03401015",    202.80 ,"","","5158670042 HLXU-654201-6       20101214001","15-867-0042","HLXU-654201-6"
into
"10560","10471","0001","03401015",202.80,"","","5158670042HLXU-654201-620101214001","15-867-0042","HLXU-654201-6"
so it removed all spaces from inside the quotes.

The second seemed better in that regard
"10560","10471","0001","03401015",202.80,"","","5158670042 HLXU-654201-6       20101214001","15-867-0042","HLXU-654201-6"

However both left the negative signs at the end of the numbers.
"10447","857260895 CR","0001","03346004",393.75-,"","","404647375                              003","04-647375",""
ASKER CERTIFIED SOLUTION
Avatar of farzanj
farzanj
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of kweise

ASKER

That one seems to work. I have to run out on an errand right now, but I will do some more testing on it to be sure. Thanks for all your help. Will award the points when I return and re-check it in an hour or so.
Sure, not a problem
Avatar of kweise

ASKER

Again, fast help, thanks! This one seems to do the trick, worked on a second file also!
Great!