kweise
asked on
Cleanup CSV file numeric fields
Hi, I have csv files with an issue. I need to clean up the numeric fields (not surrounded by quotes) by removing extra spaces, and moving the possible negative sign from the rear to the front. Unfortunately the data is coming from another system that I can't correct. I'm hoping for some sed commands to clean this up. For example, the following two lines:
"10447","857386007 CR","0001","03346004", 6.25-,"","","404672704 004","04-672704",""
"10486","13766","0001","03 403509", 435.00 ,"","","409D32180 001","09-D32180",""
"10827","799608","0001","0 3401001", 165.00 ," "," ","5011400928 SIEU-300809-3 20101208003","01-140-0928 ","SIEU-300809-3"
The first, needs the beginning whitespace removed, and the negative sign moved from the rear of the field (6.25-) to the front (-6.25). The second line would just need the beginning and trailing whitespace removed. So basically, remove beginning/trailing whitespace from a field surrounded by commas and not containing quotes. And move the negative sign, if present. Result of these lines would be:
"10447","857386007 CR","0001","03346004",-6.2 5,"","","4 04672704 004","04-672704",""
"10486","13766","0001","03 403509",43 5.00 ,"","","409D32180 001","09-D32180",""
"10827","799608","0001","0 3401001",1 65.00 ," "," ","5011400928 SIEU-300809-3 20101208003","01-140-0928 ","SIEU-300809-3"
Thanks!
"10447","857386007 CR","0001","03346004", 6.25-,"","","404672704 004","04-672704",""
"10486","13766","0001","03
"10827","799608","0001","0
The first, needs the beginning whitespace removed, and the negative sign moved from the rear of the field (6.25-) to the front (-6.25). The second line would just need the beginning and trailing whitespace removed. So basically, remove beginning/trailing whitespace from a field surrounded by commas and not containing quotes. And move the negative sign, if present. Result of these lines would be:
"10447","857386007 CR","0001","03346004",-6.2
"10486","13766","0001","03
"10827","799608","0001","0
Thanks!
Try this one too
perl -pe 's/([,])\s+(-?[.-\d]+)(-)?\s*([,])/$1$2$3$4/g' filename
ASKER
The first one changed
"10560","10471","0001","03 401015", 202.80 ,"","","5158670042 HLXU-654201-6 20101214001","15-867-0042" ,"HLXU-654 201-6"
into
"10560","10471","0001","03 401015",20 2.80,"","" ,"51586700 42HLXU-654 201-620101 214001","1 5-867-0042 ","HLXU-65 4201-6"
so it removed all spaces from inside the quotes.
The second seemed better in that regard
"10560","10471","0001","03 401015",20 2.80,"","" ,"51586700 42 HLXU-654201-6 20101214001","15-867-0042" ,"HLXU-654 201-6"
However both left the negative signs at the end of the numbers.
"10447","857260895 CR","0001","03346004",393. 75-,"","", "404647375 003","04-647375",""
"10560","10471","0001","03
into
"10560","10471","0001","03
so it removed all spaces from inside the quotes.
The second seemed better in that regard
"10560","10471","0001","03
However both left the negative signs at the end of the numbers.
"10447","857260895 CR","0001","03346004",393.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
That one seems to work. I have to run out on an errand right now, but I will do some more testing on it to be sure. Thanks for all your help. Will award the points when I return and re-check it in an hour or so.
Sure, not a problem
ASKER
Again, fast help, thanks! This one seems to do the trick, worked on a second file also!
Great!
Please try this one.
Open in new window