• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 261
  • Last Modified:

Cleanup CSV file numeric fields

Hi, I have csv files with an issue. I need to clean up the numeric fields (not surrounded by quotes) by removing extra spaces, and moving the possible negative sign from the rear to the front. Unfortunately the data is coming from another system that I can't correct. I'm hoping for some sed commands to clean this up. For example, the following two lines:

"10447","857386007 CR","0001","03346004",      6.25-,"","","404672704                              004","04-672704",""
"10486","13766","0001","03403509",    435.00 ,"","","409D32180                              001","09-D32180",""
"10827","799608","0001","03401001",    165.00 ," "," ","5011400928 SIEU-300809-3       20101208003","01-140-0928 ","SIEU-300809-3"

The first, needs the beginning whitespace removed, and the negative sign moved from the rear of the field (6.25-) to the front (-6.25). The second line would just need the beginning and trailing whitespace removed. So basically, remove beginning/trailing whitespace from a field surrounded by commas and not containing quotes. And move the negative sign, if present. Result of these lines would be:

"10447","857386007 CR","0001","03346004",-6.25,"","","404672704                              004","04-672704",""
"10486","13766","0001","03403509",435.00 ,"","","409D32180                              001","09-D32180",""
"10827","799608","0001","03401001",165.00 ," "," ","5011400928 SIEU-300809-3       20101208003","01-140-0928 ","SIEU-300809-3"

Thanks!
0
kweise
Asked:
kweise
  • 5
  • 3
1 Solution
 
farzanjCommented:
I am sure there will be some modifications required because of the complexity.

Please try this one.


perl -pe 's/([^"])\s+(-?[.-\w]+)(-)?\s*([^"])/$1$2$3$4/g' filename

Open in new window

0
 
farzanjCommented:
Try this one too
perl -pe 's/([,])\s+(-?[.-\d]+)(-)?\s*([,])/$1$2$3$4/g' filename

Open in new window

0
 
kweiseAuthor Commented:
The first one changed
"10560","10471","0001","03401015",    202.80 ,"","","5158670042 HLXU-654201-6       20101214001","15-867-0042","HLXU-654201-6"
into
"10560","10471","0001","03401015",202.80,"","","5158670042HLXU-654201-620101214001","15-867-0042","HLXU-654201-6"
so it removed all spaces from inside the quotes.

The second seemed better in that regard
"10560","10471","0001","03401015",202.80,"","","5158670042 HLXU-654201-6       20101214001","15-867-0042","HLXU-654201-6"

However both left the negative signs at the end of the numbers.
"10447","857260895 CR","0001","03346004",393.75-,"","","404647375                              003","04-647375",""
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
farzanjCommented:
Ok,

Try this one.  Any luck?


perl -pe 's/([,])\s+(-?[.\d]+)(-)?\s*([,])/$1$3$2$4/g'

Open in new window

0
 
kweiseAuthor Commented:
That one seems to work. I have to run out on an errand right now, but I will do some more testing on it to be sure. Thanks for all your help. Will award the points when I return and re-check it in an hour or so.
0
 
farzanjCommented:
Sure, not a problem
0
 
kweiseAuthor Commented:
Again, fast help, thanks! This one seems to do the trick, worked on a second file also!
0
 
farzanjCommented:
Great!
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

  • 5
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now