Replacing multiple embedded commas in a CSV file with sed

Dan Kaib
I have a CSV file in which some quoted fields contain multiple embedded commas, and I need to change those commas to a different character.

Here is a sample line from the file (the ^M is the carriage return at the end of the line):

1511,Charleston Newspapers Inc      ,0108,Hamburg PA,"03,11,2017",03/11/2017 ,03/2017,-1.71,03/13/2017 ,Unpaid,,,,,C210\quser,C210\quser,C210\quser,,No Approval Needed,^M

Using the sed command:

sed 's/\("[^,"]*\),\([^"]*"\)/\1;\2/g' IN.CSV > OUT.CSV

The above sed command changes "03,11,2017" to "03;11,2017", so only the first comma in the field is replaced. I need all the commas in "03,11,2017" changed to semicolons: "03;11;2017".
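A minimal reproduction of that behaviour, assuming a POSIX sed and a made-up sample line with two quoted fields. The function name `first_comma_only` is hypothetical; the substitution is the one from the question. The second group `\([^"]*"\)` consumes everything up to the closing quote, so the `g` flag resumes matching after the field and never revisits the remaining commas inside it.

```shell
#!/bin/sh
# The substitution from the question, wrapped in a hypothetical function.
# Group 2 swallows the rest of the field through the closing quote, so /g
# moves on to the next quoted field instead of the next comma.
first_comma_only() {
  sed 's/\("[^,"]*\),\([^"]*"\)/\1;\2/g'
}

# Made-up sample: two quoted fields, each containing two commas
printf '%s\n' '1511,"03,11,2017",x,"a,b,c"' | first_comma_only
# prints: 1511,"03;11,2017",x,"a;b,c"
```

This also shows why the command handles several quoted fields on a line but only one comma per field.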

I am running AIX 7.1.

Any suggestions would be greatly appreciated.

TIA,
Dan

Most Valuable Expert 2013
Top Expert 2013

Commented:
Why not just

sed 's/\("[0-9]*\),\([0-9]*\),\([0-9]*"\)/\1;\2;\3/' IN.CSV > OUT.CSV
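A quick check of that suggestion, using made-up sample lines and a hypothetical wrapper name `two_commas_only`. Because the pattern spells out exactly three digit groups, it only matches a quoted field of the form digits,digits,digits; a field with one comma is left alone, and without the `g` flag only the first such field on a line is changed.

```shell
#!/bin/sh
# The three-group substitution from the reply above, wrapped for testing.
# It matches only "digits,digits,digits" inside quotes.
two_commas_only() {
  sed 's/\("[0-9]*\),\([0-9]*\),\([0-9]*"\)/\1;\2;\3/'
}

printf '%s\n' 'a,"03,11,2017",b' | two_commas_only   # prints a,"03;11;2017",b
printf '%s\n' 'a,"03,2017",b'    | two_commas_only   # unchanged: a,"03,2017",b
```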

Author

Commented:
Hi woolmilkporc,

Thank you for the reply. That works when the field contains exactly two commas, but not when it contains only one or more than two.
I was hoping to change all the commas in the quoted field, however many there are.

Thanks again,
Dan
Most Valuable Expert 2013
Top Expert 2013

Commented:
This works if the field in question is the only one surrounded by double quotes (only the first quoted field on each line is changed):

awk -F"\"" '{OFS="\""; gsub(",",";",$2); print}' IN.CSV > OUT.CSV
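How this works, sketched with a made-up line and a hypothetical function name `first_quoted_field`: with `"` as the input field separator, `$1` is the text before the first quote and `$2` is the text between the first pair of quotes. Setting `OFS` to `"` means that when awk rebuilds the record after `gsub` changes `$2`, the quotes are put back exactly where they were. Any later quoted field (`$4`, `$6`, ...) is untouched.

```shell
#!/bin/sh
# The awk command from the reply above: only $2 (the first quoted field) is edited
first_quoted_field() {
  awk -F"\"" '{OFS="\""; gsub(",",";",$2); print}'
}

printf '%s\n' '1511,"03,11,2017",x,"a,b,c"' | first_quoted_field
# prints: 1511,"03;11;2017",x,"a,b,c"
```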

Author

Commented:
Hi woolmilkporc,

Thank you for the reply. That works GREAT for a single quoted field but not for multiple quoted fields.
Do you know of a way to handle multiple quoted fields?
The end users can put a comma in many different fields; I'm trying to eliminate those commas before they cause a problem.
There are users at 110+ stores entering data.

Thanks for your help,
Dan

Author

Commented:
Hi woolmilkporc,

The sed command in my question handles multiple embedded fields but not multiple embedded commas within the field.
Not sure if that helps any.

Dan
Most Valuable Expert 2013
Top Expert 2013
Commented:
Here is a better solution:

awk -F"\"" '{OFS="\""; for (k=2; k<=NF; k+=2) gsub(",",";",$k); print}' IN.CSV
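The same field-splitting idea, generalised: with `"` as the separator, every even-numbered field (`$2`, `$4`, ...) lies between an opening and a closing quote, so the loop replaces commas only inside quoted fields, however many fields and commas there are. The sketch below wraps the command in a hypothetical function `fix_quoted_commas_awk` and runs it on a made-up line. It assumes quotes are balanced on each line and that no quoted field spans a line break; a doubled `""` inside a field adds two quotes, so it should not upset the even/odd pairing.

```shell
#!/bin/sh
# The loop version from the reply above. "$@" lets it read files or stdin.
fix_quoted_commas_awk() {
  awk -F"\"" '{OFS="\""; for (k=2; k<=NF; k+=2) gsub(",",";",$k); print}' "$@"
}

printf '%s\n' '1511,"03,11,2017",x,"a,b",y,"c"' | fix_quoted_commas_awk
# prints: 1511,"03;11;2017",x,"a;b",y,"c"

# On the files from the question:
# fix_quoted_commas_awk IN.CSV > OUT.CSV
```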
Most Valuable Expert 2013
Top Expert 2013

Commented:
Please note that I edited my comment above!

Author

Commented:
Hi woolmilkporc,

Thank you for the reply.

The only way I found to handle multiple embedded fields with multiple commas is similar to your last awk solution:

sed 's/\("[^,"]*\),\([^"]*"\)/\1;\2/g; s/\("[^,"]*\),\([^"]*"\)/\1;\2/g' IN.CSV > OUT.CSV

This replaces up to two commas in each of several quoted fields.
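One caveat with running the substitution twice, worked out by hand rather than taken from the thread: if a line has several quoted fields and the first pass already removed all their commas, the second pass can start matching at a closing quote and turn the field separator after it into a semicolon. The sketch below reproduces this on a made-up line, and for comparison shows a sed-only loop that is not from the thread. The loop anchors the match at the start of the line and requires an even number of quotes before the opening quote, so the comma it replaces is always inside a quoted field; the `t` command repeats the substitution until nothing changes. Function names are hypothetical. It is meant to use only POSIX sed features (labels, `t`, basic regular expressions), so it should also run on AIX 7.1, but treat that as an assumption.

```shell
#!/bin/sh
# The two-pass command from the comment above, without the file names
two_pass() {
  sed 's/\("[^,"]*\),\([^"]*"\)/\1;\2/g; s/\("[^,"]*\),\([^"]*"\)/\1;\2/g'
}

# Sed-only alternative (not from the thread): from the line start, skip any
# number of complete "..." pairs, then match an opening quote and the text up
# to the first comma inside that field; replace that comma and loop via t.
fix_quoted_commas_sed() {
  sed -e ':a' -e 's/^\(\([^"]*"[^"]*"\)*[^"]*"[^",]*\),/\1;/' -e 'ta' "$@"
}

line='1511,"a,b",x,"c,d"'
printf '%s\n' "$line" | two_pass               # prints 1511,"a;b";x,"c;d"  (separator changed)
printf '%s\n' "$line" | fix_quoted_commas_sed  # prints 1511,"a;b",x,"c;d"
```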

You have been a GREAT help and provided many options.

Thank you for everything,
Dan
Most Valuable Expert 2013
Top Expert 2013

Commented:
Seems you didn't notice the changes I made to my comment! I know, I should have been faster or I should have posted a new one, sorry!

Author

Commented:
Hi woolmilkporc,

My fault, I was too quick to close the question figuring there was nothing else to do.
I did see your change right after I closed the question.
I tested it and it is the perfect solution.

I tried to send you a message but I don't think I did it correctly.
It doesn't seem to accept woolmilkporc as the recipient.

Thank you again,
Dan
