Manipulating CSV files (conditional editing of individual columns) with awk, sed, or other standard linux tools?
Posted on 2009-04-26
I have a whole slew of CSV files (originally Excel), trying to standardize them before importing into a database. Two transformations/clean-ups have me stumped:
1. Column 4 sometimes has a single value, or two values with the word "and" between them. Trying to figure out how to either split this into two columns (if the word "and" is present), or create a blank column if not. I.e.:
LAST_NAME, FIRST_NAME, SSN_LAST_FOUR, FORD and TOYOTA should become
LAST_NAME, FIRST_NAME, SSN_LAST_FOUR, FORD, TOYOTA
LAST_NAME, FIRST_NAME, SSN_LAST_FOUR, HYUNDAI should become
LAST_NAME, FIRST_NAME, SSN_LAST_FOUR, , HYUNDAI
2. Column 5 has a date RANGE, (start date / end date) in a variety of formats; sometimes dd-dd mon yy, sometimes dd-mon-yy - dd-mon-yy, or (for 1-day ranges) just a single dd-mon-yy. Want to split this into two columns, and for single-day ranges just copy the date to both columns.
Any help greatly appreciated!!!