[Last Call] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 640
  • Last Modified:

Manipulating CSV files (conditional editing of individual columns) with awk, sed, or other standard linux tools?

I have a whole slew of CSV files (originally Excel), trying to standardize them before importing into a database.  Two transformations/clean-ups have me stumped:

1.  Column 4 sometimes has a single value, or two values with the word "and" between them.  Trying to figure out how to either split this into two columns (if the word "and" is present), or create a blank column if not.  I.e.:

LAST_NAME, FIRST_NAME, SSN_LAST_FOUR, FORD and TOYOTA should become
LAST_NAME, FIRST_NAME, SSN_LAST_FOUR, FORD, TOYOTA

and

LAST_NAME, FIRST_NAME, SSN_LAST_FOUR, HYUNDAI should become
LAST_NAME, FIRST_NAME, SSN_LAST_FOUR, , HYUNDAI

2.  Column 5 has a date RANGE, (start date / end date) in a variety of formats; sometimes dd-dd mon yy, sometimes dd-mon-yy - dd-mon-yy, or (for 1-day ranges) just a single dd-mon-yy.  Want to split this into two columns, and for single-day ranges just copy the date to both columns.

Any help greatly appreciated!!!
0
simmoja3
Asked:
simmoja3
  • 3
  • 2
2 Solutions
 
louislietaerCommented:
tested solution for 1)

perl -pi -e 's/ and /,/g' yourfile
perl -pi -e 's/ AND /,/g' yourfile
perl -pi -e 's/,,/,/g' yourfile
0
 
louislietaerCommented:
for point 2) can you prodide a sample
0
 
simmoja3Author Commented:
Regarding your first... it seems like the third line would REMOVE the empty column, when in fact I need that empty column if there's no "AND" present.  Or am I misunderstanding (certainly possible!)?

For #2:

LAST_NAME, FIRST_NAME, SSN_LAST_FOUR, FORD, TOYOTA, 20-24 MAR 09 becomes
LAST_NAME, FIRST_NAME, SSN_LAST_FOUR, FORD, TOYOTA, 20 MAR 09, 24 MAR 09

LAST_NAME, FIRST_NAME, SSN_LAST_FOUR, FORD, TOYOTA, 20 FEB 09 becomes
LAST_NAME, FIRST_NAME, SSN_LAST_FOUR, FORD, TOYOTA, 20 FEB 09, 20 FEB 09

LAST_NAME, FIRST_NAME, SSN_LAST_FOUR, FORD, TOYOTA, 20 FEB 09 - 04 MAR 09 becomes
LAST_NAME, FIRST_NAME, SSN_LAST_FOUR, FORD, TOYOTA, 20 FEB 09, 04 MAR 09

Thanks for the help!  Greatly appreciated.
0
Nothing ever in the clear!

This technical paper will help you implement VMware’s VM encryption as well as implement Veeam encryption which together will achieve the nothing ever in the clear goal. If a bad guy steals VMs, backups or traffic they get nothing.

 
louislietaerCommented:
For point 1) you are right the for the thrid line, you need is a bit more tricky I will work on it.

Is it ok if the result look like :

LAST_NAME, FIRST_NAME, SSN_LAST_FOUR,HYUNDAI,
0
 
leonstrykerCommented:
Of course there is always the option of transforming the data in Excel itself. Let me know if you are interested in this solution.
0
 
leonstrykerCommented:
This question has been classified as abandoned and is being closed as part of the Cleanup Program. See my comment at the end of the question for more details.
0

Featured Post

Configuration Guide and Best Practices

Read the guide to learn how to orchestrate Data ONTAP, create application-consistent backups and enable fast recovery from NetApp storage snapshots. Version 9.5 also contains performance and scalability enhancements to meet the needs of the largest enterprise environments.

  • 3
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now