mikedgibson
asked on
Perl CSV Manipulation - Split Single Record into Multiple Records
This is a followup to a previous question I asked http://www.experts-exchang e.com/Prog ramming/La nguages/Sc ripting/Pe rl/Q_26845 234.html
I have other data files that are part of the data migration that do not need the data to be cleansed but need to split the Notes column based on a timestamp RegEx.
For example, my first data file is like this
"Company Name", "Notes"
"Company A","September 17, 2010 3:28 PM This is the first note.<BR>September 17, 2010 4:28 PM This is the second note."
"Company B","May 1, 2010 4:18 PM This is the first note.<BR>May 15, 2010 1:12 PM This is the second note.<BR>May 31, 2010 1:00 PM This is the third note."
I need this to be in the format
"Company A","September 17, 2010 3:28 PM This is the first note."
"Company A","September 17, 2010 4:28 PM This is the second note."
"Company B","May 1, 2010 4:18 PM This is the first note."
"Company B","May 15, 2010 1:12 PM This is the second note."
"Company B","May 31, 2010 1:00 PM This is the third note."
I also have another data file that is similar but has more columns.
"Firstname", "Lastname", "Company Name", "Notes"
"First","Last","Company A","September 17, 2010 3:28 PM This is the first note.<BR>September 17, 2010 4:28 PM This is the second note."
"First","Last","Company B","May 1, 2010 4:18 PM This is the first note.<BR>May 15, 2010 1:12 PM This is the second note.<BR>May 31, 2010 1:00 PM This is the third note."
I need this output to be like
"First","Last","Company A","September 17, 2010 3:28 PM This is the first note."
"First","Last","Company A","September 17, 2010 4:28 PM This is the second note."
"First","Last","Company B","May 1, 2010 4:18 PM This is the first note."
"First","Last","Company B","May 15, 2010 1:12 PM This is the second note."
"First","Last","Company B","May 31, 2010 1:00 PM This is the third note."
Attached are the data samples in .csv format.
It would be great if this could be done in a single script that does both formats.
sample1.csv
sample2.csv
I have other data files that are part of the data migration that do not need the data to be cleansed but need to split the Notes column based on a timestamp RegEx.
For example, my first data file is like this
"Company Name", "Notes"
"Company A","September 17, 2010 3:28 PM This is the first note.<BR>September 17, 2010 4:28 PM This is the second note."
"Company B","May 1, 2010 4:18 PM This is the first note.<BR>May 15, 2010 1:12 PM This is the second note.<BR>May 31, 2010 1:00 PM This is the third note."
I need this to be in the format
"Company A","September 17, 2010 3:28 PM This is the first note."
"Company A","September 17, 2010 4:28 PM This is the second note."
"Company B","May 1, 2010 4:18 PM This is the first note."
"Company B","May 15, 2010 1:12 PM This is the second note."
"Company B","May 31, 2010 1:00 PM This is the third note."
I also have another data file that is similar but has more columns.
"Firstname", "Lastname", "Company Name", "Notes"
"First","Last","Company A","September 17, 2010 3:28 PM This is the first note.<BR>September 17, 2010 4:28 PM This is the second note."
"First","Last","Company B","May 1, 2010 4:18 PM This is the first note.<BR>May 15, 2010 1:12 PM This is the second note.<BR>May 31, 2010 1:00 PM This is the third note."
I need this output to be like
"First","Last","Company A","September 17, 2010 3:28 PM This is the first note."
"First","Last","Company A","September 17, 2010 4:28 PM This is the second note."
"First","Last","Company B","May 1, 2010 4:18 PM This is the first note."
"First","Last","Company B","May 15, 2010 1:12 PM This is the second note."
"First","Last","Company B","May 31, 2010 1:00 PM This is the third note."
Attached are the data samples in .csv format.
It would be great if this could be done in a single script that does both formats.
sample1.csv
sample2.csv
Hi. I'll look at it.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
#!/usr/bin/perl
use strict;
use warnings;
die "usage: $0 <csvfile>" if ! @ARGV;
my $csvfile = shift;
open my $csv_fh, '<', $csvfile or die "can't open '$csvfile' $!";
my @header = split(/,\s?/, <$csv_fh>);
chomp $header[-1];
s/"//g for @header;
while ( my $line = <$csv_fh> ) {
chomp $line;
next if $line =~ /^\s*$/;
my %csv_fields;
@csv_fields{@header} = split(/,\s?/, $line, scalar @header);
s/"//g for @csv_fields{@header};
$csv_fields{'Notes'} = [ split(/<br>/i, $csv_fields{'Notes'}) ];
for my $i ( 0..$#{ $csv_fields{'Notes'} } ) {
print qq("$csv_fields{$_}",) for @header[0..$#header -1];
print qq("$csv_fields{'Notes'}->[$i]"\n);
}
}