dlnewman70
Compress Rows of Similar Data into One File Using Perl
We have a data file with the following rows of data:
Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2
We need some way to process the file so that it combines similar rows of data into one line. The end result would look like this:
A, B, C, 3
A1, B, C, 2
The 1st and 3rd rows are combined, and their Data values are added together.
We have a file with over two million rows of data, but many of them could be combined into one row. We are trying to come up with an automated way, using Perl, to reduce the size of the file before we load it into a database.
Any help would be appreciated, or even a suggestion on an approach to doing this.
If you sort the file first, do you get a result that causes all of the "similar" rows to be together? Is a sorted result acceptable?
perl -lne '1..1 and print and next; /(.*),(.*)/ and $r{$1}+=$2;END{print "$_, $r{$_}" for keys %r}' <<END
Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2
END
my @cols = split /\|/;
my $key = join '|', @cols[3,6];
$totals_hash{$key} += $cols[9];
This does not match the format you reported:
Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2
Also, you probably want to pop $afterfile off of @ARGV so that it will not be read as part of <>.
And Fcntl ':flock' is unused. Did you intend to lock something?
ASKER
Solved the challenge myself.