Solved

Compress Rows of Similar Data into One File Using Perl

Posted on 2015-02-04
Last Modified: 2015-02-09
We have a data file with the following rows of data:

Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2

We need a way to process the file so that similar rows are combined into one line, with their Data values added together. The end result would look like this:

A, B, C, 3
A1, B, C, 2

The first and third rows have the same Field1, Field2, and Field3 values, so their Data values are added together.

We have a file with over two million rows of data, many of which could be combined into one row. We are trying to come up with an automated way, using Perl, to reduce the size of the file before we load it into a database.

Any help, or even a suggestion on an approach, would be appreciated.
Question by:dlnewman70
5 Comments

Expert Comment

by:jmcg
ID: 40588739
If you sort the file first, do you get a result that causes all of the "similar" rows to be together? Is a sorted result acceptable?
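
The sort-first idea can be sketched as follows. This is a minimal illustration, not code from the thread: it assumes the comma-separated layout shown in the question, assumes the rows have already been sorted so that equal keys are adjacent, and skips any row whose last field is not a number (such as the header). The name combine_sorted is hypothetical.

```perl
use strict;
use warnings;

# Sum the last field over runs of adjacent rows that share the same key.
# Assumes the input is already sorted, so equal keys sit next to each other.
# Only one running total is held in memory at a time.
sub combine_sorted {
    my ($in, $out) = @_;
    my $prev_key;
    my $sum = 0;
    while (my $line = <$in>) {
        chomp $line;
        # Key = everything before the last comma; value = the last field.
        # Rows whose last field is not numeric (e.g. the header) are skipped.
        my ($key, $value) = $line =~ /^(.*),\s*(-?\d+(?:\.\d+)?)\s*$/ or next;
        if (defined $prev_key && $key ne $prev_key) {
            # The key changed, so the previous run is complete.
            print {$out} "$prev_key, $sum\n";
            $sum = 0;
        }
        $prev_key = $key;
        $sum += $value;
    }
    # Flush the final run, if there was any data at all.
    print {$out} "$prev_key, $sum\n" if defined $prev_key;
}

# Demo on the question's sample rows, already sorted.
my $sorted = "A, B, C, 1\nA, B, C, 2\nA1, B, C, 2\n";
open my $in, '<', \$sorted or die "Cannot read sample data: $!";
combine_sorted($in, \*STDOUT);
close $in;
```

Because only the current key and its running total are kept, memory use stays flat regardless of file size. The cost is the external sort, and the output comes out in sorted order rather than the original order.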

Expert Comment

by:ozo
ID: 40588758
perl -lne '1..1 and print and next; /(.*),(.*)/ and $r{$1}+=$2;END{print "$_, $r{$_}" for keys %r}' <<END
Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2
END
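
For readers who prefer a script to a one-liner, the same hash-based idea might be written out as below. This is a sketch under assumptions rather than a literal expansion of the command above: it treats the first line as a header, uses everything before the last comma as the key, and (unlike `keys %r`, which returns keys in no particular order) remembers the order in which keys first appear. The name combine_rows is hypothetical.

```perl
use strict;
use warnings;

# Pass the header through unchanged, then sum the last field per key.
sub combine_rows {
    my ($in, $out) = @_;
    my (%total, @order);

    # The first line is assumed to be the header.
    my $header = <$in>;
    print {$out} $header if defined $header;

    while (my $line = <$in>) {
        chomp $line;
        # Greedy (.*) makes the split happen at the last comma.
        my ($key, $value) = $line =~ /^(.*),\s*(\S+)\s*$/ or next;
        # Remember first-seen order so the output is deterministic.
        push @order, $key unless exists $total{$key};
        $total{$key} += $value;
    }

    print {$out} "$_, $total{$_}\n" for @order;
}

# Demo on the sample data from the question.
my $sample = <<'END_SAMPLE';
Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2
END_SAMPLE

open my $in, '<', \$sample or die "Cannot read sample data: $!";
combine_rows($in, \*STDOUT);
close $in;
```

This approach needs no sorting, but it holds one hash entry per distinct key in memory. For two million input rows that is normally manageable, provided the number of distinct keys is not itself enormous.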

Accepted Solution

by:dlnewman70
ID: 40588942
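#use warnings;
#use strict;

use Fcntl ':flock'; # contains LOCK_EX (2) and LOCK_UN (8) constants

# Second command-line argument: the file that receives the combined rows
$afterfile = $ARGV[1];

open (OUTFILE, ">>", $afterfile) or die "Cannot open $afterfile: $!";

# Running total of the data column for each distinct key
my %totals_hash;

while (<>)
{
  chomp;
  my @cols = split /\|/;   # input fields are pipe-delimited

  # Rows with the same values at indexes 3 and 6 are combined
  my $key = join '|', @cols[3,6];

  $totals_hash{$key} += $cols[9];   # index 9 holds the value to sum
}

# Write one line per key, followed by its total
foreach (sort keys %totals_hash)
{
  print OUTFILE $_, '|', $totals_hash{$_}, "\n";
}

close(OUTFILE);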

Expert Comment

by:ozo
ID: 40588978
  my @cols = split /\|/;

  my $key = join '|', @cols[3,6];

  $totals_hash{$key} += $cols[9];
Does not match the format you reported:
Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2


Also, you probably want to pop $afterfile off of @ARGV so that it will not be read as part of <>

And Fcntl ':flock' is unused. Did you intend to lock something?
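
A version of the script with these points applied might look like the sketch below. It is an illustration only and makes several assumptions: it uses the comma-separated format from the question rather than the pipe-delimited columns, takes the last command-line argument as the output file, drops the unused Fcntl import, skips rows whose last field is not numeric (so the header is not written out), and overwrites the output file instead of appending to it. The sub name main is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical usage: perl combine.pl input.csv [more inputs ...] output.csv
sub main {
    die "Usage: $0 input [input ...] output\n" unless @ARGV >= 2;

    # Take the output file off @ARGV so that <> reads only the input files.
    my $afterfile = pop @ARGV;

    my %totals_hash;
    while (my $line = <>) {
        chomp $line;
        # Key = everything before the last comma; value = the last field.
        # Rows whose last field is not a number (the header) are skipped.
        my ($key, $value) = $line =~ /^(.*),\s*(-?\d+(?:\.\d+)?)\s*$/ or next;
        $totals_hash{$key} += $value;
    }

    # '>' overwrites the file; the script above used '>>' (append).
    open my $out, '>', $afterfile or die "Cannot write $afterfile: $!";
    print {$out} "$_, $totals_hash{$_}\n" for sort keys %totals_hash;
    close $out or die "Cannot close $afterfile: $!";
}

# Run only when file names were supplied on the command line.
main() if @ARGV;
```

Removing the output file name from @ARGV matters because `<>` opens every remaining argument as input. If the output file already held results from an earlier run, leaving it in @ARGV would feed those results back into the totals.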

Author Closing Comment

by:dlnewman70
ID: 40597966
Solved the challenge myself.
