Solved

Compress Rows of Similar Data into One File Using Perl

Posted on 2015-02-04
Last Modified: 2015-02-09
We have a data file with the following rows of data:

Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2

We need a way to process the file so that rows with the same Field1, Field2, and Field3 values are combined into one line, with their Data values added together. The end result would look like:

A, B, C, 3
A1, B, C, 2

The first line is the sum of the 1st and 3rd rows of data (1 + 2 = 3).

We have a file with over two million rows of data, but many of them could be combined into one row. We are trying to come up with an automated way, using Perl, to reduce the size of the file before we load it into a database.

Any help, or even a suggestion on how to approach this, would be appreciated.
Question by:dlnewman70

Expert Comment

by:jmcg
If you sort the file first, do you get a result that causes all of the "similar" rows to be together? Is a sorted result acceptable?
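
If a sorted result is acceptable, the merge itself can be done in one pass that only remembers the current key, which matters for a two-million-row file. The following is a minimal sketch under some assumptions not stated in the question: the key is everything before the last comma, the last field is a plain number, and the input has already been sorted so equal keys are adjacent. The sub name merge_sorted_rows is hypothetical. Lines whose last field is not numeric (such as the header) are skipped.

```perl
use strict;
use warnings;

# Merge adjacent rows that share the same key (everything before the last
# comma) and sum their last field. Assumes equal keys are already adjacent,
# i.e. the input was sorted beforehand.
sub merge_sorted_rows {
    my ($in, $out) = @_;
    my ($prev_key, $sum);
    while (my $line = <$in>) {
        chomp $line;
        # Skip lines whose last field is not a number (e.g. the header).
        next unless $line =~ /^(.*),\s*(-?\d+(?:\.\d+)?)\s*$/;
        my ($key, $value) = ($1, $2);
        if (defined $prev_key && $key eq $prev_key) {
            $sum += $value;                      # same key: keep adding
        }
        else {
            # Key changed: flush the finished group, start a new one.
            print {$out} "$prev_key, $sum\n" if defined $prev_key;
            ($prev_key, $sum) = ($key, $value);
        }
    }
    print {$out} "$prev_key, $sum\n" if defined $prev_key;   # last group
    return;
}

# Demo on in-memory data that is already in sorted order.
my $sorted = "A, B, C, 1\nA, B, C, 2\nA1, B, C, 2\n";
open my $in, '<', \$sorted or die "Cannot open in-memory input: $!";
merge_sorted_rows($in, \*STDOUT);
close $in;
```

For a real file, the same sub could be called with \*STDIN as the input handle and fed the output of an external sort. Only one key and one running total are held in memory at a time.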

Expert Comment

by:ozo
perl -lne '1..1 and print and next; /(.*),(.*)/ and $r{$1}+=$2;END{print "$_, $r{$_}" for keys %r}' <<END
Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2
END
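
For readers who find the one-liner dense, here is a rough long-form sketch of the same idea: treat the first line as a header, use everything before the last comma as a hash key, and add up the last field per key. It differs from the one-liner in one respect, which is an addition and not part of the original: it remembers the order in which keys first appear, since "keys %r" returns them in no particular order. The sub name sum_rows is hypothetical.

```perl
use strict;
use warnings;

# Sum the last comma-separated field per distinct prefix, keeping the order
# in which each key was first seen. The first line is passed through as a
# header.
sub sum_rows {
    my @lines  = @_;
    my $header = shift @lines;
    my (%total, @order);
    for my $line (@lines) {
        # Greedy (.*) takes everything up to the last comma as the key.
        next unless $line =~ /^(.*),\s*(\S+)\s*$/;
        my ($key, $value) = ($1, $2);
        push @order, $key unless exists $total{$key};
        $total{$key} += $value;
    }
    return ($header, map { "$_, $total{$_}" } @order);
}

my @input = (
    'Field1, Field2, Field3, Data',
    'A, B, C, 1',
    'A1, B, C, 2',
    'A, B, C, 2',
);
print "$_\n" for sum_rows(@input);
```

Unlike a sort-based approach, this holds one hash entry per distinct key in memory, which is usually fine when many of the rows collapse into the same key.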

Accepted Solution

by:dlnewman70
#use warnings;
#use strict;

use Fcntl ':flock'; # contains LOCK_EX (2) and LOCK_UN (8) constants

$afterfile = $ARGV[1];
 
open (OUTFILE,">>", $afterfile);

my %totals_hash;

while (<>)
{
  chomp;
  my @cols = split /\|/;

  my $key = join '|', @cols[3,6];

  $totals_hash{$key} += $cols[9];
}

foreach (sort keys %totals_hash)
{
  print OUTFILE $_, '|', $totals_hash{$_}, "\n";

}

close(OUTFILE);

Expert Comment

by:ozo
  my @cols = split /\|/;

  my $key = join '|', @cols[3,6];

  $totals_hash{$key} += $cols[9];
Does not match the format you reported:
Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2


Also, you probably want to pop $afterfile off of @ARGV so that it will not be read as part of <>.

And use Fcntl ':flock'; is unused. Did you intend to lock something?
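
A sketch of how the posted script might look with those points applied, assuming the real data is pipe-delimited with the key in columns 3 and 6 and the value in column 9 (zero-based), as the posted code implies. The sub name total_by_key and the file names in the usage comment are hypothetical. The output file is popped off @ARGV before any input is read, strict and warnings are enabled, and the unused Fcntl import is dropped.

```perl
use strict;
use warnings;

# Sum column 9 for each distinct (column 3, column 6) pair of a
# pipe-delimited stream. Column indexes are zero-based, as in the posted
# script.
sub total_by_key {
    my ($in) = @_;
    my %totals;
    while (my $line = <$in>) {
        chomp $line;
        my @cols = split /\|/, $line;
        next if @cols < 10;              # skip short or malformed rows
        $totals{ join '|', @cols[3, 6] } += $cols[9];
    }
    return \%totals;
}

# Usage (hypothetical file names): perl combine.pl before.txt after.txt
if (@ARGV >= 2) {
    # Take the output file off @ARGV so it is not read as input.
    my $afterfile = pop @ARGV;

    # \*ARGV is the handle behind <>, so the remaining arguments are read.
    my $totals = total_by_key(\*ARGV);

    # Append mode, as in the posted script.
    open my $out, '>>', $afterfile or die "Cannot open $afterfile: $!";
    print {$out} "$_|$totals->{$_}\n" for sort keys %$totals;
    close $out or die "Cannot close $afterfile: $!";
}
```

Checking the return values of open and close is an addition here. With two million rows, a silently failed open would otherwise produce an empty load file with no error.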

Author Closing Comment

by:dlnewman70
Solved the challenge myself.