Solved

Compress Rows of Similar Data into One File Using Perl

Posted on 2015-02-04
5
90 Views
Last Modified: 2015-02-09
We have a data file with the following rows of data:

Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2

We need to have some way to process the file so that it combines similar rows of data into one line. So the end result would look like

A, B, C, 3
A1, B, C, 2

Because it adds the 1st and 3rd rows of data together.

We have a file with over two million rows of data but many of them could be combined into one row and are trying to come up with an automated way using Perl to reduce the size of the file before we load it into a database.

Any help would be appreciated or even a suggestion on approach to doing this.
0
Comment
Question by:dlnewman70
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
5 Comments
 
LVL 20

Expert Comment

by:jmcg
ID: 40588739
If you sort the file first, do you get a result that causes all of the "similar" rows to be together? Is a sorted result acceptable?
0
 
LVL 84

Expert Comment

by:ozo
ID: 40588758
perl -lne '1..1 and print and next; /(.*),(.*)/ and $r{$1}+=$2;END{print "$_, $r{$_}" for keys %r}' <<END
Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2
END
0
 

Accepted Solution

by:
dlnewman70 earned 0 total points
ID: 40588942
#use warnings;
#use strict;

use Fcntl ':flock'; # contains LOCK_EX (2) and LOCK_UN (8) constants

$afterfile = $ARGV[1];
 
open (OUTFILE,">>", $afterfile);

my %totals_hash;

while (<>)
{
  chomp;
  my @cols = split /\|/;

  my $key = join '|', @cols[3,6];

  $totals_hash{$key} += $cols[9];
}

foreach (sort keys %totals_hash)
{
  print OUTFILE $_, '|', $totals_hash{$_}, "\n";

}

close(OUTFILE);
0
 
LVL 84

Expert Comment

by:ozo
ID: 40588978
my @cols = split /\|/;

  my $key = join '|', @cols[3,6];

  $totals_hash{$key} += $cols[9];
Does not match the format you reported:
Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2


Also, you probably want to pop $afterfile off of @ARGV so that it will not be read as part of <>

And Fcntl ':flock';  is unused, did you intend to lock something?
0
 

Author Closing Comment

by:dlnewman70
ID: 40597966
Solved the challenge myself.
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I've just discovered very important differences between Windows an Unix formats in Perl,at least 5.xx.. MOST IMPORTANT: Use Unix file format while saving Your script. otherwise it will have ^M s or smth likely weird in the EOL, Then DO NOT use m…
On Microsoft Windows, if  when you click or type the name of a .pl file, you get an error "is not recognized as an internal or external command, operable program or batch file", then this means you do not have the .pl file extension associated with …
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans

695 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question