Solved

Compress Rows of Similar Data into One File Using Perl

Posted on 2015-02-04
5
88 Views
Last Modified: 2015-02-09
We have a data file with the following rows of data:

Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2

We need to have some way to process the file so that it combines similar rows of data into one line. So the end result would look like

A, B, C, 3
A1, B, C, 2

Because it adds the 1st and 3rd rows of data together.

We have a file with over two million rows of data but many of them could be combined into one row and are trying to come up with an automated way using Perl to reduce the size of the file before we load it into a database.

Any help would be appreciated or even a suggestion on approach to doing this.
0
Comment
Question by:dlnewman70
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
5 Comments
 
LVL 20

Expert Comment

by:jmcg
ID: 40588739
If you sort the file first, do you get a result that causes all of the "similar" rows to be together? Is a sorted result acceptable?
0
 
LVL 84

Expert Comment

by:ozo
ID: 40588758
perl -lne '1..1 and print and next; /(.*),(.*)/ and $r{$1}+=$2;END{print "$_, $r{$_}" for keys %r}' <<END
Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2
END
0
 

Accepted Solution

by:
dlnewman70 earned 0 total points
ID: 40588942
#use warnings;
#use strict;

use Fcntl ':flock'; # contains LOCK_EX (2) and LOCK_UN (8) constants

$afterfile = $ARGV[1];
 
open (OUTFILE,">>", $afterfile);

my %totals_hash;

while (<>)
{
  chomp;
  my @cols = split /\|/;

  my $key = join '|', @cols[3,6];

  $totals_hash{$key} += $cols[9];
}

foreach (sort keys %totals_hash)
{
  print OUTFILE $_, '|', $totals_hash{$_}, "\n";

}

close(OUTFILE);
0
 
LVL 84

Expert Comment

by:ozo
ID: 40588978
my @cols = split /\|/;

  my $key = join '|', @cols[3,6];

  $totals_hash{$key} += $cols[9];
Does not match the format you reported:
Field1, Field2, Field3, Data
A, B, C, 1
A1, B, C, 2
A, B, C, 2


Also, you probably want to pop $afterfile off of @ARGV so that it will not be read as part of <>

And Fcntl ':flock';  is unused, did you intend to lock something?
0
 

Author Closing Comment

by:dlnewman70
ID: 40597966
Solved the challenge myself.
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Strange perl issue 6 132
perl to convert excel to csv 3 308
Get a Perl script to return content from a module 7 83
html form to write data to csv 24 168
I've just discovered very important differences between Windows an Unix formats in Perl,at least 5.xx.. MOST IMPORTANT: Use Unix file format while saving Your script. otherwise it will have ^M s or smth likely weird in the EOL, Then DO NOT use m…
A year or so back I was asked to have a play with MongoDB; within half an hour I had downloaded (http://www.mongodb.org/downloads),  installed and started the daemon, and had a console window open. After an hour or two of playing at the command …
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

733 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question