Finding duplicate lines in Perl

Hi experts,

 I have an input file:

 1,2,1
 2,1,2
 2,1,2
 3,3,1
 3,3,1
 3,3,1

How do I write a Perl program that removes the duplicate lines and adds the count of each line as a fourth column? The output would look like:

1,2,1,1
2,1,2,2
3,3,1,3

Many thanks.
meow00 asked:
 
Perl_Diver commented:
If the order of the original file should be preserved:

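my %results = ();   # line text => number of times it was seen
my @order = ();     # distinct lines, in the order they first appear
open(FH,'input.txt') or die "$!";
while(<FH>) {
   chomp;
   # remember a line the first time it appears, then count every occurrence
   push @order,$_ unless exists $results{$_};
   $results{$_}++;
}
close(FH);
# print each distinct line once, followed by its count
foreach my $key (@order) {
   print "$key,$results{$key}\n";
}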

That can also be done with an array of hashes or an array of arrays, but the above is very simple to understand. The @order array only records the order in which the distinct lines first appear in the file, and the %results hash holds the count for each line.
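For comparison, here is a rough sketch of the array-of-arrays variant mentioned above. The helper name count_in_order and the %index lookup hash are made up for this illustration, and it works on a list of lines already in memory instead of reading input.txt.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes a list of lines and
# returns "line,count" strings in the order each line first appeared.
sub count_in_order {
    my @lines = @_;
    my @rows;     # array of arrays: each row is [ line, count ]
    my %index;    # line text => position of its row in @rows
    for my $line (@lines) {
        if (exists $index{$line}) {
            # seen before: bump the count stored in its row
            $rows[ $index{$line} ][1]++;
        }
        else {
            # first sighting: append a new row and remember where it is
            $index{$line} = scalar @rows;
            push @rows, [ $line, 1 ];
        }
    }
    return map { join ',', @$_ } @rows;
}

print "$_\n" for count_in_order('1,2,1', '2,1,2', '2,1,2',
                                '3,3,1', '3,3,1', '3,3,1');
```

With the sample data from the question this prints 1,2,1,1 then 2,1,2,2 then 3,3,1,3. It still needs a hash to find a row quickly, which is part of why the %results/@order version is the simpler one.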

I take it that this point from GnarOlak does not matter: "Second, what if duplicate lines appear out of order in the file?"
 
Perl_Diver commented:
my %results = ();
open(FH,'input.txt') or die "$!";
while(<FH>) {
   chomp;
   $results{$_}++;
}
close(FH);
foreach my $key (keys %results) {
   print "$keys,$results{$key}\n";
}

You might want the output sorted somehow, but you didn't say whether it should be, or what the sort criteria would be.
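Since no sort criteria were given, the following is only a sketch of two possible orderings. The %results contents are hard-coded here to match the sample input, and the array names @by_line and @by_count are made up for the example.

```perl
use strict;
use warnings;

# Counts as the loop above would build them from the sample input.
my %results = ( '1,2,1' => 1, '2,1,2' => 2, '3,3,1' => 3 );

# Option 1: plain string order of the line text.
my @by_line = sort keys %results;

# Option 2: highest count first, ties broken by line text.
my @by_count = sort { $results{$b} <=> $results{$a} or $a cmp $b }
               keys %results;

print "$_,$results{$_}\n" for @by_line;
print "$_,$results{$_}\n" for @by_count;
```

Note that option 1 compares the lines as strings, so a line starting with 10 would sort before one starting with 2. If the columns should be compared as numbers, the lines would have to be split on the commas first.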
 
Perl_Diver commented:
Sorry, small error here:

foreach my $key (keys %results) {
   print "$keys,$results{$key}\n";
}

$keys should be $key:

foreach my $key (keys %results) {
   print "$key,$results{$key}\n";
}
 
GnarOlak commented:
Perl_Diver's solution looks pretty good except for two possible snags.  First, do you care about the order of the lines in the output?  There is no guarantee that the lines will be in the same order on output as they are on input.

My test output this:
3,3,1,3
2,1,2,2
1,2,1,1

Second, what if duplicate lines appear out of order in the file?

Like this:

1,2,1
2,1,2
2,1,2
1,2,1
3,3,1
3,3,1
3,3,1

Would the two 1,2,1 lines be considered duplicates or not?

If neither of these is a problem, then Perl_Diver has given you what you need.
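To make the second question concrete, here is a small sketch that runs both readings against the out-of-order input above. The helper names count_all and count_runs are made up for this illustration.

```perl
use strict;
use warnings;

# Reading 1: every occurrence of a line counts, wherever it appears.
sub count_all {
    my (%seen, @order);
    for my $line (@_) {
        # post-increment is false only the first time a line is seen
        push @order, $line unless $seen{$line}++;
    }
    return map { "$_,$seen{$_}" } @order;
}

# Reading 2: only consecutive repeats are merged into one output line.
sub count_runs {
    my @runs;    # each element is [ line, count ]
    for my $line (@_) {
        if (@runs && $runs[-1][0] eq $line) {
            $runs[-1][1]++;             # same as previous line: extend the run
        }
        else {
            push @runs, [ $line, 1 ];   # different line: start a new run
        }
    }
    return map { "$_->[0],$_->[1]" } @runs;
}

my @input = ('1,2,1', '2,1,2', '2,1,2', '1,2,1', '3,3,1', '3,3,1', '3,3,1');
print join(' ', count_all(@input)),  "\n";
print join(' ', count_runs(@input)), "\n";
```

Under the first reading the two 1,2,1 lines collapse into a single 1,2,1,2 entry. Under the second they stay separate, each with a count of 1.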
 
Perl_Diver commented:
Good points, GnarOlak.
 
meow00 (Author) commented:
Hmm ... actually, they are not in order. Is there any way to fix it?
 
GnarOlak commented:
If you need to keep separate runs of the same line distinct then something like this would do the trick:

my $prev_line;
my $count = 0;

open(FH,'input.txt') or die "$!";
$prev_line = <FH>;
$count = 1;
while (<FH>)
{
    if ($_ ne $prev_line)
    {
        print "$prev_line,$count\n";
        $prev_line = $_;
        $count = 1;
    }
}

 
GnarOlak commented:
One missing line from the end of that last post:


print "$prev_line,$count\n";


That needs to go after the last closing } to write out the last values.
 
GnarOlak commented:
Damn Mondays.  That last post wasn't right.  This does what I want.

my $prev_line;
my $count = 0;

open(FH,'input.txt') or die "$!";
$prev_line = <FH>;
chomp $prev_line;
$count = 1;

while (<FH>)
{
    chomp;
    if ($_ ne $prev_line)
    {
        print "$prev_line,$count\n";
        $prev_line = $_;
        $count = 1;
    }
    else
    {
        $count++;
    }
}

print "$prev_line,$count\n";

close FH;
 
GnarOlak commented:
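And just in case you would rather do everything in the loop, here is another version.

open(FH,'input.txt') or die "$!";
my $prev_line;   # stays undefined until the first line has been read
my $count = 0;
while (<FH>)
{
    chomp;
    # test with defined() so that a line such as "0" is not mistaken for "no line yet"
    if (defined $prev_line && $_ ne $prev_line)
    {
        # the run has ended: print it and start counting the new line
        print "$prev_line,$count\n";
        $prev_line = $_;
        $count = 1;
    }
    else
    {
        # first line of the file, or another repeat of the current run
        $prev_line = $_ if (! defined $prev_line);
        $count++;
    }
}

# write out the final run (nothing is printed for an empty file)
print "$prev_line,$count\n" if defined $prev_line;

close FH;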
 
Perl_Diver commented:
Thanks for the grade and the points.
 
GnarOlak commented:
Thanks from me also.
Question has a verified solution.