Solved

Find duplicate lines in Perl

Posted on 2006-06-09
Last Modified: 2008-03-06
Hi experts,

 I have an input file:

 1,2,1
 2,1,2
 2,1,2
 3,3,1
 3,3,1
 3,3,1

How do I write a Perl program that removes the duplicate lines and then adds the count as a fourth column? The output would look like:

1,2,1,1
2,1,2,2
3,3,1,3

Many thanks.
Question by:meow00
12 Comments
 
LVL 8

Expert Comment

by:Perl_Diver
ID: 16872639
my %results = ();
open(FH,'input.txt') or die "$!";
while(<FH>) {
   chomp;
   $results{$_}++;
}
close(FH);
foreach my $key (keys %results) {
   print "$keys,$results{$key}\n";
}

You might want the output sorted somehow, but you didn't say whether you do, or what the sort criteria would be.
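If sorted output is wanted, one option is to sort the hash keys before printing. This is only a minimal sketch, assuming a plain string sort of the whole line is acceptable; the sample data is inlined rather than read from input.txt, and `count_lines` is just an illustrative name, not something from the code above:

```perl
use strict;
use warnings;

# Count how often each line occurs; returns a hash ref of line => count.
# (count_lines is a hypothetical helper name used for this sketch.)
sub count_lines {
    my @lines = @_;
    my %results;
    $results{$_}++ for @lines;
    return \%results;
}

# Sample data from the question, inlined instead of read from input.txt.
my @input = ('1,2,1', '2,1,2', '2,1,2', '3,3,1', '3,3,1', '3,3,1');
my $results = count_lines(@input);

# Plain string sort of the keys; swap in a custom comparison if needed.
my @output = map { "$_,$results->{$_}" } sort keys %$results;
print "$_\n" for @output;
```

A string sort is enough for the sample data; for multi-digit fields a numeric, column-by-column comparison would probably be needed instead.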
 
LVL 8

Expert Comment

by:Perl_Diver
ID: 16872646
Sorry, small error here:

foreach my $key (keys %results) {
   print "$keys,$results{$key}\n";
}

$keys should be $key:

foreach my $key (keys %results) {
   print "$key,$results{$key}\n";
}
 
LVL 6

Expert Comment

by:GnarOlak
ID: 16873294
Perl_Diver's solution looks pretty good except for two possible snags.  First, do you care about the order of the lines in the output?  There is no guarantee that the lines will be in the same order on output as they are on input.

My test output this:
3,3,1,3
2,1,2,2
1,2,1,1

Second, what if duplicate lines appear out of order in the file?

Like this:

1,2,1
2,1,2
2,1,2
1,2,1
3,3,1
3,3,1
3,3,1

Would the two 1,2,1 lines be considered duplicates or not?

If neither of these is a problem, then Perl_Diver has given you what you need.
 
LVL 8

Expert Comment

by:Perl_Diver
ID: 16873936
Good points, GnarOlak.
 
LVL 1

Author Comment

by:meow00
ID: 16874026
Hmm ... actually, they are not in order. Is there any way to fix it?
 
LVL 8

Accepted Solution

by:
Perl_Diver earned 300 total points
ID: 16874349
If the order of the original file is desired:

my %results = ();
my @order = ();
open(FH,'input.txt') or die "$!";
while(<FH>) {
   chomp;
   push @order,$_ unless exists $results{$_};
   $results{$_}++;
}
close(FH);
foreach my $key (@order) {
   print "$key,$results{$key}\n";
}

That can also be done with an array of hashes or an array of arrays, but the above is very simple to understand. The @order array just records the order in which each distinct line first appears in the file, and the %results hash holds the count for each line.
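For comparison, here is one way the array-of-arrays variant might look. It is a sketch under assumptions, not a recommendation over the version above: the `%index` lookup hash is my own addition to avoid rescanning the array for every line, and the sample data (with a non-adjacent duplicate) is inlined rather than read from input.txt.

```perl
use strict;
use warnings;

# Sample data with a non-adjacent duplicate, inlined instead of read from input.txt.
my @input = ('1,2,1', '2,1,2', '2,1,2', '1,2,1', '3,3,1', '3,3,1', '3,3,1');

my @rows;    # array of [line, count] pairs, in order of first appearance
my %index;   # line => position of its pair in @rows (assumed helper structure)

for my $line (@input) {
    if (exists $index{$line}) {
        # Seen before: bump the count stored in the existing pair.
        $rows[ $index{$line} ][1]++;
    }
    else {
        # First appearance: remember where the new pair lives.
        $index{$line} = scalar @rows;
        push @rows, [ $line, 1 ];
    }
}

my @output = map { join ',', @$_ } @rows;
print "$_\n" for @output;
```

With this input the two 1,2,1 lines are counted together, so the output is 1,2,1,2 then 2,1,2,2 then 3,3,1,3, in order of first appearance.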

I take it this question from GnarOlak does not matter: "Second, what if duplicate lines appear out of order in the file?"
 
LVL 6

Assisted Solution

by:GnarOlak
GnarOlak earned 50 total points
ID: 16885225
If you need to keep separate runs of the same line distinct then something like this would do the trick:

my $prev_line;
my $count = 0;

open(FH,'input.txt') or die "$!";
$prev_line = <FH>;
$count = 1;
while (<FH>)
{
    if ($_ ne $prev_line)
    {
        print "$prev_line,$count\n";
        $prev_line = $_;
        $count = 1;
    }
}

 
LVL 6

Expert Comment

by:GnarOlak
ID: 16885230
One missing line from the end of that last post:


print "$prev_line,$count\n";


That needs to go after the last closing } to write out the last values.
 
LVL 6

Expert Comment

by:GnarOlak
ID: 16885269
Damn Mondays.  That last post wasn't right.  This does what I want.

my $prev_line;
my $count = 0;

open(FH,'input.txt') or die "$!";
$prev_line = <FH>;
chomp $prev_line;
$count = 1;

while (<FH>)
{
    chomp;
    if ($_ ne $prev_line)
    {
        print "$prev_line,$count\n";
        $prev_line = $_;
        $count = 1;
    }
    else
    {
        $count++;
    }
}

print "$prev_line,$count\n";

close FH;
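
To see how run-based counting differs from the hash approach, the same logic can be wrapped in a function and fed the out-of-order sample from earlier in the thread. This is a rough sketch only: `count_runs` is a made-up name, and the data is inlined instead of read from input.txt.

```perl
use strict;
use warnings;

# Collapse consecutive identical lines into "line,count" strings.
# (count_runs is a hypothetical name chosen for this sketch.)
sub count_runs {
    my @lines = @_;
    my @out;
    my ($prev, $count);
    for my $line (@lines) {
        if (defined $prev && $line eq $prev) {
            $count++;                # same run continues
            next;
        }
        push @out, "$prev,$count" if defined $prev;   # previous run ended
        ($prev, $count) = ($line, 1);                 # start a new run
    }
    push @out, "$prev,$count" if defined $prev;       # flush the final run
    return @out;
}

my @input = ('1,2,1', '2,1,2', '2,1,2', '1,2,1', '3,3,1', '3,3,1', '3,3,1');
my @output = count_runs(@input);
print "$_\n" for @output;
```

Here the two 1,2,1 lines stay separate, giving 1,2,1,1 then 2,1,2,2 then 1,2,1,1 then 3,3,1,3, whereas the hash-based version would merge them into a single 1,2,1,2 line.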
 
LVL 6

Expert Comment

by:GnarOlak
ID: 16885352
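And just in case you would rather do everything in the loop, here is another version.

open(FH,'input.txt') or die "$!";
my $prev_line;   # stays undef until the first line has been read
my $count = 0;
while (<FH>)
{
    chomp;
    # test with defined() so that a line such as "0" is not mistaken for "no previous line"
    if (defined $prev_line && $_ ne $prev_line)
    {
        print "$prev_line,$count\n";
        $prev_line = $_;
        $count = 1;
    }
    else
    {
        $prev_line = $_ if (! defined $prev_line);
        $count++;
    }
}

# write out the last run, unless the file was empty
print "$prev_line,$count\n" if defined $prev_line;

close FH;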
 
LVL 8

Expert Comment

by:Perl_Diver
ID: 16890722
Thanks for the grade and the points.
 
LVL 6

Expert Comment

by:GnarOlak
ID: 16893793
Thanks from me also.