Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

found same lines in perl

Posted on 2006-06-09
12
Medium Priority
?
281 Views
Last Modified: 2008-03-06
Hi experts,

 I have an input file:

 1,2,1
 2,1,2
 2,1,2
 3,3,1
 3,3,1
 3,3,1

how do I write a perl program to remove the duplicate one ? and then add the count in the fourth column. So the output would look like:

1,2,1,1
2,1,2,2
3,3,1,3

 many thanks.
0
Comment
Question by:meow00
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 5
12 Comments
 
LVL 8

Expert Comment

by:Perl_Diver
ID: 16872639
my %results = ();
open(FH,'input.txt') or die "$!";
while(<FH>) {
   chomp;
   $results{$_}++;
}
close(FH);
foreach my $key (keys %results) {
   print "$keys,$results{$key}\n";
}

you might want that sorted somehow, but you didn't say or what the sort criteria is if any.
0
 
LVL 8

Expert Comment

by:Perl_Diver
ID: 16872646
sorry small error here:

foreach my $key (keys %results) {
   print "$keys,$results{$key}\n";
}

$keys should be $key:

foreach my $key (keys %results) {
   print "$key,$results{$key}\n";
}
0
 
LVL 6

Expert Comment

by:GnarOlak
ID: 16873294
Perl_Diver's solution looks pretty good except for two possible snags.  First, do you care about the order of the lines in the output?  There is no guarantee that the lines will be in the same order on output as they are on input.

My test output this:
3,3,1,3
2,1,2,2
1,2,1,1

Second, what if duplicate lines appear out of order in the file?

Like this:

1,2,1
2,1,2
2,1,2
1,2,1
3,3,1
3,3,1
3,3,1

Would the two 1,2,1 lines be considered duplicates or not?

If neither of these are problems then Perl_Diver has given you what you need.
0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
LVL 8

Expert Comment

by:Perl_Diver
ID: 16873936
good points GnarOlak
0
 
LVL 1

Author Comment

by:meow00
ID: 16874026
mmm ... actually they are not in orders ... anyway to fix it ?
0
 
LVL 8

Accepted Solution

by:
Perl_Diver earned 1200 total points
ID: 16874349
if order of original file is desired:

my %results = ();
my @order = ();
open(FH,'input.txt') or die "$!";
while(<FH>) {
   chomp;
   push @order,$_ unless exists $results{$_};
   $results{$_}++;
}
close(FH);
foreach my $key (@order) {
   print "$key,$results{$key}\n";
}

that can also be done with an array of hashes or an array of arrays too but the above is very simple to understand. Use the @order array just to maintain the order of the lines in the file and the hash %results to get the correct result for each line.

I take it this does not matter: Second, what if duplicate lines appear out of order in the file?
0
 
LVL 6

Assisted Solution

by:GnarOlak
GnarOlak earned 200 total points
ID: 16885225
If you need to keep separate runs of the same line distinct then something like this would do the trick:

my $prev_line;
my $count = 0;

open(FH,'input.txt') or die "$!";
$prev_line = <FH>;
$count = 1;
while (<FH>)
{
    if ($_ ne $prev_line)
    {
        print "$prev_line,$count\n";
        $prev_line = $_;
        $count = 1;
    }
}

0
 
LVL 6

Expert Comment

by:GnarOlak
ID: 16885230
One missing line from the end of that last post:


print "$prev_line,$count\n";


That needs to go after the last closing } to write out the last values.
0
 
LVL 6

Expert Comment

by:GnarOlak
ID: 16885269
Damn Mondays.  That last post wasn't right.  This does what I want.

my $prev_line;
my $count = 0;

open(FH,'input.txt') or die "$!";
$prev_line = <FH>;
chomp $prev_line;
$count = 1;

while (<FH>)
{
    chomp;
    if ($_ ne $prev_line)
    {
        print "$prev_line,$count\n";
        $prev_line = $_;
        $count = 1;
    }
    else
    {
        $count++;
    }
}

print "$prev_line,$count\n";

close FH;
0
 
LVL 6

Expert Comment

by:GnarOlak
ID: 16885352
And just in case you would rather do everything in the loop here is another version.

open(FH,'input.txt') or die "$!";
my $count = 0;
while (<FH>)
{
    chomp;
    if ($prev_line && $_ ne $prev_line)
    {
        print "$prev_line,$count\n";
        $prev_line = $_;
        $count = 1;
    }
    else
    {
        $prev_line = $_ if (! $prev_line);
        $count++;
    }
}

print "$prev_line,$count\n";

close FH;
0
 
LVL 8

Expert Comment

by:Perl_Diver
ID: 16890722
thanks for the grade and the points.
0
 
LVL 6

Expert Comment

by:GnarOlak
ID: 16893793
Thanks from me also.
0

Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A year or so back I was asked to have a play with MongoDB; within half an hour I had downloaded (http://www.mongodb.org/downloads),  installed and started the daemon, and had a console window open. After an hour or two of playing at the command …
Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans

604 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question