Solved

Finding identical lines in Perl

Posted on 2006-06-09
Last Modified: 2008-03-06
Hi experts,

 I have an input file:

 1,2,1
 2,1,2
 2,1,2
 3,3,1
 3,3,1
 3,3,1

How do I write a Perl program to remove the duplicate lines and then add the count of each line as a fourth column? The output would look like:

1,2,1,1
2,1,2,2
3,3,1,3

Many thanks.
Question by:meow00
12 Comments
 
LVL 8

Expert Comment

by:Perl_Diver
ID: 16872639
my %results = ();
open(FH,'input.txt') or die "$!";
while(<FH>) {
   chomp;
   $results{$_}++;
}
close(FH);
foreach my $key (keys %results) {
   print "$keys,$results{$key}\n";
}

You might want that sorted somehow, but you didn't say whether you do or what the sort criteria would be.
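
If sorted output is wanted, one option is to sort the hash keys before printing. This is only a minimal sketch, assuming a plain string sort on the whole line is acceptable (the question does not state any criteria). The sub name sorted_counts is made up for illustration, and the sample lines are held in memory instead of being read from input.txt:

```perl
use strict;
use warnings;

# Hypothetical helper: count identical lines and return "line,count"
# strings sorted by line content (plain string comparison).
sub sorted_counts {
    my @lines = @_;
    my %results;
    $results{$_}++ for @lines;
    return map { "$_,$results{$_}" } sort keys %results;
}

# Sample data from the question, held in memory instead of input.txt.
my @input = ('1,2,1', '2,1,2', '2,1,2', '3,3,1', '3,3,1', '3,3,1');
print "$_\n" for sorted_counts(@input);
```

To order by count instead, the sort could probably be changed to sort { $results{$b} <=> $results{$a} } keys %results, which puts the most frequent lines first.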
 
LVL 8

Expert Comment

by:Perl_Diver
ID: 16872646
Sorry, small error here:

foreach my $key (keys %results) {
   print "$keys,$results{$key}\n";
}

$keys should be $key:

foreach my $key (keys %results) {
   print "$key,$results{$key}\n";
}
 
LVL 6

Expert Comment

by:GnarOlak
ID: 16873294
Perl_Diver's solution looks pretty good except for two possible snags.  First, do you care about the order of the lines in the output?  There is no guarantee that the lines will be in the same order on output as they are on input.

My test output this:
3,3,1,3
2,1,2,2
1,2,1,1

Second, what if duplicate lines appear out of order in the file?

Like this:

1,2,1
2,1,2
2,1,2
1,2,1
3,3,1
3,3,1
3,3,1

Would the two 1,2,1 lines be considered duplicates or not?

If neither of these is a problem then Perl_Diver has given you what you need.
 
LVL 8

Expert Comment

by:Perl_Diver
ID: 16873936
good points GnarOlak
 
LVL 1

Author Comment

by:meow00
ID: 16874026
Hmm ... actually they are not in order ... any way to fix it?
 
LVL 8

Accepted Solution

by:
Perl_Diver earned 300 total points
ID: 16874349
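If the order of the original file is desired:

my %results = ();   # line => number of times it was seen
my @order = ();     # distinct lines, in order of first appearance
open(FH,'input.txt') or die "$!";
while(<FH>) {
   chomp;
   # remember a line only the first time it shows up
   push @order,$_ unless exists $results{$_};
   $results{$_}++;
}
close(FH);
foreach my $key (@order) {
   print "$key,$results{$key}\n";
}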

That can also be done with an array of hashes or an array of arrays, but the above is very simple to understand. The @order array just maintains the order of the lines in the file, and the %results hash holds the count for each line.

I take it GnarOlak's second question does not matter: what if duplicate lines appear out of order in the file?
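
For comparison, the array-of-arrays alternative mentioned above might look roughly like the sketch below. It is not taken from the thread's tested code. The names count_in_order, @rows and %index are hypothetical, and a small hash is still used to find an existing row quickly:

```perl
use strict;
use warnings;

# Hypothetical helper: each element of @rows is [line, count];
# %index maps a line to its position in @rows.
sub count_in_order {
    my (@rows, %index);
    for my $line (@_) {
        if (exists $index{$line}) {
            $rows[ $index{$line} ][1]++;    # seen before: bump its count
        }
        else {
            push @rows, [ $line, 1 ];       # first sighting: new row
            $index{$line} = $#rows;
        }
    }
    return @rows;
}

# In-memory sample instead of input.txt.
my @input = ('2,1,2', '1,2,1', '2,1,2');
print join(',', @$_), "\n" for count_in_order(@input);
```

Each row keeps the line and its count together, which may be handy if more per-line fields are added later. For this task alone the @order plus %results version is shorter.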
 
LVL 6

Assisted Solution

by:GnarOlak
GnarOlak earned 50 total points
ID: 16885225
If you need to keep separate runs of the same line distinct then something like this would do the trick:

my $prev_line;
my $count = 0;

open(FH,'input.txt') or die "$!";
$prev_line = <FH>;
$count = 1;
while (<FH>)
{
    if ($_ ne $prev_line)
    {
        print "$prev_line,$count\n";
        $prev_line = $_;
        $count = 1;
    }
}

 
LVL 6

Expert Comment

by:GnarOlak
ID: 16885230
One missing line from the end of that last post:


print "$prev_line,$count\n";


That needs to go after the last closing } to write out the last values.
 
LVL 6

Expert Comment

by:GnarOlak
ID: 16885269
Damn Mondays.  That last post wasn't right.  This does what I want.

my $prev_line;
my $count = 0;

open(FH,'input.txt') or die "$!";
$prev_line = <FH>;
chomp $prev_line;
$count = 1;

while (<FH>)
{
    chomp;
    if ($_ ne $prev_line)
    {
        print "$prev_line,$count\n";
        $prev_line = $_;
        $count = 1;
    }
    else
    {
        $count++;
    }
}

print "$prev_line,$count\n";

close FH;
 
LVL 6

Expert Comment

by:GnarOlak
ID: 16885352
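And just in case you would rather do everything in the loop, here is another version.

open(FH,'input.txt') or die "$!";
my $prev_line;
my $count = 0;
while (<FH>)
{
    chomp;
    # test with defined so that a line such as "0" is not treated as missing
    if (defined $prev_line && $_ ne $prev_line)
    {
        print "$prev_line,$count\n";
        $prev_line = $_;
        $count = 1;
    }
    else
    {
        $prev_line = $_ if (! defined $prev_line);
        $count++;
    }
}

# write out the last run, unless the file was empty
print "$prev_line,$count\n" if defined $prev_line;

close FH;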
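
To see how this run-based counting differs from the hash-based counting in the accepted solution, here is a small sketch that applies both to the out-of-order sample from earlier in the thread. The sub names count_all and count_runs are hypothetical, and the data is kept in memory rather than read from input.txt:

```perl
use strict;
use warnings;

# Identical lines count as duplicates wherever they appear.
# Output keeps the order of first appearance.
sub count_all {
    my (%seen, @order);
    for my $line (@_) {
        push @order, $line unless $seen{$line}++;
    }
    return map { "$_,$seen{$_}" } @order;
}

# Only consecutive identical lines count as duplicates.
sub count_runs {
    my @out;
    my ($prev, $count);
    for my $line (@_) {
        if (defined $prev && $line eq $prev) {
            $count++;
        }
        else {
            push @out, "$prev,$count" if defined $prev;
            ($prev, $count) = ($line, 1);
        }
    }
    push @out, "$prev,$count" if defined $prev;    # flush the last run
    return @out;
}

my @input = ('1,2,1', '2,1,2', '2,1,2', '1,2,1', '3,3,1', '3,3,1', '3,3,1');
print "all:  $_\n" for count_all(@input);
print "runs: $_\n" for count_runs(@input);
```

With this input, count_all reports 1,2,1 once with a count of 2, while count_runs reports it twice with a count of 1 each time. Which one is right depends on how the out-of-order duplicates should be treated.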
 
LVL 8

Expert Comment

by:Perl_Diver
ID: 16890722
thanks for the grade and the points.
 
LVL 6

Expert Comment

by:GnarOlak
ID: 16893793
Thanks from me also.