
Explain this line of code: push @lines, $_ unless $h{$_}++;

In a previous question I needed to remove duplicate lines from a file. Ren_b provided a solution that has been working fine. I would like a detailed explanation of the line that got everything working. Here is the code:

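use strict;
use warnings;

# Read gen.txt, keeping only the first copy of each line (in original order)
open GEN, '<', 'gen.txt' or die "Cannot read gen.txt: $!";
my @lines;
my %h;    # line text => number of times that line has been seen so far
while (<GEN>) {
  push @lines, $_ unless $h{$_}++;
}
close GEN;

# Overwrite gen.txt with the de-duplicated lines
open GEN, '>', 'gen.txt' or die "Cannot write gen.txt: $!";
print GEN @lines;
close GEN;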

My best guess: read each line of GEN and add it to the array @lines if it does not match $h. Then I print the array to GEN. I don't understand what is happening after the unless. Is a hash being created at the same time as the array, and if the hash sees the same key again, does the autoincrement cause it to skip it?
Asked by omcr
ozo commented:
Add it to the array @lines if its entry in the hash %h is not true.
After checking whether it is true, increment its value in %h, so that it will be true the next time you see it.
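ozo's two steps (test first, then increment) can be written out long-hand. The sketch below uses made-up sample lines (the array @input and the names %seen and @long are illustrative, not from the original code) and runs the compact form from the question next to an explicit test-then-increment loop, so you can check that both keep the same lines.

```perl
use strict;
use warnings;

# Made-up sample input standing in for the lines of gen.txt
my @input = ("one\n", "two\n", "one\n", "three\n", "two\n");

# Compact form from the question
my (%h, @lines);
for (@input) {
    push @lines, $_ unless $h{$_}++;
}

# Long-hand form: test the current value first, then increment
my (%seen, @long);
for my $line (@input) {
    if (!$seen{$line}) {   # entry does not exist yet (undef, which is false) on first sight
        push @long, $line;
    }
    $seen{$line}++;        # from now on the entry is true for every duplicate
}

print join('', @lines) eq join('', @long) ? "same\n" : "different\n";
print @lines;
```

Both loops should keep "one", "two" and "three" once each, in their original order.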
ozo commented:
You could also do:
perl -i -ne 'print if !$h{$_}++' gen.txt
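The one-liner can be unpacked into an ordinary script. The sketch below is an approximation based on perlrun's description of the switches (-n wraps the code in a while (<>) loop, -i makes print write a replacement for the file being read), not Perl's literal implementation. It assumes a Unix-like system, and it uses a temporary file from File::Temp as a hypothetical stand-in for gen.txt.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file standing in for gen.txt
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "x\ny\nx\nz\ny\n";
close $tmp;

{
    # Roughly what `perl -i -ne 'print if !$h{$_}++' FILE` does:
    # $^I = '' turns on in-place editing with no backup copy,
    # and @ARGV lists the files that <> will read.
    local ($^I, @ARGV) = ('', $path);
    my %h;
    while (<>) {
        print if !$h{$_}++;   # goes to the rewritten file, not the terminal
    }
    select(STDOUT);           # make sure later prints go to the terminal again
}

# Read the edited file back to see what survived
open my $in, '<', $path or die "Cannot read $path: $!";
my $result = do { local $/; <$in> };
close $in;
print $result;
```

After the edit, the file should contain x, y and z, each on its own line.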
godspropy commented:
push @lines, $_ unless $h{$_}++;

Perl actually increments the variable before the condition is processed. Therefore the hash entry $h{$_} is created and incremented to 0 on the first occurrence of $_. On the next occurrence of the same $_, the value in the hash %h is incremented to 1. The unless keyword is basically identical to 'if (! $h{$_})': the push runs only when the condition is false. So this line only pushes the value onto the array @lines if it is the first occurrence of $_.
Tintin commented:
Might also be useful to mention the FAQ here:

$ perldoc -q duplicate

Found in /usr/perl5/5.6.1/lib/pod/perlfaq4.pod
     How can I remove duplicate elements from a list or array?

     There are several possible ways, depending on whether the
     array is ordered and whether you wish to preserve the
     ordering.

             a)  If @in is sorted, and you want @out to be
                 sorted:  (this assumes all true values in the
                 array)

                     $prev = "not equal to $in[0]";
                     @out = grep($_ ne $prev && ($prev = $_, 1), @in);

                 This is nice in that it doesn't use much extra
                 memory, simulating uniq(1)'s behavior of
                 removing only adjacent duplicates.  The ", 1"
                 guarantees that the expression is true (so that
                 grep picks it up) even if the $_ is 0, "", or
                 undef.

             b)  If you don't know whether @in is sorted:

                     undef %saw;
                     @out = grep(!$saw{$_}++, @in);

             c)  Like (b), but @in contains only small integers:

                     @out = grep(!$saw[$_]++, @in);

             d)  A way to do (b) without any loops or greps:

                     undef %saw;
                     @saw{@in} = ();
                     @out = sort keys %saw;  # remove sort if undesired

             e)  Like (d), but @in contains only small positive
                 integers:

                     undef @ary;
                     @ary[@in] = @in;
                     @out = grep {defined} @ary;

             But perhaps you should have been using a hash all
             along, eh?
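To see how the FAQ recipes differ in practice, here is a small comparison of (a), (b) and (d) on one made-up list (the variable names are illustrative). Recipe (b) is the same "first occurrence wins" idiom as the question's line; (d) loses the original order, which is why the FAQ sorts the keys; and (a) only removes adjacent duplicates, so the input is sorted first.

```perl
use strict;
use warnings;

my @in = qw(b a b c a);

# (b) first occurrence wins, original order is kept
my %saw;
my @keep_order = grep { !$saw{$_}++ } @in;

# (d) hash-slice version: hash keys have no order, so sort them
my %set;
@set{@in} = ();
my @via_slice = sort keys %set;

# (a) uniq(1)-style: only adjacent duplicates are dropped, so sort first
my @sorted = sort @in;
my $prev = "not equal to $sorted[0]";
my @via_adjacent = grep { $_ ne $prev && ($prev = $_, 1) } @sorted;

print "b: @keep_order\nd: @via_slice\na: @via_adjacent\n";
```

Recipe (b) should print "b a c", while (d) and (a) both print "a b c".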
omcr (author) commented:
So, first run: add $_ to the hash and increment to zero. Add $_ to the array.
Then it comes in again (duplicate): it already exists in the hash and is incremented to 1. This makes '$h{$_}' true; the unless sees that it's true and causes it to return false, which prevents it from doing the push.

Is this right?
Tintin commented:
Not quite right.

The increment of the hash and population of the array happen at the same time.

So the first time through, $_ gets added to the array and the hash entry with the key $_ gets incremented to 1 (TRUE). The next time around with a duplicate value, the hash entry is already set to true.
omcr (author) commented:
From godspropy's post:
"Therefore the hash $h{$_} is created and incremented to 0 on the first occurrence of $_. On the next occurrence of the same $_ the value in the hash %h is incremented to 1."

From Tintin's post:
"So the first time through, $_ gets added to the array and the hash with the key $_ gets incremented to 1 (TRUE)."

Are you both speaking of the same thing?

Tintin commented:
No.

The hash doesn't get incremented to 0, it gets incremented to 1.
godspropy commented:
Tintin was correct. The post-increment in $h{$_}++ returns the current value and only then increments it (and undef is treated as 0). So it returns 0 for the undefined entry on its first use, and after that first use the entry has been incremented to 1. On its second use it returns the existing value of 1 and then increments to 2, and so on.
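The difference between post-increment and pre-increment is easy to check directly. This sketch uses made-up keys (apple, pear, x, y) purely for illustration; it also shows that swapping in pre-increment would break the idiom, because the value being tested would always be at least 1.

```perl
use strict;
use warnings;

my %h;
my $first  = $h{apple}++;   # post-increment: yields the old value (undef counts as 0), then stores 1
my $second = $h{apple}++;   # yields 1, then stores 2
my $pre    = ++$h{pear};    # pre-increment: stores 1 first, then yields the new value 1
print "first=$first second=$second stored=$h{apple} pre=$pre\n";

# Pre-increment breaks the de-duplication idiom: the tested value is
# always at least 1 (true), so 'unless' never lets a line through.
my (%seen, @kept);
for my $line (qw(x y x)) {
    push @kept, $line unless ++$seen{$line};
}
print scalar(@kept), " lines kept\n";
```

This prints "first=0 second=1 stored=2 pre=1" followed by "0 lines kept". perlop documents that a post-increment of an undef value returns 0 rather than undef.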
omcr (author) commented:
OK.
The first time through, $_ gets added to the array and the hash entry with the key $_ gets incremented to 1 (TRUE).
Then it comes in again (duplicate): it already exists in the hash and is incremented to N. This makes '$h{$_}' true; the 'unless' sees that it's true and causes it to return false, which prevents it from doing the push.

How does that description look?
ozo commented:
The first time through, the hash entry with the key $_ has no value yet (undef, which is false). After the false value is tested, it gets incremented to 1 (TRUE).
The ! sees the false value from before the increment, so $_ gets pushed onto the array.
Then it comes in again (duplicate): the key already exists in the hash, with the TRUE value that was set the first time through. After the true value has been tested, it gets incremented to 2 (also true).
The ! sees the true value and prevents the push.
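The walk-through above can be traced line by line. This sketch feeds made-up input (with "a" appearing three times) through the question's statement and records, for each line, the value that unless actually tests (the one stored before the increment) next to the value stored afterwards; the names @input and @trace are illustrative only.

```perl
use strict;
use warnings;

# Made-up input with "a" repeated three times
my @input = ("a\n", "b\n", "a\n", "c\n", "a\n");
my (%h, @lines, @trace);

for (@input) {
    # The value unless tests: whatever is stored *before* the increment
    my $tested = defined $h{$_} ? $h{$_} : 'undef';
    push @lines, $_ unless $h{$_}++;
    chomp(my $key = $_);
    push @trace, "$key: tested=$tested stored=$h{$_}";
}

print "$_\n" for @trace;
print "kept: ", join(' ', map { chomp(my $s = $_); $s } @lines), "\n";
```

The trace should show "tested=undef" only on the first sight of each line, so only those lines are pushed and the final line reads "kept: a b c".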
omcr (author) commented:
Thanks everyone, I think I've got it now. Good discussion and thanks for the patience.