omcr

Explain this line of code: push @lines, $_ unless $h{$_}++;

In a previous question I needed to remove duplicate lines from a file. Ren_b provided a solution that has been working fine. I would like a detailed explanation of the line that got everything working. Here is the code:

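# Read gen.txt, keeping only the first occurrence of each line.
open GEN, '<', 'gen.txt' or die "Cannot open gen.txt: $!";
my @lines;
my %h;
while(<GEN>){
  push @lines, $_ unless $h{$_}++;
}
close GEN;

# Overwrite gen.txt with the de-duplicated lines.
open GEN, '>', 'gen.txt' or die "Cannot write gen.txt: $!";
print GEN @lines;
close GEN;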

My best guess: read each line of GEN and add it to the array @lines if it does not match $h, then print the array back to GEN. I don't understand what is happening after the unless. Is a hash being built at the same time as the array, and if the hash sees the same key again, does the autoincrement cause the line to be skipped?
ozo

Add it to the array @lines if it is not true in the hash %h.
After checking whether it is true, increment its value in %h, so that it will be true the next time you see it.
You could also do:
perl -i -ne 'print if !$h{$_}++' gen.txt
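
If it helps, the one-line form can be spelled out step by step. This is only a sketch of the same idea: the @input list is made-up sample data standing in for the lines of gen.txt, and $seen_before is a name introduced here for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines read from gen.txt.
my @input = ("a\n", "b\n", "a\n", "c\n", "b\n");

my (@lines, %h);
for my $line (@input) {
    my $seen_before = $h{$line};   # undef (false) the first time this line appears
    $h{$line}++;                   # count this occurrence
    push @lines, $line unless $seen_before;   # keep only first occurrences
}

print @lines;   # prints a, b, c, one per line
```

The original statement does the lookup and the increment in a single expression, because the post-increment hands back the old value before adding one.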
SOLUTION
godspropy

Tintin

Might also be useful to mention the FAQ here:

$ perldoc -q duplicate

Found in /usr/perl5/5.6.1/lib/pod/perlfaq4.pod
     How can I remove duplicate elements from a list or array?

     There are several possible ways, depending on whether the
     array is ordered and whether you wish to preserve the
     ordering.

             a)  If @in is sorted, and you want @out to be
                 sorted:  (this assumes all true values in the
                 array)

                     $prev = "not equal to $in[0]";
                     @out = grep($_ ne $prev && ($prev = $_, 1), @in);

                 This is nice in that it doesn't use much extra
                 memory, simulating uniq(1)'s behavior of
                 removing only adjacent duplicates.  The ", 1"
                 guarantees that the expression is true (so that
                 grep picks it up) even if the $_ is 0, "", or
                 undef.

             b)  If you don't know whether @in is sorted:

                     undef %saw;
                     @out = grep(!$saw{$_}++, @in);

             c)  Like (b), but @in contains only small integers:

                     @out = grep(!$saw[$_]++, @in);

             d)  A way to do (b) without any loops or greps:

                     undef %saw;
                     @saw{@in} = ();
                     @out = sort keys %saw;  # remove sort if undesired

             e)  Like (d), but @in contains only small positive
                 integers:

                     undef @ary;
                     @ary[@in] = @in;
                     @out = grep {defined} @ary;

             But perhaps you should have been using a hash all
             along, eh?
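
As a rough illustration of how FAQ answers (b) and (d) differ, here is a small sketch. The list contents and the variable names @out_b, @out_d and %uniq are made up for the example; only the two idioms come from the FAQ text above.

```perl
use strict;
use warnings;

my @in = qw(pear apple pear fig apple);   # hypothetical sample list

# (b): grep keeps the first occurrence of each element, in original order.
my %saw;
my @out_b = grep { !$saw{$_}++ } @in;

# (d): a hash slice collects the unique keys; hash order is arbitrary, so sort.
my %uniq;
@uniq{@in} = ();
my @out_d = sort keys %uniq;

print "@out_b\n";   # pear apple fig
print "@out_d\n";   # apple fig pear
```

Form (b) is the same test used in the question's while loop, which is why the lines in gen.txt keep their original order.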
omcr

ASKER

So first run: Add $_ to the hash and increment to zero. Add $_ to the array.
Then it comes in again (duplicate): it already exists in the hash, so it is incremented to 1. This makes '$h{$_}' true; the unless sees that it's true and returns false, which prevents the push.

Is this right?
omcr

ASKER

From godspropy's post:
"Therefore the hash $h{$_} is created and incremented to 0 on the first occurrence of $_. On the next occurrence of the same $_ the value in the hash %h is incremented to 1."

From Tintin's post:
"So the first time through, $_ gets added to the array and the hash with the key $_ gets incremented to 1 (TRUE)."

Are you both speaking of the same thing?

No.

The hash value doesn't get incremented to 0; it gets incremented to 1.
Tintin was correct. When used in a condition, a post-incremented variable returns its current value and then increments (and undef counts as 0). So it returns 0 for the undefined variable on its first use, after which it is incremented to 1. On its second use it returns the existing value of 1 and then increments to 2, and so on.
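
A quick way to see this for yourself is to record what the expression returns each time. This is just a sketch; the key string and the names $key and @returned are invented for the demonstration.

```perl
use strict;
use warnings;

my %h;
my $key = "same line\n";   # hypothetical line seen three times
my @returned;

for (1 .. 3) {
    my $value = $h{$key}++;   # value before the increment; undef counts as 0
    push @returned, $value;
}

print "returned: @returned\n";   # returned: 0 1 2
print "stored: $h{$key}\n";      # stored: 3
```

Only the first returned value is false, so only the first occurrence gets past the unless.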
omcr

ASKER

OK.
The first time through, $_ gets added to the array and the hash value for the key $_ gets incremented to 1 (TRUE).
Then it comes in again (duplicate): it already exists in the hash, so it is incremented to N. This makes '$h{$_}' true; the 'unless' sees that it's true and returns false, which prevents the push.

How does that description look?
omcr

ASKER

Thanks everyone, I think I've got it now. Good discussion and thanks for the patience.