Explain this line of code: push @lines, $_ unless $h{$_}++;

In a previous question I needed to remove duplicate lines from a file. Ren_b provided a solution that has been working fine. I would like a detailed explanation of the line that got everything working. Here is the code:

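# Read gen.txt, keeping only the first occurrence of each line.
open(GEN, "<", "gen.txt") or die "Cannot open gen.txt for reading: $!";
my @lines;
my %h;    # counts how many times each line has been seen
while (<GEN>) {
    push @lines, $_ unless $h{$_}++;
}
close GEN;

# Overwrite gen.txt with the de-duplicated lines.
open(GEN, ">", "gen.txt") or die "Cannot open gen.txt for writing: $!";
print GEN @lines;
close GEN;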

My best guess: read each line of GEN, and add it to the array @lines if it does not match $h. I then print the array to GEN. I don't understand what is happening after the unless. Is a hash being created at the same time as the array, and if the hash sees the same key again, does the autoincrement cause it to skip it?

ozo

Add it to the array @lines if it is not true in hash %h.
After checking whether it is true, increment its value in %h, so that it will be true the next time you see it.
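
To see this on a small scale, here is a minimal sketch that runs the same statement over an in-memory list instead of gen.txt. The words in @input are made up for illustration; everything else is the idiom from the question.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of gen.txt.
my @input = qw(apple pear apple fig pear);

my @lines;
my %h;
for (@input) {
    # $h{$_}++ yields the old count (0, i.e. false, the first time),
    # then bumps the stored count, so only first occurrences are pushed.
    push @lines, $_ unless $h{$_}++;
}

print "kept: @lines\n";              # kept: apple pear fig
print "apple seen: $h{apple}\n";     # apple seen: 2
```

The array keeps the original order of first appearances, and the hash ends up holding how many times each item was seen.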

ozo

you could also do
perl -i -ne 'print if !$h{$_}++' gen.txt

SOLUTION

godspropy

Tintin

Might also be useful to mention the FAQ here:

$ perldoc -q duplicate

Found in /usr/perl5/5.6.1/lib/pod/perlfaq4.pod
How can I remove duplicate elements from a list or array?

There are several possible ways, depending on whether the
array is ordered and whether you wish to preserve the
ordering.

a) If @in is sorted, and you want @out to be
sorted: (this assumes all true values in the
array)

$prev = "not equal to $in[0]";
@out = grep($_ ne $prev && ($prev = $_, 1), @in);

This is nice in that it doesn't use much extra
memory, simulating uniq(1)'s behavior of
removing only adjacent duplicates. The ", 1"
guarantees that the expression is true (so that
grep picks it up) even if the $_ is 0, "", or
undef.

b) If you don't know whether @in is sorted:

undef %saw;
@out = grep(!$saw{$_}++, @in);

c) Like (b), but @in contains only small integers:

@out = grep(!$saw[$_]++, @in);

d) A way to do (b) without any loops or greps:

undef %saw;
@saw{@in} = ();
@out = sort keys %saw; # remove sort if undesired

e) Like (d), but @in contains only small positive
integers:

undef @ary;
@ary[@in] = @in;
@out = grep {defined} @ary;

But perhaps you should have been using a hash all
along, eh?
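
As a rough illustration of how (b) and (d) differ, the sketch below runs both on the same list. The sample values and the names %keys, @out_b and @out_d are made up for this example; the two techniques themselves are the ones from the FAQ.

```perl
use strict;
use warnings;

# Hypothetical unsorted input with duplicates.
my @in = qw(pear apple pear fig apple);

# (b): grep with a "seen" hash keeps the first occurrence of each
# element and preserves the original order.
my %saw;
my @out_b = grep { !$saw{$_}++ } @in;

# (d): a hash slice turns every element into a key. Hash keys come
# back in no particular order, so sort them for a predictable result.
my %keys;
@keys{@in} = ();
my @out_d = sort keys %keys;

print "b: @out_b\n";   # b: pear apple fig
print "d: @out_d\n";   # d: apple fig pear
```

For a file where line order matters, as with gen.txt, (b) is the variant that matches the original code.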

omcr

ASKER

So on the first run: add $_ to the hash and increment to zero. Add $_ to the array.
Then it comes in again (duplicate): it already exists in the hash, so increment to 1. This makes '$h{$_}' true; the unless sees that it's true and returns false, which prevents the push.

Is this right?

SOLUTION

Tintin

omcr

ASKER

From godspropy's post:
"Therefore the hash $h{$_} is created and incremented to 0 on the first occurrence of $_. On the next occurrence of the same $_ the value in the hash %h is incremented to 1."

From Tintin's post:
"So the first time through, $_ gets added to the array and the hash with the key $_ gets incremented to 1 (TRUE)."

Are you both speaking of the same thing?

Tintin

No.

The hash doesn't get incremented to 0, it gets incremented to 1.

godspropy

Tintin was correct. When used in a condition, a post-incremented variable returns its current value and then increments (and undef is treated as 0). So it returns 0 for the undefined variable on its first use. After its first use it is auto-incremented to 1. On its second use it returns the existing value of 1 and then increments...
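
A small sketch of that return-then-increment behaviour, with pre-increment shown for contrast. The hashes %h and %g and the key 'x' are placeholders chosen for this example.

```perl
use strict;
use warnings;

my %h;
my $first  = $h{x}++;   # returns the old value 0, then $h{x} becomes 1
my $second = $h{x}++;   # returns the old value 1, then $h{x} becomes 2

my %g;
my $pre = ++$g{x};      # pre-increment: $g{x} becomes 1 first, then 1 is returned

print "post-increment returned $first, then $second; stored value is $h{x}\n";
print "pre-increment returned $pre\n";
```

This is why the original line needs the ++ after the variable: with ++$h{$_} the condition would already be true on the first occurrence, and nothing would ever be pushed.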

omcr

ASKER

OK.
The first time through, $_ gets added to the array and the hash with the key $_ gets incremented to 1 (TRUE).
Then it comes in again (duplicate): it already exists in the hash, so increment to N. This makes '$h{$_}' true; the 'unless' sees that it's true and returns false, which prevents the push.

How does that description look?

ASKER CERTIFIED SOLUTION

ozo

omcr

ASKER

Thanks everyone, I think I've got it now. Good discussion and thanks for the patience.