omcr
asked on
Explain this line of code: push @lines, $_ unless $h{$_}++;
In a previous question I needed to remove duplicate lines from a file. Ren_b provided a solution that has been working fine. I would like a detailed explanation of the line that got everything working. Here is the code:
open GEN, "gen.txt";
my @lines;
my %h;
while(<GEN>){
push @lines, $_ unless $h{$_}++;
}
close GEN;
open GEN, ">gen.txt";
print GEN @lines;
close GEN;
My best guess: read each line of GEN and add it to the array @lines if it does not match $h, then print the array back to GEN. I don't understand what is happening after the unless. Is a hash being created at the same time as the array, and if the hash sees the same key again, does the auto-increment cause the line to be skipped?
you could also do
perl -i -ne 'print if !$h{$_}++' gen.txt
Might also be useful to mention the FAQ here:
$ perldoc -q duplicate
Found in /usr/perl5/5.6.1/lib/pod/perlfaq4.pod
How can I remove duplicate elements from a list or array?
There are several possible ways, depending on whether the
array is ordered and whether you wish to preserve the
ordering.
a) If @in is sorted, and you want @out to be
sorted: (this assumes all true values in the
array)
$prev = "not equal to $in[0]";
@out = grep($_ ne $prev && ($prev = $_, 1), @in);
This is nice in that it doesn't use much extra
memory, simulating uniq(1)'s behavior of
removing only adjacent duplicates. The ", 1"
guarantees that the expression is true (so that
grep picks it up) even if the $_ is 0, "", or
undef.
b) If you don't know whether @in is sorted:
undef %saw;
@out = grep(!$saw{$_}++, @in);
c) Like (b), but @in contains only small integers:
@out = grep(!$saw[$_]++, @in);
d) A way to do (b) without any loops or greps:
undef %saw;
@saw{@in} = ();
@out = sort keys %saw; # remove sort if undesired
e) Like (d), but @in contains only small positive
integers:
undef @ary;
@ary[@in] = @in;
@out = grep {defined} @ary;
But perhaps you should have been using a hash all
along, eh?
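As a rough illustration of the difference between FAQ variants (b) and (d), here is a small sketch run on an in-memory list. The sample data and the variable names @ordered, %keys and @sorted are made up for this example and are not from the FAQ.

```perl
use strict;
use warnings;

# Hypothetical sample data with two duplicated elements.
my @in = qw(pear apple pear fig apple);

# Variant (b): grep keeps an element only the first time it is seen,
# so the order of first occurrences is preserved.
my %saw;
my @ordered = grep { !$saw{$_}++ } @in;

# Variant (d): a hash slice creates one key per distinct element.
# Hash key order is unspecified, hence the sort.
my %keys;
@keys{@in} = ();
my @sorted = sort keys %keys;

print "(b) @ordered\n";   # (b) pear apple fig
print "(d) @sorted\n";    # (d) apple fig pear
```

Variant (b) is the same idiom as the push ... unless $h{$_}++ line in the question, just written with grep instead of a while loop.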
ASKER
So first run: Add $_ to hash and increment to zero. Add $_ to the array.
Then it comes in again (duplicate): it already exists in the hash and is incremented to 1. This makes '$h{$_}' true; the unless sees that it's true and returns false, which prevents the push.
Is this right?
ASKER
From godspropy's post
"Therefore the hash $h{$_} is created and incremented to 0 on the first occurrence of $_. On the next occurrence of the same $_ the value in the hash %h is incremented to 1."
From tintin's post
"So the first time through, $_ gets added to the array and the hash with the key $_ gets incremented to 1 (TRUE)."
Are you both speaking of the same thing?
No.
The hash doesn't get incremented to 0, it gets incremented to 1.
Tintin was correct. When used in a condition, a post-incremented variable returns its current value and is then incremented (and undef counts as 0). So on its first use it returns 0 for the undefined hash element, which is false, and afterwards the element holds 1. On its second use it returns the existing value of 1, which is true, and then increments again.
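To make the "returns the current value, then increments" behaviour visible, here is a minimal sketch that records what $h{$line}++ hands back and what is stored afterwards. The three sample lines and the @trace array are invented for the illustration.

```perl
use strict;
use warnings;

my %h;
my @trace;

# Simulated input lines; the second "apple\n" is the duplicate.
for my $line ("apple\n", "pear\n", "apple\n") {
    # Post-increment yields the OLD value (0 for a missing key),
    # then stores old value + 1 in the hash.
    my $returned = $h{$line}++;
    push @trace, [ $returned, $h{$line} ];
}

# Prints:
#   returned=0 stored=1
#   returned=0 stored=1
#   returned=1 stored=2
printf "returned=%s stored=%d\n", @$_ for @trace;
```

The unless only ever sees the returned value, so the push happens on the first two iterations and is skipped on the third.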
ASKER
Ok
The first time through, $_ gets added to the array and the hash with the key $_ gets incremented to 1 (TRUE).
Then it comes in again (duplicate): it already exists in the hash and is incremented to N. This makes '$h{$_}' true; the 'unless' sees that it's true and returns false, which prevents the push.
How does that description look?
ASKER
Thanks everyone, I think I've got it now. Good discussion and thanks for the patience.
After checking whether it is true, increment its value in %h, so that it will be true the next time you see it.
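One way to picture the "check first, then increment" order is to write the one-liner out long-hand. This is only a sketch of an equivalent form, not how Perl evaluates it internally; @input, %seen, @unique and $seen_before are names made up for the comparison.

```perl
use strict;
use warnings;

# Hypothetical input standing in for the lines read from gen.txt.
my @input = ("a\n", "b\n", "a\n", "c\n", "b\n");

# Compact form, as in the thread.
my (%h, @lines);
for (@input) {
    push @lines, $_ unless $h{$_}++;
}

# Long-hand form: look at the old value, then increment, then decide.
my (%seen, @unique);
for my $line (@input) {
    my $seen_before = $seen{$line};   # undef (false) on the first sighting
    $seen{$line}++;                   # now at least 1, so true next time
    push @unique, $line unless $seen_before;
}

print "same result\n" if join('', @lines) eq join('', @unique);
```

Both loops keep the first occurrence of each line and drop later repeats, so the two arrays end up identical.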