bt707
Perl - check duplicates in file and case
I'm using this script, which I grabbed from perlmonks. It does what I need: it checks a file and prints out any duplicate lines.
How can I lowercase the lines for the check but still keep the original case in my output?
Thanks
my %duplicates;
while (<>) {
chomp;
$duplicates{$_}++;
}
foreach my $key (keys %duplicates) {
if ($duplicates{$key} > 1) {
delete $duplicates{$key};
print "$key\n";
}
}
Just one issue: I need to match lines in the file that are the same but may differ in case. I could lowercase the whole file and then run this, but I need to keep the original case in the output.
ASKER
I had already tried using lc, but putting it in like this just gives me the error:
Useless use of lc in void context at ./dup_lines.pl line 10.
#! /usr/bin/perl
use strict;
use warnings;
my %duplicates;
while (<>) {
chomp;
lc;
$duplicates{$_}++;
}
foreach my $key (keys %duplicates) {
if ($duplicates{$key} > 1) {
delete $duplicates{$key};
print "$key\n";
}
}
What am I missing?
ASKER CERTIFIED SOLUTION
Bare lc will not work. You need to add:
$_ = lc; # which should act the same as $_ = lc $_
This method will also not preserve case (one of the things you asked for).
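To preserve case as well, one option is to use the lowercased copy only as the hash key and keep the original line for output. The following is a minimal sketch under that assumption; `lc` returns a new string and leaves its argument untouched, which is why a bare `lc;` does nothing useful. The names `count_ci` and `%first_seen` are made up for illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Count lines case-insensitively and remember the first spelling seen.
# count_ci and %first_seen are hypothetical names for this sketch.
sub count_ci {
    my @lines = @_;
    my (%count, %first_seen);
    for my $line (@lines) {
        my $key = lc $line;    # lc returns a copy; $line keeps its case
        $count{$key}++;
        $first_seen{$key} = $line unless exists $first_seen{$key};
    }
    # First-seen spelling of every line that occurs more than once.
    my @dups = map { $first_seen{$_} } grep { $count{$_} > 1 } sort keys %count;
    return @dups;
}

print "$_\n" for count_ci('Apple', 'apple', 'Pear', 'APPLE');
```

With the sample input above, only `Apple` is printed, in the case it first appeared in.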
ASKER
Thanks, that worked fine. I'm not sure why I could not get the other one to work (maybe it's me), but this one worked fine.
Thanks,
hile (<>) {
chomp;
push @{$duplicates{lc}},$_;
}
foreach my $key (keys %duplicates) {
if (@{$duplicates{$key}} > 1) {
delete $duplicates{$key}->[0];
print "$duplicates{$key}->[0]\n" ;
}
}
Interesting. I would have sworn "push @{$duplicates{$key}}" failed on an undefined $duplicates{$key} but I just re-tested and it works fine.
As such, you can omit the "... = [] unless $duplicates{$key}" line (and as ozo said, you can then combine the "$key = lc $_" and push lines).
As usual, ozo has provided a good concise answer (though I would not do his delete/print part like he did but that's just preference).
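The autovivification described here can be sketched as follows: pushing onto `@{ $groups{$key} }` creates the array reference on first use, so no explicit `= []` initialisation is needed. This is only an illustration of the idea, not ozo's exact code; the sub name `group_ci` is hypothetical.

```perl
use strict;
use warnings;

# Group lines by their lowercased form; group_ci is a hypothetical helper.
sub group_ci {
    my @lines = @_;
    my %groups;
    for my $line (@lines) {
        # No "= [] unless ..." line needed: push autovivifies the array ref.
        push @{ $groups{ lc $line } }, $line;
    }
    return \%groups;
}

my $groups = group_ci('Foo', 'foo', 'Bar');
for my $key (sort keys %$groups) {
    my @spellings = @{ $groups->{$key} };
    # Report only lines that occur more than once, in their original case.
    print join(', ', @spellings), "\n" if @spellings > 1;
}
```

Keeping every spelling in the array leaves the choice of what to print (first, last, or all) to the reporting loop.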
Sorry, I didn't see http:#a38817078 when I posted.
I was trying to duplicate the behaviour of the routine in the original question,
which seemed to be deleting only one of each duplicate name, so I just deleted the first.
It could easily be changed to be the last, or all but the first/last, or all.
If the intent is to re-write the file with duplicates eliminated, that might be done with
$^I=".bak";
$duplicates{+lc}++ or print while <>;
(and I see I omitted the + in my previous post, not to mention the w in while)
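A small sketch of why the `+` matters: inside hash-subscript braces a simple identifier is treated as a quoted string, so `$h{lc}` uses the literal key `"lc"`, while `$h{+lc}` forces an expression and calls `lc` on `$_`. The sub name `key_demo` and the sample values are invented for this example.

```perl
use strict;
use warnings;

# Return the keys produced without and with the unary plus.
# key_demo is a hypothetical name used only for this sketch.
sub key_demo {
    my (%bare, %plus);
    for ('Foo', 'FOO') {
        $bare{lc}++;     # identifier in braces is quoted: key is the string "lc"
        $plus{+lc}++;    # unary plus makes it an expression: key is lc($_)
    }
    return ([sort keys %bare], [sort keys %plus]);
}

my ($bare_keys, $plus_keys) = key_demo();
print "bare: @$bare_keys\n";
print "plus: @$plus_keys\n";
```

Writing `lc $_` or `lc()` inside the braces has the same effect as the unary plus and may read more clearly.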
ASKER
Thanks for all the info. I got what I needed from the answer I accepted by changing a few things so that I got the output I needed to see, but I also learned several things from the comments, which is very much appreciated.
Thanks to all.
while (<>) {
chomp;
$duplicates{lc($_)} = $_;
}
Then, if the lowercased line already exists in the map, you should print it; otherwise not.
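One way to read this suggestion, sketched under the assumption that a line should be reported each time its lowercased form has already been seen. The existence check has to happen before the assignment, otherwise every line would appear to be in the map. The name `repeats_ci` is hypothetical.

```perl
use strict;
use warnings;

# Return every line whose lowercased form was seen earlier in the input.
# repeats_ci is a hypothetical name for this sketch.
sub repeats_ci {
    my @lines = @_;
    my (%seen, @repeats);
    for my $line (@lines) {
        my $key = lc $line;
        # Check first, then store: a hit means this line is a repeat.
        push @repeats, $line if exists $seen{$key};
        $seen{$key} = $line;
    }
    return @repeats;
}

print "$_\n" for repeats_ci('One', 'two', 'ONE', 'one');
```

For the sample input this prints `ONE` and `one`, each in its original case; the first occurrence `One` is not printed because nothing preceded it.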