
Perl - check duplicates in file and case

I'm using this script, which I grabbed from PerlMonks. It does what I need: it checks a file and prints out any duplicate lines.

my %duplicates;

while (<>) {
    chomp;
    $duplicates{$_}++;
}

foreach my $key (keys %duplicates) {
    if ($duplicates{$key} > 1) {
        delete $duplicates{$key};
        print "$key\n";
    }
}


Just one issue: I need to match lines in the file that are the same but may differ in case. I could lowercase the whole file and then run the script, but I need to keep the original case.

How can I lowercase the lines for the comparison but still keep the original case in my output?

Thanks

farzanj

First you need to make a map.

while (<>) {
    chomp;
    $duplicates{lc($_)} = $_;
}

Then, if the lowercased line already exists in the map, print it; otherwise don't.
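
A minimal sketch of that idea, assuming the lines are already in a list rather than read from a file. The sub name repeated_lines and the sample data are made up for illustration. The hash is keyed on the lowercased line, and the check for an existing key happens before the line is stored:

```perl
use strict;
use warnings;

# Hypothetical helper: returns every line whose lowercased form was
# already seen earlier in the input, in its original case.
sub repeated_lines {
    my @lines = @_;
    my (%seen, @repeats);
    for my $line (@lines) {
        my $key = lc $line;            # compare case-insensitively
        if (exists $seen{$key}) {
            push @repeats, $line;      # report with the original case
        } else {
            $seen{$key} = $line;       # remember the first spelling
        }
    }
    return @repeats;
}

my @sample = ("Apple", "banana", "APPLE", "Cherry", "Banana");
print "$_\n" for repeated_lines(@sample);
```

In a real script the sample list would probably be replaced by a while (<>) { chomp; ... } loop doing the same check on each line.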

bt707

ASKER

I had already tried using lc, but I just get errors from it. Putting it in as shown below gives me this message:

Useless use of lc in void context at ./dup_lines.pl line 10.

#! /usr/bin/perl

use strict;
use warnings;

my %duplicates;

while (<>) {
    chomp;
    lc;
    $duplicates{$_}++;
}

foreach my $key (keys %duplicates) {
    if ($duplicates{$key} > 1) {
        delete $duplicates{$key};
        print "$key\n";
    }
}

What am I missing?

ASKER CERTIFIED SOLUTION

wilcoxon


wilcoxon

Bare lc will not work. You need to add:

$_ = lc; # which should act the same as $_ = lc $_

Note that this method will not preserve case, which was one of the things you asked for.
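
To illustrate the point: lc returns a lowercased copy and leaves its argument alone, so the original spelling can be kept in a separate variable or hash. The following is only a sketch under that assumption; the sub name duplicates_keep_case and the sample data are invented for the example:

```perl
use strict;
use warnings;

# lc returns a new string; the original is unchanged.
my $original = "Hello World";
my $lower    = lc $original;
print "$original / $lower\n";

# Hypothetical helper: count lines by their lowercased form, but
# remember the first spelling seen so it can be printed later.
sub duplicates_keep_case {
    my (%count, %first);
    for my $line (@_) {
        my $key = lc $line;
        $count{$key}++;
        $first{$key} = $line unless exists $first{$key};
    }
    my @dup_keys = grep { $count{$_} > 1 } keys %count;
    return map { $first{$_} } sort @dup_keys;
}

print "$_\n" for duplicates_keep_case("Foo", "bar", "FOO", "Baz", "BAR");
```

The counting hash and the spelling hash share the same lowercased keys, so the comparison ignores case while the output does not.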

bt707

ASKER

Thanks, that worked fine. I'm not sure why I couldn't get the other one to work; maybe it was me, but this one worked fine.

Thanks,

ozo

hile (<>) {
chomp;
push @{$duplicates{lc}},$_;
}

foreach my $key (keys %duplicates) {
if (@{$duplicates{$key}} > 1) {
delete $duplicates{$key}->[0];
print "$duplicates{$key}->[0]\n";
}
}

wilcoxon

Interesting. I would have sworn "push @{$duplicates{$key}}" failed on an undefined $duplicates{$key} but I just re-tested and it works fine.

As such, you can omit the "... = [] unless $duplicates{$key}" line (and as ozo said, you can then combine the "$key = lc $_" and push lines).

As usual, ozo has provided a good concise answer (though I would not do his delete/print part like he did but that's just preference).
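
For reference, a small sketch of the autovivification behaviour described above: pushing onto @{ $hash{$key} } creates the array reference on first use, so no explicit initialisation is needed. The sub name group_by_lc and the sample data are assumptions for illustration only:

```perl
use strict;
use warnings;

# Hypothetical helper: group lines by their lowercased form.
sub group_by_lc {
    my %groups;
    for my $line (@_) {
        # $groups{...} springs into existence as an array ref on the
        # first push, even under "use strict".
        push @{ $groups{ lc $line } }, $line;
    }
    return \%groups;
}

my $groups = group_by_lc("Foo", "bar", "FOO", "Baz");
for my $key (sort keys %$groups) {
    my @spellings = @{ $groups->{$key} };
    # Only groups with more than one member are duplicates.
    print join(", ", @spellings), "\n" if @spellings > 1;
}
```

Keeping every spelling in the group leaves the choice of what to print (first, last, or all) to the final loop.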

ozo

Sorry, I didn't see http:#a38817078 when I posted.
I was trying to duplicate the behaviour of the routine in the original question,
which seemed to be deleting only one of each duplicate name, so I just deleted the first.
It could easily be changed to be the last, or all but the first/last, or all.

If the intent is to re-write the file with duplicates eliminated, that might be done with

$^I=".bak";
$duplicates{+lc}++ or print while <>;

(and I see I omitted the + in my previous post, not to mention the w in while)
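
The reason the + matters: inside hash braces a bare identifier such as lc is taken as the literal string "lc", while +lc forces it to be parsed as a function call on $_. A short sketch of the difference, plus a non-destructive variant of the "keep the first of each duplicate" filter. It works on a list instead of editing a file in place, and the sub name unique_ignoring_case is made up for the example:

```perl
use strict;
use warnings;

my (%bare, %plus);
for ("Foo", "BAR") {
    $bare{lc}++;     # bareword: always the literal key "lc"
    $plus{+lc}++;    # function call: key is lc($_)
}
print join(",", sort keys %bare), "\n";
print join(",", sort keys %plus), "\n";

# Hypothetical helper: keep only the first occurrence of each line,
# comparing case-insensitively but returning the original spelling.
sub unique_ignoring_case {
    my %seen;
    return grep { !$seen{ lc $_ }++ } @_;
}

print "$_\n" for unique_ignoring_case("Foo", "bar", "FOO", "Bar", "baz");
```

Writing the key as lc $_ inside the braces avoids the bareword question altogether, at the cost of a few extra characters.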

bt707

ASKER

Thanks for all the info. I had already got what I needed from the answer I accepted, after changing a few things to get the output I wanted to see, but I learned several things from the comments, which is very much appreciated.

Thanks to all.