• Status: Solved

Perl - check duplicates in file and case

I'm using this script, which I grabbed from PerlMonks. It does what I need: it checks a file and prints out any duplicate lines.

my %duplicates;

while (<>) {
    chomp;
    $duplicates{$_}++;
}

foreach my $key (keys %duplicates) {
    if ($duplicates{$key} > 1) {
        delete $duplicates{$key};
        print "$key\n";
    }
}


There is just one issue: I need to match lines in the file that are the same but may differ in case. I could lowercase the file and then run the script, but I need to keep the original case.

How can I lowercase the lines for the comparison but still keep the original case in my output?

Thanks
Asked by: bt707
1 Solution
 
farzanj Commented:
First you need to make a map.

while (<>) {
    chomp;
    $duplicates{lc($_)} = $_;
}
Then, for each line, check whether its lowercased form already exists in the map: if it does, print the line; otherwise do not.
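A minimal sketch of how that idea could be completed, assuming the check happens before the map is updated. The helper name repeated_lines and the sample data are made up for illustration, and the input is an in-memory list rather than <> so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every line whose lowercased form was
# already seen earlier in the list, keeping each line's original case.
sub repeated_lines {
    my @lines = @_;
    my %seen;       # lowercased line => first spelling encountered
    my @repeats;
    for my $line (@lines) {
        my $key = lc $line;
        if (exists $seen{$key}) {
            push @repeats, $line;    # a repeat: report it as written
        }
        else {
            $seen{$key} = $line;     # first time this line is seen
        }
    }
    return @repeats;
}

my @repeats = repeated_lines('Apple', 'banana', 'APPLE', 'apple', 'Cherry');
print "$_\n" for @repeats;
```

With this sample list, the second and third spellings of "apple" are reported, each in its own case.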
 
bt707 (Author) Commented:
I had already tried using lc, but it only gives me errors. With it added as in the script below, I get:

Useless use of lc in void context at ./dup_lines.pl line 10.

#! /usr/bin/perl

use strict;
use warnings;

my %duplicates;

while (<>) {
    chomp;
    lc;
    $duplicates{$_}++;
}

foreach my $key (keys %duplicates) {
    if ($duplicates{$key} > 1) {
        delete $duplicates{$key};
        print "$key\n";
    }
}

What am I missing?
 
wilcoxon Commented:
I would do something like this...

This will print out the lowercased key followed by the lines that matched it (indented by tabs).

my %duplicates;

while (<>) {
    chomp;
    my $key = lc $_;
    $duplicates{$key} = [] unless $duplicates{$key};
    push @{$duplicates{$key}}, $_;
}

foreach my $key (keys %duplicates) {
    if (@{$duplicates{$key}} > 1) {
        print "$key:\n\t", join("\n\t", @{$duplicates{$key}}), "\n";
        delete $duplicates{$key};
    }
}

 
wilcoxon Commented:
Bare lc will not work, because lc returns the lowercased string rather than changing $_ in place.  You need to assign the result:

$_ = lc;  # which should act the same as $_ = lc $_

Note that this method will not preserve the original case (one of the things you asked for), because $_ itself is overwritten.
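One way to count case-insensitively while still keeping the case is to leave the line alone and store the result of lc in a separate key. The following is only a sketch under that assumption: the helper name duplicated_lines, the %first hash, and the sample data are illustrative and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first spelling of every line that
# occurs more than once when case is ignored.
sub duplicated_lines {
    my @lines = @_;
    my (%count, %first);
    for my $line (@lines) {
        my $key = lc $line;          # lc returns a copy; $line is unchanged
        $count{$key}++;
        $first{$key} = $line unless exists $first{$key};
    }
    my @dup_keys = grep { $count{$_} > 1 } keys %count;
    return map { $first{$_} } sort @dup_keys;    # sorted for stable output
}

my @dups = duplicated_lines('Foo', 'bar', 'FOO', 'Bar', 'baz');
print "$_\n" for @dups;
```

The counting is the same as in the original script; only the hash key is lowercased, so the printed lines keep the spelling they had in the input.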
 
bt707 (Author) Commented:
Thanks, that worked fine. I'm not sure why I could not get the other one to work (it may have been my mistake), but this one worked fine.

Thanks,
 
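ozo Commented:
while (<>) {
    chomp;
    # the + (left out when this was first posted) stops lc from being
    # taken as the literal hash key "lc"
    push @{$duplicates{+lc}}, $_;
}

foreach my $key (keys %duplicates) {
    if (@{$duplicates{$key}} > 1) {
        # shift removes the first occurrence; delete on an array element
        # would leave an undef in slot 0
        shift @{$duplicates{$key}};
        print "$duplicates{$key}->[0]\n";
    }
}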
 
wilcoxon Commented:
Interesting.  I would have sworn that "push @{$duplicates{$key}}" failed on an undefined $duplicates{$key}, but I just re-tested and it works fine (Perl autovivifies the array reference).

As such, you can omit the "... = [] unless $duplicates{$key}" line (and, as ozo showed, you can then combine the "$key = lc $_" and push lines).

As usual, ozo has provided a good, concise answer (though I would not do the delete/print part the way he did, but that's just preference).
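A small sketch of that behaviour, using made-up sample data in place of a file: pushing through a hash slot that does not exist yet creates the array on the fly, so no explicit initialisation is needed.

```perl
use strict;
use warnings;

my %groups;    # lowercased line => array ref of original spellings

# No "$groups{$key} = []" is needed: dereferencing the missing slot in
# push creates the array automatically (autovivification).
for my $line ('Perl', 'PERL', 'awk') {
    push @{ $groups{lc $line} }, $line;
}

for my $key (sort keys %groups) {
    my @spellings = @{ $groups{$key} };
    print "$key: @spellings\n" if @spellings > 1;
}
```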
 
ozo Commented:
Sorry, I didn't see http:#a38817078 when I posted.
I was trying to duplicate the behaviour of the routine in the original question,
which seemed to be deleting only one of each duplicate name, so I just deleted the first.
It could easily be changed to be the last, or all but the first/last, or all.

If the intent is to re-write the file with duplicates eliminated, that might be done with

$^I=".bak";
$duplicates{+lc}++ or print while <>;

(As originally posted, the code in my previous comment omitted the + before lc, not to mention the w in while.)
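To try the same idiom without touching a file, here is a sketch that applies it to an in-memory list. The helper name unique_ignoring_case and the sample data are illustrative only. The last lines show why the + matters: a bare word inside {} is quoted as a string.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps the first spelling of each line, ignoring case.
sub unique_ignoring_case {
    my %seen;
    # $seen{+lc} calls lc on $_; the post-increment is false only the first time.
    return grep { !$seen{+lc}++ } @_;
}

my @kept = unique_ignoring_case('Apple', 'apple', 'Pear', 'APPLE', 'pear');
print "$_\n" for @kept;

# Without the +, a bare word inside {} is quoted, so every line shares the key "lc".
my %bare;
$bare{lc}++ for ('Apple', 'Pear');
print join(',', keys %bare), "\n";
```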
 
bt707 (Author) Commented:
Thanks for all the info. I got what I needed from the answer I accepted, changing a few things to get the output I now need to see, but I also learned several things from the comments, which is very much appreciated.

Thanks to all.