• Status: Solved

How to check for duplicate data records using Perl?

How can I check if data records are duplicated using Perl?

For example, I have data records in the following format, where the second record duplicates the first, and I need to detect this.

__Data__
@scsi_test
time: 2:30 p.m.
1. loop 10 - used data path 1
2. loop 10 - used data path 2
3. loop 10 - used data path 3

__Data__
@scsi_test
time: 2:30 p.m.
1. loop 10 - used data path 1
2. loop 10 - used data path 2
3. loop 10 - used data path 3

Asked by: areyouready344
 
sjklein42 Commented:
(1) What output do you want - the input echoed, but with the duplicate records removed?

(2) I think this is right, but are we looking for entire "sets" of data that match based on the timestamp and the detail lines following it?

Please provide more complete test input data so we can be sure it works the way you want.
 
areyouready344 (Author) Commented:
I am looking for complete duplicates of all lines within a record (including the timestamp and the detail lines). The output should say either "no duplicates" or "duplicate records", and display which records are duplicated.

Thanks,
 
wilcoxon Commented:
This should work...
#!/usr/local/bin/perl

use strict;
use warnings;

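# Treat '__Data__' as the record separator so each read returns one record.
$/ = '__Data__';
my (%seen, %dupe);
while (<>) {
    chomp;                  # remove the trailing '__Data__' separator
    s{^\s+|\s+$}{}g;        # trim blank lines around the record
    next if $_ eq '';       # skip the empty chunk before the first '__Data__'
    $dupe{$_}++ if $seen{$_};
    $seen{$_}++;
}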

if (%dupe) {
    print "duplicate records:\n";
    foreach my $set (sort keys %dupe) {
        print "$set\n";
    }
} else {
    print "no duplicates\n";
}
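
If it also helps to know where the duplicates sit in the input, the same idea can be extended to remember each record's position. The following is only a sketch, not part of the original answer: the helper name find_duplicates and the 1-based record numbering are assumptions made for illustration, and the sample text is the data from the question. Records are compared as exact strings after trimming surrounding whitespace, so any difference inside a record makes it a different record.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split the text on lines that
# contain only the '__Data__' marker and return a hash reference mapping
# each duplicated record to the 1-based positions where it occurs.
sub find_duplicates {
    my ($text) = @_;
    my (%positions, %dupes);
    my $n = 0;
    for my $rec (split /^__Data__[ \t]*$/m, $text) {
        $rec =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
        next if $rec eq '';         # skip the empty chunk before the first marker
        push @{ $positions{$rec} }, ++$n;
    }
    # Keep only the records that were seen at more than one position.
    for my $rec (keys %positions) {
        $dupes{$rec} = $positions{$rec} if @{ $positions{$rec} } > 1;
    }
    return \%dupes;
}

# Sample input taken from the question (single-quoted heredoc, so
# '@scsi_test' is not interpolated as an array).
my $text = <<'END_SAMPLE';
__Data__
@scsi_test
time: 2:30 p.m.
1. loop 10 - used data path 1
2. loop 10 - used data path 2
3. loop 10 - used data path 3

__Data__
@scsi_test
time: 2:30 p.m.
1. loop 10 - used data path 1
2. loop 10 - used data path 2
3. loop 10 - used data path 3
END_SAMPLE

my $dupes = find_duplicates($text);
if (%$dupes) {
    print "duplicate records:\n";
    # List duplicates in order of first appearance.
    for my $rec (sort { $dupes->{$a}[0] <=> $dupes->{$b}[0] } keys %$dupes) {
        print "records ", join(', ', @{ $dupes->{$rec} }), ":\n$rec\n\n";
    }
} else {
    print "no duplicates\n";
}
```

To run this against a real file instead of the embedded sample, the heredoc could be replaced with something like `my $text = do { local $/; <> };`, which reads all of the files named on the command line (or STDIN) into one string.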

 
areyouready344 (Author) Commented:
thanks again Wilcoxo...