How to check for duplicate data records using Perl?

How can I check if data records are duplicated using Perl?

For example, my data contains records in the following format, and I need to detect when a record appears more than once.

__Data__
@scsi_test
time: 2:30 p.m.
1. loop 10 - used data path 1
2. loop 10 - used data path 2
3. loop 10 - used data path 3

__Data__
@scsi_test
time: 2:30 p.m.
1. loop 10 - used data path 1
2. loop 10 - used data path 2
3. loop 10 - used data path 3

areyouready344 Asked:
 
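wilcoxon Commented:
This should work. It sets the input record separator to __Data__ so that each read returns one whole record, then uses a hash to flag any record it has already seen.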
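#!/usr/local/bin/perl

use strict;
use warnings;

# Read one whole record per iteration: records are delimited by '__Data__'.
$/ = '__Data__';
my (%seen, %dupe);
while (<>) {
    chomp;               # strip the trailing '__Data__' separator
    s{\n+$}{};           # drop trailing blank lines so identical records compare equal
    next unless length;  # skip the empty chunk before the first '__Data__'
    $dupe{$_}++ if $seen{$_};
    $seen{$_}++;
}

if (%dupe) {
    print "duplicate records:\n";
    foreach my $set (sort keys %dupe) {
        print "$set\n";
    }
} else {
    print "no duplicates\n";
}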
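
As a follow-up sketch (not part of the posted answer), the same record-separator technique can also report how many times each duplicated record occurs. The helper name count_duplicates and the sample input are hypothetical and only modeled on the data in the question; the input is read from an in-memory filehandle so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a hash ref that maps
# each record occurring more than once to its number of occurrences.
sub count_duplicates {
    my ($fh) = @_;
    local $/ = '__Data__';    # same record separator as the script above
    my %count;
    while ( my $record = <$fh> ) {
        chomp $record;                 # strip the trailing '__Data__'
        $record =~ s/^\s+|\s+$//g;     # trim surrounding whitespace and newlines
        next unless length $record;    # skip the empty chunk before the first marker
        $count{$record}++;
    }
    my %dupes = map { $_ => $count{$_} } grep { $count{$_} > 1 } keys %count;
    return \%dupes;
}

# Assumed sample input: two identical records and one with a different time.
my $input = <<'END';
__Data__
@scsi_test
time: 2:30 p.m.
1. loop 10 - used data path 1

__Data__
@scsi_test
time: 2:30 p.m.
1. loop 10 - used data path 1

__Data__
@scsi_test
time: 2:45 p.m.
1. loop 10 - used data path 1
END

open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
my $dupes = count_duplicates($fh);
close $fh;

if (%$dupes) {
    print "seen $dupes->{$_} times:\n$_\n" for sort keys %$dupes;
} else {
    print "no duplicates\n";
}
```

To run this against a real file, open the file instead of the in-memory string and pass that handle to count_duplicates. Trimming whitespace on both ends means two records that differ only in surrounding blank lines are treated as equal, which is an assumption about what counts as a duplicate.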

sjklein42Commented:
(1) What output do you want - the input echoed, but with the duplicate records removed?

(2) I think this is right, but are we looking for entire "sets" of data that match based on the timestamp and the detail lines following it?

Please provide more complete input test data so we can be sure it works the way you want.
 
areyouready344Author Commented:
I'm looking for complete duplicates: every line within a record (including the timestamp and the detail lines) must match. The output should say either "no duplicates" or "duplicate records", and in the latter case display which records are duplicated.

Thanks,
 
areyouready344Author Commented:
thanks again Wilcoxo...
Question has a verified solution.
