How to check for duplicate data records using Perl?

Posted on 2011-03-04
Last Modified: 2012-05-11
How can I check if data records are duplicated using Perl?

For example, I have the following format of duplicated data records and need to check this.

time: 2:30 p.m.
1. loop 10 - used data path 1
2. loop 10 - used data path 2
3. loop 10 - used data path 3

time: 2:30 p.m.
1. loop 10 - used data path 1
2. loop 10 - used data path 2
3. loop 10 - used data path 3

Question by:areyouready344
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
LVL 16

Expert Comment

ID: 35037423
(1) What output do you want - the input echoed, but with the duplicate records removed?

(2) I think this is right, but are we looking for entire "sets" of data that match based on the timestamp and the detail lines following it?

Please provide a more complete input test data so we can be sure it works the way you want.

Author Comment

ID: 35039728
Looking for  a complete duplicate of all lines within a record (including timestamp and lines). Output should say no duplicate or duplicate records and display which ones are duplicated..

LVL 26

Accepted Solution

wilcoxon earned 500 total points
ID: 35040444
This should work...

use strict;
use warnings;

$/ = '__Data__';
my (%seen, %dupe);
while (<>) {
    $dupe{$_}++ if $seen{$_};

if (%dupe) {
    print "duplicate records:\n";
    foreach my $set (sort keys %dupe) {
        print "$set\n";
} else {
    print "no duplicates\n";

Open in new window


Author Comment

ID: 35193912
thanks again Wilcoxo...

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I've just discovered very important differences between Windows an Unix formats in Perl,at least 5.xx.. MOST IMPORTANT: Use Unix file format while saving Your script. otherwise it will have ^M s or smth likely weird in the EOL, Then DO NOT use m…
Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans

696 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question