Solved

Delete/merge some records based on a count

Posted on 2008-06-16
3
236 Views
Last Modified: 2012-06-21
I am trying to delete/merge some records from my dataset as there are a nuber of duplicates but I'm struggling to figure out how to do this. I have the following code:

SELECT     fnam03, snam03, had103, hpc103, hpc203, dtob03, COUNT(*) AS No_of_records
FROM         dbo.ptp03
GROUP BY fnam03, snam03, had103, hpc103, hpc203, dtob03
HAVING      (COUNT(*) > 1) and snam03 is not null
order by fnam03, snam03

and this allows me to detect the duplicates in my dataset (about 849 from about 12000). However I also need to be able to perform some other calculations prior to the merging taking place:

If there are 2 records (as most of the duplicates are) then I need to check a field called RGNO03 to see if either record has this populated. If one of them does then I wish to keep this record and discard the other. If RGNO03 is NULL in both records then it should check RCNO03 and the record with the highest number in this field should be retained and the lower record discarded. However, if RGNO03 is popluated in both records then no de-duplication should take place.

BTW RGNO03 and RCNO03 are both in the PTP03 table (the same table as the rest of the query fields)

The same principals should be applied to records that have a count value of 3 or 4 (there aren't that many of them so I could do this in stages if needed).

Anyone got any suggestions on a possible way forward on this one?
0
Comment
Question by:Steven O'Neill
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
3 Comments
 
LVL 51

Expert Comment

by:Mark Wills
ID: 21813916
OK, might be a bit convaluted...

but, Keep if RGN003 is populated. If not (in all instances) keep the highest RCN003.


so, probably easiest to do this in two parts... First being get rid of the null RGNo03 where there is a populated one, Second being keep the highest RCNo03 when all RGNo03 are null.

One question first...  is it an "oh" then zero three ?
Next question, is RCNo03 a unique identifier per record, and if not, is there one ?
0
 
LVL 2

Author Comment

by:Steven O'Neill
ID: 21820496
Hi Mark
Thanx for offering to help out here. To answer you're questions:

1. Yes it is rgno03 (with an oh) and rcno03 (again with an oh);
2. No rcno03 isn't unique but we do have a uniquie identifier called TempID. It's an integer.
0
 
LVL 51

Accepted Solution

by:
Mark Wills earned 500 total points
ID: 21823371
OK, first step is to create a backup prior to doing anything - found this is the first step toward success - like an insurance policy that you never want to cash in - and follows along the lines or "murphy's law"...

Then we will remove those NULL rgno03 where there is a populated rgno03, That will leave us with either populated rgn03's or, NULL rgno03's and if the latter, then we will end up keeping the tempid which represents the highest rcno03...

If that sounds OK, then look at the following... And as a first pass would suggest you change the "delete" to a " select *" to visualize what is happening...
select * into ptp03_backup_dupes from ptp03
 
delete from ptp03
where rgno03 is NULL 
and exists (select null from ptp03 p where p.fnam03=ptp03.fnam03 and p.snam03=ptp03.snam03 and p.had103=ptp03.had103 and p.hpc203=ptp03.hpc203 and p.dtob03=ptp03.dtob03 and p.rgno03 is not null)
 
delete from ptp03
where rgno03 is NULL 
and not exists (select null from ptp03 p where p.fnam03=ptp03.fnam03 and p.snam03=ptp03.snam03 and p.had103=ptp03.had103 and p.hpc203=ptp03.hpc203 and p.dtob03=ptp03.dtob03 and p.rgno03 is not null)
and rcno03 <>  (select top 1 rcno03 from ptp03 p where p.fnam03=ptp03.fnam03 and p.snam03=ptp03.snam03 and p.had103=ptp03.had103 and p.hpc203=ptp03.hpc203 and p.dtob03=ptp03.dtob03 order by rcno03 desc, tempid desc)

Open in new window

0

Featured Post

Back Up Your Microsoft Windows Server®

Back up all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Introduction This article will provide a solution for an error that might occur installing a new SQL 2005 64-bit cluster. This article will assume that you are fully prepared to complete the installation and describes the error as it occurred durin…
When writing XML code a very difficult part is when we like to remove all the elements or attributes from the XML that have no data. I would like to share a set of recursive MSSQL stored procedures that I have made to remove those elements from …
How to Install VMware Tools in Red Hat Enterprise Linux 6.4 (RHEL 6.4) Step-by-Step Tutorial
Attackers love to prey on accounts that have privileges. Reducing privileged accounts and protecting privileged accounts therefore is paramount. Users, groups, and service accounts need to be protected to help protect the entire Active Directory …

752 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question