Solved

Delete/merge some records based on a count

Posted on 2008-06-16
3
231 Views
Last Modified: 2012-06-21
I am trying to delete/merge some records from my dataset as there are a nuber of duplicates but I'm struggling to figure out how to do this. I have the following code:

SELECT     fnam03, snam03, had103, hpc103, hpc203, dtob03, COUNT(*) AS No_of_records
FROM         dbo.ptp03
GROUP BY fnam03, snam03, had103, hpc103, hpc203, dtob03
HAVING      (COUNT(*) > 1) and snam03 is not null
order by fnam03, snam03

and this allows me to detect the duplicates in my dataset (about 849 from about 12000). However I also need to be able to perform some other calculations prior to the merging taking place:

If there are 2 records (as most of the duplicates are) then I need to check a field called RGNO03 to see if either record has this populated. If one of them does then I wish to keep this record and discard the other. If RGNO03 is NULL in both records then it should check RCNO03 and the record with the highest number in this field should be retained and the lower record discarded. However, if RGNO03 is popluated in both records then no de-duplication should take place.

BTW RGNO03 and RCNO03 are both in the PTP03 table (the same table as the rest of the query fields)

The same principals should be applied to records that have a count value of 3 or 4 (there aren't that many of them so I could do this in stages if needed).

Anyone got any suggestions on a possible way forward on this one?
0
Comment
Question by:Steven O'Neill
  • 2
3 Comments
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
OK, might be a bit convaluted...

but, Keep if RGN003 is populated. If not (in all instances) keep the highest RCN003.


so, probably easiest to do this in two parts... First being get rid of the null RGNo03 where there is a populated one, Second being keep the highest RCNo03 when all RGNo03 are null.

One question first...  is it an "oh" then zero three ?
Next question, is RCNo03 a unique identifier per record, and if not, is there one ?
0
 
LVL 2

Author Comment

by:Steven O'Neill
Comment Utility
Hi Mark
Thanx for offering to help out here. To answer you're questions:

1. Yes it is rgno03 (with an oh) and rcno03 (again with an oh);
2. No rcno03 isn't unique but we do have a uniquie identifier called TempID. It's an integer.
0
 
LVL 51

Accepted Solution

by:
Mark Wills earned 500 total points
Comment Utility
OK, first step is to create a backup prior to doing anything - found this is the first step toward success - like an insurance policy that you never want to cash in - and follows along the lines or "murphy's law"...

Then we will remove those NULL rgno03 where there is a populated rgno03, That will leave us with either populated rgn03's or, NULL rgno03's and if the latter, then we will end up keeping the tempid which represents the highest rcno03...

If that sounds OK, then look at the following... And as a first pass would suggest you change the "delete" to a " select *" to visualize what is happening...
select * into ptp03_backup_dupes from ptp03
 

delete from ptp03

where rgno03 is NULL 

and exists (select null from ptp03 p where p.fnam03=ptp03.fnam03 and p.snam03=ptp03.snam03 and p.had103=ptp03.had103 and p.hpc203=ptp03.hpc203 and p.dtob03=ptp03.dtob03 and p.rgno03 is not null)
 

delete from ptp03

where rgno03 is NULL 

and not exists (select null from ptp03 p where p.fnam03=ptp03.fnam03 and p.snam03=ptp03.snam03 and p.had103=ptp03.had103 and p.hpc203=ptp03.hpc203 and p.dtob03=ptp03.dtob03 and p.rgno03 is not null)

and rcno03 <>  (select top 1 rcno03 from ptp03 p where p.fnam03=ptp03.fnam03 and p.snam03=ptp03.snam03 and p.had103=ptp03.had103 and p.hpc203=ptp03.hpc203 and p.dtob03=ptp03.dtob03 order by rcno03 desc, tempid desc)

Open in new window

0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Suggested Solutions

If you having speed problem in loading SQL Server Management Studio, try to uncheck these options in your internet browser (IE -> Internet Options / Advanced / Security):    . Check for publisher's certificate revocation    . Check for server ce…
INTRODUCTION: While tying your database objects into builds and your enterprise source control system takes a third-party product (like Visual Studio Database Edition or Red-Gate's SQL Source Control), you can achieve some protection using a sing…
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…
You have products, that come in variants and want to set different prices for them? Watch this micro tutorial that describes how to configure prices for Magento super attributes. Assigning simple products to configurable: We assigned simple products…

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now