De-Dupe Table

I can write a query which finds the duplicated records in a table.  Obviously this brings back all the duplicated data this is the code.....

It is the next bit I am unsure of - I want to be able to delete one of the two records identified, but am unsure of how to do it.  If any of you lovely experts can assist I would be very grateful.


SELECT TPRCUSTOMERID, count(tprcustomerid) AS DupsCount FROM DGT_EXTRA GROUP BY TPRCUSTOMERID HAVING COUNT(TPRCUSTOMERID) > 1

Open in new window

LVL 1
ITgirlAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

BrandonGalderisiCommented:
If you have a surrogate key (and identity column), you can do something like this.

select *
from DGT_Extra e
inner join (SELECT TPRCUSTOMERID, Min(Id) AS FirstID FROM DGT_EXTRA GROUP BY TPRCUSTOMERID HAVING COUNT(TPRCUSTOMERID) > 1) a
 on e.TPRCustomerID = a.TPRCustomerID
where e.Id > a.FirstID

The above will get you all of the Duplicate TPRCustomerID records which have a higher Id value than the min.

From there, change the SELECT * to "DELETE e".

Just verify with the select that it will delete the expected results.  If you don't have an Id column, perhaps there is another unique column like an insertedDate which can serve the same purpose.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Kevin CrossChief Technology OfficerCommented:
Are these duplicates identical? Or is there a possibility of differing data on each row containing the duplicate TPRCUSTOMERID value?

If the latter, then which row do you want to keep, i.e., is there another field like an IDENTITY or DateAdded (DateModified) that identifies the first or most current row that you can use to select one?

For the former, deleting either row would be fine then right.

In either case, the ultimate solution can be to use a common table expression. What will change is what you ORDER BY to determine RN = 1, which is the row that is retained.

;WITH cte AS (
SELECT *
     , ROW_NUMBER() OVER(PARTITION BY TPRCUSTOMERID ORDER BY TPRCUSTOMERID) RN
FROM DGT_EXTRA 
)
SELECT TPRCUSTOMERID
FROM cte
WHERE RN > 1
;

Open in new window


This will show you the values that have more than one row. You can change the final query to:
;WITH ...
DELETE
FROM cte
WHERE RN > 1
;

Which will remove all the rows except the first. Therefore, back to my other point, determining first row is the critical piece to the puzzle. In the ROW_NUMBER() OVER() part of the code, change the ORDER BY so that the ranking is as you want it and you should be set.

Hope that helps!

Best regards and happy coding,

Kevin
0
Kevin CrossChief Technology OfficerCommented:
Argh. Too slow. Brandon beat me to it, but hopefully it gives you some alternatives. :)
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server 2005

From novice to tech pro — start learning today.