Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 199
  • Last Modified:

De-Dupe Table

I can write a query which finds the duplicated records in a table.  Obviously this brings back all the duplicated data this is the code.....

It is the next bit I am unsure of - I want to be able to delete one of the two records identified, but am unsure of how to do it.  If any of you lovely experts can assist I would be very grateful.


SELECT TPRCUSTOMERID, count(tprcustomerid) AS DupsCount FROM DGT_EXTRA GROUP BY TPRCUSTOMERID HAVING COUNT(TPRCUSTOMERID) > 1

Open in new window

0
ITgirl
Asked:
ITgirl
  • 2
1 Solution
 
BrandonGalderisiCommented:
If you have a surrogate key (and identity column), you can do something like this.

select *
from DGT_Extra e
inner join (SELECT TPRCUSTOMERID, Min(Id) AS FirstID FROM DGT_EXTRA GROUP BY TPRCUSTOMERID HAVING COUNT(TPRCUSTOMERID) > 1) a
 on e.TPRCustomerID = a.TPRCustomerID
where e.Id > a.FirstID

The above will get you all of the Duplicate TPRCustomerID records which have a higher Id value than the min.

From there, change the SELECT * to "DELETE e".

Just verify with the select that it will delete the expected results.  If you don't have an Id column, perhaps there is another unique column like an insertedDate which can serve the same purpose.
0
 
Kevin CrossChief Technology OfficerCommented:
Are these duplicates identical? Or is there a possibility of differing data on each row containing the duplicate TPRCUSTOMERID value?

If the latter, then which row do you want to keep, i.e., is there another field like an IDENTITY or DateAdded (DateModified) that identifies the first or most current row that you can use to select one?

For the former, deleting either row would be fine then right.

In either case, the ultimate solution can be to use a common table expression. What will change is what you ORDER BY to determine RN = 1, which is the row that is retained.

;WITH cte AS (
SELECT *
     , ROW_NUMBER() OVER(PARTITION BY TPRCUSTOMERID ORDER BY TPRCUSTOMERID) RN
FROM DGT_EXTRA 
)
SELECT TPRCUSTOMERID
FROM cte
WHERE RN > 1
;

Open in new window


This will show you the values that have more than one row. You can change the final query to:
;WITH ...
DELETE
FROM cte
WHERE RN > 1
;

Which will remove all the rows except the first. Therefore, back to my other point, determining first row is the critical piece to the puzzle. In the ROW_NUMBER() OVER() part of the code, change the ORDER BY so that the ranking is as you want it and you should be set.

Hope that helps!

Best regards and happy coding,

Kevin
0
 
Kevin CrossChief Technology OfficerCommented:
Argh. Too slow. Brandon beat me to it, but hopefully it gives you some alternatives. :)
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now