Solved

deduplicate SQL 2005 table on one field

Posted on 2014-01-06
2
394 Views
Last Modified: 2014-01-06
hello all! I have a table with 3 fields, let's just call them field1, email, and field3. So I have duplicate email addresses throughout the table. I want to deduplicate this table based on email. I don't care what records are removed as long as the table is trimmed down to non-duplicate email addresses. Is there a way I can do this? Thanks!
0
Comment
Question by:mrosier
2 Comments
 
LVL 59

Accepted Solution

by:
Kevin Cross earned 500 total points
ID: 39759617
Yes, there are several ways to accomplish this.  One way, leveraging the ROW_NUMBER() and OVER() clause in SQL 2005, is to rank your records 1 to n within a derived query or common table expression, then DELETE records with rank > 1 in outer query.

Using this method, you can use the ordering of the other pieces of information to determine the retention.  For example, if field1 is an integer that increments, you may want the lowest ID value (i.e., the first entry).  Therefore, you could ORDER BY field1.  If you want the last, you could ORDER BY field1 DESC.  If it truly does not matter, you can make the ORDER BY random.

For example:
;WITH cte(field1, email, field3, [rank]) AS (
    SELECT field1, email, field3
         , ROW_NUMBER() OVER(PARTITION BY email ORDER BY field1)
    FROM your_table_name
)
DELETE
FROM cte
WHERE [rank] > 1
;

Open in new window


I hope that helps!

P.S. I would run the inner query first, so you understand what the ranking does.  Second, you can change "DELETE" to "SELECT field1, email, field3, [rank]" so you can see what records will be removed before issuing the deletion.

Best regards,

Kevin
0
 

Author Comment

by:mrosier
ID: 39759667
excellent, it really is truly a non-issue as to what records stay as long as email addresses in the table are unique. And that is how it usually is with data deduplication for me. I appreciate the help, thanks!
0

Featured Post

Best Practices: Disaster Recovery Testing

Besides backup, any IT division should have a disaster recovery plan. You will find a few tips below relating to the development of such a plan and to what issues one should pay special attention in the course of backup planning.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
SSRS report parameters set after publishing to report manager 1 55
How to place a condition in a filter criteria in t-sql (#2)? 10 46
Query to Add Late Tolerance 10 67
Query 14 55
Introduction: When running hybrid database environments, you often need to query some data from a remote db of any type, while being connected to your MS SQL Server database. Problems start when you try to combine that with some "user input" pass…
This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
Along with being a a promotional video for my three-day Annielytics Dashboard Seminor, this Micro Tutorial is an intro to Google Analytics API data.
With Secure Portal Encryption, the recipient is sent a link to their email address directing them to the email laundry delivery page. From there, the recipient will be required to enter a user name and password to enter the page. Once the recipient …

786 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question