Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

deduplicate SQL 2005 table on one field

Posted on 2014-01-06
2
Medium Priority
?
415 Views
Last Modified: 2014-01-06
hello all! I have a table with 3 fields, let's just call them field1, email, and field3. So I have duplicate email addresses throughout the table. I want to deduplicate this table based on email. I don't care what records are removed as long as the table is trimmed down to non-duplicate email addresses. Is there a way I can do this? Thanks!
0
Comment
Question by:mrosier
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
2 Comments
 
LVL 60

Accepted Solution

by:
Kevin Cross earned 2000 total points
ID: 39759617
Yes, there are several ways to accomplish this.  One way, leveraging the ROW_NUMBER() and OVER() clause in SQL 2005, is to rank your records 1 to n within a derived query or common table expression, then DELETE records with rank > 1 in outer query.

Using this method, you can use the ordering of the other pieces of information to determine the retention.  For example, if field1 is an integer that increments, you may want the lowest ID value (i.e., the first entry).  Therefore, you could ORDER BY field1.  If you want the last, you could ORDER BY field1 DESC.  If it truly does not matter, you can make the ORDER BY random.

For example:
;WITH cte(field1, email, field3, [rank]) AS (
    SELECT field1, email, field3
         , ROW_NUMBER() OVER(PARTITION BY email ORDER BY field1)
    FROM your_table_name
)
DELETE
FROM cte
WHERE [rank] > 1
;

Open in new window


I hope that helps!

P.S. I would run the inner query first, so you understand what the ranking does.  Second, you can change "DELETE" to "SELECT field1, email, field3, [rank]" so you can see what records will be removed before issuing the deletion.

Best regards,

Kevin
0
 

Author Comment

by:mrosier
ID: 39759667
excellent, it really is truly a non-issue as to what records stay as long as email addresses in the table are unique. And that is how it usually is with data deduplication for me. I appreciate the help, thanks!
0

Featured Post

What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I've encountered valid database schemas that do not have a primary key.  For example, I use LogParser from Microsoft to push IIS logs into a SQL database table for processing and analysis.  However, occasionally due to user error or a scheduled task…
Data architecture is an important aspect in Software as a Service (SaaS) delivery model. This article is a study on the database of a single-tenant application that could be extended to support multiple tenants. The application is web-based develope…
Video by: ITPro.TV
In this episode Don builds upon the troubleshooting techniques by demonstrating how to properly monitor a vSphere deployment to detect problems before they occur. He begins the show using tools found within the vSphere suite as ends the show demonst…
Want to learn how to record your desktop screen without having to use an outside camera. Click on this video and learn how to use the cool google extension called "Screencastify"! Step 1: Open a new google tab Step 2: Go to the left hand upper corn…

722 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question