
Deduplicate a SQL Server 2005 table on one field

Posted on 2014-01-06
Last Modified: 2014-01-06
Hello all! I have a table with 3 fields; let's just call them field1, email, and field3. The table has duplicate email addresses throughout, and I want to deduplicate it based on email. I don't care which records are removed, as long as the table is trimmed down so that each email address appears only once. Is there a way I can do this? Thanks!
Question by: mrosier

Accepted Solution

by: Kevin Cross
ID: 39759617
Yes, there are several ways to accomplish this. One way, leveraging the ROW_NUMBER() function and the OVER() clause introduced in SQL Server 2005, is to number your records 1 to n within each email group in a derived table or common table expression (CTE), then DELETE the records numbered greater than 1 in the outer query.

With this method, the ORDER BY inside OVER() decides which record in each group is retained. For example, if field1 is an incrementing integer, you may want to keep the lowest value (i.e., the first entry), so you would ORDER BY field1. If you want the last entry, ORDER BY field1 DESC. If it truly does not matter, you can make the ORDER BY random.

For example:
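;WITH cte(field1, email, field3, [rank]) AS (
    -- number the rows 1..n within each email group; the ORDER BY decides which row gets 1
    SELECT field1, email, field3
         , ROW_NUMBER() OVER(PARTITION BY email ORDER BY field1)
    FROM your_table_name
)
-- deleting through the CTE removes every row after the first in each group from your_table_name
DELETE
FROM cte
WHERE [rank] > 1
;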

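The retention choices described in the answer change only the ORDER BY inside OVER(). Below is a sketch of the "it truly does not matter" variant, assuming the same placeholder names (your_table_name, field1, email, field3) as the query above. Random ordering via NEWID() is my substitution here: NEWID() returns a new uniqueidentifier for each row, so each email group keeps an arbitrary survivor.

```sql
-- Sketch: keep one arbitrary row per email address.
-- your_table_name and email are the thread's placeholder names.
;WITH cte AS (
    SELECT email
         -- number the rows 1..n inside each email group, in random order
         , ROW_NUMBER() OVER (PARTITION BY email ORDER BY NEWID()) AS rn
    FROM your_table_name
)
-- deleting through the CTE removes the underlying rows from your_table_name
DELETE FROM cte
WHERE rn > 1;
```

To keep the most recent entry instead, assuming field1 is an incrementing ID, replace NEWID() with field1 DESC. Also note that PARTITION BY places all NULL emails in one group, so rows with a NULL email are reduced to a single row as well.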


I hope that helps!

P.S. I would run the inner query first so you understand what the ranking does. You can also change "DELETE" to "SELECT field1, email, field3, [rank]" to see which records will be removed before issuing the deletion.
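Following the P.S., here is a minimal preview sketch using the same placeholder names as the answer. The first statement lists the rows the DELETE would remove without changing anything; the second is an extra check of my own, meant to be run after the DELETE, and should return no rows once every email address is unique.

```sql
-- Preview: the answer's CTE with DELETE changed to SELECT (read-only).
;WITH cte(field1, email, field3, [rank]) AS (
    SELECT field1, email, field3
         , ROW_NUMBER() OVER(PARTITION BY email ORDER BY field1)
    FROM your_table_name
)
SELECT field1, email, field3, [rank]
FROM cte
WHERE [rank] > 1          -- these are the rows the DELETE would remove
ORDER BY email, [rank];

-- Check after the DELETE: any email listed here still has duplicates.
SELECT email, COUNT(*) AS copies
FROM your_table_name
GROUP BY email
HAVING COUNT(*) > 1;
```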

Best regards,

Kevin
 

Author Comment

by: mrosier
ID: 39759667
Excellent! It truly doesn't matter which records stay, as long as the email addresses in the table are unique; that is usually the case for me with data deduplication. I appreciate the help, thanks!