How duplicate duplicate data in ms sql

Posted on 2009-12-22
Medium Priority
Last Modified: 2012-05-08
Hello all,

is there a way to delete duplicate data in ms sql ?
Question by:abennett10
LVL 75

Expert Comment

by:Aneesh Retnakaran
ID: 26107112
LVL 22

Accepted Solution

8080_Diver earned 1000 total points
ID: 26107609
is there a way to delete duplicate data in ms sql ?
If the data is absolutely duplicated and there is not some sort of unique identifier available, then answer is "yes, but the process can be ugly."  Otherwise, use the unique identifier to delete the desired rows (although, based on there being a question here, I rather suspect that the former case is the one under discussion. ;-)
If you have absolutely duplicate rows, i.e. there is nothing to uniquely identify one from the other, then you have a couple of options, neither of which is particularly handy or pretty.
Option 1: Two Stage Data Copy
  1. Script out the table in question and then (after suitably modifying the table name anywhere it is mentioned in the script) create a second identical (but empty) table into which you can copy data.
  2. Add an Identity column (i.e. an INT with its Identity property set to Yes) to the new table.
  3. Now, script the old table as a SELECT statement and add the word "DISTINCT" between the word SELECT and the first column name.  (I would execute that query to see how many rows it produces and compare it to the number of rows in the table . . . it should be smaller.)
  4. Now script the new table as an INSERT statement.
  5. Next, replace everything in the INSERT script from the word VALUES to the end of the script with the SELECT script from the OLD table.
  6. Now, execute the reconfigured INSERT script.  This should move only distinct rows to the new table.
  7. Once that is done, TRUNCATE the old table.
  8. Then add an Identity column to the OLD table.
  9. Finally, copy the data from the NEW table to the OLD table.

Option 2: Delete based on Identity and Minumum ID
  1. Add an Identity column (MyID) to the table in question.
  2. Create a query along the lines of:
    DELETE FROM yourtable
    FROM yourtablename
    GROUP BY {list all column names except the identity column});
  3. Execute the query.

Author Closing Comment

ID: 31669112
Thanks for your help. I did a distinct into a temporary table. That fixed it. but I like your solution.

Featured Post

What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction: When running hybrid database environments, you often need to query some data from a remote db of any type, while being connected to your MS SQL Server database. Problems start when you try to combine that with some "user input" pass…
In this article I will describe the Backup & Restore method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
this video summaries big data hadoop online training demo (http://onlineitguru.com/big-data-hadoop-online-training-placement.html) , and covers basics in big data hadoop .
Screencast - Getting to Know the Pipeline

840 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question