SQL Duplicate Query

Posted on 2014-08-11
Last Modified: 2014-09-10
I have a table with duplicates rows other than an auto-generated identity field.

eg. MyTable_ID, MyTable_COL1, MyTable_COL2, MyTable_COL3.

I would like to get the values for MyTable_ID for each row where MyTable_COL1, MyTable_COL2, MyTable_COL3 are duplicates.

Any ideas?
Question by:DaveChoiceTech
    LVL 142

    Assisted Solution

    by:Guy Hengel [angelIII / a3]
    LVL 14

    Assisted Solution

    by:Vikas Garg

    :with CTE as
    select MyTable_ID, MyTable_COL1, MyTable_COL2, MyTable_COL3,row_number() over (PARTITION BY MyTable_COL1, MyTable_COL2, MyTable_COL3 order by MyTable_ID) RN from MyTable


    this will give only duplicate records which if you want to delete you can write

    delete FROM CTE WHERE rn > 1

    Author Comment

    Vikas, your answer is close but not quite what I'm looking for. I need both MyTable_ID's where the other columns are duplicates. In your query the first of these rows will have an RN of and the duplicate(s)will have RN of 2 or greater. If I row has duplicates I need the MyTable_ID's for the first row as well as the other duplicates.
    LVL 14

    Expert Comment

    by:Vikas Garg
    Remove the where clause and you will get all the records and from the rn column field you can come to know the original which has rn=1 and duplicates which has rn > 1

    Author Comment

    Yes but it will also include the records with rn=1 which do not have duplicates in the remaining colums. I need only the rows that have an rn=1 and thier duplicates with rn> 1
    LVL 47

    Expert Comment

    remove the comments from the first 2 lines for a delete, otherwise run a a select to test
    adjust the top(n) to suit, this might be useful if there are lots to delete
    --DELETE TOP (10000) FROM m
    --      FROM MyTable m
    SELECT * FROM MyTable
          INNER JOIN (
                      MIN(MyTable_ID) AS MyTable_ID
                    , MyTable_COL1
                    , MyTable_COL2
                    , MyTable_COL3
                FROM MyTable WITH (NOLOCK)
                GROUP BY
                    , MyTable_COL2
                    , MyTable_COL3
                HAVING COUNT(*) > 1
          ) AS dups
                ON m.MyTable_ID > dups.MyTable_ID  --IMPORTANT: use only greater than here
                AND m.MyTable_COL1 = dups.MyTable_COL1
                AND m.MyTable_COL2 = dups.MyTable_COL2
                AND m.MyTable_COL3 = dups.MyTable_COL3

    Open in new window

    LVL 142

    Accepted Solution

    You can self-join the cte

    Selected ...
    From cte c1
    Join cte c2
     On c2.col1 = c1.col1
     And c2.rn > 1
    Where c1.rn =1

    Author Closing Comment

    Brilliant! Thanks a lot.
    LVL 65

    Expert Comment

    by:Jim Horn
    Guys - I just knocked an article titled SQL Server Delete Duplicate Rows Solutions that covers the above comments and a wompload of others regarding duplicate rows.

    If you have some time, please give it a read and let me know if you like it.


    Write Comment

    Please enter a first name

    Please enter a last name

    We will never share this with anyone.

    Featured Post

    Highfive + Dolby Voice = No More Audio Complaints!

    Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

    Entering a date in Microsoft Access can be tricky. A typo can cause month and day to be shuffled, entering the day only causes an error, as does entering, say, day 31 in June. This article shows how an inputmask supported by code can help the user a…
    The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
    Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed
    Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.

    761 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    15 Experts available now in Live!

    Get 1:1 Help Now