• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 375
  • Last Modified:

SQL Duplicate Query

I have a table with duplicates rows other than an auto-generated identity field.

eg. MyTable_ID, MyTable_COL1, MyTable_COL2, MyTable_COL3.

I would like to get the values for MyTable_ID for each row where MyTable_COL1, MyTable_COL2, MyTable_COL3 are duplicates.

Any ideas?
0
DaveChoiceTech
Asked:
DaveChoiceTech
  • 3
  • 2
  • 2
  • +2
3 Solutions
 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
0
 
Vikas GargBusiness Intelligence DeveloperCommented:
Hi,

:with CTE as
(
select MyTable_ID, MyTable_COL1, MyTable_COL2, MyTable_COL3,row_number() over (PARTITION BY MyTable_COL1, MyTable_COL2, MyTable_COL3 order by MyTable_ID) RN from MyTable
)

SELECT * FROM CTE WHERE rn > 1

this will give only duplicate records which if you want to delete you can write

delete FROM CTE WHERE rn > 1
0
 
DaveChoiceTechAuthor Commented:
Vikas, your answer is close but not quite what I'm looking for. I need both MyTable_ID's where the other columns are duplicates. In your query the first of these rows will have an RN of and the duplicate(s)will have RN of 2 or greater. If I row has duplicates I need the MyTable_ID's for the first row as well as the other duplicates.
0
NEW Veeam Backup for Microsoft Office 365 1.5

With Office 365, it’s your data and your responsibility to protect it. NEW Veeam Backup for Microsoft Office 365 eliminates the risk of losing access to your Office 365 data.

 
Vikas GargBusiness Intelligence DeveloperCommented:
Remove the where clause and you will get all the records and from the rn column field you can come to know the original which has rn=1 and duplicates which has rn > 1
0
 
DaveChoiceTechAuthor Commented:
Yes but it will also include the records with rn=1 which do not have duplicates in the remaining colums. I need only the rows that have an rn=1 and thier duplicates with rn> 1
0
 
PortletPaulCommented:
remove the comments from the first 2 lines for a delete, otherwise run a a select to test
adjust the top(n) to suit, this might be useful if there are lots to delete
--DELETE TOP (10000) FROM m
--      FROM MyTable m
SELECT * FROM MyTable
      INNER JOIN (
            SELECT
                  MIN(MyTable_ID) AS MyTable_ID
                , MyTable_COL1
                , MyTable_COL2
                , MyTable_COL3
            FROM MyTable WITH (NOLOCK)
            GROUP BY
                  MyTable_COL1
                , MyTable_COL2
                , MyTable_COL3
            HAVING COUNT(*) > 1
      ) AS dups
            ON m.MyTable_ID > dups.MyTable_ID  --IMPORTANT: use only greater than here
            AND m.MyTable_COL1 = dups.MyTable_COL1
            AND m.MyTable_COL2 = dups.MyTable_COL2
            AND m.MyTable_COL3 = dups.MyTable_COL3

Open in new window

0
 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
You can self-join the cte

...
Selected ...
From cte c1
Join cte c2
 On c2.col1 = c1.col1
...
 And c2.rn > 1
Where c1.rn =1
0
 
DaveChoiceTechAuthor Commented:
Brilliant! Thanks a lot.
0
 
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
Guys - I just knocked an article titled SQL Server Delete Duplicate Rows Solutions that covers the above comments and a wompload of others regarding duplicate rows.

If you have some time, please give it a read and let me know if you like it.

Thanks.
Jimbo
0

Featured Post

 [eBook] Windows Nano Server

Download this FREE eBook and learn all you need to get started with Windows Nano Server, including deployment options, remote management
and troubleshooting tips and tricks

  • 3
  • 2
  • 2
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now