• Status: Solved

delete duplicate data

I need to delete duplicate data, i.e. if I have two rows that are exactly the same, I want to delete one of them.

I have a table called Tbl_Data.
It has these columns:

ID
GPSDateTime (datetime)
ReportID (int)
DeviceID (int)

If all three columns (everything except ID) match, I want to delete one of the rows so that I'm left with unique data rows instead of duplicates.

How might I do this?
Asked by: websss

Scott Pletcher (Senior DBA) commented:
For best performance, if the table has an index that contains all three of those columns and is keyed by one or more of them, list the columns in the PARTITION BY in the same order as the index, starting with its leading key column.
For example, if there were an index on ( ReportID, GPSDateTime ) that included ( DeviceID ), you would PARTITION BY ReportID, GPSDateTime, DeviceID.  The idea is to take advantage of any existing "pre-sorting" as much as possible.
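
For reference, the kind of index described in that example could be declared roughly as follows. This is only a sketch: the index name is hypothetical, and nothing in the thread says such an index actually exists on Tbl_Data.

```sql
-- Hypothetical index matching the example above (the name is an assumption).
-- Key columns: ReportID, GPSDateTime; DeviceID is carried as an included column.
CREATE NONCLUSTERED INDEX IX_Tbl_Data_ReportID_GPSDateTime
    ON Tbl_Data (ReportID, GPSDateTime)
    INCLUDE (DeviceID);
```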


;WITH cte_dups AS (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY GPSDateTime, ReportID, DeviceID) AS row_num
    FROM Tbl_Data
)
DELETE FROM cte_dups
WHERE row_num > 1
 
websss (Author) commented:
Thanks Scott

I'm getting the error:

Msg 4112, Level 15, State 1, Line 2
The function 'ROW_NUMBER' must have an OVER clause with ORDER BY.
 
websss (Author) commented:
Got it, thanks.
 
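Scott Pletcher (Senior DBA) commented:
Yeah, sorry, I left out the ORDER BY, which must appear even if it's meaningless (here every row in a partition has the same GPSDateTime, so which duplicate survives is arbitrary):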

;WITH cte_dups AS (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY GPSDateTime, ReportID, DeviceID ORDER BY GPSDateTime) AS row_num
    FROM Tbl_Data
)
DELETE FROM cte_dups
WHERE row_num > 1
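
If it matters which copy survives, or you want to check the result before committing, the same approach can be wrapped in a transaction with a preview query. The following is a minimal sketch, assuming ID is unique per row (the thread does not say so explicitly); ordering by ID then keeps the row with the lowest ID in each duplicate group.

```sql
-- Preview: list each (GPSDateTime, ReportID, DeviceID) combination that occurs more than once.
SELECT GPSDateTime, ReportID, DeviceID, COUNT(*) AS copies
FROM Tbl_Data
GROUP BY GPSDateTime, ReportID, DeviceID
HAVING COUNT(*) > 1;

BEGIN TRANSACTION;

;WITH cte_dups AS (
    SELECT ID,
           ROW_NUMBER() OVER (
               PARTITION BY GPSDateTime, ReportID, DeviceID
               ORDER BY ID   -- assumption: ID is unique, so the lowest ID in each group is kept
           ) AS row_num
    FROM Tbl_Data
)
DELETE FROM cte_dups
WHERE row_num > 1;

-- Check: after the delete, this should return no rows.
SELECT GPSDateTime, ReportID, DeviceID, COUNT(*) AS copies
FROM Tbl_Data
GROUP BY GPSDateTime, ReportID, DeviceID
HAVING COUNT(*) > 1;

-- Commit if the check looks right; otherwise run ROLLBACK TRANSACTION instead.
COMMIT TRANSACTION;
```

Note that both GROUP BY and PARTITION BY treat NULLs as equal to each other, so rows whose three columns match including NULLs are counted as duplicates.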
