Solved

Delete duplicate records in SQL database using stored procedure

Posted on 2007-12-04
12
1,696 Views
Last Modified: 2008-09-11
I have a SQL server 2000 database that has over a million records in it.  I need to create a stored procedure that i can run that we check for which records have duplicates and then delete all of the duplicates except for one.  The criteria for the duplicates is based on five different columns.  I am hoping somebody can give me some guideance on how to do this.  
0
Comment
Question by:Mike_Stevens
12 Comments
 
LVL 17

Expert Comment

by:xDJR1875
ID: 20405242
Can you provide the structure of the table? Is there a unique key?
0
 

Author Comment

by:Mike_Stevens
ID: 20405276

tblStatus
Column: RecordNumber -  Unique Key
Column: IDNum
Column: Code
Column: Date
Column: Time
Column: State

All of the columns except RecordNumber need to be used in criteria.  Sample below

RecordNumber  IDNum  Code  Date         Time   State
------------------  ---------  ------  -------        ------   --------
100                   12345     A     12122007  1630    CA
101                   12345     A     12122007  1630    CA
102                   12345     A     12122007  1630    CA
103                   12345     A     12122007  1630    CA
104                   12345     A     12122007  1630    CA

Using the sample data above, procedure needs to delete all but one of the records.

0
 
LVL 3

Expert Comment

by:randomjames
ID: 20405356
I would create a new table with the unique index on those fields, export the data from your old table into this new one (which will only take one of each duplicate because of the index), then clear out your old table and move the records from your new table back to the old table.
0
 
LVL 17

Accepted Solution

by:
xDJR1875 earned 500 total points
ID: 20405880
Here is some code you can run iteratively until the rows affected = 0

SET ROWCOUNT 1

DELETE  dbo.tblStatus

--SELECT top 1 * 

FROM dbo.tblStatus

WHERE Exists (Select Count(*), IDNum, Code, [Date], [Time], [State]

                              from dbo.tblStatus

                             Group By IDNum, Code, [Date], [Time], [State]

                             Having Count(*) > 1)

Open in new window

0
 
LVL 17

Expert Comment

by:xDJR1875
ID: 20405903
And here is a link with additional information.
http://support.microsoft.com/kb/139444
and one that shows how to do it with a little bit more programming, but all at once.
http://www.sql-server-performance.com/articles/dev/dv_delete_duplicates_p1.aspx

0
Windows Server 2016: All you need to know

Learn about Hyper-V features that increase functionality and usability of Microsoft Windows Server 2016. Also, throughout this eBook, you’ll find some basic PowerShell examples that will help you leverage the scripts in your environments!

 
LVL 11

Expert Comment

by:yuching
ID: 20408127
Code provided by xDJR1875: will delete all record which have duplicate data but Mike_Stevens intended to leave a row for the duplicate data.

You can try as below, it will only left the max record number for the duplicates data.

DELETE  FROM tblStatus
WHERE  NOT EXISTS (
        SELECT MaxRecordNumber, IDNum, Code, Date, Time, State
        FROM
        (
              SELECT Max(RecordNumber) AS MaxRecordNumber, IDNum, Code, Date, Time, State
              FROM  dbo.tblStatus
              GROUP BY IDNum, Code, Date, Time, State
          ) A
         WHERE A.MaxRecordNumber = tblStatus.RecordNumber AND
                     A.IDNum = tblStatus.IDNum AND A.Code = tblStatus.Code AND
                     A.Date = tblStatus.Date AND A.Time = tblStatus.Time AND
                     A.State = tblStatus.State
)
0
 
LVL 17

Expert Comment

by:xDJR1875
ID: 20408152
Actually, with the SET ROWCOUNT = 1
The code does not delete all duplicate records.
0
 
LVL 11

Expert Comment

by:yuching
ID: 20408626
Sorry, my mistake, but for the example of data, the duplicate rows can be more than 2, set rowcount =1  will only delete one of the records which having duplicate.
0
 
LVL 17

Expert Comment

by:xDJR1875
ID: 20409336
You are correct that is why I suggest Mike run it iteratively until rows affected = 0.

It is not the most elegant of solutions, but maybe one of  the simplest.
0
 

Expert Comment

by:yordan_georgiev
ID: 21502162
DELETE
    FROM    MyTable
    WHERE    ID NOT IN
    (
    SELECT    MAX(ID)
        FROM        MyTable
        GROUP BY    DuplicatevalueColumn1, DuplicateValueColumn2,
                DuplicateValueColumn2)

Reference:Pinal Dave
http://blog.sqlauthority.com/2007/03/01/sql-server-delete-duplicate-records-rows/
0

Featured Post

What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Converting Teradata SQL to Oracle SQL (exadata) 3 28
TSQL previous 5 23
Update data using formula 22 19
SQL bit field not working as expected 3 16
I wrote this interesting script that really help me find jobs or procedures when working in a huge environment. I could I have written it as a Procedure but then I would have to have it on each machine or have a link to a server-related search that …
Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Viewers will learn how the fundamental information of how to create a table.

914 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now