Solved

SQL query to detect and delete duplicate records

Posted on 2004-08-05
Last Modified: 2012-08-13
I need a SQL query to detect and delete duplicate records, that is, records where firstname and lastname are identical (records that differ only in case still count as duplicates). Detection and deletion can be separate steps.
Thanks
Question by:MichaelMullin
5 Comments
 
LVL 50

Accepted Solution

by:
Lowfatspread earned 125 total points
ID: 11730240
You need the primary key of the row as well:

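-- [Table] is a placeholder for your table name; TABLE is a reserved word, so it must be bracketed
select 'matched to ', T.pk, D.*
from [Table] as D
inner join [Table] as T
on D.pk < T.pk
and D.firstname = T.firstname
and D.lastname = T.lastname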

You need a criterion to decide which one to delete. The statement below keeps the row with the highest pk in each set of duplicates:

delete from [Table]
where exists (select T.pk from [Table] as T
              where T.pk > [Table].pk
                and T.firstname = [Table].firstname
                and T.lastname = [Table].lastname)


I hope this isn't homework?
 
LVL 17

Expert Comment

by:BillAn1
ID: 11730281
If there is no primary key, try something like this:

-- Note: as written, this only works if source_table contains just these two columns
SELECT DISTINCT firstname, lastname
INTO #temp_table
FROM source_table

DELETE FROM source_table

INSERT INTO source_table
SELECT * FROM #temp_table

 
LVL 50

Expert Comment

by:Lowfatspread
ID: 11730356
Do you have a case-sensitivity problem?

If so, convert both names to upper case and then do the comparison.
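
A minimal sketch of that upper-casing idea applied to a self-join detection query, assuming a placeholder table yourTable with a key column pk (neither name comes from the question). Under a case-insensitive collation, which is common on SQL Server installations, a plain = comparison already ignores case and UPPER() is not needed:

```sql
-- yourTable and pk are placeholder names; substitute your own.
-- UPPER() on both sides makes the match ignore case even under a
-- case-sensitive collation.
SELECT 'matched to ' AS note, T.pk AS matched_pk, D.*
FROM yourTable AS D
INNER JOIN yourTable AS T
    ON D.pk < T.pk
   AND UPPER(D.firstname) = UPPER(T.firstname)
   AND UPPER(D.lastname) = UPPER(T.lastname)
```

Wrapping the columns in UPPER() generally prevents index seeks on them, so this may be slow on a large table.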

 
LVL 34

Expert Comment

by:arbert
ID: 11730487
Agree with lowfat: if you have something you can use as a key, you're better off using that method. If not, you need a method like the one BillAn1 suggested (I would truncate the table instead of deleting the old rows; also, with a lot of data this can be very slow) or a cursor.
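
A rough sketch of BillAn1's copy-and-reload approach with TRUNCATE TABLE in place of DELETE. It assumes source_table has exactly the two columns firstname and lastname and is not referenced by any foreign key; the transaction and SET XACT_ABORT are additions for safety, not part of the original suggestion:

```sql
-- Sketch only: assumes source_table has just firstname and lastname,
-- and that no FOREIGN KEY references it (TRUNCATE TABLE fails otherwise).
SET XACT_ABORT ON   -- roll back the whole transaction if a statement fails

BEGIN TRANSACTION

-- Copy one row per distinct name pair into a temporary table
SELECT DISTINCT firstname, lastname
INTO #temp_table
FROM source_table

-- Empty the original table
TRUNCATE TABLE source_table

-- Reload the de-duplicated rows
INSERT INTO source_table (firstname, lastname)
SELECT firstname, lastname
FROM #temp_table

COMMIT TRANSACTION

DROP TABLE #temp_table
```

Whether DISTINCT treats 'Smith' and 'SMITH' as the same value depends on the column collation, so the case-sensitivity point above applies here too.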
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 11730947
Here is a sample that uses a cursor but does not require a separate table or a dump/reload:



DECLARE dupsCsr CURSOR READ_ONLY FOR
SELECT [firstName], [lastName], COUNT(*) AS numDups
FROM yourTable
GROUP BY [firstName], [lastName]
HAVING COUNT(*) > 1
DECLARE @firstName VARCHAR(30) --Change to match datatype on your table
DECLARE @lastName VARCHAR(30) --Change to match datatype on your table
DECLARE @numDups INT

OPEN dupsCsr
FETCH NEXT FROM dupsCsr INTO @firstName, @lastName, @numDups
WHILE @@FETCH_STATUS = 0
BEGIN
      SET @numDups = @numDups - 1 --delete all but 1 of the duplicates
      SET ROWCOUNT @numDups
      DELETE FROM yourTable
      WHERE [firstName] = @firstName AND [lastName] = @lastName
      FETCH NEXT FROM dupsCsr INTO @firstName, @lastName, @numDups
END --WHILE
CLOSE dupsCsr
DEALLOCATE dupsCsr

SET ROWCOUNT 0 --restore default
