SQL query to detect and delete duplicate records

Posted on 2004-08-05
Last Modified: 2012-08-13
I need a SQL query to detect and delete duplicate records, that is, records where firstname and lastname are identical (rows that differ only in case should still count as duplicates). Detection and deletion could be done in separate steps.
Thanks
Question by:MichaelMullin

Accepted Solution

by: Lowfatspread
ID: 11730240
You need the primary key of each row as well:

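--List each duplicate row (D) next to the pk of a later copy (T).
--yourTable and pk are placeholders for your table and primary key column.
select 'matched to ', T.pk, D.*
from yourTable as D
inner join yourTable as T
  on D.pk < T.pk
 and D.firstname = T.firstname
 and D.lastname = T.lastname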

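You need a criterion to decide which one to delete. This one deletes every row that has a duplicate with a higher primary key, so the row with the highest key in each group is kept: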

--On a case-sensitive collation, compare UPPER(...) of both sides instead.
Delete from yourTable
Where Exists (Select T.pk from yourTable as T
              Where T.pk > yourTable.pk
                and T.firstname = yourTable.firstname
                and T.lastname = yourTable.lastname)


I hope this isn't HOMEWORK?
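To try the primary-key approach end to end, here is a minimal, self-contained sketch. The table people, its pk column and the sample names are hypothetical. The UPPER() calls are an addition (following the upper-case suggestion later in the thread) so that 'Smith' and 'SMITH' match whatever the column collation is. It is written in plain SQL that should run on SQL Server 2016 or later (DROP TABLE IF EXISTS needs 2016) and on SQLite.

```sql
-- Hypothetical sample table: pk stands in for your primary key column.
DROP TABLE IF EXISTS people;
CREATE TABLE people (
    pk        INT PRIMARY KEY,
    firstname VARCHAR(30),
    lastname  VARCHAR(30)
);

INSERT INTO people (pk, firstname, lastname) VALUES
    (1, 'John', 'Smith'),
    (2, 'JOHN', 'SMITH'),   -- same name, different case
    (3, 'Mary', 'Jones'),
    (4, 'john', 'smith'),
    (5, 'Mary', 'Brown');

-- Detection: each row that has a later copy (higher pk) of the same name,
-- comparing upper-cased names so case differences are ignored.
SELECT d.pk AS duplicate_pk, t.pk AS matched_to_pk, d.firstname, d.lastname
FROM people AS d
INNER JOIN people AS t
    ON d.pk < t.pk
   AND UPPER(d.firstname) = UPPER(t.firstname)
   AND UPPER(d.lastname)  = UPPER(t.lastname);

-- Deletion: remove every row for which a higher-pk duplicate exists,
-- so only the highest pk of each name survives (here pk 3, 4 and 5).
DELETE FROM people
WHERE EXISTS (SELECT 1
              FROM people AS t
              WHERE t.pk > people.pk
                AND UPPER(t.firstname) = UPPER(people.firstname)
                AND UPPER(t.lastname)  = UPPER(people.lastname));
```

Because the inner table is aliased as t, people.pk inside the subquery refers to the row being considered for deletion, which is what makes the EXISTS test correlated.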

Expert Comment

by:BillAn1
ID: 11730281
If there is no primary key, try something like this:

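-- Copy one row per distinct name pair into a temp table
SELECT DISTINCT firstname, lastname
INTO #temp_table
FROM source_table

-- Empty the original table
DELETE FROM source_table

-- Reload the de-duplicated rows; listing the columns means this only works
-- if firstname and lastname are the only columns you need to keep
INSERT INTO source_table (firstname, lastname)
SELECT firstname, lastname FROM #temp_table

DROP TABLE #temp_table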
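Here is a portable sketch of the same rebuild idea for trying it on a throwaway table. Assumptions: the hypothetical table contacts has only the two name columns, and a regular helper table dedup_names stands in for #temp_table, because SELECT ... INTO #temp is SQL Server syntax. Grouping on UPPER() makes the case-insensitive match explicit instead of relying on the collation; MIN() then keeps one spelling per column from each group, which may combine the first name of one copy with the last name of another.

```sql
-- Hypothetical table without a primary key.
DROP TABLE IF EXISTS contacts;
CREATE TABLE contacts (firstname VARCHAR(30), lastname VARCHAR(30));
INSERT INTO contacts (firstname, lastname) VALUES
    ('John', 'Smith'), ('JOHN', 'SMITH'), ('Mary', 'Jones'), ('Mary', 'Jones');

-- Helper table holding one row per name, ignoring case.
DROP TABLE IF EXISTS dedup_names;
CREATE TABLE dedup_names (firstname VARCHAR(30), lastname VARCHAR(30));
INSERT INTO dedup_names (firstname, lastname)
SELECT MIN(firstname), MIN(lastname)
FROM contacts
GROUP BY UPPER(firstname), UPPER(lastname);

-- Empty the original table and reload the de-duplicated rows.
DELETE FROM contacts;
INSERT INTO contacts (firstname, lastname)
SELECT firstname, lastname FROM dedup_names;
DROP TABLE dedup_names;
```

With these four sample rows, two remain afterwards: one John Smith and one Mary Jones.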


Expert Comment

by:Lowfatspread
ID: 11730356
Do you have a case-sensitivity problem?

If so, convert both names to upper case (with UPPER()) and then do the comparison.
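As a sketch of that suggestion, a detection query can group on the upper-cased names, which behaves the same whatever the column collation is. The table people_check and its rows are hypothetical sample data. On SQL Server's usual default (case-insensitive) collations a plain GROUP BY firstname, lastname would already put these rows together, so UPPER() mainly matters on case-sensitive collations.

```sql
-- Hypothetical sample table; replace with your own table and columns.
DROP TABLE IF EXISTS people_check;
CREATE TABLE people_check (firstname VARCHAR(30), lastname VARCHAR(30));
INSERT INTO people_check (firstname, lastname) VALUES
    ('John', 'Smith'), ('JOHN', 'SMITH'), ('john', 'smith'),
    ('Mary', 'Jones'), ('Mary', 'Brown');

-- Compare upper-cased names so 'Smith' and 'SMITH' fall into one group
-- even when the column collation is case-sensitive.
SELECT UPPER(firstname) AS firstname_uc,
       UPPER(lastname)  AS lastname_uc,
       COUNT(*)         AS numDups
FROM people_check
GROUP BY UPPER(firstname), UPPER(lastname)
HAVING COUNT(*) > 1;
```

With the sample rows above, the query returns a single row: JOHN, SMITH, 3.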


Expert Comment

by:arbert
ID: 11730487
Agree with Lowfatspread: if you have something you can use as a key, you're better off using that method. If not, you need a method like the one BillAn1 suggested, or a cursor. (With BillAn1's method I would just truncate the table instead of deleting the old rows; also, if you have a lot of data, that approach can be very slow.)

Expert Comment

by:Scott Pletcher
ID: 11730947
Here is a sample that uses a cursor but does not require a separate table or a dump/reload:



--One row per (firstName, lastName) pair that occurs more than once.
--Grouping ignores case only if the columns use a case-insensitive collation.
DECLARE dupsCsr CURSOR READ_ONLY FOR
SELECT [firstName], [lastName], COUNT(*) AS numDups
FROM yourTable
GROUP BY [firstName], [lastName]
HAVING COUNT(*) > 1
DECLARE @firstName VARCHAR(30) --Change to match datatype on your table
DECLARE @lastName VARCHAR(30) --Change to match datatype on your table
DECLARE @numDups INT

OPEN dupsCsr
FETCH NEXT FROM dupsCsr INTO @firstName, @lastName, @numDups
WHILE @@FETCH_STATUS = 0
BEGIN
      SET @numDups = @numDups - 1 --delete all but 1 of the duplicates
      --Limit the next DELETE to @numDups rows. Later SQL Server versions
      --deprecate SET ROWCOUNT for DELETE; DELETE TOP (@numDups) replaces it.
      SET ROWCOUNT @numDups
      DELETE FROM yourTable
      WHERE [firstName] = @firstName AND [lastName] = @lastName
      FETCH NEXT FROM dupsCsr INTO @firstName, @lastName, @numDups
END --WHILE
CLOSE dupsCsr
DEALLOCATE dupsCsr

SET ROWCOUNT 0 --restore default