Solved

SQL Command to delete all duplicates except 1

Posted on 2007-11-27
10
1,289 Views
Last Modified: 2008-02-01
As you can see from the cmd - i want to delete all duplicates however i want to keep at least one?

e.g. if i have 3 duplicate then i only want to delete 2 etc

NB: Please ignore the declare - its not relevant and i'm unable to edit code snippit
declare @source UniqueIdentifier

select @source='4fb2a2a4-db60-460b-bf5c-7ddf2872fe1d'
 

	delete FROM 

	[user]

	where email in 

	(

		SELECT email FROM [user]

		GROUP BY email

		HAVING (COUNT(email) > 1 )     

	)

Open in new window

0
Comment
Question by:paulCardiff
  • 4
  • 3
  • 2
  • +1
10 Comments
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 20361337
which of the x rows do you want to keep?

delete u
FROM [user] u
where exists ( SELECT NULL FROM [user] i
            WHERE i.email = u.email
              AND i.somekeyfield < u.somekeyfield 
            )

Open in new window

0
 
LVL 25

Expert Comment

by:imitchie
ID: 20361424
try this pattern

delete FROM [user]
where email in
(
SELECT email FROM [user] GROUP BY email HAVING (COUNT(email) > 1 )    
)
AND NOT [ID] IN
(
SELECT MAX(ID) FROM [user] GROUP BY email HAVING (COUNT(email) > 1 )    
)
0
 

Author Comment

by:paulCardiff
ID: 20361431
Sorry let me explain by problem in a bit more depth  i'm using the following logic i.e. i'm importing rows and using a source field to identify new rows

so i'll want to keep any existing rows, i thought i could this with the following but i found that this was
just deleteing all duplicate new rows what i want to say is ....

1) Delete all new rows (e.g. @source='4fb2a2a4-db60-460b-bf5c-7ddf2872fe1d') if an existing email address is present
2) If no matching email address is already present then only delete all duplicates new rows except for one - and that one can be any one



declare @source UniqueIdentifier

select @source='4fb2a2a4-db60-460b-bf5c-7ddf2872fe1d'
 

	SELECT * FROM 

	[user]

	where source=@source and email in 

	(

		SELECT email FROM [user]

		GROUP BY email

		HAVING (COUNT(email) > 1 )     

	)

Open in new window

0
 

Author Comment

by:paulCardiff
ID: 20361451
btw: " SELECT * " is meant to be DELETE
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 20361480

delete u

FROM [user] u

WHERE u.source=@source 

AND exists ( SELECT NULL FROM [user] i

            WHERE i.email = u.email

              AND i.somekeyfield < u.somekeyfield 

              AND i.source = @source

            )

Open in new window

0
Complete VMware vSphere® ESX(i) & Hyper-V Backup

Capture your entire system, including the host, with patented disk imaging integrated with VMware VADP / Microsoft VSS and RCT. RTOs is as low as 15 seconds with Acronis Active Restore™. You can enjoy unlimited P2V/V2V migrations from any source (even from a different hypervisor)

 

Author Comment

by:paulCardiff
ID: 20361548
Thanks for that, and please feel free to correct me, but using some sample data i've got e.g.

USERID                              EMAIL                             SOURCE
1                                        a@a.com                        dd59f087-cd6f-412b-baef-5226c7069c0d
2                                        a@a.com                        4fb2a2a4-db60-460b-bf5c-7ddf2872fe1d
3                                        a@a.com                        4fb2a2a4-db60-460b-bf5c-7ddf2872fe1d

This cmd will only delete UserId 3, whereas in this instance i need both 2 & 3 deleted - however if 1 wasn't present then this would be fine
0
 

Author Comment

by:paulCardiff
ID: 20361841
As the primary key has identify = true one possibility is if i said something like

Delete all duplicates except for the lowest userId

Is this possible and if so can any one suggest the appropiate syntax for it please?
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 20361941
<<Is this possible and if so can any one suggest the appropiate syntax for it please?>>

delete from yourtable where USERID in (select min(USERID) from yourtable group by EMAIL)
0
 
LVL 25

Expert Comment

by:imitchie
ID: 20361942
declare @source UniqueIdentifier
select @source='4fb2a2a4-db60-460b-bf5c-7ddf2872fe1d'
 
DELETE FROM [user]
where source=@source AND
(
ID < (SELECT MAX(ID) FROM [user] GROUP BY email)
OR NOT EXISTS
(SELECT MAX(ID) FROM [user] GROUP BY email and source <> @source)
)
0
 
LVL 25

Accepted Solution

by:
imitchie earned 500 total points
ID: 20361964
let me rephrase that.
it should now say
1. delete only where the source = @source, never delete existing data
2. handles multiple emails (2xbob, 4xfred, 6xpeter within single @source)
3. ALL @source duplicates are removed if existing email is present
declare @source UniqueIdentifier

select @source='4fb2a2a4-db60-460b-bf5c-7ddf2872fe1d'
 

DELETE FROM [user]

where source=@source and email in 

(

SELECT email FROM [user]

GROUP BY email

HAVING (COUNT(email) > 1 )     

)

AND NOT [UserID] IN

(

SELECT MIN(UserID) FROM [user] GROUP BY email HAVING (COUNT(email) > 1 )    

)

Open in new window

0

Featured Post

Best Practices: Disaster Recovery Testing

Besides backup, any IT division should have a disaster recovery plan. You will find a few tips below relating to the development of such a plan and to what issues one should pay special attention in the course of backup planning.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Let's review the features of new SQL Server 2012 (Denali CTP3). It listed as below: PERCENT_RANK(): PERCENT_RANK() function will returns the percentage value of rank of the values among its group. PERCENT_RANK() function value always in be…
International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed
Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.

911 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now