Solved

SQL Command to delete all duplicates except 1

Posted on 2007-11-27
10
1,313 Views
Last Modified: 2008-02-01
As you can see from the cmd - i want to delete all duplicates however i want to keep at least one?

e.g. if i have 3 duplicate then i only want to delete 2 etc

NB: Please ignore the declare - its not relevant and i'm unable to edit code snippit
declare @source UniqueIdentifier
select @source='4fb2a2a4-db60-460b-bf5c-7ddf2872fe1d'
 
	delete FROM 
	[user]
	where email in 
	(
		SELECT email FROM [user]
		GROUP BY email
		HAVING (COUNT(email) > 1 )     
	)

Open in new window

0
Comment
Question by:paulCardiff
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
  • 2
  • +1
10 Comments
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 20361337
which of the x rows do you want to keep?

delete u
FROM [user] u
where exists ( SELECT NULL FROM [user] i
            WHERE i.email = u.email
              AND i.somekeyfield < u.somekeyfield 
            )

Open in new window

0
 
LVL 25

Expert Comment

by:imitchie
ID: 20361424
try this pattern

delete FROM [user]
where email in
(
SELECT email FROM [user] GROUP BY email HAVING (COUNT(email) > 1 )    
)
AND NOT [ID] IN
(
SELECT MAX(ID) FROM [user] GROUP BY email HAVING (COUNT(email) > 1 )    
)
0
 

Author Comment

by:paulCardiff
ID: 20361431
Sorry let me explain by problem in a bit more depth  i'm using the following logic i.e. i'm importing rows and using a source field to identify new rows

so i'll want to keep any existing rows, i thought i could this with the following but i found that this was
just deleteing all duplicate new rows what i want to say is ....

1) Delete all new rows (e.g. @source='4fb2a2a4-db60-460b-bf5c-7ddf2872fe1d') if an existing email address is present
2) If no matching email address is already present then only delete all duplicates new rows except for one - and that one can be any one



declare @source UniqueIdentifier
select @source='4fb2a2a4-db60-460b-bf5c-7ddf2872fe1d'
 
	SELECT * FROM 
	[user]
	where source=@source and email in 
	(
		SELECT email FROM [user]
		GROUP BY email
		HAVING (COUNT(email) > 1 )     
	)

Open in new window

0
NFR key for Veeam Backup for Microsoft Office 365

Veeam is happy to provide a free NFR license (for 1 year, up to 10 users). This license allows for the non‑production use of Veeam Backup for Microsoft Office 365 in your home lab without any feature limitations.

 

Author Comment

by:paulCardiff
ID: 20361451
btw: " SELECT * " is meant to be DELETE
0
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 20361480

delete u
FROM [user] u
WHERE u.source=@source 
AND exists ( SELECT NULL FROM [user] i
            WHERE i.email = u.email
              AND i.somekeyfield < u.somekeyfield 
              AND i.source = @source
            )

Open in new window

0
 

Author Comment

by:paulCardiff
ID: 20361548
Thanks for that, and please feel free to correct me, but using some sample data i've got e.g.

USERID                              EMAIL                             SOURCE
1                                        a@a.com                        dd59f087-cd6f-412b-baef-5226c7069c0d
2                                        a@a.com                        4fb2a2a4-db60-460b-bf5c-7ddf2872fe1d
3                                        a@a.com                        4fb2a2a4-db60-460b-bf5c-7ddf2872fe1d

This cmd will only delete UserId 3, whereas in this instance i need both 2 & 3 deleted - however if 1 wasn't present then this would be fine
0
 

Author Comment

by:paulCardiff
ID: 20361841
As the primary key has identify = true one possibility is if i said something like

Delete all duplicates except for the lowest userId

Is this possible and if so can any one suggest the appropiate syntax for it please?
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 20361941
<<Is this possible and if so can any one suggest the appropiate syntax for it please?>>

delete from yourtable where USERID in (select min(USERID) from yourtable group by EMAIL)
0
 
LVL 25

Expert Comment

by:imitchie
ID: 20361942
declare @source UniqueIdentifier
select @source='4fb2a2a4-db60-460b-bf5c-7ddf2872fe1d'
 
DELETE FROM [user]
where source=@source AND
(
ID < (SELECT MAX(ID) FROM [user] GROUP BY email)
OR NOT EXISTS
(SELECT MAX(ID) FROM [user] GROUP BY email and source <> @source)
)
0
 
LVL 25

Accepted Solution

by:
imitchie earned 500 total points
ID: 20361964
let me rephrase that.
it should now say
1. delete only where the source = @source, never delete existing data
2. handles multiple emails (2xbob, 4xfred, 6xpeter within single @source)
3. ALL @source duplicates are removed if existing email is present
declare @source UniqueIdentifier
select @source='4fb2a2a4-db60-460b-bf5c-7ddf2872fe1d'
 
DELETE FROM [user]
where source=@source and email in 
(
SELECT email FROM [user]
GROUP BY email
HAVING (COUNT(email) > 1 )     
)
AND NOT [UserID] IN
(
SELECT MIN(UserID) FROM [user] GROUP BY email HAVING (COUNT(email) > 1 )    
)

Open in new window

0

Featured Post

Guide to Performance: Optimization & Monitoring

Nowadays, monitoring is a mixture of tools, systems, and codes—making it a very complex process. And with this complexity, comes variables for failure. Get DZone’s new Guide to Performance to learn how to proactively find these variables and solve them before a disruption occurs.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Everyone has problem when going to load data into Data warehouse (EDW). They all need to confirm that data quality is good but they don't no how to proceed. Microsoft has provided new task within SSIS 2008 called "Data Profiler Task". It solve th…
In this article I will describe the Backup & Restore method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Familiarize people with the process of retrieving data from SQL Server using an Access pass-thru query. Microsoft Access is a very powerful client/server development tool. One of the ways that you can retrieve data from a SQL Server is by using a pa…

733 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question