?
Solved

hOW TO DELETE DUPLICATE RECORDS?

Posted on 2011-03-18
17
Medium Priority
?
268 Views
Last Modified: 2012-05-11
I have a table with 2 col.  One is the self-inc PK, which I want to keep, 2nd is varchar (string).  What's the query like for removing rows that have duplicate string in  2nd column.  
I saw something like this on some web site but not sure if this is correct
delete from Table
where ID in
(SELECT MAX(ID) FROM TABLE WHERE NAME IN (SELECT NAME FROM TABLE
GROUP BY NAME
HAVING COUNT(*)>1)
)
0
Comment
Question by:lapucca
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 5
  • 3
  • +2
17 Comments
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35169750
that won't do.

should be something like this


DELETE FROM [TABLE]
WHERE ID NOT IN (
	SELECT MIN(ID) FROM [TABLE]
	GROUP BY [Name]
)

Open in new window

0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 35169773
@JoeNuvo

I believe you need to include the HAVING clause  = )
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 35169774
Actually...  N/M    I must be tired!
0
DFW AZURE MEETUP TONIGHT FRI 6PM

We will be discussing what Azure Stack is, how does it fit into the suit of offerings that Azure has currently, and where can it fit into your organizations technology stack. We will also be discussing limitations of the platform while covering various applicable scenarios.

 
LVL 3

Expert Comment

by:ssisworo
ID: 35169846

delete from Table
where ID in
(SELECT MAX(ID) FROM TABLE WHERE NAME IN (SELECT NAME FROM TABLE
GROUP BY NAME
HAVING COUNT(*)>1)
)

yes... this is work.
but you need repeatly run until... no duplicate
because it just delete the MAX ID where having duplicate NAME
if you have 4 ID with duplicate NAME... you need run this query 3 times,
and no effect for the fourth run and so on
0
 

Author Comment

by:lapucca
ID: 35170106
Is that right?  I would have to run this several times?
0
 
LVL 3

Expert Comment

by:ssisworo
ID: 35170154
yes, of course. I often run a query like this over and over again to clean the data.
if you're not sure, backup first.

here my sample (i run four times) :
QUERY
 query
RESULT
 RESULT
0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170157
The code in your question will delete "1 row" of duplicate record regardless of how many ID.
from below example
1, John
2, Mary
3, John
4, Danny
5, Mary
6, John

There are 3 John, 2 Mary, 1 Danny  which need to delete 2 John and 1 Mary
you have to run your given code 4 rounds.

Why 4 rounds?  Because you will know it's time to stop when it delete nothing (0 row affected)
0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170162
so, if you really like to delete 1 row at a time.

this code will do

Declare @loop int
SET @loop = 0

WHILE @loop = 0
BEGIN
	delete from [Table]
	where ID in
	(SELECT MAX(ID) FROM [TABLE] WHERE NAME IN (SELECT [NAME] FROM [TABLE]
	GROUP BY [NAME]
	HAVING COUNT(*)>1)
	)

	IF @@ROWCOUNT = 0 SET @loop = 1
END

Open in new window

0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170169
But if you want a little better of code, here is it

Declare @loop int
SET @loop = 0

WHILE @loop = 0
BEGIN
	delete from [Table]
	where ID in
	(SELECT MAX(ID) FROM [TABLE]
	GROUP BY [NAME]
	HAVING COUNT(*)>1
	)

	IF @@ROWCOUNT = 0 SET @loop = 1
END

Open in new window


What is the different? this code will delete duplicate of all name, but just 1 per each name.
So, from old example, need to delete 2 John and 1 Marry.  This code will stop at round 3.
(Round to run = Max Duplicate)
0
 

Author Comment

by:lapucca
ID: 35170170
No, I am looking to run the query once and removes all duplicate rows.  Forget the query I found from some site.  Can someone help me for a query that would do this.  Thank you.
0
 
LVL 23

Expert Comment

by:OP_Zaharin
ID: 35170174
hi lapucca. if you can share with us the data structure that you currently have and show the example how it's duplicate, some of us might be able to point you to the right track :)
0
 
LVL 3

Expert Comment

by:ssisworo
ID: 35170184
see...
1. no effect for the fourth run
2. ID = 7 deleted because double data
    here the explain :
     delete from TABLE1 where ID in
     (SELECT MAX(ID) FROM TABLE1 WHERE NAME IN (SELECT NAME FROM TABLE1
      GROUP BY NAME    HAVING COUNT(*)>1))
    a. (SELECT NAME FROM TABLE1 GROUP BY NAME    HAVING COUNT(*)>1) ===> four
    b. (SELECT MAX(ID) FROM TABLE1 WHERE NAME IN ('four') ===> 7
    c. delete from TABLE1 where ID in ('7')
    d. ID = 7 deleted from TABLE1



0
 
LVL 11

Accepted Solution

by:
JoeNuvo earned 1000 total points
ID: 35170187
best method, should be my comment ID 35170169
since it will utilized index when perform delete.
It may loop few times, however, you just call it once.

But, if you want it to run without a loop
then my 1st comment (ID 35169750) will do it for you.

just replace [Table], [ID], [Name]  with your actual table/field name

0
 
LVL 3

Assisted Solution

by:ssisworo
ssisworo earned 1000 total points
ID: 35170203
here... for only 1 run :

--------------------------

delete from TABLE1 where ID in
(SELECT (ID) FROM TABLE1 WHERE NAME IN (SELECT NAME FROM TABLE1 GROUP BY NAME HAVING COUNT(*)>1)) and
ID not in
(SELECT MIN(ID) FROM TABLE1 WHERE NAME IN (SELECT NAME FROM TABLE1 GROUP BY NAME HAVING COUNT(*)>1) GROUP BY NAME)


----------------------------------------------------------------------

QUERY :
 QUERY
RESULT :
 result
0
 

Author Closing Comment

by:lapucca
ID: 35170217
Thanks everyone.  I'm sorry for not being very clear from the beginning.
0
 
LVL 3

Expert Comment

by:ssisworo
ID: 35170223

yeah.... the query from JoeNuvo is much better :

here, the screen shoot

QUERY :

 QUERY
RESULT :

 RESULT
0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170233
Hi ssisworo,

My query may more simple, but performance wise, maybe your "Assist Solution" will be better.
Since it got more change of utilized index, and also reduce number of concern ID.

but I'm not going to confirm it ^_^
0

Featured Post

Veeam Task Manager for Hyper-V

Task Manager for Hyper-V provides critical information that allows you to monitor Hyper-V performance by displaying real-time views of CPU and memory at the individual VM-level, so you can quickly identify which VMs are using host resources.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within…
In part one, we reviewed the prerequisites required for installing SQL Server vNext. In this part we will explore how to install Microsoft's SQL Server on Ubuntu 16.04.
In this video we outline the Physical Segments view of NetCrunch network monitor. By following this brief how-to video, you will be able to learn how NetCrunch visualizes your network, how granular is the information collected, as well as where to f…
How to fix incompatible JVM issue while installing Eclipse While installing Eclipse in windows, got one error like above and unable to proceed with the installation. This video describes how to successfully install Eclipse. How to solve incompa…

752 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question