Solved

hOW TO DELETE DUPLICATE RECORDS?

Posted on 2011-03-18
17
263 Views
Last Modified: 2012-05-11
I have a table with 2 col.  One is the self-inc PK, which I want to keep, 2nd is varchar (string).  What's the query like for removing rows that have duplicate string in  2nd column.  
I saw something like this on some web site but not sure if this is correct
delete from Table
where ID in
(SELECT MAX(ID) FROM TABLE WHERE NAME IN (SELECT NAME FROM TABLE
GROUP BY NAME
HAVING COUNT(*)>1)
)
0
Comment
Question by:lapucca
  • 6
  • 5
  • 3
  • +2
17 Comments
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35169750
that won't do.

should be something like this


DELETE FROM [TABLE]
WHERE ID NOT IN (
	SELECT MIN(ID) FROM [TABLE]
	GROUP BY [Name]
)

Open in new window

0
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 35169773
@JoeNuvo

I believe you need to include the HAVING clause  = )
0
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 35169774
Actually...  N/M    I must be tired!
0
 
LVL 3

Expert Comment

by:ssisworo
ID: 35169846

delete from Table
where ID in
(SELECT MAX(ID) FROM TABLE WHERE NAME IN (SELECT NAME FROM TABLE
GROUP BY NAME
HAVING COUNT(*)>1)
)

yes... this is work.
but you need repeatly run until... no duplicate
because it just delete the MAX ID where having duplicate NAME
if you have 4 ID with duplicate NAME... you need run this query 3 times,
and no effect for the fourth run and so on
0
 

Author Comment

by:lapucca
ID: 35170106
Is that right?  I would have to run this several times?
0
 
LVL 3

Expert Comment

by:ssisworo
ID: 35170154
yes, of course. I often run a query like this over and over again to clean the data.
if you're not sure, backup first.

here my sample (i run four times) :
QUERY
 query
RESULT
 RESULT
0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170157
The code in your question will delete "1 row" of duplicate record regardless of how many ID.
from below example
1, John
2, Mary
3, John
4, Danny
5, Mary
6, John

There are 3 John, 2 Mary, 1 Danny  which need to delete 2 John and 1 Mary
you have to run your given code 4 rounds.

Why 4 rounds?  Because you will know it's time to stop when it delete nothing (0 row affected)
0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170162
so, if you really like to delete 1 row at a time.

this code will do

Declare @loop int
SET @loop = 0

WHILE @loop = 0
BEGIN
	delete from [Table]
	where ID in
	(SELECT MAX(ID) FROM [TABLE] WHERE NAME IN (SELECT [NAME] FROM [TABLE]
	GROUP BY [NAME]
	HAVING COUNT(*)>1)
	)

	IF @@ROWCOUNT = 0 SET @loop = 1
END

Open in new window

0
Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170169
But if you want a little better of code, here is it

Declare @loop int
SET @loop = 0

WHILE @loop = 0
BEGIN
	delete from [Table]
	where ID in
	(SELECT MAX(ID) FROM [TABLE]
	GROUP BY [NAME]
	HAVING COUNT(*)>1
	)

	IF @@ROWCOUNT = 0 SET @loop = 1
END

Open in new window


What is the different? this code will delete duplicate of all name, but just 1 per each name.
So, from old example, need to delete 2 John and 1 Marry.  This code will stop at round 3.
(Round to run = Max Duplicate)
0
 

Author Comment

by:lapucca
ID: 35170170
No, I am looking to run the query once and removes all duplicate rows.  Forget the query I found from some site.  Can someone help me for a query that would do this.  Thank you.
0
 
LVL 23

Expert Comment

by:OP_Zaharin
ID: 35170174
hi lapucca. if you can share with us the data structure that you currently have and show the example how it's duplicate, some of us might be able to point you to the right track :)
0
 
LVL 3

Expert Comment

by:ssisworo
ID: 35170184
see...
1. no effect for the fourth run
2. ID = 7 deleted because double data
    here the explain :
     delete from TABLE1 where ID in
     (SELECT MAX(ID) FROM TABLE1 WHERE NAME IN (SELECT NAME FROM TABLE1
      GROUP BY NAME    HAVING COUNT(*)>1))
    a. (SELECT NAME FROM TABLE1 GROUP BY NAME    HAVING COUNT(*)>1) ===> four
    b. (SELECT MAX(ID) FROM TABLE1 WHERE NAME IN ('four') ===> 7
    c. delete from TABLE1 where ID in ('7')
    d. ID = 7 deleted from TABLE1



0
 
LVL 11

Accepted Solution

by:
JoeNuvo earned 250 total points
ID: 35170187
best method, should be my comment ID 35170169
since it will utilized index when perform delete.
It may loop few times, however, you just call it once.

But, if you want it to run without a loop
then my 1st comment (ID 35169750) will do it for you.

just replace [Table], [ID], [Name]  with your actual table/field name

0
 
LVL 3

Assisted Solution

by:ssisworo
ssisworo earned 250 total points
ID: 35170203
here... for only 1 run :

--------------------------

delete from TABLE1 where ID in
(SELECT (ID) FROM TABLE1 WHERE NAME IN (SELECT NAME FROM TABLE1 GROUP BY NAME HAVING COUNT(*)>1)) and
ID not in
(SELECT MIN(ID) FROM TABLE1 WHERE NAME IN (SELECT NAME FROM TABLE1 GROUP BY NAME HAVING COUNT(*)>1) GROUP BY NAME)


----------------------------------------------------------------------

QUERY :
 QUERY
RESULT :
 result
0
 

Author Closing Comment

by:lapucca
ID: 35170217
Thanks everyone.  I'm sorry for not being very clear from the beginning.
0
 
LVL 3

Expert Comment

by:ssisworo
ID: 35170223

yeah.... the query from JoeNuvo is much better :

here, the screen shoot

QUERY :

 QUERY
RESULT :

 RESULT
0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170233
Hi ssisworo,

My query may more simple, but performance wise, maybe your "Assist Solution" will be better.
Since it got more change of utilized index, and also reduce number of concern ID.

but I'm not going to confirm it ^_^
0

Featured Post

Comprehensive Backup Solutions for Microsoft

Acronis protects the complete Microsoft technology stack: Windows Server, Windows PC, laptop and Surface data; Microsoft business applications; Microsoft Hyper-V; Azure VMs; Microsoft Windows Server 2016; Microsoft Exchange 2016 and SQL Server 2016.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
Delete from table 6 36
SQL syntax error in VBA 11 33
Oracle TEXT search question 9 29
Help with SQL field formatting 3 12
In this article I will describe the Copy Database Wizard method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
This demo shows you how to set up the containerized NetScaler CPX with NetScaler Management and Analytics System in a non-routable Mesos/Marathon environment for use with Micro-Services applications.
When you create an app prototype with Adobe XD, you can insert system screens -- sharing or Control Center, for example -- with just a few clicks. This video shows you how. You can take the full course on Experts Exchange at http://bit.ly/XDcourse.

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

14 Experts available now in Live!

Get 1:1 Help Now