Solved

hOW TO DELETE DUPLICATE RECORDS?

Posted on 2011-03-18
17
267 Views
Last Modified: 2012-05-11
I have a table with 2 col.  One is the self-inc PK, which I want to keep, 2nd is varchar (string).  What's the query like for removing rows that have duplicate string in  2nd column.  
I saw something like this on some web site but not sure if this is correct
delete from Table
where ID in
(SELECT MAX(ID) FROM TABLE WHERE NAME IN (SELECT NAME FROM TABLE
GROUP BY NAME
HAVING COUNT(*)>1)
)
0
Comment
Question by:lapucca
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 5
  • 3
  • +2
17 Comments
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35169750
that won't do.

should be something like this


DELETE FROM [TABLE]
WHERE ID NOT IN (
	SELECT MIN(ID) FROM [TABLE]
	GROUP BY [Name]
)

Open in new window

0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 35169773
@JoeNuvo

I believe you need to include the HAVING clause  = )
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 35169774
Actually...  N/M    I must be tired!
0
Containers & Docker to Create a Powerful Team

Containers are an incredibly powerful technology that can provide you and/or your engineering team with huge productivity gains. Using containers, you can deploy, back up, replicate, and move apps and their dependencies quickly and easily.

 
LVL 3

Expert Comment

by:ssisworo
ID: 35169846

delete from Table
where ID in
(SELECT MAX(ID) FROM TABLE WHERE NAME IN (SELECT NAME FROM TABLE
GROUP BY NAME
HAVING COUNT(*)>1)
)

yes... this is work.
but you need repeatly run until... no duplicate
because it just delete the MAX ID where having duplicate NAME
if you have 4 ID with duplicate NAME... you need run this query 3 times,
and no effect for the fourth run and so on
0
 

Author Comment

by:lapucca
ID: 35170106
Is that right?  I would have to run this several times?
0
 
LVL 3

Expert Comment

by:ssisworo
ID: 35170154
yes, of course. I often run a query like this over and over again to clean the data.
if you're not sure, backup first.

here my sample (i run four times) :
QUERY
 query
RESULT
 RESULT
0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170157
The code in your question will delete "1 row" of duplicate record regardless of how many ID.
from below example
1, John
2, Mary
3, John
4, Danny
5, Mary
6, John

There are 3 John, 2 Mary, 1 Danny  which need to delete 2 John and 1 Mary
you have to run your given code 4 rounds.

Why 4 rounds?  Because you will know it's time to stop when it delete nothing (0 row affected)
0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170162
so, if you really like to delete 1 row at a time.

this code will do

Declare @loop int
SET @loop = 0

WHILE @loop = 0
BEGIN
	delete from [Table]
	where ID in
	(SELECT MAX(ID) FROM [TABLE] WHERE NAME IN (SELECT [NAME] FROM [TABLE]
	GROUP BY [NAME]
	HAVING COUNT(*)>1)
	)

	IF @@ROWCOUNT = 0 SET @loop = 1
END

Open in new window

0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170169
But if you want a little better of code, here is it

Declare @loop int
SET @loop = 0

WHILE @loop = 0
BEGIN
	delete from [Table]
	where ID in
	(SELECT MAX(ID) FROM [TABLE]
	GROUP BY [NAME]
	HAVING COUNT(*)>1
	)

	IF @@ROWCOUNT = 0 SET @loop = 1
END

Open in new window


What is the different? this code will delete duplicate of all name, but just 1 per each name.
So, from old example, need to delete 2 John and 1 Marry.  This code will stop at round 3.
(Round to run = Max Duplicate)
0
 

Author Comment

by:lapucca
ID: 35170170
No, I am looking to run the query once and removes all duplicate rows.  Forget the query I found from some site.  Can someone help me for a query that would do this.  Thank you.
0
 
LVL 23

Expert Comment

by:OP_Zaharin
ID: 35170174
hi lapucca. if you can share with us the data structure that you currently have and show the example how it's duplicate, some of us might be able to point you to the right track :)
0
 
LVL 3

Expert Comment

by:ssisworo
ID: 35170184
see...
1. no effect for the fourth run
2. ID = 7 deleted because double data
    here the explain :
     delete from TABLE1 where ID in
     (SELECT MAX(ID) FROM TABLE1 WHERE NAME IN (SELECT NAME FROM TABLE1
      GROUP BY NAME    HAVING COUNT(*)>1))
    a. (SELECT NAME FROM TABLE1 GROUP BY NAME    HAVING COUNT(*)>1) ===> four
    b. (SELECT MAX(ID) FROM TABLE1 WHERE NAME IN ('four') ===> 7
    c. delete from TABLE1 where ID in ('7')
    d. ID = 7 deleted from TABLE1



0
 
LVL 11

Accepted Solution

by:
JoeNuvo earned 250 total points
ID: 35170187
best method, should be my comment ID 35170169
since it will utilized index when perform delete.
It may loop few times, however, you just call it once.

But, if you want it to run without a loop
then my 1st comment (ID 35169750) will do it for you.

just replace [Table], [ID], [Name]  with your actual table/field name

0
 
LVL 3

Assisted Solution

by:ssisworo
ssisworo earned 250 total points
ID: 35170203
here... for only 1 run :

--------------------------

delete from TABLE1 where ID in
(SELECT (ID) FROM TABLE1 WHERE NAME IN (SELECT NAME FROM TABLE1 GROUP BY NAME HAVING COUNT(*)>1)) and
ID not in
(SELECT MIN(ID) FROM TABLE1 WHERE NAME IN (SELECT NAME FROM TABLE1 GROUP BY NAME HAVING COUNT(*)>1) GROUP BY NAME)


----------------------------------------------------------------------

QUERY :
 QUERY
RESULT :
 result
0
 

Author Closing Comment

by:lapucca
ID: 35170217
Thanks everyone.  I'm sorry for not being very clear from the beginning.
0
 
LVL 3

Expert Comment

by:ssisworo
ID: 35170223

yeah.... the query from JoeNuvo is much better :

here, the screen shoot

QUERY :

 QUERY
RESULT :

 RESULT
0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170233
Hi ssisworo,

My query may more simple, but performance wise, maybe your "Assist Solution" will be better.
Since it got more change of utilized index, and also reduce number of concern ID.

but I'm not going to confirm it ^_^
0

Featured Post

Get 15 Days FREE Full-Featured Trial

Benefit from a mission critical IT monitoring with Monitis Premium or get it FREE for your entry level monitoring needs.
-Over 200,000 users
-More than 300,000 websites monitored
-Used in 197 countries
-Recommended by 98% of users

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Data architecture is an important aspect in Software as a Service (SaaS) delivery model. This article is a study on the database of a single-tenant application that could be extended to support multiple tenants. The application is web-based develope…
Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within…
In this video, viewers are given an introduction to using the Windows 10 Snipping Tool, how to quickly locate it when it's needed and also how make it always available with a single click of a mouse button, by pinning it to the Desktop Task Bar. Int…
Michael from AdRem Software explains how to view the most utilized and worst performing nodes in your network, by accessing the Top Charts view in NetCrunch network monitor (https://www.adremsoft.com/). Top Charts is a view in which you can set seve…

719 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question