Solved

hOW TO DELETE DUPLICATE RECORDS?

Posted on 2011-03-18
17
265 Views
Last Modified: 2012-05-11
I have a table with 2 col.  One is the self-inc PK, which I want to keep, 2nd is varchar (string).  What's the query like for removing rows that have duplicate string in  2nd column.  
I saw something like this on some web site but not sure if this is correct
delete from Table
where ID in
(SELECT MAX(ID) FROM TABLE WHERE NAME IN (SELECT NAME FROM TABLE
GROUP BY NAME
HAVING COUNT(*)>1)
)
0
Comment
Question by:lapucca
  • 6
  • 5
  • 3
  • +2
17 Comments
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35169750
that won't do.

should be something like this


DELETE FROM [TABLE]
WHERE ID NOT IN (
	SELECT MIN(ID) FROM [TABLE]
	GROUP BY [Name]
)

Open in new window

0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 35169773
@JoeNuvo

I believe you need to include the HAVING clause  = )
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 35169774
Actually...  N/M    I must be tired!
0
Master Your Team's Linux and Cloud Stack

Come see why top tech companies like Mailchimp and Media Temple use Linux Academy to build their employee training programs.

 
LVL 3

Expert Comment

by:ssisworo
ID: 35169846

delete from Table
where ID in
(SELECT MAX(ID) FROM TABLE WHERE NAME IN (SELECT NAME FROM TABLE
GROUP BY NAME
HAVING COUNT(*)>1)
)

yes... this is work.
but you need repeatly run until... no duplicate
because it just delete the MAX ID where having duplicate NAME
if you have 4 ID with duplicate NAME... you need run this query 3 times,
and no effect for the fourth run and so on
0
 

Author Comment

by:lapucca
ID: 35170106
Is that right?  I would have to run this several times?
0
 
LVL 3

Expert Comment

by:ssisworo
ID: 35170154
yes, of course. I often run a query like this over and over again to clean the data.
if you're not sure, backup first.

here my sample (i run four times) :
QUERY
 query
RESULT
 RESULT
0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170157
The code in your question will delete "1 row" of duplicate record regardless of how many ID.
from below example
1, John
2, Mary
3, John
4, Danny
5, Mary
6, John

There are 3 John, 2 Mary, 1 Danny  which need to delete 2 John and 1 Mary
you have to run your given code 4 rounds.

Why 4 rounds?  Because you will know it's time to stop when it delete nothing (0 row affected)
0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170162
so, if you really like to delete 1 row at a time.

this code will do

Declare @loop int
SET @loop = 0

WHILE @loop = 0
BEGIN
	delete from [Table]
	where ID in
	(SELECT MAX(ID) FROM [TABLE] WHERE NAME IN (SELECT [NAME] FROM [TABLE]
	GROUP BY [NAME]
	HAVING COUNT(*)>1)
	)

	IF @@ROWCOUNT = 0 SET @loop = 1
END

Open in new window

0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170169
But if you want a little better of code, here is it

Declare @loop int
SET @loop = 0

WHILE @loop = 0
BEGIN
	delete from [Table]
	where ID in
	(SELECT MAX(ID) FROM [TABLE]
	GROUP BY [NAME]
	HAVING COUNT(*)>1
	)

	IF @@ROWCOUNT = 0 SET @loop = 1
END

Open in new window


What is the different? this code will delete duplicate of all name, but just 1 per each name.
So, from old example, need to delete 2 John and 1 Marry.  This code will stop at round 3.
(Round to run = Max Duplicate)
0
 

Author Comment

by:lapucca
ID: 35170170
No, I am looking to run the query once and removes all duplicate rows.  Forget the query I found from some site.  Can someone help me for a query that would do this.  Thank you.
0
 
LVL 23

Expert Comment

by:OP_Zaharin
ID: 35170174
hi lapucca. if you can share with us the data structure that you currently have and show the example how it's duplicate, some of us might be able to point you to the right track :)
0
 
LVL 3

Expert Comment

by:ssisworo
ID: 35170184
see...
1. no effect for the fourth run
2. ID = 7 deleted because double data
    here the explain :
     delete from TABLE1 where ID in
     (SELECT MAX(ID) FROM TABLE1 WHERE NAME IN (SELECT NAME FROM TABLE1
      GROUP BY NAME    HAVING COUNT(*)>1))
    a. (SELECT NAME FROM TABLE1 GROUP BY NAME    HAVING COUNT(*)>1) ===> four
    b. (SELECT MAX(ID) FROM TABLE1 WHERE NAME IN ('four') ===> 7
    c. delete from TABLE1 where ID in ('7')
    d. ID = 7 deleted from TABLE1



0
 
LVL 11

Accepted Solution

by:
JoeNuvo earned 250 total points
ID: 35170187
best method, should be my comment ID 35170169
since it will utilized index when perform delete.
It may loop few times, however, you just call it once.

But, if you want it to run without a loop
then my 1st comment (ID 35169750) will do it for you.

just replace [Table], [ID], [Name]  with your actual table/field name

0
 
LVL 3

Assisted Solution

by:ssisworo
ssisworo earned 250 total points
ID: 35170203
here... for only 1 run :

--------------------------

delete from TABLE1 where ID in
(SELECT (ID) FROM TABLE1 WHERE NAME IN (SELECT NAME FROM TABLE1 GROUP BY NAME HAVING COUNT(*)>1)) and
ID not in
(SELECT MIN(ID) FROM TABLE1 WHERE NAME IN (SELECT NAME FROM TABLE1 GROUP BY NAME HAVING COUNT(*)>1) GROUP BY NAME)


----------------------------------------------------------------------

QUERY :
 QUERY
RESULT :
 result
0
 

Author Closing Comment

by:lapucca
ID: 35170217
Thanks everyone.  I'm sorry for not being very clear from the beginning.
0
 
LVL 3

Expert Comment

by:ssisworo
ID: 35170223

yeah.... the query from JoeNuvo is much better :

here, the screen shoot

QUERY :

 QUERY
RESULT :

 RESULT
0
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 35170233
Hi ssisworo,

My query may more simple, but performance wise, maybe your "Assist Solution" will be better.
Since it got more change of utilized index, and also reduce number of concern ID.

but I'm not going to confirm it ^_^
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
New and Previous Values in a Query 7 30
SQL: launch actions one before the other 10 24
Find results from sql within a time span 11 46
Trouble with <> 2 20
Data architecture is an important aspect in Software as a Service (SaaS) delivery model. This article is a study on the database of a single-tenant application that could be extended to support multiple tenants. The application is web-based develope…
Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within…
This video shows how to quickly and easily add an email signature for all users on Exchange 2016. The resulting signature is applied on a server level by Exchange Online. The email signature template has been downloaded from: www.mail-signatures…

860 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question