Solved

delete duplicates of 3 column combination

Posted on 2012-04-06
25
530 Views
Last Modified: 2012-08-14
select COUNT(*),COL1,COL2,COL3 from tblOCDELStudent_1011 group by COL1,COL2,COL3 having COUNT(*) >1

gives records that have the combination of COL1,COL2,COL3 more than once..

how can we eliminate duplicates that all of them will have exactly one count only.. that is, the above query should give no results at all, and so we can delete ..

if the count is 3 it should have 2 deletes.. and if 4, then 3 and so forth..
0
Comment
Question by:25112
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 8
  • 5
  • 4
  • +3
25 Comments
 
LVL 32

Expert Comment

by:awking00
ID: 37816195
Can you provide a description of the tblOCDELStudent_1011 table?
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 37816196
Is there an identity column or other unique value on each row?

If not, you might have to use a cursor with TOP.
0
 
LVL 13

Expert Comment

by:Ashok
ID: 37816198
Try

delete t
from tblOCDELStudent_1011 t
where  pk = (select top 1 pk, COUNT(*),COL1,COL2,COL3 from tblOCDELStudent_1011 group by COL1,COL2,COL3 having COUNT(*) >1);
0
Transaction Monitoring Vs. Real User Monitoring

Synthetic Transaction Monitoring Vs. Real User Monitoring: When To Use Each Approach? In this article, we will discuss two major monitoring approaches: Synthetic Transaction and Real User Monitoring.

 
LVL 69

Accepted Solution

by:
Scott Pletcher earned 112 total points
ID: 37816200
If there IS an identify / unique value:

delete
from tblOCDELStudent_1011
from tblOCDELStudent_1011 oc
inner join  (
    select COUNT(*),COL1,COL2,COL3, MIN(uniqueCol) AS uniqueCol
    from tblOCDELStudent_1011
    group by COL1,COL2,COL3
    having COUNT(*) >1
) AS dups ON
    dups.col1 = oc.col1 AND
    dups.col2 = oc.col2 AND
    dups.col3 = oc.col3 AND
    dups.uniqueCol < oc.uniqueCol
0
 
LVL 25

Expert Comment

by:jogos
ID: 37816218
If you have a primary key PK_ID then it could be something like this
--delete tblOCDELStudent_1011 
select *  -- first test it as select, then change it to delete
from  tblOCDELStudent_1011  as t1
inner join (
select COL1,COL2,COL3,min(PK_ID)  as MIN_PK_ID
from tblOCDELStudent_1011 
group by COL1,COL2,COL3 having COUNT(*) >1
) as t2 on t1.col1 = t2.col1
 t1.col2 = t2.col2
 t1.col3 = t2.col3
 t1.PK_ID <> t2.MIN_PK_ID

Open in new window

0
 
LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 112 total points
ID: 37816233
If there is NOT an identify / unique value:

declare csrDups cursor fast_forward for
select COUNT(*),COL1,COL2,COL3
from tblOCDELStudent_1011
group by COL1,COL2,COL3
having COUNT(*) >1

open crsDups

while 1 = 1
begin
    fetch next from csrDups INTO @count, @col1, @col2, @col3
    delete top (@count - 1)
    from tblOCDELStudent_1011
    where col1 = @col1 and col2 = @col2 and col3 = @col3
end --while

deallocate csrDups
0
 
LVL 25

Assisted Solution

by:jogos
jogos earned 56 total points
ID: 37816254
obvious some AND's missing
--delete tblOCDELStudent_1011 
select *  -- first test it as select, then change it to delete
from  tblOCDELStudent_1011  as t1
inner join (
select COL1,COL2,COL3,min(PK_ID)  as MIN_PK_ID
from tblOCDELStudent_1011 
group by COL1,COL2,COL3 having COUNT(*) >1
) as t2 on t1.col1 = t2.col1
AND  t1.col2 = t2.col2
AND  t1.col3 = t2.col3
AND  t1.PK_ID <> t2.MIN_PK_ID

Open in new window

0
 
LVL 32

Assisted Solution

by:awking00
awking00 earned 111 total points
ID: 37816262
If there is no identifying field (i.e. primary key), you can also accomplish what you need by the following:

create table tmp as
select <column_list> from
(select <column_list>, row_number() over (partition by col1, col2, col3 order by 1) rn
 from tblOCDELStudent_1011)  as x
where x.rn = 1;

truncate table tblOCDELStudent_1011

insert into tblOCDELStudent_1011 select * from tmp

drop table tmp
0
 
LVL 13

Assisted Solution

by:Ashok
Ashok earned 166 total points
ID: 37816348
Here is tested solution in SQL Server 2008.....

DECLARE @Totals As Int;
set @Totals  =
(
select Top 1 Dups.cnt As Cnt from
DailyLunch a
inner join
(
select Count(*) cnt, Menu_Date from DailyLunch
group by Menu_Date
having COUNT(*) >1
) As Dups ON
dups.Menu_Date = a.Menu_Date
)
DECLARE @Dt AS DateTime;
set @Dt =
(
select Top 1 Dups.Menu_Date As MnuDt from
DailyLunch a
inner join
(
select Count(*) cnt, Menu_Date from DailyLunch
group by Menu_Date
having COUNT(*) >1
) As Dups ON
dups.Menu_Date = a.Menu_Date
)
Delete Top (@Totals-1) from DailyLunch
where Menu_Date = @Dt;

Note: By the way, I have only 1 column in Group by, but just change it to include 3 columns there.

HTH
Ashok
0
 
LVL 5

Author Comment

by:25112
ID: 37816441
let me try your solutions..

the pk is the 3 columns ... col1/col2/col3 combination.. (of course, PK is not enforced yet.. that is the goal of this project )
0
 
LVL 5

Author Comment

by:25112
ID: 37816444
the table structure is

COL1 varchar      12
COL2 datetime      
COL3 varchar      4
COL4 numeric      9
COL5 char      1
COL6 varchar      4
COL7 varchar      4
COL8 varchar      4
COL9 char      1
COL10 varchar      30
COL11 char      1
0
 
LVL 5

Author Comment

by:25112
ID: 37816483
awking00, your syntax gives this issue..
Windowed functions do not support integer indices as ORDER BY clause expressions.


scott, i am getting stuck in declaring your variables right.. i am working on it..

ashok- your last query - is that with PK or without..?
0
 
LVL 32

Assisted Solution

by:awking00
awking00 earned 111 total points
ID: 37816496
Change order by 1 to order by col1, col2, col3
0
 
LVL 13

Assisted Solution

by:Ashok
Ashok earned 166 total points
ID: 37816553
ashok- your last query - is that with PK or without..?

without PK

when you have
Group by Col1, Col2, Col3

it will find duplicates based on these 3 columns.

By the way, my solution works, but it will NOT delete all duplicates in one SHOT.  Still learning everyday.

To use my solution, you may have to write Stored Procedure and use Cursor (loop), etc.

Ashok
0
 
LVL 13

Assisted Solution

by:Ashok
Ashok earned 166 total points
ID: 37816574
Here is my revised version (which has combined two SELECTs in one!)

DECLARE @Totals As Int;
DECLARE @Dt AS DateTime;
WITH x AS
(
select Top 1 Dups.cnt As Cnt, Dups.Menu_Date As MnuDt from
DailyLunch a
inner join
(
select Count(*) cnt, Menu_Date from DailyLunch
group by Menu_Date
having COUNT(*) > 1
) As Dups ON
dups.Menu_Date = a.Menu_Date
)
SELECT
    @Totals = Cnt,
    @Dt = MnuDt
FROM x
-- select @Totals, @Dt      << Uncomment only for Testing purpose

Delete Top (@Totals-1) from DailyLunch
    where Menu_Date = @Dt;

By the way, my solution works, but it will NOT delete all duplicates in one SHOT.  Still learning everyday.
To use my solution, you may have to write Stored Procedure and use Cursor (loop), etc.

HTH
Ashok
0
 
LVL 15

Expert Comment

by:Deepak Chauhan
ID: 37816831
I think it will return the expected result ,,,,, you can analysis data by this.

select col1, col2, col3 from (
select row_number() over( partition by col1 order by col1
) as row,col1, col2, col3
from table_A) a --where row >1

and after that  if you are satisfied then....

delete from (
select row_number() over( partition by col1 order by col1
) as row,col1, col2, col3
from table_A) a where row >1
0
 
LVL 13

Expert Comment

by:Ashok
ID: 37816896
Deepak,

select col1, col2, col3 from (
select row_number() over( partition by col1 order by col1
) as row, col1, col2, col3
from table_A) a where row > 1;

and after that  if you are satisfied then....

delete a from (
select row_number() over( partition by col1 order by col1
) as row, col1, col2, col3
from table_A) a where row > 1;

You are missing "a" near the delete as shown above.

HTH
Ashok
0
 
LVL 15

Assisted Solution

by:Deepak Chauhan
Deepak Chauhan earned 55 total points
ID: 37817049
Ashok

As according you above.
You are missing "a" near the delete as shown above.

But my dear adding a column name or  with Delete Keyword can cause an error

Msg 208, Level 16, State 1, Line 1
Invalid object name  

i think delete syntax  is "(DELETE from tablename where value =,<,>,in,between) like this.
0
 
LVL 32

Expert Comment

by:awking00
ID: 37817100
Did you try my method with the change in the order by clause?
0
 
LVL 5

Author Comment

by:25112
ID: 37817245
ashok/awking00

your syntax worked.. many thanks.. i tweaked scott's also to work.. (I did it in transaction and rolled back to test)
0
 
LVL 5

Author Comment

by:25112
ID: 37817250
thanks for the idea Deepak
0
 
LVL 13

Expert Comment

by:Ashok
ID: 37817452
Deepak,

delete from (
select row_number() over( partition by col1 order by col1
) as row,col1, col2, col3
from table_A) a where row >1;

Above does NOT work.
Do not say "I think....".

I tested it in SQL 2008 Management Studio.
After adding "b", it worked.

If you test it, you will know.

Thanks,
Ashok
0
 
LVL 13

Expert Comment

by:Ashok
ID: 37817479
By the way if you also add "a" to select like this....

select a.Menu_Date from
(
select row_number() over(partition by Menu_Date order by Menu_Date
) as row, Menu_Date
from DailyLunch) a where row > 1;

It works.

Ashok
0
 
LVL 15

Expert Comment

by:Deepak Chauhan
ID: 37817663
Ashok

you r right

it is working with 'a'

i apologize
sorry for being mistake
0
 
LVL 13

Expert Comment

by:Ashok
ID: 37818003
Deepak,

It's ok.  We all make mistakes.
Sometimes it is better to test before making any judgements.

Thanks,
Ashok
0

Featured Post

The Ultimate Checklist to Optimize Your Website

Websites are getting bigger and complicated by the day. Video, images, custom fonts are all great for showcasing your product/service. But the price to pay in terms of reduced page load times and ultimately, decreased sales, can lead to some difficult decisions about what to cut.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I'm trying, I really am. But I've seen so many wrong approaches involving date(time) boundaries I despair about my inability to explain it. I've seen quite a few recently that define a non-leap year as 364 days, or 366 days and the list goes on. …
If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
In this video, viewers are given an introduction to using the Windows 10 Snipping Tool, how to quickly locate it when it's needed and also how make it always available with a single click of a mouse button, by pinning it to the Desktop Task Bar. Int…
In this brief tutorial Pawel from AdRem Software explains how you can quickly find out which services are running on your network, or what are the IP addresses of servers responsible for each service. Software used is freeware NetCrunch Tools (https…

688 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question