Link to home
Start Free TrialLog in
Avatar of Auerelio Vasquez
Auerelio VasquezFlag for United States of America

asked on

Eliminating Duplicate records

User generated image
I have a table, it has a few columns, the two i'm concerned about at EventId and TopicID

If an event has the same topic more than once, I want to delete it based on the event ID.....

The event can have more than one topic, it may have topic 1,2,3....N

but it' can't have that topic more than once.
SOLUTION
Avatar of Mike Eghtebas
Mike Eghtebas
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of arnold
you can use partition on and add a column/count/index that you can use in a delete rule

select eventid, topicid , row_number() over (order by eventid desc, topicid desc) as Occurance
from tablename group by eventid,topicid

The above will convert a two column table to a three column table
such that set with duplicates will have a row_number larger than 1.

delete from the above results based on
Your table design is an issue and the way you are currently setup.
creating a unique index that eventid+topicid has to be unique.

If you want your topic ID to be a MasK that is easily discernible what is included is to use is as a sum of base2 references
1 - topic
2 - topic1
4 - topic2
8- topic3
.
.

128 - topic7  

Your business process and your data entry is the one that should enforce the policy.

in your sitaution where
The other option is to coalesce (inputs into topicid) but that would require processing to make sure to extract the data back out .
;with cte As (
Select EventId, TopicID
,ROW_NUMBER() Over(Partition by TopicID  Order By EventId ) As rn
From @temp)
Select * from cte
where rn=1;
EventId  TopicID
561	       1	
561	       2	
563	       3	
563	       4	
565	       5	
565	       6	
565	       7	
565	       8	
565	       9	
565	       10	
565	       11	
565	       12	
568	       16	--<-- duplicate TopicID has been filtered out.

Open in new window



After you verify the outcome, the solution could be easily changed to delete query. Do you want them to be filtered out or deleted. I will ready the original question again to make sure. brb
I knew forgot to post the link to row_number usage: possibly as well as partition directive...
https://docs.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql

The other is you would a delete from based on the row_number() column's value being greater than 1 gerater or equal to 2.
Yes, to delete run:
;with cte As (
Select EventId, TopicID
,ROW_NUMBER() Over(Partition by TopicID  Order By EventId ) As rn 
From @temp)
delete from cte
where rn>1;

Open in new window


This will keep the earliest Eventid but delete othere. If you want to keep the latest Eventid, then change
Order By EventId)  to Order By EventId desc)
Avatar of Auerelio Vasquez

ASKER

Mike Eghtebas

can this be done, where I may not know the EventID ??? I want to do this for the entire table, not just eventid 565
The solution is not specific to EventID 565. This is why I added some other EventIDs so I could verify.
It seems the delete statement deleted all the rows
Topic ID, can repeat, and eventID can have more than one topic, it just can't have the same topic twice

like 565 can have topic 1,7,9, 11 but it can't have any of those more than once.
I used this:

;with cte As (
Select EventId, TopicID
,ROW_NUMBER() Over(Partition by TopicID  Order By EventId ) As rn
From eventTopics)
delete from cte
where rn>1;

where my table name is EventTopics it deleted almost every row in the table because RN was always> 1
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Awesome Solution
OH no, I meant to assing to multiple comments