ccleebelt
asked on
Find and Delete Duplicate Records
I have a table called group_members:
group_membership_id int PK
contact_id int
group_id int
I have a problem where there has been duplication in the data where I have the following
group_membership_id contact_id group_id
1 100 1000
2 100 1001
3 101 1000
4 101 1001
5 100 1000
6 101 1000
Notice how rows 5 and 6 are dupes of 1 and 2.
I need to write a query to find all of these dupe and delete them.
group_membership_id int PK
contact_id int
group_id int
I have a problem where there has been duplication in the data where I have the following
group_membership_id contact_id group_id
1 100 1000
2 100 1001
3 101 1000
4 101 1001
5 100 1000
6 101 1000
Notice how rows 5 and 6 are dupes of 1 and 2.
I need to write a query to find all of these dupe and delete them.
ASKER
line 1 and 2 have the same contact_id, group_id pairs as 5 and 6.
Ideally, we find a way to delete 5 and 6 based on them being the same as 1 and 2.
Ideally, we find a way to delete 5 and 6 based on them being the same as 1 and 2.
Ahhh... 1&3 not 1&2
So is it correct to say that the combination of the two columns should be unique? In other words, there will never be more than one entry with contact_id=100 AND group_id=1000 ?
How are you connecting to the SQL Server? Do you have MS Access available?
How many columns doe this table have? Just the three?
How many rows does the table have?
Backup your database before you do anything...
So is it correct to say that the combination of the two columns should be unique? In other words, there will never be more than one entry with contact_id=100 AND group_id=1000 ?
How are you connecting to the SQL Server? Do you have MS Access available?
How many columns doe this table have? Just the three?
How many rows does the table have?
Backup your database before you do anything...
ASKER
Oops. Yeah, nice catch.
There should never be more than one row with any contact_id, group_id pair.
I connect through Management Studio
These are the only columns
About 300000 rows in the table. Of which about 20% are dupes a such as 5 and 6 above.
There should never be more than one row with any contact_id, group_id pair.
I connect through Management Studio
These are the only columns
About 300000 rows in the table. Of which about 20% are dupes a such as 5 and 6 above.
Try this:
This will create a new table <new_table> (call it whatever you want) which will contain only the first occurrence of the duplicate entries. You can review this for correctness. If this looks good, rename the existing table (do not delete it yet) and then rename the new table to the original table name.
Be aware that the column group_membership_id may end up wrong for some records because you cannot know in what order the duplicates fall. If you are lucky the duplicates were created all at the end of the table, in which case you will be all good. You may have to doctor some of this column manually...
SELECT DISTINCT 'group_membership_id', 'contact_id', 'group_id' INTO <new_table> FROM <your_table>
This will create a new table <new_table> (call it whatever you want) which will contain only the first occurrence of the duplicate entries. You can review this for correctness. If this looks good, rename the existing table (do not delete it yet) and then rename the new table to the original table name.
Be aware that the column group_membership_id may end up wrong for some records because you cannot know in what order the duplicates fall. If you are lucky the duplicates were created all at the end of the table, in which case you will be all good. You may have to doctor some of this column manually...
You can also use this shorter version since there are only the three columns:
SELECT DISTINCT * INTO <new_table> FROM <your_table>
ASKER
I ran this and it creates a exact replica of the original table with the exact same duplicates.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
worked. Thanks!
Shazaaaam!
ASKER
This is interesting though....I need to get the group_membership_id column on the new table and then make it the PK.
SQL will not allow me to do that.
Any pointers?
SQL will not allow me to do that.
Any pointers?
You may have to remove the PK from the temp table (original that was renamed) first since the key cannot be named the same.
With this little bit of detail, you are going to have a difficult time finding unique ways to express what a duplicate is.