Link to home
Start Free TrialLog in
Avatar of daffeyduck
daffeyduck

asked on

Eliminate duplicate records? Identity column used.

I have a large database that was setup with a record_id that was suppose to eliminate duplicates. However, when they received new updates someone didn't do it correctly and now there are duplicates with the same name and address, etc fields and different record_id's. How would be the easiest way to remove the full duplicate records keeping the higher record_id record from the duplicates?

This was what someone had said to do, was create a new database with the below query and then delete the old database and rename the new one and recreate the index's.

INSERT INTO Sample_BackUp
SELECT *
FROM Sample

WHERE record_id NOT IN ( SELECT MAX( record_id )
                   FROM [Sample]
                  GROUP BY [Sample].name,[Sample].address )

The other issue is the record_id column is an Identity column with Identity Seed 1, and Identity Increment 1. If I delete the  duplicate records will all the records change there record_id numbers. All of the records that I keep in the new dedupped database need to retain there original record_id number.

Please any assistance would be greatly appreciated. Looking for the quickest and easiest way.

Thank you.....
daffeyduck
ASKER CERTIFIED SOLUTION
Avatar of Anthony Perkins
Anthony Perkins
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of daffeyduck
daffeyduck

ASKER

Wouldn't this example delete all records except for the duplicates with the highest record_id?
The simplest way to eliminate the duplicate records is to SELECT DISTINCT into a temporary table, truncate the original table and SELECT the records back into the original table. That query looks like this:

SELECT DISTINCT * INTO Sample_Backup
FROM Sample

TRUNCATE TABLE Sample

INSERT SAMPLE
SELECT * FROM Sample_Backup

DROP TABLE Sample_Backup


Hi,

d3lph1:
As I understand the question, this wont do what is wanted as the duplicate records have different ID's.

My guess is that you need to delete the duplicate iwth the HIGHEST ID as this is the last record inserted, while the other record may have linked records. It is unlikely that the last copy of the duplicate will have many linked records.

Steps
1. Identify duplicated rows

2. Search on the duplicate to see if linked rows exist.

3. If linked rows exist for more than one duplicate, then update the links to point to only one of the duplicates

4. Delete the other row.

5. Check for orphans.

I'd try this on a backup copy of your database to check and tune the code before doing it live.

Can you post the database schema so I can have a crack at writing the SQL.

Regards
  David




Hi David,

Thank you so very much for offering your assistance, I was able to eliminate the dups, last night, late.

Thanks again David......
daffeyduck
Thank you very much, it worked great....

I really appreciate your assistance...
daffeyduck