SQL Delete Duplicate Data

In two tables in my database there are duplicate data.  I found the root cause of the problem and working on fixing what is causing the issue.  I need to delete the duplicate data.  There are three tables that join together.

Person, Employee, Login

Per.perid
Emp.perid
Login.perid

What I need to do is delete the duplicate data.  

First, if the person has a login I want to keep that record and delete the duplicate.
Second, if the person does not have a login I just want to delete one of the records.

The duplicates exist in the Person and Employee table.  

So for person table
Name            perid
John smith   101
John smith   102

employee table
empid        perid
001             101
002             102
LVL 2
CipherISAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
Not really following your example here as you'll need to provide more details on what is a 'duplicate', but let me know if my aricle on SQL Server Delete Duplicate Rows Solutions helps.
0
CipherISAuthor Commented:
The article doesn't help that works for one table.

I have two tables that I need to delete duplicates which are joined together by a key.

Person.perid -> employee.perid

There are duplicate records in the person table and duplicate records in the employee table.

So - John Smith will be in the person table twice and each person record will have its own employee record.
0
awking00Commented:
Could you provide some more sample data for all three tables that include the various possibilities (non-duplicates, duplicates in either or both tables, with and without a login, etc) and what you wish to see after the delete query completes?
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

CipherISAuthor Commented:
I've attached a snapshot of sample data.  This record does not have a login.
0
CipherISAuthor Commented:
duplicate records
0
Vikas GargBusiness Intelligence DeveloperCommented:
Hello,

You can try this

declare @tbl table (per_id int)

insert into @tbl
select per_id from
(
SELECT per_id,ROW_NUMBER()OVER(PARTITION BY PER_FIRST,PER_LAST ORDER BY PER_ID)RN FROM PERSON
) tbl
where rn > 1

delete from employee where per_id in (select per_id from @tbl)

delete from PERSON where per_id in (select per_id from @tbl)

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
arnoldCommented:
Does your employee table to your person have a foreign constraint/trigger dealing with perid


Select name, max(perid) from person_table where count(perid) >1 group by name
Each time your run this query if there are duplicates, a row will be returned for the person.
If that is valid,

Creating on delete trigger in person table that will cascade to the employee table will achieve what you want meaning when an entry is deleted from person table, it will delete the perid matching row from the employee

Delete from person_table where perid in (Select max(perid) from person_table where count(perid) >1 group by name)

The other option is to create a temp table that will have the opposite of the above using the min(perid) to store
Select into #temptable (select name,min(perid) from person_table group by name)

The only concern I have is that if a person does not have a duplicate entry, it will not have a record in the new table to use as select * from person_table where perid not in (select perid from #temptable)
See what it returns, the same query can be done on the employee_table.
Keeping the min might be maintain integrity in the employee table, but expediency, flipping the option to max, will create the temp with all persons, then the employee table might have missing rows.

Setting a unique index on perid in the employee table and then using the
Insert into employee_table (perid) values select perid from person_table
Duplicates will be prevented from being inserted with a duplicate error.
0
Guy Hengel [angelIII / a3]Billing EngineerCommented:
actually, the article mentioned should help, but adding to that you could combine it with the OUTPUT clause:
https://msdn.microsoft.com/en-us/library/ms177564.aspx
to log the deleted records into a table, and delete in the other table (as needed) from there...
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.