SQL - remove duplicates where not in another table - SQL Server 2005

Hello experts,

I have the below table structure, with duplicates:

table: person
pk_record_id  pat_id                 last_name          first_name
newid()          B0024456             Smith              Rachael
newid()          B0024456             Smith              Rachael
newid()          B0024457             Joneh              Mike
newid()          B0024458             Carlson            John
newid()          B0024458             Carlson            John
newid()          B0024457             Joneh              Mike

and I would like to end up with:

table: person
pk_record_id  pat_id                   last_name       first_name
newid()          B0024456             Smith              Rachael
newid()          B0024457             Joneh              Mike
newid()          B0024458             Carlson            John

However, I want to make sure and not remove items from another table where the pk_record_id exists there:

table: other
pk_record_id  person_id            last_name          first_name
newid()          B0024456             Smith              Rachael

Ultimately giving us:
table: person
pk_record_id  pat_id                   last_name       first_name
newid()          B0024456             Smith              Rachael
newid()          B0024457             Joneh              Mike
newid()          B0024458             Carlson            John

something like:


DELETE t
FROM   person t
WHERE  NOT EXISTS( select null from person x
where x.person_id = t.person_id
and t.person_id not in (select person_id from other) )

Thoughts?

Thanks!
robthomas09Asked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

johanntagleCommented:
and I would like to end up with:
table: person
pk_record_id  pat_id                   last_name       first_name
newid()          B0024456             Smith              Rachael
newid()          B0024457             Joneh              Mike
newid()          B0024458             Carlson            John

<snip>

Ultimately giving us:
table: person
pk_record_id  pat_id                   last_name       first_name
newid()          B0024456             Smith              Rachael
newid()          B0024457             Joneh              Mike
newid()          B0024458             Carlson            John

The above tables are identical, so I don't really know how the entries in the other table come to play.  The newid() means you'll be assigning new ids to the cleaned-up rows?  Then how about:

create table person_temp
as select distinct pat_id, last_name, first_name from person;

truncate table person;

insert into person select newid(), pat_id, last_name, first_name from person_temp;

Open in new window


Note: I don't use MS-SQL, so adjust syntax as needed.  Also, maybe person_temp can just be a temporary (in memory) table, if it's not too big.


0
johanntagleCommented:
Oh I get it now.  Not having actual values for pk_record_id in your example got me confused.  The ff should do it:

delete from person where pk_record_id in 
(
  select p.pk_record_id 
  from person p inner join
    ( /* get those with with duplicates */
     select pat_id, last_name, first_name from
       (
       select pat_id, last_name, first_name, count(*) as record_count
        from person p
       group by pat_id, last_name, first_name
       having count(*)>1
       )
    ) dup
      on (p.pat_id=dup.pat_id and p.last_name=dup.last_name and p.first_name=dup.first_name)
    /* exclude those part of other table */
  left join other o on (p.pk_record_id = o.pk_record_id)
  where o.pk_record_id is null;
)   

Open in new window


   
Note: untested.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
SharathData EngineerCommented:
Try CTE to delete the dupes.
;WITH CTE 
     AS (SELECT *, 
                ROW_NUMBER() 
                  OVER(PARTITION BY pat_id ORDER BY last_name) rn 
           FROM person) 
DELETE FROM CTE 
      WHERE rn > 1

Open in new window

0
The 7 Worst Nightmares of a Sysadmin

Fear not! To defend your business’ IT systems we’re going to shine a light on the seven most sinister terrors that haunt sysadmins. That way you can be sure there’s nothing in your stack waiting to go bump in the night.

Alpesh PatelAssistant ConsultantCommented:
Select * from other where id in (Select ID from person where pk_record_id > 'ID' and last_name='Smith'          and first_name ='Rachael')
Select ID from person where pk_record_id > 'ID' and last_name='Smith'          and first_name ='Rachael'
0
ZtinelCommented:
;WITH CTE
     AS (SELECT *,
                ROW_NUMBER()
                  OVER(PARTITION BY pat_id ORDER BY pk_record_id) rn
           FROM person)
DELETE * FROM CTE
WHERE RN > 1
AND pk_record_id NOT IN (SELECT pk_record_id FROM other)
0
robthomas09Author Commented:
Thanks!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Query Syntax

From novice to tech pro — start learning today.