SQL - remove duplicates where not in another table - SQL Server 2005

Hello experts,

I have the below table structure, with duplicates:

table: person
pk_record_id  pat_id                 last_name          first_name
newid()          B0024456             Smith              Rachael
newid()          B0024456             Smith              Rachael
newid()          B0024457             Joneh              Mike
newid()          B0024458             Carlson            John
newid()          B0024458             Carlson            John
newid()          B0024457             Joneh              Mike

and I would like to end up with:

table: person
pk_record_id  pat_id                   last_name       first_name
newid()          B0024456             Smith              Rachael
newid()          B0024457             Joneh              Mike
newid()          B0024458             Carlson            John

However, I want to make sure and not remove items from another table where the pk_record_id exists there:

table: other
pk_record_id  person_id            last_name          first_name
newid()          B0024456             Smith              Rachael

Ultimately giving us:
table: person
pk_record_id  pat_id                   last_name       first_name
newid()          B0024456             Smith              Rachael
newid()          B0024457             Joneh              Mike
newid()          B0024458             Carlson            John

something like:


DELETE t
FROM   person t
WHERE  NOT EXISTS( select null from person x
where x.person_id = t.person_id
and t.person_id not in (select person_id from other) )

Thoughts?

Thanks!
robthomas09Asked:
Who is Participating?
 
johanntagleCommented:
Oh I get it now.  Not having actual values for pk_record_id in your example got me confused.  The ff should do it:

delete from person where pk_record_id in 
(
  select p.pk_record_id 
  from person p inner join
    ( /* get those with with duplicates */
     select pat_id, last_name, first_name from
       (
       select pat_id, last_name, first_name, count(*) as record_count
        from person p
       group by pat_id, last_name, first_name
       having count(*)>1
       )
    ) dup
      on (p.pat_id=dup.pat_id and p.last_name=dup.last_name and p.first_name=dup.first_name)
    /* exclude those part of other table */
  left join other o on (p.pk_record_id = o.pk_record_id)
  where o.pk_record_id is null;
)   

Open in new window


   
Note: untested.
0
 
johanntagleCommented:
and I would like to end up with:
table: person
pk_record_id  pat_id                   last_name       first_name
newid()          B0024456             Smith              Rachael
newid()          B0024457             Joneh              Mike
newid()          B0024458             Carlson            John

<snip>

Ultimately giving us:
table: person
pk_record_id  pat_id                   last_name       first_name
newid()          B0024456             Smith              Rachael
newid()          B0024457             Joneh              Mike
newid()          B0024458             Carlson            John

The above tables are identical, so I don't really know how the entries in the other table come to play.  The newid() means you'll be assigning new ids to the cleaned-up rows?  Then how about:

create table person_temp
as select distinct pat_id, last_name, first_name from person;

truncate table person;

insert into person select newid(), pat_id, last_name, first_name from person_temp;

Open in new window


Note: I don't use MS-SQL, so adjust syntax as needed.  Also, maybe person_temp can just be a temporary (in memory) table, if it's not too big.


0
 
SharathData EngineerCommented:
Try CTE to delete the dupes.
;WITH CTE 
     AS (SELECT *, 
                ROW_NUMBER() 
                  OVER(PARTITION BY pat_id ORDER BY last_name) rn 
           FROM person) 
DELETE FROM CTE 
      WHERE rn > 1

Open in new window

0
Cloud Class® Course: Microsoft Exchange Server

The MCTS: Microsoft Exchange Server 2010 certification validates your skills in supporting the maintenance and administration of the Exchange servers in an enterprise environment. Learn everything you need to know with this course.

 
Alpesh PatelAssistant ConsultantCommented:
Select * from other where id in (Select ID from person where pk_record_id > 'ID' and last_name='Smith'          and first_name ='Rachael')
Select ID from person where pk_record_id > 'ID' and last_name='Smith'          and first_name ='Rachael'
0
 
ZtinelCommented:
;WITH CTE
     AS (SELECT *,
                ROW_NUMBER()
                  OVER(PARTITION BY pat_id ORDER BY pk_record_id) rn
           FROM person)
DELETE * FROM CTE
WHERE RN > 1
AND pk_record_id NOT IN (SELECT pk_record_id FROM other)
0
 
robthomas09Author Commented:
Thanks!
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.