db_hopewell
asked on
SQL Query find and replace duplicates with a "D" added to primary key
I wanted to know a way to find duplicates in a query and replace them with a case statement that adds a "D" or anything to note that there are 2 instances of the primary key.
Basically I want to show all records from table CTE, but I want it to replace any records that have 2 instances of the same name and birth date with a "D" in the contactID field. Where would I add a case when statement?
;with cte as (
select C.Contact_ID, C.First_Name, C.Last_Name, C.Date_Of_Birth, /* C.other_fields... , */
O.Group_Name, /* O.other_fields... , */
P.End_Date
from pm.Contacts C
inner join pm.Patients P
on P.Patient_ID = C.Contact_ID
inner join pm.Policies O
on O.Contact_ID = C.Contact_ID
where P.End_Date is null
)
select *
from cte T1
where exists (
-- check for duplicates i.e. a record with
-- different ID and same name and birth date
select *
from cte T2
where T2.First_Name = T1.First_Name
and T2.Last_Name = T1.Last_Name
and T2.Date_Of_Birth = T1.Date_Of_Birth
and T2.Contact_ID != T1.Contact_ID
)
Basically I want to show all records from table CTE, but I want it to replace any records that have 2 instances of the same name and birth date with a "D" in the contactID field. Where would I add a case when statement?
;with cte as (
select C.Contact_ID, C.First_Name, C.Last_Name, C.Date_Of_Birth, /* C.other_fields... , */
O.Group_Name, /* O.other_fields... , */
P.End_Date
from pm.Contacts C
inner join pm.Patients P
on P.Patient_ID = C.Contact_ID
inner join pm.Policies O
on O.Contact_ID = C.Contact_ID
where P.End_Date is null
)
select *
from cte T1
where exists (
-- check for duplicates i.e. a record with
-- different ID and same name and birth date
select *
from cte T2
where T2.First_Name = T1.First_Name
and T2.Last_Name = T1.Last_Name
and T2.Date_Of_Birth = T1.Date_Of_Birth
and T2.Contact_ID != T1.Contact_ID
)
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
that there are 2 instances of the primary key.This is not possible. Or it is not a primary key..
ASKER
ok let me rephrase. I wrote a complex query and want to find the duplicates and mark them and add a "D" on the end in the ID field. Is there a different way than John_Vidmar's row number and partitioning within sql above? Something like a case statement? The system that I am sending the file to, would create 2 records of the same person with different ID's. One with the 'id' and one with the 'id' + 'D'
row_number is probably how I would check. It is a common function in many database products these days.
You can easily take the row_number output and add it to a case statement to add the 'D'. You should then be able to make that an update statement.
Knowing the database you are targeting will help us provide working examples.
You can easily take the row_number output and add it to a case statement to add the 'D'. You should then be able to make that an update statement.
Knowing the database you are targeting will help us provide working examples.
ASKER
Microsoft SQL Server and I am going to run this in integration services to produce an output txt file for another proprietary system to read in nightly.
Why not using DISTINCT to eliminate them?
If you don't have row_number function then you could use a correlated-subquery:
with cte as
( SELECT C.Contact_ID
, /* other fields you need */
, rn = ( SELECT count(*)
FROM Contacts
WHERE First_Name = C.First_Name
AND Last_Name = C.Last_Name
AND Date_Of_Birth = C.Date_Of_Birth
AND Contact_ID < = C.Contact_ID
)
FROM pm.Contacts C
JOIN pm.Patients P on P.Patient_ID = C.Contact_ID
JOIN pm.Policies O on O.Contact_ID = C.Contact_ID
WHERE P.End_Date IS NULL
)
SELECT Contact_ID = CAST(Contact_ID as varchar) + CASE WHEN rn = 1 THEN '' else 'D' END
, /* other fields you need */
FROM cte
ASKER
Thanks for being patient with me. That will work for now.
Assuming that the Contact_ID is the primary key (and unique) and that you want to concatenate a 'D' at the end of it if there are matching names and birthdates in other records, the following should work:
with cte as
(select first_name fname, last_name lname, date_of_birth dob, count(*) cnt
from pm.contacts
group by first_name, last_name, date_of_birth)
select case when cte.cnt = 1 then C.Contact_ID
else c.Contact_ID||'D'
end as Contact_ID,
C.First_Name, C.Last_Name, C.Date_Of_Birth, /* C.other_fields... , */
O.Group_Name, /* O.other_fields... , */
P.End_Date
from pm.Contacts C
inner join pm.Patients P
on P.Patient_ID = C.Contact_ID
inner join pm.Policies O
on O.Contact_ID = C.Contact_ID
inner join cte on
c.first_name = cte.fname and
c.last_name = cte.lname and
c.date_of_birth = cte.dob
where P.End_Date is null
The above is for Oracle but changing the concatenation operator from "||" to "+" should work for SQL Server.
with cte as
(select first_name fname, last_name lname, date_of_birth dob, count(*) cnt
from pm.contacts
group by first_name, last_name, date_of_birth)
select case when cte.cnt = 1 then C.Contact_ID
else c.Contact_ID||'D'
end as Contact_ID,
C.First_Name, C.Last_Name, C.Date_Of_Birth, /* C.other_fields... , */
O.Group_Name, /* O.other_fields... , */
P.End_Date
from pm.Contacts C
inner join pm.Patients P
on P.Patient_ID = C.Contact_ID
inner join pm.Policies O
on O.Contact_ID = C.Contact_ID
inner join cte on
c.first_name = cte.fname and
c.last_name = cte.lname and
c.date_of_birth = cte.dob
where P.End_Date is null
The above is for Oracle but changing the concatenation operator from "||" to "+" should work for SQL Server.
I assume the Primary key hasn't been created yet or there would not be duplicates?
What if there are 3 duplicates? Adding a 'D' would still create a duplicate.
That said look at the following example. Col1 will be the 'new' PK once the dupes are removed.
Assuming Oracle
Open in new window