[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 265
  • Last Modified:

SQL Query find and replace duplicates with a "D" added to primary key

I wanted to know a way to find duplicates in a query and replace them with a case statement that adds a "D" or anything to note that there are 2 instances of the primary key.

Basically I want to show all records from table CTE, but I want it to replace any records that have 2 instances of the same name and birth date with a "D" in the contactID field. Where would I add a case when statement?

;with cte as (
      select      C.Contact_ID, C.First_Name, C.Last_Name, C.Date_Of_Birth, /* C.other_fields... , */
                  O.Group_Name, /* O.other_fields... , */
                  P.End_Date
      from      pm.Contacts C
      inner      join pm.Patients P
      on            P.Patient_ID = C.Contact_ID
      inner      join pm.Policies O
      on            O.Contact_ID = C.Contact_ID
      where      P.End_Date is null
)
select      *
from      cte T1
where      exists (
                  -- check for duplicates i.e. a record with
                  -- different ID and same name and birth date
                  select      *
                  from      cte T2
                  where      T2.First_Name = T1.First_Name
                  and            T2.Last_Name = T1.Last_Name
                  and            T2.Date_Of_Birth = T1.Date_Of_Birth
                  and            T2.Contact_ID != T1.Contact_ID
            )
0
db_hopewell
Asked:
db_hopewell
  • 3
  • 2
  • 2
  • +2
1 Solution
 
John_VidmarCommented:
Looks like you are using T-SQL (MS SQL Server), which contains row_number function, allows you to number the duplicates.  Partitioning tells row_number to restart the numbering at 1 when there is a grouping change:
with cte as
(	SELECT	C.Contact_ID
	,	C.First_Name
	,	C.Last_Name
	,	C.Date_Of_Birth
	,	/* other fields you need */
	,	rn = row_number() OVER (PARTITION BY First_Name, Last_Name, Date_Of_Birth ORDER BY Contact_ID)
	FROM	pm.Contacts	C
	JOIN	pm.Patients	P	on	P.Patient_ID = C.Contact_ID
	JOIN	pm.Policies	O	on	O.Contact_ID = C.Contact_ID
	WHERE	P.End_Date IS NULL
)
SELECT	Contact_ID = CAST(Contact_ID as varchar) + CASE WHEN rn = 1 THEN '' else 'D' END
,	/* other fields you need */
FROM      cte

Open in new window

0
 
slightwv (䄆 Netminder) Commented:
Which database?

I assume the Primary key hasn't been created yet or there would not be duplicates?

What if there are 3 duplicates?  Adding a 'D' would still create a duplicate.

That said look at the following example.  Col1 will be the 'new' PK once the dupes are removed.

Assuming Oracle
drop table tab1 purge;
create table tab1 (col1 varchar2(5), Col2 char(1));

insert into tab1 values('1','A');
insert into tab1 values('2','B');
insert into tab1 values('1','C');
insert into tab1 values('3','D');
insert into tab1 values('3','E');
commit;


update tab1 set col1 = col1 || 'D' where rowid in
(
	select max(rowid) from tab1 group by col1 having count(*) > 1
);

Open in new window

0
 
ste5anSenior DeveloperCommented:
that there are 2 instances of the primary key.
This is not possible. Or it is not a primary key..
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
db_hopewellAuthor Commented:
ok let me rephrase. I wrote a complex query and want to find the duplicates and mark them and add a "D" on the end in the ID field.  Is there a different way than  John_Vidmar's row number and partitioning within sql above? Something like a case statement?  The system that I am sending the file to, would create 2 records of the same person with different ID's. One with the 'id' and one with the 'id' + 'D'
0
 
slightwv (䄆 Netminder) Commented:
row_number is probably how I would check.  It is a common function in many database products these days.

You can easily take the row_number output and add it to a case statement to add the 'D'.  You should then be able to make that an update statement.

Knowing the database you are targeting will help us provide working examples.
0
 
db_hopewellAuthor Commented:
Microsoft SQL Server and I am going to run this in integration services to produce an output txt file for another proprietary system to read in nightly.
0
 
ste5anSenior DeveloperCommented:
Why not using DISTINCT to eliminate them?
0
 
John_VidmarCommented:
If you don't have row_number function then you could use a correlated-subquery:
with cte as
(	SELECT	C.Contact_ID
	,	/* other fields you need */
	,	rn =	(	SELECT	count(*)
				FROM	Contacts
				WHERE	First_Name = C.First_Name
				AND	Last_Name = C.Last_Name
				AND	Date_Of_Birth = C.Date_Of_Birth
				AND	Contact_ID < = C.Contact_ID
			)
	FROM	pm.Contacts	C
	JOIN	pm.Patients	P	on	P.Patient_ID = C.Contact_ID
	JOIN	pm.Policies	O	on	O.Contact_ID = C.Contact_ID
	WHERE	P.End_Date IS NULL
)
SELECT	Contact_ID = CAST(Contact_ID as varchar) + CASE WHEN rn = 1 THEN '' else 'D' END
,	/* other fields you need */
FROM      cte
                                                                             

Open in new window

0
 
db_hopewellAuthor Commented:
Thanks for being patient with me. That will work for now.
0
 
awking00Commented:
Assuming that the Contact_ID is the primary key (and unique) and that you want to concatenate a 'D' at the end of it if there are matching names and birthdates in other records, the following should work:
with cte as
(select first_name fname, last_name lname, date_of_birth dob, count(*) cnt
 from pm.contacts
 group by first_name, last_name, date_of_birth)
select  case when cte.cnt = 1 then C.Contact_ID
             else c.Contact_ID||'D'
        end as Contact_ID,
 C.First_Name, C.Last_Name, C.Date_Of_Birth, /* C.other_fields... , */
                  O.Group_Name, /* O.other_fields... , */
                  P.End_Date
      from      pm.Contacts C
      inner      join pm.Patients P
      on            P.Patient_ID = C.Contact_ID
      inner      join pm.Policies O
      on            O.Contact_ID = C.Contact_ID
      inner      join cte on
                    c.first_name = cte.fname and
                    c.last_name = cte.lname and
                    c.date_of_birth = cte.dob
      where      P.End_Date is null
The above is for Oracle but changing the concatenation operator from "||" to "+" should work for SQL Server.
0

Featured Post

Prep for the ITIL® Foundation Certification Exam

December’s Course of the Month is now available! Enroll to learn ITIL® Foundation best practices for delivering IT services effectively and efficiently.

  • 3
  • 2
  • 2
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now