?
Solved

SQL Query find and replace duplicates with a "D" added to primary key

Posted on 2014-04-11
11
Medium Priority
?
263 Views
Last Modified: 2014-04-11
I wanted to know a way to find duplicates in a query and replace them with a case statement that adds a "D" or anything to note that there are 2 instances of the primary key.

Basically I want to show all records from table CTE, but I want it to replace any records that have 2 instances of the same name and birth date with a "D" in the contactID field. Where would I add a case when statement?

;with cte as (
      select      C.Contact_ID, C.First_Name, C.Last_Name, C.Date_Of_Birth, /* C.other_fields... , */
                  O.Group_Name, /* O.other_fields... , */
                  P.End_Date
      from      pm.Contacts C
      inner      join pm.Patients P
      on            P.Patient_ID = C.Contact_ID
      inner      join pm.Policies O
      on            O.Contact_ID = C.Contact_ID
      where      P.End_Date is null
)
select      *
from      cte T1
where      exists (
                  -- check for duplicates i.e. a record with
                  -- different ID and same name and birth date
                  select      *
                  from      cte T2
                  where      T2.First_Name = T1.First_Name
                  and            T2.Last_Name = T1.Last_Name
                  and            T2.Date_Of_Birth = T1.Date_Of_Birth
                  and            T2.Contact_ID != T1.Contact_ID
            )
0
Comment
Question by:db_hopewell
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
  • 2
  • +2
11 Comments
 
LVL 11

Accepted Solution

by:
John_Vidmar earned 2000 total points
ID: 39994764
Looks like you are using T-SQL (MS SQL Server), which contains row_number function, allows you to number the duplicates.  Partitioning tells row_number to restart the numbering at 1 when there is a grouping change:
with cte as
(	SELECT	C.Contact_ID
	,	C.First_Name
	,	C.Last_Name
	,	C.Date_Of_Birth
	,	/* other fields you need */
	,	rn = row_number() OVER (PARTITION BY First_Name, Last_Name, Date_Of_Birth ORDER BY Contact_ID)
	FROM	pm.Contacts	C
	JOIN	pm.Patients	P	on	P.Patient_ID = C.Contact_ID
	JOIN	pm.Policies	O	on	O.Contact_ID = C.Contact_ID
	WHERE	P.End_Date IS NULL
)
SELECT	Contact_ID = CAST(Contact_ID as varchar) + CASE WHEN rn = 1 THEN '' else 'D' END
,	/* other fields you need */
FROM      cte

Open in new window

0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39994774
Which database?

I assume the Primary key hasn't been created yet or there would not be duplicates?

What if there are 3 duplicates?  Adding a 'D' would still create a duplicate.

That said look at the following example.  Col1 will be the 'new' PK once the dupes are removed.

Assuming Oracle
drop table tab1 purge;
create table tab1 (col1 varchar2(5), Col2 char(1));

insert into tab1 values('1','A');
insert into tab1 values('2','B');
insert into tab1 values('1','C');
insert into tab1 values('3','D');
insert into tab1 values('3','E');
commit;


update tab1 set col1 = col1 || 'D' where rowid in
(
	select max(rowid) from tab1 group by col1 having count(*) > 1
);

Open in new window

0
 
LVL 35

Expert Comment

by:ste5an
ID: 39994777
that there are 2 instances of the primary key.
This is not possible. Or it is not a primary key..
0
 [eBook] Windows Nano Server

Download this FREE eBook and learn all you need to get started with Windows Nano Server, including deployment options, remote management
and troubleshooting tips and tricks

 

Author Comment

by:db_hopewell
ID: 39994824
ok let me rephrase. I wrote a complex query and want to find the duplicates and mark them and add a "D" on the end in the ID field.  Is there a different way than  John_Vidmar's row number and partitioning within sql above? Something like a case statement?  The system that I am sending the file to, would create 2 records of the same person with different ID's. One with the 'id' and one with the 'id' + 'D'
0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39994827
row_number is probably how I would check.  It is a common function in many database products these days.

You can easily take the row_number output and add it to a case statement to add the 'D'.  You should then be able to make that an update statement.

Knowing the database you are targeting will help us provide working examples.
0
 

Author Comment

by:db_hopewell
ID: 39994832
Microsoft SQL Server and I am going to run this in integration services to produce an output txt file for another proprietary system to read in nightly.
0
 
LVL 35

Expert Comment

by:ste5an
ID: 39994848
Why not using DISTINCT to eliminate them?
0
 
LVL 11

Expert Comment

by:John_Vidmar
ID: 39994853
If you don't have row_number function then you could use a correlated-subquery:
with cte as
(	SELECT	C.Contact_ID
	,	/* other fields you need */
	,	rn =	(	SELECT	count(*)
				FROM	Contacts
				WHERE	First_Name = C.First_Name
				AND	Last_Name = C.Last_Name
				AND	Date_Of_Birth = C.Date_Of_Birth
				AND	Contact_ID < = C.Contact_ID
			)
	FROM	pm.Contacts	C
	JOIN	pm.Patients	P	on	P.Patient_ID = C.Contact_ID
	JOIN	pm.Policies	O	on	O.Contact_ID = C.Contact_ID
	WHERE	P.End_Date IS NULL
)
SELECT	Contact_ID = CAST(Contact_ID as varchar) + CASE WHEN rn = 1 THEN '' else 'D' END
,	/* other fields you need */
FROM      cte
                                                                             

Open in new window

0
 

Author Closing Comment

by:db_hopewell
ID: 39994869
Thanks for being patient with me. That will work for now.
0
 
LVL 32

Expert Comment

by:awking00
ID: 39994916
Assuming that the Contact_ID is the primary key (and unique) and that you want to concatenate a 'D' at the end of it if there are matching names and birthdates in other records, the following should work:
with cte as
(select first_name fname, last_name lname, date_of_birth dob, count(*) cnt
 from pm.contacts
 group by first_name, last_name, date_of_birth)
select  case when cte.cnt = 1 then C.Contact_ID
             else c.Contact_ID||'D'
        end as Contact_ID,
 C.First_Name, C.Last_Name, C.Date_Of_Birth, /* C.other_fields... , */
                  O.Group_Name, /* O.other_fields... , */
                  P.End_Date
      from      pm.Contacts C
      inner      join pm.Patients P
      on            P.Patient_ID = C.Contact_ID
      inner      join pm.Policies O
      on            O.Contact_ID = C.Contact_ID
      inner      join cte on
                    c.first_name = cte.fname and
                    c.last_name = cte.lname and
                    c.date_of_birth = cte.dob
      where      P.End_Date is null
The above is for Oracle but changing the concatenation operator from "||" to "+" should work for SQL Server.
0

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this article we will learn how to fix  “Cannot install SQL Server 2014 Service Pack 2: Unable to install windows installer msi file” error ?
One of the most important things in an application is the query performance. This article intends to give you good tips to improve the performance of your queries.
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.

649 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question