Solved

SQL Query find and replace duplicates with a "D" added to primary key

Posted on 2014-04-11
11
259 Views
Last Modified: 2014-04-11
I wanted to know a way to find duplicates in a query and replace them with a case statement that adds a "D" or anything to note that there are 2 instances of the primary key.

Basically I want to show all records from table CTE, but I want it to replace any records that have 2 instances of the same name and birth date with a "D" in the contactID field. Where would I add a case when statement?

;with cte as (
      select      C.Contact_ID, C.First_Name, C.Last_Name, C.Date_Of_Birth, /* C.other_fields... , */
                  O.Group_Name, /* O.other_fields... , */
                  P.End_Date
      from      pm.Contacts C
      inner      join pm.Patients P
      on            P.Patient_ID = C.Contact_ID
      inner      join pm.Policies O
      on            O.Contact_ID = C.Contact_ID
      where      P.End_Date is null
)
select      *
from      cte T1
where      exists (
                  -- check for duplicates i.e. a record with
                  -- different ID and same name and birth date
                  select      *
                  from      cte T2
                  where      T2.First_Name = T1.First_Name
                  and            T2.Last_Name = T1.Last_Name
                  and            T2.Date_Of_Birth = T1.Date_Of_Birth
                  and            T2.Contact_ID != T1.Contact_ID
            )
0
Comment
Question by:db_hopewell
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
  • 2
  • +2
11 Comments
 
LVL 11

Accepted Solution

by:
John_Vidmar earned 500 total points
ID: 39994764
Looks like you are using T-SQL (MS SQL Server), which contains row_number function, allows you to number the duplicates.  Partitioning tells row_number to restart the numbering at 1 when there is a grouping change:
with cte as
(	SELECT	C.Contact_ID
	,	C.First_Name
	,	C.Last_Name
	,	C.Date_Of_Birth
	,	/* other fields you need */
	,	rn = row_number() OVER (PARTITION BY First_Name, Last_Name, Date_Of_Birth ORDER BY Contact_ID)
	FROM	pm.Contacts	C
	JOIN	pm.Patients	P	on	P.Patient_ID = C.Contact_ID
	JOIN	pm.Policies	O	on	O.Contact_ID = C.Contact_ID
	WHERE	P.End_Date IS NULL
)
SELECT	Contact_ID = CAST(Contact_ID as varchar) + CASE WHEN rn = 1 THEN '' else 'D' END
,	/* other fields you need */
FROM      cte

Open in new window

0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39994774
Which database?

I assume the Primary key hasn't been created yet or there would not be duplicates?

What if there are 3 duplicates?  Adding a 'D' would still create a duplicate.

That said look at the following example.  Col1 will be the 'new' PK once the dupes are removed.

Assuming Oracle
drop table tab1 purge;
create table tab1 (col1 varchar2(5), Col2 char(1));

insert into tab1 values('1','A');
insert into tab1 values('2','B');
insert into tab1 values('1','C');
insert into tab1 values('3','D');
insert into tab1 values('3','E');
commit;


update tab1 set col1 = col1 || 'D' where rowid in
(
	select max(rowid) from tab1 group by col1 having count(*) > 1
);

Open in new window

0
 
LVL 34

Expert Comment

by:ste5an
ID: 39994777
that there are 2 instances of the primary key.
This is not possible. Or it is not a primary key..
0
Transaction Monitoring Vs. Real User Monitoring

Synthetic Transaction Monitoring Vs. Real User Monitoring: When To Use Each Approach? In this article, we will discuss two major monitoring approaches: Synthetic Transaction and Real User Monitoring.

 

Author Comment

by:db_hopewell
ID: 39994824
ok let me rephrase. I wrote a complex query and want to find the duplicates and mark them and add a "D" on the end in the ID field.  Is there a different way than  John_Vidmar's row number and partitioning within sql above? Something like a case statement?  The system that I am sending the file to, would create 2 records of the same person with different ID's. One with the 'id' and one with the 'id' + 'D'
0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39994827
row_number is probably how I would check.  It is a common function in many database products these days.

You can easily take the row_number output and add it to a case statement to add the 'D'.  You should then be able to make that an update statement.

Knowing the database you are targeting will help us provide working examples.
0
 

Author Comment

by:db_hopewell
ID: 39994832
Microsoft SQL Server and I am going to run this in integration services to produce an output txt file for another proprietary system to read in nightly.
0
 
LVL 34

Expert Comment

by:ste5an
ID: 39994848
Why not using DISTINCT to eliminate them?
0
 
LVL 11

Expert Comment

by:John_Vidmar
ID: 39994853
If you don't have row_number function then you could use a correlated-subquery:
with cte as
(	SELECT	C.Contact_ID
	,	/* other fields you need */
	,	rn =	(	SELECT	count(*)
				FROM	Contacts
				WHERE	First_Name = C.First_Name
				AND	Last_Name = C.Last_Name
				AND	Date_Of_Birth = C.Date_Of_Birth
				AND	Contact_ID < = C.Contact_ID
			)
	FROM	pm.Contacts	C
	JOIN	pm.Patients	P	on	P.Patient_ID = C.Contact_ID
	JOIN	pm.Policies	O	on	O.Contact_ID = C.Contact_ID
	WHERE	P.End_Date IS NULL
)
SELECT	Contact_ID = CAST(Contact_ID as varchar) + CASE WHEN rn = 1 THEN '' else 'D' END
,	/* other fields you need */
FROM      cte
                                                                             

Open in new window

0
 

Author Closing Comment

by:db_hopewell
ID: 39994869
Thanks for being patient with me. That will work for now.
0
 
LVL 32

Expert Comment

by:awking00
ID: 39994916
Assuming that the Contact_ID is the primary key (and unique) and that you want to concatenate a 'D' at the end of it if there are matching names and birthdates in other records, the following should work:
with cte as
(select first_name fname, last_name lname, date_of_birth dob, count(*) cnt
 from pm.contacts
 group by first_name, last_name, date_of_birth)
select  case when cte.cnt = 1 then C.Contact_ID
             else c.Contact_ID||'D'
        end as Contact_ID,
 C.First_Name, C.Last_Name, C.Date_Of_Birth, /* C.other_fields... , */
                  O.Group_Name, /* O.other_fields... , */
                  P.End_Date
      from      pm.Contacts C
      inner      join pm.Patients P
      on            P.Patient_ID = C.Contact_ID
      inner      join pm.Policies O
      on            O.Contact_ID = C.Contact_ID
      inner      join cte on
                    c.first_name = cte.fname and
                    c.last_name = cte.lname and
                    c.date_of_birth = cte.dob
      where      P.End_Date is null
The above is for Oracle but changing the concatenation operator from "||" to "+" should work for SQL Server.
0

Featured Post

Get 15 Days FREE Full-Featured Trial

Benefit from a mission critical IT monitoring with Monitis Premium or get it FREE for your entry level monitoring needs.
-Over 200,000 users
-More than 300,000 websites monitored
-Used in 197 countries
-Recommended by 98% of users

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
Via a live example combined with referencing Books Online, show some of the information that can be extracted from the Catalog Views in SQL Server.
Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question