Solved

SQL Query find and replace duplicates with a "D" added to primary key

Posted on 2014-04-11
11
256 Views
Last Modified: 2014-04-11
I wanted to know a way to find duplicates in a query and replace them with a case statement that adds a "D" or anything to note that there are 2 instances of the primary key.

Basically I want to show all records from table CTE, but I want it to replace any records that have 2 instances of the same name and birth date with a "D" in the contactID field. Where would I add a case when statement?

;with cte as (
      select      C.Contact_ID, C.First_Name, C.Last_Name, C.Date_Of_Birth, /* C.other_fields... , */
                  O.Group_Name, /* O.other_fields... , */
                  P.End_Date
      from      pm.Contacts C
      inner      join pm.Patients P
      on            P.Patient_ID = C.Contact_ID
      inner      join pm.Policies O
      on            O.Contact_ID = C.Contact_ID
      where      P.End_Date is null
)
select      *
from      cte T1
where      exists (
                  -- check for duplicates i.e. a record with
                  -- different ID and same name and birth date
                  select      *
                  from      cte T2
                  where      T2.First_Name = T1.First_Name
                  and            T2.Last_Name = T1.Last_Name
                  and            T2.Date_Of_Birth = T1.Date_Of_Birth
                  and            T2.Contact_ID != T1.Contact_ID
            )
0
Comment
Question by:db_hopewell
  • 3
  • 2
  • 2
  • +2
11 Comments
 
LVL 11

Accepted Solution

by:
John_Vidmar earned 500 total points
ID: 39994764
Looks like you are using T-SQL (MS SQL Server), which contains row_number function, allows you to number the duplicates.  Partitioning tells row_number to restart the numbering at 1 when there is a grouping change:
with cte as
(	SELECT	C.Contact_ID
	,	C.First_Name
	,	C.Last_Name
	,	C.Date_Of_Birth
	,	/* other fields you need */
	,	rn = row_number() OVER (PARTITION BY First_Name, Last_Name, Date_Of_Birth ORDER BY Contact_ID)
	FROM	pm.Contacts	C
	JOIN	pm.Patients	P	on	P.Patient_ID = C.Contact_ID
	JOIN	pm.Policies	O	on	O.Contact_ID = C.Contact_ID
	WHERE	P.End_Date IS NULL
)
SELECT	Contact_ID = CAST(Contact_ID as varchar) + CASE WHEN rn = 1 THEN '' else 'D' END
,	/* other fields you need */
FROM      cte

Open in new window

0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39994774
Which database?

I assume the Primary key hasn't been created yet or there would not be duplicates?

What if there are 3 duplicates?  Adding a 'D' would still create a duplicate.

That said look at the following example.  Col1 will be the 'new' PK once the dupes are removed.

Assuming Oracle
drop table tab1 purge;
create table tab1 (col1 varchar2(5), Col2 char(1));

insert into tab1 values('1','A');
insert into tab1 values('2','B');
insert into tab1 values('1','C');
insert into tab1 values('3','D');
insert into tab1 values('3','E');
commit;


update tab1 set col1 = col1 || 'D' where rowid in
(
	select max(rowid) from tab1 group by col1 having count(*) > 1
);

Open in new window

0
 
LVL 33

Expert Comment

by:ste5an
ID: 39994777
that there are 2 instances of the primary key.
This is not possible. Or it is not a primary key..
0
Ransomware: The New Cyber Threat & How to Stop It

This infographic explains ransomware, type of malware that blocks access to your files or your systems and holds them hostage until a ransom is paid. It also examines the different types of ransomware and explains what you can do to thwart this sinister online threat.  

 

Author Comment

by:db_hopewell
ID: 39994824
ok let me rephrase. I wrote a complex query and want to find the duplicates and mark them and add a "D" on the end in the ID field.  Is there a different way than  John_Vidmar's row number and partitioning within sql above? Something like a case statement?  The system that I am sending the file to, would create 2 records of the same person with different ID's. One with the 'id' and one with the 'id' + 'D'
0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39994827
row_number is probably how I would check.  It is a common function in many database products these days.

You can easily take the row_number output and add it to a case statement to add the 'D'.  You should then be able to make that an update statement.

Knowing the database you are targeting will help us provide working examples.
0
 

Author Comment

by:db_hopewell
ID: 39994832
Microsoft SQL Server and I am going to run this in integration services to produce an output txt file for another proprietary system to read in nightly.
0
 
LVL 33

Expert Comment

by:ste5an
ID: 39994848
Why not using DISTINCT to eliminate them?
0
 
LVL 11

Expert Comment

by:John_Vidmar
ID: 39994853
If you don't have row_number function then you could use a correlated-subquery:
with cte as
(	SELECT	C.Contact_ID
	,	/* other fields you need */
	,	rn =	(	SELECT	count(*)
				FROM	Contacts
				WHERE	First_Name = C.First_Name
				AND	Last_Name = C.Last_Name
				AND	Date_Of_Birth = C.Date_Of_Birth
				AND	Contact_ID < = C.Contact_ID
			)
	FROM	pm.Contacts	C
	JOIN	pm.Patients	P	on	P.Patient_ID = C.Contact_ID
	JOIN	pm.Policies	O	on	O.Contact_ID = C.Contact_ID
	WHERE	P.End_Date IS NULL
)
SELECT	Contact_ID = CAST(Contact_ID as varchar) + CASE WHEN rn = 1 THEN '' else 'D' END
,	/* other fields you need */
FROM      cte
                                                                             

Open in new window

0
 

Author Closing Comment

by:db_hopewell
ID: 39994869
Thanks for being patient with me. That will work for now.
0
 
LVL 32

Expert Comment

by:awking00
ID: 39994916
Assuming that the Contact_ID is the primary key (and unique) and that you want to concatenate a 'D' at the end of it if there are matching names and birthdates in other records, the following should work:
with cte as
(select first_name fname, last_name lname, date_of_birth dob, count(*) cnt
 from pm.contacts
 group by first_name, last_name, date_of_birth)
select  case when cte.cnt = 1 then C.Contact_ID
             else c.Contact_ID||'D'
        end as Contact_ID,
 C.First_Name, C.Last_Name, C.Date_Of_Birth, /* C.other_fields... , */
                  O.Group_Name, /* O.other_fields... , */
                  P.End_Date
      from      pm.Contacts C
      inner      join pm.Patients P
      on            P.Patient_ID = C.Contact_ID
      inner      join pm.Policies O
      on            O.Contact_ID = C.Contact_ID
      inner      join cte on
                    c.first_name = cte.fname and
                    c.last_name = cte.lname and
                    c.date_of_birth = cte.dob
      where      P.End_Date is null
The above is for Oracle but changing the concatenation operator from "||" to "+" should work for SQL Server.
0

Featured Post

Three Reasons Why Backup is Strategic

Backup is strategic to your business because your data is strategic to your business. Without backup, your business will fail. This white paper explains why it is vital for you to design and immediately execute a backup strategy to protect 100 percent of your data.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Nowadays, some of developer are too much worried about data. Who is using data, who is updating it etc. etc. Because, data is more costlier in term of money and information. So security of data is focusing concern in days. Lets' understand the Au…
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

830 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question