Need help with what appears to be a self join

Suppose I have a table named "People", the first column is the ID,
the second is the FirstName and the third column is a character field
to indicate if a record is a duplicate record. I only need to flag the
duplicate records as duplicates, but not the very first record which belongs
to a group of duplicates. I think I would need to use a self join, but beyond
that I am lost. Can someone help me out please?

Table Before Query

ID     FirstName   IsDuplicate
12A    Nirma      
12A    Nirma
13E    Berry
13E    Berry
13E    Berry
13E    Berry
14A    Chastity
15X    Jim
15X    Jim



Table After Query. Duplicates are marked with a "Y" value
ID     FirstName   IsDuplicate
12A    Nirma        
12A    Nirma          Y
13E    Berry
13E    Berry            Y
13E    Berry            Y
13E    Berry            Y
14A    Chastity    
15X    Jim      
15X    Jim                Y
LVL 2
brgdotnetcontractorAsked:
Who is Participating?
 
Mike EghtebasDatabase and Application DeveloperCommented:
with a as
(
SELECT ID, FirstName,
RANK() OVER(ORDER BY ID) AS rnk,
NTILE(100) OVER(ORDER BY ID) AS ntile100
FROM #t
)
Select a.ID, a.FirstName, iif(a.rnk=a.ntile100,'','Y') As IsDuplicate
from a

gives you:
12A	Nirma	
12A	Nirma	Y
13E	Berry	
13E	Berry	Y
13E	Berry	Y
13E	Berry	Y
14A	Chastity	
15X	Jim	
15X	Jim	Y


This is my sample code:
create table #t(ID varchar(5),    FirstName varchar(15),   IsDuplicate char(1));
insert #t (ID ,    FirstName)values
('12A', 'Nirma')        
, ('12A', 'Nirma')
, ('13E', 'Berry')
, ('13E', 'Berry')
, ('13E', 'Berry')
, ('13E', 'Berry')
, ('14A', 'Chastity')    
, ('15X', 'Jim')       
, ('15X', 'Jim') 
  
Select * From #t

Open in new window


This is what you want but we need to incorporate it in update query.

For use of NTILE see https://msdn.microsoft.com/en-us/library/ms175126.aspx

Mike
0
 
arnoldCommented:
I do not believe with the current structure you can achieve what you are looking for.
One way I think to achieve this is to add a column int, and a cursor that will run through and set a random number in the new column, then you can differentiate between duplicate entries.

I.e. You can not issue an update tablename set isduplicate='Y' where id= and name= because both/all of those rows will match.

Alternatively, you can define a temporary table with a primary unique index into which you will insert data from this table and then drop and reinsert the data from the temporary table.
0
 
brgdotnetcontractorAuthor Commented:
So cool. I hope someday to be as talented at you. I am learning more and more sql as time goes by. You are awesome though,
thank you.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.