Need help with what appears to be a self join

Suppose I have a table named "People", the first column is the ID,
the second is the FirstName and the third column is a character field
to indicate if a record is a duplicate record. I only need to flag the
duplicate records as duplicates, but not the very first record which belongs
to a group of duplicates. I think I would need to use a self join, but beyond
that I am lost. Can someone help me out please?

Table Before Query

ID     FirstName   IsDuplicate
12A    Nirma      
12A    Nirma
13E    Berry
13E    Berry
13E    Berry
13E    Berry
14A    Chastity
15X    Jim
15X    Jim



Table After Query. Duplicates are marked with a "Y" value
ID     FirstName   IsDuplicate
12A    Nirma        
12A    Nirma          Y
13E    Berry
13E    Berry            Y
13E    Berry            Y
13E    Berry            Y
14A    Chastity    
15X    Jim      
15X    Jim                Y
LVL 2
brgdotnetcontractorAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

arnoldCommented:
I do not believe with the current structure you can achieve what you are looking for.
One way I think to achieve this is to add a column int, and a cursor that will run through and set a random number in the new column, then you can differentiate between duplicate entries.

I.e. You can not issue an update tablename set isduplicate='Y' where id= and name= because both/all of those rows will match.

Alternatively, you can define a temporary table with a primary unique index into which you will insert data from this table and then drop and reinsert the data from the temporary table.
0
Mike EghtebasDatabase and Application DeveloperCommented:
with a as
(
SELECT ID, FirstName,
RANK() OVER(ORDER BY ID) AS rnk,
NTILE(100) OVER(ORDER BY ID) AS ntile100
FROM #t
)
Select a.ID, a.FirstName, iif(a.rnk=a.ntile100,'','Y') As IsDuplicate
from a

gives you:
12A	Nirma	
12A	Nirma	Y
13E	Berry	
13E	Berry	Y
13E	Berry	Y
13E	Berry	Y
14A	Chastity	
15X	Jim	
15X	Jim	Y


This is my sample code:
create table #t(ID varchar(5),    FirstName varchar(15),   IsDuplicate char(1));
insert #t (ID ,    FirstName)values
('12A', 'Nirma')        
, ('12A', 'Nirma')
, ('13E', 'Berry')
, ('13E', 'Berry')
, ('13E', 'Berry')
, ('13E', 'Berry')
, ('14A', 'Chastity')    
, ('15X', 'Jim')       
, ('15X', 'Jim') 
  
Select * From #t

Open in new window


This is what you want but we need to incorporate it in update query.

For use of NTILE see https://msdn.microsoft.com/en-us/library/ms175126.aspx

Mike
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
brgdotnetcontractorAuthor Commented:
So cool. I hope someday to be as talented at you. I am learning more and more sql as time goes by. You are awesome though,
thank you.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.