SQL- identify duplicate records by analysis

Find duplicates in a table "Users" where you must analyze certain fields like "first_name", "last_name", "address", "email" to conclude if they are indeed valid duplicates (obviously, the unique field "user_identification_number" has different values, hence the need to find and identify duplicate accounts)
The only criteria that is s MUST (where clause) is that at least one of the duplicate record/s must have a value of 0 in the field "criteria".

 Examples: "email" in a record is identical to "email" in another record is a guaranteed duplicate.
"FirstName" in a record in a record is identical to "FirstName" in another record is NOT a guaranteed duplicate (for this the "LastName" must also be identical).


Bottom line: there is no sure fire way to query for accurate duplicates- I just want a generic guidance so that I can eyeball and identify the duplicates myself after setting some basic criteria.

Thank you.

spiroseAsked:
Who is Participating?

Improve company productivity with a Business Account.Sign Up

x
 
ralmadaConnect With a Mentor Commented:
so basically follow this structure
select first, last, address, email from (
 select first, last, address, email, count() over (partition by first, last) cn1, count() over (partition by .... something else you complete this...) cn2  from Users
) A
where cn1 > 1 or cn2 > 1

Open in new window

0
 
Patrick MatthewsCommented:
spirose,With respect, this question has the look and feel of an academic assignment.  Per the Member Agreement, Members are not allowed to ask for and the Experts are not allowed to offer to do your academic assignments for you.What you *can* do, however, is post the code/answers you have so far, and ask the Experts for tips and hints to complete the answer.  You can also ask the Experts for help in understanding key concepts and principles, but in either case, you have to do the heavy lifting yourself.If I am mistaken and this is not an academic assignment, then please post a clarification of your situation and intention to put the Experts at ease.Patrick
0
 
spiroseAuthor Commented:
This is not an academic assignment- sorry if it came across that way. This work involves major data sleuthing and I just wanted to narrow down my options using some sort of query(from thousands of records to possibly hundreds) so that I can identify them more efficiently. I hope that makes sense.

Let me try and make the question more clear:

select first, last, address, email from Users A
inner join Users B
on (??what do I join this to)
where A.criteria = 0
and A.email like .email (??)


I am taking an example of emails since email is a guaranteed way of identifying duplicates.
0
Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

 
spiroseAuthor Commented:
EDIT: last line of the code should be
A.email = B.email (I do not think like is an appropriate operator here).

Please correct me if I am wrong..
0
 
ralmadaCommented:
Agree with matthewspatrick. No to answer your question below:
>>Bottom line: there is no sure fire way to query for accurate duplicates- I just want a generic guidance so that I can eyeball and identify the duplicates myself after setting some basic criteria.<<
Hint for you: You can use the count() over (partition by ... ) to get the counts of same first_name, last_name combination on one side and the email on the other....
0
 
spiroseAuthor Commented:
Again, this is NOT an academic assignment. I do not know where I gave that vibe.
0
 
spiroseAuthor Commented:
Thank you for providing an efficient concept of count() over (partition by...)

P.S. I hope I provided enough clarification as to this NOT being an academic assignment to put the Experts at ease.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.