Solved

SQL- identify duplicate records by analysis

Posted on 2010-09-14
7
209 Views
Last Modified: 2012-05-10
Find duplicates in a table "Users" where you must analyze certain fields like "first_name", "last_name", "address", "email" to conclude if they are indeed valid duplicates (obviously, the unique field "user_identification_number" has different values, hence the need to find and identify duplicate accounts)
The only criteria that is s MUST (where clause) is that at least one of the duplicate record/s must have a value of 0 in the field "criteria".

 Examples: "email" in a record is identical to "email" in another record is a guaranteed duplicate.
"FirstName" in a record in a record is identical to "FirstName" in another record is NOT a guaranteed duplicate (for this the "LastName" must also be identical).


Bottom line: there is no sure fire way to query for accurate duplicates- I just want a generic guidance so that I can eyeball and identify the duplicates myself after setting some basic criteria.

Thank you.

0
Comment
Question by:spirose
  • 4
  • 2
7 Comments
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 33674803
spirose,With respect, this question has the look and feel of an academic assignment.  Per the Member Agreement, Members are not allowed to ask for and the Experts are not allowed to offer to do your academic assignments for you.What you *can* do, however, is post the code/answers you have so far, and ask the Experts for tips and hints to complete the answer.  You can also ask the Experts for help in understanding key concepts and principles, but in either case, you have to do the heavy lifting yourself.If I am mistaken and this is not an academic assignment, then please post a clarification of your situation and intention to put the Experts at ease.Patrick
0
 

Author Comment

by:spirose
ID: 33674869
This is not an academic assignment- sorry if it came across that way. This work involves major data sleuthing and I just wanted to narrow down my options using some sort of query(from thousands of records to possibly hundreds) so that I can identify them more efficiently. I hope that makes sense.

Let me try and make the question more clear:

select first, last, address, email from Users A
inner join Users B
on (??what do I join this to)
where A.criteria = 0
and A.email like .email (??)


I am taking an example of emails since email is a guaranteed way of identifying duplicates.
0
 

Author Comment

by:spirose
ID: 33674888
EDIT: last line of the code should be
A.email = B.email (I do not think like is an appropriate operator here).

Please correct me if I am wrong..
0
Control application downtime with dependency maps

Visualize the interdependencies between application components better with Applications Manager's automated application discovery and dependency mapping feature. Resolve performance issues faster by quickly isolating problematic components.

 
LVL 41

Expert Comment

by:ralmada
ID: 33674892
Agree with matthewspatrick. No to answer your question below:
>>Bottom line: there is no sure fire way to query for accurate duplicates- I just want a generic guidance so that I can eyeball and identify the duplicates myself after setting some basic criteria.<<
Hint for you: You can use the count() over (partition by ... ) to get the counts of same first_name, last_name combination on one side and the email on the other....
0
 

Author Comment

by:spirose
ID: 33674939
Again, this is NOT an academic assignment. I do not know where I gave that vibe.
0
 
LVL 41

Accepted Solution

by:
ralmada earned 500 total points
ID: 33674950
so basically follow this structure
select first, last, address, email from (

 select first, last, address, email, count() over (partition by first, last) cn1, count() over (partition by .... something else you complete this...) cn2  from Users

) A

where cn1 > 1 or cn2 > 1

Open in new window

0
 

Author Comment

by:spirose
ID: 33675115
Thank you for providing an efficient concept of count() over (partition by...)

P.S. I hope I provided enough clarification as to this NOT being an academic assignment to put the Experts at ease.
0

Featured Post

What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Naughty Me. While I was changing the database name from DB1 to DB_PROD1 (yep it's not real database name ^v^), I changed the database name and notified my application fellows that I did it. They turn on the application, and everything is working. A …
In this article I will describe the Backup & Restore method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
Sending a Secure fax is easy with eFax Corporate (http://www.enterprise.efax.com). First, just open a new email message. In the To field, type your recipient's fax number @efaxsend.com. You can even send a secure international fax — just include t…
Both in life and business – not all partnerships are created equal. As the demand for cloud services increases, so do the number of self-proclaimed cloud partners. Asking the right questions up front in the partnership, will enable both parties …

867 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now