Solved

SQL- identify duplicate records by analysis

Posted on 2010-09-14
7
214 Views
Last Modified: 2012-05-10
Find duplicates in a table "Users" where you must analyze certain fields like "first_name", "last_name", "address", "email" to conclude if they are indeed valid duplicates (obviously, the unique field "user_identification_number" has different values, hence the need to find and identify duplicate accounts)
The only criteria that is s MUST (where clause) is that at least one of the duplicate record/s must have a value of 0 in the field "criteria".

 Examples: "email" in a record is identical to "email" in another record is a guaranteed duplicate.
"FirstName" in a record in a record is identical to "FirstName" in another record is NOT a guaranteed duplicate (for this the "LastName" must also be identical).


Bottom line: there is no sure fire way to query for accurate duplicates- I just want a generic guidance so that I can eyeball and identify the duplicates myself after setting some basic criteria.

Thank you.

0
Comment
Question by:spirose
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 2
7 Comments
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 33674803
spirose,With respect, this question has the look and feel of an academic assignment.  Per the Member Agreement, Members are not allowed to ask for and the Experts are not allowed to offer to do your academic assignments for you.What you *can* do, however, is post the code/answers you have so far, and ask the Experts for tips and hints to complete the answer.  You can also ask the Experts for help in understanding key concepts and principles, but in either case, you have to do the heavy lifting yourself.If I am mistaken and this is not an academic assignment, then please post a clarification of your situation and intention to put the Experts at ease.Patrick
0
 

Author Comment

by:spirose
ID: 33674869
This is not an academic assignment- sorry if it came across that way. This work involves major data sleuthing and I just wanted to narrow down my options using some sort of query(from thousands of records to possibly hundreds) so that I can identify them more efficiently. I hope that makes sense.

Let me try and make the question more clear:

select first, last, address, email from Users A
inner join Users B
on (??what do I join this to)
where A.criteria = 0
and A.email like .email (??)


I am taking an example of emails since email is a guaranteed way of identifying duplicates.
0
 

Author Comment

by:spirose
ID: 33674888
EDIT: last line of the code should be
A.email = B.email (I do not think like is an appropriate operator here).

Please correct me if I am wrong..
0
NFR key for Veeam Backup for Microsoft Office 365

Veeam is happy to provide a free NFR license (for 1 year, up to 10 users). This license allows for the non‑production use of Veeam Backup for Microsoft Office 365 in your home lab without any feature limitations.

 
LVL 41

Expert Comment

by:ralmada
ID: 33674892
Agree with matthewspatrick. No to answer your question below:
>>Bottom line: there is no sure fire way to query for accurate duplicates- I just want a generic guidance so that I can eyeball and identify the duplicates myself after setting some basic criteria.<<
Hint for you: You can use the count() over (partition by ... ) to get the counts of same first_name, last_name combination on one side and the email on the other....
0
 

Author Comment

by:spirose
ID: 33674939
Again, this is NOT an academic assignment. I do not know where I gave that vibe.
0
 
LVL 41

Accepted Solution

by:
ralmada earned 500 total points
ID: 33674950
so basically follow this structure
select first, last, address, email from (
 select first, last, address, email, count() over (partition by first, last) cn1, count() over (partition by .... something else you complete this...) cn2  from Users
) A
where cn1 > 1 or cn2 > 1

Open in new window

0
 

Author Comment

by:spirose
ID: 33675115
Thank you for providing an efficient concept of count() over (partition by...)

P.S. I hope I provided enough clarification as to this NOT being an academic assignment to put the Experts at ease.
0

Featured Post

SharePoint Admin?

Enable Your Employees To Focus On The Core With Intuitive Onscreen Guidance That is With You At The Moment of Need.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Confronted with some SQL you don't know can be a daunting task. It can be even more daunting if that SQL carries some of the old secret codes used in the Ye Olde query syntax, such as: (+)     as used in Oracle;     *=     =*    as used in Sybase …
SQL Server engine let you use a Windows account or a SQL Server account to connect to a SQL Server instance. This can be configured immediatly during the SQL Server installation or after in the Server Authentication section in the Server properties …
Monitoring a network: why having a policy is the best policy? Michael Kulchisky, MCSE, MCSA, MCP, VTSP, VSP, CCSP outlines the enormous benefits of having a policy-based approach when monitoring medium and large networks. Software utilized in this v…
Michael from AdRem Software outlines event notifications and Automatic Corrective Actions in network monitoring. Automatic Corrective Actions are scripts, which can automatically run upon discovery of a certain undesirable condition in your network.…

690 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question