[Webinar] Streamline your web hosting managementRegister Today

x
?
Solved

SQL- identify duplicate records by analysis

Posted on 2010-09-14
7
Medium Priority
?
217 Views
Last Modified: 2012-05-10
Find duplicates in a table "Users" where you must analyze certain fields like "first_name", "last_name", "address", "email" to conclude if they are indeed valid duplicates (obviously, the unique field "user_identification_number" has different values, hence the need to find and identify duplicate accounts)
The only criteria that is s MUST (where clause) is that at least one of the duplicate record/s must have a value of 0 in the field "criteria".

 Examples: "email" in a record is identical to "email" in another record is a guaranteed duplicate.
"FirstName" in a record in a record is identical to "FirstName" in another record is NOT a guaranteed duplicate (for this the "LastName" must also be identical).


Bottom line: there is no sure fire way to query for accurate duplicates- I just want a generic guidance so that I can eyeball and identify the duplicates myself after setting some basic criteria.

Thank you.

0
Comment
Question by:spirose
  • 4
  • 2
7 Comments
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 33674803
spirose,With respect, this question has the look and feel of an academic assignment.  Per the Member Agreement, Members are not allowed to ask for and the Experts are not allowed to offer to do your academic assignments for you.What you *can* do, however, is post the code/answers you have so far, and ask the Experts for tips and hints to complete the answer.  You can also ask the Experts for help in understanding key concepts and principles, but in either case, you have to do the heavy lifting yourself.If I am mistaken and this is not an academic assignment, then please post a clarification of your situation and intention to put the Experts at ease.Patrick
0
 

Author Comment

by:spirose
ID: 33674869
This is not an academic assignment- sorry if it came across that way. This work involves major data sleuthing and I just wanted to narrow down my options using some sort of query(from thousands of records to possibly hundreds) so that I can identify them more efficiently. I hope that makes sense.

Let me try and make the question more clear:

select first, last, address, email from Users A
inner join Users B
on (??what do I join this to)
where A.criteria = 0
and A.email like .email (??)


I am taking an example of emails since email is a guaranteed way of identifying duplicates.
0
 

Author Comment

by:spirose
ID: 33674888
EDIT: last line of the code should be
A.email = B.email (I do not think like is an appropriate operator here).

Please correct me if I am wrong..
0
Receive 1:1 tech help

Solve your biggest tech problems alongside global tech experts with 1:1 help.

 
LVL 41

Expert Comment

by:ralmada
ID: 33674892
Agree with matthewspatrick. No to answer your question below:
>>Bottom line: there is no sure fire way to query for accurate duplicates- I just want a generic guidance so that I can eyeball and identify the duplicates myself after setting some basic criteria.<<
Hint for you: You can use the count() over (partition by ... ) to get the counts of same first_name, last_name combination on one side and the email on the other....
0
 

Author Comment

by:spirose
ID: 33674939
Again, this is NOT an academic assignment. I do not know where I gave that vibe.
0
 
LVL 41

Accepted Solution

by:
ralmada earned 2000 total points
ID: 33674950
so basically follow this structure
select first, last, address, email from (
 select first, last, address, email, count() over (partition by first, last) cn1, count() over (partition by .... something else you complete this...) cn2  from Users
) A
where cn1 > 1 or cn2 > 1

Open in new window

0
 

Author Comment

by:spirose
ID: 33675115
Thank you for providing an efficient concept of count() over (partition by...)

P.S. I hope I provided enough clarification as to this NOT being an academic assignment to put the Experts at ease.
0

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
It is possible to export the data of a SQL Table in SSMS and generate INSERT statements. It's neatly tucked away in the generate scripts option of a database.
SQL Database Recovery Software repairs the MDF & NDF Files, corrupted due to hardware related issues or software related errors. Provides preview of recovered database objects and allows saving in either MSSQL, CSV, HTML or XLS format. Ensures recov…
Stellar Phoenix SQL Database Repair software easily fixes the suspect mode issue of SQL Server database. It is a simple process to bring the database from suspect mode to normal mode. Check out the video and fix the SQL database suspect mode problem.

611 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question