?
Solved

Delete Duplicate Email Records That Are Attached To One Account But Can Be Found In Multiple Accounts

Posted on 2015-01-16
16
Medium Priority
?
137 Views
Last Modified: 2015-01-24
I have a table, consumer_mgmt.  It collects consumer information from various forms.  These forms are available through different pages that are part of a business package.

A business can offer these signups to gather names and emails from consumers for various types of specials they may offer.  So a consumer my be in the consumer_mgmt table 5, 10, 15 times for that particular business.

But, that consumer may be in the consumer_mgmt table multiple times for a different business.  So multiple times for multiple businesses.

I need to remove duplicates for each business account so the consumer is only the consumer_mgmt only once for each business.

There are approximately 15,000 rows currently in the consumer_mgmt table.
0
Comment
Question by:TecTaoMC
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 8
  • 6
  • 2
16 Comments
 
LVL 32

Expert Comment

by:Daniel Wilson
ID: 40554502
This isn't as hard as it sounds ... depending on table structure.  Do you have some unique identifier, an email address, and a business name (at a minimum) in that table?

id                        email                                       business
1                        bob@joe.com                          GMC
2                        joe@joe.com                            GMC
3                        joe@joe.com                            Ford
4                        bob@joe.com                          GMC

And in this example you'd want to delete only record 4, correct?

Back up your data before you try this, but it should look like this:

Delete from MyTable T Where EXISTS (Select ID from MyTable where email=T.email and business=T.business and ID<T.ID);
0
 

Author Comment

by:TecTaoMC
ID: 40554730
In your example I would only want to delete one record, record 4 Bob@joe.com / GMC.

That's what's what I'm having a hard time getting my arms around.  It really doesn't matter which record as long as in the the business GMC only has one bob@joe.com.  Record 1 or record 4 could be deleted.  Business Ford would be untouched since there is only one record.  But if business FORD had two, or three or ten joe@joe.com,  all would need to be deleted so there is only one record.

Thanks for your response so quickly.
0
 
LVL 32

Expert Comment

by:Daniel Wilson
ID: 40554807
So, can you adapt my code to your actual field names?  If not, would you post your table definition so I can write some code for you?
0
Don't Cry: How Liquid Web is Ensuring Security

WannaCry is just the start. Read how Liquid Web is protecting itself and its customers against new threats.

 
LVL 111

Expert Comment

by:Ray Paseur
ID: 40555113
No points for this please, because it's not an answer to your current question, but the question itself illustrates why we use relational databases.  Each email is an atomic piece of data and each business is an atomic piece of data.  In a relational data base design, you would have your tables looking something like this.

id     email                                       
1      bob@joe.com                       
2      joe@joe.com                         

id     business
1      GMC
3      Ford

Open in new window


Then to create the relationships between the email and business tables, you would create a "pivot" table.

id     email_id    business_id
1         1           1                    <- relates bob@joe.com  to GMC
2         2           1                    <- relates joe@joe.com  to GMC
3         2           3                    <- relates joe@joe.com  to Ford

Open in new window


For some good discussion on this subject, make a Google search for the exact phrase "Should I Normalize My Database?" and read the thoughtful positions on different sides of the question
0
 
LVL 32

Expert Comment

by:Daniel Wilson
ID: 40555121
Great points, Ray!
0
 

Author Comment

by:TecTaoMC
ID: 40555128
That is a good point Ray.  I am using relational tables throughout this application.  There are a number of tables that pertain to specific modules.  I chose to do this so I wouldn't be writing join statements on every query.  Some I do pertaining to advertisement specials.  Others I don't.  In this case the consumer in this table consumer_mgmt may or may not have joined the consumer signup, but wants to receive some special or the business owner has added them with their permission.

That being said, I am going to read your suggestion on "Should I Normalize My Database"

Daniel, Once I've finished my coffee and read the paper I will be getting back to your last post and write out the query to test on a test table.  I'll post for your review.

Thank you both for your interest and help

Mike
0
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 40555145
... so I wouldn't be writing join statements on every query.
Yeah, I get it!  If you really want to save yourself from writing JOINs learn about the Laravel framework.  It includes an object-relational model called Eloquent that can be used outside of Laravel, if you want.  Eloquent implements an active-record design pattern in a way that uses foreign keys to build your queries with an under-the-covers intelligence about JOIN statements.  The magic is in here:
http://laravel.com/docs/4.2/eloquent#relationships
0
 

Author Comment

by:TecTaoMC
ID: 40555198
Hi Daniel, from your earlier post here is the database structure (id is a unique primary field):

id | c_mid | c_username | b_id | b_username | c_name | c_email | b_cat1 | b_cat2 | datein

The query:

Delete from consumer_mgmt CM Where EXISTS (Select id from consumer_mgmt where c_email=CM.c_email and b_username=CM.b_username and id<CM.id);
0
 

Author Comment

by:TecTaoMC
ID: 40555242
Problem with the code, I'm getting an error when I run it through the run sql, I edit squl with phpMyAdmin.

Delete from `consumer_mgmtPurgeTest` `CM` Where EXISTS (Select `id` from `consumer_mgmtPurgeTest` where `c_email`=`CM`.`c_email` and `b_username`=`CM`.`b_username` and `id` < `CM`.`id` )

Open in new window

0
 
LVL 32

Expert Comment

by:Daniel Wilson
ID: 40555304
Code looks OK.  What's the error message?
0
 

Author Comment

by:TecTaoMC
ID: 40555321
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '`CM` Where EXISTS (Select `id` from `consumer_mgmtPurgeTest` where `c_email`=`CM' at line 1

I've checked all the backticks but still brings up the error
0
 
LVL 32

Expert Comment

by:Daniel Wilson
ID: 40555864
The alias was part of the problem ... but this doesn't work either:
Delete from `consumer_mgmtPurgeTest`  Where EXISTS (Select `id` from `consumer_mgmtPurgeTest` where `c_email`= consumer_mgmtPurgeTest.`c_email` and `b_username`=consumer_mgmtPurgeTest.`b_username` and `id` < consumer_mgmtPurgeTest.`id` );

Open in new window


So ... a DELETE with a JOIN seems in order. The syntax works on this one.

delete  CM  from `consumer_mgmtPurgeTest` CM0 inner join  `consumer_mgmtPurgeTest` CM ON CM.`c_email`= CM0.`c_email` and CM.`b_username`= CM0.`b_username` and CM0.`id` < CM.`id`;

Open in new window

0
 

Author Comment

by:TecTaoMC
ID: 40556775
Daniel, thanks.  It worked perfectly.  I ran a couple of dry runs on copied tables with multiple entries and worked just fine.

A couple of questions.

What was your logic to just a join statement, joining the same table with different identifiers?

Second, I'm not sure I understand the logic of CMO.1id` CM.`id`.  The less than command.
0
 
LVL 32

Accepted Solution

by:
Daniel Wilson earned 0 total points
ID: 40556900
Yeah, self-joins get a bit confusing.

I need to delete SOME of the records?  Which ones?  Ones that match others in the table.  Match in the table ... that's a JOIN.

OK, but if I just join on the email and business name, every record matches something -- at least itself.  So if I just join
ON CM.`c_email`= CM0.`c_email` and CM.`b_username`= CM0.`b_username`

Open in new window

all records match, I delete them, and the table is empty.  Bad move.

I need to delete those that have a match OTHER than themselves. Unique to themselves ... ID gives us that. Not themselves ... this looks like a solution.
 and CM0.`id` <> CM.`id`

Open in new window


But it's not b/c then it gets rid of BOTH the duplicate records.  I want 1 record left where there were duplicates.  I could leave the least or leave the greatest.  I chose to delete those that had one less ... meaning the least is retained.

 and CM0.`id` < CM.`id`

Open in new window


I could as easily have left the greatest.
 and CM0.`id` > CM.`id`

Open in new window

0
 

Author Comment

by:TecTaoMC
ID: 40556917
Boy that is thought provoking.  I've re-read it a few times and it's barely sinking in.  Will re-read to get the logic.
Daniel, thanks for all the help and thanks for the explanation.   It helps to expand my knowledge.

I'll be closing this out.  Top grade, top points.  You have been very helpful.  I have a couple of other functions I'm stumbling through and when I get to a point to post I'll send you a heads up.

Again, thank you for your time and knowledge.
Mike
0
 

Author Comment

by:TecTaoMC
ID: 40567963
Daniel was outstanding.  Not only did he understand what I was trying to achieve, but he walked me through with his coding suggestions.  He was prompt, looked into problems I ran into the first go around.  And after completion, he patiently answered questions I had to understand the logic of his solution.
0

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this series, we will discuss common questions received as a database Solutions Engineer at Percona. In this role, we speak with a wide array of MySQL and MongoDB users responsible for both extremely large and complex environments to smaller singl…
Originally, this post was published on Monitis Blog, you can check it here . In business circles, we sometimes hear that today is the “age of the customer.” And so it is. Thanks to the enormous advances over the past few years in consumer techno…
This tutorial will teach you the core code needed to finalize the addition of a watermark to your image. The viewer will use a small PHP class to learn and create a watermark.
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
Suggested Courses

764 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question