Solved

Delete Duplicate Email Records That Are Attached To One Account But Can Be Found In Multiple Accounts

Posted on 2015-01-16
16
113 Views
Last Modified: 2015-01-24
I have a table, consumer_mgmt.  It collects consumer information from various forms.  These forms are available through different pages that are part of a business package.

A business can offer these signups to gather names and emails from consumers for various types of specials they may offer.  So a consumer my be in the consumer_mgmt table 5, 10, 15 times for that particular business.

But, that consumer may be in the consumer_mgmt table multiple times for a different business.  So multiple times for multiple businesses.

I need to remove duplicates for each business account so the consumer is only the consumer_mgmt only once for each business.

There are approximately 15,000 rows currently in the consumer_mgmt table.
0
Comment
Question by:TecTaoMC
  • 8
  • 6
  • 2
16 Comments
 
LVL 32

Expert Comment

by:Daniel Wilson
ID: 40554502
This isn't as hard as it sounds ... depending on table structure.  Do you have some unique identifier, an email address, and a business name (at a minimum) in that table?

id                        email                                       business
1                        bob@joe.com                          GMC
2                        joe@joe.com                            GMC
3                        joe@joe.com                            Ford
4                        bob@joe.com                          GMC

And in this example you'd want to delete only record 4, correct?

Back up your data before you try this, but it should look like this:

Delete from MyTable T Where EXISTS (Select ID from MyTable where email=T.email and business=T.business and ID<T.ID);
0
 

Author Comment

by:TecTaoMC
ID: 40554730
In your example I would only want to delete one record, record 4 Bob@joe.com / GMC.

That's what's what I'm having a hard time getting my arms around.  It really doesn't matter which record as long as in the the business GMC only has one bob@joe.com.  Record 1 or record 4 could be deleted.  Business Ford would be untouched since there is only one record.  But if business FORD had two, or three or ten joe@joe.com,  all would need to be deleted so there is only one record.

Thanks for your response so quickly.
0
 
LVL 32

Expert Comment

by:Daniel Wilson
ID: 40554807
So, can you adapt my code to your actual field names?  If not, would you post your table definition so I can write some code for you?
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40555113
No points for this please, because it's not an answer to your current question, but the question itself illustrates why we use relational databases.  Each email is an atomic piece of data and each business is an atomic piece of data.  In a relational data base design, you would have your tables looking something like this.

id     email                                       
1      bob@joe.com                       
2      joe@joe.com                         

id     business
1      GMC
3      Ford

Open in new window


Then to create the relationships between the email and business tables, you would create a "pivot" table.

id     email_id    business_id
1         1           1                    <- relates bob@joe.com  to GMC
2         2           1                    <- relates joe@joe.com  to GMC
3         2           3                    <- relates joe@joe.com  to Ford

Open in new window


For some good discussion on this subject, make a Google search for the exact phrase "Should I Normalize My Database?" and read the thoughtful positions on different sides of the question
0
 
LVL 32

Expert Comment

by:Daniel Wilson
ID: 40555121
Great points, Ray!
0
 

Author Comment

by:TecTaoMC
ID: 40555128
That is a good point Ray.  I am using relational tables throughout this application.  There are a number of tables that pertain to specific modules.  I chose to do this so I wouldn't be writing join statements on every query.  Some I do pertaining to advertisement specials.  Others I don't.  In this case the consumer in this table consumer_mgmt may or may not have joined the consumer signup, but wants to receive some special or the business owner has added them with their permission.

That being said, I am going to read your suggestion on "Should I Normalize My Database"

Daniel, Once I've finished my coffee and read the paper I will be getting back to your last post and write out the query to test on a test table.  I'll post for your review.

Thank you both for your interest and help

Mike
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40555145
... so I wouldn't be writing join statements on every query.
Yeah, I get it!  If you really want to save yourself from writing JOINs learn about the Laravel framework.  It includes an object-relational model called Eloquent that can be used outside of Laravel, if you want.  Eloquent implements an active-record design pattern in a way that uses foreign keys to build your queries with an under-the-covers intelligence about JOIN statements.  The magic is in here:
http://laravel.com/docs/4.2/eloquent#relationships
0
 

Author Comment

by:TecTaoMC
ID: 40555198
Hi Daniel, from your earlier post here is the database structure (id is a unique primary field):

id | c_mid | c_username | b_id | b_username | c_name | c_email | b_cat1 | b_cat2 | datein

The query:

Delete from consumer_mgmt CM Where EXISTS (Select id from consumer_mgmt where c_email=CM.c_email and b_username=CM.b_username and id<CM.id);
0
What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

 

Author Comment

by:TecTaoMC
ID: 40555242
Problem with the code, I'm getting an error when I run it through the run sql, I edit squl with phpMyAdmin.

Delete from `consumer_mgmtPurgeTest` `CM` Where EXISTS (Select `id` from `consumer_mgmtPurgeTest` where `c_email`=`CM`.`c_email` and `b_username`=`CM`.`b_username` and `id` < `CM`.`id` )

Open in new window

0
 
LVL 32

Expert Comment

by:Daniel Wilson
ID: 40555304
Code looks OK.  What's the error message?
0
 

Author Comment

by:TecTaoMC
ID: 40555321
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '`CM` Where EXISTS (Select `id` from `consumer_mgmtPurgeTest` where `c_email`=`CM' at line 1

I've checked all the backticks but still brings up the error
0
 
LVL 32

Expert Comment

by:Daniel Wilson
ID: 40555864
The alias was part of the problem ... but this doesn't work either:
Delete from `consumer_mgmtPurgeTest`  Where EXISTS (Select `id` from `consumer_mgmtPurgeTest` where `c_email`= consumer_mgmtPurgeTest.`c_email` and `b_username`=consumer_mgmtPurgeTest.`b_username` and `id` < consumer_mgmtPurgeTest.`id` );

Open in new window


So ... a DELETE with a JOIN seems in order. The syntax works on this one.

delete  CM  from `consumer_mgmtPurgeTest` CM0 inner join  `consumer_mgmtPurgeTest` CM ON CM.`c_email`= CM0.`c_email` and CM.`b_username`= CM0.`b_username` and CM0.`id` < CM.`id`;

Open in new window

0
 

Author Comment

by:TecTaoMC
ID: 40556775
Daniel, thanks.  It worked perfectly.  I ran a couple of dry runs on copied tables with multiple entries and worked just fine.

A couple of questions.

What was your logic to just a join statement, joining the same table with different identifiers?

Second, I'm not sure I understand the logic of CMO.1id` CM.`id`.  The less than command.
0
 
LVL 32

Accepted Solution

by:
Daniel Wilson earned 0 total points
ID: 40556900
Yeah, self-joins get a bit confusing.

I need to delete SOME of the records?  Which ones?  Ones that match others in the table.  Match in the table ... that's a JOIN.

OK, but if I just join on the email and business name, every record matches something -- at least itself.  So if I just join
ON CM.`c_email`= CM0.`c_email` and CM.`b_username`= CM0.`b_username`

Open in new window

all records match, I delete them, and the table is empty.  Bad move.

I need to delete those that have a match OTHER than themselves. Unique to themselves ... ID gives us that. Not themselves ... this looks like a solution.
 and CM0.`id` <> CM.`id`

Open in new window


But it's not b/c then it gets rid of BOTH the duplicate records.  I want 1 record left where there were duplicates.  I could leave the least or leave the greatest.  I chose to delete those that had one less ... meaning the least is retained.

 and CM0.`id` < CM.`id`

Open in new window


I could as easily have left the greatest.
 and CM0.`id` > CM.`id`

Open in new window

0
 

Author Comment

by:TecTaoMC
ID: 40556917
Boy that is thought provoking.  I've re-read it a few times and it's barely sinking in.  Will re-read to get the logic.
Daniel, thanks for all the help and thanks for the explanation.   It helps to expand my knowledge.

I'll be closing this out.  Top grade, top points.  You have been very helpful.  I have a couple of other functions I'm stumbling through and when I get to a point to post I'll send you a heads up.

Again, thank you for your time and knowledge.
Mike
0
 

Author Comment

by:TecTaoMC
ID: 40567963
Daniel was outstanding.  Not only did he understand what I was trying to achieve, but he walked me through with his coding suggestions.  He was prompt, looked into problems I ran into the first go around.  And after completion, he patiently answered questions I had to understand the logic of his solution.
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Things That Drive Us Nuts Have you noticed the use of the reCaptcha feature at EE and other web sites?  It wants you to read and retype something that looks like this.Insanity!  It's not EE's fault - that's just the way reCaptcha works.  But it is …
Part of the Global Positioning System A geocode (https://developers.google.com/maps/documentation/geocoding/) is the major subset of a GPS coordinate (http://en.wikipedia.org/wiki/Global_Positioning_System), the other parts being the altitude and t…
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

14 Experts available now in Live!

Get 1:1 Help Now