[Last Call] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 148
  • Last Modified:

Delete Duplicate Email Records That Are Attached To One Account But Can Be Found In Multiple Accounts

I have a table, consumer_mgmt.  It collects consumer information from various forms.  These forms are available through different pages that are part of a business package.

A business can offer these signups to gather names and emails from consumers for various types of specials they may offer.  So a consumer my be in the consumer_mgmt table 5, 10, 15 times for that particular business.

But, that consumer may be in the consumer_mgmt table multiple times for a different business.  So multiple times for multiple businesses.

I need to remove duplicates for each business account so the consumer is only the consumer_mgmt only once for each business.

There are approximately 15,000 rows currently in the consumer_mgmt table.
0
TecTaoMC
Asked:
TecTaoMC
  • 8
  • 6
  • 2
1 Solution
 
Daniel WilsonCommented:
This isn't as hard as it sounds ... depending on table structure.  Do you have some unique identifier, an email address, and a business name (at a minimum) in that table?

id                        email                                       business
1                        bob@joe.com                          GMC
2                        joe@joe.com                            GMC
3                        joe@joe.com                            Ford
4                        bob@joe.com                          GMC

And in this example you'd want to delete only record 4, correct?

Back up your data before you try this, but it should look like this:

Delete from MyTable T Where EXISTS (Select ID from MyTable where email=T.email and business=T.business and ID<T.ID);
0
 
TecTaoMCAuthor Commented:
In your example I would only want to delete one record, record 4 Bob@joe.com / GMC.

That's what's what I'm having a hard time getting my arms around.  It really doesn't matter which record as long as in the the business GMC only has one bob@joe.com.  Record 1 or record 4 could be deleted.  Business Ford would be untouched since there is only one record.  But if business FORD had two, or three or ten joe@joe.com,  all would need to be deleted so there is only one record.

Thanks for your response so quickly.
0
 
Daniel WilsonCommented:
So, can you adapt my code to your actual field names?  If not, would you post your table definition so I can write some code for you?
0
Transaction-level recovery for Oracle database

Veeam Explore for Oracle delivers low RTOs and RPOs with agentless transaction log backup and transaction-level recovery of Oracle databases. You can restore the database to a precise point in time, even to a specific transaction.

 
Ray PaseurCommented:
No points for this please, because it's not an answer to your current question, but the question itself illustrates why we use relational databases.  Each email is an atomic piece of data and each business is an atomic piece of data.  In a relational data base design, you would have your tables looking something like this.

id     email                                       
1      bob@joe.com                       
2      joe@joe.com                         

id     business
1      GMC
3      Ford

Open in new window


Then to create the relationships between the email and business tables, you would create a "pivot" table.

id     email_id    business_id
1         1           1                    <- relates bob@joe.com  to GMC
2         2           1                    <- relates joe@joe.com  to GMC
3         2           3                    <- relates joe@joe.com  to Ford

Open in new window


For some good discussion on this subject, make a Google search for the exact phrase "Should I Normalize My Database?" and read the thoughtful positions on different sides of the question
0
 
Daniel WilsonCommented:
Great points, Ray!
0
 
TecTaoMCAuthor Commented:
That is a good point Ray.  I am using relational tables throughout this application.  There are a number of tables that pertain to specific modules.  I chose to do this so I wouldn't be writing join statements on every query.  Some I do pertaining to advertisement specials.  Others I don't.  In this case the consumer in this table consumer_mgmt may or may not have joined the consumer signup, but wants to receive some special or the business owner has added them with their permission.

That being said, I am going to read your suggestion on "Should I Normalize My Database"

Daniel, Once I've finished my coffee and read the paper I will be getting back to your last post and write out the query to test on a test table.  I'll post for your review.

Thank you both for your interest and help

Mike
0
 
Ray PaseurCommented:
... so I wouldn't be writing join statements on every query.
Yeah, I get it!  If you really want to save yourself from writing JOINs learn about the Laravel framework.  It includes an object-relational model called Eloquent that can be used outside of Laravel, if you want.  Eloquent implements an active-record design pattern in a way that uses foreign keys to build your queries with an under-the-covers intelligence about JOIN statements.  The magic is in here:
http://laravel.com/docs/4.2/eloquent#relationships
0
 
TecTaoMCAuthor Commented:
Hi Daniel, from your earlier post here is the database structure (id is a unique primary field):

id | c_mid | c_username | b_id | b_username | c_name | c_email | b_cat1 | b_cat2 | datein

The query:

Delete from consumer_mgmt CM Where EXISTS (Select id from consumer_mgmt where c_email=CM.c_email and b_username=CM.b_username and id<CM.id);
0
 
TecTaoMCAuthor Commented:
Problem with the code, I'm getting an error when I run it through the run sql, I edit squl with phpMyAdmin.

Delete from `consumer_mgmtPurgeTest` `CM` Where EXISTS (Select `id` from `consumer_mgmtPurgeTest` where `c_email`=`CM`.`c_email` and `b_username`=`CM`.`b_username` and `id` < `CM`.`id` )

Open in new window

0
 
Daniel WilsonCommented:
Code looks OK.  What's the error message?
0
 
TecTaoMCAuthor Commented:
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '`CM` Where EXISTS (Select `id` from `consumer_mgmtPurgeTest` where `c_email`=`CM' at line 1

I've checked all the backticks but still brings up the error
0
 
Daniel WilsonCommented:
The alias was part of the problem ... but this doesn't work either:
Delete from `consumer_mgmtPurgeTest`  Where EXISTS (Select `id` from `consumer_mgmtPurgeTest` where `c_email`= consumer_mgmtPurgeTest.`c_email` and `b_username`=consumer_mgmtPurgeTest.`b_username` and `id` < consumer_mgmtPurgeTest.`id` );

Open in new window


So ... a DELETE with a JOIN seems in order. The syntax works on this one.

delete  CM  from `consumer_mgmtPurgeTest` CM0 inner join  `consumer_mgmtPurgeTest` CM ON CM.`c_email`= CM0.`c_email` and CM.`b_username`= CM0.`b_username` and CM0.`id` < CM.`id`;

Open in new window

0
 
TecTaoMCAuthor Commented:
Daniel, thanks.  It worked perfectly.  I ran a couple of dry runs on copied tables with multiple entries and worked just fine.

A couple of questions.

What was your logic to just a join statement, joining the same table with different identifiers?

Second, I'm not sure I understand the logic of CMO.1id` CM.`id`.  The less than command.
0
 
Daniel WilsonCommented:
Yeah, self-joins get a bit confusing.

I need to delete SOME of the records?  Which ones?  Ones that match others in the table.  Match in the table ... that's a JOIN.

OK, but if I just join on the email and business name, every record matches something -- at least itself.  So if I just join
ON CM.`c_email`= CM0.`c_email` and CM.`b_username`= CM0.`b_username`

Open in new window

all records match, I delete them, and the table is empty.  Bad move.

I need to delete those that have a match OTHER than themselves. Unique to themselves ... ID gives us that. Not themselves ... this looks like a solution.
 and CM0.`id` <> CM.`id`

Open in new window


But it's not b/c then it gets rid of BOTH the duplicate records.  I want 1 record left where there were duplicates.  I could leave the least or leave the greatest.  I chose to delete those that had one less ... meaning the least is retained.

 and CM0.`id` < CM.`id`

Open in new window


I could as easily have left the greatest.
 and CM0.`id` > CM.`id`

Open in new window

0
 
TecTaoMCAuthor Commented:
Boy that is thought provoking.  I've re-read it a few times and it's barely sinking in.  Will re-read to get the logic.
Daniel, thanks for all the help and thanks for the explanation.   It helps to expand my knowledge.

I'll be closing this out.  Top grade, top points.  You have been very helpful.  I have a couple of other functions I'm stumbling through and when I get to a point to post I'll send you a heads up.

Again, thank you for your time and knowledge.
Mike
0
 
TecTaoMCAuthor Commented:
Daniel was outstanding.  Not only did he understand what I was trying to achieve, but he walked me through with his coding suggestions.  He was prompt, looked into problems I ran into the first go around.  And after completion, he patiently answered questions I had to understand the logic of his solution.
0

Featured Post

[Webinar] Cloud and Mobile-First Strategy

Maybe you’ve fully adopted the cloud since the beginning. Or maybe you started with on-prem resources but are pursuing a “cloud and mobile first” strategy. Getting to that end state has its challenges. Discover how to build out a 100% cloud and mobile IT strategy in this webinar.

  • 8
  • 6
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now