Solved

MYSQL 1 Row Less :(

Posted on 2014-03-08
8
350 Views
Last Modified: 2014-04-02
Hi All,

Im pretty new to MYSQL and between you guys and You Tube I seem to be getting there.  basically I have a marketing system that has "Email Subscribers" and "Excluded Emails".  If I have an email in my subscriber table which also exists in my Excluded table then when I try to send out emails to my group any subscriber email that has a duplicate in the exclusion table will not go out.  Which sounds perfect unless sometimes people make mistakes when adding contacts and you end up adding emails to the exclusion table when you should be adding to the subscriber table.

Any hooo I have written some SQL to remove these duplicates which looks good apart from one key fact that it looks like its 1 row short.   This is what I did:


select count(*) from ACDEV1.em_exclusion
Inner join ACDEV1.em_subscriber
ON em_exclusion.email = em_subscriber.email

** I ran the above which gave me 71,477 duplicates

create table ACDEV1.temp_ids (id int(10))

** I ran the above to give me a temp table

insert into ACDEV1.temp_ids select em_exclusion.id from ACDEV1.em_exclusion  Inner join ACDEV1.em_subscriber  ON em_exclusion.email = em_subscriber.email

*** Did the above to copy all the IDS of the excluded emails that duplicate with the subscriber table (Do I can delete against later) Which again was 71,477 inserts

delete ACDEV1.em_exclusion_list from ACDEV1.em_exclusion_list inner join ACDEV1.temp_ids on em_exclusion_list.id=temp_ids.id

** Then did the above to delete all items from the exclusion.list table (This is just a data table that holds settings per exluded user).  This only returned 71,476 (1 short)

delete ACDEV1.em_exclusion from ACDEV1.em_exclusion inner join ACDEV1.temp_ids on em_exclusion.id=temp_ids.id

*** I then did the above to remove the actual emails themselves from the email table, which ended up being 1 short again 71,476

select count(*) from ACDEV1.em_exclusion
Inner join ACDEV1.em_subscriber
ON em_exclusion.email = em_subscriber.email

*** now when I run the above it shows 0 rows :(  

WHAT HAPPENS to the 1 missing row


So the question is,  is there a way I should test this / is this a known thing or a header row or something I dont know :(

Basically this looks bullet proof to me but why would there be 1 row less ?

Terry
0
Comment
Question by:ncomper
  • 5
  • 3
8 Comments
 
LVL 5

Author Comment

by:ncomper
ID: 39914532
And just to add I suppose the thing I would be looking for is to know what would you experts do in this situation how would you test what went wrong to find the missing record ?
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 39914816
I would say this means that in this table:
ACDEV1.temp_ids

you have 1 value of email_id being duplicated as such.

to find that value:
select email from 
ACDEV1.temp_ids
group by email 
having count(*) > 1

Open in new window

0
 
LVL 5

Author Comment

by:ncomper
ID: 39914858
Hi it was actually

SELECT id
FROM ACDEV1.temp_ids
GROUP BY id
HAVING COUNT( * ) >1
LIMIT 0 , 30

However it showed 24 ID that had duplicates, which is a step forward,  but would, be interested how experts would trouble shoot this "1 row less" issue further ?
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 39914880
by knowing this "issue" that a JOIN can return more rows if you have a "many-to-many" relationship, resp in both tables the same values more than once.
the GROUP BY ,... HAVING COUNT(*) > 1 is the key to identify the actual values in such scenarios
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 5

Author Comment

by:ncomper
ID: 39914892
OK cool thanks so if it returned 24 rows i.e 24 duplicate ID's should that be enough for me to satisfy myself that my code is correct or would you still continue to find why there was 1 row less,  I dont know what is acceptable at being "correct" I need this script to be bullet proof
0
 
LVL 142

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 500 total points
ID: 39916152
you will need to learn that COUNT(*) on a JOIN may be "tricky".
say we consider a column (ID) in 2 tables, in which you join later.
if you have now in table1 2 rows with the same ID or 2 different ID's,
a SELECT COUNT(*) on the table1 will return 2 in both cases.
however, a SELECT COUNT(DISTINCT ID) on table 1 will return different results.
if the 2 rows have the same value for ID, it will return as result 1, otherwise 2!

now, if you JOIN that table with another table on the ID column, and consider that table has also 2 rows, with either 2 different values or 2 times the same value.
a select count(*) on the JOIN will return the following results:
same value on ID on both columns on both tables (all 4 values matching): 4
same value on ID in table 1, and different values on second table (only 1 value matching between table1 and table2): 2
different values on ID in both tables, with matching values on ID on the 2 tables: 2

so, before JOINing and then COUNTint, you need to first do a count for each value, before the join ...

select email, count(*) from table1 group by email
select email, count(*) from table2 group by email

select t1.email, t2.email, t1.cnt, t2.cnt
  from ( select email, count(*) cnt from table1 group by email ) t1
 full outer join ( select email, count(*) cnt from table2 group by email ) t2
   on t1.email = t2.email

...
0
 
LVL 5

Author Comment

by:ncomper
ID: 39916334
thanks will have a play with this tonight ;)
0
 
LVL 5

Author Closing Comment

by:ncomper
ID: 39971971
Thanks
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Loading csv or delimited data files to MySQL database is a very common task frequently questioned about and almost every time LOAD DATA INFILE comes to the rescue. Here we will try to understand some of the very common scenarios for loading data …
Password hashing is better than message digests or encryption, and you should be using it instead of message digests or encryption.  Find out why and how in this article, which supplements the original article on PHP Client Registration, Login, Logo…
In this tutorial you'll learn about bandwidth monitoring with flows and packet sniffing with our network monitoring solution PRTG Network Monitor (https://www.paessler.com/prtg). If you're interested in additional methods for monitoring bandwidt…
This demo shows you how to set up the containerized NetScaler CPX with NetScaler Management and Analytics System in a non-routable Mesos/Marathon environment for use with Micro-Services applications.

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now