Solved

MYSQL 1 Row Less :(

Posted on 2014-03-08
8
357 Views
Last Modified: 2014-04-02
Hi All,

Im pretty new to MYSQL and between you guys and You Tube I seem to be getting there.  basically I have a marketing system that has "Email Subscribers" and "Excluded Emails".  If I have an email in my subscriber table which also exists in my Excluded table then when I try to send out emails to my group any subscriber email that has a duplicate in the exclusion table will not go out.  Which sounds perfect unless sometimes people make mistakes when adding contacts and you end up adding emails to the exclusion table when you should be adding to the subscriber table.

Any hooo I have written some SQL to remove these duplicates which looks good apart from one key fact that it looks like its 1 row short.   This is what I did:


select count(*) from ACDEV1.em_exclusion
Inner join ACDEV1.em_subscriber
ON em_exclusion.email = em_subscriber.email

** I ran the above which gave me 71,477 duplicates

create table ACDEV1.temp_ids (id int(10))

** I ran the above to give me a temp table

insert into ACDEV1.temp_ids select em_exclusion.id from ACDEV1.em_exclusion  Inner join ACDEV1.em_subscriber  ON em_exclusion.email = em_subscriber.email

*** Did the above to copy all the IDS of the excluded emails that duplicate with the subscriber table (Do I can delete against later) Which again was 71,477 inserts

delete ACDEV1.em_exclusion_list from ACDEV1.em_exclusion_list inner join ACDEV1.temp_ids on em_exclusion_list.id=temp_ids.id

** Then did the above to delete all items from the exclusion.list table (This is just a data table that holds settings per exluded user).  This only returned 71,476 (1 short)

delete ACDEV1.em_exclusion from ACDEV1.em_exclusion inner join ACDEV1.temp_ids on em_exclusion.id=temp_ids.id

*** I then did the above to remove the actual emails themselves from the email table, which ended up being 1 short again 71,476

select count(*) from ACDEV1.em_exclusion
Inner join ACDEV1.em_subscriber
ON em_exclusion.email = em_subscriber.email

*** now when I run the above it shows 0 rows :(  

WHAT HAPPENS to the 1 missing row


So the question is,  is there a way I should test this / is this a known thing or a header row or something I dont know :(

Basically this looks bullet proof to me but why would there be 1 row less ?

Terry
0
Comment
Question by:ncomper
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
8 Comments
 
LVL 5

Author Comment

by:ncomper
ID: 39914532
And just to add I suppose the thing I would be looking for is to know what would you experts do in this situation how would you test what went wrong to find the missing record ?
0
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 39914816
I would say this means that in this table:
ACDEV1.temp_ids

you have 1 value of email_id being duplicated as such.

to find that value:
select email from 
ACDEV1.temp_ids
group by email 
having count(*) > 1

Open in new window

0
 
LVL 5

Author Comment

by:ncomper
ID: 39914858
Hi it was actually

SELECT id
FROM ACDEV1.temp_ids
GROUP BY id
HAVING COUNT( * ) >1
LIMIT 0 , 30

However it showed 24 ID that had duplicates, which is a step forward,  but would, be interested how experts would trouble shoot this "1 row less" issue further ?
0
Increase Agility with Enabled Toolchains

Connect your existing build, deployment, management, monitoring, and collaboration platforms. From Puppet to Chef, HipChat to Slack, ServiceNow to JIRA, Splunk to New Relic and beyond, hand off data between systems to engage the right people.

Connect with xMatters.

 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 39914880
by knowing this "issue" that a JOIN can return more rows if you have a "many-to-many" relationship, resp in both tables the same values more than once.
the GROUP BY ,... HAVING COUNT(*) > 1 is the key to identify the actual values in such scenarios
0
 
LVL 5

Author Comment

by:ncomper
ID: 39914892
OK cool thanks so if it returned 24 rows i.e 24 duplicate ID's should that be enough for me to satisfy myself that my code is correct or would you still continue to find why there was 1 row less,  I dont know what is acceptable at being "correct" I need this script to be bullet proof
0
 
LVL 143

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 500 total points
ID: 39916152
you will need to learn that COUNT(*) on a JOIN may be "tricky".
say we consider a column (ID) in 2 tables, in which you join later.
if you have now in table1 2 rows with the same ID or 2 different ID's,
a SELECT COUNT(*) on the table1 will return 2 in both cases.
however, a SELECT COUNT(DISTINCT ID) on table 1 will return different results.
if the 2 rows have the same value for ID, it will return as result 1, otherwise 2!

now, if you JOIN that table with another table on the ID column, and consider that table has also 2 rows, with either 2 different values or 2 times the same value.
a select count(*) on the JOIN will return the following results:
same value on ID on both columns on both tables (all 4 values matching): 4
same value on ID in table 1, and different values on second table (only 1 value matching between table1 and table2): 2
different values on ID in both tables, with matching values on ID on the 2 tables: 2

so, before JOINing and then COUNTint, you need to first do a count for each value, before the join ...

select email, count(*) from table1 group by email
select email, count(*) from table2 group by email

select t1.email, t2.email, t1.cnt, t2.cnt
  from ( select email, count(*) cnt from table1 group by email ) t1
 full outer join ( select email, count(*) cnt from table2 group by email ) t2
   on t1.email = t2.email

...
0
 
LVL 5

Author Comment

by:ncomper
ID: 39916334
thanks will have a play with this tonight ;)
0
 
LVL 5

Author Closing Comment

by:ncomper
ID: 39971971
Thanks
0

Featured Post

Turn Insights Into Action

You’ve already invested in ITSM tools, chat applications, automation utilities, and more. Fortify these solutions with intelligent communications so you can drive business processes forward.

With xMatters, you'll never miss a beat.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

As a database administrator, you may need to audit your table(s) to determine whether the data types are optimal for your real-world data needs.  This Article is intended to be a resource for such a task. Preface The other day, I was involved …
This post contains step-by-step instructions for setting up alerting in Percona Monitoring and Management (PMM) using Grafana.
Come and listen to Percona CEO Peter Zaitsev discuss what’s new in Percona open source software, including Percona Server for MySQL (https://www.percona.com/software/mysql-database/percona-server) and MongoDB (https://www.percona.com/software/mongo-…
This is my first video review of Microsoft Bookings, I will be doing a part two with a bit more information, but wanted to get this out to you folks.

695 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question