• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 363
  • Last Modified:

MYSQL 1 Row Less :(

Hi All,

Im pretty new to MYSQL and between you guys and You Tube I seem to be getting there.  basically I have a marketing system that has "Email Subscribers" and "Excluded Emails".  If I have an email in my subscriber table which also exists in my Excluded table then when I try to send out emails to my group any subscriber email that has a duplicate in the exclusion table will not go out.  Which sounds perfect unless sometimes people make mistakes when adding contacts and you end up adding emails to the exclusion table when you should be adding to the subscriber table.

Any hooo I have written some SQL to remove these duplicates which looks good apart from one key fact that it looks like its 1 row short.   This is what I did:


select count(*) from ACDEV1.em_exclusion
Inner join ACDEV1.em_subscriber
ON em_exclusion.email = em_subscriber.email

** I ran the above which gave me 71,477 duplicates

create table ACDEV1.temp_ids (id int(10))

** I ran the above to give me a temp table

insert into ACDEV1.temp_ids select em_exclusion.id from ACDEV1.em_exclusion  Inner join ACDEV1.em_subscriber  ON em_exclusion.email = em_subscriber.email

*** Did the above to copy all the IDS of the excluded emails that duplicate with the subscriber table (Do I can delete against later) Which again was 71,477 inserts

delete ACDEV1.em_exclusion_list from ACDEV1.em_exclusion_list inner join ACDEV1.temp_ids on em_exclusion_list.id=temp_ids.id

** Then did the above to delete all items from the exclusion.list table (This is just a data table that holds settings per exluded user).  This only returned 71,476 (1 short)

delete ACDEV1.em_exclusion from ACDEV1.em_exclusion inner join ACDEV1.temp_ids on em_exclusion.id=temp_ids.id

*** I then did the above to remove the actual emails themselves from the email table, which ended up being 1 short again 71,476

select count(*) from ACDEV1.em_exclusion
Inner join ACDEV1.em_subscriber
ON em_exclusion.email = em_subscriber.email

*** now when I run the above it shows 0 rows :(  

WHAT HAPPENS to the 1 missing row


So the question is,  is there a way I should test this / is this a known thing or a header row or something I dont know :(

Basically this looks bullet proof to me but why would there be 1 row less ?

Terry
0
ncomper
Asked:
ncomper
  • 5
  • 3
1 Solution
 
ncomperAuthor Commented:
And just to add I suppose the thing I would be looking for is to know what would you experts do in this situation how would you test what went wrong to find the missing record ?
0
 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
I would say this means that in this table:
ACDEV1.temp_ids

you have 1 value of email_id being duplicated as such.

to find that value:
select email from 
ACDEV1.temp_ids
group by email 
having count(*) > 1

Open in new window

0
 
ncomperAuthor Commented:
Hi it was actually

SELECT id
FROM ACDEV1.temp_ids
GROUP BY id
HAVING COUNT( * ) >1
LIMIT 0 , 30

However it showed 24 ID that had duplicates, which is a step forward,  but would, be interested how experts would trouble shoot this "1 row less" issue further ?
0
NEW Veeam Agent for Microsoft Windows

Backup and recover physical and cloud-based servers and workstations, as well as endpoint devices that belong to remote users. Avoid downtime and data loss quickly and easily for Windows-based physical or public cloud-based workloads!

 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
by knowing this "issue" that a JOIN can return more rows if you have a "many-to-many" relationship, resp in both tables the same values more than once.
the GROUP BY ,... HAVING COUNT(*) > 1 is the key to identify the actual values in such scenarios
0
 
ncomperAuthor Commented:
OK cool thanks so if it returned 24 rows i.e 24 duplicate ID's should that be enough for me to satisfy myself that my code is correct or would you still continue to find why there was 1 row less,  I dont know what is acceptable at being "correct" I need this script to be bullet proof
0
 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
you will need to learn that COUNT(*) on a JOIN may be "tricky".
say we consider a column (ID) in 2 tables, in which you join later.
if you have now in table1 2 rows with the same ID or 2 different ID's,
a SELECT COUNT(*) on the table1 will return 2 in both cases.
however, a SELECT COUNT(DISTINCT ID) on table 1 will return different results.
if the 2 rows have the same value for ID, it will return as result 1, otherwise 2!

now, if you JOIN that table with another table on the ID column, and consider that table has also 2 rows, with either 2 different values or 2 times the same value.
a select count(*) on the JOIN will return the following results:
same value on ID on both columns on both tables (all 4 values matching): 4
same value on ID in table 1, and different values on second table (only 1 value matching between table1 and table2): 2
different values on ID in both tables, with matching values on ID on the 2 tables: 2

so, before JOINing and then COUNTint, you need to first do a count for each value, before the join ...

select email, count(*) from table1 group by email
select email, count(*) from table2 group by email

select t1.email, t2.email, t1.cnt, t2.cnt
  from ( select email, count(*) cnt from table1 group by email ) t1
 full outer join ( select email, count(*) cnt from table2 group by email ) t2
   on t1.email = t2.email

...
0
 
ncomperAuthor Commented:
thanks will have a play with this tonight ;)
0
 
ncomperAuthor Commented:
Thanks
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 5
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now