• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 400
  • Last Modified:

MySQL: Subtract one SELECT with multiple WHEREs from another SELECT with multiple WHEREs

A table of Contacts (names & addresses) is linked to a table of Groups in a many to many relationship, which relationship is indexed in the table groups_has_contacts.

I need a query that will provide the selection of all the contacts in more than one of the groups without duplicating any of the contacts. The query also needs to allow for the subtraction of contacts in any number of the other groups a contact might be in.

The following gets close, but any contact which is in both of the selected groups (in the WHERE clause) will be duplicated in the results.

SELECT `contacts`.`idContacts`
     , `contacts`.`First_Name`, `contacts`.`Last_Name`
     , `Contacts_idContacts` , `contacts`.`Email1` , `Groups_idGroup`, `Group`
FROM `libertytools`.`contacts`
JOIN `libertytools`.`groups_has_contacts`
   ON Contacts_idContacts=idContacts
JOIN `libertytools`.`groups` ON Groups_idGroup=idGroup
WHERE (Groups_idGroup = 20 OR Groups_idGroup = 21)
AND NOT EXISTS (
   SELECT 1
   FROM `libertytools`.`groups_has_contacts` gc2
   WHERE (gc2.Groups_idGroup = 350 OR gc2.Groups_idGroup = 250)
   AND gc2.Contacts_idContacts=idContacts
)
AND (`contacts`.`Email1`
IS NOT NULL
AND `contacts`.`Email1` <> '')
order by `contacts`.`Last_Name` ASC;

SELECT `contacts`.`idContacts` 
     , `contacts`.`First_Name`, `contacts`.`Last_Name`
     , `Contacts_idContacts` , `contacts`.`Email1` , `Groups_idGroup`, `Group`
FROM `libertytools`.`contacts` 
JOIN `libertytools`.`groups_has_contacts` 
   ON Contacts_idContacts=idContacts 
JOIN `libertytools`.`groups` ON Groups_idGroup=idGroup
WHERE (Groups_idGroup = 20 OR Groups_idGroup = 21)
AND NOT EXISTS (
   SELECT 1
   FROM `libertytools`.`groups_has_contacts` gc2
   WHERE (gc2.Groups_idGroup = 350 OR gc2.Groups_idGroup = 250)
   AND gc2.Contacts_idContacts=idContacts
)
AND (`contacts`.`Email1` 
IS NOT NULL 
AND `contacts`.`Email1` <> '') 
order by `contacts`.`Last_Name` ASC;

Open in new window


Note that the last clause that checks for an email needs to be appendable to the end of the query -- we toggle it on and off.

Thanks!
0
Ron1959
Asked:
Ron1959
  • 6
  • 5
1 Solution
 
Kevin CrossChief Technology OfficerCommented:
Hi. Do you need to maintain the data from the `groups` and `groups_has_contacts` tables in the final selection?
0
 
Ron1959Author Commented:
No, we only need the data from the Contacts table.  We may want the name of the groups, but we can grab that in another query, if need be.

The main select could be:

SELECT `contacts`.`idContacts`,  `contacts`.`First_Name`, `contacts`.`Last_Name`,
    `contacts`.`Email1`
0
 
Kevin CrossChief Technology OfficerCommented:
Okay, so here is my first thought. It is based on only needing contact information. What you can do is convert the inclusion in a similar fashion as the exclusion, i.e., use EXISTS.

SELECT `contacts`.`idContacts`, `contacts`.`First_Name`
     , `contacts`.`Last_Name`, `contacts`.`Email1`
FROM `libertytools`.`contacts` 
WHERE EXISTS ( -- include contacts from specific groups
   SELECT 1
   FROM `libertytools`.`groups_has_contacts` gc2
   WHERE gc2.Groups_idGroup IN ( 20, 21 )
   AND gc2.Contacts_idContacts=idContacts
)
AND NOT EXISTS ( -- exclude those in other groups
   SELECT 1
   FROM `libertytools`.`groups_has_contacts` gc2
   WHERE gc2.Groups_idGroup IN ( 350, 250 )
   AND gc2.Contacts_idContacts=idContacts
)
-- direct comparision of Email1 should eliminate NULLs
AND (`contacts`.`Email1` <> '') 
ORDER BY `contacts`.`Last_Name` ASC;

Open in new window


Now, if you need to include the group information, you can take your original query and add a GROUP BY on all the `contacts` columns and then use GROUP_CONCAT() to show a comma-separated list of the group names to which the contact belongs. The latter option, might look something like this:

SELECT `contacts`.`idContacts` 
     , `contacts`.`First_Name`, `contacts`.`Last_Name`
     , `contacts`.`Email1`
     , GROUP_CONCAT(`Group` ORDER BY `idGroup`) `Groups`
FROM `libertytools`.`contacts` 
JOIN `libertytools`.`groups_has_contacts` 
   ON Contacts_idContacts=idContacts 
JOIN `libertytools`.`groups` ON Groups_idGroup=idGroup
WHERE Groups_idGroup IN ( 20, 21 )
AND NOT EXISTS (
   SELECT 1
   FROM `libertytools`.`groups_has_contacts` gc2
   WHERE gc2.Groups_idGroup IN ( 350, 250 )
   AND gc2.Contacts_idContacts=idContacts
)
AND `contacts`.`Email1` <> ''
ORDER BY `contacts`.`Last_Name` ASC;

Open in new window

0
[Webinar] Improve your customer journey

A positive customer journey is important in attracting and retaining business. To improve this experience, you can use Google Maps APIs to increase checkout conversions, boost user engagement, and optimize order fulfillment. Learn how in this webinar presented by Dito.

 
Ron1959Author Commented:
Kevin, you make this look too easy!

I can see that I've just been scraping the surface of MySQL capability.  I look forward to learning more.

Thanks, again!
0
 
Ron1959Author Commented:
The WHERE EXISTS syntax seems more straight forward than joins.

Other than not being able to draw field names from the other tables, are there any disadvantages to using WHERE EXISTS rather than JOINS?
0
 
Kevin CrossChief Technology OfficerCommented:
You are most welcome. With an attitude like yours towards "learning," I am certain it will become "easy" for you also. Good luck in your adventures. If you ever need guidance, you know where to find us.

Best regards and happy coding,
Kevin
0
 
Kevin CrossChief Technology OfficerCommented:
You named it. But as you have seen here, the disadvantage may be an advantage when there is a one-to-many relationship but you need unique rows. You have saved on overhead of a GROUP BY and as such even if EXISTS added a small amount of expense to the query, it should be less than having to group the final result. The EXISTS is a short-circuiting method, so as soon as it finds a row it is satisfied in either case. On the contrary, a NOT IN clause would have to check every value to ensure that there was definitely not a match later done the line. In short, the INNER JOIN and the EXISTS probably are treated the same by the optimizer.

To see for yourself, try doing an EXPLAIN or DESCRIBE on both queries and examine the result in comparison to each other.
0
 
Ron1959Author Commented:
Hummm....  Upon further reflection, I do see a need to pull data unique to each contact from the groups_has_contacts table.

I had forgotten that we started to track the dates contacts were added to the groups.  We have a groups_has_contacts.Create_Date field.

Does that put me back to JOINS or is there still a way to use the EXISTS calls?
0
 
Kevin CrossChief Technology OfficerCommented:
Back to the joins. And sorry for my type-o. I said GROUP BY, but didn't include it.

SELECT `contacts`.`idContacts` 
     , `contacts`.`First_Name`, `contacts`.`Last_Name`
     , `contacts`.`Email1`
     , GROUP_CONCAT(`Group` ORDER BY `idGroup`) `Groups`
FROM `libertytools`.`contacts` 
JOIN `libertytools`.`groups_has_contacts` 
   ON Contacts_idContacts=idContacts 
JOIN `libertytools`.`groups` ON Groups_idGroup=idGroup
WHERE Groups_idGroup IN ( 20, 21 )
AND NOT EXISTS (
   SELECT 1
   FROM `libertytools`.`groups_has_contacts` gc2
   WHERE gc2.Groups_idGroup IN ( 350, 250 )
   AND gc2.Contacts_idContacts=idContacts
)
AND `contacts`.`Email1` <> ''
GROUP BY `contacts`.`idContacts` 
     , `contacts`.`First_Name`, `contacts`.`Last_Name`
     , `contacts`.`Email1`
ORDER BY `contacts`.`Last_Name` ASC;

Open in new window


You can add a MAX() on date or use GROUP_CONCAT() on that also.
0
 
Ron1959Author Commented:
I'm impressed that you caught the need to accommodate more than one group name!

This works great, but I don't understand the GROUP BY.  (Without it, the query only returns the first record in the Contacts table.)  

- Ron
0
 
Kevin CrossChief Technology OfficerCommented:
Yes, that is a MySQL thing. It was definitely a type-o to not include the GROUP BY earlier. In other platforms like MS SQL, if you leave off the GROUP BY on columns not in an aggregate, then you will get an error saying column is not in aggregated or included in grouping. In MySQL, you can leave off columns in the SELECT no in an aggregate and MySQL will just grab a row. Think FIRST or LAST function of MS Access.

i.e., GROUP BY `contacts`.`idContacts`

Likely is sufficient for MySQL, but I prefer to stay with the syntax more clear to what I am doing. And it allows me to use copy and paste more readily to other platforms.
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

  • 6
  • 5
Tackle projects and never again get stuck behind a technical roadblock.
Join Now