GROUP BY multiple fields, to include all possible grouped permutations, and also non matching NULLS

Hi,

I have a table (below) for which I need to get a count of the ClientRef numbers, grouped by the two fields Age_Group_FK and VisitReason_FK. The grouping fields obviously join to the two other tables shown. I want ClientRef counts of zero for all permutations not shown,  the important point being that I need to get all the permutations of Age_Group vs VisitReason. I have entirely made this example up, but it serves for the purposes of illustration!

ClientRef  Age_Group_FK VisitReason_FK
---------------------------------------------------
B1                2                     4
B2                2                     3
B3                <NULL>          1
B4                2                     3
B5                2                     <NULL>
B6                2                     3
B7                2                     3
B8                1                     1

Age_Group_ID   Age_Group_Name
--------------------------------------------
1                             Under 15                
2                             16-24
3                             25-40
4                             Over 40

VisitReason_ID    Reason_Name
----------------------------------------
1                         Buy Magazine    
2                         Buy Groceries
3                         Goods Delivery
4                         Buy Alcohol
5                         Buy Pet Food

As you can see there'll need to be 20 (5 X 4) groupings in the final result, with many of them zero of course.

I'm using CROSS JOINS to do this:

select ag.Age_Group_Name, vr.Reason_Name, sum(case when cr.Age_Group_FK is null then 0 else 1 end) as count_value
from Age_Group ag
cross join VisitReason vr
left join ClientRef_Table cr
  on cr.Age_Group_FK = ag.Age_Group_ID
 and cr.VisitReason_FK = vr.VisitReason_ID
group by ag.Age_Group_Name, vr.Reason_Name

....but, here's the bit I can't do:

In some instances there will be a <NULL> value in the client table and it won't have a lookup in any corresponding table, but I need it grouped. So, in the example, I'd like an extra 2 columns in the groupings: For the grouping of Age_Group 16-24 against <NULL> for VisitReason, and for Age_Group <NULL> against VistReason Magazine, totalling, you gueseed it, 22 grouped records altogether.



Currently, the way it does it is by providing an extra grouping with <NULL> in BOTH Age_Group and VisitReason columns. In fact, if there is a <NULL> value in any or either of the grouped columns they all get counted together as <NULL>,<NULL> for both grouped fields, leaving me with 21 grouped records (with a count of 2 in thi case). Unfortunately, putting a <NULL> value in the lookup tables is not possible.


Thanks in advance
Iain
iaing1000Asked:
Who is Participating?

[Webinar] Streamline your web hosting managementRegister Today

x
 
JimFiveConnect With a Mentor Commented:
Since you changed your select to be count(cr.ClientRef) you can change your having to
HAVING count(cr.ClientRef) > 0 OR (ag.Age_Group_Name is Not Null AND vr.Reason_Name is Not Null)
--
JimFive
0
 
JimFiveCommented:
First why not put a 0 in the lookup_table for N/A or Unknown.

Second, try this as your from:
from (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag
cross join (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
left join ClientRef_Table cr
  on cr.Age_Group_FK = ag.Age_Group_ID
 and cr.VisitReason_FK = vr.VisitReason_ID

--
JimFive
0
 
JimFiveCommented:
Oops that should be

ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID
0
Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

 
iaing1000Author Commented:
Hi,

Thanks for the fast response,

The query runs OK, but provides 30 rows   (4 + NULL=5) X (5 + NULL =6)

It is really imperative and a requirement that I return specifically 22 rows. That mean exactly two groupings for the two incidences of <NULL> values as I mentioned. I suspect that will be difficult...but that's why I'm at EE!

Thanks
Iain
0
 
iaing1000Author Commented:
Hi again,

To simplify the problem somewhat, we can forget about the grouping....the problem can be highlighted before that at the JOIN stage:

If we just join the cross join the two lookup tables (visitreason and age_group), then a RIGHT JOIN to clientref will yield 8 records, but records B3 and B5 both have NULL for both VisitReason and Age_Group???????

So the question becomes: What needs to happen in the join so that B3 correctly shows VisitReason as 'Buy Magazine', and similarly that B5 is listed as age 16-24?...once that is resolved I can use LEFT or FULL OUTER JOIN to get the requireed range of permutations of Age Group and Visit Reason


Thanks
Iain
0
 
JimFiveCommented:
I think what you're going to need to do is weed out those rows with NULL and a count of 0 in the where clause

WHERE sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr_VisitReason is NOT NULL)

--
JimFive
So the entire thing looks like:
SELECT ag.Age_Group_Name, vr.Reason_Name, sum(case when cr.Age_Group_FK is null then 0 else 1 end) as count_value
FROM  (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag
CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
LEFT JOIN ClientRef_Table cr
ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID
WHERE sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr_VisitReason is NOT NULL)

Open in new window

0
 
JimFiveCommented:
Eliminate the WHERE in that last and change to

GROUP BY ag.Age_Group_Name, vr.Reason_Name
HAVING sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr_VisitReason is NOT NULL)
 

   

0
 
iaing1000Author Commented:
Thanks again,

Will have to look at it tomorrow, but it looks good at a glance!

Cheers
Iain
0
 
iaing1000Author Commented:
Hi Jim,

I've just had a look at this. There's a major league problem though.

If the client table is empty there are no records returned.

In the example, even if the client table were empty, we would expect 20 records returned (for each of the possible permutations of Age_Group vs VisitReason) with a client count of zero for each one.

Thanks again
Iain
0
 
JimFiveCommented:
This is working for me:

SELECT ag.Age_Group_Name, vr.Reason_Name, sum(case when cr.Age_Group_FK is null then 0 else 1 end) as count_value
FROM  (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag
CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
LEFT JOIN ClientRef_Table cr
ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID
GROUP BY ag.Age_Group_Name, vr.Reason_Name
HAVING sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr.Reason_Name is NOT NULL) 

Open in new window

0
 
iaing1000Author Commented:
Hi,

I've done it with a slightly modified HAVING clause, as attached snippet, (otherewise it didn't allow for B3, with a <NULL> Age_Group, from the original example), is this what you meant? I've also changed the main SELECT to use Count() instead of SUM for the same reason....does that seem right?

I think my remaining problem lies in the fact that I will sometimes only select certain records from ClientRef_Table, say those who are 40 years old or less, and when I put the WHERE clause in to filter these, all the permutations of grouping are lost. To illustrate brutally, if I put WHERE cr.ClientRef='B7' into the SQL code after the COALESCE clauses then I'd like 20 records returned with only the Age_Group '16-24', Visit Reason 'Goods Delivery' permutation showing a non-zero count (specifically, a count of 1, since that is where B7 falls into).  

Thanks again ...sorry if I'm not particularly clear sometimes!

Iain
SELECT ag.Age_Group_Name, vr.Reason_Name, count(cr.ClientRef) as count_value
FROM  (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag
CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
LEFT JOIN ClientRef_Table cr
ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID 
GROUP BY ag.Age_Group_Name, vr.Reason_Name
HAVING sum(case when cr.VisitReason_FK is null then 0 else 1 end)  > 0 OR  sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr.Reason_Name is NOT NULL)  

Open in new window

0
 
JimFiveCommented:
To keep the permutations you will need to put the WHERE condition into the ON clause like this:
ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID
AND cr.ClientRef = 'B7'

--
JimFive
0
 
iaing1000Author Commented:
Hi,

That's looking very good now. One final thing, though.

I'd like to have the cross joined tables referred to as a single aliased entity. Is this possible?

Thanks

Something like:.....but it's not liking my example code!
FROM  ((SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag
CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr) as all_links
LEFT JOIN ClientRef_Table cr
ON COALESCE(cr.Age_Group_FK,0) = all_links.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = all_links.VisitReason_ID 
etc.

Open in new window

0
 
JimFiveCommented:
I don't really recommend it as it seems to be harder to read, but this should work.
FROM (SELECT Age_Group_ID, Age_Group_Name, VisitReason_ID, Reason_Name
           FROM (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag
                       CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr) as all_links

What you might want to do though is create a view called all_links that contains the cross join then you can just do
FROM all_links LEFT JOIN ClientRef_Table

You'll get various reactions to this idea, from an efficiency standpoint it may be slightly less efficient, but I like to separate out derived tables if they are either going to be reused a lot, or if they are getting confusing to read.
--
JimFive
0
 
iaing1000Author Commented:
Hi,

Thanks ever so much, that works well.

There is, of course, one last thing that I forgot!...my apologies

I want to be able to group by gender, but there is no look up table. Is there a way to CROSS JOIN the values for gender (namely: <NULL>,'Male','Female') into all_links?...by just creating the values on the fly?

I realise that this could be done by selecting DISTINCT(Gender) from ClientRef_Table but that may be quite slow (let's imagine there's a hypothetical gender field  in ClientRef_Table, since this is only a made up example). My real dataset looks like it will require a nice CROSS JOIN which contains all groupings I need (at some point I may indeed put this into a view or similar to make it easier to read and of course have the ability to re-use it).

Thanks once again,
Iain
0
 
JimFiveCommented:
Sure, do it just like I added the nulls to the other tables add
CROSS JOIN (Select Null as Gender UNION Select 'Male' as Gender UNION Select 'Female' as Gender) gender
0
 
iaing1000Author Commented:
To illustrate my last point, this will work if there is a Gender field in ClientRef_Table. But I'd like it without the SELCT DISTINCT(Gender) FROM ClientRef_Table if possible and instead have something with the literal values plugged in...

Cheers
Iain
SELECT all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender, count(cr.ClientRef) as count_value
FROM (SELECT Age_Group_ID, Age_Group_Name, VisitReason_ID, Reason_Name, Gender
   FROM (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag 
      CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
      CROSS JOIN (SELECT DISTINCT(Gender) FROM ClientRef_Table) gr) as all_links
LEFT JOIN (SELECT * FROM ClientRef_Table WHERE ClientRef like 'B3%') cr
ON COALESCE(cr.Age_Group_FK,0) = all_links.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = all_links.VisitReason_ID 
AND cr.Gender = all_links.Gender
GROUP BY all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender
HAVING count(cr.ClientRef) > 0 OR (all_links.Age_Group_Name is Not Null AND all_links.Reason_Name is Not Null AND all_links.Gender is Not Null)

Open in new window

0
 
iaing1000Author Commented:
Looking very nearly there,

What do I need to use for the COALESCE for Gender? It doesn't seem to like 0 because of a type conversion obviously, but if I put NULL in its place it doesn't return NULL genders from ClientRef_Table ither?

Cheers
Iain
SELECT all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender, count(cr.ClientRef) as count_value
FROM (SELECT Age_Group_ID, Age_Group_Name, VisitReason_ID, Reason_Name, Gender
   FROM (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag 
      CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
      CROSS JOIN (Select Null as Gender UNION Select 'Male' as Gender UNION Select 'Female' as Gender) gr) as all_links
LEFT JOIN (SELECT * FROM ClientRef_Table WHERE ClientRef like 'B3%') cr
ON COALESCE(cr.Age_Group_FK,0) = all_links.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = all_links.VisitReason_ID 
AND COALESCE (cr.Gender,NULL) = all_links.Gender
GROUP BY all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender
HAVING count(cr.ClientRef) > 0 OR (all_links.Age_Group_Name is Not Null AND all_links.Reason_Name is Not Null AND all_links.Gender is Not Null)

Open in new window

0
 
JimFiveCommented:
You're right, that doesn't quite work.  COALESCE returns the first argument that is NOT NULL.
So for your gender you'll want something like (SELECT 'Unknown' as Gender UNION Select 'Male' as Gender UNION SELECT 'Female' as Gender) as gender

And then COALESCE(cr.Gender,'Unknown') = all_links.Gender

If you want it to show <NULL> like the other fields use '<NULL>' to fake it.
0
 
iaing1000Author Commented:
Hi,

I eventually saw a way to fake it...in a way that seemed to standardise as much as possible with the rest of it!

I'm doing a whole bunch of similar reports which are built up via strings in code, based on user input, so I want to keep it as formulaic as possible. For instance, all the elements of the final clause of the HAVING part have the 'IS NOT NULL' and this means I can generate it simply in a loop.....hence my need to have the Gender NULLs thing working.

Many thanks for staying with me on this tricky one.

Thanks once again

All the best
Iain
SELECT all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender, count(cr.ClientRef) as count_value
FROM (SELECT Age_Group_ID, Age_Group_Name, VisitReason_ID, Reason_Name, Gender_ID, Gender
   FROM (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag 
      CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
      CROSS JOIN (Select '' as Gender_ID, NULL as Gender UNION Select 'Male' as Gender_ID, 'Male' as Gender UNION Select 'Female' as Gender_ID, 'Female' as Gender) gr) as all_links
LEFT JOIN (SELECT * FROM ClientRef_Table) cr
ON COALESCE(cr.Age_Group_FK,0) = all_links.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = all_links.VisitReason_ID 
AND COALESCE(cr.Gender,'') = all_links.Gender_ID
GROUP BY all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender
HAVING count(cr.ClientRef) > 0 OR (all_links.Age_Group_Name is Not Null AND all_links.Reason_Name is Not Null AND all_links.Gender is NOT NULL)

Open in new window

0
All Courses

From novice to tech pro — start learning today.