Solved

GROUP BY multiple fields, to include all possible grouped permutations, and also non matching NULLS

Posted on 2007-12-05
20
231 Views
Last Modified: 2007-12-06
Hi,

I have a table (below) for which I need to get a count of the ClientRef numbers, grouped by the two fields Age_Group_FK and VisitReason_FK. The grouping fields obviously join to the two other tables shown. I want ClientRef counts of zero for all permutations not shown,  the important point being that I need to get all the permutations of Age_Group vs VisitReason. I have entirely made this example up, but it serves for the purposes of illustration!

ClientRef  Age_Group_FK VisitReason_FK
---------------------------------------------------
B1                2                     4
B2                2                     3
B3                <NULL>          1
B4                2                     3
B5                2                     <NULL>
B6                2                     3
B7                2                     3
B8                1                     1

Age_Group_ID   Age_Group_Name
--------------------------------------------
1                             Under 15                
2                             16-24
3                             25-40
4                             Over 40

VisitReason_ID    Reason_Name
----------------------------------------
1                         Buy Magazine    
2                         Buy Groceries
3                         Goods Delivery
4                         Buy Alcohol
5                         Buy Pet Food

As you can see there'll need to be 20 (5 X 4) groupings in the final result, with many of them zero of course.

I'm using CROSS JOINS to do this:

select ag.Age_Group_Name, vr.Reason_Name, sum(case when cr.Age_Group_FK is null then 0 else 1 end) as count_value
from Age_Group ag
cross join VisitReason vr
left join ClientRef_Table cr
  on cr.Age_Group_FK = ag.Age_Group_ID
 and cr.VisitReason_FK = vr.VisitReason_ID
group by ag.Age_Group_Name, vr.Reason_Name

....but, here's the bit I can't do:

In some instances there will be a <NULL> value in the client table and it won't have a lookup in any corresponding table, but I need it grouped. So, in the example, I'd like an extra 2 columns in the groupings: For the grouping of Age_Group 16-24 against <NULL> for VisitReason, and for Age_Group <NULL> against VistReason Magazine, totalling, you gueseed it, 22 grouped records altogether.



Currently, the way it does it is by providing an extra grouping with <NULL> in BOTH Age_Group and VisitReason columns. In fact, if there is a <NULL> value in any or either of the grouped columns they all get counted together as <NULL>,<NULL> for both grouped fields, leaving me with 21 grouped records (with a count of 2 in thi case). Unfortunately, putting a <NULL> value in the lookup tables is not possible.


Thanks in advance
Iain
0
Comment
Question by:iaing1000
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 10
  • 10
20 Comments
 
LVL 15

Expert Comment

by:JimFive
ID: 20412923
First why not put a 0 in the lookup_table for N/A or Unknown.

Second, try this as your from:
from (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag
cross join (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
left join ClientRef_Table cr
  on cr.Age_Group_FK = ag.Age_Group_ID
 and cr.VisitReason_FK = vr.VisitReason_ID

--
JimFive
0
 
LVL 15

Expert Comment

by:JimFive
ID: 20412942
Oops that should be

ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID
0
 

Author Comment

by:iaing1000
ID: 20413285
Hi,

Thanks for the fast response,

The query runs OK, but provides 30 rows   (4 + NULL=5) X (5 + NULL =6)

It is really imperative and a requirement that I return specifically 22 rows. That mean exactly two groupings for the two incidences of <NULL> values as I mentioned. I suspect that will be difficult...but that's why I'm at EE!

Thanks
Iain
0
Edgartown IT Case Study

Learn about Edgartown's quest to ensure the safety and security of the entire town's employee and citizen data. Read the case study!

 

Author Comment

by:iaing1000
ID: 20413571
Hi again,

To simplify the problem somewhat, we can forget about the grouping....the problem can be highlighted before that at the JOIN stage:

If we just join the cross join the two lookup tables (visitreason and age_group), then a RIGHT JOIN to clientref will yield 8 records, but records B3 and B5 both have NULL for both VisitReason and Age_Group???????

So the question becomes: What needs to happen in the join so that B3 correctly shows VisitReason as 'Buy Magazine', and similarly that B5 is listed as age 16-24?...once that is resolved I can use LEFT or FULL OUTER JOIN to get the requireed range of permutations of Age Group and Visit Reason


Thanks
Iain
0
 
LVL 15

Expert Comment

by:JimFive
ID: 20413733
I think what you're going to need to do is weed out those rows with NULL and a count of 0 in the where clause

WHERE sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr_VisitReason is NOT NULL)

--
JimFive
So the entire thing looks like:
SELECT ag.Age_Group_Name, vr.Reason_Name, sum(case when cr.Age_Group_FK is null then 0 else 1 end) as count_value
FROM  (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag
CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
LEFT JOIN ClientRef_Table cr
ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID
WHERE sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr_VisitReason is NOT NULL)

Open in new window

0
 
LVL 15

Expert Comment

by:JimFive
ID: 20413792
Eliminate the WHERE in that last and change to

GROUP BY ag.Age_Group_Name, vr.Reason_Name
HAVING sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr_VisitReason is NOT NULL)
 

   

0
 

Author Comment

by:iaing1000
ID: 20413930
Thanks again,

Will have to look at it tomorrow, but it looks good at a glance!

Cheers
Iain
0
 

Author Comment

by:iaing1000
ID: 20418607
Hi Jim,

I've just had a look at this. There's a major league problem though.

If the client table is empty there are no records returned.

In the example, even if the client table were empty, we would expect 20 records returned (for each of the possible permutations of Age_Group vs VisitReason) with a client count of zero for each one.

Thanks again
Iain
0
 
LVL 15

Expert Comment

by:JimFive
ID: 20419300
This is working for me:

SELECT ag.Age_Group_Name, vr.Reason_Name, sum(case when cr.Age_Group_FK is null then 0 else 1 end) as count_value
FROM  (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag
CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
LEFT JOIN ClientRef_Table cr
ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID
GROUP BY ag.Age_Group_Name, vr.Reason_Name
HAVING sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr.Reason_Name is NOT NULL) 

Open in new window

0
 

Author Comment

by:iaing1000
ID: 20419649
Hi,

I've done it with a slightly modified HAVING clause, as attached snippet, (otherewise it didn't allow for B3, with a <NULL> Age_Group, from the original example), is this what you meant? I've also changed the main SELECT to use Count() instead of SUM for the same reason....does that seem right?

I think my remaining problem lies in the fact that I will sometimes only select certain records from ClientRef_Table, say those who are 40 years old or less, and when I put the WHERE clause in to filter these, all the permutations of grouping are lost. To illustrate brutally, if I put WHERE cr.ClientRef='B7' into the SQL code after the COALESCE clauses then I'd like 20 records returned with only the Age_Group '16-24', Visit Reason 'Goods Delivery' permutation showing a non-zero count (specifically, a count of 1, since that is where B7 falls into).  

Thanks again ...sorry if I'm not particularly clear sometimes!

Iain
SELECT ag.Age_Group_Name, vr.Reason_Name, count(cr.ClientRef) as count_value
FROM  (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag
CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
LEFT JOIN ClientRef_Table cr
ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID 
GROUP BY ag.Age_Group_Name, vr.Reason_Name
HAVING sum(case when cr.VisitReason_FK is null then 0 else 1 end)  > 0 OR  sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr.Reason_Name is NOT NULL)  

Open in new window

0
 
LVL 15

Expert Comment

by:JimFive
ID: 20420266
To keep the permutations you will need to put the WHERE condition into the ON clause like this:
ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID
AND cr.ClientRef = 'B7'

--
JimFive
0
 
LVL 15

Accepted Solution

by:
JimFive earned 500 total points
ID: 20420282
Since you changed your select to be count(cr.ClientRef) you can change your having to
HAVING count(cr.ClientRef) > 0 OR (ag.Age_Group_Name is Not Null AND vr.Reason_Name is Not Null)
--
JimFive
0
 

Author Comment

by:iaing1000
ID: 20421344
Hi,

That's looking very good now. One final thing, though.

I'd like to have the cross joined tables referred to as a single aliased entity. Is this possible?

Thanks

Something like:.....but it's not liking my example code!
FROM  ((SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag
CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr) as all_links
LEFT JOIN ClientRef_Table cr
ON COALESCE(cr.Age_Group_FK,0) = all_links.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = all_links.VisitReason_ID 
etc.

Open in new window

0
 
LVL 15

Expert Comment

by:JimFive
ID: 20421943
I don't really recommend it as it seems to be harder to read, but this should work.
FROM (SELECT Age_Group_ID, Age_Group_Name, VisitReason_ID, Reason_Name
           FROM (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag
                       CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr) as all_links

What you might want to do though is create a view called all_links that contains the cross join then you can just do
FROM all_links LEFT JOIN ClientRef_Table

You'll get various reactions to this idea, from an efficiency standpoint it may be slightly less efficient, but I like to separate out derived tables if they are either going to be reused a lot, or if they are getting confusing to read.
--
JimFive
0
 

Author Comment

by:iaing1000
ID: 20422100
Hi,

Thanks ever so much, that works well.

There is, of course, one last thing that I forgot!...my apologies

I want to be able to group by gender, but there is no look up table. Is there a way to CROSS JOIN the values for gender (namely: <NULL>,'Male','Female') into all_links?...by just creating the values on the fly?

I realise that this could be done by selecting DISTINCT(Gender) from ClientRef_Table but that may be quite slow (let's imagine there's a hypothetical gender field  in ClientRef_Table, since this is only a made up example). My real dataset looks like it will require a nice CROSS JOIN which contains all groupings I need (at some point I may indeed put this into a view or similar to make it easier to read and of course have the ability to re-use it).

Thanks once again,
Iain
0
 
LVL 15

Expert Comment

by:JimFive
ID: 20422316
Sure, do it just like I added the nulls to the other tables add
CROSS JOIN (Select Null as Gender UNION Select 'Male' as Gender UNION Select 'Female' as Gender) gender
0
 

Author Comment

by:iaing1000
ID: 20422346
To illustrate my last point, this will work if there is a Gender field in ClientRef_Table. But I'd like it without the SELCT DISTINCT(Gender) FROM ClientRef_Table if possible and instead have something with the literal values plugged in...

Cheers
Iain
SELECT all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender, count(cr.ClientRef) as count_value
FROM (SELECT Age_Group_ID, Age_Group_Name, VisitReason_ID, Reason_Name, Gender
   FROM (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag 
      CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
      CROSS JOIN (SELECT DISTINCT(Gender) FROM ClientRef_Table) gr) as all_links
LEFT JOIN (SELECT * FROM ClientRef_Table WHERE ClientRef like 'B3%') cr
ON COALESCE(cr.Age_Group_FK,0) = all_links.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = all_links.VisitReason_ID 
AND cr.Gender = all_links.Gender
GROUP BY all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender
HAVING count(cr.ClientRef) > 0 OR (all_links.Age_Group_Name is Not Null AND all_links.Reason_Name is Not Null AND all_links.Gender is Not Null)

Open in new window

0
 

Author Comment

by:iaing1000
ID: 20422535
Looking very nearly there,

What do I need to use for the COALESCE for Gender? It doesn't seem to like 0 because of a type conversion obviously, but if I put NULL in its place it doesn't return NULL genders from ClientRef_Table ither?

Cheers
Iain
SELECT all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender, count(cr.ClientRef) as count_value
FROM (SELECT Age_Group_ID, Age_Group_Name, VisitReason_ID, Reason_Name, Gender
   FROM (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag 
      CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
      CROSS JOIN (Select Null as Gender UNION Select 'Male' as Gender UNION Select 'Female' as Gender) gr) as all_links
LEFT JOIN (SELECT * FROM ClientRef_Table WHERE ClientRef like 'B3%') cr
ON COALESCE(cr.Age_Group_FK,0) = all_links.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = all_links.VisitReason_ID 
AND COALESCE (cr.Gender,NULL) = all_links.Gender
GROUP BY all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender
HAVING count(cr.ClientRef) > 0 OR (all_links.Age_Group_Name is Not Null AND all_links.Reason_Name is Not Null AND all_links.Gender is Not Null)

Open in new window

0
 
LVL 15

Expert Comment

by:JimFive
ID: 20423547
You're right, that doesn't quite work.  COALESCE returns the first argument that is NOT NULL.
So for your gender you'll want something like (SELECT 'Unknown' as Gender UNION Select 'Male' as Gender UNION SELECT 'Female' as Gender) as gender

And then COALESCE(cr.Gender,'Unknown') = all_links.Gender

If you want it to show <NULL> like the other fields use '<NULL>' to fake it.
0
 

Author Comment

by:iaing1000
ID: 20423717
Hi,

I eventually saw a way to fake it...in a way that seemed to standardise as much as possible with the rest of it!

I'm doing a whole bunch of similar reports which are built up via strings in code, based on user input, so I want to keep it as formulaic as possible. For instance, all the elements of the final clause of the HAVING part have the 'IS NOT NULL' and this means I can generate it simply in a loop.....hence my need to have the Gender NULLs thing working.

Many thanks for staying with me on this tricky one.

Thanks once again

All the best
Iain
SELECT all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender, count(cr.ClientRef) as count_value
FROM (SELECT Age_Group_ID, Age_Group_Name, VisitReason_ID, Reason_Name, Gender_ID, Gender
   FROM (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag 
      CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
      CROSS JOIN (Select '' as Gender_ID, NULL as Gender UNION Select 'Male' as Gender_ID, 'Male' as Gender UNION Select 'Female' as Gender_ID, 'Female' as Gender) gr) as all_links
LEFT JOIN (SELECT * FROM ClientRef_Table) cr
ON COALESCE(cr.Age_Group_FK,0) = all_links.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = all_links.VisitReason_ID 
AND COALESCE(cr.Gender,'') = all_links.Gender_ID
GROUP BY all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender
HAVING count(cr.ClientRef) > 0 OR (all_links.Age_Group_Name is Not Null AND all_links.Reason_Name is Not Null AND all_links.Gender is NOT NULL)

Open in new window

0

Featured Post

What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
In this article we will learn how to fix  “Cannot install SQL Server 2014 Service Pack 2: Unable to install windows installer msi file” error ?
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed

696 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question