Solved

GROUP BY multiple fields, to include all possible grouped permutations, and also non matching NULLS

Posted on 2007-12-05
20
226 Views
Last Modified: 2007-12-06
Hi,

I have a table (below) for which I need to get a count of the ClientRef numbers, grouped by the two fields Age_Group_FK and VisitReason_FK. The grouping fields obviously join to the two other tables shown. I want ClientRef counts of zero for all permutations not shown,  the important point being that I need to get all the permutations of Age_Group vs VisitReason. I have entirely made this example up, but it serves for the purposes of illustration!

ClientRef  Age_Group_FK VisitReason_FK
---------------------------------------------------
B1                2                     4
B2                2                     3
B3                <NULL>          1
B4                2                     3
B5                2                     <NULL>
B6                2                     3
B7                2                     3
B8                1                     1

Age_Group_ID   Age_Group_Name
--------------------------------------------
1                             Under 15                
2                             16-24
3                             25-40
4                             Over 40

VisitReason_ID    Reason_Name
----------------------------------------
1                         Buy Magazine    
2                         Buy Groceries
3                         Goods Delivery
4                         Buy Alcohol
5                         Buy Pet Food

As you can see there'll need to be 20 (5 X 4) groupings in the final result, with many of them zero of course.

I'm using CROSS JOINS to do this:

select ag.Age_Group_Name, vr.Reason_Name, sum(case when cr.Age_Group_FK is null then 0 else 1 end) as count_value
from Age_Group ag
cross join VisitReason vr
left join ClientRef_Table cr
  on cr.Age_Group_FK = ag.Age_Group_ID
 and cr.VisitReason_FK = vr.VisitReason_ID
group by ag.Age_Group_Name, vr.Reason_Name

....but, here's the bit I can't do:

In some instances there will be a <NULL> value in the client table and it won't have a lookup in any corresponding table, but I need it grouped. So, in the example, I'd like an extra 2 columns in the groupings: For the grouping of Age_Group 16-24 against <NULL> for VisitReason, and for Age_Group <NULL> against VistReason Magazine, totalling, you gueseed it, 22 grouped records altogether.



Currently, the way it does it is by providing an extra grouping with <NULL> in BOTH Age_Group and VisitReason columns. In fact, if there is a <NULL> value in any or either of the grouped columns they all get counted together as <NULL>,<NULL> for both grouped fields, leaving me with 21 grouped records (with a count of 2 in thi case). Unfortunately, putting a <NULL> value in the lookup tables is not possible.


Thanks in advance
Iain
0
Comment
Question by:iaing1000
  • 10
  • 10
20 Comments
 
LVL 15

Expert Comment

by:JimFive
ID: 20412923
First why not put a 0 in the lookup_table for N/A or Unknown.

Second, try this as your from:
from (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag
cross join (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr
left join ClientRef_Table cr
  on cr.Age_Group_FK = ag.Age_Group_ID
 and cr.VisitReason_FK = vr.VisitReason_ID

--
JimFive
0
 
LVL 15

Expert Comment

by:JimFive
ID: 20412942
Oops that should be

ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID
0
 

Author Comment

by:iaing1000
ID: 20413285
Hi,

Thanks for the fast response,

The query runs OK, but provides 30 rows   (4 + NULL=5) X (5 + NULL =6)

It is really imperative and a requirement that I return specifically 22 rows. That mean exactly two groupings for the two incidences of <NULL> values as I mentioned. I suspect that will be difficult...but that's why I'm at EE!

Thanks
Iain
0
 

Author Comment

by:iaing1000
ID: 20413571
Hi again,

To simplify the problem somewhat, we can forget about the grouping....the problem can be highlighted before that at the JOIN stage:

If we just join the cross join the two lookup tables (visitreason and age_group), then a RIGHT JOIN to clientref will yield 8 records, but records B3 and B5 both have NULL for both VisitReason and Age_Group???????

So the question becomes: What needs to happen in the join so that B3 correctly shows VisitReason as 'Buy Magazine', and similarly that B5 is listed as age 16-24?...once that is resolved I can use LEFT or FULL OUTER JOIN to get the requireed range of permutations of Age Group and Visit Reason


Thanks
Iain
0
 
LVL 15

Expert Comment

by:JimFive
ID: 20413733
I think what you're going to need to do is weed out those rows with NULL and a count of 0 in the where clause

WHERE sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr_VisitReason is NOT NULL)

--
JimFive
So the entire thing looks like:
SELECT ag.Age_Group_Name, vr.Reason_Name, sum(case when cr.Age_Group_FK is null then 0 else 1 end) as count_value

FROM  (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag

CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr

LEFT JOIN ClientRef_Table cr

ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID

AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID

WHERE sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr_VisitReason is NOT NULL)

Open in new window

0
 
LVL 15

Expert Comment

by:JimFive
ID: 20413792
Eliminate the WHERE in that last and change to

GROUP BY ag.Age_Group_Name, vr.Reason_Name
HAVING sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr_VisitReason is NOT NULL)
 

   

0
 

Author Comment

by:iaing1000
ID: 20413930
Thanks again,

Will have to look at it tomorrow, but it looks good at a glance!

Cheers
Iain
0
 

Author Comment

by:iaing1000
ID: 20418607
Hi Jim,

I've just had a look at this. There's a major league problem though.

If the client table is empty there are no records returned.

In the example, even if the client table were empty, we would expect 20 records returned (for each of the possible permutations of Age_Group vs VisitReason) with a client count of zero for each one.

Thanks again
Iain
0
 
LVL 15

Expert Comment

by:JimFive
ID: 20419300
This is working for me:

SELECT ag.Age_Group_Name, vr.Reason_Name, sum(case when cr.Age_Group_FK is null then 0 else 1 end) as count_value

FROM  (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag

CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr

LEFT JOIN ClientRef_Table cr

ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID

AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID

GROUP BY ag.Age_Group_Name, vr.Reason_Name

HAVING sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr.Reason_Name is NOT NULL) 

Open in new window

0
 

Author Comment

by:iaing1000
ID: 20419649
Hi,

I've done it with a slightly modified HAVING clause, as attached snippet, (otherewise it didn't allow for B3, with a <NULL> Age_Group, from the original example), is this what you meant? I've also changed the main SELECT to use Count() instead of SUM for the same reason....does that seem right?

I think my remaining problem lies in the fact that I will sometimes only select certain records from ClientRef_Table, say those who are 40 years old or less, and when I put the WHERE clause in to filter these, all the permutations of grouping are lost. To illustrate brutally, if I put WHERE cr.ClientRef='B7' into the SQL code after the COALESCE clauses then I'd like 20 records returned with only the Age_Group '16-24', Visit Reason 'Goods Delivery' permutation showing a non-zero count (specifically, a count of 1, since that is where B7 falls into).  

Thanks again ...sorry if I'm not particularly clear sometimes!

Iain
SELECT ag.Age_Group_Name, vr.Reason_Name, count(cr.ClientRef) as count_value

FROM  (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag

CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr

LEFT JOIN ClientRef_Table cr

ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID

AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID 

GROUP BY ag.Age_Group_Name, vr.Reason_Name

HAVING sum(case when cr.VisitReason_FK is null then 0 else 1 end)  > 0 OR  sum(case when cr.Age_Group_FK is null then 0 else 1 end)  > 0 OR (ag.Age_Group_Name is NOT NULL AND vr.Reason_Name is NOT NULL)  

Open in new window

0
Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

 
LVL 15

Expert Comment

by:JimFive
ID: 20420266
To keep the permutations you will need to put the WHERE condition into the ON clause like this:
ON COALESCE(cr.Age_Group_FK,0) = ag.Age_Group_ID
AND COALESCE(cr.VisitReason_FK,0) = vr.VisitReason_ID
AND cr.ClientRef = 'B7'

--
JimFive
0
 
LVL 15

Accepted Solution

by:
JimFive earned 500 total points
ID: 20420282
Since you changed your select to be count(cr.ClientRef) you can change your having to
HAVING count(cr.ClientRef) > 0 OR (ag.Age_Group_Name is Not Null AND vr.Reason_Name is Not Null)
--
JimFive
0
 

Author Comment

by:iaing1000
ID: 20421344
Hi,

That's looking very good now. One final thing, though.

I'd like to have the cross joined tables referred to as a single aliased entity. Is this possible?

Thanks

Something like:.....but it's not liking my example code!
FROM  ((SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag

CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr) as all_links

LEFT JOIN ClientRef_Table cr

ON COALESCE(cr.Age_Group_FK,0) = all_links.Age_Group_ID

AND COALESCE(cr.VisitReason_FK,0) = all_links.VisitReason_ID 

etc.

Open in new window

0
 
LVL 15

Expert Comment

by:JimFive
ID: 20421943
I don't really recommend it as it seems to be harder to read, but this should work.
FROM (SELECT Age_Group_ID, Age_Group_Name, VisitReason_ID, Reason_Name
           FROM (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag
                       CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr) as all_links

What you might want to do though is create a view called all_links that contains the cross join then you can just do
FROM all_links LEFT JOIN ClientRef_Table

You'll get various reactions to this idea, from an efficiency standpoint it may be slightly less efficient, but I like to separate out derived tables if they are either going to be reused a lot, or if they are getting confusing to read.
--
JimFive
0
 

Author Comment

by:iaing1000
ID: 20422100
Hi,

Thanks ever so much, that works well.

There is, of course, one last thing that I forgot!...my apologies

I want to be able to group by gender, but there is no look up table. Is there a way to CROSS JOIN the values for gender (namely: <NULL>,'Male','Female') into all_links?...by just creating the values on the fly?

I realise that this could be done by selecting DISTINCT(Gender) from ClientRef_Table but that may be quite slow (let's imagine there's a hypothetical gender field  in ClientRef_Table, since this is only a made up example). My real dataset looks like it will require a nice CROSS JOIN which contains all groupings I need (at some point I may indeed put this into a view or similar to make it easier to read and of course have the ability to re-use it).

Thanks once again,
Iain
0
 
LVL 15

Expert Comment

by:JimFive
ID: 20422316
Sure, do it just like I added the nulls to the other tables add
CROSS JOIN (Select Null as Gender UNION Select 'Male' as Gender UNION Select 'Female' as Gender) gender
0
 

Author Comment

by:iaing1000
ID: 20422346
To illustrate my last point, this will work if there is a Gender field in ClientRef_Table. But I'd like it without the SELCT DISTINCT(Gender) FROM ClientRef_Table if possible and instead have something with the literal values plugged in...

Cheers
Iain
SELECT all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender, count(cr.ClientRef) as count_value

FROM (SELECT Age_Group_ID, Age_Group_Name, VisitReason_ID, Reason_Name, Gender

   FROM (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag 

      CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr

      CROSS JOIN (SELECT DISTINCT(Gender) FROM ClientRef_Table) gr) as all_links

LEFT JOIN (SELECT * FROM ClientRef_Table WHERE ClientRef like 'B3%') cr

ON COALESCE(cr.Age_Group_FK,0) = all_links.Age_Group_ID

AND COALESCE(cr.VisitReason_FK,0) = all_links.VisitReason_ID 

AND cr.Gender = all_links.Gender

GROUP BY all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender

HAVING count(cr.ClientRef) > 0 OR (all_links.Age_Group_Name is Not Null AND all_links.Reason_Name is Not Null AND all_links.Gender is Not Null)

Open in new window

0
 

Author Comment

by:iaing1000
ID: 20422535
Looking very nearly there,

What do I need to use for the COALESCE for Gender? It doesn't seem to like 0 because of a type conversion obviously, but if I put NULL in its place it doesn't return NULL genders from ClientRef_Table ither?

Cheers
Iain
SELECT all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender, count(cr.ClientRef) as count_value

FROM (SELECT Age_Group_ID, Age_Group_Name, VisitReason_ID, Reason_Name, Gender

   FROM (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag 

      CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr

      CROSS JOIN (Select Null as Gender UNION Select 'Male' as Gender UNION Select 'Female' as Gender) gr) as all_links

LEFT JOIN (SELECT * FROM ClientRef_Table WHERE ClientRef like 'B3%') cr

ON COALESCE(cr.Age_Group_FK,0) = all_links.Age_Group_ID

AND COALESCE(cr.VisitReason_FK,0) = all_links.VisitReason_ID 

AND COALESCE (cr.Gender,NULL) = all_links.Gender

GROUP BY all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender

HAVING count(cr.ClientRef) > 0 OR (all_links.Age_Group_Name is Not Null AND all_links.Reason_Name is Not Null AND all_links.Gender is Not Null)

Open in new window

0
 
LVL 15

Expert Comment

by:JimFive
ID: 20423547
You're right, that doesn't quite work.  COALESCE returns the first argument that is NOT NULL.
So for your gender you'll want something like (SELECT 'Unknown' as Gender UNION Select 'Male' as Gender UNION SELECT 'Female' as Gender) as gender

And then COALESCE(cr.Gender,'Unknown') = all_links.Gender

If you want it to show <NULL> like the other fields use '<NULL>' to fake it.
0
 

Author Comment

by:iaing1000
ID: 20423717
Hi,

I eventually saw a way to fake it...in a way that seemed to standardise as much as possible with the rest of it!

I'm doing a whole bunch of similar reports which are built up via strings in code, based on user input, so I want to keep it as formulaic as possible. For instance, all the elements of the final clause of the HAVING part have the 'IS NOT NULL' and this means I can generate it simply in a loop.....hence my need to have the Gender NULLs thing working.

Many thanks for staying with me on this tricky one.

Thanks once again

All the best
Iain
SELECT all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender, count(cr.ClientRef) as count_value

FROM (SELECT Age_Group_ID, Age_Group_Name, VisitReason_ID, Reason_Name, Gender_ID, Gender

   FROM (SELECT 0 as Age_Group_ID, NULL as Age_Group_Name UNION SELECT Age_Group_ID, Age_Group_Name FROM Age_Group) ag 

      CROSS JOIN (SELECT 0 as VisitReason_ID, NULL as Reason_Name UNION SELECT VisitReason_ID, Reason_Name FROM VisitReason) vr

      CROSS JOIN (Select '' as Gender_ID, NULL as Gender UNION Select 'Male' as Gender_ID, 'Male' as Gender UNION Select 'Female' as Gender_ID, 'Female' as Gender) gr) as all_links

LEFT JOIN (SELECT * FROM ClientRef_Table) cr

ON COALESCE(cr.Age_Group_FK,0) = all_links.Age_Group_ID

AND COALESCE(cr.VisitReason_FK,0) = all_links.VisitReason_ID 

AND COALESCE(cr.Gender,'') = all_links.Gender_ID

GROUP BY all_links.Age_Group_Name, all_links.Reason_Name, all_links.Gender

HAVING count(cr.ClientRef) > 0 OR (all_links.Age_Group_Name is Not Null AND all_links.Reason_Name is Not Null AND all_links.Gender is NOT NULL)

Open in new window

0

Featured Post

Comprehensive Backup Solutions for Microsoft

Acronis protects the complete Microsoft technology stack: Windows Server, Windows PC, laptop and Surface data; Microsoft business applications; Microsoft Hyper-V; Azure VMs; Microsoft Windows Server 2016; Microsoft Exchange 2016 and SQL Server 2016.

Join & Write a Comment

Introduction SQL Server Integration Services can read XML files, that’s known by every BI developer.  (If you didn’t, don’t worry, I’m aiming this article at newcomers as well.) But how far can you go?  When does the XML Source component become …
International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
Via a live example, show how to backup a database, simulate a failure backup the tail of the database transaction log and perform the restore.

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now