databarracks

SQL Query Using Group By, Distinct but Return All values

Hi there,

I am a little stuck with a SQL query that I use to populate a pivot grid control in my application. I have attached an Excel spreadsheet to illustrate what the source data looks like and what my desired query result should look like.

The issue I am facing is that when I group by User, Date and Stage, the result only shows values where there were actually records within that specific week.

SELECT   dbo.MyTable.[User], dbo.MyTable.Stage, dbo.MyTable.[Date], SUM(dbo.MyTable.Amount) AS Amount, COUNT(DISTINCT dbo.MyTable.Id) AS Total
FROM     dbo.MyTable
GROUP BY dbo.MyTable.[User], dbo.MyTable.Stage, dbo.MyTable.[Date]



Could someone please assist me with this query? Your help is most appreciated.
SQL-Query-Plot.xls
arnold

Add a where count(distinct dbo.mytable.id) >0
databarracks

ASKER

Hi Arnold,

Apologies, but your response doesn't make sense to me. Could you please explain how this is going to work?
I get an "An aggregate may not appear in the WHERE clause unless it is in a subquery...etc" message.
You are grouping by dbo.MyTable.User, dbo.MyTable.Stage and dbo.MyTable.Date, meaning the combination of these three, and then you are counting ID, which is unique to each row.

You have the desired result; what is the actual result of your query based on the data set?

Rereading what you are looking for: you want data that does not exist to be reflected in the report, i.e. Step 2 for Mark on 03/17/2014, which does not exist because Mark did not have Step 2 on that date.
That is not how GROUP BY works, since this is a single table and you

Using WITH (a CTE), get the range of stages:
https://technet.microsoft.com/en-us/library/ms190766%28v=sql.105%29.aspx

You want to get the distinct stages available, then get the data to include them. Let me think on that...
You must reference the aggregate in a HAVING clause, not a WHERE clause.

After the GROUP BY clause, add:

HAVING count(distinct dbo.mytable.id) >0
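
For illustration, here is a minimal sketch of where the HAVING clause sits, using the column names from the original question and assuming Id is never NULL. HAVING filters groups after GROUP BY has formed them, so it can only remove groups. Every group that exists already contains at least one row, so a `> 0` test keeps them all and cannot bring back User/Stage/Date combinations that have no rows.

```sql
-- Sketch only: aggregate conditions go in HAVING, which is evaluated after GROUP BY.
SELECT   [User], Stage, [Date],
         SUM(Amount)        AS Amount,
         COUNT(DISTINCT Id) AS Total
FROM     dbo.MyTable
GROUP BY [User], Stage, [Date]
HAVING   COUNT(DISTINCT Id) > 0;  -- true for every existing group when Id is not NULL
```
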
Hi Paul and Arnold,

I added the HAVING after the group by already and it still doesn't show rows that didn't exist in a grouped date. It basically shows the same rows as my original query.

I think Arnold is right that a CTE might be the correct approach, but I am not sure how this will affect performance and I have no idea how to build the query.

I am guessing I need to write a "WITH" CTE that obtains all the Stage names from the table first and then does some kind of join... I am completely out of my depth here, apologies.

Again I really appreciate your help so far on this matter.
You would need an outer join.

https://social.technet.microsoft.com/Forums/sqlserver/en-US/27266322-8b7e-4f4c-9f85-95418fcee5ef/cte-full-outer-join-question?forum=transactsql

This might be what you need (it might repeat things unnecessarily):
;with cte as (select distinct stage from mytable group by stage)  
select a.user,cte.stage,sum(a.amount) as Amount,count(distinct a.id) as Count, a.date from mytable a outer join cte on cte.stage=a.stage


this result:
| user |       date | amount |  stage |
|------|------------|--------|--------|
| MARK | 17/03/2015 |     60 | Step 1 |
| MARK | 17/03/2015 |      0 | Step 2 |
| MARK | 17/03/2015 |     40 | Step 3 |
| MARK | 17/03/2015 |      0 | Step 4 |
| MARK | 24/03/2015 |      0 | Step 4 |
| MARK | 24/03/2015 |      5 | Step 3 |
| MARK | 24/03/2015 |     10 | Step 2 |
| MARK | 24/03/2015 |     25 | Step 1 |
| MARK | 31/03/2015 |     10 | Step 1 |
| MARK | 31/03/2015 |      5 | Step 2 |
| MARK | 31/03/2015 |      0 | Step 3 |
| MARK | 31/03/2015 |      2 | Step 4 |
| JOHN | 24/03/2015 |     10 | Step 1 |
| JOHN | 24/03/2015 |      0 | Step 4 |
| JOHN | 24/03/2015 |      0 | Step 3 |
| JOHN | 24/03/2015 |      0 | Step 2 |
| JOHN | 31/03/2015 |      7 | Step 2 |
| JOHN | 31/03/2015 |      3 | Step 2 |
| JOHN | 31/03/2015 |      1 | Step 3 |
| JOHN | 31/03/2015 |      0 | Step 4 |
| JOHN | 31/03/2015 |      9 | Step 1 |
        



Produced by this query:
select t.[user], t.date, coalesce(m.amount,0) as amount, t.stage
from (select *
      from (select distinct [user], date from mytable) ud
      cross join (select distinct stage from mytable) s
      ) t
left join mytable m on t.[user] = m.[user] and t.date = m.date and t.stage = m.stage
order by [user] DESC, date
;



details:
    CREATE TABLE MyTable
        ([ID] int, [USER] varchar(4), [STAGE] varchar(6), [AMOUNT] int, [DATE] varchar(10))
    ;
        
    INSERT INTO MyTable
        ([ID], [USER], [STAGE], [AMOUNT], [DATE])
    VALUES
        (123, 'MARK', 'Step 1', 60, '17/03/2015'),
        (124, 'MARK', 'Step 3', 40, '17/03/2015'),
        (125, 'MARK', 'Step 1', 25, '24/03/2015'),
        (126, 'MARK', 'Step 2', 10, '24/03/2015'),
        (127, 'MARK', 'Step 3', 5, '24/03/2015'),
        (128, 'MARK', 'Step 2', 5, '31/03/2015'),
        (129, 'MARK', 'Step 1', 10, '31/03/2015'),
        (130, 'MARK', 'Step 4', 2, '31/03/2015'),
        (131, 'JOHN', 'Step 1', 10, '24/03/2015'),
        (132, 'JOHN', 'Step 1', 9, '31/03/2015'),
        (133, 'JOHN', 'Step 2', 7, '31/03/2015'),
        (134, 'JOHN', 'Step 2', 3, '31/03/2015'),
        (135, 'JOHN', 'Step 3', 1, '31/03/2015')
    ;
    
The query and its results are the same as shown above; SQL Fiddle: http://sqlfiddle.com/#!6/33474/7 (results: http://sqlfiddle.com/#!6/33474/7/0)


Hi Arnold,

I am still getting the same results:

Arnold - I had to add a few things to your query, such as the GROUP BY, and had to change the join type, as a plain OUTER JOIN doesn't exist, so I used LEFT OUTER JOIN instead. The query produced results, but exactly the same as before.

Paul - If you look at your result set above, it doesn't match the desired result in my spreadsheet, as all users should have 4 entries for each date, with all steps displayed.
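
Both observations have a likely explanation. Joining only a list of distinct stages back to the table cannot add rows, because every stage in that list already has at least one matching row. The cross join query does add the missing combinations, but it does not aggregate, so JOHN's two Step 2 rows on 31/03/2015 stay separate. What follows is a sketch of one way to combine the two ideas. It is not necessarily the accepted answer. It assumes the MyTable definition from the "details" block above, and it assumes the desired output is one row for every user, date and stage, since the spreadsheet is not reproduced here.

```sql
-- Sketch only: build every user x date x stage combination, then aggregate.
SELECT
    u.[user],
    d.[date],
    s.stage,
    COALESCE(SUM(m.amount), 0) AS amount,  -- 0 when no source row matches
    COUNT(DISTINCT m.id)       AS total    -- COUNT skips NULLs, so unmatched combinations give 0
FROM       (SELECT DISTINCT [user] FROM MyTable) AS u
CROSS JOIN (SELECT DISTINCT [date] FROM MyTable) AS d
CROSS JOIN (SELECT DISTINCT stage  FROM MyTable) AS s
LEFT JOIN  MyTable AS m
       ON  m.[user] = u.[user]
       AND m.[date] = d.[date]
       AND m.stage  = s.stage
GROUP BY u.[user], d.[date], s.stage
ORDER BY u.[user] DESC, d.[date], s.stage;
```

With the sample rows this yields 24 rows (2 users x 3 dates x 4 stages), and JOHN's Step 2 on 31/03/2015 becomes a single row with amount 10 and total 2. If only the user/date pairs that actually occur should appear, the first two derived tables could be replaced by a single `SELECT DISTINCT [user], [date] FROM MyTable`, as in the query above.
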
ASKER CERTIFIED SOLUTION
PortletPaul

This solution is only available to members.
You have two joins, one on the stage and one on the date.
Co-mingling between different DBs.

...

Is this an assignment?
Hi Paul,

You solved it :) Hooray, well done on that one; I really appreciate your help. Thank you too, Arnold, for your help; it is much appreciated. P.S. It wasn't an assignment. We are trying to run a few queries from Salesforce for our business, so I was testing to see whether what our team required was achievable.

Thanks again guys
Excellent stuff from Paul, and a mention to Arnold, who also didn't give up on resolving this matter. Brilliant :)