Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

using DISTINCT in a complex query

Posted on 2013-06-18
14
Medium Priority
?
219 Views
Last Modified: 2013-06-19
Hi

Im trying to get distinct only customerids for the  part of the query below after
-- want distinct iCustomerids only

Using distinct in the select doesnt work and I just want 1 row returned for a customer without it messing up the total in the first part

Is this possible without tmp tables etc?

select     vState,count(DISTINCT icustomerid) as [Total Pharmacies],
                     sum(case when iMonth =  1 then 1 else 0 end) "Jan",
                     sum(case when iMonth =  2 then 1 else 0 end) "Feb",
                     sum(case when iMonth =  3 then 1 else 0 end) "Mar",
                     sum(case when iMonth =  4 then 1 else 0 end) "Apr",
                     sum(case when iMonth =  5 then 1 else 0 end) "May",
                     sum(case when iMonth =  6 then 1 else 0 end) "Jun",
                     sum(case when iMonth =  7 then 1 else 0 end) "Jul",
                     sum(case when iMonth =  8 then 1 else 0 end) "Aug",
                     sum(case when iMonth =  9 then 1 else 0 end) "Sep",
                     sum(case when iMonth = 10 then 1 else 0 end) "Oct",
                     sum(case when iMonth = 11 then 1 else 0 end) "Nov",
                     sum(case when iMonth = 12 then 1 else 0 end) "Dec"
            from (
-- want distinct iCustomerids only
            select  vState,iSurveyResponseId, dbo.tbCustomer.iCustomerid,
                        datepart(month,  tbSurveyResponseFeedback.dtResponse) as iMonth
            from  
                        dbo.tbCustomer left  join tbSurveyResponseFeedback on tbSurveyResponseFeedback.iCustomerid =  dbo.tbCustomer.iCustomerId
            and     (dtResponse BETWEEN CONVERT(DATETIME, @dtYearStart, 102) AND GETDATE()) AND (iSurveyQuestionId = @iSurveyQuestionId)
            where  vState in ('NSW','ACT','QLD','SA','VIC','TAS','WA','NT') and dbo.tbCustomer.iClientid = @iClientid
            and  dbo.tbCustomer.iCustomerid in(select iCustomerid from tbCustomergroup where ((iGroupid = @iTopGroupid and  dtinactive is null) or @iTopGroupid = 0))
            )
            ) data
      
            group by vState
            
ORDER BY VSTATE
0
Comment
Question by:louise_8
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 5
  • 2
14 Comments
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 39258243
Unfortunately, without having all the aliases let alone the schema for the tables involved it is difficult to guess at the right result.  For example, what tables do the following columns belong to:
vState, iSurveyResponseId, dtResponse, iSurveyQuestionId
0
 

Author Comment

by:louise_8
ID: 39258277
does this help?

select     tbCustomer.vState,count(DISTINCT tbCustomer.icustomerid) as [Total Pharmacies],
                     sum(case when iMonth =  1 then 1 else 0 end) "Jan",
                     sum(case when iMonth =  2 then 1 else 0 end) "Feb",
                     sum(case when iMonth =  3 then 1 else 0 end) "Mar",
                     sum(case when iMonth =  4 then 1 else 0 end) "Apr",
                     sum(case when iMonth =  5 then 1 else 0 end) "May",
                     sum(case when iMonth =  6 then 1 else 0 end) "Jun",
                     sum(case when iMonth =  7 then 1 else 0 end) "Jul",
                     sum(case when iMonth =  8 then 1 else 0 end) "Aug",
                     sum(case when iMonth =  9 then 1 else 0 end) "Sep",
                     sum(case when iMonth = 10 then 1 else 0 end) "Oct",
                     sum(case when iMonth = 11 then 1 else 0 end) "Nov",
                     sum(case when iMonth = 12 then 1 else 0 end) "Dec"
            from (
-- want distinct iCustomerids only
            select  tbCustomer.vState,tbSurveyResponseFeedback.iSurveyResponseId, dbo.tbCustomer.iCustomerid,
                        datepart(month,  tbSurveyResponseFeedback.dtResponse) as iMonth
            from  
                        dbo.tbCustomer left  join tbSurveyResponseFeedback on tbSurveyResponseFeedback.iCustomerid =  dbo.tbCustomer.iCustomerId
            and     (dtResponse BETWEEN CONVERT(DATETIME, @dtYearStart, 102) AND GETDATE()) AND (iSurveyQuestionId = @iSurveyQuestionId)
            where  vState in ('NSW','ACT','QLD','SA','VIC','TAS','WA','NT') and dbo.tbCustomer.iClientid = @iClientid
            and  dbo.tbCustomer.iCustomerid in(select iCustomerid from tbCustomergroup where ((iGroupid = @iTopGroupid and  dtinactive is null) or @iTopGroupid = 0))
            )
            ) data
     
            group by tbCustomer.vState
           
ORDER BY tbCustomer.VSTATE
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 39258347
I still don't know what tables the following columns belong to:
dtResponse or iSurveyQuestionId

Can a  iCustomerid belong to more than one vState and more than one dtResponse?  If the answer is yes, which one do you want?  If the answer is no then the following should do it:

SELECT  data.vState,
        COUNT(DISTINCT data.icustomerid) AS [Total Pharmacies],
        SUM(CASE data.iMonth WHEN 1 THEN 1 ELSE 0 END) [Jan],
        SUM(CASE data.iMonth WHEN 2 THEN 1 ELSE 0 END) [Feb],
        SUM(CASE data.iMonth WHEN 3 THEN 1 ELSE 0 END) [Mar],
        SUM(CASE data.iMonth WHEN 4 THEN 1 ELSE 0 END) [Apr],
        SUM(CASE data.iMonth WHEN 5 THEN 1 ELSE 0 END) [May],
        SUM(CASE data.iMonth WHEN 6 THEN 1 ELSE 0 END) [Jun],
        SUM(CASE data.iMonth WHEN 7 THEN 1 ELSE 0 END) [Jul],
        SUM(CASE data.iMonth WHEN 8 THEN 1 ELSE 0 END) [Aug],
        SUM(CASE data.iMonth WHEN 9 THEN 1 ELSE 0 END) [Sep],
        SUM(CASE data.iMonth WHEN 10 THEN 1 ELSE 0 END) [Oct],
        SUM(CASE data.iMonth WHEN 11 THEN 1 ELSE 0 END) [Nov],
        SUM(CASE data.iMonth WHEN 12 THEN 1 ELSE 0 END) [Dec]
FROM    (
         SELECT MAX(c.vState) vState,
                -- s.iSurveyResponseId,
                c.iCustomerid,
                MAX(DATEPART(month, s.dtResponse)) AS iMonth
         FROM   dbo.tbCustomer c
                LEFT  JOIN tbSurveyResponseFeedback s ON s.iCustomerid = c.iCustomerId
                                                       AND (dtResponse BETWEEN CONVERT(DATETIME, @dtYearStart, 102) AND GETDATE())
                                                       AND (iSurveyQuestionId = @iSurveyQuestionId)
         WHERE  c.vState IN ('NSW', 'ACT', 'QLD', 'SA', 'VIC', 'TAS', 'WA', 'NT')
                AND c.iClientid = @iClientid
                AND c.iCustomerid IN (SELECT   iCustomerid
                                      FROM     tbCustomergroup
                                      WHERE    (iGroupid = @iTopGroupid
                                                AND dtinactive IS NULL)
                                                OR @iTopGroupid = 0
                                      )
         GROUP BY
		c.iCustomerid
         
        ) data
GROUP BY data.vState
ORDER BY data.vState

Open in new window

0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 

Author Comment

by:louise_8
ID: 39258478
Thanks again

Table tbCustomer iCustomerid is a unique field and in that table we see
1 icustomerid with 1 state per row
and bSurveyResponseFeedback can have many icustomerids with the same iSurveyQuestionId - tbSurveyResponseFeedback.iSurveyResponseId is the unique field here as with dtResponse

eg tbCustomer
iCustomerid vState
123               NSW
456               NSW

ybSurveyResponseFeedback
iSurveyResponseId  iCustomerid   dtResponse iSurveyQuestionid
1        123        '01/01/2013 12:00'       2
2        123        '01/01/2013 15:10'       2
3        123        '05/01/2013 13:09'       2
4        456        '09/03/2013 16:13'       2

I've included the table names for dtResponse and iSurveyQuestionId

select     data.vState,count(DISTINCT data.icustomerid) as [Total Pharmacies],
                     sum(case when data.iMonth =  1 then 1 else 0 end) "Jan",
                     sum(case when data.iMonth =  2 then 1 else 0 end) "Feb",
                     sum(case when data.iMonth =  3 then 1 else 0 end) "Mar",
                     sum(case when data.iMonth =  4 then 1 else 0 end) "Apr",
                     sum(case when data.iMonth =  5 then 1 else 0 end) "May",
                     sum(case when data.iMonth =  6 then 1 else 0 end) "Jun",
                     sum(case when data.iMonth =  7 then 1 else 0 end) "Jul",
                     sum(case when data.iMonth =  8 then 1 else 0 end) "Aug",
                     sum(case when data.iMonth =  9 then 1 else 0 end) "Sep",
                     sum(case when data.iMonth = 10 then 1 else 0 end) "Oct",
                     sum(case when data.iMonth = 11 then 1 else 0 end) "Nov",
                     sum(case when data.iMonth = 12 then 1 else 0 end) "Dec"
            from (
            select  dbo.tbCustomer.vState,tbSurveyResponseFeedback.iSurveyResponseId, dbo.tbCustomer.iCustomerid,
                        datepart(month,  tbSurveyResponseFeedback.dtResponse) as iMonth
            from  
                        dbo.tbCustomer left  join tbSurveyResponseFeedback on tbSurveyResponseFeedback.iCustomerid =  dbo.tbCustomer.iCustomerId
            and     (dtResponse BETWEEN CONVERT(DATETIME, @dtYearStart, 102) AND GETDATE()) AND (tbSurveyResponseFeedback.iSurveyQuestionId = @iSurveyQuestionId)
            where  dbo.tbCustomer.vState in ('NSW','ACT','QLD','SA','VIC','TAS','WA','NT') and dbo.tbCustomer.iClientid = @iClientid
            and  dbo.tbCustomer.iCustomerid in(select iCustomerid from tbCustomergroup where ((iGroupid = @iTopGroupid and  dtinactive is null) or @iTopGroupid = 0))
            and dbo.tbCustomer.iCustomerid in(select iCustomerid from tbCustomergroup where ( (iGroupid = @iGroupid  and dtinactive is null) or @iGroupid = 0))
            ) data
      
            group by data.vState
            
ORDER BY data.vState
0
 

Author Comment

by:louise_8
ID: 39258488
Here is some data Im receiving back.
Whilst I want the Total Pharmacies to reflect the sum of all the distinct customerids from data
I'd like the columns Jan-Dec to just output 1 for each unique customer that answered a survey for that month even if they answered 5 times... the problem is the customer is being included each time and inflating the numbers


vState      Total Pharmacies      Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep      Oct      Nov      Dec
ACT      2      0      1      1      1      0      1      0      0      0      0      0      0
NSW      66      48      40      37      28      49      31      0      0      0      0      0      0
QLD      84      9      31      22      45      39      29      0      0      0      0      0      0
SA      23      0      3      5      6      2      5      0      0      0      0      0      0
TAS      4      0      0      0      0      4      0      0      0      0      0      0      0
VIC      28      10      8      13      6      10      4      0      0      0      0      0      0
WA      18      22      18      17      17      19      9      0      0      0      0      0      0

cheers
0
 
LVL 49

Accepted Solution

by:
PortletPaul earned 1000 total points
ID: 39258648
I'm unable to test at the moment, but perhaps this approach will assist:
SELECT
      data.vState
    , count(DISTINCT data.icustomerid) AS [Total Pharmacies]
    , count( DISTINCT (CASE WHEN data.iMonth = 1  THEN data.icustomerid ELSE NULL END) ) "Jan"
    , count( DISTINCT (CASE WHEN data.iMonth = 2  THEN data.icustomerid ELSE NULL END) ) "Feb"
    , count( DISTINCT (CASE WHEN data.iMonth = 3  THEN data.icustomerid ELSE NULL END) ) "Mar"
    , count( DISTINCT (CASE WHEN data.iMonth = 4  THEN data.icustomerid ELSE NULL END) ) "Apr"
    , count( DISTINCT (CASE WHEN data.iMonth = 5  THEN data.icustomerid ELSE NULL END) ) "May"
    , count( DISTINCT (CASE WHEN data.iMonth = 6  THEN data.icustomerid ELSE NULL END) ) "Jun"
    , count( DISTINCT (CASE WHEN data.iMonth = 7  THEN data.icustomerid ELSE NULL END) ) "Jul"
    , count( DISTINCT (CASE WHEN data.iMonth = 8  THEN data.icustomerid ELSE NULL END) ) "Aug"
    , count( DISTINCT (CASE WHEN data.iMonth = 9  THEN data.icustomerid ELSE NULL END) ) "Sep"
    , count( DISTINCT (CASE WHEN data.iMonth = 10 THEN data.icustomerid ELSE NULL END) ) "Oct"
    , count( DISTINCT (CASE WHEN data.iMonth = 11 THEN data.icustomerid ELSE NULL END) ) "Nov"
    , count( DISTINCT (CASE WHEN data.iMonth = 12 THEN data.icustomerid ELSE NULL END) ) "Dec"
... as you have it now ...

Open in new window

0
 

Author Comment

by:louise_8
ID: 39261332
Hi acperkins

Thanks again for your help.
Its getting close.
It works great for most months but for May the count returns 37 for NSW when the actual distinct number is 35.

Looking through that values for the data part of the query specific to NSW in May
I can see from the data if there's a iCustomerid with 2 entries for iMonth & iSurveyResponseId (a survey entry) - it counts this as 1 which is correct (unique customer)
However where there's 3 for the same customer - it counts this as 2.

Cheers
Louise

ps
would a file of data help?
0
 

Author Comment

by:louise_8
ID: 39261352
Hi again

when I change the query to
  ...and     (dtResponse BETWEEN '01 may 2013' AND '31 may 2013')
I get 35 but including other months 37 as per my last comment

thanks
0
 
LVL 49

Expert Comment

by:PortletPaul
ID: 39261366
>>would a file of data help?
always this helps

source data & expected results = happy campers

you could even try starting a sqlfiddle (  http://sqlfiddle.com )

(whoo hoo Aussie's at world cup..!)
0
 

Author Comment

by:louise_8
ID: 39261382
world cup = :)

I've uploaded the file for data part of query (below)
2nd sheet includes date 01 may 2013 - 19 jun 2013

when the date range > May, the distinct count of customerids is 37 for May when its really 35.
As mentioned, seems to include where customerids have more than 2 entries but counts 2 entries as 1

query
      select  vState,iSurveyResponseId, dbo.tbCustomer.iCustomerid,
                         tbSurveyResponseFeedback.dtResponse
            from  
                        dbo.tbCustomer left  join tbSurveyResponseFeedback on tbSurveyResponseFeedback.iCustomerid =  dbo.tbCustomer.iCustomerId
            and     (dtResponse BETWEEN '01 may 2013' AND '31 May 2013') AND (iSurveyQuestionId = @iSurveyQuestionId)
            where  vState = @vState and dbo.tbCustomer.iClientid = @iClientid
            and  dbo.tbCustomer.iCustomerid in(select iCustomerid from tbCustomergroup where ((iGroupid = @iTopGroupid and  dtinactive is null) or @iTopGroupid = 0))
            and dbo.tbCustomer.iCustomerid in(select iCustomerid from tbCustomergroup where ( (iGroupid = @iGroupid  and dtinactive is null) or @iGroupid = 0))
HO-SurveyMay13PlusMayJun13.xlsx
0
 
LVL 49

Expert Comment

by:PortletPaul
ID: 39261439
Hi.  "Beware of Between" it can (and has) lead you astray.

METHOD                                  CLI_COUNT
NOT using Between (< Jun-01) 37
Using Between (May 31)           35
DURING 31 May                         2


Please try the following on your survey result table.

select
  'NOT using Between (< Jun-01)' as method
, count(distinct iCustomerId) as cli_count
from tbSurveyResponseFeedback
where DtResponse >= '2013-05-01'
and DtResponse < '2013-06-01'      -- note LESS THAN 1st of June

union all

select
  'Using Between (May 31)' as method
, count(distinct iCustomerId) as cli_count
from tbSurveyResponseFeedback
where DtResponse between '2013-05-01' and '2013-05-31'

union all

select
  'DURING 31 May' as method
, count(distinct iCustomerId) as cli_count
from tbSurveyResponseFeedback
where DtResponse >= '2013-05-31'
and DtResponse < '2013-06-01'      -- note LESS THAN 1st of June

Open in new window

0
 
LVL 49

Assisted Solution

by:PortletPaul
PortletPaul earned 1000 total points
ID: 39261458
Your "expected result" of 35 is in error; hence the frustration. When filtering "between 1st May and 31st May" this has omitted 2 records that occur during the 31st of May.

in summary:

select <<whatever>>
from <<whereever>>
where some_date between {first day of period} and {last day of period}

while this can be workable,  a far simpler - AND more reliable method is to avoid using 'between', like so:

select <<whatever>>
from <<whereever>>
where some_date >= {first day of period}
    and some_date <   {first day of next period}

I try to explain this more completely in an article: "Beware of Between"

{edit, adding this: http://sqlfiddle.com/#!3/c8689/7 }
0
 

Author Closing Comment

by:louise_8
ID: 39261585
Thanks to you both
It was a simple date fix ie not using between that solved the final issue and the distinct /count change worked great

Appreciate your help!
0
 
LVL 49

Expert Comment

by:PortletPaul
ID: 39261602
Thanks - very happy you got the results you wanted. It was also pleasing to see some familiar Aussie data for a slight change.

Cheers, Paul
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
I have a large data set and a SSIS package. How can I load this file in multi threading?
Via a live example, show how to shrink a transaction log file down to a reasonable size.
Viewers will learn how the fundamental information of how to create a table.

715 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question