Solved

using DISTINCT in a complex query

Posted on 2013-06-18
14
206 Views
Last Modified: 2013-06-19
Hi

Im trying to get distinct only customerids for the  part of the query below after
-- want distinct iCustomerids only

Using distinct in the select doesnt work and I just want 1 row returned for a customer without it messing up the total in the first part

Is this possible without tmp tables etc?

select     vState,count(DISTINCT icustomerid) as [Total Pharmacies],
                     sum(case when iMonth =  1 then 1 else 0 end) "Jan",
                     sum(case when iMonth =  2 then 1 else 0 end) "Feb",
                     sum(case when iMonth =  3 then 1 else 0 end) "Mar",
                     sum(case when iMonth =  4 then 1 else 0 end) "Apr",
                     sum(case when iMonth =  5 then 1 else 0 end) "May",
                     sum(case when iMonth =  6 then 1 else 0 end) "Jun",
                     sum(case when iMonth =  7 then 1 else 0 end) "Jul",
                     sum(case when iMonth =  8 then 1 else 0 end) "Aug",
                     sum(case when iMonth =  9 then 1 else 0 end) "Sep",
                     sum(case when iMonth = 10 then 1 else 0 end) "Oct",
                     sum(case when iMonth = 11 then 1 else 0 end) "Nov",
                     sum(case when iMonth = 12 then 1 else 0 end) "Dec"
            from (
-- want distinct iCustomerids only
            select  vState,iSurveyResponseId, dbo.tbCustomer.iCustomerid,
                        datepart(month,  tbSurveyResponseFeedback.dtResponse) as iMonth
            from  
                        dbo.tbCustomer left  join tbSurveyResponseFeedback on tbSurveyResponseFeedback.iCustomerid =  dbo.tbCustomer.iCustomerId
            and     (dtResponse BETWEEN CONVERT(DATETIME, @dtYearStart, 102) AND GETDATE()) AND (iSurveyQuestionId = @iSurveyQuestionId)
            where  vState in ('NSW','ACT','QLD','SA','VIC','TAS','WA','NT') and dbo.tbCustomer.iClientid = @iClientid
            and  dbo.tbCustomer.iCustomerid in(select iCustomerid from tbCustomergroup where ((iGroupid = @iTopGroupid and  dtinactive is null) or @iTopGroupid = 0))
            )
            ) data
      
            group by vState
            
ORDER BY VSTATE
0
Comment
Question by:louise_8
  • 7
  • 5
  • 2
14 Comments
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 39258243
Unfortunately, without having all the aliases let alone the schema for the tables involved it is difficult to guess at the right result.  For example, what tables do the following columns belong to:
vState, iSurveyResponseId, dtResponse, iSurveyQuestionId
0
 

Author Comment

by:louise_8
ID: 39258277
does this help?

select     tbCustomer.vState,count(DISTINCT tbCustomer.icustomerid) as [Total Pharmacies],
                     sum(case when iMonth =  1 then 1 else 0 end) "Jan",
                     sum(case when iMonth =  2 then 1 else 0 end) "Feb",
                     sum(case when iMonth =  3 then 1 else 0 end) "Mar",
                     sum(case when iMonth =  4 then 1 else 0 end) "Apr",
                     sum(case when iMonth =  5 then 1 else 0 end) "May",
                     sum(case when iMonth =  6 then 1 else 0 end) "Jun",
                     sum(case when iMonth =  7 then 1 else 0 end) "Jul",
                     sum(case when iMonth =  8 then 1 else 0 end) "Aug",
                     sum(case when iMonth =  9 then 1 else 0 end) "Sep",
                     sum(case when iMonth = 10 then 1 else 0 end) "Oct",
                     sum(case when iMonth = 11 then 1 else 0 end) "Nov",
                     sum(case when iMonth = 12 then 1 else 0 end) "Dec"
            from (
-- want distinct iCustomerids only
            select  tbCustomer.vState,tbSurveyResponseFeedback.iSurveyResponseId, dbo.tbCustomer.iCustomerid,
                        datepart(month,  tbSurveyResponseFeedback.dtResponse) as iMonth
            from  
                        dbo.tbCustomer left  join tbSurveyResponseFeedback on tbSurveyResponseFeedback.iCustomerid =  dbo.tbCustomer.iCustomerId
            and     (dtResponse BETWEEN CONVERT(DATETIME, @dtYearStart, 102) AND GETDATE()) AND (iSurveyQuestionId = @iSurveyQuestionId)
            where  vState in ('NSW','ACT','QLD','SA','VIC','TAS','WA','NT') and dbo.tbCustomer.iClientid = @iClientid
            and  dbo.tbCustomer.iCustomerid in(select iCustomerid from tbCustomergroup where ((iGroupid = @iTopGroupid and  dtinactive is null) or @iTopGroupid = 0))
            )
            ) data
     
            group by tbCustomer.vState
           
ORDER BY tbCustomer.VSTATE
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 39258347
I still don't know what tables the following columns belong to:
dtResponse or iSurveyQuestionId

Can a  iCustomerid belong to more than one vState and more than one dtResponse?  If the answer is yes, which one do you want?  If the answer is no then the following should do it:

SELECT  data.vState,
        COUNT(DISTINCT data.icustomerid) AS [Total Pharmacies],
        SUM(CASE data.iMonth WHEN 1 THEN 1 ELSE 0 END) [Jan],
        SUM(CASE data.iMonth WHEN 2 THEN 1 ELSE 0 END) [Feb],
        SUM(CASE data.iMonth WHEN 3 THEN 1 ELSE 0 END) [Mar],
        SUM(CASE data.iMonth WHEN 4 THEN 1 ELSE 0 END) [Apr],
        SUM(CASE data.iMonth WHEN 5 THEN 1 ELSE 0 END) [May],
        SUM(CASE data.iMonth WHEN 6 THEN 1 ELSE 0 END) [Jun],
        SUM(CASE data.iMonth WHEN 7 THEN 1 ELSE 0 END) [Jul],
        SUM(CASE data.iMonth WHEN 8 THEN 1 ELSE 0 END) [Aug],
        SUM(CASE data.iMonth WHEN 9 THEN 1 ELSE 0 END) [Sep],
        SUM(CASE data.iMonth WHEN 10 THEN 1 ELSE 0 END) [Oct],
        SUM(CASE data.iMonth WHEN 11 THEN 1 ELSE 0 END) [Nov],
        SUM(CASE data.iMonth WHEN 12 THEN 1 ELSE 0 END) [Dec]
FROM    (
         SELECT MAX(c.vState) vState,
                -- s.iSurveyResponseId,
                c.iCustomerid,
                MAX(DATEPART(month, s.dtResponse)) AS iMonth
         FROM   dbo.tbCustomer c
                LEFT  JOIN tbSurveyResponseFeedback s ON s.iCustomerid = c.iCustomerId
                                                       AND (dtResponse BETWEEN CONVERT(DATETIME, @dtYearStart, 102) AND GETDATE())
                                                       AND (iSurveyQuestionId = @iSurveyQuestionId)
         WHERE  c.vState IN ('NSW', 'ACT', 'QLD', 'SA', 'VIC', 'TAS', 'WA', 'NT')
                AND c.iClientid = @iClientid
                AND c.iCustomerid IN (SELECT   iCustomerid
                                      FROM     tbCustomergroup
                                      WHERE    (iGroupid = @iTopGroupid
                                                AND dtinactive IS NULL)
                                                OR @iTopGroupid = 0
                                      )
         GROUP BY
		c.iCustomerid
         
        ) data
GROUP BY data.vState
ORDER BY data.vState

Open in new window

0
 

Author Comment

by:louise_8
ID: 39258478
Thanks again

Table tbCustomer iCustomerid is a unique field and in that table we see
1 icustomerid with 1 state per row
and bSurveyResponseFeedback can have many icustomerids with the same iSurveyQuestionId - tbSurveyResponseFeedback.iSurveyResponseId is the unique field here as with dtResponse

eg tbCustomer
iCustomerid vState
123               NSW
456               NSW

ybSurveyResponseFeedback
iSurveyResponseId  iCustomerid   dtResponse iSurveyQuestionid
1        123        '01/01/2013 12:00'       2
2        123        '01/01/2013 15:10'       2
3        123        '05/01/2013 13:09'       2
4        456        '09/03/2013 16:13'       2

I've included the table names for dtResponse and iSurveyQuestionId

select     data.vState,count(DISTINCT data.icustomerid) as [Total Pharmacies],
                     sum(case when data.iMonth =  1 then 1 else 0 end) "Jan",
                     sum(case when data.iMonth =  2 then 1 else 0 end) "Feb",
                     sum(case when data.iMonth =  3 then 1 else 0 end) "Mar",
                     sum(case when data.iMonth =  4 then 1 else 0 end) "Apr",
                     sum(case when data.iMonth =  5 then 1 else 0 end) "May",
                     sum(case when data.iMonth =  6 then 1 else 0 end) "Jun",
                     sum(case when data.iMonth =  7 then 1 else 0 end) "Jul",
                     sum(case when data.iMonth =  8 then 1 else 0 end) "Aug",
                     sum(case when data.iMonth =  9 then 1 else 0 end) "Sep",
                     sum(case when data.iMonth = 10 then 1 else 0 end) "Oct",
                     sum(case when data.iMonth = 11 then 1 else 0 end) "Nov",
                     sum(case when data.iMonth = 12 then 1 else 0 end) "Dec"
            from (
            select  dbo.tbCustomer.vState,tbSurveyResponseFeedback.iSurveyResponseId, dbo.tbCustomer.iCustomerid,
                        datepart(month,  tbSurveyResponseFeedback.dtResponse) as iMonth
            from  
                        dbo.tbCustomer left  join tbSurveyResponseFeedback on tbSurveyResponseFeedback.iCustomerid =  dbo.tbCustomer.iCustomerId
            and     (dtResponse BETWEEN CONVERT(DATETIME, @dtYearStart, 102) AND GETDATE()) AND (tbSurveyResponseFeedback.iSurveyQuestionId = @iSurveyQuestionId)
            where  dbo.tbCustomer.vState in ('NSW','ACT','QLD','SA','VIC','TAS','WA','NT') and dbo.tbCustomer.iClientid = @iClientid
            and  dbo.tbCustomer.iCustomerid in(select iCustomerid from tbCustomergroup where ((iGroupid = @iTopGroupid and  dtinactive is null) or @iTopGroupid = 0))
            and dbo.tbCustomer.iCustomerid in(select iCustomerid from tbCustomergroup where ( (iGroupid = @iGroupid  and dtinactive is null) or @iGroupid = 0))
            ) data
      
            group by data.vState
            
ORDER BY data.vState
0
 

Author Comment

by:louise_8
ID: 39258488
Here is some data Im receiving back.
Whilst I want the Total Pharmacies to reflect the sum of all the distinct customerids from data
I'd like the columns Jan-Dec to just output 1 for each unique customer that answered a survey for that month even if they answered 5 times... the problem is the customer is being included each time and inflating the numbers


vState      Total Pharmacies      Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep      Oct      Nov      Dec
ACT      2      0      1      1      1      0      1      0      0      0      0      0      0
NSW      66      48      40      37      28      49      31      0      0      0      0      0      0
QLD      84      9      31      22      45      39      29      0      0      0      0      0      0
SA      23      0      3      5      6      2      5      0      0      0      0      0      0
TAS      4      0      0      0      0      4      0      0      0      0      0      0      0
VIC      28      10      8      13      6      10      4      0      0      0      0      0      0
WA      18      22      18      17      17      19      9      0      0      0      0      0      0

cheers
0
 
LVL 48

Accepted Solution

by:
PortletPaul earned 250 total points
ID: 39258648
I'm unable to test at the moment, but perhaps this approach will assist:
SELECT
      data.vState
    , count(DISTINCT data.icustomerid) AS [Total Pharmacies]
    , count( DISTINCT (CASE WHEN data.iMonth = 1  THEN data.icustomerid ELSE NULL END) ) "Jan"
    , count( DISTINCT (CASE WHEN data.iMonth = 2  THEN data.icustomerid ELSE NULL END) ) "Feb"
    , count( DISTINCT (CASE WHEN data.iMonth = 3  THEN data.icustomerid ELSE NULL END) ) "Mar"
    , count( DISTINCT (CASE WHEN data.iMonth = 4  THEN data.icustomerid ELSE NULL END) ) "Apr"
    , count( DISTINCT (CASE WHEN data.iMonth = 5  THEN data.icustomerid ELSE NULL END) ) "May"
    , count( DISTINCT (CASE WHEN data.iMonth = 6  THEN data.icustomerid ELSE NULL END) ) "Jun"
    , count( DISTINCT (CASE WHEN data.iMonth = 7  THEN data.icustomerid ELSE NULL END) ) "Jul"
    , count( DISTINCT (CASE WHEN data.iMonth = 8  THEN data.icustomerid ELSE NULL END) ) "Aug"
    , count( DISTINCT (CASE WHEN data.iMonth = 9  THEN data.icustomerid ELSE NULL END) ) "Sep"
    , count( DISTINCT (CASE WHEN data.iMonth = 10 THEN data.icustomerid ELSE NULL END) ) "Oct"
    , count( DISTINCT (CASE WHEN data.iMonth = 11 THEN data.icustomerid ELSE NULL END) ) "Nov"
    , count( DISTINCT (CASE WHEN data.iMonth = 12 THEN data.icustomerid ELSE NULL END) ) "Dec"
... as you have it now ...

Open in new window

0
 

Author Comment

by:louise_8
ID: 39261332
Hi acperkins

Thanks again for your help.
Its getting close.
It works great for most months but for May the count returns 37 for NSW when the actual distinct number is 35.

Looking through that values for the data part of the query specific to NSW in May
I can see from the data if there's a iCustomerid with 2 entries for iMonth & iSurveyResponseId (a survey entry) - it counts this as 1 which is correct (unique customer)
However where there's 3 for the same customer - it counts this as 2.

Cheers
Louise

ps
would a file of data help?
0
Zoho SalesIQ

Hassle-free live chat software re-imagined for business growth. 2 users, always free.

 

Author Comment

by:louise_8
ID: 39261352
Hi again

when I change the query to
  ...and     (dtResponse BETWEEN '01 may 2013' AND '31 may 2013')
I get 35 but including other months 37 as per my last comment

thanks
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39261366
>>would a file of data help?
always this helps

source data & expected results = happy campers

you could even try starting a sqlfiddle (  http://sqlfiddle.com )

(whoo hoo Aussie's at world cup..!)
0
 

Author Comment

by:louise_8
ID: 39261382
world cup = :)

I've uploaded the file for data part of query (below)
2nd sheet includes date 01 may 2013 - 19 jun 2013

when the date range > May, the distinct count of customerids is 37 for May when its really 35.
As mentioned, seems to include where customerids have more than 2 entries but counts 2 entries as 1

query
      select  vState,iSurveyResponseId, dbo.tbCustomer.iCustomerid,
                         tbSurveyResponseFeedback.dtResponse
            from  
                        dbo.tbCustomer left  join tbSurveyResponseFeedback on tbSurveyResponseFeedback.iCustomerid =  dbo.tbCustomer.iCustomerId
            and     (dtResponse BETWEEN '01 may 2013' AND '31 May 2013') AND (iSurveyQuestionId = @iSurveyQuestionId)
            where  vState = @vState and dbo.tbCustomer.iClientid = @iClientid
            and  dbo.tbCustomer.iCustomerid in(select iCustomerid from tbCustomergroup where ((iGroupid = @iTopGroupid and  dtinactive is null) or @iTopGroupid = 0))
            and dbo.tbCustomer.iCustomerid in(select iCustomerid from tbCustomergroup where ( (iGroupid = @iGroupid  and dtinactive is null) or @iGroupid = 0))
HO-SurveyMay13PlusMayJun13.xlsx
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39261439
Hi.  "Beware of Between" it can (and has) lead you astray.

METHOD                                  CLI_COUNT
NOT using Between (< Jun-01) 37
Using Between (May 31)           35
DURING 31 May                         2


Please try the following on your survey result table.

select
  'NOT using Between (< Jun-01)' as method
, count(distinct iCustomerId) as cli_count
from tbSurveyResponseFeedback
where DtResponse >= '2013-05-01'
and DtResponse < '2013-06-01'      -- note LESS THAN 1st of June

union all

select
  'Using Between (May 31)' as method
, count(distinct iCustomerId) as cli_count
from tbSurveyResponseFeedback
where DtResponse between '2013-05-01' and '2013-05-31'

union all

select
  'DURING 31 May' as method
, count(distinct iCustomerId) as cli_count
from tbSurveyResponseFeedback
where DtResponse >= '2013-05-31'
and DtResponse < '2013-06-01'      -- note LESS THAN 1st of June

Open in new window

0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 250 total points
ID: 39261458
Your "expected result" of 35 is in error; hence the frustration. When filtering "between 1st May and 31st May" this has omitted 2 records that occur during the 31st of May.

in summary:

select <<whatever>>
from <<whereever>>
where some_date between {first day of period} and {last day of period}

while this can be workable,  a far simpler - AND more reliable method is to avoid using 'between', like so:

select <<whatever>>
from <<whereever>>
where some_date >= {first day of period}
    and some_date <   {first day of next period}

I try to explain this more completely in an article: "Beware of Between"

{edit, adding this: http://sqlfiddle.com/#!3/c8689/7 }
0
 

Author Closing Comment

by:louise_8
ID: 39261585
Thanks to you both
It was a simple date fix ie not using between that solved the final issue and the distinct /count change worked great

Appreciate your help!
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39261602
Thanks - very happy you got the results you wanted. It was also pleasing to see some familiar Aussie data for a slight change.

Cheers, Paul
0

Featured Post

Complete VMware vSphere® ESX(i) & Hyper-V Backup

Capture your entire system, including the host, with patented disk imaging integrated with VMware VADP / Microsoft VSS and RCT. RTOs is as low as 15 seconds with Acronis Active Restore™. You can enjoy unlimited P2V/V2V migrations from any source (even from a different hypervisor)

Join & Write a Comment

Let's review the features of new SQL Server 2012 (Denali CTP3). It listed as below: PERCENT_RANK(): PERCENT_RANK() function will returns the percentage value of rank of the values among its group. PERCENT_RANK() function value always in be…
Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.
Viewers will learn how the fundamental information of how to create a table.

759 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now