Link to home
Start Free TrialLog in
Avatar of Big Monty
Big MontyFlag for United States of America

asked on

SQL Query that uses count(*) over, possible to use distinct?

I have the following query that I use as a part of a custom search engine on my site. I've stripped it down for readability, the joined tables will have columns in the select clause, that's why they're there:

select distinct ( u.userID ), u.lastLogin, totalUsers = COUNT( u.userID ) OVER() 
From   ZipCodes z  
	inner join tblUsers u on z.zipCode = u.zipCode  
	inner join tblUserOffers uo on u.userID = uo.userID 
Where <a big long where clause>
order by u.lastLogin OFFSET 0 
ROWS FETCH NEXT 500 ROWS ONLY 

Open in new window


my problem is with the totalUsers field. It's listing all of the records when I should get results based off of distinct( userID ).

For example, when I run the query with certain parameters, I get back 21 rows of data, however the totalUsers column has a value of 46. If I remove the DISTINCT clause, I get back 46 rows of data, however there are duplicates (and I'm not sure if I'm enough of a sql ninja to tune the complex query to prevent dupes). This query (part of a stored procedure) will be used often, so it needs to be as efficient as possible.

Cheers in advance for any help :)
Avatar of HainKurt
HainKurt
Flag of Canada image

above query looks weird...
what are you trying to do?

give a sample from input and the result you are looking for...
Avatar of PortletPaul
"select distinct" is NOT A FUNCTION, the parentheses around u.userID have NO EFFECT whatsoever

"select distinct" ensures the output to be different on each row - so when you state that the rows have duplicates technically that is impossible - there has to be something different on each row.

You need to remove your reliance on "select distinct" - this will probably mean a complete re-write of the query. I'd like to help more but without the full query it isn't really possible. Please read the following:
https://www.experts-exchange.com/articles/12282/Select-Distinct-is-returning-duplicates.html

Basically you need to use one or more subqueries, in one of them you will be able to do the count, and then join it all together without need "select distinct" at all. Something like this:
SELECT
      u.userID
    , u.lastLogin
    , z.totalUsers
from tblUsers u
INNER JOIN (
      SELECT
            zipCode
          , COUNT(*) AS totalUsers
      FROM tblUsers
      GROUP BY zipCode
) z ON z.zipCode = u.zipCode
INNER JOIN tblUserOffers uo ON u.userID = uo.userID
WHERE <a big long where clause>
ORDER BY u.lastLogin OFFSET 0
ROWS FETCH NEXT 500 ROWS ONLY

Open in new window

Without sample input  and output, I'm flying blind but here goes:

If you are expecting the total column to return 21, you can do the following:

select distinct u.userID, u.lastLogin, totalUsers = COUNT( DISTINCT u.userID ) OVER() 
From   ZipCodes z  
	inner join tblUsers u on z.zipCode = u.zipCode  
	inner join tblUserOffers uo on u.userID = uo.userID 
Where <a big long where clause>
order by u.lastLogin OFFSET 0 
ROWS FETCH NEXT 500 ROWS ONLY 

Open in new window

Avatar of Big Monty

ASKER

Huseyin - i'm pulling data from my users table, along with related data from other tables (thus the reason for the join). I'm also using paging through the use of OFFSET FETCH NEXT. I tried to simplify the query for readability, that's probably why it looks weird.

Nakul - that was one of the first things I tried. Unfortunately it's invalid syntax, distinct can't be used with COUNT OVER

Paul - I'm aware that distinct isn't a function, the extra parentheses were left over from my trying count(*) inside the distinct, and I just forgot to remove them :) I think you're right, I think the best approach is to eliminate my use of using DISTINCT. I think I've identified the issue, the 2nd join I think is causing the issue:

 inner join tblUserOffers uo on u.userID = uo.userID

This join is here because I have references to it in my full select clause:

select u.userID, fullName = u.firstName + ' ' + u.lastName, u.displayName, u.city, u.stateProvince,
      Round( dbo.CalculateDistance(-71.43520, 42.32320, z.Longitude, z.Latitude), 2 ) As Distance,
      profilePicture = case when isNull( u.profilePicture, '' ) = '' then '/images/person.png' else u.profilePicture end,
      confirmed = case u.confirmed when 1 then '<html string>' else '' end,
      ( SELECT  offer + ', '       FROM tblOffers o             inner join tblUserOffers uo on o.offerID = uo.offerID       WHERE u.userID = uo.userID and       uo.visible = 1 FOR XML PATH ('') )  as offers,       
      ( SELECT count( o.offerID )         FROM tblOffers o             inner join tblUserOffers uo on uo.offerID = o.offerID         WHERE u.userID = uo.userID and uo.visible = 1 FOR XML PATH ('')    )  as offerCount,

      u.lastLogin, totalUsers = COUNT( * ) OVER()

the purpose of the bolded part is to list what each user is offering on the site (the site is an exchange and barter website, where users fill out profiles and what the want to offer on the site).

- tblOffers is a table filled with offer definitions. It allows users to select from a generic list and attach them to their profile
- tblUserOffers is cross table indicating the offerID and who it belongs to, defined by userID.

When I comment out the 2nd join, I get back individual results. However, it returns ALL records found, including the users with no offers, which I want to avoid.

What am I missing here?
could you provide a few rows of data for each table please?

and, from that data what you would expect to get as the result?

this is always the smplest, most productive/fastes method for solving SQL problems
sure, here is what I expect the results to look like:

userID      fullName      displayName      city      stateProvince      Distance      profilePicture      confirmed      offers      offerCount      lastLogin      totalUsers
15188      xxx      xxx      Framingham      Massachusetts      0      /uploads/members/15188.jpg      confirmedHTML      Diet & Lifestyle Coaching ,       1      NULL      21
135      xxx      xxx    Framingham      Massachusetts      0      /uploads/members/135.jpg      confirmedHTML      Disability Consulting, Editing Papers/Essays,       2      2016-02-23 11:54:52.390      21
15209      xxx     xxxx    Framingham      Massachusetts      0      /images/person.png            Viola Lessons, Violin Lessons, Ballroom Dance Lessons,       3      2016-12-01 20:37:07.223      21

tblUsers contains the following columns:

userID      fullName      displayName      city      stateProvince       profilePicture      confirmed      lastLogin      

tblOffers looks like:

offerID      offer      description      categoryID      createDate      active      isXTOffer
101      Diet & Lifestyle Coaching147      2015-04-27 16:08:50.027      1      1
102      Disability Consulting      Disability Consulting desc      113      2015-04-27 16:09:22.217      1      1
103      Web Design      web design desc      107      2015-04-27 16:10:42.580      1      1
106      Editing Papers/Essays      Editing Papers/Essays      104      2015-05-08 19:00:54.083      1      1

and tblUserOffers looks like:

userOfferID      offerID      userID
1194               101      100
573                102      5593
605                       103      6679
if a spreadsheet would be easier, let me know and I'll build one up
Try to imagine that you want to help somebody who is having trouble - but ALL you know about the problem is on this page.

Also remember there isn't a solitary thing we can do in SQL - to demonstrate or prove something - until we have BOTH tables and data.

Every referenced column of Every table needs to be in the sample - otherwise the query will just fail.

Here is a working example:
DECLARE @tblOffers TABLE
    ([offerID] int, [offer] varchar(28), [description] varchar(26), [categoryID] int, [createDate] varchar(7), [active] varchar(4), [isXTOffer] varchar(4))
;
    
INSERT INTO @tblOffers 	
    ([offerID], [offer], [description], [categoryID], [createDate], [active], [isXTOffer])
VALUES
    (101, 'Diet & Lifestyle Coaching147', '08:50.0', 1, '1', NULL, NULL),
    (102, 'Disability Consulting', 'Disability Consulting desc', 113, '09:22.2', '1', '1'),
    (103, 'Web Design', 'web design desc', 107, '10:42.6', '1', '1')
;

DECLARE @tblUserOffers TABLE
    ([userOfferID] int, [offerID] int, [userID] int, [visible] int)
;
    
INSERT INTO @tblUserOffers 
    ([userOfferID], [offerID], [userID])
VALUES
    (1194, 101, 100),
    (573, 102, 5593),
    (605, 103, 6679)
;

DECLARE @tblUsers TABLE
    ([userID] int, [firstName] varchar(10), [lastName] varchar(10), [displayName] varchar(12), [city] varchar(5), [stateProvince] varchar(14)
     , [profilePicture] varchar(14), [confirmed] int, [lastLogin] datetime)
;
    
INSERT INTO @tblUsers 
    ([userID], [firstName], [lastName], [displayName], [city], [stateProvince], [profilePicture], [confirmed])
VALUES
    (100, 'afirstName', 'alastName', 'adisplayName', 'acity', 'astateProvince', 'profilePicture', 1),
    (5593, 'bfirstName', 'b lastName', 'bdisplayName', 'bcity', 'bstateProvince', 'profilePicture', 1),
    (6679, 'cfirstName', 'clastName', 'cdisplayName', 'ccity', 'cstateProvince', 'profilePicture', 1)
;



SELECT
      u.userID
    , fullName = u.firstName + ' ' + u.lastName
    , u.displayName
    , u.city
    , u.stateProvince
    --, ROUND(dbo.CalculateDistance(-71.43520, 42.32320, z.Longitude, z.Latitude), 2) AS Distance
    , profilePicture = CASE
                            WHEN ISNULL(u.profilePicture, '') = '' THEN '/images/person.png'
                            ELSE u.profilePicture END
    , confirmed = CASE u.confirmed
                       WHEN 1 THEN '<html string>'
                       ELSE '' END
    , (
            SELECT
                  offer + ', '
            FROM @tblOffers o
            INNER JOIN @tblUserOffers uo ON o.offerID = uo.offerID
            WHERE u.userID = uo.userID
            AND uo.visible = 1
            FOR xml PATH ('')
      )
      AS offers
    , (
            SELECT
                  COUNT(o.offerID)
            FROM @tblOffers o
            INNER JOIN @tblUserOffers uo ON uo.offerID = o.offerID
            WHERE u.userID = uo.userID
            AND uo.visible = 1
            FOR xml PATH ('')
      )
      AS offerCount
    , u.lastLogin
    , totalUsers = COUNT(*) OVER ()
FROM @tblUsers as U

Open in new window

That query, using that sample data, produces the following  result:
+---+--------+-----------------------+--------------+-------+----------------+----------------+---------------+--------+------------+-----------+------------+
|   | userID |       fullName        | displayName  | city  | stateProvince  | profilePicture |   confirmed   | offers | offerCount | lastLogin | totalUsers |
+---+--------+-----------------------+--------------+-------+----------------+----------------+---------------+--------+------------+-----------+------------+
| 1 |    100 | afirstName alastName  | adisplayName | acity | astateProvince | profilePicture | <html string> | NULL   |          0 | NULL      |          3 |
| 2 |   5593 | bfirstName b lastName | bdisplayName | bcity | bstateProvince | profilePicture | <html string> | NULL   |          0 | NULL      |          3 |
| 3 |   6679 | cfirstName clastName  | cdisplayName | ccity | cstateProvince | profilePicture | <html string> | NULL   |          0 | NULL      |          3 |
+---+--------+-----------------------+--------------+-------+----------------+----------------+---------------+--------+------------+-----------+------------+

Open in new window

Have a look at this url: http://rextester.com/KAP76290
Hey Paul,

I hear you, I'm trying to give (what I think) is the most useful data, and leaving out what isn't necessary so it'll be easier to read. I obviously need to work on this skill :)

I'll try to put something together in an hour or so on RexTester (nifty little site!). In the meantime, your example doesn't list anything for the OFFERS column, do you know why that is?
>>"example doesn't list anything for the OFFERS column, do you know why that is?"
I think it is due to not having any values in a field - I had to add it to the table but I didn't add data.

That missing data is referenced in a case expression. ( AND uo.visible = 1 )
If I comment out that conditoin you get data in the column.

By the way here is a query variant for you:
SELECT
      u.userID
    , fullName = u.firstName + ' ' + u.lastName
    , u.displayName
    , u.city
    , u.stateProvince
    --, ROUND(dbo.CalculateDistance(-71.43520, 42.32320, z.Longitude, z.Latitude), 2) AS Distance
    , profilePicture = CASE
                            WHEN ISNULL(u.profilePicture, '') = '' THEN '/images/person.png'
                            ELSE u.profilePicture END
    , confirmed = CASE u.confirmed
                       WHEN 1 THEN '<html string>'
                       ELSE '' END
    , oa1.offers
    , oa2.offerCount
    , u.lastLogin
    , totalUsers = COUNT(*) OVER ()
FROM @tblUsers as U
OUTER APPLY (
    SELECT
    STUFF((
          SELECT
                 ', ' + offer
            FROM @tblOffers o
            INNER JOIN @tblUserOffers uo ON o.offerID = uo.offerID
            WHERE u.userID = uo.userID
            --AND uo.visible = 1
          FOR XML PATH ('')
          )
         , 1, 1, '')
     ) AS OA1 (offers)
OUTER APPLY (
          SELECT
                 COUNT(o.offerID)
            FROM @tblOffers o
            INNER JOIN @tblUserOffers uo ON o.offerID = uo.offerID
            WHERE u.userID = uo.userID
            --AND uo.visible = 1
     ) AS OA2 (offerCount)

Open in new window

using outer apply yielded the same results with null values being returned in the OFFERS column when I added the following to the FROM clause:
From   ZipCodes z  
	inner join tblUsers u on z.zipCode = u.zipCode  
	inner join tblUserOffers uo on u.userID = uo.userID 

Open in new window


I also have the following as my WHERE clause:
Where u.userID <> 100 and 
	Longitude Between -72.41289 And -70.45751 And 
	Latitude Between 41.59901 And 43.04739 And 
	dbo.CalculateDistance(-71.43520, 42.32320, Longitude, Latitude) <= 50 and 
	status = 1  

Open in new window


still working on the clean data set
The queries I supplied don't have any reference to table zipcode - because there were no facts about that table presented.  The second query DOES produce a result in the offers column - from the sample data - if  uo.visible = 1 or you ignore uo.visible completely.
I've come up with a sql script that has all of the tables and examples of their data, while keeping the data anonymous.

with this script, I only get back userID 135 if I comment out the where clause, which is weird because the longitude / latitude for that user is outside of the where clause boundaries. If I leave in the where clause, I get no results back. I would expect to get back userIDs 135 and 136, as those are the only records inside tblUserOffers

DECLARE @tblOffers TABLE
    ([offerID] int, [offer] varchar(28), [description] varchar(26), [categoryID] int, [createDate] varchar(7), [active] varchar(4), [isXTOffer] varchar(4))
;
    
INSERT INTO @tblOffers 	
    ([offerID], [offer], [description], [categoryID], [createDate], [active], [isXTOffer])
VALUES
    (1193, 'Diet & Lifestyle Coaching', '08:50.0', 1, '1', NULL, NULL),
    (658, 'Disability Consulting', 'Disability Consulting desc', 113, '09:22.2', '1', '1'),
    (713, 'Web Design', 'web design desc', 107, '10:42.6', '1', '1'),
	(643, 'Web Design2', 'web design desc2', 107, '10:42.6', '1', '1'),
	(220, 'Web Design3', 'web design desc3', 107, '10:42.6', '1', '1')
;

DECLARE @tblUserOffers TABLE
    ([userOfferID] int, [offerID] int, [userID] int, [visible] int)
;
    
INSERT INTO @tblUserOffers 
    ([userOfferID], [offerID], [userID])
VALUES
    (1, 1193, 135),
    (2, 658, 135),
    (3, 713, 135),
    (4, 643, 136),
    (5, 220, 136)
;

DECLARE @tblUsers TABLE
    ([userID] int, [firstName] varchar(10), [lastName] varchar(10), [displayName] varchar(12), [city] varchar(5), [stateProvince] varchar(14)
     , [profilePicture] varchar(14), [confirmed] int, [lastLogin] datetime, [ZipCode] [char](5))
;
    
INSERT INTO @tblUsers 
    ([userID], [firstName], [lastName], [displayName], [city], [stateProvince], [profilePicture], [confirmed], zipCode)
VALUES
    (112, 'afirstName', 'alastName', 'adisplayName', 'acity', 'astateProvince', 'profilePicture', 1, '11111'),
    (122, 'bfirstName', 'blastName', 'bdisplayName', 'bcity', 'bstateProvince', 'profilePicture', 1, '22222'),
    (135, 'cfirstName', 'clastName', 'cdisplayName', 'ccity', 'cstateProvince', 'profilePicture', 1, '33333'),
    (136, 'dfirstName', 'dlastName', 'ddisplayName', 'dcity', 'dstateProvince', 'profilePicture', 1, '44444')
;

declare @ZIPCodes TABLE ( [ZipCode] [char](5), [Latitude] [decimal](12, 4), [Longitude] [decimal](12, 4) );

insert into @ZipCodes 
	( zipCode, latitude, Longitude )
values
	( '11111', 42.59901, -71.41289 ),
	( '22222', 43.12901, -71.46389 ),
	( '33333', 44.59901, -74.12345 )



SELECT
      u.userID
    , fullName = u.firstName + ' ' + u.lastName
    , u.displayName
    , u.city
    , u.stateProvince
    , profilePicture = CASE
                            WHEN ISNULL(u.profilePicture, '') = '' THEN '/images/person.png'
                            ELSE u.profilePicture END
    , confirmed = CASE u.confirmed
                       WHEN 1 THEN '<html string>'
                       ELSE '' END
    , oa1.offers
    , oa2.offerCount
    , u.lastLogin
    , totalUsers = COUNT(*) OVER ()
From   @ZipCodes z  
	inner join @tblUsers u on z.zipCode = u.zipCode  
	inner join @tblUserOffers uo on u.userID = uo.userID 
OUTER APPLY (
    SELECT
    STUFF((
          SELECT
                 ', ' + offer
            FROM @tblOffers o
            INNER JOIN @tblUserOffers uo ON o.offerID = uo.offerID
            WHERE u.userID = uo.userID
            --AND uo.visible = 1
          FOR XML PATH ('')
          )
         , 1, 1, '')
     ) AS OA1 (offers)
OUTER APPLY (
          SELECT
                 COUNT(o.offerID)
            FROM @tblOffers o
            INNER JOIN @tblUserOffers uo ON o.offerID = uo.offerID
            WHERE u.userID = uo.userID
            --AND uo.visible = 1
     ) AS OA2 (offerCount)
--Where Longitude Between -72.41289 And -70.45751 
--And 	Latitude Between 41.59901 And 43.04739 

Open in new window

zipcode 33333 does not match the where clause
userid 135 is in zipcode 333333
useris 136 is i zipcode 44444 but that isn't in the zipcode table
good catch, I wanted to have one record in the zipCodes table that didn't fall into the range, and had "44444" in the table originally, then decided I didnt need it. Anyways, I fixed the data and now I'm getting an accurate representation of what I'm getting, meaning duplicate rows:

http://rextester.com/KHAX17922

I really appreciate you sticking with this and helping me out, one of the many reasons I've been an EE member for so long :)
ASKER CERTIFIED SOLUTION
Avatar of PortletPaul
PortletPaul
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
this looks promising....let me test some more and I'll report back soon, but for now, it looks like it's the solution :)
thanks for all of the help, big weight off my mind now that this works :)