Need to modify my sql query slightly, but I need help

brgdotnet
brgdotnet used Ask the Experts™
on
Can someone please help me to change my query slightly?

I have a simple database table  which is named "City". It includes the id of the district, and last names of people in the district.
As you can see in the sample table the last names occur more than once, as there are many people in the city with the same last name in different districts.

I have a query which counts the number of unique names in the city by district. The name will not be counted more than once. Meaning if it was already counted in a previous district, it won't be counted again. So for example you can see in the query result that last name "Marconi" was only counted once, as it was counted in District 1, but not District 2.

The query works great, but it needs to be modified slightly so that it will also display the total of all of the names in a district, regardless if there are repeat names in a district. Please see the "Desired result" below to see what I need my query to produce. Attached is a copy of my table with data which can be recreated in sql server by running the attached script. My sql is very generic, so that it runs on other sql platforms like Sybase and Oracle.


My Working Query :
select t.Cityid, t.DiD, count(FamilyName) AS DistinctFamilyNames
from (
    select
            CityId, MIN(DistrictId)AS DiD,FamilyName  
    from City WHERE CityId = 'Austin' Group By CityId,FamilyName
    ) t
group by t.CityId, t.DiD;



City table :

CityId  DistrictId                FamilyName
------------------------------------
Austin      | 1                  | Brown
Austin      | 1                  | James
Austin      | 1                  | Marconi
Austin      | 2                  | Marconi
Austin      | 2                    | Trevor
Austin      | 2                  | Jones
Austin      | 3                  | Brown
Austin      | 3                  | Marconi
Austin      | 3                  | Marconi
Austin      | 3                  | Greyson
Austin      | 3                  | Marconi

Current Query result :
CityId  DistrictId          DistinctFamiies
------------------------------------
Austin      | 1                  | 3
Austin      | 2                  | 2
Austin      | 3                  | 2


Desired result :
CityId  DistrictId       DistinctFamiyNames        TotalNamesPerDistrict (This is the new column I need)
-------------------------------------------------------------------------------------------------------------
Austin      | 1                  | 3                                          |    3
Austin      | 2                  | 2                                          |    3
Austin      | 3                  | 2                                          |    5
SimplSample.txt
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Jim HornSQL Server Data Dude
Most Valuable Expert 2013
Author of the Year 2015

Commented:
Give this a whirl..
SELECT CityID, DistrictId, 
   COUNT(DISTINCT FamilyName) as DistinctFamilyNames, 
   COUNT(FamilyName) as TotalNamesPerDistrict
FROM City
GROUP BY CityID, DistrictID
ORDER BY CityID, DistrictID

Open in new window

Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>>Give this a whirl..

Doesn't return the desired results.  Marconi is distinct in district 2 and3 so it is counted.  The requirement is not to count it in another district IF it has already been counted in a previous district.


Your sample data only has 4 names in district 3 so I cannot get a 5 as in your expected results.

See if this works:
select cityid, districtid,
	count(case when rn=1 then 1 end) distinctfamiliynames,
	count(*) totalnamesperdistrict
from (
	select
		familyname,
		cityid, districtid,
		row_number() over(partition by familyname order by districtid,familyname) rn
	from city
) outerquery
group by cityid, districtid
order by cityid, districtid
/

Open in new window

brgdotnetcontractor

Author

Commented:
Thanks Gentlemen, your sql skills are impressive. Hey Jim Horn, I think one of your name counts are not correct, so if you can somehow modify your suggested query to work that would be great. Please see my comments below about a gift certificate.

SlightWV Sir, I can't use that query because it uses row number() over( which does not work with my Sybase database.

SlightWV. Apologies if my data was not consistent with my post. Attached is a new sample database. For the new sample note that the outcome should look like the table below. As an example, Note that there are 5 names in district 3, but only one name is unique "Greyson", so DistinctFamilyNames is "1" and the total number of names in that district is 5.

CityId   |  DistrictId  |  DistinctFamilyNames  | TotalNamesPerDistrict
'Austin'|  1               |  3                                   |   3
'Austin'|  2               |  2                                     |   3
'Austin'|  3               |  1                                     |   5
SecodBatch.txt
DBPicture.PNG
Why Diversity in Tech Matters

Kesha Williams, certified professional and software developer, explores the imbalance of diversity in the world of technology -- especially when it comes to hiring women. She showcases ways she's making a difference through the Colors of STEM program.

PortletPaulEE Topic Advisor
Most Valuable Expert 2014
Awarded 2013

Commented:
>>"row number() over( which does not work with my Sybase database."

Well, if you use Microsoft SQL Server in your chosen topics you will get suggestions that it supports.  Please choose topics with some care because this helps all involved.

SecodBatch.txt looks like T-SQL:
USE [Sandnox]
GO
/****** Object:  Table [dbo].[City]    Script Date: 12/14/2018 8:19:02 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[City](
	[CityId] [nvarchar](50) NULL,
	[DistrictId] [int] NULL,
	[FamilyName] [nvarchar](50) NULL
) ON [PRIMARY]

GO
INSERT [dbo].[City] ([CityId], [DistrictId], [FamilyName]) VALUES (N'Austin', 1, N'Brown')
GO
INSERT [dbo].[City] ([CityId], [DistrictId], [FamilyName]) VALUES (N'Austin', 1, N'James')
GO
INSERT [dbo].[City] ([CityId], [DistrictId], [FamilyName]) VALUES (N'Austin', 1, N'Marconi')
GO
INSERT [dbo].[City] ([CityId], [DistrictId], [FamilyName]) VALUES (N'Austin', 2, N'Marconi')
GO
INSERT [dbo].[City] ([CityId], [DistrictId], [FamilyName]) VALUES (N'Austin', 2, N'Trevor')
GO
INSERT [dbo].[City] ([CityId], [DistrictId], [FamilyName]) VALUES (N'Austin', 2, N'Jones')
GO
INSERT [dbo].[City] ([CityId], [DistrictId], [FamilyName]) VALUES (N'Austin', 3, N'Brown')
GO
INSERT [dbo].[City] ([CityId], [DistrictId], [FamilyName]) VALUES (N'Austin', 3, N'Greyson')
GO
INSERT [dbo].[City] ([CityId], [DistrictId], [FamilyName]) VALUES (N'Austin', 3, N'Marconi')
GO
INSERT [dbo].[City] ([CityId], [DistrictId], [FamilyName]) VALUES (N'Austin', 3, N'Marconi')
GO
INSERT [dbo].[City] ([CityId], [DistrictId], [FamilyName]) VALUES (N'Austin', 3, N'Marconi')
GO

Open in new window

Whilst I have removed Microsoft SQL Sever as a topic, and added in Sybase, it intrigues me that you don't have access to row_number() - what "flavour" of Sybase is it (and what version?)
PortletPaulEE Topic Advisor
Most Valuable Expert 2014
Awarded 2013

Commented:
Without row_number you could use a correlated subquery to locate the minimum districtid for each family name instead:
SELECT
    cityid
  , districtid
  , COUNT( CASE WHEN districtid = mindistrictid THEN 1 END ) AS distinctfamiliynames
  , COUNT( * )                                               AS totalnamesperdistrict
FROM (
    SELECT
        *
      , ( SELECT
                MIN( districtid )
            FROM city c2
            WHERE city.cityid = c2.cityid
            AND city.FamilyName = c2.FamilyName
        )
        mindistrictid
    FROM city
) d
GROUP BY
    cityid , districtid
ORDER BY
    cityid, districtid
;
 ==== ======== ============ ====================== ======================= 
       cityid   districtid   distinctfamiliynames   totalnamesperdistrict  
 ==== ======== ============ ====================== ======================= 
   1   Austin            1                      3                       3  
   2   Austin            2                      2                       3  
   3   Austin            3                      1                       5  
 ==== ======== ============ ====================== ======================= 

Open in new window


demo available at this link: https://rextester.com/GTHFP4413
brgdotnetcontractor

Author

Commented:
May I get some more help on this. The Query provided by Paul almost works. I inserted just two more rows of data in the table, and the query won't work against the new data. Can someone help to finish this? If I query the table, I see the two new rows listed last. I created the script to recreate the table with the new Data.

Here is what is in the table

Austin      1      Brown
Austin      1      James
Austin      1      Marconi
Austin      2      Marconi
Austin      2      Trevor
Austin      2      Jones
Austin      3      Brown
Austin      3      Greyson
Austin      3      Marconi
Austin      2      Brown
Austin      3      Marconi
Austin      3      Marconi
Austin      1      Marconi
Austin      1      Marconi

Results of incorrect Query :
CityId  DistrictId            DistinctNamesInDistrict  TotalNamesInDistrict
Austin      1                  |5                       |5
Austin      2                  |3                          |4
Austin      3                  |1                            |5



The Query should produce this instead :
CityId  DistrictId            DistinctNamesInDistrict  TotalNamesInDistrict
Austin      1                  |3                                         |5
Austin      2                  |2                                            |4
Austin      3                  | 1                                              |5

Here is the query provided by Paul. So close.
Q_Data.txt
Pauls.txt
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
What version of Sybase are you using?
brgdotnetcontractor

Author

Commented:
Hi Slight VW,
SELECT @@version  gives : Adaptive Server /16.0

Just an FYI if I try to use "select row_number() over(partition by " syntax then I get an error of :
"Incorrect Syntax near the keyword 'over'.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
This is ugly but see if it works for you:
select c.cityid, c.districtid, mincitycount, totalcitycount from (
	select cityid, districtid,count(*) totalcitycount from city group by cityid, districtid
) c
join
	(select cityid, min_dist, count(*) mincitycount from (
		select
			familyname,
			cityid,
			min(districtid) min_dist
		from city
		group by
			familyname, cityid
)
group by cityid, min_dist
) q on c.districtid=q.min_dist
order by c.cityid, c.districtid
/

Open in new window

EE Topic Advisor
Most Valuable Expert 2014
Awarded 2013
Commented:
I forgot to include the logic for distinct in the family names, so instead of the case expression returning 1 it should return the family name, then distinct will work.
SELECT
    cityid
  , districtid
  , COUNT( DISTINCT CASE WHEN districtid = mindistrictid THEN FamilyName END ) AS distinctfamiliynames
  , COUNT( * )                                                                 AS totalnamesperdistrict
FROM (
    SELECT
        *
      , ( SELECT
                MIN( districtid )
            FROM city c2
            WHERE city.cityid = c2.cityid
            AND city.FamilyName = c2.FamilyName
        )
        mindistrictid
    FROM city
) d
GROUP BY
    cityid , districtid
ORDER BY
    cityid, districtid
;

Open in new window

+----+--------+------------+----------------------+-----------------------+
|    | cityid | districtid | distinctfamiliynames | totalnamesperdistrict |
+----+--------+------------+----------------------+-----------------------+
|  1 | Austin |          1 |                    3 |                     5 |
|  2 | Austin |          2 |                    2 |                     4 |
|  3 | Austin |          3 |                    1 |                     5 |
+----+--------+------------+----------------------+-----------------------+

Open in new window

brgdotnetcontractor

Author

Commented:
Thanks Guys, you are awesome. I

Pauls query is 100% now, and I am getting the correct results.
Slight VW, I get the below error. No worries, thanks so much for your help.

Msg 156, Level 15, State 1, Line 14
Incorrect syntax near the keyword 'group'.
PortletPaulEE Topic Advisor
Most Valuable Expert 2014
Awarded 2013

Commented:
Msg 156, Level 15, State 1, Line 14
Incorrect syntax near the keyword 'group'.

In T-SQL declaring aliases for subqueries is mandatory  so that error message is a little misleading.

What it really means is that the keyword GROUP was NOT expected YET; an alias should have been in the code before that word was detected (see line 13 below)
select c.cityid, c.districtid, mincitycount, totalcitycount from (
	select cityid, districtid,count(*) totalcitycount from city group by cityid, districtid
) c
join
	(select cityid, min_dist, count(*) mincitycount from (
		select
			familyname,
			cityid,
			min(districtid) min_dist
		from city
		group by
			familyname, cityid
) d --<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< alias needed here                                   
group by cityid, min_dist
) q on c.districtid=q.min_dist
order by c.cityid, c.districtid

Open in new window

This is a difference between Oracle and T-SQL at play, in Oracle that alias is not mandatory.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>>) d --<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< alias needed here    

Thanks.  I know that difference just forgot on the inner view...
brgdotnetcontractor

Author

Commented:
Hi Guys,

Hey I am using PortletPauls query and it is fabulously working for Sybase and Sql Server. For Oracle though, I am getting an error of the type "From Keyword Not Found where expected). Lines 9-14 of Portlet pauls query below. I have been trying to get it to work for Oracle, but am stuck. Does anyone see how I can fix this for the Oracle platform? Should I submit another question under the "Oracle" topic area?

SELECT
    cityid
  , districtid
  , COUNT( DISTINCT CASE WHEN districtid = mindistrictid THEN FamilyName END ) AS distinctfamiliynames
  , COUNT( * )                                                                 AS totalnamesperdistrict
FROM (
    SELECT
        *
      , ( SELECT
                MIN( districtid )
            FROM city c2
            WHERE city.cityid = c2.cityid
            AND city.FamilyName = c2.FamilyName

        )
        mindistrictid
    FROM city
) d
GROUP BY
    cityid , districtid
ORDER BY
    cityid, districtid
;
PortletPaulEE Topic Advisor
Most Valuable Expert 2014
Awarded 2013

Commented:
This should work in Oracle:
with md as (
    select cityid, familyname, min(Districtid) mindistrictid
    from city
    group by cityid, familyname
    )
SELECT
    cityid
  , districtid
  , COUNT( DISTINCT CASE WHEN districtid = mindistrictid THEN FamilyName END ) AS distinctfamiliynames
  , COUNT( * )                                                                 AS totalnamesperdistrict
FROM (
    SELECT
          city.Cityid, city.DistrictId, city.FamilyName
        , md.mindistrictid
    FROM city
    inner join md on city.cityid = md.cityid and city.familyname = md.familyname
) d
GROUP BY
    cityid , districtid
ORDER BY
    cityid, districtid
;

Open in new window

Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
In Oracle and SQL Server, use the ROW_NUMBER query.  It will be much more efficient.

A quirk in Oracle for the original query is the * and inline select.

This should make the original work:
...
FROM (
    SELECT
       city.*
      , ( SELECT
                MIN( districtid )
...
PortletPaulEE Topic Advisor
Most Valuable Expert 2014
Awarded 2013

Commented:
I tried removing the asterisk and the issue was not resolved... that's how I end-up with the cte

but yes, the original solution using row_number would work in both SQL Server and Oracle

https://www.experts-exchange.com/questions/29129607/Need-to-modify-my-sql-query-slightly-but-I-need-help.html#a42758084
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
Here is a complete Oracle test case using Paul's query with my suggested change:
CREATE TABLE City(
	CityId varchar2(50),
	DistrictId number,
	FamilyName varchar2(50)
);

INSERT into city VALUES ('Austin', 1, 'Brown');
INSERT into city VALUES ('Austin', 1, 'James');
INSERT into city VALUES ('Austin', 1, 'Marconi');
INSERT into city VALUES ('Austin', 2, 'Marconi');
INSERT into city VALUES ('Austin', 2, 'Trevor');
INSERT into city VALUES ('Austin', 2, 'Jones');
INSERT into city VALUES ('Austin', 3, 'Brown');
INSERT into city VALUES ('Austin', 3, 'Greyson');
INSERT into city VALUES ('Austin', 3, 'Marconi');
INSERT into city VALUES ('Austin', 2, 'Brown');
INSERT into city VALUES ('Austin', 3, 'Marconi');
INSERT into city VALUES ('Austin', 3, 'Marconi');
INSERT into city VALUES ('Austin', 1, 'Marconi');
INSERT into city VALUES ('Austin', 1, 'Marconi');
commit

SELECT
    cityid
  , districtid
  , COUNT( DISTINCT CASE WHEN districtid = mindistrictid THEN FamilyName END ) AS distinctfamiliynames
  , COUNT( * )                                                                 AS totalnamesperdistrict
FROM (
    SELECT
        city.*
      , ( SELECT
                MIN( districtid )
            FROM city c2
            WHERE city.cityid = c2.cityid
            AND city.FamilyName = c2.FamilyName
        )
        mindistrictid
    FROM city
) d
GROUP BY
    cityid , districtid
ORDER BY
    cityid, districtid
; 

Open in new window


Results:
CITYID               DISTRICTID DISTINCTFAMILIYNAMES TOTALNAMESPERDISTRICT
-------------------- ---------- -------------------- ---------------------
Austin                        1                    3                     5
Austin                        2                    2                     4
Austin                        3                    1                     5

Open in new window

PortletPaulEE Topic Advisor
Most Valuable Expert 2014
Awarded 2013

Commented:
Mmmmm, I must have done something wrong in my earlier test (using Rextester which is a bit painful for Oracle I'll warrant)

this did work at rextester
drop table city\\

CREATE TABLE City(
	CityId varchar2(50),
	DistrictId number,
	FamilyName varchar2(50)    
)\\

INSERT into city VALUES ('Austin', 1, 'Brown') \\
INSERT into city VALUES ('Austin', 1, 'James') \\
INSERT into city VALUES ('Austin', 1, 'Marconi') \\
INSERT into city VALUES ('Austin', 2, 'Marconi') \\
INSERT into city VALUES ('Austin', 2, 'Trevor') \\
INSERT into city VALUES ('Austin', 2, 'Jones') \\
INSERT into city VALUES ('Austin', 3, 'Brown') \\
INSERT into city VALUES ('Austin', 3, 'Greyson') \\
INSERT into city VALUES ('Austin', 3, 'Marconi') \\
INSERT into city VALUES ('Austin', 2, 'Brown') \\
INSERT into city VALUES ('Austin', 3, 'Marconi') \\
INSERT into city VALUES ('Austin', 3, 'Marconi' ) \\
INSERT into city VALUES ('Austin', 1, 'Marconi') \\
INSERT into city VALUES ('Austin', 1, 'Marconi') \\


SELECT
    cityid
  , districtid
  , COUNT( DISTINCT CASE WHEN districtid = mindistrictid THEN FamilyName END )     AS distinctfamiliynames
  , COUNT( * )                                                                     AS totalnamesperdistrict
FROM (
    SELECT
        city.*
      , ( SELECT
                MIN( districtid )    
            FROM city c2
            WHERE city.cityid = c2.cityid
            AND city.FamilyName = c2.FamilyName
        )    
        mindistrictid
    FROM city
)     d
GROUP BY
    cityid , districtid
ORDER BY
    cityid, districtid
\\ 

Open in new window

brgdotnetcontractor

Author

Commented:
Thanks to both of you, so very much.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial