<

Go Premium for a chance to win a PS4. Enter to Win

x

SQL Server GROUP BY

Published on
41,364 Points
4,264 Views
16 Endorsements
Last Modified:
Awarded
Jim Horn
Microsoft SQL Server database developer, architect, and author specializing in Business Intelligence, ETL, and Data Warehousing.
A montage of common Microsoft SQL Server GROUP BY solutions commonly provided by SQL Server experts here at Experts Exchange.
GROUP BY allows a query to perform an aggregate calculation on a SET of values, and return a single value.  The most common used aggregates are Sum and Count, follwed by Average, Minimum, and Maximum.  

So let's queue this up with some demo code, then I'll lay out common solutions in increasing levels of difficulty.
 

Setup

Let's create and populate three tables with sample data of customers, customer types, sales, and sales line items.
 
-- Delete the tables
IF EXISTS(SELECT name FROM sys.tables WHERE name='sale_line_items')
   DROP TABLE sale_line_items
GO

IF EXISTS(SELECT name FROM sys.tables WHERE name='sale')
   DROP TABLE sale
GO

IF EXISTS(SELECT name FROM sys.tables WHERE name='customer')
   DROP TABLE customer
GO

IF EXISTS(SELECT name FROM sys.tables WHERE name='customer_type')
   DROP TABLE customer_type
GO

-- Create the tables
CREATE TABLE customer_type(
   id int PRIMARY KEY NOT NULL, 
   name varchar(50)) 

CREATE TABLE customer (
   id int identity(1,1) PRIMARY KEY NOT NULL, 
   fk_customer_type_id int REFERENCES customer_type(id), 
   name varchar(50)) 

CREATE TABLE sale(
   id int identity(1,1) PRIMARY KEY NOT NULL, 
   fk_customer_id int REFERENCES customer(id), 
   dt date) 
GO

CREATE TABLE sale_line_items(
   id int identity(1,1) PRIMARY KEY NOT NULL,
   fk_sale_id int REFERENCES sale(id),
   fk_product_id int, 
   qty int, 
   unit_price numeric(19,2), 
   discount_pct decimal(5,4) NOT NULL DEFAULT 0 
      -- Only allow values between zero and 100%
      CONSTRAINT ck_discount CHECK (discount_pct >= 0 AND discount_pct <= 1), 
   ext_price as (qty * unit_price) * (1 - discount_pct))
GO

INSERT INTO customer_type (id, name) 
VALUES (1, 'Movie Characters'), (2, 'Rock Stars'), (3, 'Cartoon Characters')

INSERT INTO customer (name, fk_customer_type_id ) 
VALUES 
   ('Johnny B. Goode',1), ('Buckaroo Banzai',1), ('Rick Springfield',2), ('Wile E. Coyote',3), ('Bono', 2), 
   ('Officer Barbrady',3), ('Sweet Polly Purebred',3), ('Sammy Hagar',2), ('Bart Simpson',3), ('Marty McFly',1)

Declare @dt datetime = '2013-09-01' 

INSERT INTO sale (fk_customer_id, dt) 
VALUES 
   (1, @dt), 
   (2, DATEADD(d, 1, @dt)), 
   (3, DATEADD(d, 2, @dt)),
   (4, DATEADD(d, 3, @dt)), (1, DATEADD(d, 3, @dt)), (5, DATEADD(d, 3, @dt)),
   (2, DATEADD(d, 4, @dt)), (3, DATEADD(d, 4, @dt)), (9, DATEADD(d, 4, @dt)),
   (4, DATEADD(d, 5, @dt)), (7, DATEADD(d, 5, @dt)), (1, DATEADD(d, 5, @dt)),
   (9, DATEADD(d, 6, @dt)), (5, DATEADD(d, 6, @dt)), (1, DATEADD(d, 6, @dt)),
   (4, DATEADD(d, 7, @dt)), (6, DATEADD(d, 7, @dt)), 
   (9, DATEADD(d, 8, @dt)), (2, DATEADD(d, 8, @dt)), 
   (1, DATEADD(d, 9, @dt)) 

INSERT INTO sale_line_items (fk_sale_id, fk_product_id, qty, unit_price, discount_pct)
VALUES 
   (1,  1, 100, .98, 0), (1, 42, 2, 84, 0), (1, 55, 10, 4.95, .2), 
   (2,  71, 100, .98, 0), (2, 45, 2, 84, 0), 
   (3,  100, 100, .98, 0),
   (4,  22, 100, .98, 0), (4, 62, 2, 84, 0), (4, 455, 10, 4.95, .25), (4, 55, 10, 4.95, 0),
   (5,  1, 100, .98, 0), (5, 42, 2, 84, 0), (5, 555, 10, 4.95, .05),
	 
   (6,  91, 100, .98, 0), (6, 322, 2, 84, 0), (6, 565, 10, 4.95, .075), 
   (7,  333, 73, .98, 0), (7, 422, 2, 84, 0), (7, 595, 10, 4.95, .5), 
   (8,  1, 1, .98, 0), 
   (9,  3, 100, .98, 0), (9, 142, 2, 84, 0), (9, 5005, 10, 4.95, .5), 
   (10,  812, 100, .98, 0), (10, 42, 2, 84, 0), (10, 551, 10, 4.95, .333), (10, 42, 2, 84, 0), (10, 55, 10, 4.95, .05),
   
   (11,  1, 100, .98, 0), (11, 42, 2, 84, 0), (11, 55, 10, 4.95, .2), 
   (12,  71, 100, .98, 0), (12, 45, 2, 84, 0), 
   (13,  100, 100, .98, 0),
   (14,  22, 100, .98, 0), (14, 62, 2, 84, 0), (14, 455, 10, 4.95, .25), (14, 55, 10, 4.95, 0),
   (15,  1, 100, .98, 0), (15, 42, 2, 84, 0), (15, 555, 10, 4.95, .05),
	 
   (16,  91, 100, .98, 0), (16, 322, 2, 84, 0), (16, 565, 10, 4.95, .075), 
   (17,  333, 73, .98, 0), (17, 422, 2, 84, 0), (17, 595, 10, 4.95, .5), 
   (18,  1, 1, .98, 0), 
   (19,  3, 100, .98, 0), (19, 142, 2, 84, 0), (19, 5005, 10, 4.95, .5), 
   (20,  812, 100, .98, 0), (20, 42, 2, 84, 0), (20, 551, 10, 4.95, .333), (20, 42, 2, 84, 0), (20, 55, 10, 4.95, .05)
   

-- All rows in all tables
SELECT * FROM customer
SELECT * FROM customer_type
SELECT * FROM sale
SELECT * FROM sale_line_items

Open in new window

For starters, let's get the query clause order straight.

SELECT - FROM - WHERE - GROUP BY - HAVING - ORDER BY

And to sear this order into your brain, this mnemonic was learned in one of my few formal  SQL Server classes.  I deeply regret doing this to you, but it's the only one I know of.  If anyone can come up with something more friendly, please let me know...

SWEATY  - FEET  - WILL  - GIVE - HORRIBLE - ODORS

Good luck getting that out of your head.
 

Crawl


1. Simple Aggregate: Give me the Sum / Count of something


-- All sales by sale date
SELECT s.id, s.dt, COUNT(si.ext_price) as sales_count, SUM(si.ext_price) as sales_price
FROM sale s
   JOIN sale_line_items si ON s.id = si.fk_sale_id 
GROUP BY s.id, s.dt

-- All Sales by customer and sale date
SELECT s.id, c.name, s.dt, COUNT(si.ext_price) as sales_count, SUM(si.ext_price) as sales_revenue
FROM sale s
   JOIN customer c ON s.fk_customer_id = c.id
   JOIN sale_line_items si ON s.id = si.fk_sale_id 
GROUP BY s.id, c.name, s.dt

Open in new window

1 resultsNotice that in the above queries all columns in the SELECT clause that are NOT involved in an aggregate such as Sum and Count are in the GROUP BY clause (see red rectangles below).  Other than hard-coded values, any deviation will result in an error. Non aggregated columnsOther common examples:

  • What is the total orders per day?
  • How many times did users log in to my application?
  • How many times did people vote in this election?
 

2. Include rows whose totals equal zero: LEFT JOIN

This is typical when the rows involve an ordered list, such as January, February ..., December, states in a country, etc. where the business wishes that ALL rows are in the set, regardless of if they have any values.

-- Total Sales by Customer, including the customers that didn't buy anything:  customers LEFT JOIN
SELECT c.id, c.name, SUM(COALESCE(si.ext_price,0)) as sales_price
FROM customer c
   LEFT JOIN sale s ON c.id = s.fk_customer_id   -- Include all customer in results
   LEFT JOIN sale_line_items si ON s.id = si.fk_sale_id   -- Include all customer-sale in results
GROUP BY c.id, c.name
ORDER BY c.name

Open in new window

2 resultsCOALESCE is a function used to find the first NULL value.  Here I use it to replace NULL values with zero to make the result more readable.  If the difference between NULL and zero matters to you, remove the COALESCE.
for more info on Joins see SQL Server: Table Joins Explained!
 

WALK


3. Filter rows by a condition on an aggregate value: HAVING

To filter on a single column / single values, use the WHERE clause.

-- Total Sales by Customer, where each sale is greater than $100:  WHERE
SELECT c.id, c.name, SUM(si.ext_price) as sales_price
FROM customer c
   JOIN sale s ON c.id = s.fk_customer_id 
   JOIN sale_line_items si ON s.id = si.fk_sale_id 
WHERE si.ext_price > 100
GROUP BY c.id, c.name
ORDER BY Sum(si.ext_price) DESC

Open in new window

3 results - WHEREOther common examples:

  • Give me all sales that are below $0 (credits, errors).
  • Give me all activity for a specific customer (state, month, product, you get the idea).
If you need to filter on a calculated column, such as the sum of sales, counts of visits, or count of claims, use the HAVING clause exactly the same way you would use the WHERE clause.
-- Total Sales by Customer, where total sales are over $500:  HAVING
SELECT c.id, c.name, SUM(si.ext_price) as sales_price
FROM customer c
   JOIN sale s ON c.id = s.fk_customer_id 
   JOIN sale_line_items si ON s.id = si.fk_sale_id 
GROUP BY c.id, c.name
HAVING SUM(si.ext_price) > 500
ORDER BY Sum(si.ext_price) DESC

Open in new window


3 results - HAVINGNotice that in the above queries all columns in the SELECT clause that are NOT involved in an aggregate such as Sum and Count are in the GROUP BY clause (see red rectangles below).  Other than hard-coded values, any deviation will result in an error.

Examples:
  • All sales agents whose combined sales were greater than $1 million --> HAVING SUM(Sales) > 1000000
  • All NFL players that rushed for more than 2,000 yards in a single season --> HAVING SUM(yards) > 2000
  • All Customers with duplicate rows --> HAVING COUNT(some_column) > 0
 

RUN!!!


4. Order rows based on a value within each group: RANK with PARTITION BY

SQL Server ranking functions can be used to generate a rank order of a column, and PARTITION BY performs this within a group.  

With a single column of values..
 
SELECT ct.name as customer_type, c.name as customer, si.ext_price, 
   RANK() OVER (PARTITION BY ct.name ORDER BY si.ext_price DESC) as rank_order
FROM sale s
   JOIN sale_line_items si ON s.id = si.fk_sale_id 
   JOIN customer c ON s.fk_customer_id = c.id
   JOIN customer_type ct ON c.fk_customer_type_id = ct.id
GROUP BY ct.name, c.name, si.ext_price

Open in new window

4 results - RANK on a single column.. and with a Sum() of values to give totals within groups
 
SELECT 
   customer_type, 
   RANK() OVER (PARTITION BY customer_type ORDER BY sales_revenue DESC) as rank_order,
   customer, 
   sales_count, 
   sales_revenue
FROM (
   -- The inner query gets the sales count and sum
   SELECT ct.name as customer_type, c.name as customer, 
   COUNT(si.ext_price) as sales_count, 
   SUM(si.ext_price) as sales_revenue
   FROM sale s
   JOIN sale_line_items si ON s.id = si.fk_sale_id 
   JOIN customer c ON s.fk_customer_id = c.id
   JOIN customer_type ct ON c.fk_customer_type_id = ct.id
   GROUP BY ct.name, c.name) a

Open in new window


4 results - RANK with PARTITION BY on a SUM value 

5. Aggregate AND values from a single row that make up the aggregate: Subquery

The subquery in the middle returns

  • id and name -- The columns that are grouped
  • total_sales - The Max(si.ext_price), which is the biggest sale
The outer query returns all of the subquery, and
  • s.dt -- The date of that last sale.
SELECT a.id, a.name, a.max_ext_price, s.dt as last_sale
FROM (
   -- Subquery that generates sales_price
   SELECT c.id, c.name, MAX(si.ext_price) as max_ext_price
   FROM customer c
      LEFT JOIN sale s ON c.id = s.fk_customer_id 
      JOIN sale_line_items si ON s.id = si.fk_sale_id 
      GROUP BY c.id, c.name) a 
      LEFT JOIN sale s ON a.id = s.fk_customer_id 

Open in new window


5 results

6. PIVOT: One column as rows, another column as columns, with an aggregate value as the 'cell'

By Day in Month

SELECT name as customer_name, 
   [1] as '1', [2] as '2', [3] as '3', [4] as '4', [5] as '5',  [6] as '6', [7] as '7', [8] as '8', [9] as '9', [10] as '10' 
FROM (
   SELECT c.name, DATEPART(d, s.dt) as day_of_month, si.ext_price as sales
   FROM sale_line_items si
   JOIN sale s ON si.fk_sale_id = s.id
   JOIN customer c ON c.id = s.fk_customer_id) p
PIVOT
(
SUM(sales)
FOR day_of_month IN (
   [1], [2], [3], [4], [5],  [6], [7], [8], [9], [10])
) as pivottable
ORDER BY name

Open in new window

6 results - daysBy day in week

SELECT name as customer_name, 
   [1] as Sunday, [2] as Monday, [3] as Tuesday, [4] as Wednesday, [5] as Thursday,  [6] as Friday, [7] as Saturday
FROM (
   SELECT c.name, DATEPART(dw, s.dt) as day_of_week, si.ext_price as sales
   FROM sale_line_items si
   JOIN sale s ON si.fk_sale_id = s.id
   JOIN customer c ON c.id = s.fk_customer_id) p
PIVOT
(
SUM(sales)
FOR day_of_week IN ([1], [2], [3], [4], [5],  [6], [7])) as pivottable
ORDER BY name

Open in new window

6 results - weekdays
Thank you for reading my article, please leave valuable feedback.
If you liked this article please click the 'Good Article' button. 

I look forward to hearing from you. -  Jim - ( LinkedIn )  ( Twitter )
16
Comment
Author:Jim Horn
1 Comment
 

Expert Comment

by:BernardGBailey
Hi Jim,
Pity I didn't read this earlier.

I know there is so much more that SQL Server queries can do than I know how to do, so I'm always on the lookout for tips like this.

Well done.

Cheers
Bernard
0

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

Join & Write a Comment

Via a live example combined with referencing Books Online, show some of the information that can be extracted from the Catalog Views in SQL Server.
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month