Community Pick: Many members of our community have endorsed this article.
Editor's Choice: This article has been selected by our editors as an exceptional contribution.

SQL Server GROUP BY

Jim HornSQL Server Data Dude
CERTIFIED EXPERT
Microsoft SQL Server database developer, architect, and author specializing in Business Intelligence, ETL, and Data Warehousing.
Published:
Updated:
A montage of common Microsoft SQL Server GROUP BY solutions commonly provided by SQL Server experts here at Experts Exchange.
GROUP BY allows a query to perform an aggregate calculation on a SET of values, and return a single value.  The most common used aggregates are Sum and Count, follwed by Average, Minimum, and Maximum.  

So let's queue this up with some demo code, then I'll lay out common solutions in increasing levels of difficulty.
 

Setup

Let's create and populate three tables with sample data of customers, customer types, sales, and sales line items.
 
-- Delete the tables
                      IF EXISTS(SELECT name FROM sys.tables WHERE name='sale_line_items')
                         DROP TABLE sale_line_items
                      GO
                      
                      IF EXISTS(SELECT name FROM sys.tables WHERE name='sale')
                         DROP TABLE sale
                      GO
                      
                      IF EXISTS(SELECT name FROM sys.tables WHERE name='customer')
                         DROP TABLE customer
                      GO
                      
                      IF EXISTS(SELECT name FROM sys.tables WHERE name='customer_type')
                         DROP TABLE customer_type
                      GO
                      
                      -- Create the tables
                      CREATE TABLE customer_type(
                         id int PRIMARY KEY NOT NULL, 
                         name varchar(50)) 
                      
                      CREATE TABLE customer (
                         id int identity(1,1) PRIMARY KEY NOT NULL, 
                         fk_customer_type_id int REFERENCES customer_type(id), 
                         name varchar(50)) 
                      
                      CREATE TABLE sale(
                         id int identity(1,1) PRIMARY KEY NOT NULL, 
                         fk_customer_id int REFERENCES customer(id), 
                         dt date) 
                      GO
                      
                      CREATE TABLE sale_line_items(
                         id int identity(1,1) PRIMARY KEY NOT NULL,
                         fk_sale_id int REFERENCES sale(id),
                         fk_product_id int, 
                         qty int, 
                         unit_price numeric(19,2), 
                         discount_pct decimal(5,4) NOT NULL DEFAULT 0 
                            -- Only allow values between zero and 100%
                            CONSTRAINT ck_discount CHECK (discount_pct >= 0 AND discount_pct <= 1), 
                         ext_price as (qty * unit_price) * (1 - discount_pct))
                      GO
                      
                      INSERT INTO customer_type (id, name) 
                      VALUES (1, 'Movie Characters'), (2, 'Rock Stars'), (3, 'Cartoon Characters')
                      
                      INSERT INTO customer (name, fk_customer_type_id ) 
                      VALUES 
                         ('Johnny B. Goode',1), ('Buckaroo Banzai',1), ('Rick Springfield',2), ('Wile E. Coyote',3), ('Bono', 2), 
                         ('Officer Barbrady',3), ('Sweet Polly Purebred',3), ('Sammy Hagar',2), ('Bart Simpson',3), ('Marty McFly',1)
                      
                      Declare @dt datetime = '2013-09-01' 
                      
                      INSERT INTO sale (fk_customer_id, dt) 
                      VALUES 
                         (1, @dt), 
                         (2, DATEADD(d, 1, @dt)), 
                         (3, DATEADD(d, 2, @dt)),
                         (4, DATEADD(d, 3, @dt)), (1, DATEADD(d, 3, @dt)), (5, DATEADD(d, 3, @dt)),
                         (2, DATEADD(d, 4, @dt)), (3, DATEADD(d, 4, @dt)), (9, DATEADD(d, 4, @dt)),
                         (4, DATEADD(d, 5, @dt)), (7, DATEADD(d, 5, @dt)), (1, DATEADD(d, 5, @dt)),
                         (9, DATEADD(d, 6, @dt)), (5, DATEADD(d, 6, @dt)), (1, DATEADD(d, 6, @dt)),
                         (4, DATEADD(d, 7, @dt)), (6, DATEADD(d, 7, @dt)), 
                         (9, DATEADD(d, 8, @dt)), (2, DATEADD(d, 8, @dt)), 
                         (1, DATEADD(d, 9, @dt)) 
                      
                      INSERT INTO sale_line_items (fk_sale_id, fk_product_id, qty, unit_price, discount_pct)
                      VALUES 
                         (1,  1, 100, .98, 0), (1, 42, 2, 84, 0), (1, 55, 10, 4.95, .2), 
                         (2,  71, 100, .98, 0), (2, 45, 2, 84, 0), 
                         (3,  100, 100, .98, 0),
                         (4,  22, 100, .98, 0), (4, 62, 2, 84, 0), (4, 455, 10, 4.95, .25), (4, 55, 10, 4.95, 0),
                         (5,  1, 100, .98, 0), (5, 42, 2, 84, 0), (5, 555, 10, 4.95, .05),
                      	 
                         (6,  91, 100, .98, 0), (6, 322, 2, 84, 0), (6, 565, 10, 4.95, .075), 
                         (7,  333, 73, .98, 0), (7, 422, 2, 84, 0), (7, 595, 10, 4.95, .5), 
                         (8,  1, 1, .98, 0), 
                         (9,  3, 100, .98, 0), (9, 142, 2, 84, 0), (9, 5005, 10, 4.95, .5), 
                         (10,  812, 100, .98, 0), (10, 42, 2, 84, 0), (10, 551, 10, 4.95, .333), (10, 42, 2, 84, 0), (10, 55, 10, 4.95, .05),
                         
                         (11,  1, 100, .98, 0), (11, 42, 2, 84, 0), (11, 55, 10, 4.95, .2), 
                         (12,  71, 100, .98, 0), (12, 45, 2, 84, 0), 
                         (13,  100, 100, .98, 0),
                         (14,  22, 100, .98, 0), (14, 62, 2, 84, 0), (14, 455, 10, 4.95, .25), (14, 55, 10, 4.95, 0),
                         (15,  1, 100, .98, 0), (15, 42, 2, 84, 0), (15, 555, 10, 4.95, .05),
                      	 
                         (16,  91, 100, .98, 0), (16, 322, 2, 84, 0), (16, 565, 10, 4.95, .075), 
                         (17,  333, 73, .98, 0), (17, 422, 2, 84, 0), (17, 595, 10, 4.95, .5), 
                         (18,  1, 1, .98, 0), 
                         (19,  3, 100, .98, 0), (19, 142, 2, 84, 0), (19, 5005, 10, 4.95, .5), 
                         (20,  812, 100, .98, 0), (20, 42, 2, 84, 0), (20, 551, 10, 4.95, .333), (20, 42, 2, 84, 0), (20, 55, 10, 4.95, .05)
                         
                      
                      -- All rows in all tables
                      SELECT * FROM customer
                      SELECT * FROM customer_type
                      SELECT * FROM sale
                      SELECT * FROM sale_line_items

Open in new window

For starters, let's get the query clause order straight.

SELECT - FROM - WHERE - GROUP BY - HAVING - ORDER BY

And to sear this order into your brain, this mnemonic was learned in one of my few formal  SQL Server classes.  I deeply regret doing this to you, but it's the only one I know of.  If anyone can come up with something more friendly, please let me know...

SWEATY  - FEET  - WILL  - GIVE - HORRIBLE - ODORS

Good luck getting that out of your head.
 

Crawl


1. Simple Aggregate: Give me the Sum / Count of something


-- All sales by sale date
                      SELECT s.id, s.dt, COUNT(si.ext_price) as sales_count, SUM(si.ext_price) as sales_price
                      FROM sale s
                         JOIN sale_line_items si ON s.id = si.fk_sale_id 
                      GROUP BY s.id, s.dt
                      
                      -- All Sales by customer and sale date
                      SELECT s.id, c.name, s.dt, COUNT(si.ext_price) as sales_count, SUM(si.ext_price) as sales_revenue
                      FROM sale s
                         JOIN customer c ON s.fk_customer_id = c.id
                         JOIN sale_line_items si ON s.id = si.fk_sale_id 
                      GROUP BY s.id, c.name, s.dt

Open in new window

1 resultsNotice that in the above queries all columns in the SELECT clause that are NOT involved in an aggregate such as Sum and Count are in the GROUP BY clause (see red rectangles below).  Other than hard-coded values, any deviation will result in an error. Non aggregated columnsOther common examples:

  • What is the total orders per day?
  • How many times did users log in to my application?
  • How many times did people vote in this election?
 

2. Include rows whose totals equal zero: LEFT JOIN

This is typical when the rows involve an ordered list, such as January, February ..., December, states in a country, etc. where the business wishes that ALL rows are in the set, regardless of if they have any values.

-- Total Sales by Customer, including the customers that didn't buy anything:  customers LEFT JOIN
                      SELECT c.id, c.name, SUM(COALESCE(si.ext_price,0)) as sales_price
                      FROM customer c
                         LEFT JOIN sale s ON c.id = s.fk_customer_id   -- Include all customer in results
                         LEFT JOIN sale_line_items si ON s.id = si.fk_sale_id   -- Include all customer-sale in results
                      GROUP BY c.id, c.name
                      ORDER BY c.name

Open in new window

2 resultsCOALESCE is a function used to find the first NULL value.  Here I use it to replace NULL values with zero to make the result more readable.  If the difference between NULL and zero matters to you, remove the COALESCE.
for more info on Joins see SQL Server: Table Joins Explained!
 

WALK


3. Filter rows by a condition on an aggregate value: HAVING

To filter on a single column / single values, use the WHERE clause.

-- Total Sales by Customer, where each sale is greater than $100:  WHERE
                      SELECT c.id, c.name, SUM(si.ext_price) as sales_price
                      FROM customer c
                         JOIN sale s ON c.id = s.fk_customer_id 
                         JOIN sale_line_items si ON s.id = si.fk_sale_id 
                      WHERE si.ext_price > 100
                      GROUP BY c.id, c.name
                      ORDER BY Sum(si.ext_price) DESC

Open in new window

3 results - WHEREOther common examples:

  • Give me all sales that are below $0 (credits, errors).
  • Give me all activity for a specific customer (state, month, product, you get the idea).
If you need to filter on a calculated column, such as the sum of sales, counts of visits, or count of claims, use the HAVING clause exactly the same way you would use the WHERE clause.
-- Total Sales by Customer, where total sales are over $500:  HAVING
                      SELECT c.id, c.name, SUM(si.ext_price) as sales_price
                      FROM customer c
                         JOIN sale s ON c.id = s.fk_customer_id 
                         JOIN sale_line_items si ON s.id = si.fk_sale_id 
                      GROUP BY c.id, c.name
                      HAVING SUM(si.ext_price) > 500
                      ORDER BY Sum(si.ext_price) DESC

Open in new window


3 results - HAVINGNotice that in the above queries all columns in the SELECT clause that are NOT involved in an aggregate such as Sum and Count are in the GROUP BY clause (see red rectangles below).  Other than hard-coded values, any deviation will result in an error.

Examples:
  • All sales agents whose combined sales were greater than $1 million --> HAVING SUM(Sales) > 1000000
  • All NFL players that rushed for more than 2,000 yards in a single season --> HAVING SUM(yards) > 2000
  • All Customers with duplicate rows --> HAVING COUNT(some_column) > 0
 

RUN!!!


4. Order rows based on a value within each group: RANK with PARTITION BY

SQL Server ranking functions can be used to generate a rank order of a column, and PARTITION BY performs this within a group.  

With a single column of values..
 
SELECT ct.name as customer_type, c.name as customer, si.ext_price, 
                         RANK() OVER (PARTITION BY ct.name ORDER BY si.ext_price DESC) as rank_order
                      FROM sale s
                         JOIN sale_line_items si ON s.id = si.fk_sale_id 
                         JOIN customer c ON s.fk_customer_id = c.id
                         JOIN customer_type ct ON c.fk_customer_type_id = ct.id
                      GROUP BY ct.name, c.name, si.ext_price

Open in new window

4 results - RANK on a single column.. and with a Sum() of values to give totals within groups
 
SELECT 
                         customer_type, 
                         RANK() OVER (PARTITION BY customer_type ORDER BY sales_revenue DESC) as rank_order,
                         customer, 
                         sales_count, 
                         sales_revenue
                      FROM (
                         -- The inner query gets the sales count and sum
                         SELECT ct.name as customer_type, c.name as customer, 
                         COUNT(si.ext_price) as sales_count, 
                         SUM(si.ext_price) as sales_revenue
                         FROM sale s
                         JOIN sale_line_items si ON s.id = si.fk_sale_id 
                         JOIN customer c ON s.fk_customer_id = c.id
                         JOIN customer_type ct ON c.fk_customer_type_id = ct.id
                         GROUP BY ct.name, c.name) a

Open in new window


4 results - RANK with PARTITION BY on a SUM value 

5. Aggregate AND values from a single row that make up the aggregate: Subquery

The subquery in the middle returns

  • id and name -- The columns that are grouped
  • total_sales - The Max(si.ext_price), which is the biggest sale
The outer query returns all of the subquery, and
  • s.dt -- The date of that last sale.
SELECT a.id, a.name, a.max_ext_price, s.dt as last_sale
                      FROM (
                         -- Subquery that generates sales_price
                         SELECT c.id, c.name, MAX(si.ext_price) as max_ext_price
                         FROM customer c
                            LEFT JOIN sale s ON c.id = s.fk_customer_id 
                            JOIN sale_line_items si ON s.id = si.fk_sale_id 
                            GROUP BY c.id, c.name) a 
                            LEFT JOIN sale s ON a.id = s.fk_customer_id 

Open in new window


5 results

6. PIVOT: One column as rows, another column as columns, with an aggregate value as the 'cell'

By Day in Month

SELECT name as customer_name, 
                         [1] as '1', [2] as '2', [3] as '3', [4] as '4', [5] as '5',  [6] as '6', [7] as '7', [8] as '8', [9] as '9', [10] as '10' 
                      FROM (
                         SELECT c.name, DATEPART(d, s.dt) as day_of_month, si.ext_price as sales
                         FROM sale_line_items si
                         JOIN sale s ON si.fk_sale_id = s.id
                         JOIN customer c ON c.id = s.fk_customer_id) p
                      PIVOT
                      (
                      SUM(sales)
                      FOR day_of_month IN (
                         [1], [2], [3], [4], [5],  [6], [7], [8], [9], [10])
                      ) as pivottable
                      ORDER BY name

Open in new window

6 results - daysBy day in week

SELECT name as customer_name, 
                         [1] as Sunday, [2] as Monday, [3] as Tuesday, [4] as Wednesday, [5] as Thursday,  [6] as Friday, [7] as Saturday
                      FROM (
                         SELECT c.name, DATEPART(dw, s.dt) as day_of_week, si.ext_price as sales
                         FROM sale_line_items si
                         JOIN sale s ON si.fk_sale_id = s.id
                         JOIN customer c ON c.id = s.fk_customer_id) p
                      PIVOT
                      (
                      SUM(sales)
                      FOR day_of_week IN ([1], [2], [3], [4], [5],  [6], [7])) as pivottable
                      ORDER BY name

Open in new window

6 results - weekdays
Thank you for reading my article, please leave valuable feedback.
If you liked this article please click the 'Good Article' button. 

I look forward to hearing from you. -  Jim - ( LinkedIn )  ( Twitter )
16
7,296 Views
Jim HornSQL Server Data Dude
CERTIFIED EXPERT
Microsoft SQL Server database developer, architect, and author specializing in Business Intelligence, ETL, and Data Warehousing.

Comments (1)

Hi Jim,
Pity I didn't read this earlier.

I know there is so much more that SQL Server queries can do than I know how to do, so I'm always on the lookout for tips like this.

Well done.

Cheers
Bernard

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.