Chapter 6 of 18

GROUP BY & HAVING

Group rows into sets and compute aggregates per group, then filter groups with HAVING.

Meritshot8 min read
SQLGROUP BYHAVINGAggregationAnalytics
All SQL Chapters

GROUP BY

GROUP BY divides rows into groups based on one or more columns, then lets you apply aggregate functions to each group separately.

Without GROUP BY, aggregate functions return one row for the entire table. With GROUP BY, you get one row per group.

Basic Syntax

SELECT grouping_column, aggregate_function(column)
FROM table
WHERE condition          -- optional: filter rows BEFORE grouping
GROUP BY grouping_column
HAVING aggregate_condition  -- optional: filter AFTER grouping
ORDER BY ...;

Sample Data

CREATE TABLE employees (
    id         INTEGER PRIMARY KEY,
    name       TEXT    NOT NULL,
    department TEXT    NOT NULL,
    salary     REAL    NOT NULL,
    hire_date  TEXT    NOT NULL
);

INSERT INTO employees VALUES
(1,  'Priya Sharma',   'Finance',    75000, '2022-03-15'),
(2,  'Raj Patel',      'Technology', 92000, '2021-07-01'),
(3,  'Meera Singh',    'Marketing',  68000, '2023-01-10'),
(4,  'Arjun Nair',     'Finance',    81000, '2020-11-22'),
(5,  'Sunita Rao',     'Technology', 88000, '2022-09-05'),
(6,  'Dev Kumar',      'Marketing',  72000, '2021-04-18'),
(7,  'Kiran Mehta',    'Finance',    95000, '2019-06-30'),
(8,  'Ananya Das',     'Technology', 63000, '2024-02-14'),
(9,  'Vikram Joshi',   'Marketing',  77000, '2020-08-12'),
(10, 'Ritu Bansal',    'Finance',    69000, '2023-07-01');

GROUP BY One Column

-- Count of employees per department
SELECT
    department,
    COUNT(*) AS headcount
FROM employees
GROUP BY department;

Result:

department | headcount
-----------+----------
Finance    | 4
Marketing  | 3
Technology | 3
-- Average salary by department
SELECT
    department,
    ROUND(AVG(salary), 0) AS avg_salary,
    MIN(salary)           AS min_salary,
    MAX(salary)           AS max_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;

Result:

department | avg_salary | min_salary | max_salary
-----------+------------+------------+-----------
Finance    | 80000      | 69000      | 95000
Technology | 81000      | 63000      | 92000
Marketing  | 72333      | 68000      | 77000

GROUP BY Multiple Columns

Group by a combination of columns — each unique combination becomes one group:

-- Headcount by department AND hire year
SELECT
    department,
    SUBSTR(hire_date, 1, 4) AS hire_year,
    COUNT(*)                AS headcount
FROM employees
GROUP BY department, SUBSTR(hire_date, 1, 4)
ORDER BY department, hire_year;

Result:

department | hire_year | headcount
-----------+-----------+----------
Finance    | 2019      | 1
Finance    | 2020      | 1
Finance    | 2022      | 1
Finance    | 2023      | 1
Marketing  | 2020      | 1
Marketing  | 2021      | 1
Marketing  | 2023      | 1
Technology | 2021      | 1
Technology | 2022      | 1
Technology | 2024      | 1

The Golden Rule of GROUP BY

Any column in SELECT that is NOT inside an aggregate function MUST appear in GROUP BY.

-- Wrong — 'name' is not in GROUP BY and not aggregated
SELECT name, department, COUNT(*)
FROM employees
GROUP BY department;
-- Error in PostgreSQL/MySQL

-- Correct — only group by what's in SELECT (non-aggregated)
SELECT department, COUNT(*)
FROM employees
GROUP BY department;

WHERE vs HAVING

This is one of the most important distinctions in SQL:

ClauseFiltersApplied ToCan Reference Aggregates?
WHERERows before groupingIndividual rowsNo
HAVINGGroups after groupingGroups (results of GROUP BY)Yes
-- WHERE filters BEFORE grouping
-- Only count active employees (filtered before grouping)
SELECT department, COUNT(*) AS headcount
FROM employees
WHERE salary > 70000     -- rows first
GROUP BY department;

-- HAVING filters AFTER grouping
-- Only show departments with more than 2 employees
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department
HAVING COUNT(*) > 2;     -- groups second

Using Both WHERE and HAVING

-- Departments where the average salary of high earners (>70k) exceeds 80k
SELECT
    department,
    COUNT(*)              AS headcount,
    ROUND(AVG(salary), 0) AS avg_salary
FROM employees
WHERE salary > 70000          -- 1. First, exclude low earners
GROUP BY department           -- 2. Group the remaining rows
HAVING AVG(salary) > 80000   -- 3. Keep only departments with avg > 80k
ORDER BY avg_salary DESC;

HAVING Without GROUP BY

You can use HAVING without GROUP BY — it filters the single aggregate result for the entire table:

-- Does the company have more than 5 employees?
SELECT COUNT(*) AS total
FROM employees
HAVING COUNT(*) > 5;

This is uncommon. Usually WHERE or a simple SELECT suffices.

Practical Examples

Example 1: Department Payroll Report

SELECT
    department,
    COUNT(*)                       AS headcount,
    SUM(salary)                    AS total_payroll,
    ROUND(AVG(salary), 0)          AS avg_salary,
    MIN(salary)                    AS min_salary,
    MAX(salary)                    AS max_salary
FROM employees
GROUP BY department
ORDER BY total_payroll DESC;

Example 2: Large Departments Only

-- Departments with 3 or more employees
SELECT
    department,
    COUNT(*) AS headcount,
    ROUND(AVG(salary), 0) AS avg_salary
FROM employees
GROUP BY department
HAVING COUNT(*) >= 3
ORDER BY headcount DESC;

Example 3: Hire Year Trend

-- Hiring trend by year
SELECT
    SUBSTR(hire_date, 1, 4) AS hire_year,
    COUNT(*)                AS new_hires,
    ROUND(AVG(salary), 0)   AS avg_starting_salary
FROM employees
GROUP BY SUBSTR(hire_date, 1, 4)
ORDER BY hire_year ASC;

Example 4: Departments Under Budget

-- Departments where average salary is below company average
SELECT
    department,
    ROUND(AVG(salary), 0) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) < (SELECT AVG(salary) FROM employees)
ORDER BY avg_salary;

Example 5: Top Department by Max Salary

-- Which department has the highest single salary?
SELECT
    department,
    MAX(salary) AS top_salary
FROM employees
GROUP BY department
ORDER BY top_salary DESC
LIMIT 1;

NULL Handling in GROUP BY

NULL values form their own group. If a column has NULLs, they are grouped together:

-- Example: if manager_id has NULLs
SELECT manager_id, COUNT(*) AS reports
FROM employees
GROUP BY manager_id;
-- NULLs appear as a group: manager_id = NULL, reports = N

ROLLUP — Subtotals and Grand Totals

ROLLUP adds subtotal and grand total rows automatically (PostgreSQL, MySQL, SQL Server):

-- PostgreSQL / MySQL
SELECT
    department,
    SUBSTR(hire_date, 1, 4) AS year,
    COUNT(*) AS headcount
FROM employees
GROUP BY ROLLUP(department, SUBSTR(hire_date, 1, 4))
ORDER BY department, year;

This produces department subtotals and a grand total row (where the grouping column is NULL in the total rows).

Execution Order with GROUP BY

1. FROM      — identify table(s)
2. WHERE     — filter individual rows
3. GROUP BY  — group remaining rows
4. HAVING    — filter groups
5. SELECT    — compute output
6. ORDER BY  — sort output
7. LIMIT     — restrict count

This explains why:

  • Aliases from SELECT can be used in ORDER BY (both run after grouping)
  • Aliases from SELECT cannot be used in HAVING in PostgreSQL (HAVING runs before SELECT)
-- PostgreSQL: HAVING must use the full expression, not alias
SELECT department, ROUND(AVG(salary), 0) AS avg_sal
FROM employees
GROUP BY department
HAVING AVG(salary) > 75000;  -- not 'avg_sal > 75000'

-- MySQL/SQLite allow the alias in HAVING (as an extension)
HAVING avg_sal > 75000;       -- works in MySQL, not standard SQL

Common Mistakes

1. Non-aggregated column missing from GROUP BY

-- Wrong
SELECT name, department, COUNT(*)
FROM employees
GROUP BY department;

-- Correct
SELECT department, COUNT(*)
FROM employees
GROUP BY department;

2. Using WHERE to filter on an aggregate

-- Wrong
SELECT department, COUNT(*) AS n
FROM employees
WHERE COUNT(*) > 2;   -- Error!

-- Correct
SELECT department, COUNT(*) AS n
FROM employees
GROUP BY department
HAVING COUNT(*) > 2;

3. Forgetting GROUP BY when using multiple aggregates

-- This is fine — one row for the whole table
SELECT SUM(salary), AVG(salary), COUNT(*) FROM employees;

-- This is wrong — mixes aggregate with non-aggregate without GROUP BY
SELECT department, SUM(salary), AVG(salary) FROM employees;  -- Error!

Practice Exercises

  1. Count the number of employees in each department.
  2. Calculate the total salary payroll for each department, ordered from highest to lowest.
  3. Find departments where the minimum salary is above 65,000.
  4. Show how many employees were hired in each year, only for years with 2 or more hires.
  5. Find the department with the lowest average salary.

Summary

In this chapter you learned:

  • GROUP BY col divides rows into groups; one output row per unique value of col
  • Any non-aggregated column in SELECT must appear in GROUP BY
  • HAVING filters groups after aggregation (use for conditions on aggregate results)
  • WHERE filters rows before grouping (use for conditions on individual columns)
  • Both WHERE and HAVING can appear in the same query — they apply at different stages
  • NULL values form their own group in GROUP BY
  • ROLLUP adds subtotal and grand total rows

Next up: Joins — the most powerful SQL feature — combining data from multiple tables.