GROUP BY
GROUP BY divides rows into groups based on one or more columns, then lets you apply aggregate functions to each group separately.
Without GROUP BY, aggregate functions return one row for the entire table. With GROUP BY, you get one row per group.
Basic Syntax
SELECT grouping_column, aggregate_function(column)
FROM table
WHERE condition -- optional: filter rows BEFORE grouping
GROUP BY grouping_column
HAVING aggregate_condition -- optional: filter AFTER grouping
ORDER BY ...;
Sample Data
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
department TEXT NOT NULL,
salary REAL NOT NULL,
hire_date TEXT NOT NULL
);
INSERT INTO employees VALUES
(1, 'Priya Sharma', 'Finance', 75000, '2022-03-15'),
(2, 'Raj Patel', 'Technology', 92000, '2021-07-01'),
(3, 'Meera Singh', 'Marketing', 68000, '2023-01-10'),
(4, 'Arjun Nair', 'Finance', 81000, '2020-11-22'),
(5, 'Sunita Rao', 'Technology', 88000, '2022-09-05'),
(6, 'Dev Kumar', 'Marketing', 72000, '2021-04-18'),
(7, 'Kiran Mehta', 'Finance', 95000, '2019-06-30'),
(8, 'Ananya Das', 'Technology', 63000, '2024-02-14'),
(9, 'Vikram Joshi', 'Marketing', 77000, '2020-08-12'),
(10, 'Ritu Bansal', 'Finance', 69000, '2023-07-01');
GROUP BY One Column
-- Count of employees per department
SELECT
department,
COUNT(*) AS headcount
FROM employees
GROUP BY department;
Result:
department | headcount
-----------+----------
Finance | 4
Marketing | 3
Technology | 3
-- Average salary by department
SELECT
department,
ROUND(AVG(salary), 0) AS avg_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;
Result:
department | avg_salary | min_salary | max_salary
-----------+------------+------------+-----------
Finance | 80000 | 69000 | 95000
Technology | 81000 | 63000 | 92000
Marketing | 72333 | 68000 | 77000
GROUP BY Multiple Columns
Group by a combination of columns — each unique combination becomes one group:
-- Headcount by department AND hire year
SELECT
department,
SUBSTR(hire_date, 1, 4) AS hire_year,
COUNT(*) AS headcount
FROM employees
GROUP BY department, SUBSTR(hire_date, 1, 4)
ORDER BY department, hire_year;
Result:
department | hire_year | headcount
-----------+-----------+----------
Finance | 2019 | 1
Finance | 2020 | 1
Finance | 2022 | 1
Finance | 2023 | 1
Marketing | 2020 | 1
Marketing | 2021 | 1
Marketing | 2023 | 1
Technology | 2021 | 1
Technology | 2022 | 1
Technology | 2024 | 1
The Golden Rule of GROUP BY
Any column in SELECT that is NOT inside an aggregate function MUST appear in GROUP BY.
-- Wrong — 'name' is not in GROUP BY and not aggregated
SELECT name, department, COUNT(*)
FROM employees
GROUP BY department;
-- Error in PostgreSQL/MySQL
-- Correct — only group by what's in SELECT (non-aggregated)
SELECT department, COUNT(*)
FROM employees
GROUP BY department;
WHERE vs HAVING
This is one of the most important distinctions in SQL:
| Clause | Filters | Applied To | Can Reference Aggregates? |
|---|---|---|---|
WHERE | Rows before grouping | Individual rows | No |
HAVING | Groups after grouping | Groups (results of GROUP BY) | Yes |
-- WHERE filters BEFORE grouping
-- Only count active employees (filtered before grouping)
SELECT department, COUNT(*) AS headcount
FROM employees
WHERE salary > 70000 -- rows first
GROUP BY department;
-- HAVING filters AFTER grouping
-- Only show departments with more than 2 employees
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department
HAVING COUNT(*) > 2; -- groups second
Using Both WHERE and HAVING
-- Departments where the average salary of high earners (>70k) exceeds 80k
SELECT
department,
COUNT(*) AS headcount,
ROUND(AVG(salary), 0) AS avg_salary
FROM employees
WHERE salary > 70000 -- 1. First, exclude low earners
GROUP BY department -- 2. Group the remaining rows
HAVING AVG(salary) > 80000 -- 3. Keep only departments with avg > 80k
ORDER BY avg_salary DESC;
HAVING Without GROUP BY
You can use HAVING without GROUP BY — it filters the single aggregate result for the entire table:
-- Does the company have more than 5 employees?
SELECT COUNT(*) AS total
FROM employees
HAVING COUNT(*) > 5;
This is uncommon. Usually WHERE or a simple SELECT suffices.
Practical Examples
Example 1: Department Payroll Report
SELECT
department,
COUNT(*) AS headcount,
SUM(salary) AS total_payroll,
ROUND(AVG(salary), 0) AS avg_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM employees
GROUP BY department
ORDER BY total_payroll DESC;
Example 2: Large Departments Only
-- Departments with 3 or more employees
SELECT
department,
COUNT(*) AS headcount,
ROUND(AVG(salary), 0) AS avg_salary
FROM employees
GROUP BY department
HAVING COUNT(*) >= 3
ORDER BY headcount DESC;
Example 3: Hire Year Trend
-- Hiring trend by year
SELECT
SUBSTR(hire_date, 1, 4) AS hire_year,
COUNT(*) AS new_hires,
ROUND(AVG(salary), 0) AS avg_starting_salary
FROM employees
GROUP BY SUBSTR(hire_date, 1, 4)
ORDER BY hire_year ASC;
Example 4: Departments Under Budget
-- Departments where average salary is below company average
SELECT
department,
ROUND(AVG(salary), 0) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) < (SELECT AVG(salary) FROM employees)
ORDER BY avg_salary;
Example 5: Top Department by Max Salary
-- Which department has the highest single salary?
SELECT
department,
MAX(salary) AS top_salary
FROM employees
GROUP BY department
ORDER BY top_salary DESC
LIMIT 1;
NULL Handling in GROUP BY
NULL values form their own group. If a column has NULLs, they are grouped together:
-- Example: if manager_id has NULLs
SELECT manager_id, COUNT(*) AS reports
FROM employees
GROUP BY manager_id;
-- NULLs appear as a group: manager_id = NULL, reports = N
ROLLUP — Subtotals and Grand Totals
ROLLUP adds subtotal and grand total rows automatically (PostgreSQL, MySQL, SQL Server):
-- PostgreSQL / MySQL
SELECT
department,
SUBSTR(hire_date, 1, 4) AS year,
COUNT(*) AS headcount
FROM employees
GROUP BY ROLLUP(department, SUBSTR(hire_date, 1, 4))
ORDER BY department, year;
This produces department subtotals and a grand total row (where the grouping column is NULL in the total rows).
Execution Order with GROUP BY
1. FROM — identify table(s)
2. WHERE — filter individual rows
3. GROUP BY — group remaining rows
4. HAVING — filter groups
5. SELECT — compute output
6. ORDER BY — sort output
7. LIMIT — restrict count
This explains why:
- Aliases from
SELECTcan be used inORDER BY(both run after grouping) - Aliases from
SELECTcannot be used inHAVINGin PostgreSQL (HAVING runs before SELECT)
-- PostgreSQL: HAVING must use the full expression, not alias
SELECT department, ROUND(AVG(salary), 0) AS avg_sal
FROM employees
GROUP BY department
HAVING AVG(salary) > 75000; -- not 'avg_sal > 75000'
-- MySQL/SQLite allow the alias in HAVING (as an extension)
HAVING avg_sal > 75000; -- works in MySQL, not standard SQL
Common Mistakes
1. Non-aggregated column missing from GROUP BY
-- Wrong
SELECT name, department, COUNT(*)
FROM employees
GROUP BY department;
-- Correct
SELECT department, COUNT(*)
FROM employees
GROUP BY department;
2. Using WHERE to filter on an aggregate
-- Wrong
SELECT department, COUNT(*) AS n
FROM employees
WHERE COUNT(*) > 2; -- Error!
-- Correct
SELECT department, COUNT(*) AS n
FROM employees
GROUP BY department
HAVING COUNT(*) > 2;
3. Forgetting GROUP BY when using multiple aggregates
-- This is fine — one row for the whole table
SELECT SUM(salary), AVG(salary), COUNT(*) FROM employees;
-- This is wrong — mixes aggregate with non-aggregate without GROUP BY
SELECT department, SUM(salary), AVG(salary) FROM employees; -- Error!
Practice Exercises
- Count the number of employees in each department.
- Calculate the total salary payroll for each department, ordered from highest to lowest.
- Find departments where the minimum salary is above 65,000.
- Show how many employees were hired in each year, only for years with 2 or more hires.
- Find the department with the lowest average salary.
Summary
In this chapter you learned:
GROUP BY coldivides rows into groups; one output row per unique value ofcol- Any non-aggregated column in
SELECTmust appear inGROUP BY HAVINGfilters groups after aggregation (use for conditions on aggregate results)WHEREfilters rows before grouping (use for conditions on individual columns)- Both
WHEREandHAVINGcan appear in the same query — they apply at different stages - NULL values form their own group in
GROUP BY ROLLUPadds subtotal and grand total rows
Next up: Joins — the most powerful SQL feature — combining data from multiple tables.