Common Table Expressions (CTEs)
A Common Table Expression (CTE) is a temporary named result set defined with the WITH keyword. It makes complex queries far more readable by breaking them into named steps.
WITH cte_name AS (
SELECT ... -- the CTE definition
)
SELECT * FROM cte_name; -- use the CTE like a table
CTEs are similar to derived tables (subqueries in FROM) but are:
- Named and reusable within the same query
- Written at the top — easier to read and reason about
- Required for recursive queries
Why Use CTEs?
Before — Nested Subqueries (Hard to Read)
SELECT dept, avg_sal
FROM (
SELECT department AS dept, ROUND(AVG(salary), 0) AS avg_sal
FROM (
SELECT name, department, salary FROM employees WHERE is_active = 1
) active
GROUP BY department
) dept_stats
WHERE avg_sal > 75000;
After — CTEs (Easy to Read)
WITH active_employees AS (
SELECT name, department, salary
FROM employees
WHERE is_active = 1
),
dept_stats AS (
SELECT department AS dept, ROUND(AVG(salary), 0) AS avg_sal
FROM active_employees
GROUP BY department
)
SELECT dept, avg_sal
FROM dept_stats
WHERE avg_sal > 75000;
Same logic, dramatically clearer structure.
Sample Data
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT,
department TEXT,
salary REAL,
hire_date TEXT,
manager_id INTEGER
);
INSERT INTO employees VALUES
(1, 'Priya Sharma', 'Finance', 75000, '2022-03-15', 4),
(2, 'Raj Patel', 'Technology', 92000, '2021-07-01', NULL),
(3, 'Meera Singh', 'Marketing', 68000, '2023-01-10', NULL),
(4, 'Arjun Nair', 'Finance', 81000, '2020-11-22', NULL),
(5, 'Sunita Rao', 'Technology', 88000, '2022-09-05', 2),
(6, 'Dev Kumar', 'Marketing', 72000, '2021-04-18', 3),
(7, 'Kiran Mehta', 'Finance', 95000, '2019-06-30', 4),
(8, 'Ananya Das', 'Technology', 63000, '2024-02-14', 2);
Multiple CTEs
Define multiple CTEs by separating them with commas:
WITH
finance_team AS (
SELECT name, salary FROM employees WHERE department = 'Finance'
),
tech_team AS (
SELECT name, salary FROM employees WHERE department = 'Technology'
),
comparison AS (
SELECT 'Finance' AS dept, ROUND(AVG(salary), 0) AS avg_sal FROM finance_team
UNION ALL
SELECT 'Technology', ROUND(AVG(salary), 0) FROM tech_team
)
SELECT * FROM comparison;
Each CTE can reference CTEs defined before it.
Practical CTE Examples
Example 1: Top Earner per Department
WITH ranked AS (
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn = 1;
Example 2: Above-Average Earners with Department Stats
WITH dept_stats AS (
SELECT
department,
ROUND(AVG(salary), 0) AS dept_avg,
COUNT(*) AS headcount
FROM employees
GROUP BY department
),
enriched AS (
SELECT
e.name,
e.department,
e.salary,
d.dept_avg,
e.salary - d.dept_avg AS diff_from_avg
FROM employees e
JOIN dept_stats d ON e.department = d.department
)
SELECT *
FROM enriched
WHERE diff_from_avg > 0
ORDER BY diff_from_avg DESC;
Example 3: Multi-Step Data Pipeline
WITH
-- Step 1: Filter and clean
base AS (
SELECT id, name, department, salary
FROM employees
WHERE salary IS NOT NULL AND salary > 0
),
-- Step 2: Compute metrics
with_metrics AS (
SELECT
*,
ROUND(salary / 12, 2) AS monthly,
CASE
WHEN salary >= 90000 THEN 'A'
WHEN salary >= 75000 THEN 'B'
ELSE 'C'
END AS band
FROM base
),
-- Step 3: Count by band
band_summary AS (
SELECT band, COUNT(*) AS count, ROUND(AVG(salary), 0) AS avg_sal
FROM with_metrics
GROUP BY band
)
SELECT * FROM band_summary ORDER BY band;
Recursive CTEs
A recursive CTE calls itself — useful for hierarchical or tree-structured data (org charts, category trees, bill of materials).
Syntax
WITH RECURSIVE cte_name AS (
-- Base case (non-recursive): starting point
SELECT ...
UNION ALL
-- Recursive case: references cte_name itself
SELECT ... FROM source JOIN cte_name ON ...
)
SELECT * FROM cte_name;
Example 1: Organisational Hierarchy
-- Find all reports under Arjun Nair (id=4), including indirect reports
WITH RECURSIVE org_chart AS (
-- Base: start with Arjun
SELECT id, name, manager_id, 0 AS depth
FROM employees
WHERE id = 4
UNION ALL
-- Recursive: find their direct reports
SELECT e.id, e.name, e.manager_id, oc.depth + 1
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT
REPEAT(' ', depth) || name AS hierarchy,
depth AS level
FROM org_chart
ORDER BY depth, name;
Result:
hierarchy | level
---------------------+------
Arjun Nair | 0
Priya Sharma | 1
Kiran Mehta | 1
Example 2: Generate a Number Series (SQLite/PostgreSQL)
-- PostgreSQL has generate_series() — but here's the CTE approach
WITH RECURSIVE numbers AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM numbers WHERE n < 10
)
SELECT n FROM numbers;
-- Returns: 1, 2, 3, ..., 10
Example 3: Generate a Date Range
WITH RECURSIVE date_series AS (
SELECT DATE('2026-01-01') AS dt
UNION ALL
SELECT DATE(dt, '+1 day') FROM date_series WHERE dt < DATE('2026-01-10')
)
SELECT dt FROM date_series;
-- 2026-01-01 through 2026-01-10
Prevent Infinite Recursion
Recursive CTEs can loop infinitely if there's a cycle in your data. Most databases have a max recursion depth (typically 100 or 1000):
-- PostgreSQL: limit depth
WITH RECURSIVE org AS (
SELECT id, name, manager_id, 0 AS depth FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id, o.depth + 1
FROM employees e JOIN org o ON e.manager_id = o.id
WHERE o.depth < 10 -- safety guard
)
SELECT * FROM org;
Set Operations
Set operations combine the results of two or more SELECT queries into one result set.
Rules
- Both queries must return the same number of columns
- Corresponding columns must have compatible data types
- Column names come from the first query
UNION and UNION ALL
-- UNION: combine and remove duplicates (slower)
SELECT name, department FROM finance_employees
UNION
SELECT name, department FROM marketing_employees;
-- UNION ALL: combine keeping all rows including duplicates (faster)
SELECT name, department FROM finance_employees
UNION ALL
SELECT name, department FROM marketing_employees;
Use UNION ALL unless you specifically need deduplication — it's significantly faster.
Practical UNION Example
-- Combine active and former employees into one report
SELECT name, department, salary, 'Active' AS status FROM employees WHERE is_active = 1
UNION ALL
SELECT name, department, salary, 'Former' AS status FROM former_employees;
INTERSECT
Returns only rows that appear in both result sets:
-- Employees who appear in both the 'high_earners' view and the 'long_tenure' view
SELECT name FROM high_earners
INTERSECT
SELECT name FROM long_tenure_employees;
EXCEPT (MINUS in Oracle)
Returns rows from the first query that do not appear in the second:
-- Employees in Finance but NOT in the bonus recipient list
SELECT name FROM employees WHERE department = 'Finance'
EXCEPT
SELECT emp_name FROM bonus_recipients;
ORDER BY with Set Operations
ORDER BY applies to the final combined result — place it at the very end:
SELECT name, 'Finance' AS dept FROM finance_employees
UNION ALL
SELECT name, 'Technology' FROM tech_employees
ORDER BY name ASC; -- applies to the whole UNION result
Set Operations Comparison
| Operation | Returns |
|---|---|
UNION | All rows from both, deduped |
UNION ALL | All rows from both, including duplicates |
INTERSECT | Only rows in both |
EXCEPT / MINUS | Rows in first but not second |
CTE vs Subquery vs View
| CTE | Subquery | View | |
|---|---|---|---|
| Scope | Current query only | Current query only | Persistent in DB |
| Reuse | Within same query | Cannot be reused | Any future query |
| Recursive | Yes | No | No |
| Performance | Same as subquery | Same | Same (unless materialised) |
| Readability | Best | Worst (nested) | Good (single table) |
Use CTEs for complex one-off queries; views for queries you need repeatedly.
Common Mistakes
1. UNION with different column counts
-- Wrong
SELECT name, salary FROM employees
UNION
SELECT name FROM former_employees; -- different columns!
-- Correct — pad with NULL or literal
SELECT name, salary FROM employees
UNION
SELECT name, NULL AS salary FROM former_employees;
2. Referencing a later CTE from an earlier one
-- Wrong — CTEs must reference only previously defined CTEs
WITH
b AS (SELECT * FROM a), -- Error: 'a' not yet defined
a AS (SELECT * FROM employees)
SELECT * FROM b;
-- Correct — define in dependency order
WITH
a AS (SELECT * FROM employees),
b AS (SELECT * FROM a)
SELECT * FROM b;
3. Infinite recursion without a termination condition
-- Always include a WHERE condition in the recursive part
UNION ALL
SELECT e.id FROM employees e JOIN org o ON e.manager_id = o.id
WHERE o.depth < 20 -- termination guard
Practice Exercises
- Write a CTE to find employees earning above the company average salary.
- Use multiple CTEs: first compute the average salary per department, then find employees earning above their department's average.
- Use UNION ALL to create a combined list of all employees from all departments with a department label column.
- Using a recursive CTE, build the org chart starting from the top-level manager (no manager_id).
- Use EXCEPT to find all department names that have no employees.
Summary
In this chapter you learned:
WITH name AS (SELECT ...)— define a CTE; use it like a table in the main query- Multiple CTEs: separate with commas; each can reference previous CTEs
- CTEs improve readability of complex, multi-step queries
WITH RECURSIVE— for hierarchical/tree data; needs base case + recursive case + termination- UNION — combines result sets, removing duplicates
- UNION ALL — combines result sets, keeps all rows (faster than UNION)
- INTERSECT — rows common to both queries
- EXCEPT / MINUS — rows in first query not in second
- All set operations require matching column count and compatible types
- ORDER BY at the end applies to the entire set operation result
Next up: Transactions & ACID — ensure data integrity with BEGIN, COMMIT, ROLLBACK.