Chapter 16 of 18

CTEs & Set Operations

Write readable multi-step queries with Common Table Expressions (WITH), recursive CTEs, and combine result sets with UNION, INTERSECT, and EXCEPT.

Meritshot9 min read
SQLCTEWITHRecursiveUNIONINTERSECTEXCEPT
All SQL Chapters

Common Table Expressions (CTEs)

A Common Table Expression (CTE) is a temporary named result set defined with the WITH keyword. It makes complex queries far more readable by breaking them into named steps.

WITH cte_name AS (
    SELECT ...   -- the CTE definition
)
SELECT * FROM cte_name;   -- use the CTE like a table

CTEs are similar to derived tables (subqueries in FROM) but are:

  • Named and reusable within the same query
  • Written at the top — easier to read and reason about
  • Required for recursive queries

Why Use CTEs?

Before — Nested Subqueries (Hard to Read)

SELECT dept, avg_sal
FROM (
    SELECT department AS dept, ROUND(AVG(salary), 0) AS avg_sal
    FROM (
        SELECT name, department, salary FROM employees WHERE is_active = 1
    ) active
    GROUP BY department
) dept_stats
WHERE avg_sal > 75000;

After — CTEs (Easy to Read)

WITH active_employees AS (
    SELECT name, department, salary
    FROM employees
    WHERE is_active = 1
),
dept_stats AS (
    SELECT department AS dept, ROUND(AVG(salary), 0) AS avg_sal
    FROM active_employees
    GROUP BY department
)
SELECT dept, avg_sal
FROM dept_stats
WHERE avg_sal > 75000;

Same logic, dramatically clearer structure.

Sample Data

CREATE TABLE employees (
    id         INTEGER PRIMARY KEY,
    name       TEXT,
    department TEXT,
    salary     REAL,
    hire_date  TEXT,
    manager_id INTEGER
);

INSERT INTO employees VALUES
(1,  'Priya Sharma',  'Finance',    75000, '2022-03-15', 4),
(2,  'Raj Patel',     'Technology', 92000, '2021-07-01', NULL),
(3,  'Meera Singh',   'Marketing',  68000, '2023-01-10', NULL),
(4,  'Arjun Nair',    'Finance',    81000, '2020-11-22', NULL),
(5,  'Sunita Rao',    'Technology', 88000, '2022-09-05', 2),
(6,  'Dev Kumar',     'Marketing',  72000, '2021-04-18', 3),
(7,  'Kiran Mehta',   'Finance',    95000, '2019-06-30', 4),
(8,  'Ananya Das',    'Technology', 63000, '2024-02-14', 2);

Multiple CTEs

Define multiple CTEs by separating them with commas:

WITH
finance_team AS (
    SELECT name, salary FROM employees WHERE department = 'Finance'
),
tech_team AS (
    SELECT name, salary FROM employees WHERE department = 'Technology'
),
comparison AS (
    SELECT 'Finance' AS dept, ROUND(AVG(salary), 0) AS avg_sal FROM finance_team
    UNION ALL
    SELECT 'Technology', ROUND(AVG(salary), 0) FROM tech_team
)
SELECT * FROM comparison;

Each CTE can reference CTEs defined before it.

Practical CTE Examples

Example 1: Top Earner per Department

WITH ranked AS (
    SELECT
        name,
        department,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn = 1;

Example 2: Above-Average Earners with Department Stats

WITH dept_stats AS (
    SELECT
        department,
        ROUND(AVG(salary), 0) AS dept_avg,
        COUNT(*)              AS headcount
    FROM employees
    GROUP BY department
),
enriched AS (
    SELECT
        e.name,
        e.department,
        e.salary,
        d.dept_avg,
        e.salary - d.dept_avg AS diff_from_avg
    FROM employees e
    JOIN dept_stats d ON e.department = d.department
)
SELECT *
FROM enriched
WHERE diff_from_avg > 0
ORDER BY diff_from_avg DESC;

Example 3: Multi-Step Data Pipeline

WITH
-- Step 1: Filter and clean
base AS (
    SELECT id, name, department, salary
    FROM employees
    WHERE salary IS NOT NULL AND salary > 0
),
-- Step 2: Compute metrics
with_metrics AS (
    SELECT
        *,
        ROUND(salary / 12, 2) AS monthly,
        CASE
            WHEN salary >= 90000 THEN 'A'
            WHEN salary >= 75000 THEN 'B'
            ELSE 'C'
        END AS band
    FROM base
),
-- Step 3: Count by band
band_summary AS (
    SELECT band, COUNT(*) AS count, ROUND(AVG(salary), 0) AS avg_sal
    FROM with_metrics
    GROUP BY band
)
SELECT * FROM band_summary ORDER BY band;

Recursive CTEs

A recursive CTE calls itself — useful for hierarchical or tree-structured data (org charts, category trees, bill of materials).

Syntax

WITH RECURSIVE cte_name AS (
    -- Base case (non-recursive): starting point
    SELECT ...

    UNION ALL

    -- Recursive case: references cte_name itself
    SELECT ... FROM source JOIN cte_name ON ...
)
SELECT * FROM cte_name;

Example 1: Organisational Hierarchy

-- Find all reports under Arjun Nair (id=4), including indirect reports
WITH RECURSIVE org_chart AS (
    -- Base: start with Arjun
    SELECT id, name, manager_id, 0 AS depth
    FROM employees
    WHERE id = 4

    UNION ALL

    -- Recursive: find their direct reports
    SELECT e.id, e.name, e.manager_id, oc.depth + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT
    REPEAT('  ', depth) || name AS hierarchy,
    depth AS level
FROM org_chart
ORDER BY depth, name;

Result:

hierarchy            | level
---------------------+------
Arjun Nair           | 0
  Priya Sharma       | 1
  Kiran Mehta        | 1

Example 2: Generate a Number Series (SQLite/PostgreSQL)

-- PostgreSQL has generate_series() — but here's the CTE approach
WITH RECURSIVE numbers AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 10
)
SELECT n FROM numbers;
-- Returns: 1, 2, 3, ..., 10

Example 3: Generate a Date Range

WITH RECURSIVE date_series AS (
    SELECT DATE('2026-01-01') AS dt
    UNION ALL
    SELECT DATE(dt, '+1 day') FROM date_series WHERE dt < DATE('2026-01-10')
)
SELECT dt FROM date_series;
-- 2026-01-01 through 2026-01-10

Prevent Infinite Recursion

Recursive CTEs can loop infinitely if there's a cycle in your data. Most databases have a max recursion depth (typically 100 or 1000):

-- PostgreSQL: limit depth
WITH RECURSIVE org AS (
    SELECT id, name, manager_id, 0 AS depth FROM employees WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, e.manager_id, o.depth + 1
    FROM employees e JOIN org o ON e.manager_id = o.id
    WHERE o.depth < 10   -- safety guard
)
SELECT * FROM org;

Set Operations

Set operations combine the results of two or more SELECT queries into one result set.

Rules

  1. Both queries must return the same number of columns
  2. Corresponding columns must have compatible data types
  3. Column names come from the first query

UNION and UNION ALL

-- UNION: combine and remove duplicates (slower)
SELECT name, department FROM finance_employees
UNION
SELECT name, department FROM marketing_employees;

-- UNION ALL: combine keeping all rows including duplicates (faster)
SELECT name, department FROM finance_employees
UNION ALL
SELECT name, department FROM marketing_employees;

Use UNION ALL unless you specifically need deduplication — it's significantly faster.

Practical UNION Example

-- Combine active and former employees into one report
SELECT name, department, salary, 'Active' AS status FROM employees WHERE is_active = 1
UNION ALL
SELECT name, department, salary, 'Former' AS status FROM former_employees;

INTERSECT

Returns only rows that appear in both result sets:

-- Employees who appear in both the 'high_earners' view and the 'long_tenure' view
SELECT name FROM high_earners
INTERSECT
SELECT name FROM long_tenure_employees;

EXCEPT (MINUS in Oracle)

Returns rows from the first query that do not appear in the second:

-- Employees in Finance but NOT in the bonus recipient list
SELECT name FROM employees WHERE department = 'Finance'
EXCEPT
SELECT emp_name FROM bonus_recipients;

ORDER BY with Set Operations

ORDER BY applies to the final combined result — place it at the very end:

SELECT name, 'Finance' AS dept FROM finance_employees
UNION ALL
SELECT name, 'Technology' FROM tech_employees
ORDER BY name ASC;   -- applies to the whole UNION result

Set Operations Comparison

OperationReturns
UNIONAll rows from both, deduped
UNION ALLAll rows from both, including duplicates
INTERSECTOnly rows in both
EXCEPT / MINUSRows in first but not second

CTE vs Subquery vs View

CTESubqueryView
ScopeCurrent query onlyCurrent query onlyPersistent in DB
ReuseWithin same queryCannot be reusedAny future query
RecursiveYesNoNo
PerformanceSame as subquerySameSame (unless materialised)
ReadabilityBestWorst (nested)Good (single table)

Use CTEs for complex one-off queries; views for queries you need repeatedly.

Common Mistakes

1. UNION with different column counts

-- Wrong
SELECT name, salary FROM employees
UNION
SELECT name FROM former_employees;  -- different columns!

-- Correct — pad with NULL or literal
SELECT name, salary FROM employees
UNION
SELECT name, NULL AS salary FROM former_employees;

2. Referencing a later CTE from an earlier one

-- Wrong — CTEs must reference only previously defined CTEs
WITH
b AS (SELECT * FROM a),   -- Error: 'a' not yet defined
a AS (SELECT * FROM employees)
SELECT * FROM b;

-- Correct — define in dependency order
WITH
a AS (SELECT * FROM employees),
b AS (SELECT * FROM a)
SELECT * FROM b;

3. Infinite recursion without a termination condition

-- Always include a WHERE condition in the recursive part
UNION ALL
SELECT e.id FROM employees e JOIN org o ON e.manager_id = o.id
WHERE o.depth < 20  -- termination guard

Practice Exercises

  1. Write a CTE to find employees earning above the company average salary.
  2. Use multiple CTEs: first compute the average salary per department, then find employees earning above their department's average.
  3. Use UNION ALL to create a combined list of all employees from all departments with a department label column.
  4. Using a recursive CTE, build the org chart starting from the top-level manager (no manager_id).
  5. Use EXCEPT to find all department names that have no employees.

Summary

In this chapter you learned:

  • WITH name AS (SELECT ...) — define a CTE; use it like a table in the main query
  • Multiple CTEs: separate with commas; each can reference previous CTEs
  • CTEs improve readability of complex, multi-step queries
  • WITH RECURSIVE — for hierarchical/tree data; needs base case + recursive case + termination
  • UNION — combines result sets, removing duplicates
  • UNION ALL — combines result sets, keeps all rows (faster than UNION)
  • INTERSECT — rows common to both queries
  • EXCEPT / MINUS — rows in first query not in second
  • All set operations require matching column count and compatible types
  • ORDER BY at the end applies to the entire set operation result

Next up: Transactions & ACID — ensure data integrity with BEGIN, COMMIT, ROLLBACK.