Subqueries & Nested Queries

What is a Subquery?

A subquery (also called a nested query or inner query) is a SELECT statement embedded inside another SQL statement. The inner query runs first; its result is used by the outer query.

-- Subquery in WHERE
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

The inner query (SELECT AVG(salary) FROM employees) returns a single value (e.g., 78000). The outer query then uses that value in its WHERE clause.

Types of Subqueries

Type	Returns	Used In
Scalar	One value (one row, one column)	`WHERE`, `SELECT`, `HAVING`
Row	One row, multiple columns	`WHERE col = (row)`
Column	Multiple rows, one column	`WHERE col IN (...)`
Table	Multiple rows and columns	`FROM` clause (derived table)
Correlated	Depends on outer query row	`WHERE EXISTS`

Sample Data

CREATE TABLE employees (
    id            INTEGER PRIMARY KEY,
    name          TEXT    NOT NULL,
    department    TEXT    NOT NULL,
    salary        REAL    NOT NULL,
    manager_id    INTEGER
);

CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    employee_id INTEGER,
    amount      REAL,
    order_date  TEXT
);

INSERT INTO employees VALUES
(1, 'Priya Sharma',  'Finance',    75000, 4),
(2, 'Raj Patel',     'Technology', 92000, NULL),
(3, 'Meera Singh',   'Marketing',  68000, NULL),
(4, 'Arjun Nair',    'Finance',    81000, NULL),
(5, 'Sunita Rao',    'Technology', 88000, 2),
(6, 'Dev Kumar',     'Marketing',  72000, 3),
(7, 'Kiran Mehta',   'Finance',    95000, 4);

INSERT INTO orders VALUES
(1, 1, 15000, '2026-01-10'),
(2, 1, 22000, '2026-02-15'),
(3, 2, 45000, '2026-01-20'),
(4, 5, 38000, '2026-03-05'),
(5, 6, 12000, '2026-02-28'),
(6, 3, 8000,  '2026-03-10');

Scalar Subqueries

A scalar subquery returns exactly one value (one row, one column).

In WHERE

-- Employees earning above the company average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY salary DESC;

In SELECT

You can use a scalar subquery as a computed column:

-- Show each employee's salary vs. company average
SELECT
    name,
    salary,
    (SELECT ROUND(AVG(salary), 0) FROM employees) AS company_avg,
    salary - (SELECT ROUND(AVG(salary), 0) FROM employees) AS diff_from_avg
FROM employees;

In HAVING

-- Departments where average salary beats the overall company average
SELECT department, ROUND(AVG(salary), 0) AS dept_avg
FROM employees
GROUP BY department
HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);

Column Subqueries — IN and NOT IN

A column subquery returns multiple rows, one column — use with IN or NOT IN:

-- Employees who have placed at least one order
SELECT name
FROM employees
WHERE id IN (SELECT DISTINCT employee_id FROM orders);

-- Employees who have never placed an order
SELECT name
FROM employees
WHERE id NOT IN (SELECT DISTINCT employee_id FROM orders
                 WHERE employee_id IS NOT NULL);

Warning: NOT IN with a subquery that returns NULLs will return zero rows. Always add WHERE col IS NOT NULL inside the subquery when using NOT IN.

Derived Tables — Subquery in FROM

A subquery in the FROM clause creates a temporary virtual table (a "derived table"). It must have an alias:

-- Average salary by department, then find departments above company average
SELECT dept_stats.department, dept_stats.avg_salary
FROM (
    SELECT department, ROUND(AVG(salary), 0) AS avg_salary
    FROM employees
    GROUP BY department
) AS dept_stats
WHERE dept_stats.avg_salary > (SELECT AVG(salary) FROM employees);

The inner query creates a temporary dept_stats table; the outer query filters it.

More Complex Derived Table

-- Top earner per department
SELECT outer_q.department, outer_q.name, outer_q.salary
FROM (
    SELECT
        department,
        name,
        salary,
        RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
    FROM employees
) AS outer_q
WHERE outer_q.rnk = 1;

(Window functions like RANK() are covered in Chapter 15.)

Correlated Subqueries

A correlated subquery references a column from the outer query. It runs once for each row in the outer query — so it can be slow on large tables.

-- Find employees earning more than the average of their own department
SELECT name, department, salary
FROM employees e
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department = e.department   -- correlated: uses outer row's department
);

For each row in the outer query, the subquery computes the average salary for that row's department. An employee appears in results only if their salary beats their own department's average.

EXISTS and NOT EXISTS

EXISTS checks whether the subquery returns any rows — it returns TRUE if the subquery has at least one result, FALSE if empty.

-- Employees who have placed at least one order (using EXISTS)
SELECT name
FROM employees e
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.employee_id = e.id
);

SELECT 1 is conventional — the outer query only cares whether any row exists, not what it contains.

NOT EXISTS

-- Employees who have never placed an order
SELECT name
FROM employees e
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.employee_id = e.id
);

EXISTS vs IN — Which to Use?

Situation	Prefer
Right-side list might contain NULLs	`EXISTS` (safer)
Large right-side result set	`EXISTS` (stops at first match)
Small right-side list, no NULLs	`IN` (often more readable)

Both approaches give the same logical result when there are no NULLs. EXISTS is generally safer and often performs better.

Row Subqueries

A subquery returning one complete row can be used with row-value constructors (PostgreSQL):

-- Find employees with the same (department, salary) as employee 4
SELECT name
FROM employees
WHERE (department, salary) = (
    SELECT department, salary FROM employees WHERE id = 4
);

Subqueries in INSERT, UPDATE, DELETE

Subqueries aren't only for SELECT:

-- Give a 10% raise to employees earning below the company average
UPDATE employees
SET salary = salary * 1.10
WHERE salary < (SELECT AVG(salary) FROM employees);

-- Delete all orders from employees in Marketing
DELETE FROM orders
WHERE employee_id IN (
    SELECT id FROM employees WHERE department = 'Marketing'
);

-- Insert a summary into an audit table
INSERT INTO dept_summary (department, headcount, avg_salary)
SELECT department, COUNT(*), AVG(salary)
FROM employees
GROUP BY department;

Practical Examples

Example 1: Second Highest Salary

-- Classic interview question
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

Example 2: Departments with Above-Average Headcount

SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department
HAVING COUNT(*) > (
    SELECT AVG(cnt) FROM (
        SELECT COUNT(*) AS cnt FROM employees GROUP BY department
    ) AS dept_counts
);

Example 3: Employees Whose Manager Earns Less

SELECT e.name AS employee, e.salary, m.name AS manager, m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

Example 4: Customers with Above-Average Order Value

-- Employees with total order amount above average
SELECT e.name, SUM(o.amount) AS total_orders
FROM employees e
JOIN orders o ON e.id = o.employee_id
GROUP BY e.name
HAVING SUM(o.amount) > (
    SELECT AVG(total) FROM (
        SELECT SUM(amount) AS total FROM orders GROUP BY employee_id
    ) AS totals
);

Common Mistakes

1. Subquery returns more than one row when scalar expected

-- Wrong if department is not unique
WHERE salary = (SELECT salary FROM employees WHERE department = 'Finance');
-- Error: subquery returns more than one row

-- Correct — use IN for multiple rows
WHERE salary IN (SELECT salary FROM employees WHERE department = 'Finance');
-- Or use MAX/MIN/ANY
WHERE salary = (SELECT MAX(salary) FROM employees WHERE department = 'Finance');

2. NOT IN with NULLs in subquery

-- If orders.employee_id has any NULL, this returns 0 rows
WHERE id NOT IN (SELECT employee_id FROM orders);

-- Safe version
WHERE id NOT IN (SELECT employee_id FROM orders WHERE employee_id IS NOT NULL);
-- Or use NOT EXISTS

3. Forgetting alias on derived table

-- Wrong
SELECT * FROM (SELECT department, COUNT(*) FROM employees GROUP BY department);
-- Error: derived table must have its own alias

-- Correct
SELECT * FROM (SELECT department, COUNT(*) FROM employees GROUP BY department) AS summary;

Practice Exercises

Find employees earning above the average salary of their own department (correlated subquery).
List employees who have NOT placed any orders (use NOT IN or NOT EXISTS).
Find the department with the highest total payroll using a subquery in WHERE.
Write a query to find the top-paid employee in each department using a derived table.
Give a 5% raise to all Technology employees earning less than the Technology department average.

Summary

In this chapter you learned:

Scalar subquery — returns one value; used in WHERE, SELECT, HAVING
Column subquery — returns a list; used with IN or NOT IN
Derived table — subquery in FROM; must have an alias; treated as a temp table
Correlated subquery — references outer query columns; runs per row
EXISTS / NOT EXISTS — checks if subquery returns any rows; safer than NOT IN with NULLs
Subqueries can appear in UPDATE, DELETE, and INSERT too
NOT IN is dangerous with NULLs — prefer NOT EXISTS or add WHERE col IS NOT NULL
Derived tables must have an alias

Next up: INSERT, UPDATE & DELETE — modify data in your tables.