Fundamentals of SQL — Interview Questions & Answers

50 essential SQL interview questions covering basics, queries, joins, subqueries, aggregations, and database design.

Meritshot18 min read
SQLDatabaseInterview QuestionsData AnalysisQueries
Back to Interview Guides

SQL Basics

1. What is SQL?

SQL stands for Structured Query Language and is the standard language used to communicate with and manage relational databases. It allows users to create, read, update, and delete data stored in tables, and is supported by virtually all relational database management systems such as MySQL, PostgreSQL, SQL Server, and Oracle.

2. What are the main types of SQL commands?

SQL commands are broadly categorized into five types: DDL (Data Definition Language) for defining database structures, DML (Data Manipulation Language) for manipulating data, DCL (Data Control Language) for managing permissions, TCL (Transaction Control Language) for managing transactions, and DQL (Data Query Language) for querying data. Examples include CREATE and ALTER for DDL, INSERT and UPDATE for DML, GRANT and REVOKE for DCL, and COMMIT and ROLLBACK for TCL.

3. What is the difference between CREATE, ALTER, and DROP statements?

CREATE is used to create new database objects such as tables, views, or indexes, for example CREATE TABLE employees (id INT, name VARCHAR(100)). ALTER modifies an existing database object, such as adding or removing a column, while DROP permanently deletes a database object and all of its data.

4. What do the SELECT, INSERT, UPDATE, and DELETE statements do?

SELECT retrieves data from one or more tables, INSERT adds new rows of data into a table, UPDATE modifies existing rows based on a condition, and DELETE removes rows from a table. These four statements form the core of DML operations and are the most frequently used SQL commands in day-to-day database interaction.

5. What are the common data types in SQL?

Common SQL data types include INT and BIGINT for integers, DECIMAL and FLOAT for numeric values with decimals, VARCHAR and CHAR for text strings, and DATE, DATETIME, and TIMESTAMP for date and time values. The BOOLEAN type stores true/false values, and TEXT or BLOB types handle large text or binary data respectively.

6. What is a primary key?

A primary key is a column or set of columns that uniquely identifies each row in a table. It must contain unique values and cannot contain NULL values. Every table should have a primary key to ensure data integrity and to serve as a reference point for relationships with other tables.

7. What is a foreign key?

A foreign key is a column or set of columns in one table that references the primary key of another table, establishing a relationship between the two tables. It enforces referential integrity by ensuring that the value in the foreign key column must either match an existing value in the referenced primary key column or be NULL. For example, an orders table might have a customer_id foreign key that references the id column in a customers table.

8. What are constraints in SQL?

Constraints are rules applied to columns or tables that enforce data integrity and validity. Common constraints include NOT NULL (prevents NULL values), UNIQUE (ensures all values in a column are distinct), PRIMARY KEY (uniquely identifies each row), FOREIGN KEY (enforces referential integrity), CHECK (validates data against a condition), and DEFAULT (assigns a default value when none is specified).

9. What is the difference between CHAR and VARCHAR data types?

CHAR is a fixed-length string data type that always uses the specified number of characters, padding shorter strings with spaces, for example CHAR(10) always stores exactly 10 characters. VARCHAR is a variable-length string type that only uses as much storage as the actual string requires, up to the defined maximum, making it more storage-efficient for strings of varying lengths.

10. What is the difference between DELETE, TRUNCATE, and DROP?

DELETE removes specific rows from a table based on a WHERE condition and can be rolled back within a transaction, while TRUNCATE removes all rows from a table quickly without logging individual row deletions and typically cannot be rolled back. DROP goes further by removing the entire table structure along with all its data, indexes, and constraints from the database permanently.

Querying Data

11. How does the SELECT statement work?

The SELECT statement retrieves data from one or more tables and is the most commonly used SQL command. Its basic syntax is SELECT column1, column2 FROM table_name, and you can use SELECT * to retrieve all columns. The result is returned as a result set, which is a temporary table containing the queried data.

12. What is the WHERE clause and how is it used?

The WHERE clause filters rows based on a specified condition, returning only the rows that satisfy the criteria. For example, SELECT * FROM employees WHERE department = 'Sales' returns only employees in the Sales department. It supports comparison operators such as =, !=, AND, OR, and NOT to build complex filtering conditions.

13. How does the ORDER BY clause work?

The ORDER BY clause sorts the result set by one or more columns in ascending (ASC) or descending (DESC) order. By default, sorting is ascending if no direction is specified. You can sort by multiple columns, for example SELECT * FROM products ORDER BY category ASC, price DESC sorts by category alphabetically and then by price from highest to lowest within each category.

14. What do LIMIT and TOP do in SQL?

LIMIT (used in MySQL and PostgreSQL) and TOP (used in SQL Server) restrict the number of rows returned by a query. For example, SELECT * FROM employees LIMIT 10 returns only the first 10 rows, which is useful for pagination or previewing data. In SQL Server, the equivalent would be SELECT TOP 10 * FROM employees.

15. What does SELECT DISTINCT do?

SELECT DISTINCT removes duplicate rows from the result set, returning only unique values for the specified columns. For example, SELECT DISTINCT department FROM employees returns a list of all unique departments. When applied to multiple columns, it returns distinct combinations of values across those columns.

16. What are aliases in SQL?

Aliases are temporary names assigned to tables or columns using the AS keyword to make query results more readable or to simplify complex queries. For example, SELECT first_name AS name FROM employees renames the output column to "name," and SELECT e.name FROM employees AS e assigns a short alias to the table. Aliases exist only for the duration of the query and do not change the actual table or column names.

17. How do the LIKE operator and wildcards work?

The LIKE operator is used in a WHERE clause to search for a pattern in a column using wildcard characters. The percent sign % matches any sequence of zero or more characters, and the underscore _ matches exactly one character. For example, SELECT * FROM customers WHERE name LIKE 'J%' returns all customers whose names start with "J," and LIKE '_an' matches any three-letter string ending in "an."

18. How do the IN and BETWEEN operators work?

The IN operator checks whether a value matches any value in a specified list, for example SELECT * FROM orders WHERE status IN ('Shipped', 'Delivered'). The BETWEEN operator filters values within a range (inclusive), for example SELECT * FROM products WHERE price BETWEEN 10 AND 50 returns products priced from 10 to 50 inclusive.

19. How do you handle NULL values in SQL?

NULL represents a missing or unknown value and cannot be compared using standard operators like = or !=. Instead, you must use IS NULL or IS NOT NULL to check for NULL values, for example SELECT * FROM employees WHERE manager_id IS NULL. Functions like COALESCE and IFNULL/ISNULL can replace NULL with a default value in query results.

20. What are comparison operators in SQL?

Comparison operators are used in WHERE clauses to compare values and filter rows. The standard operators include = (equal), != or the alternative not-equal syntax (not equal), LIKE (pattern match), and the standard range operators for less than, greater than, less than or equal to, and greater than or equal to. These operators can be combined with AND, OR, and NOT to create complex filtering conditions.

Joins and Relationships

21. What is a JOIN in SQL?

A JOIN is an operation that combines rows from two or more tables based on a related column between them. Joins allow you to query data spread across multiple tables in a single statement by specifying how the tables are related. The most common type is the INNER JOIN, which returns only matching rows from both tables.

22. What is an INNER JOIN?

An INNER JOIN returns only the rows where there is a matching value in both tables being joined. For example, SELECT * FROM orders INNER JOIN customers ON orders.customer_id = customers.id returns only orders that have a corresponding customer record. Rows in either table that do not have a match in the other table are excluded from the result.

23. What is a LEFT JOIN?

A LEFT JOIN (or LEFT OUTER JOIN) returns all rows from the left table and the matching rows from the right table. If there is no match in the right table, the result contains NULL values for the right table columns. For example, SELECT * FROM employees LEFT JOIN departments ON employees.dept_id = departments.id returns all employees, including those not assigned to any department.

24. What is a RIGHT JOIN?

A RIGHT JOIN (or RIGHT OUTER JOIN) returns all rows from the right table and the matching rows from the left table, filling in NULLs where there is no match in the left table. It is essentially the reverse of a LEFT JOIN. In practice, RIGHT JOIN is less commonly used because the same result can be achieved by swapping the table order and using a LEFT JOIN.

25. What is a FULL OUTER JOIN?

A FULL OUTER JOIN returns all rows from both tables, matching rows where possible and filling in NULLs where there is no match on either side. It is essentially a combination of LEFT JOIN and RIGHT JOIN. Note that MySQL does not natively support FULL OUTER JOIN, so you would need to combine a LEFT JOIN and a RIGHT JOIN using UNION.

26. What is a CROSS JOIN?

A CROSS JOIN returns the Cartesian product of two tables, meaning every row from the first table is paired with every row from the second table. If table A has 10 rows and table B has 5 rows, the result will contain 50 rows. Cross joins are useful for generating all possible combinations, such as pairing every product with every color option.

27. What is a self join?

A self join is when a table is joined with itself, which is useful for comparing rows within the same table or querying hierarchical data. For example, to find each employee and their manager from a single employees table, you would write SELECT e.name, m.name AS manager FROM employees e INNER JOIN employees m ON e.manager_id = m.id. Aliases are essential in self joins to distinguish between the two references to the same table.

28. How do you join more than two tables in a single query?

You can join multiple tables by chaining additional JOIN clauses in a single query. For example, SELECT o.id, c.name, p.product_name FROM orders o INNER JOIN customers c ON o.customer_id = c.id INNER JOIN products p ON o.product_id = p.id joins three tables. Each subsequent JOIN specifies its own ON condition linking it to one of the previously joined tables.

29. What is the difference between the ON clause and the WHERE clause in a JOIN?

The ON clause specifies the join condition that determines how two tables are matched together, while the WHERE clause filters the result set after the join has been performed. For INNER JOIN the results are the same, but for outer joins, placing a condition in WHERE instead of ON can inadvertently filter out NULL rows and effectively convert the outer join into an inner join.

30. What are the different types of relationships between tables?

The three main relationship types are one-to-one, one-to-many, and many-to-many. A one-to-one relationship means each row in table A corresponds to exactly one row in table B (e.g., a user and their profile). A one-to-many relationship means one row in table A can relate to multiple rows in table B (e.g., one customer with many orders). A many-to-many relationship requires a junction table because rows in both tables can relate to multiple rows in the other (e.g., students and courses).

Aggregations and Grouping

31. What are aggregate functions in SQL?

Aggregate functions perform calculations on a set of values and return a single result. The most common aggregate functions are COUNT (counts rows), SUM (totals numeric values), AVG (calculates the mean), MIN (finds the smallest value), and MAX (finds the largest value). They are often used with the GROUP BY clause to compute summaries for each group of rows.

32. How does the COUNT function work?

COUNT returns the number of rows that match a specified condition. COUNT(*) counts all rows including those with NULL values, while COUNT(column_name) counts only non-NULL values in that column. For example, SELECT COUNT(*) FROM orders returns the total number of orders, and SELECT COUNT(email) FROM customers returns the number of customers who have an email address on file.

33. How do SUM and AVG functions work?

SUM adds up all numeric values in a column, for example SELECT SUM(amount) FROM payments returns the total of all payments. AVG calculates the arithmetic mean of numeric values, for example SELECT AVG(salary) FROM employees returns the average salary. Both functions ignore NULL values in their calculations.

34. How do MIN and MAX functions work?

MIN returns the smallest value in a column and MAX returns the largest value. They work with numeric, date, and string data types. For example, SELECT MIN(hire_date) FROM employees returns the earliest hire date, and SELECT MAX(price) FROM products returns the highest product price.

35. What is the GROUP BY clause?

GROUP BY groups rows that share the same values in specified columns into summary rows, and is typically used with aggregate functions. For example, SELECT department, COUNT(*) FROM employees GROUP BY department returns the number of employees in each department. Every column in the SELECT list that is not inside an aggregate function must be included in the GROUP BY clause.

36. What is the HAVING clause and how does it differ from WHERE?

The HAVING clause filters groups created by GROUP BY based on aggregate conditions, while WHERE filters individual rows before grouping occurs. For example, SELECT department, AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) BETWEEN 50000 AND 100000 returns only departments with an average salary in that range. You cannot use aggregate functions in a WHERE clause, which is why HAVING is needed.

37. Can you use GROUP BY with multiple columns?

Yes, GROUP BY can group by multiple columns to create more granular groupings. For example, SELECT department, job_title, COUNT(*) FROM employees GROUP BY department, job_title returns the count of employees for each unique combination of department and job title. The result contains one row for each unique combination of the grouped columns.

38. How does DISTINCT work with aggregate functions?

You can use DISTINCT inside an aggregate function to consider only unique values in the calculation. For example, SELECT COUNT(DISTINCT department) FROM employees counts the number of unique departments rather than the total number of rows. Similarly, SELECT SUM(DISTINCT price) FROM products sums only the distinct price values, ignoring duplicates.

39. What is a CASE statement in SQL?

The CASE statement provides conditional logic within SQL queries, similar to if-else statements in programming languages. It evaluates conditions in order and returns a value when the first condition is met. For example, SELECT name, CASE WHEN score BETWEEN 90 AND 100 THEN 'A' WHEN score BETWEEN 80 AND 89 THEN 'B' ELSE 'C' END AS grade FROM students assigns letter grades based on scores.

40. What is the correct order of execution for a SQL query that uses SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY?

The logical order of execution is: FROM (identifies the tables), WHERE (filters individual rows), GROUP BY (groups the remaining rows), HAVING (filters groups), SELECT (determines which columns to return), and finally ORDER BY (sorts the result). Understanding this order explains why you cannot use column aliases defined in SELECT within the WHERE clause, and why HAVING can reference aggregate functions while WHERE cannot.

Subqueries and Advanced Topics

41. What is a subquery in SQL?

A subquery is a query nested inside another query, enclosed in parentheses, that provides data to the outer query. Subqueries can appear in the WHERE clause, FROM clause, or SELECT clause. For example, SELECT * FROM employees WHERE salary = (SELECT MAX(salary) FROM employees) uses a subquery to find the employee with the highest salary.

42. What is the difference between a correlated and a non-correlated subquery?

A non-correlated subquery runs independently of the outer query and executes only once, for example SELECT * FROM products WHERE price = (SELECT MAX(price) FROM products). A correlated subquery references columns from the outer query and executes once for each row processed by the outer query, for example SELECT * FROM employees e WHERE salary = (SELECT MAX(salary) FROM employees WHERE department = e.department). Correlated subqueries are generally slower because they execute repeatedly.

43. How does the EXISTS operator work?

The EXISTS operator checks whether a subquery returns any rows and returns TRUE if at least one row is found, otherwise FALSE. It is commonly used in correlated subqueries, for example SELECT * FROM customers c WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id) returns only customers who have placed at least one order. EXISTS is often more efficient than IN for large datasets because it stops processing as soon as a match is found.

44. What is a view in SQL?

A view is a virtual table defined by a stored SELECT query that does not hold data itself but presents data from one or more underlying tables. You create a view using CREATE VIEW view_name AS SELECT ..., and then query it like a regular table. Views simplify complex queries, provide a layer of security by restricting access to specific columns or rows, and ensure consistent data presentation.

45. What is an index and why is it important?

An index is a data structure that improves the speed of data retrieval operations on a table at the cost of additional storage and slower write operations. It works similarly to a book index, allowing the database engine to find rows quickly without scanning the entire table. You create an index using CREATE INDEX index_name ON table_name(column_name), and indexes are most beneficial on columns frequently used in WHERE clauses, JOIN conditions, and ORDER BY clauses.

46. What is the difference between UNION and UNION ALL?

UNION combines the result sets of two or more SELECT statements and removes duplicate rows from the final result, while UNION ALL combines the results without removing duplicates. UNION ALL is faster than UNION because it skips the deduplication step. Both require that the SELECT statements have the same number of columns with compatible data types.

47. What are window functions and how do ROW_NUMBER and RANK work?

Window functions perform calculations across a set of rows related to the current row without collapsing them into a single output row, unlike aggregate functions with GROUP BY. ROW_NUMBER() assigns a unique sequential integer to each row within a partition, for example ROW_NUMBER() OVER (ORDER BY salary DESC) numbers rows from highest to lowest salary. RANK() is similar but assigns the same rank to tied values and skips subsequent ranks, so if two rows tie for rank 1, the next row receives rank 3.

48. What is a Common Table Expression (CTE)?

A CTE is a temporary named result set defined using the WITH keyword that exists only for the duration of a single query. For example, WITH high_earners AS (SELECT * FROM employees WHERE salary BETWEEN 80000 AND 200000) SELECT department, COUNT(*) FROM high_earners GROUP BY department first defines a CTE and then queries it. CTEs improve readability, allow recursive queries, and can be referenced multiple times within the same query.

49. What is normalization and what are the first three normal forms?

Normalization is the process of organizing a database to reduce redundancy and improve data integrity by dividing large tables into smaller, related tables. First Normal Form (1NF) requires that each column contains only atomic (indivisible) values and each row is unique. Second Normal Form (2NF) requires meeting 1NF and that all non-key columns depend on the entire primary key. Third Normal Form (3NF) requires meeting 2NF and that no non-key column depends on another non-key column (eliminating transitive dependencies).

50. What are ACID properties in database transactions?

ACID is an acronym for Atomicity, Consistency, Isolation, and Durability, which are the four key properties that guarantee reliable database transactions. Atomicity ensures a transaction is all-or-nothing (either all operations succeed or none do). Consistency ensures the database moves from one valid state to another. Isolation ensures concurrent transactions do not interfere with each other. Durability ensures that once a transaction is committed, the changes persist even in the event of a system failure.