What are the most common SQL interview questions in 2026?

The recurring ones cover join types (especially INNER vs LEFT), GROUP BY with HAVING, the difference between WHERE and HAVING, window functions like ROW_NUMBER and RANK, NULL handling, and basic index and query-optimization questions. Most loops also include one or two live querying tasks where you write SQL against a sample schema.

How do I prepare for a SQL interview?

Write real queries against a real schema instead of just reading. Drill the high-frequency patterns: nth-highest value, deduplication, running totals, gaps-and-islands, and top-N-per-group. Practice explaining tradeoffs out loud, like when an index helps versus hurts, and rehearse on a live SQL environment so the syntax is automatic under pressure.

What SQL topics get asked most for data and backend roles?

Joins and aggregation are the backbone of every loop. Beyond those, expect window functions, subqueries and CTEs, index design and query optimization (reading an EXPLAIN plan), NULL semantics, and transactions and isolation levels. Data roles lean harder on window functions and analytics patterns; backend roles lean harder on indexes, locking, and ACID.

Top 45 SQL Interview Questions and Answers (2026)

SQL is the one language that follows you across roles. Backend engineers write it against the application database, data engineers and analysts live in it, and even frontend and platform folks get handed a query in a system-design loop. In 2026 it shows up in more interview rounds than people expect, often as a live exercise where you have to write a correct query against an unfamiliar schema in a few minutes.

This guide collects 45 of the SQL questions interviewers actually ask, grouped by topic, with accurate answers and examples in standard ANSI SQL. Where a dialect matters (mostly PostgreSQL vs MySQL), it is called out. If you want to drill these hands-on, the SQL playground lets you write real queries against seeded databases and grades them, and the Data Engineer learning path sequences SQL alongside the rest of the data stack.

Core Queries

1. What is the logical order of execution of a SELECT statement?

You write SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY, but the database evaluates clauses in a different order: FROM and joins, then WHERE, then GROUP BY, then HAVING, then SELECT (including window functions and aliases), then DISTINCT, then ORDER BY, and finally LIMIT/OFFSET. This explains two classic gotchas: you cannot reference a SELECT alias in WHERE (the alias does not exist yet), but you can reference it in ORDER BY (which runs after SELECT).

2. What is the difference between WHERE and HAVING?

WHERE filters rows before grouping and cannot see aggregates. HAVING filters groups after GROUP BY and can reference aggregates.

SELECT department_id, COUNT(*) AS headcount
FROM employees
WHERE active = true            -- row filter, runs first
GROUP BY department_id
HAVING COUNT(*) > 10;          -- group filter, runs after aggregation

Putting COUNT(*) > 10 in WHERE is an error; putting active = true in HAVING works but is slower because you aggregate rows you could have dropped earlier.

3. What is the difference between DELETE, TRUNCATE, and DROP?

DELETE removes rows one at a time, can use a WHERE clause, fires triggers, and is fully transactional (rollback-able). TRUNCATE removes all rows by deallocating data pages, is much faster, usually cannot be filtered, resets identity counters, and in some engines is minimally logged. DROP removes the entire table object, including its structure, indexes, and permissions.

4. What is the difference between UNION and UNION ALL?

UNION concatenates two result sets and removes duplicate rows, which forces a sort or hash to dedupe. UNION ALL keeps every row, including duplicates, and is therefore faster. If you know the inputs are already disjoint, prefer UNION ALL.

SELECT id FROM current_users
UNION ALL
SELECT id FROM archived_users;

5. What is the difference between CHAR and VARCHAR?

CHAR(n) is fixed length and right-pads values with spaces to n characters. VARCHAR(n) stores only the characters supplied plus a small length prefix. Use CHAR for genuinely fixed-width codes (a two-letter country code) and VARCHAR for everything variable. In PostgreSQL, plain text is generally preferred over varchar(n) unless you actually need the length constraint.

6. What does the LIMIT / OFFSET clause do, and what is its performance trap?

LIMIT caps the number of returned rows; OFFSET skips rows first. The trap is deep pagination: OFFSET 100000 LIMIT 20 still scans and discards the first 100,000 rows, getting slower as users page deeper. The fix is keyset (cursor) pagination, which seeks by the last seen value instead of counting.

-- Slow on deep pages
SELECT * FROM events ORDER BY created_at LIMIT 20 OFFSET 100000;

-- Keyset pagination: fast and stable
SELECT * FROM events
WHERE created_at < '2026-01-01 12:00:00'
ORDER BY created_at DESC
LIMIT 20;

7. What is the difference between standard SQL and a dialect?

ANSI/ISO SQL defines a portable core (SELECT, joins, aggregation, window functions, CTEs). Each engine adds extensions and quirks: MySQL uses LIMIT while SQL Server uses TOP and Oracle historically used ROWNUM; string concatenation is || in PostgreSQL/Oracle but CONCAT() or + elsewhere; ILIKE for case-insensitive matching is PostgreSQL-specific. Interviewers usually accept any correct dialect as long as you note your assumption.

Joins

8. Explain the different types of JOIN.

INNER JOIN returns only rows with a match in both tables.
LEFT (OUTER) JOIN returns all left rows, with NULLs where the right side has no match.
RIGHT (OUTER) JOIN is the mirror image, keeping all right rows.
FULL (OUTER) JOIN returns all rows from both sides, NULL-filling non-matches.
CROSS JOIN returns the Cartesian product (every left row paired with every right row).

SELECT c.name, o.id AS order_id
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id;
-- customers with no orders still appear, order_id is NULL

9. How do you find rows in one table that have no match in another?

Use a LEFT JOIN and filter for the NULL on the right side (an anti-join).

SELECT c.id, c.name
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.id IS NULL;     -- customers who never ordered

NOT EXISTS is an equally good (often clearer) way to express the same thing and handles NULLs more safely than NOT IN.

10. What is a self join, and when do you need one?

A self join joins a table to itself using table aliases. It is the standard way to compare rows within the same table, such as an employee-manager hierarchy or finding pairs.

SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

11. What is the difference between an INNER JOIN and a WHERE-clause join?

The old comma syntax FROM a, b WHERE a.id = b.a_id produces the same result as FROM a INNER JOIN b ON a.id = b.a_id. Explicit JOIN ... ON is strongly preferred because it separates join logic from filter logic, makes outer joins possible, and prevents accidental Cartesian products when you forget the join condition.

12. Why can a JOIN multiply your row count, and how do you avoid it?

If the join key is not unique on the joined side, each left row matches multiple right rows and the result fans out. This silently inflates SUM/COUNT. Either aggregate the right side to one row per key before joining, or join on a genuinely unique key.

-- Pre-aggregate to avoid fan-out before summing
SELECT c.id, c.name, t.total
FROM customers c
LEFT JOIN (
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
) t ON t.customer_id = c.id;

13. What is a CROSS JOIN useful for?

A CROSS JOIN generates every combination of two sets. It is useful for building a complete grid, such as every product paired with every month so you can left-join sparse sales data onto it and show zeros for months with no sales (a calendar/dimension fill).

Aggregation and GROUP BY

14. How do aggregate functions handle NULLs?

Every aggregate except COUNT(*) ignores NULLs. COUNT(*) counts rows; COUNT(column) counts non-NULL values of that column; AVG(column) divides the sum of non-NULLs by the count of non-NULLs (not by the total row count). This distinction is a frequent source of wrong averages.

SELECT
  COUNT(*)        AS all_rows,
  COUNT(bonus)    AS rows_with_bonus,   -- ignores NULL bonuses
  AVG(bonus)      AS avg_of_present_bonuses
FROM employees;

15. What is the difference between COUNT(*), COUNT(1), and COUNT(column)?

COUNT(*) and COUNT(1) are identical in every modern optimizer; both count rows and there is no performance difference. COUNT(column) counts only rows where that column is not NULL, so it can return a smaller number. Use COUNT(DISTINCT column) for unique values.

16. Every non-aggregated column in SELECT must appear in GROUP BY - why?

If you select a column that is not aggregated and not in GROUP BY, the engine cannot decide which value to return for the group. Standard SQL and PostgreSQL reject this. MySQL historically allowed it (returning an arbitrary value) but with ONLY_FULL_GROUP_BY enabled by default in modern versions, it now errors too. The fix is to either group by the column or wrap it in an aggregate like MAX().

17. How do you compute multiple aggregations conditionally in one pass?

Use conditional aggregation with CASE inside the aggregate (sometimes called pivoting). This avoids multiple scans.

SELECT
  COUNT(*) FILTER (WHERE status = 'paid')    AS paid_count,   -- ANSI / PostgreSQL
  COUNT(*) FILTER (WHERE status = 'refunded') AS refunded_count
FROM orders;

-- Portable equivalent that works in MySQL too:
SELECT
  SUM(CASE WHEN status = 'paid'     THEN 1 ELSE 0 END) AS paid_count,
  SUM(CASE WHEN status = 'refunded' THEN 1 ELSE 0 END) AS refunded_count
FROM orders;

18. What is the difference between DISTINCT and GROUP BY?

For pure deduplication they produce the same result, and optimizers often treat them identically. The difference is intent and capability: GROUP BY exists to compute aggregates per group, so use it when you need COUNT/SUM/AVG; use DISTINCT when you only want unique rows with no aggregation. SELECT DISTINCT col and SELECT col ... GROUP BY col return the same set.

19. How do you find the top N rows per group?

Use a window function (covered below) with ROW_NUMBER() partitioned by the group, then filter on the rank. This is one of the most-asked patterns in data interviews (top 3 earners per department, latest order per customer).

SELECT *
FROM (
  SELECT e.*,
         ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
  FROM employees e
) ranked
WHERE rn <= 3;

20. What does GROUP BY ROLLUP do?

ROLLUP adds super-aggregate (subtotal and grand-total) rows to a grouped result. GROUP BY ROLLUP(region, product) returns per-product rows, per-region subtotals, and a grand total, with NULL marking the rolled-up level. It is handy for reporting without a separate UNION ALL for totals.

Subqueries and CTEs

21. What is the difference between a correlated and a non-correlated subquery?

A non-correlated subquery runs once, independently, and its result is reused. A correlated subquery references the outer query and conceptually re-runs for each outer row, which can be slow if the optimizer cannot rewrite it into a join.

-- Correlated: references the outer row (e.alias)
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary) FROM employees
    WHERE department_id = e.department_id   -- correlation
);

22. What is the difference between IN, EXISTS, and JOIN for membership tests?

All three can express "rows that have a match." IN is fine for small static lists but has surprising NULL behavior. EXISTS short-circuits on the first match and is usually the best choice for "does any matching row exist," especially with correlated conditions. A JOIN is right when you also need columns from the other table, but it can multiply rows if the key is not unique. Modern optimizers often produce identical plans for IN and EXISTS.

23. Why is NOT IN dangerous with NULLs?

If the subquery used by NOT IN returns even one NULL, the whole predicate evaluates to UNKNOWN for every row and the query returns no rows. NOT EXISTS does not have this problem and is the safer default.

-- Risky: any NULL in the subquery makes this return nothing
SELECT * FROM products WHERE category_id NOT IN (SELECT category_id FROM archived);

-- Safe equivalent
SELECT p.* FROM products p
WHERE NOT EXISTS (
  SELECT 1 FROM archived a WHERE a.category_id = p.category_id
);

24. What is a CTE (Common Table Expression) and when do you use one?

A CTE is a named, temporary result set defined with WITH that exists for the duration of one query. It improves readability by naming intermediate steps, lets you reference the same subquery multiple times, and is the gateway to recursion. It is not automatically a performance win; in some engines a CTE is an optimization fence, while PostgreSQL inlines non-recursive CTEs by default since version 12.

WITH monthly AS (
  SELECT date_trunc('month', created_at) AS m, SUM(amount) AS revenue
  FROM orders
  GROUP BY 1
)
SELECT m, revenue,
       revenue - LAG(revenue) OVER (ORDER BY m) AS mom_change
FROM monthly;

25. What is a recursive CTE, and what is it used for?

A recursive CTE references itself to walk hierarchical or graph data: org charts, category trees, bill-of-materials, or generating a series. It has an anchor member (the base case) joined by UNION ALL to a recursive member that references the CTE name.

WITH RECURSIVE org AS (
  SELECT id, name, manager_id, 1 AS depth
  FROM employees WHERE manager_id IS NULL      -- anchor: top of tree
  UNION ALL
  SELECT e.id, e.name, e.manager_id, o.depth + 1
  FROM employees e
  JOIN org o ON e.manager_id = o.id            -- recurse one level down
)
SELECT * FROM org ORDER BY depth;

26. What is a derived table?

A derived table is a subquery in the FROM clause that is treated as an inline temporary table and must be given an alias. It is functionally similar to a non-recursive CTE and is often used for pre-aggregation before a join (see the fan-out example in question 12).

Window Functions

27. What is a window function, and how does it differ from GROUP BY?

A window function computes a value across a set of rows related to the current row, but unlike GROUP BY it does not collapse those rows. You keep every input row and get an extra computed column. The OVER clause defines the window with PARTITION BY (the grouping) and ORDER BY (the ordering within each partition).

SELECT name, department_id, salary,
       AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
FROM employees;   -- every employee row kept, plus their department average

28. What is the difference between ROW_NUMBER, RANK, and DENSE_RANK?

All three rank rows within a partition, but they handle ties differently. ROW_NUMBER assigns a unique sequential number even to tied rows (arbitrary order among ties). RANK gives ties the same rank and then skips numbers (1, 2, 2, 4). DENSE_RANK gives ties the same rank with no gaps (1, 2, 2, 3).

SELECT name, score,
  ROW_NUMBER() OVER (ORDER BY score DESC) AS rn,
  RANK()       OVER (ORDER BY score DESC) AS rnk,
  DENSE_RANK() OVER (ORDER BY score DESC) AS dense
FROM players;

29. How do you find the second-highest (nth-highest) salary?

The cleanest dialect-portable answer uses DENSE_RANK so ties are handled correctly.

SELECT DISTINCT salary
FROM (
  SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS r
  FROM employees
) t
WHERE r = 2;        -- change to N for nth-highest

A common follow-up asks for the answer without window functions; a correlated subquery counting distinct higher salaries also works, and is worth knowing as a fallback.

30. What are LAG and LEAD used for?

LAG reads a value from a previous row in the ordered partition, and LEAD reads from a following row. They are the go-to tools for period-over-period comparisons, computing deltas, and detecting changes without a self join.

SELECT day, revenue,
       revenue - LAG(revenue) OVER (ORDER BY day) AS day_over_day
FROM daily_sales;

31. How do you compute a running total?

Use a windowed SUM with an ORDER BY and an explicit frame. The default frame for an ordered aggregate is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which gives a cumulative sum.

SELECT day, amount,
       SUM(amount) OVER (ORDER BY day
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
         AS running_total
FROM sales;

32. What is the difference between ROWS and RANGE in a window frame?

ROWS counts physical rows around the current row, so a 3-row moving window always uses 3 rows. RANGE works on logical value ranges and groups peer rows that share the same ORDER BY value, so tied rows are treated as one unit. The distinction matters most when computing moving averages over data with duplicate sort keys.

33. What does NTILE do?

NTILE(n) distributes the ordered rows into n roughly equal buckets and labels each row with its bucket number. It is the standard way to compute quartiles, deciles, or percentile bands (for example, splitting customers into spending quartiles).

Indexes and Performance

34. What is an index, and what is the tradeoff?

An index is a separate data structure (usually a B-tree) that lets the engine find rows by a key without scanning the whole table, turning O(n) scans into O(log n) lookups. The tradeoff: indexes consume storage and slow down INSERT/UPDATE/DELETE because every write must also maintain the index. Index the columns you filter, join, and sort on; do not index everything.

35. What is the difference between a clustered and a non-clustered index?

A clustered index defines the physical order of the table's rows, so there is at most one per table (in SQL Server the primary key is clustered by default; in InnoDB/MySQL the table is physically organized by the primary key). A non-clustered (secondary) index is a separate structure pointing back to the rows. PostgreSQL has no clustered index in this sense; its heap is unordered and all indexes are secondary, though CLUSTER can reorder a table once.

36. What is a composite index, and why does column order matter?

A composite index covers multiple columns in a defined order. It can serve queries that filter on a left-prefix of those columns: an index on (a, b, c) helps queries filtering on a, on a, b, or on a, b, c, but generally not one filtering on b alone. So put the most selective and most frequently filtered columns first, and match the order to your query patterns.

37. What is a covering index?

A covering index includes every column a query needs (in the key or as included columns), so the engine answers the query entirely from the index without touching the table heap. This is called an index-only scan and is a major speedup for hot read paths.

38. Why might the database ignore your index?

Common reasons: the query touches a large fraction of the table (a full scan is cheaper), a function or implicit cast wraps the indexed column (WHERE LOWER(email) = ... cannot use a plain index on email), a leading wildcard LIKE '%foo' prevents B-tree use, stale statistics mislead the planner, or low selectivity (a boolean column) makes the index pointless. Fixing this often means a functional/expression index or rewriting the predicate to be sargable.

-- Not sargable: index on created_at unused because of the function
WHERE date(created_at) = '2026-01-01';

-- Sargable rewrite: range that the index can seek
WHERE created_at >= '2026-01-01' AND created_at < '2026-01-02';

39. What does EXPLAIN (or EXPLAIN ANALYZE) tell you?

EXPLAIN shows the planner's chosen execution plan: scan types (sequential vs index scan), join algorithms (nested loop, hash join, merge join), estimated row counts, and cost. EXPLAIN ANALYZE actually runs the query and adds real timings and actual row counts, so a big gap between estimated and actual rows usually points to stale statistics. Reading a plan is a core skill for the optimization questions in the coding challenges and in interviews.

40. What is query optimization in practice?

Beyond indexing, the high-leverage moves are: select only the columns you need (avoid SELECT * so covering indexes and less I/O apply), filter as early as possible, avoid functions on indexed columns, pre-aggregate before large joins, replace correlated subqueries with joins or window functions where the optimizer cannot, keep table statistics fresh with ANALYZE/ANALYZE TABLE, and watch for fan-out and N+1 query patterns from the application layer.

Transactions and ACID

41. What does ACID stand for?

Atomicity: a transaction is all-or-nothing; partial work is rolled back.
Consistency: a transaction moves the database from one valid state to another, respecting constraints.
Isolation: concurrent transactions do not corrupt each other's view of the data, governed by the isolation level.
Durability: once committed, the change survives crashes (typically via a write-ahead log).

42. What are the SQL isolation levels and the anomalies they prevent?

From weakest to strongest: READ UNCOMMITTED (allows dirty reads), READ COMMITTED (prevents dirty reads; PostgreSQL's default), REPEATABLE READ (also prevents non-repeatable reads; MySQL InnoDB's default), and SERIALIZABLE (also prevents phantom reads, behaving as if transactions ran one at a time). Higher isolation means more consistency but more locking or more rollbacks, so it is a correctness-versus-concurrency tradeoff.

43. What is the difference between optimistic and pessimistic locking?

Pessimistic locking takes a lock up front (SELECT ... FOR UPDATE) so no one else can modify the row while you work, trading concurrency for safety. Optimistic locking takes no lock; it reads a version number or timestamp, then on write checks that the version has not changed and retries if it has. Optimistic suits low-contention, read-heavy workloads; pessimistic suits high-contention hotspots like inventory decrements.

44. What is a deadlock and how do you avoid it?

A deadlock occurs when two transactions each hold a lock the other needs, so neither can proceed; the database detects this and kills one as the victim. Avoid it by acquiring locks in a consistent global order across all code paths, keeping transactions short, touching fewer rows, and using appropriate isolation. Application code should be ready to catch the deadlock error and retry.

Tricky Gotchas

45. Why does comparing anything to NULL with = return nothing, and how do you handle it?

NULL means "unknown," so x = NULL and even NULL = NULL evaluate to UNKNOWN, not TRUE, and rows with UNKNOWN predicates are filtered out. You must use IS NULL / IS NOT NULL, and to coalesce unknowns into a value use COALESCE(x, default). This same three-valued logic is why NOT IN with NULLs (question 23) and ordinary inequality filters silently drop NULL rows.

-- Wrong: returns no rows
SELECT * FROM users WHERE deleted_at = NULL;

-- Right
SELECT * FROM users WHERE deleted_at IS NULL;

-- Beware: this also excludes rows where status IS NULL
SELECT * FROM users WHERE status <> 'banned';
-- Include them explicitly if intended:
SELECT * FROM users WHERE status <> 'banned' OR status IS NULL;

Practice These for Real

Reading answers builds recognition; writing queries builds the recall you need when an interviewer shares a schema and a timer. Drill the high-frequency patterns until they are automatic: nth-highest value, top-N-per-group, running totals, deduplication, and anti-joins.

The best way to lock these in is to write them against a real database. Head to the SQL playground to solve graded problems across joins, aggregation, subqueries, CTEs, and window functions, work through the broader practice questions, and when you are ready to be put on the spot, run a few AI mock interviews that grill you the way a real loop will. If Python is also on your list, pair this with our top 50 Python interview questions.

Train for the role, not just the interview. Open the SQL playground and write your first query now.