Tools

SQL Essentials

ToolsFREELast updated: June 2026 · By gitGood Editorial

Query clause order, every JOIN type and when to use it, aggregates vs window functions, what indexes actually buy you, transaction isolation levels, and the NULL / WHERE-vs-HAVING / EXISTS-vs-IN gotchas interviewers fish for.

How SQL shows up in interviews

Three places: a dedicated SQL screen (data / analytics / backend roles), a data-modeling segment inside system design, and the "how would you make this query fast" follow-up after you propose a relational store. The screens reward knowing the logical clause order, window functions, and NULL semantics cold. The follow-ups reward knowing what an index does and what an isolation level costs. Everything below is ANSI-standard unless flagged - Postgres / MySQL differences are noted where they matter.

Query clauses - logical execution order

SQL executes in a different order than you write it. This single fact explains most "why doesn't my alias work" confusion.

1. FROM / JOIN: Build the working row set first. All joins resolve before any filtering. Join conditions live in ON; for OUTER joins, putting a right-table filter in WHERE instead of ON silently turns the join into an INNER join.
2. WHERE: Filter individual rows before grouping. Cannot reference SELECT aliases (they don't exist yet) and cannot contain aggregates - that's HAVING's job.
3. GROUP BY: Collapse rows into groups. Every non-aggregated SELECT column must appear here (MySQL's loose mode that tolerated this is off by default since 5.7 - don't rely on it).
4. HAVING: Filter groups after aggregation. HAVING COUNT(*) > 5 is valid; WHERE COUNT(*) > 5 is a syntax error.
5. SELECT: Compute output expressions and aliases. This is why ORDER BY can use your aliases but WHERE cannot.
6. DISTINCT: Deduplicate the projected rows. Applies to the whole selected row, not one column. Often a smell - a missing join condition produces the duplicates DISTINCT then hides.
7. ORDER BY: Sort the final result. Can use SELECT aliases. NULLs sort last in Postgres ASC, first in MySQL ASC - say "NULLS FIRST/LAST" if it matters.
8. LIMIT / OFFSET: Truncate the sorted result. LIMIT without ORDER BY returns arbitrary rows - nondeterministic, and a classic code-review catch.

JOIN types

Know the shape of the output (which side keeps unmatched rows) and the canonical use case for each.

INNER JOIN: Only rows with a match on both sides. The default meaning of bare JOIN. Use when the relationship is required (order -> customer).
LEFT (OUTER) JOIN: All left rows; right columns are NULL where unmatched. Use when the right side is optional (user -> latest_login). The workhorse outer join - most people write LEFT and never RIGHT.
RIGHT (OUTER) JOIN: Mirror of LEFT. Almost always rewritten as a LEFT JOIN with the tables swapped for readability.
FULL OUTER JOIN: All rows from both sides, NULLs where unmatched. Use for reconciliation / diff queries (what's in A not B, in B not A, in both). MySQL lacks it - emulate with LEFT JOIN UNION RIGHT JOIN.
CROSS JOIN: Cartesian product - every left row paired with every right row. Legit for generating combinations (all dates x all stores for a report scaffold); otherwise it's a missing ON clause bug.
Self join: A table joined to itself with aliases (employees e JOIN employees m ON e.manager_id = m.id). Use for hierarchies one level deep, or comparing rows within a table (events to prior events).
Semi-join (EXISTS): Filter left rows by whether a match exists, without duplicating output rows: WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id). Use instead of JOIN + DISTINCT when you only need the left side.
Anti-join (NOT EXISTS): Left rows with no match: WHERE NOT EXISTS (...). Also written LEFT JOIN ... WHERE right.id IS NULL. The classic "customers who never ordered" question. Prefer NOT EXISTS over NOT IN (see NULL gotchas).

Aggregates, GROUP BY, HAVING

COUNT(*) vs COUNT(col) vs COUNT(DISTINCT col): COUNT(*) counts rows. COUNT(col) counts rows where col IS NOT NULL - a silent filter people forget. COUNT(DISTINCT col) counts unique non-NULL values.
SUM / AVG / MIN / MAX and NULL: All aggregates skip NULLs. AVG(col) divides by the count of non-NULL rows, not all rows - use SUM(col) / COUNT(*) or COALESCE(col, 0) when NULL should mean zero. Aggregating zero rows yields NULL, not 0 (except COUNT, which yields 0).
GROUP BY rule: Every SELECT column is either aggregated or listed in GROUP BY. Want "the whole row of the max per group"? That's not GROUP BY - that's a top-N-per-group pattern (see patterns below).
HAVING: WHERE for groups. Filter on aggregates here: HAVING SUM(amount) > 1000. Filtering on a plain column in HAVING works but wastes work - push it down to WHERE so rows are dropped before grouping.
Conditional aggregation: Pivot without PIVOT: SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS failed_count, or Postgres COUNT(*) FILTER (WHERE status = 'failed'). One pass over the table instead of N subqueries.
GROUP BY expression: You can group by computed values: GROUP BY DATE_TRUNC('day', created_at). Standard for time-bucketed metrics questions.

Window functions

Aggregates that don't collapse rows - every input row survives, with the computed value attached. The single highest-leverage SQL topic for interviews.

Anatomy: fn() OVER (PARTITION BY group_cols ORDER BY sort_cols [frame]). PARTITION BY = restart per group (like GROUP BY without collapsing). ORDER BY = ordering within the partition. Omit PARTITION BY to window over the whole result.
ROW_NUMBER vs RANK vs DENSE_RANK: On ties: ROW_NUMBER gives 1,2,3 (arbitrary tiebreak), RANK gives 1,1,3 (gaps), DENSE_RANK gives 1,1,2 (no gaps). "Second highest salary" wants DENSE_RANK = 2; "top 3 per group, exactly 3" wants ROW_NUMBER <= 3.
LAG / LEAD: Previous / next row's value within the partition: LAG(price) OVER (PARTITION BY ticker ORDER BY day). The backbone of day-over-day deltas, sessionization, and gap detection. Third arg sets a default for the first row: LAG(price, 1, 0).
Running totals / moving averages: SUM(amount) OVER (ORDER BY day) = running total. AVG(amount) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) = 7-day moving average. With ORDER BY, the default frame is UNBOUNDED PRECEDING to CURRENT ROW.
NTILE(n): Bucket rows into n equal groups - quartiles, deciles, percentile cohorts: NTILE(4) OVER (ORDER BY revenue).
Where you can use them: SELECT and ORDER BY only. Not in WHERE, GROUP BY, or HAVING - windows compute after grouping. To filter on one ("rows where ROW_NUMBER = 1"), wrap it in a subquery or CTE and filter the outer query. Interviewers test this constantly.

Indexes

Default mental model: a B-tree index is a sorted copy of the indexed columns with pointers back to the rows.

What they speed up: Equality and range lookups on the indexed column (WHERE, JOIN ON), ORDER BY that matches the index order, and sometimes GROUP BY. Without one, every lookup is a full table scan.
What they cost: Every INSERT / UPDATE / DELETE must also update each index - write amplification - plus disk space. A write-heavy table with ten indexes is a design smell. Index what you query, not everything.
Composite indexes - leftmost prefix: An index on (a, b, c) serves queries filtering on (a), (a, b), or (a, b, c) - not (b) or (b, c) alone. Order columns by equality filters first, then the range / sort column. This is the most-asked index question.
Covering index: If the index contains every column the query needs, the engine answers from the index alone and never touches the table (index-only scan). Postgres: INCLUDE (col) adds payload columns without affecting key order.
When the index gets skipped: Function or cast on the column (WHERE LOWER(email) = ... needs an expression index), leading-wildcard LIKE ('%foo'), low selectivity (status with 3 values - planner prefers a scan), or implicit type mismatch. EXPLAIN / EXPLAIN ANALYZE confirms what actually ran.
Primary key / unique: Both are backed by indexes automatically - don't add a duplicate index on the PK. Foreign key columns are NOT auto-indexed in Postgres; index them or pay on every join and parent delete.

Transactions and isolation levels

ACID = Atomicity (all or nothing), Consistency (constraints hold), Isolation (concurrent transactions don't trample), Durability (committed = survives crash). Isolation is the tunable one.

READ UNCOMMITTED: Can see other transactions' uncommitted writes (dirty reads). Effectively never the right answer; Postgres doesn't even implement it (treats it as READ COMMITTED).
READ COMMITTED: Only see committed data, but two reads in one transaction can disagree (non-repeatable read). Postgres default. Fine for most OLTP.
REPEATABLE READ: Rows you've read won't change under you for the rest of the transaction. MySQL/InnoDB default. Classic anomaly remaining: phantoms (new rows matching your WHERE appear) - though Postgres's snapshot implementation prevents phantoms too.
SERIALIZABLE: Result is as if transactions ran one at a time. Strongest, safest, slowest - expect serialization failures your app must retry. Use for invariants spanning multiple rows (no double-booking, balanced ledgers).
Anomaly cheat line: Dirty read (see uncommitted) -> blocked from READ COMMITTED up. Non-repeatable read (row changes between reads) -> blocked from REPEATABLE READ up. Phantom (new matching rows appear) -> blocked at SERIALIZABLE.
Locking vs MVCC: Modern engines use MVCC: readers see a snapshot, writers don't block readers. Explicit locks when you need them: SELECT ... FOR UPDATE locks rows you're about to modify (the standard fix for read-then-update races like seat booking).

Interview gotchas

·NULL is not a value - it's "unknown." NULL = NULL evaluates to NULL (not true), so WHERE col = NULL matches nothing. Use IS NULL / IS DISTINCT FROM.
·NOT IN with a NULL in the list returns zero rows, always - x NOT IN (1, 2, NULL) can never be true. This is why NOT EXISTS is the safe anti-join.
·WHERE filters rows before grouping; HAVING filters groups after. Aggregate conditions go in HAVING, everything else belongs in WHERE.
·EXISTS vs IN: equivalent on clean data; EXISTS short-circuits, handles NULLs sanely, and modern planners usually optimize both the same. Say "NOT EXISTS over NOT IN because of NULLs" and you've answered the real question.
·UNION deduplicates (with a sort/hash cost); UNION ALL just concatenates. Default to UNION ALL unless you need dedup.
·COUNT(col) silently drops NULL rows. If a count looks too low, this is why.
·Integer division: 1/2 = 0 in Postgres and SQL Server. Multiply by 1.0 or cast before dividing for ratios.
·BETWEEN is inclusive on both ends - wrong for timestamp ranges. Use >= start AND < end (half-open) for dates.
·ORDER BY inside a subquery or view is not guaranteed to survive to the outer query. Order at the outermost level.
·DELETE / UPDATE without WHERE hits every row, and most consoles won't stop you. Write the WHERE first, or wrap in a transaction and check with SELECT.

Query patterns

The three shapes that anchor most intermediate SQL screens. Templates are Postgres-flavored ANSI.

Top-N per group

When to use: "Highest-paid employee per department," "latest order per customer," "top 3 products per category." Any time GROUP BY can't work because you need whole rows, not aggregates.
Template: SELECT * FROM ( SELECT e.*, ROW_NUMBER() OVER ( PARTITION BY department_id ORDER BY salary DESC ) AS rn FROM employees e ) ranked WHERE rn <= 3; -- ties matter? swap ROW_NUMBER for DENSE_RANK. -- N = 1 in Postgres: SELECT DISTINCT ON (department_id) * -- FROM employees ORDER BY department_id, salary DESC;
Complexity: One scan + a sort per partition; an index on (department_id, salary DESC) makes it near-free.
Examples: Department Highest Salary (LeetCode 184/185)
Latest event per user / per session
Most recent price per ticker

Dedup - keep one row per key

When to use: A table accumulated duplicates (no unique constraint, double ingestion) and you must keep exactly one row per logical key - usually the newest.
Template: DELETE FROM users WHERE id IN ( SELECT id FROM ( SELECT id, ROW_NUMBER() OVER ( PARTITION BY email ORDER BY created_at DESC ) AS rn FROM users ) d WHERE rn > 1 ); -- then add the missing constraint: -- ALTER TABLE users ADD CONSTRAINT uq_email UNIQUE (email);
Complexity: One scan + window sort; the DELETE cost scales with duplicates removed.
Examples: Delete Duplicate Emails (LeetCode 196)
Idempotent re-ingestion cleanup
Pick the canonical row before adding a UNIQUE constraint

Gaps and islands

When to use: Find missing values in a sequence (gaps: skipped IDs, missed days) or group consecutive runs (islands: login streaks, continuous sensor uptime).
Template: -- Islands: consecutive-day streaks per user. -- Trick: date minus ROW_NUMBER is constant within a run. SELECT user_id, MIN(login_date) AS streak_start, MAX(login_date) AS streak_end, COUNT(*) AS streak_len FROM ( SELECT user_id, login_date, login_date - CAST(ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY login_date ) AS int) AS grp FROM logins ) t GROUP BY user_id, grp; -- Gaps: where does the next row jump? SELECT id + 1 AS gap_start FROM seq s WHERE NOT EXISTS (SELECT 1 FROM seq WHERE id = s.id + 1);
Complexity: One window pass + GROUP BY. The row_number-difference trick is the expected answer; LAG-based variants also pass.
Examples: Active user streaks
Missing invoice numbers
Continuous uptime intervals from heartbeat pings

Other cheat sheets

Big-O Reference

Algorithms

Time and space complexity for the data structures, sorting algorithms, and search routines that show up in coding interviews. Skim the row, remember the row, defend the row in an interview.

Interview Patterns

Patterns

The recurring shapes - sliding window, two pointers, fast/slow, BFS/DFS, backtracking, DP, divide & conquer, binary search variants, union-find, topological sort. Each entry: when to reach for it, the template, complexity, and which classic problems use it.

Design Tradeoffs

Systems

The recurring forks in system design interviews. CAP, PACELC, sync vs async, push vs pull, SQL vs NoSQL, sharding shapes, consistency models, cache strategies, idempotency, and rate limiting. For each, the options and when to choose each.

Unix Essentials

Tools

Filesystem layout, the commands you actually use (find / grep / awk / sed / xargs), processes and signals, networking, permissions, basic shell scripting, and a vi survival kit.

Git Essentials

Tools

The everyday commands, every undo scenario mapped to its fix, rebase vs merge with a side to pick, interactive rebase, bisect, the reflog safety net, stash, and the flags worth aliasing.

Docker & K8s

Tools

The docker and kubectl commands you reach for daily, Dockerfile best practices, how layer caching actually works, the core k8s objects in one screen, requests vs limits, liveness vs readiness, and a step-by-step CrashLoopBackOff debug flow.

REST API Design

Systems

Method semantics and idempotency, the ~15 status codes that matter, resource naming rules, offset vs cursor pagination, versioning and auth tradeoffs, error body conventions, rate-limit headers, and the smells reviewers flag.

STAR Method

Patterns

The STAR structure with timing, what interviewers actually grade, eight question archetypes and how to frame each, the anti-patterns that sink answers (rambling, "we" instead of "I", no metrics), and a 30-second answer skeleton.

Networking

Systems

TCP vs UDP, the TLS and TCP handshakes, HTTP versions, status codes, DNS resolution, the OSI and TCP/IP layer models, and the ports you are expected to know in an interview.

Regex

Tools

Anchors, character classes, quantifiers, groups, alternation, lookarounds, backreferences, and flags - plus practical patterns and the gotchas that trip people up in interviews.

Linux Perf

Tools

The USE method, a first-five-minutes triage runbook, and the CPU, memory, disk, network, and tracing commands you reach for when a Linux box is misbehaving.

Concurrency

Patterns

A fast reference for concurrency primitives, synchronization tradeoffs, the memory model, and the classic bugs that show up in systems interviews and real code.

Distributed Systems

Systems

A reference for the theorems, consistency models, replication and partitioning strategies, delivery guarantees, and resilience patterns that come up in system design interviews.

Kafka & Messaging

Systems

Topics, partitions, and consumer groups, the three delivery semantics and how Kafka actually achieves each, ordering guarantees, rebalancing, retention vs compaction, and a straight Kafka vs SQS vs RabbitMQ vs Kinesis comparison.

GraphQL

Systems

Schema, types, and resolvers, the three operation kinds, the N+1 problem and DataLoader, cursor vs offset pagination, error handling that actually works, security (depth limiting, query cost), and an honest answer to 'when does REST beat GraphQL'.

Terraform & IaC

Tools

State and why it must be remote and locked, the init/plan/apply lifecycle, modules and variables, count vs for_each, workspaces, import and drift, a command table, and the gotchas (prevent_destroy, secrets in state) that mark real production experience.

LLMs & Prompting

Systems

How LLMs work in one paragraph, the knobs (context window, temperature, top-p), system vs user prompts, few-shot and chain-of-thought, RAG and embeddings, the fine-tune-vs-prompt decision, tool calling, eval basics, and the interview questions teams actually ask now.

SQL Query Tuning

Tools

How B-tree indexes actually work, composite index column order, covering indexes, reading EXPLAIN ANALYZE, why the planner ignores your index, join algorithms, N+1, keyset pagination, and the 'why is this query slow' scenarios interviews are built on.

Practice the patterns

Reading is the floor. The signal in interviews comes from working problems out loud and defending your tradeoffs. Spin up an AI mock interview or run a coding challenge to put these to work.

Coding challenges AI mock interview SQL practice problems by topic Joins practice problems