gitGood.dev
Tools

SQL Essentials

ToolsFREELast updated: June 2026 · By gitGood Editorial

Query clause order, every JOIN type and when to use it, aggregates vs window functions, what indexes actually buy you, transaction isolation levels, and the NULL / WHERE-vs-HAVING / EXISTS-vs-IN gotchas interviewers fish for.

How SQL shows up in interviews

Three places: a dedicated SQL screen (data / analytics / backend roles), a data-modeling segment inside system design, and the "how would you make this query fast" follow-up after you propose a relational store. The screens reward knowing the logical clause order, window functions, and NULL semantics cold. The follow-ups reward knowing what an index does and what an isolation level costs. Everything below is ANSI-standard unless flagged - Postgres / MySQL differences are noted where they matter.

Query clauses - logical execution order

SQL executes in a different order than you write it. This single fact explains most "why doesn't my alias work" confusion.

1. FROM / JOIN
Build the working row set first. All joins resolve before any filtering. Join conditions live in ON; for OUTER joins, putting a right-table filter in WHERE instead of ON silently turns the join into an INNER join.
2. WHERE
Filter individual rows before grouping. Cannot reference SELECT aliases (they don't exist yet) and cannot contain aggregates - that's HAVING's job.
3. GROUP BY
Collapse rows into groups. Every non-aggregated SELECT column must appear here (MySQL's loose mode that tolerated this is off by default since 5.7 - don't rely on it).
4. HAVING
Filter groups after aggregation. HAVING COUNT(*) > 5 is valid; WHERE COUNT(*) > 5 is a syntax error.
5. SELECT
Compute output expressions and aliases. This is why ORDER BY can use your aliases but WHERE cannot.
6. DISTINCT
Deduplicate the projected rows. Applies to the whole selected row, not one column. Often a smell - a missing join condition produces the duplicates DISTINCT then hides.
7. ORDER BY
Sort the final result. Can use SELECT aliases. NULLs sort last in Postgres ASC, first in MySQL ASC - say "NULLS FIRST/LAST" if it matters.
8. LIMIT / OFFSET
Truncate the sorted result. LIMIT without ORDER BY returns arbitrary rows - nondeterministic, and a classic code-review catch.

JOIN types

Know the shape of the output (which side keeps unmatched rows) and the canonical use case for each.

INNER JOIN
Only rows with a match on both sides. The default meaning of bare JOIN. Use when the relationship is required (order -> customer).
LEFT (OUTER) JOIN
All left rows; right columns are NULL where unmatched. Use when the right side is optional (user -> latest_login). The workhorse outer join - most people write LEFT and never RIGHT.
RIGHT (OUTER) JOIN
Mirror of LEFT. Almost always rewritten as a LEFT JOIN with the tables swapped for readability.
FULL OUTER JOIN
All rows from both sides, NULLs where unmatched. Use for reconciliation / diff queries (what's in A not B, in B not A, in both). MySQL lacks it - emulate with LEFT JOIN UNION RIGHT JOIN.
CROSS JOIN
Cartesian product - every left row paired with every right row. Legit for generating combinations (all dates x all stores for a report scaffold); otherwise it's a missing ON clause bug.
Self join
A table joined to itself with aliases (employees e JOIN employees m ON e.manager_id = m.id). Use for hierarchies one level deep, or comparing rows within a table (events to prior events).
Semi-join (EXISTS)
Filter left rows by whether a match exists, without duplicating output rows: WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id). Use instead of JOIN + DISTINCT when you only need the left side.
Anti-join (NOT EXISTS)
Left rows with no match: WHERE NOT EXISTS (...). Also written LEFT JOIN ... WHERE right.id IS NULL. The classic "customers who never ordered" question. Prefer NOT EXISTS over NOT IN (see NULL gotchas).

Aggregates, GROUP BY, HAVING

COUNT(*) vs COUNT(col) vs COUNT(DISTINCT col)
COUNT(*) counts rows. COUNT(col) counts rows where col IS NOT NULL - a silent filter people forget. COUNT(DISTINCT col) counts unique non-NULL values.
SUM / AVG / MIN / MAX and NULL
All aggregates skip NULLs. AVG(col) divides by the count of non-NULL rows, not all rows - use SUM(col) / COUNT(*) or COALESCE(col, 0) when NULL should mean zero. Aggregating zero rows yields NULL, not 0 (except COUNT, which yields 0).
GROUP BY rule
Every SELECT column is either aggregated or listed in GROUP BY. Want "the whole row of the max per group"? That's not GROUP BY - that's a top-N-per-group pattern (see patterns below).
HAVING
WHERE for groups. Filter on aggregates here: HAVING SUM(amount) > 1000. Filtering on a plain column in HAVING works but wastes work - push it down to WHERE so rows are dropped before grouping.
Conditional aggregation
Pivot without PIVOT: SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS failed_count, or Postgres COUNT(*) FILTER (WHERE status = 'failed'). One pass over the table instead of N subqueries.
GROUP BY expression
You can group by computed values: GROUP BY DATE_TRUNC('day', created_at). Standard for time-bucketed metrics questions.

Window functions

Aggregates that don't collapse rows - every input row survives, with the computed value attached. The single highest-leverage SQL topic for interviews.

Anatomy
fn() OVER (PARTITION BY group_cols ORDER BY sort_cols [frame]). PARTITION BY = restart per group (like GROUP BY without collapsing). ORDER BY = ordering within the partition. Omit PARTITION BY to window over the whole result.
ROW_NUMBER vs RANK vs DENSE_RANK
On ties: ROW_NUMBER gives 1,2,3 (arbitrary tiebreak), RANK gives 1,1,3 (gaps), DENSE_RANK gives 1,1,2 (no gaps). "Second highest salary" wants DENSE_RANK = 2; "top 3 per group, exactly 3" wants ROW_NUMBER <= 3.
LAG / LEAD
Previous / next row's value within the partition: LAG(price) OVER (PARTITION BY ticker ORDER BY day). The backbone of day-over-day deltas, sessionization, and gap detection. Third arg sets a default for the first row: LAG(price, 1, 0).
Running totals / moving averages
SUM(amount) OVER (ORDER BY day) = running total. AVG(amount) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) = 7-day moving average. With ORDER BY, the default frame is UNBOUNDED PRECEDING to CURRENT ROW.
NTILE(n)
Bucket rows into n equal groups - quartiles, deciles, percentile cohorts: NTILE(4) OVER (ORDER BY revenue).
Where you can use them
SELECT and ORDER BY only. Not in WHERE, GROUP BY, or HAVING - windows compute after grouping. To filter on one ("rows where ROW_NUMBER = 1"), wrap it in a subquery or CTE and filter the outer query. Interviewers test this constantly.

Indexes

Default mental model: a B-tree index is a sorted copy of the indexed columns with pointers back to the rows.

What they speed up
Equality and range lookups on the indexed column (WHERE, JOIN ON), ORDER BY that matches the index order, and sometimes GROUP BY. Without one, every lookup is a full table scan.
What they cost
Every INSERT / UPDATE / DELETE must also update each index - write amplification - plus disk space. A write-heavy table with ten indexes is a design smell. Index what you query, not everything.
Composite indexes - leftmost prefix
An index on (a, b, c) serves queries filtering on (a), (a, b), or (a, b, c) - not (b) or (b, c) alone. Order columns by equality filters first, then the range / sort column. This is the most-asked index question.
Covering index
If the index contains every column the query needs, the engine answers from the index alone and never touches the table (index-only scan). Postgres: INCLUDE (col) adds payload columns without affecting key order.
When the index gets skipped
Function or cast on the column (WHERE LOWER(email) = ... needs an expression index), leading-wildcard LIKE ('%foo'), low selectivity (status with 3 values - planner prefers a scan), or implicit type mismatch. EXPLAIN / EXPLAIN ANALYZE confirms what actually ran.
Primary key / unique
Both are backed by indexes automatically - don't add a duplicate index on the PK. Foreign key columns are NOT auto-indexed in Postgres; index them or pay on every join and parent delete.

Transactions and isolation levels

ACID = Atomicity (all or nothing), Consistency (constraints hold), Isolation (concurrent transactions don't trample), Durability (committed = survives crash). Isolation is the tunable one.

READ UNCOMMITTED
Can see other transactions' uncommitted writes (dirty reads). Effectively never the right answer; Postgres doesn't even implement it (treats it as READ COMMITTED).
READ COMMITTED
Only see committed data, but two reads in one transaction can disagree (non-repeatable read). Postgres default. Fine for most OLTP.
REPEATABLE READ
Rows you've read won't change under you for the rest of the transaction. MySQL/InnoDB default. Classic anomaly remaining: phantoms (new rows matching your WHERE appear) - though Postgres's snapshot implementation prevents phantoms too.
SERIALIZABLE
Result is as if transactions ran one at a time. Strongest, safest, slowest - expect serialization failures your app must retry. Use for invariants spanning multiple rows (no double-booking, balanced ledgers).
Anomaly cheat line
Dirty read (see uncommitted) -> blocked from READ COMMITTED up. Non-repeatable read (row changes between reads) -> blocked from REPEATABLE READ up. Phantom (new matching rows appear) -> blocked at SERIALIZABLE.
Locking vs MVCC
Modern engines use MVCC: readers see a snapshot, writers don't block readers. Explicit locks when you need them: SELECT ... FOR UPDATE locks rows you're about to modify (the standard fix for read-then-update races like seat booking).

Interview gotchas

  • ·NULL is not a value - it's "unknown." NULL = NULL evaluates to NULL (not true), so WHERE col = NULL matches nothing. Use IS NULL / IS DISTINCT FROM.
  • ·NOT IN with a NULL in the list returns zero rows, always - x NOT IN (1, 2, NULL) can never be true. This is why NOT EXISTS is the safe anti-join.
  • ·WHERE filters rows before grouping; HAVING filters groups after. Aggregate conditions go in HAVING, everything else belongs in WHERE.
  • ·EXISTS vs IN: equivalent on clean data; EXISTS short-circuits, handles NULLs sanely, and modern planners usually optimize both the same. Say "NOT EXISTS over NOT IN because of NULLs" and you've answered the real question.
  • ·UNION deduplicates (with a sort/hash cost); UNION ALL just concatenates. Default to UNION ALL unless you need dedup.
  • ·COUNT(col) silently drops NULL rows. If a count looks too low, this is why.
  • ·Integer division: 1/2 = 0 in Postgres and SQL Server. Multiply by 1.0 or cast before dividing for ratios.
  • ·BETWEEN is inclusive on both ends - wrong for timestamp ranges. Use >= start AND < end (half-open) for dates.
  • ·ORDER BY inside a subquery or view is not guaranteed to survive to the outer query. Order at the outermost level.
  • ·DELETE / UPDATE without WHERE hits every row, and most consoles won't stop you. Write the WHERE first, or wrap in a transaction and check with SELECT.

Query patterns

The three shapes that anchor most intermediate SQL screens. Templates are Postgres-flavored ANSI.

Top-N per group

When to use
"Highest-paid employee per department," "latest order per customer," "top 3 products per category." Any time GROUP BY can't work because you need whole rows, not aggregates.
Template
SELECT *
FROM (
  SELECT e.*,
         ROW_NUMBER() OVER (
           PARTITION BY department_id
           ORDER BY salary DESC
         ) AS rn
  FROM employees e
) ranked
WHERE rn <= 3;
-- ties matter? swap ROW_NUMBER for DENSE_RANK.
-- N = 1 in Postgres: SELECT DISTINCT ON (department_id) *
--   FROM employees ORDER BY department_id, salary DESC;
Complexity
One scan + a sort per partition; an index on (department_id, salary DESC) makes it near-free.
Examples
  • Department Highest Salary (LeetCode 184/185)
  • Latest event per user / per session
  • Most recent price per ticker

Dedup - keep one row per key

When to use
A table accumulated duplicates (no unique constraint, double ingestion) and you must keep exactly one row per logical key - usually the newest.
Template
DELETE FROM users
WHERE id IN (
  SELECT id
  FROM (
    SELECT id,
           ROW_NUMBER() OVER (
             PARTITION BY email
             ORDER BY created_at DESC
           ) AS rn
    FROM users
  ) d
  WHERE rn > 1
);
-- then add the missing constraint:
-- ALTER TABLE users ADD CONSTRAINT uq_email UNIQUE (email);
Complexity
One scan + window sort; the DELETE cost scales with duplicates removed.
Examples
  • Delete Duplicate Emails (LeetCode 196)
  • Idempotent re-ingestion cleanup
  • Pick the canonical row before adding a UNIQUE constraint

Gaps and islands

When to use
Find missing values in a sequence (gaps: skipped IDs, missed days) or group consecutive runs (islands: login streaks, continuous sensor uptime).
Template
-- Islands: consecutive-day streaks per user.
-- Trick: date minus ROW_NUMBER is constant within a run.
SELECT user_id,
       MIN(login_date) AS streak_start,
       MAX(login_date) AS streak_end,
       COUNT(*)        AS streak_len
FROM (
  SELECT user_id, login_date,
         login_date - CAST(ROW_NUMBER() OVER (
           PARTITION BY user_id ORDER BY login_date
         ) AS int) AS grp
  FROM logins
) t
GROUP BY user_id, grp;

-- Gaps: where does the next row jump?
SELECT id + 1 AS gap_start
FROM seq s
WHERE NOT EXISTS (SELECT 1 FROM seq WHERE id = s.id + 1);
Complexity
One window pass + GROUP BY. The row_number-difference trick is the expected answer; LAG-based variants also pass.
Examples
  • Active user streaks
  • Missing invoice numbers
  • Continuous uptime intervals from heartbeat pings

Other cheat sheets

Big-O Reference

Algorithms

Time and space complexity for the data structures, sorting algorithms, and search routines that show up in coding interviews. Skim the row, remember the row, defend the row in an interview.

Interview Patterns

Patterns

The recurring shapes - sliding window, two pointers, fast/slow, BFS/DFS, backtracking, DP, divide & conquer, binary search variants, union-find, topological sort. Each entry: when to reach for it, the template, complexity, and which classic problems use it.

Design Tradeoffs

Systems

The recurring forks in system design interviews. CAP, PACELC, sync vs async, push vs pull, SQL vs NoSQL, sharding shapes, consistency models, cache strategies, idempotency, and rate limiting. For each, the options and when to choose each.

Unix Essentials

Tools

Filesystem layout, the commands you actually use (find / grep / awk / sed / xargs), processes and signals, networking, permissions, basic shell scripting, and a vi survival kit.

Git Essentials

Tools

The everyday commands, every undo scenario mapped to its fix, rebase vs merge with a side to pick, interactive rebase, bisect, the reflog safety net, stash, and the flags worth aliasing.

Docker & K8s

Tools

The docker and kubectl commands you reach for daily, Dockerfile best practices, how layer caching actually works, the core k8s objects in one screen, requests vs limits, liveness vs readiness, and a step-by-step CrashLoopBackOff debug flow.

REST API Design

Systems

Method semantics and idempotency, the ~15 status codes that matter, resource naming rules, offset vs cursor pagination, versioning and auth tradeoffs, error body conventions, rate-limit headers, and the smells reviewers flag.

STAR Method

Patterns

The STAR structure with timing, what interviewers actually grade, eight question archetypes and how to frame each, the anti-patterns that sink answers (rambling, "we" instead of "I", no metrics), and a 30-second answer skeleton.

Practice the patterns

Reading is the floor. The signal in interviews comes from working problems out loud and defending your tradeoffs. Spin up an AI mock interview or run a coding challenge to put these to work.