Patterns

Concurrency Primitives Cheat Sheet

PatternsFREELast updated: June 2026 · By gitGood Editorial

A fast reference for concurrency primitives, synchronization tradeoffs, the memory model, and the classic bugs that show up in systems interviews and real code.

Synchronization primitives

The building blocks. Pick the weakest primitive that still guarantees correctness.

Mutex (lock): Mutual exclusion - exactly one holder of the critical section at a time. Has an owner; only the locking thread should unlock. Blocking; waiters sleep.
Recursive mutex: A mutex the same thread can lock multiple times (reference-counted). Needs an equal number of unlocks. Usually a design smell - prefer restructuring.
Semaphore: A counter guarding N units of a resource. acquire() (down/P) decrements and blocks at zero; release() (up/V) increments. No ownership - any thread can release. A binary semaphore (count 1) resembles a lock but lacks owner semantics.
Spinlock: A lock where waiters busy-loop instead of sleeping. Cheap when contention is rare and hold times are nanoseconds (no context switch); wastes CPU otherwise. Never spin on a single core or while holding a sleeping lock.
Condition variable: Lets a thread wait for a predicate to become true while atomically releasing an associated mutex. Paired with signal/notify. Always re-check the predicate in a while loop (spurious + stolen wakeups).
Monitor: A higher-level construct bundling a mutex with one or more condition variables so that only one thread executes monitor code at a time. Java's 'synchronized' + wait/notify is a monitor.
RWLock: Allows many concurrent readers or one exclusive writer. Faster for read-heavy workloads; watch for writer starvation under constant reads.
Barrier: Blocks a fixed set of threads until all have arrived, then releases them together. Used to phase parallel algorithms.
Atomic: A variable with indivisible read-modify-write operations (no lock needed). Foundation for lock-free code.

Atomics and compare-and-swap (CAS)

An atomic operation completes indivisibly - no other thread sees a half-finished state. The core hardware primitive is compare-and-swap: CAS(address, expected, new) atomically checks that the memory holds 'expected' and, only if so, writes 'new', returning whether it succeeded. Lock-free algorithms wrap CAS in a retry loop: read the current value, compute the new value, CAS it in, and if another thread changed it in the meantime the CAS fails and you loop again. This gives progress without blocking but can livelock under heavy contention and is vulnerable to the ABA problem (see gotchas). Fetch-and-add and test-and-set are related atomic primitives; counters and reference counts are usually built from atomic increment rather than a mutex.

Coffman deadlock conditions and how to break them

All four must hold simultaneously for deadlock. Eliminate any one to prevent it.

·Mutual exclusion - resources can be held exclusively. Break it by using shareable / lock-free resources where possible (e.g. immutable data, RWLocks for readers).
·Hold and wait - a thread holds one resource while waiting for another. Break it by acquiring all needed locks at once, or releasing held locks before requesting more.
·No preemption - a resource cannot be forcibly taken from its holder. Break it with lock timeouts / try-lock that backs off and retries, or by allowing rollback.
·Circular wait - a cycle of threads each waiting on the next. Break it by imposing a global lock-ordering and always acquiring locks in that order (the most common practical fix).

Race conditions vs data races

A data race is the narrow, language-defined bug: two threads access the same memory location concurrently, at least one access is a write, and there is no synchronization (no happens-before) ordering them. In C++, Go, Rust, and Java a data race on non-atomic memory is undefined or unspecified behavior, and race detectors (ThreadSanitizer, Go's -race) hunt exactly these. A race condition is the broader correctness bug: the result depends on the timing/interleaving of operations, even when every individual access is properly synchronized. Classic example: a check-then-act like 'if (!exists) create()' where two threads both pass the check before either creates - each access can be atomic yet the outcome is still wrong. Fixing data races (add locks/atomics) does not automatically fix race conditions (you must make the whole logical operation atomic).

Memory model basics

On modern CPUs and optimizing compilers, memory operations do not necessarily execute in source order. Both the compiler and the hardware may reorder, cache, or coalesce reads and writes as long as a single thread's observable behavior is preserved - but other threads can observe these reorderings. The memory model is the contract that says which reorderings are allowed and what guarantees you get back. The key relation is happens-before: if operation A happens-before B, then B is guaranteed to see A's effects. You establish happens-before with synchronization - acquiring/releasing a lock, or atomic operations with the right ordering. A memory barrier (fence) is the low-level instruction that forbids reordering across it: an acquire fence stops later reads from moving up, a release fence stops earlier writes from moving down, and a full barrier blocks both. 'volatile' differs by language: in Java/C# it implies ordering + visibility (a real memory-model concept), but in C/C++ 'volatile' only prevents the compiler from optimizing away the access and gives NO thread-safety - use std::atomic instead. Sequential consistency (the strongest, most intuitive ordering where everything appears interleaved in one global order) is the default for atomics in most languages but is also the slowest; acquire/release is the common cheaper alternative.

Execution models

How you get concurrency at all shapes everything above it.

Threads vs async/event-loop vs processes

Threads (shared memory)

Multiple threads in one address space, scheduled preemptively by the OS. True parallelism on multiple cores (outside a GIL). Cheap data sharing but every shared mutation needs synchronization - this is where the bugs live. Each thread costs a stack (often ~1 MB).

Async / single-threaded event loop

One thread cooperatively multiplexes many tasks via non-blocking I/O and an event loop (Node.js, Python asyncio, Nginx). No data races within the loop because there is no preemption between awaits, so less locking. But a single blocking/CPU-bound call stalls everything, and it does not use multiple cores by itself.

Processes (isolated memory)

Separate address spaces, no shared memory by default - communicate via IPC, pipes, or message passing. Strong fault and security isolation (one crash does not corrupt another) and no data races, at the cost of higher memory use and slower, explicit communication.

When to choose each

Async for high-concurrency I/O-bound work (web servers, proxies). Threads for shared-state parallelism and CPU work when sharing is cheap. Processes for isolation, fault tolerance, or to sidestep a GIL - often combined (a pool of processes each running an event loop).

Locking vs lock-free

Lock-based

Use mutexes/RWLocks to serialize access. Simple to reason about and compose; correctness is local. But locks can deadlock, cause priority inversion, and serialize threads under contention; a thread killed while holding a lock can wedge the system.

Lock-free / CAS-based

Use atomics and CAS retry loops so at least one thread always makes progress regardless of others. No deadlock and great scalability under contention, but very hard to write correctly, subject to the ABA problem, and can livelock. 'Wait-free' is the strongest variant where every thread finishes in bounded steps.

When to choose each

Default to locks for clarity and correctness. Reach for lock-free only for proven hot paths (counters, queues, ring buffers) where contention is measured and the data structure is small - and prefer a battle-tested library over hand-rolling it.

Classic coordination patterns

Producer-consumer: Producers push work onto a bounded queue, consumers pull from it; the queue decouples their rates. Use a mutex + two condition variables (not-full, not-empty), or a thread-safe blocking queue. The bound provides natural backpressure when consumers fall behind.
Reader-writer: Many readers may share access but a writer needs exclusivity. Use an RWLock. Decide a policy up front: reader-preference (writers can starve), writer-preference (readers can starve), or fair/FIFO.
Thread pool: A fixed set of worker threads pulling tasks off a shared queue - amortizes thread creation cost and bounds parallelism. The work queue is itself a producer-consumer problem.
Future / promise: A handle to a value that will be produced later by another thread; the consumer blocks or awaits on it. Composes async results without raw locks.

Common concurrency bugs

Names interviewers expect you to distinguish.

·Deadlock - two or more threads each wait forever on a resource the other holds. No thread makes progress (see the four Coffman conditions).
·Livelock - threads are active and keep changing state in response to each other but make no useful progress (two people repeatedly stepping aside in a hallway). Often from naive retry/back-off without randomization.
·Starvation - a thread is perpetually denied a resource it needs because others keep being favored (e.g. low-priority threads under greedy high-priority ones, or readers starving a writer).
·Priority inversion - a high-priority thread is blocked on a lock held by a low-priority thread, which a medium-priority thread keeps preempting. Fix with priority inheritance (the holder temporarily inherits the waiter's priority). Famously hit the Mars Pathfinder.
·ABA problem - a CAS sees a value change from A to B and back to A and wrongly concludes nothing happened, so it succeeds on stale assumptions (common with reused pointers in lock-free stacks). Fix with a tagged/versioned pointer (counter bumped on every change) or hazard pointers.
·Lost wakeup / missed signal - a notify fires before the waiter starts waiting, so the signal is lost. Avoid by checking the predicate under the lock and waiting in a while loop.
·Torn read/write - a multi-word value read or written non-atomically is observed half-updated. Use atomics or lock the whole value.
·Double-checked locking done wrong - lazy init that reads a flag without proper memory ordering can publish a partially-constructed object. Use a properly ordered atomic or a language-provided lazy/once primitive.

How B-tree indexes actually work, composite index column order, covering indexes, reading EXPLAIN ANALYZE, why the planner ignores your index, join algorithms, N+1, keyset pagination, and the 'why is this query slow' scenarios interviews are built on.

Practice the patterns

Reading is the floor. The signal in interviews comes from working problems out loud and defending your tradeoffs. Spin up an AI mock interview or run a coding challenge to put these to work.

Coding challenges AI mock interview