Linux Performance & Debugging Cheat Sheet
The USE method, a first-five-minutes triage runbook, and the CPU, memory, disk, network, and tracing commands you reach for when a Linux box is misbehaving.
The USE method
Brendan Gregg's USE method gives you a systematic way to find bottlenecks instead of guessing. For every resource - CPU, memory, disk, network interfaces - check three things. Utilization: the percentage of time the resource was busy. Saturation: the degree of queued or waiting work it could not service immediately (run-queue length, swap activity, I/O wait). Errors: the count of error events (dropped packets, failed I/O, ECC errors). Walk each resource through U, S, and E and the bottleneck usually falls out. High utilization alone is not a problem; high saturation almost always is.
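To make the checklist concrete, here is a minimal Python sketch. The names `UseSample` and `use_findings` are hypothetical. The 90% "busy" threshold is an illustrative assumption and is not part of the USE method. You supply the numbers from tools such as vmstat, iostat, or sar. The function only encodes the rule above: saturation and errors outrank utilization.

```python
from dataclasses import dataclass


@dataclass
class UseSample:
    """One U/S/E reading for a single resource; the caller supplies the values."""
    resource: str
    utilization_pct: float  # percent of the interval the resource was busy (0-100)
    saturation: float       # queued/waiting work, e.g. run-queue excess or swap pages/s
    errors: int             # error events seen in the interval


def use_findings(samples):
    """Return findings, flagging errors and saturation before plain utilization.

    Hypothetical helper: the 90% busy threshold is illustrative only.
    """
    findings = []
    for s in samples:
        if s.errors > 0:
            findings.append(f"{s.resource}: {s.errors} errors")
        if s.saturation > 0:
            findings.append(f"{s.resource}: saturated ({s.saturation:g} waiting)")
        elif s.utilization_pct >= 90:
            # Busy but nothing queued: worth noting, rarely the bottleneck on its own.
            findings.append(f"{s.resource}: busy ({s.utilization_pct:g}%) but not saturated")
    return findings
```

For example, `use_findings([UseSample("cpu", 95.0, 0, 0), UseSample("disk sda", 60.0, 3.5, 0)])` returns `['cpu: busy (95%) but not saturated', 'disk sda: saturated (3.5 waiting)']`. The disk is the resource to chase, even though the CPU is busier.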
First 5 minutes triage
When you ssh into a sick box, run a fast top-down sweep before diving deep. Start with 'uptime' for load averages, then 'dmesg | tail' for recent kernel messages (OOM kills, disk errors, segfaults). 'vmstat 1' shows run queue, swap, and CPU split over time; 'mpstat -P ALL 1' shows whether load is on one core or spread out; 'pidstat 1' attributes CPU to processes. Then 'iostat -xz 1' for disk pressure, 'free -m' for memory headroom, and 'ss -s' plus 'sar -n DEV 1' for network. The point is breadth first - confirm which resource is the problem, then drill in with the focused tools below.
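The same sweep can be scripted, sketched here in Python on the assumption that the sysstat tools (mpstat, pidstat, iostat, sar) are installed. `TRIAGE` and `run_triage` are hypothetical names. The sample count of 5 appended to each interval command is an assumption that makes every tool exit on its own instead of streaming. On systems that restrict the kernel log, `dmesg` may need root.

```python
import shutil
import subprocess

# The triage sweep in order. Interval tools get a count (the trailing 5) so they
# stop after five reports; that count is an assumption added for scripting.
TRIAGE = [
    ["uptime"],
    ["dmesg"],  # trimmed to the last lines below, like 'dmesg | tail'
    ["vmstat", "1", "5"],
    ["mpstat", "-P", "ALL", "1", "5"],
    ["pidstat", "1", "5"],
    ["iostat", "-xz", "1", "5"],
    ["free", "-m"],
    ["ss", "-s"],
    ["sar", "-n", "DEV", "1", "5"],
]


def run_triage(dry_run=False, tail_lines=10):
    """Run each triage command and return (command string, output) pairs.

    Missing tools are reported instead of raising; dry_run only lists commands.
    """
    results = []
    for argv in TRIAGE:
        cmd = " ".join(argv)
        if dry_run:
            results.append((cmd, ""))
            continue
        if shutil.which(argv[0]) is None:
            results.append((cmd, f"{argv[0]}: not installed"))
            continue
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=30)
        out = proc.stdout or proc.stderr
        if argv[0] == "dmesg":
            # Keep only the most recent kernel messages.
            out = "\n".join(out.splitlines()[-tail_lines:])
        results.append((cmd, out))
    return results
```

Calling `run_triage()` executes the sweep and returns each command with its output. With five one-second samples per interval tool, it takes on the order of 20 seconds. `run_triage(dry_run=True)` only lists the commands.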
Performance and debugging commands
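Grouped roughly by resource. The sampling tools (mpstat, pidstat, vmstat, iostat, sar) take a trailing interval argument such as '1' to print a fresh report every second. Add a second number (e.g. 'vmstat 1 5') to stop after that many reports. Interactive tools like top, htop, and iotop refresh on their own.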
| Command | What it does | Common flags | Example |
|---|---|---|---|
| top | Live view of processes by CPU, plus load average and memory summary. | -o for sort field, -H for threads, press '1' to expand per-core CPU | top -o %MEM |
| htop | Interactive top with per-core bars, tree view, and easy signal sending. | -u user, -p pid, F5 tree, F6 sort | htop -u www-data |
| mpstat | Per-CPU utilization over time - reveals a single saturated core. | -P ALL for all cores, trailing interval | mpstat -P ALL 1 |
| pidstat | Per-process CPU, memory, and I/O sampled over an interval. | -u CPU, -r memory, -d disk, -t threads | pidstat -d 1 |
| vmstat | Run queue, swap in/out, blocks in/out, and CPU breakdown over time. | trailing interval, -s for a totals summary | vmstat 1 |
| free | Memory and swap usage - watch the available column, not just free. | -m for MiB, -h for human-readable units, -s N to repeat every N seconds | free -h |
| iostat | Per-device disk throughput, IOPS, latency (await), average queue size (aqu-sz), and busy time (%util). | -x extended, -z hide idle devices, -m for MB/s | iostat -xz 1 |
| iotop | Live per-process disk I/O - finds which process is hammering the disk. | -o only active, -a accumulated, needs root | sudo iotop -o |
| df / du | df shows filesystem free space; du measures actual directory sizes. | df -h, df -i for inodes; du -sh, du -h --max-depth=1 | du -sh /var/log/* |
| ss | Socket statistics - connections, listening ports, states. A faster, modern replacement for netstat. | -tulpn listening TCP/UDP with pids, -s summary | ss -tulpn |
| tcpdump | Captures and inspects packets on the wire for protocol-level debugging. | -i iface, -n no DNS, -w write pcap, port/host filters, needs root | sudo tcpdump -ni eth0 port 443 |
| iftop | Live bandwidth usage broken out by connection pair. | -i iface, -n no DNS resolution, needs root | sudo iftop -i eth0 |
| strace | Traces system calls and signals - shows what a process asks the kernel for. | -p attach to pid, -f follow forks, -c count summary, -e filter | strace -fp 1234 |
| ltrace | Traces library (function) calls rather than syscalls. | -p attach, -c summary, -f follow | ltrace -c ./app |
| lsof | Lists open files, sockets, and the processes holding them. | -p pid, -i :port, +D dir, -u user | lsof -i :8080 |
| perf | CPU profiler and tracer - samples stacks to find hot code paths. | subcommands: top for a live profile, record + report, stat for counters; -g captures call graphs | perf top |
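Python's `os.statvfs` returns the same kind of filesystem statistics that df reads, so it can approximate the df row, including the 'df -i' inode view. `fs_usage` is a hypothetical helper, and its numbers will not match df's rounded Use% column exactly.

```python
import os


def fs_usage(path="/"):
    """Return byte and inode usage for the filesystem that holds `path`."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    avail = st.f_bavail * st.f_frsize  # space available to unprivileged users
    inodes_total = st.f_files
    inodes_free = st.f_ffree
    # Some filesystems report no fixed inode count (f_files == 0); treat as 0% used.
    inode_pct = 100.0 * (inodes_total - inodes_free) / inodes_total if inodes_total else 0.0
    return {
        "bytes_total": total,
        "bytes_used": used,
        "bytes_avail": avail,
        "inodes_total": inodes_total,
        "inodes_used_pct": round(inode_pct, 1),
    }
```

A filesystem whose `inodes_used_pct` is near 100 while `bytes_avail` is still large is the inode-exhaustion case that 'df -i' exists to catch.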
Reading the load average
The three load-average numbers (from 'uptime' or 'top') are the 1-, 5-, and 15-minute exponentially damped moving averages of the number of tasks running or waiting to run. On Linux specifically, they also count tasks in uninterruptible sleep (D state), usually blocked on disk I/O and sometimes on a network filesystem such as NFS or on certain kernel locks. Compare against your core count: a load of 4.0 is fully busy on a 4-core box and badly oversubscribed on a 1-core box. The trend across the three numbers matters. A 1-minute value much higher than the 15-minute value means load is rising right now; the reverse means it is recovering. Because Linux counts uninterruptible sleep in load, a high load with idle CPUs points at disk or network-filesystem I/O, not compute.
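Here is a small Python sketch of that comparison. It parses '/proc/loadavg', divides the 1-minute value by the core count, and compares the 1-minute and 15-minute values to label the trend. The function names are hypothetical, and so is the 1.5x ratio used to call a trend "rising" or "recovering"; it is an illustrative threshold.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute averages from /proc/loadavg text."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def load_report(text, cores, ratio=1.5):
    """Return (1-minute load per core, trend label).

    `ratio` is illustrative: the 1-minute value must differ from the
    15-minute value by at least this factor to count as a trend.
    """
    one, _five, fifteen = parse_loadavg(text)
    if one > fifteen * ratio:
        trend = "rising"
    elif one * ratio < fifteen:
        trend = "recovering"
    else:
        trend = "steady"
    return round(one / cores, 2), trend


def current_load_report():
    """Read the live values on a Linux box."""
    with open("/proc/loadavg") as f:
        return load_report(f.read(), os.cpu_count() or 1)
```

A per-core value near 1.0 means every core has work. Well above 1.0 means tasks are queueing or, on Linux, stuck in uninterruptible sleep.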
Memory pressure and the OOM killer
Where to look when a box is running out of memory.
- free / available
  - In 'free -h', look at the 'available' column, not 'free'. The kernel uses spare RAM for page cache, and clean cache can be reclaimed on demand, so low 'free' is normal and healthy.
- vmstat si / so
  - Nonzero 'si' (swap in) and 'so' (swap out) columns mean the system is actively swapping - a strong saturation signal that hurts latency.
- OOM killer
  - When memory is exhausted and cannot be reclaimed, the kernel's out-of-memory killer terminates a process to free RAM. It chooses the victim by oom_score, which can be adjusted per process through /proc/<pid>/oom_score_adj. The kill is logged to the kernel ring buffer, so check dmesg or the kernel log and grep for 'Out of memory' or 'oom-kill'.
- /proc/meminfo
  - The authoritative memory breakdown - MemTotal, MemAvailable, Cached, Buffers, SwapTotal/SwapFree, and Slab. This is where 'free' gets its numbers.
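These checks can be scripted too. This sketch parses '/proc/meminfo' text into a dict, computes MemAvailable as a share of MemTotal, and filters kernel log lines for the two OOM strings suggested above. All function names are hypothetical. Feed it the contents of '/proc/meminfo' and the lines from dmesg or 'journalctl -k'.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}.

    Most values are in kB; the HugePages_* fields are plain page counts.
    """
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def available_pct(info):
    """MemAvailable as a percentage of MemTotal (what 'free' shows as available)."""
    return round(100.0 * info["MemAvailable"] / info["MemTotal"], 1)


def oom_events(kernel_log_lines):
    """Keep kernel log lines that mention the OOM killer."""
    return [line for line in kernel_log_lines
            if "Out of memory" in line or "oom-kill" in line]
```

A low `available_pct` combined with nonzero si/so in vmstat is the real pressure signal. A low MemFree on its own is not.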
Reading /proc
The /proc pseudo-filesystem is the kernel's live window into the system, exposed as files. System-wide: '/proc/loadavg' (load), '/proc/stat' (CPU and boot counters), '/proc/meminfo' (memory), '/proc/cpuinfo' (cores and flags), '/proc/mounts', and '/proc/net/' (per-protocol socket tables). Per process under '/proc/<pid>/': 'status' and 'stat' for state and resource use, 'fd/' for open file descriptors, 'maps' for the memory layout, 'cmdline' for the launch arguments, 'environ' for the environment, and 'limits' for ulimits. Many standard tools are just friendly readers over these files.
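These entries are ordinary files. The following sketch reads three of the per-process ones listed above: 'cmdline', the State line of 'status', and the entries in 'fd/'. `process_summary` is a hypothetical helper. Its `proc_root` parameter exists only so it can be pointed at a test directory; on a real system, leave it as '/proc'. Reading another user's 'fd/' generally requires root.

```python
import os


def process_summary(pid, proc_root="/proc"):
    """Read argv, scheduler state, and open-fd count for one process."""
    base = os.path.join(proc_root, str(pid))
    with open(os.path.join(base, "cmdline"), "rb") as f:
        # Arguments are NUL-separated; empty pieces are dropped for simplicity.
        argv = [a.decode(errors="replace") for a in f.read().split(b"\x00") if a]
    state = None
    with open(os.path.join(base, "status")) as f:
        for line in f:
            if line.startswith("State:"):
                state = line.split(":", 1)[1].strip()  # e.g. "S (sleeping)"
                break
    # Each entry in fd/ is one open file descriptor (a symlink on a live system).
    open_fds = len(os.listdir(os.path.join(base, "fd")))
    return {"argv": argv, "state": state, "open_fds": open_fds}
```

On a live Linux box, `process_summary(os.getpid())` describes the running interpreter. A 'D (disk sleep)' state in the result is the uninterruptible sleep that inflates the load average.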
Triage order and habits
- Go top-down: confirm which resource is saturated (CPU, memory, disk, network) before reaching for deep tools like strace or perf.
- Check 'dmesg | tail' early - OOM kills, disk read errors, and segfaults show up there and instantly explain a lot of symptoms.
- Distinguish utilization from saturation - 100% CPU utilization is fine if the run queue is short; a growing run queue or rising I/O wait is the real warning.
- On Linux, a high load average with idle CPUs means tasks are stuck in uninterruptible sleep, most often waiting on disk I/O or a network filesystem such as NFS - look at storage, not compute.
- strace and ltrace add real overhead and can slow a hot process - attach with '-p' briefly, use '-c' for a summary, and detach.
- If writes fail with 'No space left on device' while 'df -h' still shows free space, check inodes ('df -i'). Many tiny files can exhaust inodes long before the bytes run out.
- Sample over an interval (the trailing '1') rather than trusting a single snapshot - instantaneous values lie. Skip the first report from vmstat and iostat, which shows averages since boot.
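The last habit in practice: CPU utilization is a rate, so it must be computed from two snapshots of the cumulative counters in '/proc/stat'. This sketch uses hypothetical function names. It counts iowait as idle time, a common convention because the CPU executes no work during iowait; tools differ on this detail.

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat as ints (clock ticks)."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(x) for x in line.split()[1:]]
    raise ValueError("no aggregate cpu line found")


def cpu_busy_pct(before, after):
    """Busy percentage between two counter snapshots.

    Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    guest and guest_nice are already included in user and nice, so only the
    first eight fields are summed. Idle time is idle + iowait.
    """
    b, a = before[:8], after[:8]
    total = sum(a) - sum(b)
    idle = (a[3] + a[4]) - (b[3] + b[4])
    return round(100.0 * (total - idle) / total, 1) if total else 0.0


def sample_cpu(interval=1.0):
    """Take two snapshots `interval` seconds apart, like the trailing '1' in vmstat 1."""
    with open("/proc/stat") as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_cpu_line(f.read())
    return cpu_busy_pct(before, after)
```

A single read of '/proc/stat' gives only totals since boot. That is the same reason the first line of vmstat and iostat output is misleading.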