Linux Performance & Debugging Cheat Sheet

Last updated: June 2026 · By gitGood Editorial

The USE method, a first-five-minutes triage runbook, and the CPU, memory, disk, network, and tracing commands you reach for when a Linux box is misbehaving.

The USE method

Brendan Gregg's USE method gives you a systematic way to find bottlenecks instead of guessing. For every resource - CPU, memory, disk, network interfaces - check three things. Utilization: the percentage of time the resource was busy. Saturation: the degree of queued or waiting work it could not service immediately (run-queue length, swap activity, I/O wait). Errors: the count of error events (dropped packets, failed I/O, ECC errors). Walk each resource through U, S, and E and the bottleneck usually falls out. High utilization alone is not a problem; high saturation almost always is.

First 5 minutes triage

When you ssh into a sick box, run a fast top-down sweep before diving deep. Start with 'uptime' for load averages, then 'dmesg | tail' for recent kernel messages (OOM kills, disk errors, segfaults). 'vmstat 1' shows run queue, swap, and CPU split over time; 'mpstat -P ALL 1' shows whether load is on one core or spread out; 'pidstat 1' attributes CPU to processes. Then 'iostat -xz 1' for disk pressure, 'free -m' for memory headroom, and 'ss -s' plus 'sar -n DEV 1' for network. The point is breadth first - confirm which resource is the problem, then drill in with the focused tools below.
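The sweep above can be scripted so it runs the same way every time. The following is a minimal sketch, not a definitive runbook: the names `TRIAGE_COMMANDS`, `binary_of`, `split_installed`, and `run_sweep` are hypothetical, and the trailing `1 5` (interval, count) on the samplers is an assumption added so they exit after five reports instead of running forever.

```python
import shutil
import subprocess

# Commands from the triage paragraph, in the same breadth-first order.
# Assumption: "1 5" (interval, count) is appended so samplers terminate.
TRIAGE_COMMANDS = [
    "uptime",
    "dmesg | tail",
    "vmstat 1 5",
    "mpstat -P ALL 1 5",
    "pidstat 1 5",
    "iostat -xz 1 5",
    "free -m",
    "ss -s",
    "sar -n DEV 1 5",
]


def binary_of(command):
    """Return the first word of a command line, i.e. the program to look up."""
    return command.split()[0]


def split_installed(commands, which=shutil.which):
    """Partition commands into (runnable, missing) based on a PATH lookup."""
    runnable, missing = [], []
    for command in commands:
        if which(binary_of(command)):
            runnable.append(command)
        else:
            missing.append(command)
    return runnable, missing


def run_sweep(commands=TRIAGE_COMMANDS, timeout=30):
    """Run each installed command in order and print its output."""
    runnable, missing = split_installed(commands)
    for command in runnable:
        print(f"=== {command} ===")
        try:
            # shell=True is needed for the pipe in "dmesg | tail".
            done = subprocess.run(command, shell=True, capture_output=True,
                                  text=True, timeout=timeout)
            print(done.stdout or done.stderr)
        except subprocess.TimeoutExpired:
            print("(timed out)")
    for command in missing:
        print(f"=== {command} === (not installed)")
```

Calling `run_sweep()` on the box prints each section in order. mpstat, pidstat, iostat, and sar come from the sysstat package, so they may show up as not installed, and 'dmesg' can require root on systems that restrict kernel log access.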

Performance and debugging commands

Grouped roughly by resource. The sampling tools (mpstat, pidstat, vmstat, iostat) take a trailing interval argument such as '1' to print a fresh report every second.

Command	What it does	Common flags	Example
top	Live view of processes by CPU, plus load average and memory summary.	-o for sort field, -H for threads, press '1' to expand per-core CPU	`top -o %MEM`
htop	Interactive top with per-core bars, tree view, and easy signal sending.	-u user, -p pid, F5 tree, F6 sort	`htop -u www-data`
mpstat	Per-CPU utilization over time - reveals a single saturated core.	-P ALL for all cores, trailing interval	`mpstat -P ALL 1`
pidstat	Per-process CPU, memory, and I/O sampled over an interval.	-u CPU, -r memory, -d disk, -t threads	`pidstat -d 1`
vmstat	Run queue, swap in/out, blocks in/out, and CPU breakdown over time.	trailing interval, -s for a totals summary	`vmstat 1`
free	Memory and swap usage - watch available, not just free.	-m or -h for human units, -s for repeat interval	`free -h`
iostat	Per-device disk throughput, IOPS, latency (the await columns), queue depth (aqu-sz), and busy time (%util).	-x extended, -z hide idle, -m for MB/s	`iostat -xz 1`
iotop	Live per-process disk I/O - finds which process is hammering the disk.	-o only active, -a accumulated, needs root	`sudo iotop -o`
df / du	df shows filesystem free space; du measures actual directory sizes.	df -h, df -i for inodes; du -sh, du -h --max-depth=1	`du -sh /var/log/*`
ss	Socket statistics - connections, listening ports, states. Faster modern netstat.	-tulpn listening TCP/UDP with pids, -s summary	`ss -tulpn`
tcpdump	Captures and inspects packets on the wire for protocol-level debugging.	-i iface, -n no DNS, -w write pcap, port/host filters, usually needs root	`sudo tcpdump -ni eth0 port 443`
iftop	Live bandwidth usage broken out by connection pair.	-i iface, -n no DNS resolution, needs root	`sudo iftop -i eth0`
strace	Traces system calls and signals - shows what a process asks the kernel for.	-p attach to pid, -f follow forks, -c count summary, -e filter	`strace -fp 1234`
ltrace	Traces library (function) calls rather than syscalls.	-p attach, -c summary, -f follow	`ltrace -c ./app`
lsof	Lists open files, sockets, and the processes holding them.	-p pid, -i :port, +D dir, -u user	`lsof -i :8080`
perf	CPU profiler and tracer - sample stacks to find hot code paths.	top live profile, record + report, stat for counters	`perf top`

Reading the load average

The three load-average numbers (from 'uptime' or 'top') are the 1-, 5-, and 15-minute exponentially weighted averages of the number of processes running plus waiting to run - and, on Linux specifically, processes in uninterruptible sleep (usually blocked on disk I/O, including network filesystems such as NFS). Compare against your core count: a load of 4.0 is fully busy on a 4-core box and badly oversubscribed on a 1-core box. The trend across the three numbers matters - a 1-minute value much higher than the 15-minute value means load is rising right now; the reverse means it is recovering. Because Linux counts uninterruptible I/O waits in load, a high load with idle CPUs points at storage, not compute.
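The core-count comparison and the trend check can be sketched in a few lines. This is an illustration under stated assumptions, not a standard tool: `parse_loadavg` and `load_summary` are hypothetical names, the input is the text of '/proc/loadavg', and "rising" simply means the 1-minute value exceeds the 15-minute value.

```python
def parse_loadavg(text):
    """Parse the first three fields of /proc/loadavg into floats."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def load_summary(text, cores):
    """Normalize the 1-minute load by core count and label the trend."""
    one, _five, fifteen = parse_loadavg(text)
    if one > fifteen:
        trend = "rising"
    elif one < fifteen:
        trend = "falling"
    else:
        trend = "steady"
    # per_core near 1.0 means every core is busy; well above 1.0 means queuing.
    return {"per_core": round(one / cores, 2), "trend": trend}


# Sample text in the /proc/loadavg layout: three averages, then
# runnable/total scheduling entities and the most recent PID.
print(load_summary("4.00 2.50 1.00 3/512 12345", cores=4))
```

On a live system, pass `open("/proc/loadavg").read()` and `os.cpu_count()` instead of the sample values.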

Memory pressure and the OOM killer

Where to look when a box is running out of memory.

free / available: In 'free -h', look at the 'available' column, not 'free' - the kernel uses spare RAM for page cache that it can reclaim quickly when applications need it, so low 'free' is normal and healthy.
vmstat si / so: Nonzero 'si' (swap in) and 'so' (swap out) columns mean the system is actively swapping - a strong saturation signal that hurts latency.
OOM killer: When memory is exhausted and unreclaimable, the kernel's out-of-memory killer terminates a process to free RAM, choosing by an oom_score heuristic. It logs to dmesg and the kernel log - grep for 'Out of memory' or 'oom-kill'.
/proc/meminfo: Authoritative memory breakdown - MemTotal, MemAvailable, Cached, Buffers, SwapTotal/SwapFree, and Slab. Where 'free' gets its numbers.
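As a rough sketch of the "watch available, not free" advice, the snippet below parses text in the '/proc/meminfo' layout and reports available memory as a share of the total, plus swap in use. The function names and the sample numbers are made up for illustration.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its integer value (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip blank or malformed lines
        values[key.strip()] = int(rest.split()[0])
    return values


def available_percent(meminfo):
    """Share of total RAM the kernel estimates is available to applications."""
    return 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]


def swap_used_kb(meminfo):
    """Swap currently in use, in kB."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]


# Invented sample: MemFree is low, yet half of RAM is still available.
SAMPLE = """
MemTotal:       16000000 kB
MemFree:          400000 kB
MemAvailable:    8000000 kB
Cached:          7000000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB
"""

info = parse_meminfo(SAMPLE)
print(f"available: {available_percent(info):.1f}%  swap used: {swap_used_kb(info)} kB")
```

On a real box, replace `SAMPLE` with `open("/proc/meminfo").read()`.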

Reading /proc

The /proc pseudo-filesystem is the kernel's live window into the system, exposed as files. System-wide: '/proc/loadavg' (load), '/proc/stat' (CPU and boot counters), '/proc/meminfo' (memory), '/proc/cpuinfo' (cores and flags), '/proc/mounts', and '/proc/net/' (per-protocol socket tables). Per process under '/proc/<pid>/': 'status' and 'stat' for state and resource use, 'fd/' for open file descriptors, 'maps' for the memory layout, 'cmdline' for the launch arguments, 'environ' for the environment, and 'limits' for ulimits. Many standard tools are just friendly readers over these files.
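To illustrate how thin those readers can be, here is a minimal sketch that parses two of the per-process files. It assumes the usual layouts: 'status' is a list of "Key:<tab>value" lines, and 'cmdline' is the argument vector with NUL bytes between arguments. `parse_status` and `parse_cmdline` are hypothetical helper names.

```python
def parse_status(text):
    """Turn /proc/<pid>/status into a dict of field name -> value string."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def parse_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into arguments."""
    if not raw:
        return []  # e.g. kernel threads expose an empty cmdline
    parts = raw.rstrip(b"\0").split(b"\0")
    return [part.decode(errors="replace") for part in parts]


# Invented sample data in the same layout as the real files.
status = parse_status("Name:\tnginx\nState:\tS (sleeping)\nVmRSS:\t    5120 kB\n")
print(status["Name"], status["State"], status["VmRSS"])
print(parse_cmdline(b"nginx\0-g\0daemon off;\0"))
```

For a live process, read the files directly, for example `open("/proc/self/status").read()` and `open("/proc/self/cmdline", "rb").read()`.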

Triage order and habits

· Go top-down: confirm which resource is saturated (CPU, memory, disk, network) before reaching for deep tools like strace or perf.
· Check 'dmesg | tail' early - OOM kills, disk read errors, and segfaults show up there and instantly explain a lot of symptoms.
· Distinguish utilization from saturation - 100% CPU utilization is fine if the run queue is short; a growing run queue or rising I/O wait is the real warning.
· On Linux, high load average with idle CPUs means processes are blocked in uninterruptible I/O sleep - look at disks and network filesystems, not compute.
· strace and ltrace add real overhead and can slow a hot process - attach with '-p' briefly, use '-c' for a summary, and detach.
· When the disk looks full, check inodes too ('df -i') - you can exhaust inodes with many tiny files while bytes-free still looks fine.
· Sample over an interval (the trailing '1') rather than trusting a single snapshot - instantaneous values lie, and the first report from vmstat and iostat shows averages since boot, so read from the second report onward.
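The last habit, sampling over an interval, is easiest to see with the CPU counters in '/proc/stat': they only ever increase, so a single read gives the average since boot, and utilization for a window is the difference between two reads. The sketch below assumes the standard field order on the aggregate 'cpu' line (user, nice, system, idle, iowait, irq, softirq, steal); the function names are hypothetical.

```python
def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat as a list of ints."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(value) for value in parts[1:]]
    raise ValueError("no aggregate cpu line found")


def cpu_busy_percent(before, after):
    """Percent of CPU time spent non-idle between two /proc/stat snapshots."""
    # Use the first eight fields only: guest time is already counted in user.
    start = parse_cpu_line(before)[:8]
    end = parse_cpu_line(after)[:8]
    deltas = [e - s for s, e in zip(start, end)]
    total = sum(deltas)
    if total <= 0:
        return 0.0  # identical snapshots: nothing to measure
    idle = deltas[3] + deltas[4]  # idle + iowait
    return 100.0 * (total - idle) / total


# Invented snapshots taken some interval apart.
BEFORE = "cpu  100 0 50 800 50 0 0 0 0 0\n"
AFTER = "cpu  160 0 70 900 70 0 0 0 0 0\n"
print(f"{cpu_busy_percent(BEFORE, AFTER):.1f}% busy over the interval")
```

On a live system, read '/proc/stat' once, `time.sleep(1)`, read it again, and pass both strings in. Treating iowait as idle is a design choice here: it keeps the figure a measure of compute utilization, leaving I/O stalls to show up as saturation elsewhere.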
