gitGood.dev

Linux Performance & Debugging Cheat Sheet

Tools · Free · Last updated: June 2026 · By gitGood Editorial

The USE method, a first-five-minutes triage runbook, and the CPU, memory, disk, network, and tracing commands you reach for when a Linux box is misbehaving.

The USE method

Brendan Gregg's USE method gives you a systematic way to find bottlenecks instead of guessing. For every resource (CPU, memory, disk, network interfaces), check three things. Utilization: the percentage of time the resource was busy servicing work. Saturation: the degree to which extra work is queued or waiting because the resource cannot service it immediately (run-queue length, swap activity, disk queue depth). Errors: the count of error events (dropped packets, failed I/O, ECC errors). Walk each resource through U, S, and E and the bottleneck usually falls out. High utilization alone is not necessarily a problem; sustained saturation almost always is.
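The following minimal Python sketch shows the walk, for illustration only. The ResourceCheck record, the use_findings helper, and the 90% utilization threshold are hypothetical names and values. On a real box the numbers would come from tools such as vmstat, iostat, and sar. Errors and saturation are flagged whenever they are nonzero, while utilization is flagged only when it is very high.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical threshold for illustration only; tune it for your workload.
UTIL_WARN_PCT = 90.0

@dataclass
class ResourceCheck:
    name: str               # e.g. "cpu", "memory", "disk:sda", "net:eth0"
    utilization_pct: float  # percent of the interval the resource was busy
    saturation: float       # queued/waiting work (run queue beyond cores, swap pages/s, ...)
    errors: int             # error events seen during the interval

def use_findings(checks: list[ResourceCheck]) -> list[str]:
    """Walk every resource through U, S and E and report suspects.

    Errors and saturation are reported for any nonzero value; utilization is
    only reported at or above UTIL_WARN_PCT, because a busy resource with no
    queue is not necessarily a bottleneck.
    """
    findings = []
    for c in checks:
        if c.errors > 0:
            findings.append(f"{c.name}: {c.errors} errors")
        if c.saturation > 0:
            findings.append(f"{c.name}: saturated ({c.saturation:g} waiting)")
        if c.utilization_pct >= UTIL_WARN_PCT:
            findings.append(f"{c.name}: {c.utilization_pct:g}% utilized")
    return findings

checks = [
    ResourceCheck("cpu", 98.0, 0, 0),
    ResourceCheck("disk:sda", 60.0, 12, 0),
    ResourceCheck("net:eth0", 20.0, 0, 3),
]
print(use_findings(checks))
# ['cpu: 98% utilized', 'disk:sda: saturated (12 waiting)', 'net:eth0: 3 errors']
```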

First 5 minutes triage

When you ssh into a sick box, run a fast top-down sweep before diving deep. Start with 'uptime' for load averages, then 'dmesg | tail' for recent kernel messages (OOM kills, disk errors, segfaults). 'vmstat 1' shows run queue, swap, and CPU split over time; 'mpstat -P ALL 1' shows whether load is on one core or spread out; 'pidstat 1' attributes CPU to processes. Then 'iostat -xz 1' for disk pressure, 'free -m' for memory headroom, and 'ss -s' plus 'sar -n DEV 1' for network. The point is breadth first - confirm which resource is the problem, then drill in with the focused tools below.
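A small wrapper can make the sweep repeatable by running the same commands in the same order every time. This sketch assumes Python 3.7+. It bounds every sampling command with a count of 5 so nothing runs forever. The TRIAGE list and run_triage function are illustrative names, not a standard tool. Missing tools are reported as skipped rather than crashing the run.

```python
import shutil
import subprocess

# Bounded versions of the sweep (interval 1, count 5). mpstat, pidstat, iostat
# and sar come from the sysstat package and may not be installed.
TRIAGE = [
    ["uptime"],
    ["dmesg"],                          # tailed below; may need root if dmesg_restrict=1
    ["vmstat", "1", "5"],
    ["mpstat", "-P", "ALL", "1", "5"],
    ["pidstat", "1", "5"],
    ["iostat", "-xz", "1", "5"],
    ["free", "-m"],
    ["ss", "-s"],
    ["sar", "-n", "DEV", "1", "5"],
]

def run_triage(commands=TRIAGE, which=shutil.which, timeout=30):
    """Run each command in order and return {command string: output}."""
    results = {}
    for cmd in commands:
        key = " ".join(cmd)
        if which(cmd[0]) is None:
            results[key] = "SKIPPED: not installed"
            continue
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            results[key] = "SKIPPED: timed out"
            continue
        # Keep stderr when the command fails (e.g. dmesg permission denied).
        out = proc.stdout if proc.returncode == 0 else proc.stderr
        if cmd[0] == "dmesg":
            out = "\n".join(out.splitlines()[-10:])  # same as `dmesg | tail`
        results[key] = out
    return results
```

On a live box, iterate over run_triage().items() and print each section in order to read the sweep top-down. Expect it to take a few seconds per sampling tool.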

Performance and debugging commands

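Grouped roughly by resource. The sampling tools (vmstat, mpstat, pidstat, iostat, sar) take a trailing interval argument: '1' prints a fresh sample every second, and an optional count after it ('vmstat 1 5') stops after that many samples.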

Command | What it does | Common flags | Example
--- | --- | --- | ---
top | Live view of processes by CPU, plus load average and memory summary. | -o sort field, -H threads; press '1' to expand per-core CPU | top -o %MEM
htop | Interactive top with per-core bars, tree view, and easy signal sending. | -u user, -p pid; F5 tree, F6 sort | htop -u www-data
mpstat | Per-CPU utilization over time - reveals a single saturated core. | -P ALL for all cores, trailing interval | mpstat -P ALL 1
pidstat | Per-process CPU, memory, and I/O sampled over an interval. | -u CPU, -r memory, -d disk, -t threads | pidstat -d 1
vmstat | Run queue, swap in/out, blocks in/out, and CPU breakdown over time. | trailing interval, -s for a one-shot summary of counters | vmstat 1
free | Memory and swap usage - watch available, not just free. | -m or -h for human units, -s N to repeat every N seconds | free -h
iostat | Per-device disk throughput, IOPS, queue depth, latency (await), and %util. | -x extended, -z hide idle devices, -m for MB/s | iostat -xz 1
iotop | Live per-process disk I/O - finds which process is hammering the disk. | -o only active, -a accumulated; needs root | sudo iotop -o
df / du | df shows filesystem free space; du measures actual directory sizes. | df -h, df -i for inodes; du -sh, du -h --max-depth=1 | du -sh /var/log/*
ss | Socket statistics - connections, listening ports, states. A faster, modern replacement for netstat. | -tulpn listening TCP/UDP with pids, -s summary | ss -tulpn
tcpdump | Captures and inspects packets on the wire for protocol-level debugging. | -i iface, -n no DNS, -w write pcap, port/host filters; needs root | sudo tcpdump -ni eth0 port 443
iftop | Live bandwidth usage broken out by connection pair. | -i iface, -n no DNS resolution; needs root | sudo iftop -i eth0
strace | Traces system calls and signals - shows what a process asks the kernel for. | -p attach to pid, -f follow forks, -c count summary, -e filter | strace -fp 1234
ltrace | Traces library (function) calls rather than syscalls. | -p attach, -c summary, -f follow forks | ltrace -c ./app
lsof | Lists open files, sockets, and the processes holding them. | -p pid, -i :port, +D dir, -u user | lsof -i :8080
perf | CPU profiler and tracer - samples stacks to find hot code paths. | subcommands: top (live profile), record + report, stat (counters); -g for call graphs | perf top
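mpstat and top derive CPU percentages from the cumulative jiffy counters in /proc/stat. They read the counters twice and compute the differences. The following minimal Python sketch shows that calculation, assuming the standard Linux /proc/stat layout. The function names are hypothetical. Treating iowait as not busy is a simplification; mpstat reports %iowait as its own column.

```python
from __future__ import annotations
import time

def read_cpu_times(text: str) -> dict[str, list[int]]:
    """Parse the cpu/cpuN lines of /proc/stat into {name: [jiffies...]}."""
    times = {}
    for line in text.splitlines():
        if line.startswith("cpu"):
            name, *fields = line.split()
            # user nice system idle iowait irq softirq steal; guest time is
            # already folded into user/nice, so later columns are ignored.
            times[name] = [int(f) for f in fields[:8]]
    return times

def busy_pct(before: list[int], after: list[int]) -> float:
    """Percent of the interval a CPU spent neither idle nor in iowait."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    idle = deltas[3] + deltas[4]  # idle + iowait
    return 100.0 * (total - idle) / total

def sample_per_core(interval: float = 1.0) -> dict[str, float]:
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        first = read_cpu_times(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = read_cpu_times(f.read())
    return {name: busy_pct(first[name], second[name]) for name in second if name in first}
```

One core near 100% while the aggregate "cpu" line looks modest is the single-saturated-core pattern that 'mpstat -P ALL 1' is meant to expose.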

Reading the load average

The three load-average numbers (from 'uptime', 'top', or '/proc/loadavg') are 1-, 5-, and 15-minute exponentially damped moving averages of the number of tasks running or waiting to run. On Linux specifically, they also count tasks in uninterruptible sleep, which are usually blocked on disk I/O, though NFS and some kernel locks count too. Compare against your core count: a load of 4.0 is fully busy on a 4-core box and badly oversubscribed on a 1-core box. The trend across the three numbers matters - a 1-minute value much higher than the 15-minute value means load is rising right now; the reverse means it is recovering. Because Linux counts uninterruptible sleep in load, a high load with idle CPUs points at I/O (usually disk), not compute.
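The following minimal Python sketch applies that reading, assuming a Linux /proc/loadavg whose first three fields are the 1-, 5-, and 15-minute averages. interpret_loadavg and its 20% trend band are hypothetical choices for illustration. The function normalizes the 1-minute value by core count and labels the trend by comparing it against the 15-minute value.

```python
from __future__ import annotations
import os

def interpret_loadavg(text: str, cores: int) -> dict[str, object]:
    """Interpret the first three fields of /proc/loadavg for a box with `cores` CPUs."""
    one, _five, fifteen = (float(x) for x in text.split()[:3])
    # Hypothetical +/-20% band for calling a trend; adjust to taste.
    if one > fifteen * 1.2:
        trend = "rising"
    elif one < fifteen * 0.8:
        trend = "recovering"
    else:
        trend = "steady"
    return {
        "load_per_core": one / cores,  # around 1.0 means every core is busy
        "trend": trend,
    }

if os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print(interpret_loadavg(f.read(), os.cpu_count() or 1))
```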

Memory pressure and the OOM killer

Where to look when a box is running out of memory.

free / available
In 'free -h', look at the 'available' column, not 'free' - the kernel uses spare RAM for page cache, most of which it can reclaim quickly when applications need it, so low 'free' is normal and healthy.
vmstat si / so
Nonzero 'si' (swap in) and 'so' (swap out) columns mean the system is actively swapping - a strong saturation signal that hurts latency.
OOM killer
When memory is exhausted and cannot be reclaimed, the kernel's out-of-memory killer terminates a process to free RAM. It picks the process with the highest oom_score, which can be biased per process via /proc/<pid>/oom_score_adj. It logs to dmesg and the kernel log - grep for 'Out of memory' or 'oom-kill'.
/proc/meminfo
Authoritative memory breakdown - MemTotal, MemAvailable, Cached, Buffers, SwapTotal/SwapFree, and Slab. This is where 'free' gets its numbers.
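The same checks are easy to script. The following minimal Python sketch assumes a kernel new enough to expose MemAvailable (3.14 or later). parse_meminfo reads /proc/meminfo, headroom derives the numbers 'free' shows, and oom_lines filters kernel-log text for the OOM killer's messages. All three function names are hypothetical.

```python
from __future__ import annotations
import re

def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo into {field: number}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # HugePages_* fields are page counts, not kB.
            info[key.strip()] = int(parts[0])
    return info

def headroom(info: dict[str, int]) -> dict[str, float]:
    """Available-memory percentage and swap in use, as 'free' would show them."""
    return {
        "available_pct": 100.0 * info["MemAvailable"] / info["MemTotal"],
        "swap_used_kb": info.get("SwapTotal", 0) - info.get("SwapFree", 0),
    }

OOM_PATTERN = re.compile(r"Out of memory|oom-kill", re.IGNORECASE)

def oom_lines(kernel_log: str) -> list[str]:
    """Lines of dmesg or `journalctl -k` output that mention the OOM killer."""
    return [line for line in kernel_log.splitlines() if OOM_PATTERN.search(line)]
```

Feed parse_meminfo the contents of /proc/meminfo. Feed oom_lines the output of 'dmesg' or 'journalctl -k'.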

Reading /proc

The /proc pseudo-filesystem is the kernel's live window into the system, exposed as files. System-wide: '/proc/loadavg' (load), '/proc/stat' (CPU and boot counters), '/proc/meminfo' (memory), '/proc/cpuinfo' (cores and flags), '/proc/mounts', and '/proc/net/' (per-protocol socket tables). Per process under '/proc/<pid>/': 'status' and 'stat' for state and resource use, 'fd/' for open file descriptors, 'maps' for the memory layout, 'cmdline' for the launch arguments, 'environ' for the environment, and 'limits' for ulimits. Many standard tools are just friendly readers over these files.
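Those friendly readers can be very thin. The following minimal Python sketch pulls a few ps/top-style fields from /proc/<pid>/status and /proc/<pid>/cmdline. It also counts entries in /proc/<pid>/fd, the descriptors lsof would list. The function names are hypothetical. Reading another user's fd/ directory generally requires root.

```python
from __future__ import annotations
import os

def parse_status(text: str) -> dict[str, str]:
    """Parse /proc/<pid>/status ("Key:<tab>value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields

def process_summary(pid: str = "self") -> dict[str, str]:
    """A few fields ps/top would show, plus the open-fd count (Linux only)."""
    base = f"/proc/{pid}"
    with open(f"{base}/status") as f:
        status = parse_status(f.read())
    with open(f"{base}/cmdline", "rb") as f:
        # Launch arguments are NUL-separated.
        cmdline = f.read().replace(b"\0", b" ").decode(errors="replace").strip()
    return {
        "name": status.get("Name", ""),
        "state": status.get("State", ""),
        "rss": status.get("VmRSS", "n/a"),  # absent for kernel threads
        "threads": status.get("Threads", ""),
        "open_fds": str(len(os.listdir(f"{base}/fd"))),
        "cmdline": cmdline,
    }
```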

Triage order and habits

  • Go top-down: confirm which resource is saturated (CPU, memory, disk, network) before reaching for deep tools like strace or perf.
  • Check 'dmesg | tail' early - OOM kills, disk read errors, and segfaults show up there and instantly explain a lot of symptoms.
  • Distinguish utilization from saturation - 100% CPU utilization is fine if the run queue is short; a growing run queue or rising I/O wait is the real warning.
  • On Linux, a high load average with idle CPUs means tasks are blocked in uninterruptible sleep, usually on disk I/O - look at storage (and NFS mounts), not compute.
  • strace and ltrace add real overhead and can slow a hot process badly - attach with '-p' briefly, use '-c' for a summary, and detach.
  • When the disk looks full, check inodes too ('df -i') - many tiny files can exhaust inodes while bytes-free still looks fine.
  • Sample over an interval (the trailing '1') rather than trusting a single snapshot - instantaneous values lie.
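The inode check is simple arithmetic over statvfs. The following minimal Python sketch uses os.statvfs to compute space and inode usage roughly the way 'df -h' and 'df -i' do. fs_usage, full_warnings, and the 90% threshold are illustrative assumptions, not a standard tool.

```python
from __future__ import annotations
import os

def fs_usage(path: str = "/") -> dict[str, float | None]:
    """Percent of space and inodes in use on the filesystem holding `path`.

    Approximates df's Use% and IUse% columns (df rounds up to a whole percent).
    """
    st = os.statvfs(path)
    used_blocks = st.f_blocks - st.f_bfree
    # Like df, divide by used + available-to-unprivileged, so root-reserved
    # blocks (5% by default on ext4) count as neither.
    usable = used_blocks + st.f_bavail
    space_pct = 100.0 * used_blocks / usable if usable else None
    # Some filesystems (btrfs, for example) report 0 total inodes.
    inode_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else None
    return {"space_used_pct": space_pct, "inodes_used_pct": inode_pct}

def full_warnings(path: str = "/", threshold: float = 90.0) -> list[str]:
    """Warn when either space or inodes cross the (arbitrary) threshold."""
    warnings = []
    for name, pct in fs_usage(path).items():
        if pct is not None and pct >= threshold:
            warnings.append(f"{path}: {name} at {pct:.1f}%")
    return warnings
```

Checking both numbers together catches the case where space looks fine but inodes are exhausted.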

