gitGood.dev

Top 50 Kubernetes Interview Questions (2026)

Patrick Wilson
40 min read

Kubernetes has won. It's the default platform for running containerized workloads, and every DevOps, platform engineering, SRE, and backend engineering interview in 2026 expects you to know it. Not just "I did the tutorial" know it - actually understand how pods get scheduled, how networking works, and what happens when things break at 3am.

These 50 questions are the ones that actually come up in real interviews. Not obscure API trivia. Not "recite the YAML spec." The stuff that separates someone who has operated Kubernetes in production from someone who just read the docs.

If you're also brushing up on containers, check out our Docker commands cheat sheet. And if system design is on the agenda, our system design interview guide pairs well with this.

Let's get into it.


Section 1: Core Concepts (Questions 1-10)

These are table stakes. Every Kubernetes interview starts here. If you stumble on these, the interviewer won't bother going deeper.

1. What is a Pod, and why is it the smallest deployable unit in Kubernetes?

A Pod is one or more containers that share the same network namespace and storage volumes. They always get scheduled together on the same node.

The reason Kubernetes doesn't schedule individual containers is that many applications need tightly coupled helper processes - a logging sidecar, a proxy, a config reloader. Pods let these containers share localhost and the same filesystem mounts without any extra networking.

apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
    - name: app
      image: myapp:1.2.0
      ports:
        - containerPort: 8080
    - name: log-shipper
      image: fluent/fluent-bit:latest
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
  volumes:
    - name: logs
      emptyDir: {}

In practice, you almost never create Pods directly. You create Deployments, StatefulSets, or Jobs that manage Pods for you. If an interviewer asks "when would you create a bare Pod?" - the answer is almost never, except for one-off debugging.

2. What's the difference between a Deployment and a ReplicaSet?

A ReplicaSet ensures that a specified number of pod replicas are running at any given time. A Deployment is a higher-level abstraction that manages ReplicaSets and provides declarative updates, rollback history, and rolling update strategies.

Think of it this way: a ReplicaSet says "keep 3 copies running." A Deployment says "keep 3 copies running, and when I change the image, roll out the new version gradually while keeping the old one around in case I need to roll back."

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: api-server:2.1.0
          ports:
            - containerPort: 8080

When you update the image, the Deployment creates a new ReplicaSet, scales it up, and scales the old one down. The old ReplicaSet sticks around (with 0 replicas) so you can roll back.

The real interview answer: You never create ReplicaSets directly. You create Deployments. The only reason to know about ReplicaSets is to understand what's happening under the hood during rollouts and rollbacks.
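
The rollout behavior described above is tunable on the Deployment spec. A hedged sketch with illustrative values - zero-downtime settings for the api-server Deployment:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most 1 extra pod above the desired count during rollout
      maxUnavailable: 0    # never drop below the desired replica count
  revisionHistoryLimit: 5  # keep 5 old ReplicaSets around for rollbacks
```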

3. What is a Namespace and when would you use one?

Namespaces are virtual clusters within a physical cluster. They provide isolation for resource names and are the boundary for resource quotas and RBAC policies.

Common use cases:

  • Environment separation: dev, staging, prod on the same cluster (though separate clusters is safer for prod)
  • Team isolation: team-payments, team-search each get their own namespace
  • Resource quotas: limit CPU/memory consumption per team

kubectl get namespaces
kubectl create namespace team-payments
kubectl get pods -n team-payments

Kubernetes ships with four default namespaces: default, kube-system, kube-public, and kube-node-lease. Don't deploy your apps into kube-system - that's for cluster components.
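
To make the quota use case concrete, here's a hedged ResourceQuota sketch for the team-payments namespace created above (limits are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "10"       # total CPU requests across the namespace
    requests.memory: 20Gi
    limits.memory: 40Gi
    pods: "50"               # cap on total pod count
```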

4. How do labels and selectors work? Why are they important?

Labels are key-value pairs attached to Kubernetes objects. Selectors are queries that filter objects by their labels. Together, they are the glue that connects everything in Kubernetes - Services find Pods through selectors, Deployments manage Pods through selectors, NetworkPolicies target Pods through selectors.

# Pod with labels
metadata:
  labels:
    app: payment-service
    env: production
    version: v2

# Select pods by label
kubectl get pods -l app=payment-service
kubectl get pods -l "env=production,version=v2"
kubectl get pods -l "env in (production, staging)"

The real interview answer: Labels and selectors are Kubernetes' service discovery mechanism. There's no central registry. If a Service selector doesn't match your Pod labels exactly, traffic won't route. This is one of the most common debugging scenarios in production.
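
To see the wiring concretely, here's a minimal Service sketch that selects the Pod labeled above (ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payment-service
spec:
  selector:
    app: payment-service   # must match the Pod labels exactly, or no endpoints
  ports:
    - port: 80
      targetPort: 8080
```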

5. What happens when you run kubectl apply -f deployment.yaml?

This is a favorite question because it tests whether you understand the Kubernetes control loop. Here's the chain of events:

  1. kubectl sends the YAML to the API Server
  2. The API Server authenticates and authorizes the request
  3. Admission controllers validate and potentially mutate the resource
  4. The resource is persisted to etcd
  5. The Deployment controller creates a ReplicaSet
  6. The ReplicaSet controller creates Pod objects
  7. The Scheduler assigns Pods to nodes
  8. The kubelet on each node pulls images and starts containers

This entire flow is asynchronous. No component directly calls another - they all watch the API Server for changes. This is the "reconciliation loop" that makes Kubernetes self-healing.

6. Explain the Kubernetes control plane components.

The control plane runs the cluster's brain. Key components:

  • kube-apiserver: The front door. Every interaction goes through it - kubectl, controllers, kubelets. Handles authentication, authorization, and admission.
  • etcd: Distributed key-value store holding all cluster state. If etcd dies without a backup, your cluster is gone.
  • kube-scheduler: Assigns unscheduled Pods to nodes based on resource requirements, affinity rules, and taints/tolerations.
  • kube-controller-manager: Runs built-in controllers (Deployment, ReplicaSet, Node, Job). Each reconciles actual state with desired state.
  • cloud-controller-manager: Integrates with cloud APIs for LoadBalancer Services, node lifecycle, and routes.

7. What's the difference between a DaemonSet, a Deployment, and a StatefulSet?

These are the three main workload controllers, each designed for different patterns:

Deployment: Run N identical, stateless replicas. Pods are interchangeable. Use for web servers, APIs, workers.

DaemonSet: Run exactly one Pod on every node (or a subset of nodes). Use for node-level agents - log collectors, monitoring agents, network plugins.

StatefulSet: Run N replicas with stable identities. Each Pod gets a persistent hostname (db-0, db-1, db-2), stable network identity, and ordered startup/shutdown. Use for databases, message queues, anything that needs stable storage and identity.

# DaemonSet - runs on every node
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
        - name: exporter
          image: prom/node-exporter:latest

The real interview answer: "Deployment for stateless, StatefulSet for stateful, DaemonSet for per-node" is the short version. But the follow-up is usually "why can't you just use a Deployment for a database?" The answer: Deployments don't guarantee stable network identities or ordered scaling, which databases need for replication and leader election.

8. What are Init Containers and when would you use them?

Init containers run before the main containers in a Pod start. They run to completion sequentially - if an init container fails, Kubernetes restarts it until it succeeds (subject to the Pod's restart policy).

Common use cases:

  • Wait for a dependency to be ready (database, service)
  • Run database migrations
  • Download configuration or secrets from a vault
  • Set up filesystem permissions

spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.36
      command: ['sh', '-c', 'until nc -z postgres-service 5432; do sleep 2; done']
    - name: run-migrations
      image: myapp:1.2.0
      command: ['./migrate', '--up']
  containers:
    - name: app
      image: myapp:1.2.0

Init containers are different from sidecar containers. Init containers run to completion before the app starts. Sidecars run alongside the app for the entire Pod lifecycle. Kubernetes 1.28+ introduced native sidecar containers that start before and outlive the main containers.
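
The native sidecar pattern mentioned above is expressed as an init container with restartPolicy: Always (Kubernetes 1.28+). A sketch, with illustrative names:

```yaml
spec:
  initContainers:
    - name: log-shipper
      image: fluent/fluent-bit:latest
      restartPolicy: Always   # marks this init container as a native sidecar:
                              # starts before the app, keeps running alongside it
  containers:
    - name: app
      image: myapp:1.2.0
```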

9. What are taints and tolerations?

Taints are applied to nodes to repel Pods. Tolerations are applied to Pods to allow them to be scheduled on tainted nodes. They work together to control which Pods can run on which nodes.

# Taint a node
kubectl taint nodes node1 dedicated=gpu:NoSchedule

# Check taints on a node
kubectl describe node node1 | grep Taint

# Pod that tolerates the taint
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
  containers:
    - name: ml-training
      image: training-job:latest

Three taint effects:

  • NoSchedule: New pods won't be scheduled (existing pods stay)
  • PreferNoSchedule: Scheduler tries to avoid, but it's not guaranteed
  • NoExecute: Existing pods get evicted too

Control plane nodes are tainted with node-role.kubernetes.io/control-plane:NoSchedule by default - that's why ordinary workloads never land there.

10. How does the Kubernetes scheduler decide where to place a Pod?

The scheduler runs a two-phase process:

Filtering: Eliminate nodes that can't run the Pod. Checks include:

  • Does the node have enough CPU and memory?
  • Does the node satisfy nodeSelector or nodeAffinity rules?
  • Does the node have the right taints/tolerations?
  • Is the node ready and schedulable?

Scoring: Rank the remaining nodes. Factors include:

  • Spread Pods across zones for high availability
  • Prefer nodes with the image already cached
  • Balance resource utilization across nodes
  • Respect pod anti-affinity preferences

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values: ["us-east-1a", "us-east-1b"]
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: api-server
            topologyKey: kubernetes.io/hostname

That anti-affinity rule spreads api-server Pods across different nodes - great for high availability.


Section 2: Configuration (Questions 11-18)

Configuration management separates amateurs from professionals. Getting this right means your apps are portable, secure, and easy to operate.

11. What's the difference between a ConfigMap and a Secret?

ConfigMaps store non-sensitive configuration data. Secrets store sensitive data like passwords, tokens, and certificates. Both can be consumed as environment variables or mounted as files.

The key difference: Secrets are base64-encoded (not encrypted!) and have slightly different access controls. Kubernetes can also integrate Secrets with external stores like HashiCorp Vault or AWS Secrets Manager.

# ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DATABASE_HOST: "postgres.default.svc.cluster.local"
  LOG_LEVEL: "info"
  config.yaml: |
    server:
      port: 8080
      timeout: 30s

# Secret
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: YWRtaW4=       # base64 of "admin"
  password: cDRzc3cwcmQ=   # base64 of "p4ssw0rd"

The real interview answer: Base64 is not encryption. Anyone with cluster access can decode Secrets. In production, you need encryption at rest (enable etcd encryption) and external secret management (Vault, AWS Secrets Manager, or the External Secrets Operator). This is one of the biggest misconceptions in Kubernetes security.
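
You can demonstrate this from any shell - base64 is a reversible encoding, not encryption. The values below match the Secret manifest above:

```shell
# Encode values for a Secret manifest (-n strips the trailing newline,
# which would otherwise corrupt the encoded value)
echo -n 'admin' | base64        # YWRtaW4=
echo -n 'p4ssw0rd' | base64     # cDRzc3cwcmQ=

# Decode to verify - anyone with read access to the Secret can do this too
echo -n 'YWRtaW4=' | base64 --decode   # admin
```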

12. How do you inject configuration into a Pod?

Three main approaches, and interviewers want to see you know all of them:

Environment variables from ConfigMap/Secret:

spec:
  containers:
    - name: app
      env:
        - name: DATABASE_HOST
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: DATABASE_HOST
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password

Mount as files (volume mount):

spec:
  containers:
    - name: app
      volumeMounts:
        - name: config-volume
          mountPath: /etc/config
  volumes:
    - name: config-volume
      configMap:
        name: app-config

envFrom to load every key from a ConfigMap or Secret as an environment variable in one shot (envFrom with configMapRef or secretRef).

Volume-mounted ConfigMaps update automatically (with a delay, and not when mounted via subPath). Environment variables do not - the Pod needs a restart to pick up changes.
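
For completeness, a minimal envFrom sketch, reusing the ConfigMap and Secret names from the earlier examples:

```yaml
spec:
  containers:
    - name: app
      envFrom:
        - configMapRef:
            name: app-config       # every key becomes an env var
        - secretRef:
            name: db-credentials   # same, but sourced from the Secret
```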

13. How do resource requests and limits work?

Requests define the minimum resources a container needs. Limits define the maximum. The scheduler uses requests to decide where to place Pods. The kubelet enforces limits at runtime.

spec:
  containers:
    - name: app
      resources:
        requests:
          cpu: "250m"       # 0.25 CPU cores
          memory: "256Mi"   # 256 mebibytes
        limits:
          cpu: "1000m"      # 1 CPU core
          memory: "512Mi"   # 512 mebibytes

What happens when limits are exceeded:

  • CPU: The container gets throttled. It still runs, just slower.
  • Memory: The container gets OOM-killed. Kubernetes restarts it.

Important nuances:

  • If you set limits without requests, requests default to limits
  • A Pod in Pending state often means no node has enough resources to satisfy its requests
  • Overcommitting (total requests > node capacity) causes evictions under pressure

The real interview answer: Always set both requests and limits. Many production outages happen because teams forget memory limits, one Pod consumes all node memory, and every other Pod on that node gets evicted. CPU limits are more controversial - some teams prefer no CPU limits to avoid throttling, but always set memory limits.

14. What are Quality of Service (QoS) classes?

Kubernetes assigns a QoS class to each Pod based on its resource configuration. This determines eviction priority when a node runs out of resources.

Guaranteed: Every container has equal requests and limits for both CPU and memory. These Pods get evicted last.

Burstable: At least one container has a request or limit set, but they're not all equal. These get evicted after BestEffort.

BestEffort: No requests or limits set at all. These get evicted first.

# Check a pod's QoS class
kubectl get pod my-pod -o jsonpath='{.status.qosClass}'

For production workloads, aim for Guaranteed or Burstable. Never run BestEffort in production - it's asking for random evictions.
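
A Guaranteed-class Pod simply sets requests equal to limits for every container. A minimal sketch:

```yaml
spec:
  containers:
    - name: app
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "500m"      # equal to requests for every container
          memory: "512Mi"  # -> QoS class: Guaranteed
```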

15. What are Probes (liveness, readiness, startup)?

Probes let Kubernetes know whether your application is healthy and ready to serve traffic.

Liveness probe: "Is the process alive?" If this fails, Kubernetes kills and restarts the container. Use for deadlock detection.

Readiness probe: "Can this container handle traffic?" If this fails, the Pod is removed from Service endpoints but not restarted. Use during warm-up or when temporarily overloaded.

Startup probe: "Has the app finished starting?" Disables liveness and readiness checks until the app is initialized. Use for slow-starting applications.

spec:
  containers:
    - name: app
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30
        periodSeconds: 10

The real interview answer: The most common mistake is making the liveness probe hit the same endpoint as readiness, or making it depend on external services. If your liveness probe calls the database and the database is down, Kubernetes will restart all your Pods - making the outage worse. Liveness probes should check if the process itself is healthy, nothing more.

16. How do you manage environment-specific configuration across dev, staging, and prod?

Several approaches, and interviewers want to hear you know the tradeoffs:

Kustomize overlays (built into kubectl):

base/
  deployment.yaml
  kustomization.yaml
overlays/
  dev/
    kustomization.yaml    # patches for dev
  prod/
    kustomization.yaml    # patches for prod

kubectl apply -k overlays/prod/

Helm values files:

helm install myapp ./chart -f values-prod.yaml

External configuration (ConfigMaps per namespace):

kubectl create configmap app-config \
  --from-env-file=config/prod.env \
  -n production

In practice, most teams in 2026 use either Kustomize or Helm with GitOps (ArgoCD or Flux) to manage environment differences. Raw kubectl apply from a laptop is for development only.
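
A hedged sketch of what the prod overlay's kustomization.yaml might contain, assuming the layout above (replica-patch.yaml is a hypothetical patch file):

```yaml
# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base              # inherit everything from base
patches:
  - path: replica-patch.yaml  # e.g. bump replicas for prod
images:
  - name: api-server
    newTag: "2.1.0"           # pin the prod image tag
```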

17. What is a PodDisruptionBudget?

A PodDisruptionBudget (PDB) tells Kubernetes how many Pods from a set must remain running during voluntary disruptions - node drains, cluster upgrades, autoscaler scale-downs.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2        # at least 2 pods must stay running
  selector:
    matchLabels:
      app: api-server

Or specify maximum unavailable:

spec:
  maxUnavailable: 1       # at most 1 pod can be down at a time

Without a PDB, a node drain could take down all your replicas simultaneously. PDBs don't protect against involuntary disruptions like node crashes - they only affect controlled operations.

# This will respect PDBs
kubectl drain node1 --ignore-daemonsets --delete-emptydir-data

# Check PDB status
kubectl get pdb

18. What are Kubernetes Jobs and CronJobs?

Jobs run a task to completion. CronJobs run Jobs on a schedule. Unlike Deployments, Jobs are not meant to run forever.

# One-time Job
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  backoffLimit: 3         # retry up to 3 times on failure
  activeDeadlineSeconds: 600  # timeout after 10 minutes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: myapp:1.2.0
          command: ["./migrate", "--up"]

# CronJob - runs every day at 2am
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid   # don't overlap runs
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: cleanup-tool:latest

Key settings interviewers ask about:

  • concurrencyPolicy: Allow (default), Forbid (skip if previous still running), Replace (kill previous)
  • backoffLimit: How many retries before marking the Job as failed
  • restartPolicy: Must be Never or OnFailure for Jobs (not Always)

Section 3: Networking (Questions 19-26)

Kubernetes networking trips up a lot of candidates. It's abstract until you've debugged it in production.

19. Explain the four types of Kubernetes Services.

Services provide stable networking for a set of Pods. Four types:

ClusterIP (default): Internal-only IP address. Only reachable from within the cluster. Use for service-to-service communication.

apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  type: ClusterIP
  selector:
    app: api-server
  ports:
    - port: 80
      targetPort: 8080

NodePort: Exposes the service on a static port on every node. Range: 30000-32767. Traffic to <NodeIP>:<NodePort> gets forwarded to the Service.
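
A NodePort sketch (the explicit nodePort is optional - one is auto-assigned from 30000-32767 if omitted):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-nodeport
spec:
  type: NodePort
  selector:
    app: api-server
  ports:
    - port: 80          # ClusterIP port
      targetPort: 8080  # container port
      nodePort: 30080   # static port on every node
```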

LoadBalancer: Creates an external load balancer (in cloud environments). Gets a public IP. Under the hood, it creates a NodePort and a ClusterIP too.

ExternalName: Maps a Service to a DNS name. No proxying, just a CNAME record. Use for pointing to external services.

# See all services and their types
kubectl get svc -A

The real interview answer: In production, you almost always use ClusterIP for internal services and an Ingress controller (not LoadBalancer per service) for external access. Creating a LoadBalancer Service for every public endpoint gets expensive fast - each one provisions a cloud load balancer.

20. What is an Ingress and how does it differ from a Service?

An Ingress is a layer 7 (HTTP/HTTPS) routing rule. A Service is a layer 4 (TCP/UDP) load balancer. Ingress lets you route traffic based on hostnames and paths to different backend Services.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: tls-secret
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /v1
            pathType: Prefix
            backend:
              service:
                name: api-v1
                port:
                  number: 80
          - path: /v2
            pathType: Prefix
            backend:
              service:
                name: api-v2
                port:
                  number: 80

Important: Ingress resources do nothing on their own. You need an Ingress Controller (NGINX, Traefik, AWS ALB Controller) running in the cluster. The newer Gateway API is replacing Ingress in many clusters - it's more expressive and supports TCP/UDP routing and traffic splitting out of the box.

21. How does DNS work inside a Kubernetes cluster?

CoreDNS runs as a Deployment in kube-system and provides DNS resolution for all cluster resources.

Every Service gets a DNS entry:

<service-name>.<namespace>.svc.cluster.local

For example:

# From any Pod in the cluster:
curl http://postgres.database.svc.cluster.local:5432

# Short form works within the same namespace:
curl http://postgres:5432

Pods also get DNS entries, though they're less commonly used:

<pod-ip-with-dashes>.<namespace>.pod.cluster.local

For StatefulSets, each Pod gets a stable DNS name:

<pod-name>.<headless-service>.<namespace>.svc.cluster.local
# Example: db-0.db-headless.default.svc.cluster.local

# Debug DNS from inside a pod
kubectl run dns-debug --image=busybox:1.36 --rm -it -- nslookup postgres.default.svc.cluster.local

22. What are NetworkPolicies?

NetworkPolicies are firewall rules for Pods. By default, all Pods in a cluster can communicate with all other Pods. NetworkPolicies let you restrict this.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
        - namespaceSelector:
            matchLabels:
              name: monitoring
      ports:
        - port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - port: 5432
    - to:                      # Allow DNS
        - namespaceSelector: {}
      ports:
        - port: 53
          protocol: UDP

This policy says: the api-server can only receive traffic from the frontend app and the monitoring namespace on port 8080, and can only send traffic to postgres on 5432 and DNS.
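
A common starting point is a default-deny policy that blocks all ingress in a namespace, then allowlisting traffic from there. A minimal sketch:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}     # empty selector = every pod in the namespace
  policyTypes:
    - Ingress         # no ingress rules listed -> all inbound traffic denied
```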

Critical caveat: NetworkPolicies only work if your CNI plugin supports them. Calico, Cilium, and Weave support them. The default kubenet and some versions of Flannel do not. If your CNI doesn't support them, the resources get created but do nothing.

23. What is a headless Service and when would you use it?

A headless Service is a Service with clusterIP: None. Instead of getting a single virtual IP, DNS resolves to the individual Pod IPs directly.

apiVersion: v1
kind: Service
metadata:
  name: db-headless
spec:
  clusterIP: None
  selector:
    app: database
  ports:
    - port: 5432

# Regular service DNS -> single ClusterIP
nslookup api-service
# Returns: 10.96.0.15

# Headless service DNS -> individual pod IPs
nslookup db-headless
# Returns: 10.244.1.5, 10.244.2.8, 10.244.3.2

Use cases:

  • StatefulSets: Clients need to connect to specific replicas (e.g., Kafka consumers to specific brokers)
  • Client-side load balancing: The application handles routing logic itself
  • Service discovery: Get all Pod IPs for custom health checking

24. How does kube-proxy work?

kube-proxy runs on every node and maintains network rules that allow communication to Pods through Services. It watches the API server for Service and Endpoint changes and updates routing rules.

Three modes: iptables (default, O(n) per service), IPVS (O(1), better at scale), and nftables (newer alternative). Modern CNI plugins like Cilium can replace kube-proxy entirely using eBPF.
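
On kubeadm-style clusters the mode is set through the kube-proxy ConfigMap. A fragment of the KubeProxyConfiguration, as a sketch:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"            # or "iptables" (default) / "nftables"
ipvs:
  scheduler: "rr"       # round-robin; lc (least connection) is also supported
```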

25. How do you expose an application externally in production?

The standard production setup in 2026:

  1. Deploy an Ingress Controller (NGINX Ingress, Traefik, or cloud-native like AWS ALB Controller)
  2. Create Ingress resources (or Gateway API routes) for HTTP routing
  3. Use cert-manager for automatic TLS certificate management
  4. Point DNS to the Ingress Controller's load balancer

# Ingress with automatic TLS via cert-manager
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.example.com
      secretName: app-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app-service
                port:
                  number: 80

For TCP/UDP services (databases, message brokers - anything that isn't plain HTTP), you'd use a LoadBalancer Service or a Gateway API TCPRoute.

26. What is a service mesh and when would you use one?

A service mesh is an infrastructure layer that handles service-to-service communication with features like mutual TLS, traffic management, observability, and retries - without changing application code. It works by injecting a sidecar proxy (usually Envoy) into every Pod.

Popular options in 2026: Istio (feature-rich, complex), Linkerd (lighter, Rust-based proxy), Cilium Service Mesh (eBPF-based, no sidecar).

When you need one: mutual TLS between all services, fine-grained traffic control, distributed tracing, retry/circuit breaker policies.

When you don't: small clusters, when complexity outweighs benefits, when NetworkPolicies and app-level retries suffice.

The real interview answer: Don't say "we should always use a service mesh." Say "it depends on the complexity of the service architecture and the specific problems you're trying to solve." Many teams adopt Istio too early and spend more time debugging the mesh than their actual applications.


Section 4: Storage (Questions 27-32)

Storage in Kubernetes is where stateless simplicity meets the messy reality of data persistence. Interviewers use this section to see if you've dealt with real-world stateful workloads.

27. What's the difference between a PersistentVolume and a PersistentVolumeClaim?

A PersistentVolume (PV) is a piece of storage provisioned in the cluster - an EBS volume, an NFS share, a local disk. It's a cluster-level resource.

A PersistentVolumeClaim (PVC) is a request for storage by a user. It's a namespace-level resource. The PVC specifies how much storage, what access mode, and which StorageClass - and Kubernetes binds it to a matching PV.

Think of PVs as the supply and PVCs as the demand.

# PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3
  resources:
    requests:
      storage: 50Gi

# Using the PVC in a Pod
spec:
  containers:
    - name: postgres
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: postgres-data

28. What are StorageClasses and dynamic provisioning?

A StorageClass defines a "class" of storage - the provisioner, parameters, and reclaim policy. When a PVC references a StorageClass, Kubernetes automatically provisions a PV. No admin intervention needed.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "5000"
  throughput: "250"
  encrypted: "true"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Key settings:

  • provisioner: Which CSI driver creates the volumes
  • reclaimPolicy: Delete (destroy volume when PVC deleted) or Retain (keep volume for manual cleanup)
  • volumeBindingMode: Immediate (provision right away) or WaitForFirstConsumer (wait until a Pod needs it - better for topology-aware scheduling)
  • allowVolumeExpansion: Whether PVCs can be resized

# List storage classes
kubectl get storageclass

# Expand a PVC (if StorageClass allows it)
kubectl patch pvc postgres-data -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'

29. What are access modes and when does each apply?

Access modes define how a volume can be mounted:

  • ReadWriteOnce (RWO): Can be mounted as read-write by a single node. Most common. Used by EBS, Azure Disk, local storage.
  • ReadOnlyMany (ROX): Can be mounted as read-only by many nodes. Good for shared configuration or static assets.
  • ReadWriteMany (RWX): Can be mounted as read-write by many nodes. Required for shared storage. Needs EFS, NFS, CephFS, or similar.
  • ReadWriteOncePod (RWOP): Can be mounted as read-write by a single Pod. Stricter than RWO - ensures only one Pod in the entire cluster can write.

spec:
  accessModes:
    - ReadWriteOnce     # One node can mount read-write

Gotcha: RWO means one node, not one Pod. Multiple Pods on the same node can all mount an RWO volume. If you need single-Pod exclusivity, use RWOP.

30. How do StatefulSets handle storage differently from Deployments?

StatefulSets use volumeClaimTemplates to create a unique PVC for each Pod replica. When you scale a StatefulSet from 3 to 5, it creates 2 new PVCs. When you scale back down to 3, the extra PVCs are kept (not deleted) so the data persists if you scale back up.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-headless
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi

This creates PVCs named data-postgres-0, data-postgres-1, data-postgres-2. Each Pod always gets the same PVC, even after restarts. That's the key difference from Deployments, where Pods are disposable and interchangeable.

31. What are ephemeral volumes?

Ephemeral volumes have the same lifetime as the Pod. When the Pod dies, the data is gone. Three types you should know:

emptyDir: Empty directory created when Pod starts. Shared between containers in the same Pod. Destroyed when Pod is removed.

volumes:
  - name: cache
    emptyDir:
      sizeLimit: 1Gi
  - name: fast-cache
    emptyDir:
      medium: Memory    # backed by RAM (tmpfs)

configMap / secret: Mounts ConfigMap or Secret data as files.

projected: Combines multiple volume sources (ConfigMaps, Secrets, ServiceAccount tokens) into one mount.
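For example, a projected volume can combine a ConfigMap, a Secret, and a short-lived ServiceAccount token into a single mount (resource names here are illustrative):

```yaml
volumes:
  - name: app-config
    projected:
      sources:
        - configMap:
            name: app-settings        # hypothetical ConfigMap
        - secret:
            name: app-tls             # hypothetical Secret
        - serviceAccountToken:
            path: token
            expirationSeconds: 3600   # token auto-rotated by the kubelet
```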

Use emptyDir for scratch space, shared data between sidecar containers, and caches. Never use it for data you can't afford to lose.

32. What are CSI drivers and why do they matter?

CSI (Container Storage Interface) is the standard plugin interface for storage in Kubernetes. CSI drivers are out-of-tree plugins that let Kubernetes use any storage system - cloud block storage, network file systems, distributed storage systems.

Common CSI drivers:

  • ebs.csi.aws.com: Amazon EBS volumes
  • efs.csi.aws.com: Amazon EFS (NFS)
  • disk.csi.azure.com: Azure Managed Disks
  • pd.csi.storage.gke.io: Google Persistent Disks

# List CSI drivers in your cluster
kubectl get csidriver

# Check which storage classes use which drivers
kubectl get sc -o wide

Before CSI, storage plugins were compiled into Kubernetes itself ("in-tree"). That meant adding new storage support required changing Kubernetes core. CSI decoupled storage from the Kubernetes release cycle. The in-tree cloud storage plugins have since been migrated to CSI, and new storage integrations are CSI-only.

The real interview answer: If someone asks "how would you add storage support for X in Kubernetes?" the answer is always "use the CSI driver for X" or "write a CSI driver." The in-tree provisioner era is over.


Section 5: Security (Questions 33-40)

Security questions have become mandatory in 2026 interviews. With supply chain attacks and cluster compromises making headlines, interviewers need to know you won't leave the front door open.

33. How does RBAC work in Kubernetes?

RBAC (Role-Based Access Control) controls who can do what in the cluster. It has four key resources:

  • Role: Defines permissions within a namespace
  • ClusterRole: Defines permissions cluster-wide
  • RoleBinding: Grants a Role to a user/group/ServiceAccount in a namespace
  • ClusterRoleBinding: Grants a ClusterRole cluster-wide

# Role - can read pods and logs in the "production" namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]

# RoleBinding - grants the role to a user
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods-binding
  namespace: production
subjects:
  - kind: User
    name: alice@example.com
    apiGroup: rbac.authorization.k8s.io
  - kind: ServiceAccount
    name: monitoring-sa
    namespace: monitoring
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

# Check what you can do
kubectl auth can-i create deployments --namespace production

# Check what a service account can do
kubectl auth can-i list pods --as system:serviceaccount:default:my-sa

The principle of least privilege applies: give only the permissions needed, scope to the narrowest namespace possible, and audit regularly.

34. What are ServiceAccounts and why do they matter?

ServiceAccounts provide an identity for processes running in Pods. Every Pod runs as a ServiceAccount - if you don't specify one, it uses the default ServiceAccount in its namespace.

# Create a service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-service-account
  namespace: production
  annotations:
    # For IRSA (IAM Roles for Service Accounts) on AWS
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/api-role

# Use it in a Pod
spec:
  serviceAccountName: api-service-account
  automountServiceAccountToken: false  # disable if not needed
  containers:
    - name: app
      image: myapp:1.0

Important security practice: set automountServiceAccountToken: false unless the Pod actually needs to talk to the Kubernetes API. The mounted token can be used to query and potentially modify cluster resources. Many attacks start with a compromised Pod using its default ServiceAccount token.

On cloud providers, ServiceAccounts also enable IAM integration: IRSA/EKS Pod Identity (AWS), Workload Identity (GCP), Workload Identity Federation (Azure).

35. What is a SecurityContext?

SecurityContext defines privilege and access control settings for a Pod or container. It's your first line of defense for container hardening.

spec:
  securityContext:                    # Pod-level
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      securityContext:                # Container-level
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
          add:
            - NET_BIND_SERVICE
      volumeMounts:
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}

Key settings to know:

  • runAsNonRoot: true - prevent running as root
  • readOnlyRootFilesystem: true - prevent writing to the container filesystem (mount writable volumes for paths that need writes)
  • allowPrivilegeEscalation: false - prevent gaining more privileges than the parent process
  • capabilities.drop: ALL - drop all Linux capabilities, add back only what's needed

The real interview answer: In production, every Pod should have a SecurityContext that drops all capabilities, runs as non-root, prevents privilege escalation, and uses a read-only root filesystem. If an interviewer asks "what's the minimum security configuration you'd set?" - this is it.

36. What are Pod Security Standards (PSS)?

Pod Security Standards replaced the deprecated PodSecurityPolicy. They define three policy levels enforced at the namespace level:

  • Privileged: No restrictions. For system-level workloads.
  • Baseline: Prevents known privilege escalations. Blocks hostNetwork, hostPID, privileged containers.
  • Restricted: Heavily restricted. Requires running as non-root, dropping all capabilities, disallowing privilege escalation, and setting a seccomp profile. (Note: it does not require a read-only root filesystem - that's still on you.)

Applied via namespace labels:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Three modes:

  • enforce: Block Pods that violate the policy
  • audit: Log violations but allow the Pod
  • warn: Send warnings to the user but allow the Pod

A common rollout strategy: start with warn on restricted, fix violations, then switch to enforce.

37. How do you manage secrets securely in Kubernetes?

Kubernetes Secrets are base64-encoded, not encrypted. Here's the layered approach for production:

Layer 1 - Enable encryption at rest in etcd via EncryptionConfiguration.
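A minimal EncryptionConfiguration sketch, passed to the API server via --encryption-provider-config (the key value is a placeholder; in production, prefer the kms provider backed by your cloud KMS over a static key):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>   # placeholder
      - identity: {}   # fallback so previously unencrypted data stays readable
```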

Layer 2 - External secret management via the External Secrets Operator:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: db-credentials
  data:
    - secretKey: password
      remoteRef:
        key: prod/database/password

Layer 3 - RBAC on Secrets:
Restrict who can read Secrets. Default RBAC often gives too much access.
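For instance, a Role can be scoped to specific Secret names via resourceNames rather than granting access to every Secret in the namespace (names are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-db-secret       # hypothetical
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["db-credentials"]   # only this one Secret
    verbs: ["get"]
```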

Layer 4 - Audit logging:
Enable audit logging to track who accessed which Secrets.
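A sketch of an audit Policy rule that records Secret access at the Metadata level, so you get who/when without writing secret values into the audit log:

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata        # log the request metadata, not the Secret payload
    resources:
      - group: ""
        resources: ["secrets"]
```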

38. What is a NetworkPolicy for zero-trust networking?

Zero-trust means "deny by default, allow explicitly." In Kubernetes, you achieve this with NetworkPolicies:

# Default deny all ingress and egress in a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}          # selects all pods
  policyTypes:
    - Ingress
    - Egress

Then explicitly allow only the traffic your applications need:

# Allow api -> database on port 5432
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-server
      ports:
        - port: 5432

Don't forget to allow DNS (port 53, UDP and TCP, to kube-system) or nothing will resolve. This is the most common gotcha with default-deny policies.
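A typical DNS-allow policy to pair with default-deny (selectors vary by cluster; this assumes the standard kubernetes.io/metadata.name namespace label):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}          # all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
```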

39. How do you scan for vulnerabilities in a Kubernetes environment?

A multi-layer approach:

Container image scanning: Trivy for CVEs in CI pipelines

trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:1.2.0

Cluster scanning: kube-bench (CIS benchmarks), kubescape (misconfigurations)

kubescape scan framework nsa

Runtime security: Falco for anomaly detection, gVisor/Kata for sandboxed execution.
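Falco rules are condition/output pairs evaluated over syscall events. A simplified sketch, loosely adapted from Falco's stock shell-detection rule:

```yaml
- rule: Shell spawned in container
  desc: Detect an interactive shell started inside a container
  condition: spawned_process and container and proc.name in (bash, sh, zsh)
  output: "Shell in container (user=%user.name container=%container.name cmd=%proc.cmdline)"
  priority: WARNING
```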

Admission control (prevent bad configs from deploying):

# Kyverno policy - block images using the 'latest' tag
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-tag
      match:
        resources:
          kinds:
            - Pod
      validate:
        message: "Images must have a tag that is not 'latest'"
        pattern:
          spec:
            containers:
              - image: "!*:latest"

40. What are Admission Controllers and why are they important for security?

Admission Controllers intercept requests to the API server after authentication and authorization but before the object is persisted. They can validate, mutate, or reject requests.

Two types:

  • Validating: Check if the request meets criteria, reject if not
  • Mutating: Modify the request (inject sidecars, add labels, set defaults)

Built-in examples: LimitRanger (default resource limits), PodSecurity (Pod Security Standards), MutatingAdmissionWebhook / ValidatingAdmissionWebhook (call external services). Custom webhooks are how Istio, cert-manager, and policy engines (Kyverno, OPA Gatekeeper) integrate with Kubernetes.
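The wiring for a custom webhook is a ValidatingWebhookConfiguration (or the Mutating equivalent) pointing the API server at a Service in the cluster. A trimmed sketch with illustrative names:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: policy-checker               # hypothetical
webhooks:
  - name: check.policy.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail              # reject requests if the webhook is down
    clientConfig:
      service:
        name: policy-checker
        namespace: policy-system
        path: /validate
      caBundle: <base64 CA cert>     # placeholder
    rules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
```

failurePolicy is the trade-off to mention in an interview: Fail is safer but makes the webhook a single point of failure for deployments.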

The real interview answer: Admission controllers are the enforcement point for security policies. Without them, you're relying on developers to always set SecurityContexts correctly, never use latest tags, and always include resource limits. That doesn't scale. Policy-as-code via admission controllers is how mature teams handle this.


Section 6: Operations (Questions 41-46)

This is where interviews shift from "do you know the concepts?" to "have you actually run this in production?"

41. How do rolling updates work?

When you update a Deployment (e.g., change the container image), Kubernetes performs a rolling update by default. It creates a new ReplicaSet, gradually scales it up while scaling the old one down.

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%          # max extra pods during update
      maxUnavailable: 25%    # max pods that can be down

With 4 replicas, maxSurge: 25% and maxUnavailable: 25%:

  • At most 5 Pods exist at once (4 + 25% surge)
  • At least 3 Pods are always available (4 - 25% unavailable)
  • Kubernetes creates 1 new Pod, waits for it to be ready, terminates 1 old Pod, repeat

# Trigger a rolling update
kubectl set image deployment/api-server api=api-server:2.0.0

# Watch the rollout
kubectl rollout status deployment/api-server

# Check rollout history
kubectl rollout history deployment/api-server

Alternative strategy: Recreate - kills all old Pods before creating new ones. Use when the app can't have two versions running simultaneously (e.g., database schema incompatibility).
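In the Deployment spec, Recreate is a one-liner:

```yaml
spec:
  strategy:
    type: Recreate    # terminate all old Pods first, then create the new ones
```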

42. How do you roll back a failed deployment?

Kubernetes keeps a history of ReplicaSets, so you can roll back to any previous revision.

# Rollback to the previous version
kubectl rollout undo deployment/api-server

# Rollback to a specific revision
kubectl rollout history deployment/api-server
kubectl rollout undo deployment/api-server --to-revision=3

# Check the current revision and image
kubectl describe deployment api-server

The number of revisions kept is controlled by spec.revisionHistoryLimit (default: 10). Set this too low and you lose rollback targets.

In practice, most teams in 2026 use GitOps for rollbacks - revert the commit in Git and let ArgoCD or Flux reconcile. This gives you a complete audit trail and doesn't rely on Kubernetes' built-in revision history.

The real interview answer: Know the kubectl commands, but also mention that production rollbacks should be automated. A good setup includes automatic rollback when health checks fail, either through a progressive delivery tool (Argo Rollouts, Flagger) or a CI/CD pipeline that monitors deployment health and reverts on failure.

43. How does the Horizontal Pod Autoscaler (HPA) work?

HPA automatically scales the number of Pod replicas based on observed metrics - CPU utilization, memory, or custom metrics.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 25
          periodSeconds: 120

Key requirements:

  • Pods must have resource requests defined (HPA uses these as the baseline)
  • The Metrics Server must be installed in the cluster
  • For custom metrics, you need a metrics adapter (Prometheus Adapter, Datadog, etc.)

The behavior section controls scaling speed. Scale up fast (60s stabilization) and scale down slow (300s) to avoid flapping.

44. What is VPA and how does it differ from HPA?

HPA (Horizontal Pod Autoscaler) scales the number of replicas. VPA (Vertical Pod Autoscaler) adjusts the CPU and memory requests/limits of existing Pods.

VPA has three modes:

  • Off: Only provides recommendations, doesn't change anything
  • Initial: Sets resources only on Pod creation
  • Auto: Evicts and recreates Pods with updated resources (disruptive)

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"        # recommendation only
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: "100m"
          memory: "128Mi"
        maxAllowed:
          cpu: "4"
          memory: "8Gi"

# See VPA recommendations
kubectl describe vpa api-vpa

Important: You generally shouldn't use HPA and VPA on the same metric for the same Deployment. They'll fight each other. Common pattern: use HPA for CPU-based scaling and VPA (in recommendation mode) to right-size your resource requests over time.

45. How does the Cluster Autoscaler work?

The Cluster Autoscaler adjusts the number of nodes in your cluster. It:

Scales up when Pods are stuck in Pending because no node has enough resources to schedule them.

Scales down when nodes are underutilized (below a threshold, typically 50%) and all Pods on the node can be moved elsewhere.

# On EKS, configure via node group settings
# On GKE, enable per node pool
# Karpenter (AWS alternative) is increasingly popular

Karpenter (created by AWS, gaining wide adoption in 2026) takes a different approach - instead of scaling node groups, it provisions individual nodes based on Pod requirements. It's faster and more efficient at bin-packing.

# Karpenter NodePool
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "m6i.large"]
      expireAfter: 720h
  limits:
    cpu: "100"
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized

The real interview answer: Know the difference between the Cluster Autoscaler and Karpenter. CA scales node groups (slow, limited instance type flexibility). Karpenter provisions individual right-sized nodes (fast, flexible). If the company uses EKS, they probably want to hear about Karpenter.

46. What monitoring and observability tools do you use with Kubernetes?

The standard stack in 2026:

Metrics: Prometheus + Grafana

# Install kube-prometheus-stack (Prometheus + Grafana + alerting)
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

Key metrics: Pod CPU/memory vs requests/limits, restart counts, node utilization, API server latency, etcd health.

Logging: Fluentbit/Fluentd shipping to Loki or OpenSearch.

Tracing: OpenTelemetry Collector to Jaeger or Grafana Tempo.

Alerting examples:

# PrometheusRule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-alerts
spec:
  groups:
    - name: pod.rules
      rules:
        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.pod }} is crash looping"
        - alert: HighMemoryUsage
          expr: |
            container_memory_working_set_bytes / container_spec_memory_limit_bytes > 0.9
          for: 5m
          labels:
            severity: warning

The three pillars - metrics, logs, and traces - should correlate. When an alert fires, you should be able to jump from the metric to related logs to the distributed trace. OpenTelemetry is becoming the standard way to achieve this.


Section 7: Advanced Topics (Questions 47-50)

These questions separate senior engineers from everyone else. They test architectural thinking and real-world operational maturity.

47. What are Custom Resources and Operators?

Custom Resource Definitions (CRDs) extend the Kubernetes API with your own resource types. An Operator is a controller that watches those custom resources and takes action - it encodes operational knowledge in code.

Once a CRD is registered, you can create instances of it just like any built-in resource:

apiVersion: example.com/v1
kind: Database
metadata:
  name: orders-db
spec:
  engine: postgres
  version: "16"
  replicas: 3
  storage: 100Gi

The Operator watches for Database resources and handles the complex lifecycle - provisioning instances, configuring replication, performing backups, handling failover. Instead of writing runbooks, you encode operational knowledge into code.
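The CRD that registers this Database type might look like the following (the example.com group and schema fields mirror the instance above and are illustrative):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com        # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: Database
    plural: databases
    singular: database
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine: {type: string}
                version: {type: string}
                replicas: {type: integer}
                storage: {type: string}
```

Once applied, kubectl get databases works like any built-in resource.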

Popular Operators in production:

  • CloudNativePG: PostgreSQL operator
  • Strimzi: Kafka operator
  • Prometheus Operator: Monitoring stack
  • cert-manager: TLS certificate management

The real interview answer: Operators are the answer to "how do we run stateful, complex software on Kubernetes without 3am pager duty?" They automate Day 2 operations - the stuff that happens after initial deployment. If you've built or significantly operated an Operator, definitely mention it. It signals senior-level Kubernetes experience.

48. What is Helm and how does it work?

Helm is the package manager for Kubernetes. It uses charts (packages of templated YAML) to deploy and manage applications.

# Add a chart repository
helm repo add bitnami https://charts.bitnami.com/bitnami

# Search for charts
helm search repo nginx

# Install a chart
helm install my-nginx bitnami/nginx \
  --namespace web \
  --create-namespace \
  --set replicaCount=3 \
  --set service.type=ClusterIP \
  -f custom-values.yaml

# Upgrade a release
helm upgrade my-nginx bitnami/nginx -f custom-values.yaml

# Rollback
helm rollback my-nginx 1

# List releases
helm list -A

A chart contains Chart.yaml (metadata), values.yaml (defaults), and a templates/ directory with Go-templated Kubernetes YAML. Helm renders templates with values to produce final manifests.
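A template excerpt, to make the rendering model concrete (the .Values keys are illustrative - they must match what your values.yaml defines):

```yaml
# templates/deployment.yaml (excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-{{ .Chart.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
```

helm template renders this locally so you can inspect the final manifests before installing.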

Helm vs Kustomize is a common debate. Helm is better for distributing reusable packages. Kustomize is better for simple overlay-based customization. Many teams use both - Helm for third-party software, Kustomize for internal apps.

49. What is GitOps and how does it work with Kubernetes?

GitOps uses Git as the single source of truth for declarative infrastructure and applications. A GitOps operator runs in the cluster, continuously watches a Git repository, and reconciles the cluster state to match what's in Git.

Two popular tools:

ArgoCD:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api-server
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/k8s-manifests.git
    targetRevision: main
    path: apps/api-server/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true           # delete resources removed from Git
      selfHeal: true        # revert manual cluster changes
    syncOptions:
      - CreateNamespace=true

Flux:

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: app-manifests
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/org/k8s-manifests.git
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: api-server
  namespace: flux-system
spec:
  interval: 5m
  path: ./apps/api-server/production
  sourceRef:
    kind: GitRepository
    name: app-manifests
  prune: true

Why GitOps matters: complete audit trail (every change is a Git commit), easy rollback (git revert), drift detection (manual changes get reverted), and consistent deployments across all environments.

The real interview answer: GitOps isn't just "store YAML in Git." The key principle is that the cluster state is continuously reconciled to match Git - not just deployed from Git. This means manual changes get reverted, drift gets detected, and the Git repo is always the truth. If someone asks "how do you deploy to production?" and you say "I run kubectl from my laptop," that's a red flag.

50. How do you manage multi-cluster Kubernetes?

Multi-cluster is the reality for most organizations in 2026. Reasons include geographic distribution, environment isolation, blast radius reduction, and compliance requirements.

Key patterns:

Federation / multi-cluster management:

  • Cluster API: Declarative cluster lifecycle management - provision, upgrade, and delete clusters
  • Fleet management: Rancher, Google Anthos, or Azure Arc for centralized control

Cross-cluster networking: Submariner, Cilium Cluster Mesh, or Istio multi-cluster for cross-cluster communication. DNS-based routing with weighted records for global load balancing.

Multi-cluster GitOps:

# ArgoCD ApplicationSet - deploy to multiple clusters
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: api-server
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
  template:
    metadata:
      name: 'api-server-{{name}}'
    spec:
      source:
        repoURL: https://github.com/org/k8s-manifests.git
        path: apps/api-server/production
      destination:
        server: '{{server}}'
        namespace: production

Configuration management across clusters:

# kubectx - switch between cluster contexts
kubectx production-us-east
kubectx staging-eu-west

# kubectl with explicit context
kubectl get pods --context=production-us-east

Common topologies: hub and spoke (management + workload clusters), active-active (traffic distributed across regions), active-passive (primary with standby failover).

The real interview answer: Multi-cluster is about managing complexity. The technology matters less than having a strategy for: how clusters are provisioned, how workloads deploy consistently, how services discover each other, and how failover works. If you've operated multi-cluster in production, walk through your architecture. That's far more impressive than listing tools.


Wrapping Up

These 50 questions cover the ground that Kubernetes interviews actually test in 2026. A few final tips:

For junior/mid-level roles: Nail sections 1-3. Know the core concepts cold, understand configuration management, and have a working knowledge of networking. That's 80% of what you'll be asked.

For senior/staff roles: Sections 4-7 are where you differentiate yourself. Interviewers want to hear about real production experience - storage trade-offs you've made, security incidents you've handled, scaling challenges you've solved. Tell stories, not definitions.

For all levels:

  • Set up a local cluster (kind or minikube) and actually run these examples
  • Break things on purpose - delete a Pod during a rolling update, exhaust node memory, misconfigure a NetworkPolicy - and learn how to diagnose and fix them
  • Know kubectl deeply: kubectl explain, kubectl get -o yaml, kubectl describe, kubectl logs, kubectl exec

Kubernetes is complex, but the interview questions are predictable. Study these 50 questions, understand the "why" behind each answer, and you'll be well-prepared.

If you're also preparing for other technical interviews, check out our top 50 Python interview questions, top 50 Java interview questions, and system design interview guide. Good luck out there.