Top 50 AWS Interview Questions in 2026 (With Real Answers)

AWS interview loops in 2026 have shifted. The trivia round (name 5 storage classes, recite IAM types) still exists at some shops, but the meat of every senior cloud or platform interview is now architecture judgment under cost pressure. Companies got tired of paying $80K/month for what should have cost $9K, and they want engineers who push back on bad designs.

Here are the 50 questions that actually came up in AWS interviews this cycle, organized from fundamentals to architecture.

Networking and Account Structure (1-10)

1. Walk me through a request from the public internet to a Lambda behind API Gateway.

DNS → CloudFront edge (if used) → API Gateway regional or edge endpoint → request validation, authorizer if configured → integration target (Lambda) → permission check (the function's resource-based policy must allow API Gateway to invoke it) → Lambda execution environment → response back through the chain. Watch for cold starts, payload limits (10 MB API Gateway, 6 MB Lambda sync), and the 29-second default integration timeout on API Gateway.

2. What's the difference between a public, private, and isolated subnet?

Public subnet: route table has a route to an internet gateway. Resources can be reachable from the internet if they have a public IP.
Private subnet: no route to IGW. Outbound internet via NAT Gateway in a public subnet.
Isolated subnet: no route to IGW or NAT. Used for databases or anything that should never touch the internet, even outbound.

3. NAT Gateway vs NAT Instance - which would you pick in 2026?

NAT Gateway, almost always. It's managed, scales automatically (up to 100 Gbps per gateway at the time of writing), and is the AWS-supported path. NAT Instances are only worth considering for cost-sensitive dev environments, and even then a NAT Gateway in a small VPC is rarely the budget killer.

4. How do you give a private Lambda access to S3 without going over the internet?

A Gateway VPC Endpoint for S3. It's free and keeps S3 traffic on the AWS network. DynamoDB is the only other service with a gateway endpoint; everything else uses Interface Endpoints (PrivateLink), which cost about $0.01/hour per AZ plus a per-GB data charge.

5. Explain Transit Gateway vs VPC Peering.

VPC Peering is one-to-one and non-transitive. Fine for two VPCs. Transit Gateway is a hub-and-spoke router that connects many VPCs and on-prem networks (via VPN or Direct Connect). Use TGW once you have more than 3-4 VPCs or any hybrid networking.

6. What is AWS Organizations and why do you care?

Multi-account management. You get consolidated billing, Service Control Policies (preventive guardrails that even a member account's root user can't bypass), and Organizational Units. In 2026, every serious AWS environment is multi-account: prod, staging, sandbox, security, log archive, at minimum.

7. What's a Service Control Policy?

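A policy applied at the root, OU, or account level in AWS Organizations that caps what IAM principals in the affected accounts can do. SCPs never grant permissions. They only limit what identity and resource policies are allowed to grant. Common pattern: deny all actions in regions you don't operate in, using the aws:RequestedRegion condition key and exempting global services such as IAM.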
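The region-deny pattern can be sketched as a policy document built in Python. This is a minimal sketch, not a vetted guardrail: the allowed regions and the list of exempted global services are assumptions for illustration, and a real SCP needs an exemption list matched to the services you actually use.

```python
import json

# Assumed values for illustration only.
ALLOWED_REGIONS = ["us-east-1", "eu-west-1"]
GLOBAL_SERVICE_ACTIONS = ["iam:*", "organizations:*", "route53:*", "cloudfront:*", "sts:*"]


def build_region_deny_scp(allowed_regions, exempt_actions):
    """Return an SCP document that denies requests made to any other region."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedRegions",
                "Effect": "Deny",
                # NotAction leaves the exempted global services untouched.
                "NotAction": sorted(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": sorted(allowed_regions)}
                },
            }
        ],
    }


scp = build_region_deny_scp(ALLOWED_REGIONS, GLOBAL_SERVICE_ACTIONS)
print(json.dumps(scp, indent=2))
```

Note the statement is a Deny with a condition, not an Allow list. That fits the SCP model: it removes permissions, it never hands any out.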

8. How would you set up cross-account access for a CI/CD role to deploy into prod?

In the prod account, create an IAM role with a trust policy that allows the CI/CD account's role to assume it. In the CI/CD account, give the CI role an identity policy that allows sts:AssumeRole on that prod role's ARN. Use ExternalId if a third party is involved. Never share long-lived access keys across accounts.

9. What's the difference between an SG (Security Group) and a NACL?

SG: stateful, attached to ENIs, allow rules only.
NACL: stateless, attached to subnets, allow and deny rules, evaluated in order.

Use SGs as your primary firewall. Use NACLs for blunt-force blocks (e.g., block a known bad IP at the subnet edge).

10. Describe a typical VPC layout you'd build for a new web app.

Three AZs. Public subnets for ALB and NAT Gateway. Private subnets for app tier (ECS/EKS/EC2). Isolated subnets for RDS. Single TGW attachment if multi-VPC. CIDR /16 with /20 subnets so you have room to grow. Flow logs enabled to S3 with 7-day retention.
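To make the CIDR math concrete, here is a small sketch using Python's standard ipaddress module. The 10.0.0.0/16 range and the AZ labels are assumed examples; the point is only that nine /20 subnets (three tiers across three AZs) fit in a /16 with room left over.

```python
import ipaddress

VPC_CIDR = "10.0.0.0/16"  # assumed example range
AZS = ["a", "b", "c"]     # three AZs, as in the layout above
TIERS = ["public", "private", "isolated"]


def plan_subnets(vpc_cidr, azs, tiers, new_prefix=20):
    """Assign one subnet per (tier, AZ) pair, in order, from the VPC range."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = {}
    for tier in tiers:
        for az in azs:
            plan[(tier, az)] = next(blocks)
    return plan


plan = plan_subnets(VPC_CIDR, AZS, TIERS)
for (tier, az), block in plan.items():
    print(f"{tier:<9} az-{az}  {block}")

# A /16 holds 2**(20-16) = 16 blocks of size /20.
spare = 2 ** (20 - 16) - len(plan)
print(f"{spare} spare /20 blocks left for growth")
```

Each /20 holds 4,096 addresses, which is generous for most app tiers. The spare blocks are what "room to grow" means in practice.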

IAM and Security (11-20)

11. What's the difference between IAM Users, Roles, and Federated Identities?

User: long-lived identity with credentials, mostly legacy.
Role: temporary credentials, assumed by services or federated principals. The right answer for almost everything in 2026.
Federated Identity: external IdP (Google, Okta, GitHub OIDC) mapped to a role via SAML or OIDC.

If you're creating IAM Users in 2026, you usually shouldn't be.

12. What is GitHub Actions OIDC and why does everyone use it now?

GitHub OIDC lets a workflow assume an IAM role using a short-lived token instead of stored access keys. You configure the GitHub OIDC provider in IAM and a role with a trust policy scoped to specific repos and branches. No more leaked AWS keys in Actions secrets.
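A sketch of what that trust policy can look like, built as a Python dict. The account ID, repository, and branch are placeholders, not values from any real setup; the condition keys follow the pattern GitHub and AWS document for this integration, but check the current docs before copying.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def build_github_trust_policy(account_id, repo, branch):
    """Trust policy letting one repo and branch assume the role via GitHub OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Audience must be STS; subject pins the repo and branch.
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                        f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


# Placeholder account ID and repository name.
policy = build_github_trust_policy("111122223333", "example-org/example-repo", "main")
print(json.dumps(policy, indent=2))
```

The sub condition is the part interviewers probe. A trust policy that checks only the audience lets any GitHub repository assume the role.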

13. Explain the difference between an identity policy and a resource policy.

Identity policies attach to principals (users/roles) and say "this principal can do X." Resource policies attach to resources (S3 buckets, Lambda functions, KMS keys) and say "these principals can do X to me." For cross-account access, you usually need both sides aligned.

14. What does `aws:PrincipalOrgID` do?

A condition key that restricts a resource policy to principals in a specific AWS Organization. Critical for securing S3 buckets that need to be accessible from any account in your org but no others. Without it, "Allow s3:GetObject from any principal in my account list" is a maintenance nightmare.

15. How do you rotate IAM access keys safely?

Don't have them in the first place if you can avoid it. If you must: create a second key, deploy it everywhere the first key is used, verify all callers are using the new key, then deactivate and delete the old one. Automate it (for example, a Secrets Manager secret with a custom rotation Lambda) if you're doing this regularly.

16. What's the difference between KMS, Secrets Manager, and Parameter Store?

KMS: cryptographic keys for encryption at rest. Encrypts other things.
Parameter Store: free for standard parameters, holds config. Can hold SecureString backed by KMS.
Secrets Manager: paid ($0.40/secret/month), supports automatic rotation, integrates with RDS/Redshift.

Use Parameter Store for config. Use Secrets Manager for credentials that need rotation.

17. How would you encrypt an S3 bucket and prevent unencrypted uploads?

Set bucket default encryption (SSE-S3 or SSE-KMS) and add a bucket policy that denies s3:PutObject if s3:x-amz-server-side-encryption is missing or wrong. Belt and suspenders.
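The deny half of that answer can be sketched as a two-statement bucket policy. The bucket name is a placeholder and the choice of aws:kms as the required value is an assumption; swap in AES256 if the bucket uses SSE-S3.

```python
import json

SSE_HEADER_KEY = "s3:x-amz-server-side-encryption"


def build_encryption_bucket_policy(bucket, required_sse="aws:kms"):
    """Deny PutObject when the SSE header is absent or not the required value."""
    object_arn = f"arn:aws:s3:::{bucket}/*"

    def deny(sid, condition):
        return {
            "Sid": sid,
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": object_arn,
            "Condition": condition,
        }

    return {
        "Version": "2012-10-17",
        "Statement": [
            deny("DenyMissingEncryptionHeader", {"Null": {SSE_HEADER_KEY: "true"}}),
            deny("DenyWrongEncryptionHeader", {"StringNotEquals": {SSE_HEADER_KEY: required_sse}}),
        ],
    }


bucket_policy = build_encryption_bucket_policy("example-prod-bucket")  # placeholder name
print(json.dumps(bucket_policy, indent=2))
```

One tradeoff worth saying out loud: with the first statement in place, clients that rely on the bucket default and send no encryption header are rejected, so every uploader has to set the header explicitly.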

18. What is `iam:PassRole` and why do interviewers love it?

It's the permission to hand a role to a service. If a developer can PassRole a powerful role to a Lambda they create, they can effectively impersonate that role. Tightly scope iam:PassRole to the exact roles a principal needs to pass.

19. Walk me through your IAM policy for a developer who needs to read but not delete from production S3.

Allow s3:ListBucket on arn:aws:s3:::prod-bucket (the bucket) and s3:GetObject on arn:aws:s3:::prod-bucket/* (the objects). Explicit Deny on s3:DeleteObject, s3:PutObject, s3:DeleteBucket. Add an MFA condition if it's truly sensitive.
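Written out as a policy document, the answer might look like the sketch below. The bucket name comes from the answer above; the statement IDs are made up for readability. The detail candidates miss is that ListBucket is a bucket-level action while GetObject is object-level, so they target different ARNs.

```python
import json


def build_read_only_policy(bucket):
    """Identity policy: list and read objects, explicitly deny writes and deletes."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    object_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Bucket-level action goes on the bucket ARN.
            {"Sid": "ListBucket", "Effect": "Allow",
             "Action": "s3:ListBucket", "Resource": bucket_arn},
            # Object-level action goes on the object ARN pattern.
            {"Sid": "ReadObjects", "Effect": "Allow",
             "Action": "s3:GetObject", "Resource": object_arn},
            # Explicit deny wins over any allow attached elsewhere.
            {"Sid": "DenyWritesAndDeletes", "Effect": "Deny",
             "Action": ["s3:DeleteObject", "s3:PutObject", "s3:DeleteBucket"],
             "Resource": [bucket_arn, object_arn]},
        ],
    }


read_only = build_read_only_policy("prod-bucket")
print(json.dumps(read_only, indent=2))
```

The explicit Deny is redundant against this policy alone, since nothing here allows a delete. It earns its place when the developer also picks up broader permissions from another group or role.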

20. What's the AWS shared responsibility model and what does it actually mean for you?

AWS secures the cloud (hardware, hypervisor, managed service internals). You secure what's in the cloud (IAM, OS patches on EC2, network config, app code, data classification). Junior engineers think "AWS handles security." Senior engineers know exactly which line they're sitting on for each service.

Compute (21-30)

21. EC2 vs ECS vs EKS vs Lambda vs Fargate - how do you pick?

Lambda: event-driven, sub-second compute, no servers to manage. Cap at 15 min execution.
Fargate (with ECS or EKS): containers without managing nodes. Slightly more expensive per vCPU than EC2.
ECS on EC2: containers when you want full control or cheaper steady-state compute.
EKS: Kubernetes. Use it when you already speak Kubernetes or need its ecosystem.
EC2: when nothing else fits (legacy software, GPUs, very specific OS, sub-millisecond latency requirements).

In 2026 the default for new web services is Fargate or Lambda. EC2 is legacy or specialized.

22. What's a spot instance and when would you use one?

Unused EC2 capacity at 60-90% discount. AWS can reclaim it with 2 minutes notice. Use for: batch jobs, CI runners, fault-tolerant workers, anything stateless that can retry. Don't use for: anything that holds state and can't checkpoint.

23. Explain Lambda concurrency, reserved concurrency, and provisioned concurrency.

Concurrency: the number of in-flight invocations at a given moment. The account-level limit is per region (default 1,000) and shared by every function in that region.
Reserved concurrency: cap a function to N concurrent executions (also serves as a minimum guarantee).
Provisioned concurrency: pre-warm N execution environments to avoid cold starts. Costs money even when idle.
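A quick way to reason about these limits is the usual back-of-the-envelope estimate: concurrency is roughly requests per second times average duration. The sketch below uses integer milliseconds to avoid float rounding; the traffic numbers in the example are invented.

```python
def estimate_concurrency(requests_per_second, avg_duration_ms):
    """Approximate in-flight invocations: rate x duration, rounded up."""
    return -(-requests_per_second * avg_duration_ms // 1000)  # integer ceiling


def headroom(requests_per_second, avg_duration_ms, regional_limit=1000):
    """Concurrency left under the limit; negative means expect throttling."""
    return regional_limit - estimate_concurrency(requests_per_second, avg_duration_ms)


# Invented example workloads.
print(estimate_concurrency(200, 250))  # modest API
print(headroom(10_000, 120))           # burst that exceeds the default limit
```

If the headroom comes out negative, the options are a limit increase, a shorter function, or a queue in front to smooth the burst. Reserved concurrency on a noisy function protects the rest of the account from it.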

24. What is a Lambda cold start and how do you mitigate it?

When Lambda has to spin up a new execution environment, you eat hundreds of ms (worse with Java and large packages; VPC attachment adds far less than it did before the 2019 networking changes). Mitigations: provisioned concurrency, smaller packages, faster runtimes (Go, Rust, Node), lazy-load dependencies, keep the function out of a VPC if it doesn't need one, or use SnapStart for Java.

25. Describe how an ALB routes traffic.

Listener (port + protocol) → rules (host header, path, query string, headers) → target group → registered targets. Supports weighted routing, sticky sessions, and HTTP/2 to clients. gRPC works on ALB over HTTP/2; for non-HTTP protocols or very long-lived TCP connections you may want NLB instead.

26. ALB vs NLB - real differences?

ALB: layer 7, HTTP/HTTPS/gRPC, content-based routing, WAF integration.
NLB: layer 4, TCP/UDP/TLS (termination or pass-through), static IP per AZ, ultra-low latency, millions of requests per second.

Use ALB for typical web apps. Use NLB for static IPs, non-HTTP, or extreme throughput.

27. How do you do zero-downtime deploys on ECS?

Rolling update with minimumHealthyPercent and maximumPercent set so new tasks come up before old ones drain. Or blue/green via CodeDeploy with an ALB that flips listener rules. Health check grace period long enough for slow-starting containers.
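The two percentages translate into hard task-count bounds during a rolling update. This sketch assumes the rounding ECS documents (minimum healthy rounds up, maximum rounds down); the function name and example values are made up.

```python
def rolling_update_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (min_running, max_running) task counts during a rolling deploy."""
    # Assumed rounding: minimum healthy rounds up, maximum rounds down.
    min_running = -(-desired_count * minimum_healthy_percent // 100)
    max_running = desired_count * maximum_percent // 100
    return min_running, max_running


def can_make_progress(desired_count, minimum_healthy_percent, maximum_percent):
    """A deploy stalls if no task can be started or stopped within the bounds."""
    min_running, max_running = rolling_update_bounds(
        desired_count, minimum_healthy_percent, maximum_percent
    )
    return max_running > desired_count or min_running < desired_count


print(rolling_update_bounds(4, 100, 200))  # start new tasks before stopping old ones
print(rolling_update_bounds(3, 50, 150))
print(can_make_progress(4, 100, 100))      # no room to start or stop anything
```

Setting 100 and 200 gives the start-before-stop behavior described above. A 100/100 setting leaves the scheduler nothing to do, which is a common cause of stuck deployments.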

28. EKS - what's the difference between managed node groups, self-managed nodes, and Fargate?

Managed node groups: AWS provisions and manages EC2 nodes for you, but you pay for them and they live in your account.
Self-managed: you run your own ASG of EC2 nodes joined to the cluster. Maximum control.
Fargate: serverless pods, one pod per micro-VM. No node management, slightly more cost per pod-hour.

In 2026 most teams default to Fargate plus a managed node group for workloads Fargate can't run (DaemonSets, GPU, large pods).

29. What is IRSA (IAM Roles for Service Accounts) on EKS?

The mechanism for giving Kubernetes pods specific IAM permissions. The OIDC provider for the cluster maps a Kubernetes service account to an IAM role. Pods using that SA get scoped credentials. Required for any pod that needs AWS API access without bleeding cluster-wide perms.

30. How do you autoscale on AWS in 2026?

EC2: ASG with target tracking on CPU, ALB request count, or custom CloudWatch metric.
ECS Service: target tracking on CPU/memory or step scaling on custom metrics.
EKS: Cluster Autoscaler (or Karpenter, which is faster and more flexible for spot mix) plus HPA on workloads.
Lambda: automatic, just watch concurrency limits.

Karpenter has effectively replaced Cluster Autoscaler at most modern shops.

Storage and Data (31-40)

31. Walk me through S3 storage classes and when you'd use each.

Standard: hot data.
Intelligent-Tiering: don't know access pattern? This is the safe pick.
Standard-IA: rarely accessed, available in milliseconds.
One Zone-IA: same as IA but single AZ, cheaper.
Glacier Instant Retrieval: archive accessed once a quarter or less.
Glacier Flexible Retrieval: minutes to hours to restore.
Glacier Deep Archive: cheapest, 12-hour restore. Compliance archives.

In 2026 the easy default is Intelligent-Tiering, which moves objects automatically.

32. What is S3 strong consistency and when did it ship?

S3 has had strong read-after-write consistency for all operations since 2020. Old interview answers about "eventual consistency for overwrites" are wrong. Don't say it.

33. RDS vs Aurora vs DynamoDB vs Redshift - one sentence each.

RDS: managed PostgreSQL/MySQL/MariaDB/Oracle/SQL Server, EBS-backed.
Aurora: AWS's reimplementation of PostgreSQL/MySQL with a distributed storage layer, faster and more durable.
DynamoDB: managed NoSQL key-value/document, single-digit ms at any scale, pay-per-request or provisioned.
Redshift: columnar data warehouse for analytics across petabytes.

34. When would you pick DynamoDB over Aurora?

When you have known access patterns, need predictable single-digit millisecond latency at any scale, and don't need ad-hoc joins or aggregations. DynamoDB demands you design your schema around your queries, not the other way around.

35. Explain DynamoDB partition keys and hot partitions.

Items are sharded by hash of partition key. Uneven access (e.g., one tenant taking 90% of traffic) creates a hot partition that throttles. Mitigations: pick high-cardinality partition keys, write-shard with random suffixes, use adaptive capacity (mostly automatic now).
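Write sharding with random suffixes can be sketched in a few lines. The shard count, key format, and tenant ID here are assumptions for illustration; the point is that writes spread across N partition keys and reads have to fan out across all N.

```python
import random

SHARD_COUNT = 10  # assumed; size it to the hot key's write rate


def sharded_key(tenant_id, shard_count=SHARD_COUNT, rng=random):
    """Partition key for a write: the logical key plus a random shard suffix."""
    return f"{tenant_id}#{rng.randrange(shard_count)}"


def all_shard_keys(tenant_id, shard_count=SHARD_COUNT):
    """Every partition key a reader must query to see the whole tenant."""
    return [f"{tenant_id}#{n}" for n in range(shard_count)]


write_key = sharded_key("tenant-42")  # hypothetical tenant ID
read_keys = all_shard_keys("tenant-42")
print(write_key)
print(read_keys)
```

The cost is on the read side: one query becomes SHARD_COUNT queries merged in the application. If reads need to find a specific item, derive the suffix from a hash of the item ID instead of a random number.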

36. What's a DynamoDB GSI and how is it different from an LSI?

GSI (Global Secondary Index): different partition and sort key, eventually consistent, can be created any time.
LSI (Local Secondary Index): same partition key, different sort key, strongly consistent, must be created at table creation.

Modern designs use GSIs almost exclusively because LSIs lock you in.

37. How do you back up an RDS database without downtime?

Automated backups (point-in-time recovery up to 35 days) + manual snapshots for longer retention or cross-region copies. Aurora supports continuous backup to S3 with 1-second granularity. Test the restore - a backup you've never restored isn't a backup.

38. What is S3 Object Lock and when would you use it?

WORM (write-once-read-many) at the object level. Two modes: Governance (privileged users can override) and Compliance (nobody, not even root, can delete until retention expires). Required for some regulatory regimes (SEC 17a-4, financial). Also useful as a ransomware safety net.

39. Explain S3 Lifecycle policies.

Rules that automatically transition objects between storage classes or expire them. Common pattern: Standard → Standard-IA at 30 days → Glacier at 90 days → expire at 7 years. Saves real money on log archives and large object stores.
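The common pattern above maps onto a lifecycle configuration like the following sketch. The rule ID and the logs/ prefix are assumptions; the dict follows the shape the S3 lifecycle API expects, where GLACIER is the API name for Glacier Flexible Retrieval.

```python
import json

SEVEN_YEARS_IN_DAYS = 7 * 365


def build_log_lifecycle(prefix="logs/"):
    """Standard -> Standard-IA at 30d -> Glacier at 90d -> expire at 7 years."""
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": SEVEN_YEARS_IN_DAYS},
            }
        ]
    }


lifecycle = build_log_lifecycle()
print(json.dumps(lifecycle, indent=2))
```

A dict like this would typically be passed as the lifecycle configuration when applying the rule with an SDK or rendered into infrastructure-as-code. Scoping by prefix keeps the rule from archiving objects the app still reads.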

40. How do you stream data from RDS to a data warehouse in 2026?

DMS for ongoing replication, or Aurora's zero-ETL integration with Redshift (generally available since 2023, mature in 2026). For Postgres, logical replication via DMS or Debezium-on-MSK is the durable answer for analytics CDC.

Cost, Observability, and Architecture Judgment (41-50)

41. A team's monthly bill jumped from $20K to $80K. How do you investigate?

Cost Explorer with grouping by Service, Linked Account, and Usage Type. Usually one of: NAT Gateway data transfer, S3 data transfer, EBS gp2/io1 over-provisioned, Aurora I/O, an unbounded retry loop hammering DynamoDB, or someone left a c5.24xlarge on. Cost Anomaly Detection should have caught this; if it didn't, set it up.

42. What is a Savings Plan vs a Reserved Instance?

Savings Plans: commit to a $/hour spend on compute for 1 or 3 years. Apply automatically across EC2, Fargate, Lambda. The flexible default.
Reserved Instances: commit to a specific instance family/size in a region. Stricter, with a deeper discount than a Compute Savings Plan.

In 2026 Savings Plans cover most use cases. RIs are mostly for legacy or specific RDS reservations.

43. NAT Gateway is expensive. How do you reduce its cost?

VPC Endpoints for S3, DynamoDB, ECR, KMS, Secrets Manager - the big offenders.
Consolidate egress through fewer NAT Gateways across AZs (accept the AZ-failure tradeoff in non-critical envs).
Move chatty workloads (large container pulls) into the same region/AZ as the source.

VPC Endpoints often cut NAT cost by 70%+.
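A rough cost comparison shows where a number like that comes from. Every price below is an assumption used as a default parameter (illustrative list prices, not a quote); only the $0.01/hour per AZ endpoint figure appears earlier in this article. Plug in current pricing for your region before using the result.

```python
HOURS_PER_MONTH = 730  # assumed average


def monthly_nat_processing_cost(gb_per_month, nat_price_per_gb=0.045):
    """Data processing charge for traffic sent through a NAT Gateway."""
    return gb_per_month * nat_price_per_gb


def monthly_interface_endpoint_cost(gb_per_month, az_count,
                                    hourly_price_per_az=0.01, price_per_gb=0.01):
    """Hourly charge per AZ plus per-GB charge for one interface endpoint."""
    hourly = az_count * hourly_price_per_az * HOURS_PER_MONTH
    return hourly + gb_per_month * price_per_gb


# Invented workload: 10 TB/month of traffic to one AWS service, three AZs.
via_nat = monthly_nat_processing_cost(10_000)
via_endpoint = monthly_interface_endpoint_cost(10_000, az_count=3)
saving_pct = 100 * (via_nat - via_endpoint) / via_nat
print(f"NAT: ${via_nat:.2f}  endpoint: ${via_endpoint:.2f}  saving: {saving_pct:.0f}%")
```

The comparison ignores the NAT Gateway's own hourly charge, since you usually still need the gateway for other traffic. For S3 and DynamoDB the gateway endpoint is free, so the saving on that traffic is the full processing charge.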

44. What metrics would you put on a dashboard for a Lambda-backed API?

Invocations, Errors, Throttles, Duration p50/p95/p99 from CloudWatch.
API Gateway 4xx, 5xx, IntegrationLatency, Latency.
Cold start ratio (custom metric or CloudWatch Logs Insights query).
Concurrent executions vs account/regional limit.
Downstream dependency error rate (DynamoDB throttles, RDS errors).

45. Explain CloudWatch Logs vs CloudTrail vs Config.

CloudWatch Logs: app and system logs.
CloudTrail: API call audit log (who did what to AWS).
Config: state of resources over time and compliance rules.

Stack all three for a serious environment. CloudTrail and Config logs must be locked down - they're forensic evidence.

46. Describe a multi-region strategy.

Three patterns:

Active/passive with DNS failover via Route 53 health checks. Cheaper, RTO in minutes.
Active/active with global services (Aurora Global Database, DynamoDB Global Tables, S3 CRR, Route 53 latency-based routing). Sub-second RTO, expensive.
Pilot light: scaled-down replica that scales up on disaster.

Pick based on RTO/RPO and budget. Most companies don't actually need active/active and discover this after spending six months building it.

47. What is AWS WAF and when do you use it?

A managed Web Application Firewall in front of CloudFront, ALB, or API Gateway. Rules for OWASP top 10, rate limiting, geo blocks, bot control. The Managed Rule Groups (AWS, F5, Fortinet) are the easy default. Always pair with rate-based rules for L7 DoS.

48. How do you handle secrets in a Lambda function?

Environment variables for non-secret config. Parameter Store SecureString or Secrets Manager for actual secrets, fetched at cold start (cache in module scope). Encrypt env vars with a customer KMS key if the secret can't be moved out. Never hardcode.

49. Walk me through a serverless architecture for a webhook ingestion pipeline at 10K req/sec.

API Gateway → Lambda authorizer → Kinesis Data Streams (or SQS if per-key ordering doesn't matter) → Lambda consumer → DynamoDB. Dead-letter queue (or on-failure destination) on the consumer. Buffer in Kinesis to absorb bursts. CloudWatch alarms on iterator age. The throughput target shapes the shard count: each provisioned shard accepts roughly 1,000 records/sec or 1 MB/sec of writes, so 10K req/sec needs at least 10 shards.
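The sizing step can be sketched as simple arithmetic. The per-shard write limits below (1,000 records/sec and 1 MB/sec, taken here as 1,000,000 bytes) are assumptions based on the provisioned-mode limits; the payload sizes are invented.

```python
def shards_needed(records_per_second, avg_record_bytes,
                  records_per_shard=1000, bytes_per_shard=1_000_000):
    """Shards required to absorb the write rate under both per-shard limits."""
    by_count = -(-records_per_second // records_per_shard)  # integer ceiling
    by_bytes = -(-records_per_second * avg_record_bytes // bytes_per_shard)
    return max(by_count, by_bytes, 1)


# 10K webhooks/sec with small payloads: the record-count limit dominates.
print(shards_needed(10_000, 512))
# Same rate with 2,000-byte payloads: the byte limit dominates.
print(shards_needed(10_000, 2000))
```

This is a floor, not a target. Add headroom for bursts and uneven partition keys, or use on-demand mode and let the stream scale itself while you watch the bill.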

50. You're given a system bill of $200K/month and asked to cut it 40%. What's your playbook?

Tag audit. If resources aren't tagged, you can't attribute or kill them.
Cost Explorer by service, then drill. The top three line items are often around 80% of the bill.
Right-size compute (Compute Optimizer recommendations).
Savings Plans for steady-state compute.
Storage tiering: S3 Lifecycle, EBS gp2 to gp3, right-sized RDS provisioned IOPS.
Data transfer audit. NAT Gateway, cross-AZ, internet egress.
Idle resource sweep: unattached EBS, idle ELBs, dev environments running 24/7.
Refactor hot spots: Aurora I/O, DynamoDB on-demand vs provisioned.

Communicate cuts as a roadmap, not a one-week sprint. The first 25% is easy. The last 15% requires architecture changes.

Final Note

The 2026 AWS interview filters for engineers who can read a bill, push back on architecture that's wrong and expensive, and articulate the security model without hand-waving. Memorize the trivia, but spend the most time on the questions in sections 4 and 5. That's where loops decide between mid and senior.

Good luck. Tag us on the offer letter.