gitGood.dev

Top 50 Terraform Interview Questions in 2026 (With Real Answers)

Pat
13 min read


Terraform interviews used to be "have you written some HCL?" Those days are gone. By 2026, every platform team expects you to reason about state corruption, multi-account topologies, drift detection, the OpenTofu split, and how to keep a team of 30 engineers from stepping on each other's plans.

These are the 50 questions that come up most often in platform, SRE, and DevOps loops in 2026.


Fundamentals (1-10)

1. What is Terraform and what problem does it solve?

A declarative infrastructure-as-code tool. You describe the desired end state in HCL, Terraform figures out the diff between that and reality, and applies the changes. It replaces clicking through cloud consoles or writing imperative scripts that drift from documentation the day after they ship.

2. HCL vs JSON for Terraform - which do you use?

HCL. It's the human-friendly format with comments, heredocs, expressions, and loops. JSON is supported for machine-generated configs but nobody hand-writes it.

3. What's the Terraform workflow?

init -> plan -> apply -> (changes) -> plan -> apply -> ... -> destroy

init downloads providers and modules and configures the backend. plan shows the diff. apply makes the change. destroy tears it all down.

4. What is a Terraform provider?

A plugin that knows how to talk to a specific API - AWS, Azure, GCP, Cloudflare, GitHub, Datadog, anything with a REST API. Providers translate HCL resources into API calls.
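Here is a minimal sketch of how a root module declares which provider it needs and then configures it. The version constraints and region are illustrative assumptions, not recommendations:

```hcl
terraform {
  required_version = ">= 1.6" # assumed minimum CLI version

  # Which provider plugin to download during terraform init, and from where
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed pin; allows any 5.x release
    }
  }
}

# Provider configuration: the region and credentials the plugin uses for API calls
provider "aws" {
  region = "us-east-1"
}
```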

5. What is the Terraform state file and why does it matter?

A JSON file mapping resources in your config to real-world IDs and metadata. It's the source of truth Terraform compares against during plan. Without state, Terraform has no idea what already exists. Lose state and you can end up duplicating resources or being unable to manage existing ones.

6. Local state vs remote state - which is acceptable in production?

Remote, always. Local state only works for solo learning. Remote state (S3 + DynamoDB lock, Terraform Cloud, GCS, Azure Blob) gives you team access, locking, and durability.

7. What is state locking and why is it required?

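When two engineers run apply at the same time, both can write state and the last write wins, silently dropping the other's changes or corrupting the file. State locking takes an exclusive lock for the duration of any state-writing operation. On AWS, DynamoDB has long been the standard lock store. Terraform 1.10 added S3-native locking (use_lockfile = true), and newer releases deprecate the DynamoDB lock arguments.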

8. Explain the difference between resource, data, and module blocks.

  • resource - something Terraform creates and manages (an EC2 instance, an S3 bucket)
  • data - something Terraform reads but doesn't manage (an existing VPC, current AWS account ID)
  • module - a reusable bundle of resources you call with inputs
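The following small configuration shows all three block types side by side. The bucket naming scheme, the ./modules/network path, and its cidr_block input are hypothetical:

```hcl
# data: read-only lookup of something that already exists (the current AWS account)
data "aws_caller_identity" "current" {}

# resource: created, updated, and destroyed by Terraform
resource "aws_s3_bucket" "artifacts" {
  bucket = "artifacts-${data.aws_caller_identity.current.account_id}"
}

# module: a reusable bundle of resources called with inputs (hypothetical local module)
module "network" {
  source     = "./modules/network"
  cidr_block = "10.0.0.0/16"
}
```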

9. What's the difference between terraform plan and terraform apply?

plan is a dry-run that shows what will change. apply actually makes the change. In CI you run plan -out=tfplan then apply tfplan so the apply is exactly what was reviewed.

10. How do you handle secrets in Terraform?

Never hardcode them in .tf files. Use environment variables, AWS Secrets Manager, HashiCorp Vault, or sops-encrypted variable files. Mark sensitive outputs with sensitive = true so they don't leak in plan output.
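Below is a sketch of the environment-variable approach. It assumes the CI job exports TF_VAR_db_password from Vault or Secrets Manager. It also shows why the sensitive flag matters: an output derived from a sensitive value must itself be marked sensitive, or Terraform refuses to plan. Resource names and sizes are illustrative:

```hcl
# Supplied at runtime (e.g. TF_VAR_db_password set by CI from a secret store),
# never written into a .tf or committed .tfvars file.
variable "db_password" {
  type      = string
  sensitive = true # redacted in plan/apply output, but still stored in state
}

resource "aws_db_instance" "main" {
  identifier          = "app-db"
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "app"
  password            = var.db_password
  skip_final_snapshot = true
}

# Derived from a sensitive variable, so this output must also be sensitive.
output "db_url" {
  value     = "postgres://app:${var.db_password}@${aws_db_instance.main.endpoint}/postgres"
  sensitive = true
}
```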


State Management (11-20)

11. What does the S3 + DynamoDB backend setup look like?

terraform {
  backend "s3" {
    bucket         = "my-tf-state"
    key            = "prod/network.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tf-locks"
    encrypt        = true
  }
}

S3 stores the state, DynamoDB holds the lock. Always enable versioning on the bucket so you can roll back from a corrupt state.
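The state bucket and lock table are usually created once by a small bootstrap configuration, often applied with local state. Here is a sketch that reuses the names from the backend block above (my-tf-state, tf-locks). The lock table's partition key must be a string attribute named LockID:

```hcl
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-tf-state" # must be globally unique in practice

  lifecycle {
    prevent_destroy = true # losing this bucket means losing all state
  }
}

# Keeps every previous state version so a corrupt write can be rolled back
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table referenced by the backend's dynamodb_table argument
resource "aws_dynamodb_table" "tf_locks" {
  name         = "tf-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```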

12. State got corrupted - how do you recover?

S3 versioning saves you. Check aws s3api list-object-versions for the bucket, find the last good version, restore it. If versioning wasn't enabled, hope you have a manual backup. If you don't, you're rebuilding state from scratch with terraform import.

13. What is terraform import and when do you use it?

It binds an existing real-world resource to a Terraform resource block. Use it when adopting Terraform on infra that was created manually, or after a state corruption recovery. Since Terraform 1.5, you can use an import block instead of the CLI command - it's reviewable in PRs.
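Here is a sketch of the import block workflow; the bucket name is hypothetical. On Terraform 1.5+, terraform plan -generate-config-out=generated.tf can also draft the resource block for you, which you then tidy up before committing:

```hcl
# Adopt a bucket that was originally created by hand in the console.
import {
  to = aws_s3_bucket.legacy_assets
  id = "legacy-assets-bucket" # the real bucket's name (hypothetical here)
}

# Matching resource block; a clean plan shows only the import, no changes.
resource "aws_s3_bucket" "legacy_assets" {
  bucket = "legacy-assets-bucket"
}
```

Once the import has been applied, the import block can be deleted; the resource block stays.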

14. Explain the terraform state subcommands you actually use.

  • state list - show all resources in state
  • state show <addr> - inspect one resource
  • state mv <src> <dst> - rename or move within modules without destroying
  • state rm <addr> - drop from state without destroying the real resource
  • state pull / push - read or write the raw state JSON

15. What's drift detection?

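Drift is when real infrastructure diverges from what your configuration and state describe - someone clicked in the console, or another tool modified resources. terraform plan surfaces drift. With -detailed-exitcode it exits 2 when changes are present, which makes it easy to alert on. Run drift detection on a schedule (CI cron) so you catch it before someone tries to apply unrelated changes.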

16. How do you split a giant state file?

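Two options. (1) Imperatively, with terraform state mv to migrate resources between separate state files - tedious but precise. (2) Declaratively (Terraform 1.7+), with a removed block using destroy = false in the old root module plus an import block in the new one, so both sides go through normal plan review. The moved block (Terraform 1.1+) is for renames and refactors within the same state, and is often a useful first step. Needing a big-state split is usually a sign you want separate root modules per environment or service.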
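Here is a sketch of the declarative split for one resource, assuming Terraform 1.7+ and a hypothetical shared logs bucket moving from a network root module to a logging root module. The two halves live in different root modules and are applied separately:

```hcl
# --- Old root module (e.g. network/) ---
# Delete the aws_s3_bucket.shared_logs resource block and add this instead.
# Terraform forgets the bucket without destroying it.
removed {
  from = aws_s3_bucket.shared_logs

  lifecycle {
    destroy = false
  }
}

# --- New root module (e.g. logging/) ---
# Adopt the same real bucket into this module's own state.
import {
  to = aws_s3_bucket.shared_logs
  id = "example-shared-logs" # hypothetical bucket name
}

resource "aws_s3_bucket" "shared_logs" {
  bucket = "example-shared-logs"
}
```

Applying the new module first and the old one second means the bucket is never unmanaged for long, although either order avoids a destroy.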

17. What are workspaces, and when shouldn't you use them?

terraform workspace keeps multiple named state files under one backend configuration, all driven by the same code. Useful for ephemeral environments such as per-PR previews. Anti-pattern for prod/staging/dev separation - those should be different state files (or different backends entirely) so running apply in the wrong workspace can't blow up prod.
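Here is a sketch of the ephemeral-environment use case: one configuration whose resource names are derived from the current workspace. The naming scheme and the force_destroy choice are illustrative assumptions:

```hcl
# One workspace per PR preview, created with e.g.:
#   terraform workspace new pr-123
#   terraform apply
locals {
  env = terraform.workspace # "default", "pr-123", ...
}

resource "aws_s3_bucket" "preview" {
  bucket        = "myapp-preview-${local.env}" # hypothetical naming scheme
  force_destroy = true                         # ephemeral: allow teardown even with objects inside
}
```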

18. Explain prevent_destroy.

A lifecycle meta-argument:

resource "aws_db_instance" "prod" {
  # engine, instance_class, and other required arguments omitted

  lifecycle {
    prevent_destroy = true
  }
}

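Terraform errors out on any plan that would destroy or replace a resource marked this way. Use it on databases, S3 buckets with customer data, anything irrecoverable. Caveat: the guard lives in the configuration, so deleting the whole resource block removes the protection along with it.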

19. What's a partial backend configuration?

You leave backend settings out of HCL and pass them at init time:

terraform init -backend-config=backend-prod.hcl

Lets you reuse the same code across environments without hardcoding bucket names per environment.
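Here is a sketch of the two files involved, reusing the values from the S3 backend example earlier. The first is committed once; the second is one of several per-environment files:

```hcl
# backend.tf: an empty backend block, shared by every environment
terraform {
  backend "s3" {}
}

# backend-prod.hcl: passed with terraform init -backend-config=backend-prod.hcl
bucket         = "my-tf-state"
key            = "prod/network.tfstate"
region         = "us-east-1"
dynamodb_table = "tf-locks"
encrypt        = true
```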

20. State file is checked into git. What do you do?

Rotate every secret in it (state stores resource attributes in plaintext, including database passwords and generated keys). Migrate to a remote backend immediately. Audit git history for everyone who could have read it, and purge the file from history. Add *.tfstate* to .gitignore. Then have a conversation with the team about backend setup.


Modules and Composition (21-30)

21. What is a Terraform module?

A directory of .tf files. The root module is your entry point. Child modules are reusable - call them with module "name" { source = "..." }. Modules are how teams scale Terraform without duplication.

22. Local module vs registry module - when do you use each?

Local (source = "./modules/vpc") - for project-internal patterns. Registry (source = "terraform-aws-modules/vpc/aws") - for community-vetted, well-documented building blocks. The community VPC module is so good most teams just use it instead of rolling their own.

23. How do you version modules?

Pin them. For registry modules: version = "~> 5.0". For Git modules: source = "git::https://github.com/org/mod.git?ref=v1.2.3". Never use main or master - your infrastructure shouldn't change because someone merged a PR upstream.
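Here is a sketch of the three source styles in one root module. The local path, bucket name, and Git URL are placeholders; the VPC inputs are standard arguments of that registry module:

```hcl
# Local module: project-internal pattern, versioned together with the repo
module "app_bucket" {
  source = "./modules/bucket" # hypothetical path
  name   = "example-app-assets"
}

# Registry module: pinned with a pessimistic constraint (any 5.x)
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name            = "main"
  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]
}

# Git module: pinned to a tag via ?ref= (the version argument only works for registry sources)
module "shared" {
  source = "git::https://github.com/org/mod.git?ref=v1.2.3"
}
```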

24. What goes in a good module's interface?

Required inputs (no defaults), optional inputs (sensible defaults), and outputs other modules need. Keep the surface small. A module with 80 inputs is a sign it's doing too much - split it.
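Here is a sketch of a deliberately small module interface, using a hypothetical bucket module. It has one required input with validation, one optional input with a conservative default, and one output that other modules consume:

```hcl
# Required: no default, so callers must decide
variable "name" {
  description = "Bucket name."
  type        = string

  validation {
    condition     = length(var.name) >= 3 && length(var.name) <= 63
    error_message = "S3 bucket names must be 3-63 characters."
  }
}

# Optional: safe default that callers rarely need to touch
variable "force_destroy" {
  description = "Allow deleting the bucket while it still contains objects."
  type        = bool
  default     = false
}

resource "aws_s3_bucket" "this" {
  bucket        = var.name
  force_destroy = var.force_destroy
}

output "arn" {
  description = "Consumed by IAM policies in other modules."
  value       = aws_s3_bucket.this.arn
}
```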

25. Explain for_each vs count.

  • count - integer, indexed by number. Removing item 0 shifts all others.
  • for_each - map or set, indexed by key. Stable across additions and deletions.

Default to for_each. count is fine for "create N identical things" but breaks badly when you need to remove one.
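Here is a sketch that contrasts the two from the same list of names. The "example-" prefix is a hypothetical naming scheme. The comments describe what happens if "logs" is later removed from the list:

```hcl
variable "bucket_suffixes" {
  type    = list(string)
  default = ["logs", "artifacts", "backups"]
}

# count: addresses are by_index[0], by_index[1], by_index[2].
# Removing "logs" shifts every later element down one index, so Terraform
# plans to replace by_index[0] and by_index[1] and destroy by_index[2].
resource "aws_s3_bucket" "by_index" {
  count  = length(var.bucket_suffixes)
  bucket = "example-${var.bucket_suffixes[count.index]}"
}

# for_each: addresses are by_name["logs"], by_name["artifacts"], ...
# Removing "logs" destroys only by_name["logs"]; the others are untouched.
resource "aws_s3_bucket" "by_name" {
  for_each = toset(var.bucket_suffixes)
  bucket   = "example-${each.key}"
}
```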

26. What's a dynamic block?

Generates nested config blocks dynamically:

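resource "aws_security_group" "web" {
  dynamic "ingress" {
    for_each = var.ingress_rules
    content {
      from_port   = ingress.value.from
      to_port     = ingress.value.to
      protocol    = "tcp" # required argument; assumes TCP-only rules
      cidr_blocks = ingress.value.cidrs
    }
  }
}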

Use sparingly - they make code harder to read. Only when the alternative is repetitive copy-paste.
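The dynamic ingress block expects var.ingress_rules to expose from, to, and cidrs on each element. Here is a sketch of a typed variable with that shape; the default rules are illustrative only:

```hcl
variable "ingress_rules" {
  description = "Inbound rules rendered by the dynamic ingress block."
  type = list(object({
    from  = number
    to    = number
    cidrs = list(string)
  }))
  default = [
    { from = 443, to = 443, cidrs = ["0.0.0.0/0"] },
    { from = 22, to = 22, cidrs = ["10.0.0.0/8"] },
  ]
}
```

Typing the variable as an object list makes a missing or misspelled field fail at plan time, rather than surfacing as a confusing error inside the dynamic block.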

27. How do you compose modules - root module orchestrates children, or modules call modules?

Both work. Most teams use a root module that calls child modules in a flat structure. Modules calling modules can work but adds layers of indirection. Keep it as flat as your team can tolerate.

28. Where do environment-specific values live?

In tfvars files: prod.tfvars, staging.tfvars. Pass with -var-file=prod.tfvars. Or use Terragrunt for DRY across environments. Never branch on environment inside module logic - that's a smell.
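Here is a sketch of the split between shared declarations and per-environment values. The variable names and values are hypothetical:

```hcl
# variables.tf: one declaration shared by every environment
variable "instance_type" {
  type    = string
  default = "t3.micro"
}

variable "min_capacity" {
  type = number # required: every environment must set it
}

# prod.tfvars: values only, applied with -var-file=prod.tfvars
instance_type = "m6i.large"
min_capacity  = 3

# staging.tfvars: overrides only what differs
min_capacity = 1
```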

29. What is Terragrunt and when does it help?

A wrapper that adds DRY for backends, providers, and inputs across environments. Useful when you have many environments with similar but not identical infrastructure. Adds another layer to learn - if you have one prod and one staging, plain Terraform is simpler.
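Here is a sketch of the pattern Terragrunt is usually adopted for: one backend definition inherited by every environment directory. The layout (live/root.hcl, live/prod/vpc/terragrunt.hcl) and the module path are assumptions. Terragrunt's conventions shift between releases, so check this against the version you run:

```hcl
# live/root.hcl: defined once; each child gets its own state key
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "my-tf-state"
    key            = "${path_relative_to_include()}/terraform.tfstate" # e.g. prod/vpc/terraform.tfstate
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "tf-locks"
  }
}

# live/prod/vpc/terragrunt.hcl: only what differs per environment
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "../../../modules/vpc" # hypothetical module path
}

inputs = {
  cidr_block = "10.1.0.0/16"
}
```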

30. How do you handle module testing?

Three levels. (1) terraform validate for syntax, types, and internal references. (2) terraform plan in CI for diff review. (3) Terratest (Go) or terraform test (built-in since 1.6) for end-to-end testing - actually apply, assert behavior, destroy.
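Here is a sketch of a terraform test file for a bucket module like the one described under question 24. It assumes the module exposes var.name, aws_s3_bucket.this, and an arn output. The first run block only plans; the second really applies, and terraform test destroys what it created afterwards:

```hcl
# tests/bucket.tftest.hcl: run with `terraform test` from the module root
variables {
  name = "example-test-bucket" # hypothetical; must be globally unique for the apply run
}

# Fast check: plan only, creates nothing
run "bucket_uses_input_name" {
  command = plan

  assert {
    condition     = aws_s3_bucket.this.bucket == var.name
    error_message = "Bucket name does not match var.name."
  }
}

# End-to-end check: creates real resources, cleaned up after the test run
run "arn_output_is_an_s3_arn" {
  command = apply

  assert {
    condition     = startswith(output.arn, "arn:aws:s3:::")
    error_message = "Expected an S3 ARN in output.arn."
  }
}
```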


Advanced Patterns (31-40)

31. Explain provider aliases.

Multiple configurations of the same provider - e.g., AWS in two regions:

provider "aws" { region = "us-east-1" }
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

resource "aws_s3_bucket" "logs" {
  provider = aws.west
}

Required for cross-region or cross-account work in one root module.
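Aliases also have to be passed explicitly to child modules, which otherwise inherit the default (unaliased) provider. Here is a sketch using the aws.west alias from above and a hypothetical backups module:

```hcl
# Inside ./modules/backups every aws_* resource uses the us-west-2 configuration,
# because the module's default "aws" provider is mapped to aws.west here.
module "dr_backups" {
  source = "./modules/backups" # hypothetical local module

  providers = {
    aws = aws.west
  }
}
```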

32. What are depends_on and when do you use it?

Explicit dependency hint when Terraform can't infer one from references. Most common case: a Lambda function references its IAM role, but not the role policy attachment that grants its permissions, so without a hint Terraform may create the function before those permissions exist. Most depends_on usage is unnecessary - check for missing references first.
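Here is a sketch of that Lambda case, which shows both kinds of dependency in one place. The function name, runtime, handler, and zip path are hypothetical; the managed policy ARN is AWS's standard basic-execution policy:

```hcl
resource "aws_iam_role" "lambda" {
  name = "example-worker-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "logs" {
  role       = aws_iam_role.lambda.name # implicit dependency on the role
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

resource "aws_lambda_function" "worker" {
  function_name = "example-worker"
  role          = aws_iam_role.lambda.arn # implicit: role only, not the attachment
  handler       = "index.handler"
  runtime       = "nodejs20.x"
  filename      = "build/worker.zip" # hypothetical build artifact

  # Nothing above references the attachment, so make the ordering explicit.
  depends_on = [aws_iam_role_policy_attachment.logs]
}
```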

33. How does Terraform handle implicit vs explicit dependencies?

Implicit: any reference to another resource's attribute creates a dependency. Explicit: depends_on = [aws_s3_bucket.thing]. Implicit is preferred - it's automatic and you can't forget it.

34. What's a null_resource and why do people use it?

A resource that does nothing on its own but accepts triggers and provisioners. Used to run scripts as part of a Terraform run. Since Terraform 1.4, the built-in terraform_data resource does the same job without the separate null provider. Either way it's often a smell - if you need to run a script, consider whether it should be a separate provider, a data source, or moved out of Terraform entirely.
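When a script really must run, here is a sketch using terraform_data. The migration script path and the schema_version variable are hypothetical. triggers_replace makes the resource, and so its creation-time provisioner, rerun only when the listed values change:

```hcl
variable "schema_version" {
  type = string
}

# Reruns the migration only when schema_version changes.
resource "terraform_data" "db_migration" {
  triggers_replace = [var.schema_version]

  provisioner "local-exec" {
    command = "./scripts/migrate.sh ${var.schema_version}" # hypothetical script
  }
}
```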

35. Local-exec vs remote-exec provisioners - what's the catch?

Both are last-resort tools. They run scripts locally or via SSH after resource creation. Failure modes are ugly - if the provisioner fails, the resource is "tainted" and Terraform tries to recreate it, often making things worse. Prefer cloud-init, Ansible, or a real config management layer.
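Here is a sketch of the cloud-init alternative: the instance configures itself on first boot from user_data, so no SSH provisioner runs and no resource gets tainted. The AMI variable, instance type, and package choice are hypothetical, and the AMI is assumed to ship with cloud-init:

```hcl
variable "app_ami_id" {
  type = string # hypothetical: an AMI with cloud-init installed
}

resource "aws_instance" "app" {
  ami           = var.app_ami_id
  instance_type = "t3.micro"

  # Executed by cloud-init on first boot, not by Terraform.
  user_data = <<-EOF
    #cloud-config
    packages:
      - nginx
    runcmd:
      - systemctl enable --now nginx
  EOF
}
```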

36. How do you write a custom provider?

Use the Terraform Plugin Framework (Go SDK). Define schemas for resources and data sources, implement Create/Read/Update/Delete. Most teams never need to do this - use existing providers.

37. What is sensitive = true?

A variable or output marker that suppresses the value in CLI output and logs. The value is still in the state file in plaintext - sensitive doesn't mean encrypted. Don't rely on it for secrets handling, only for noise reduction.

38. Explain ignore_changes.

A lifecycle rule telling Terraform to stop reconciling a specific attribute:

lifecycle {
  ignore_changes = [tags["LastModified"]]
}

Use when something else (a Lambda, a CI pipeline) is allowed to mutate that attribute. Overuse hides drift.

39. What's create_before_destroy?

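Default replacement order is destroy-then-create, which causes downtime. create_before_destroy = true reverses it - the new resource comes up, then the old one is destroyed. Old and new must be able to coexist, so resources with unique names usually switch to name_prefix. Commonly needed for zero-downtime replacement of ALB target groups, launch templates, and certificates.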
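Here is a sketch of the target group case. A fixed name would collide while the old and new groups overlap, so it uses name_prefix, which is limited to 6 characters for this resource. The port and the vpc_id variable are hypothetical:

```hcl
variable "vpc_id" {
  type = string
}

resource "aws_lb_target_group" "app" {
  name_prefix = "app-" # AWS appends a unique suffix, so old and new can coexist
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id

  lifecycle {
    create_before_destroy = true
  }
}
```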

40. Explain refresh-only mode.

terraform apply -refresh-only updates state to match reality without changing infrastructure; terraform plan -refresh-only previews those state updates first. Useful for syncing state after manual changes you've decided to accept, or when investigating drift.


Production, Security, and Tooling (41-50)

41. What is OpenTofu and why does it exist?

A community fork of Terraform 1.5, created in August 2023 after HashiCorp moved Terraform from the open-source MPL to the Business Source License (BSL). OpenTofu remains MPL-licensed and is governed by the Linux Foundation. By 2026 it's at feature parity for most use cases and is a common choice for teams that want to avoid the BSL.

42. Should you migrate from Terraform to OpenTofu?

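Depends. If you're using HCP Terraform / Terraform Cloud, no. If you're a small team on the CLI and the BSL is a concern, migration mostly means swapping the terraform binary for tofu and is usually painless. State is compatible when you migrate from a matching Terraform version; follow OpenTofu's version-specific migration guide and back up state first.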

43. How do you scan Terraform for security issues?

Trivy (which absorbed tfsec) or Checkov for security, plus tflint for provider-aware linting. Run them in CI on every PR. Block merges on high-severity findings. They catch things like public S3 buckets, unencrypted RDS, EC2 instances that allow IMDSv1, and IAM policies with *:*.

44. Explain Terraform Cloud / HCP Terraform's role.

Hosted backend with workspaces, plan/apply UI, policy-as-code (Sentinel), and run history. Pricing changed significantly post-2023. Many teams use s3+dynamodb+CI instead and skip the SaaS bill.

45. How do you implement policy-as-code with Terraform?

Sentinel (HCP Terraform), OPA + conftest, or Checkov policies. Define rules like "all S3 buckets must have encryption" or "no resources without a Project tag." Enforce in CI between plan and apply, typically by evaluating the JSON plan from terraform show -json tfplan.

46. What's the right CI pipeline for Terraform?

Standard pipeline:

  1. fmt -check - formatting gate
  2. init - providers, modules, backend
  3. validate - syntax and references
  4. tflint / tfsec / checkov - lint and security
  5. plan -out=tfplan - diff review
  6. Manual approval gate for prod
  7. apply tfplan - exactly what was reviewed

47. How do you handle multi-account AWS with Terraform?

Provider aliases per account. Assume role into each account from a central CI account:

provider "aws" {
  alias  = "prod"
  region = "us-east-1"
  assume_role { role_arn = "arn:aws:iam::123456789012:role/TerraformDeploy" }
}

State files are typically separated per account. Some teams use Terragrunt to manage the matrix.
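Some modules genuinely span two accounts, for example VPC peering. Here is a sketch of how such a module declares its provider slots and how the root module fills them. The module path and the aws.shared alias (a second provider block like the prod one above, for a shared-services account) are assumptions:

```hcl
# modules/peering/versions.tf: the module declares the two provider slots it needs
terraform {
  required_providers {
    aws = {
      source                = "hashicorp/aws"
      configuration_aliases = [aws.requester, aws.accepter]
    }
  }
}

# Root module: map each slot to an account-specific provider alias
module "prod_to_shared_peering" {
  source = "./modules/peering" # hypothetical module

  providers = {
    aws.requester = aws.prod
    aws.accepter  = aws.shared # assumed second aliased provider
  }
}
```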

48. What's an immutable infrastructure pattern with Terraform?

Servers are never modified in place - always replaced. Build an AMI with Packer, reference it by ID in Terraform, replace the launch template / ASG when the AMI updates. Combined with create_before_destroy, you get zero-downtime rolling deploys.
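Here is one way to wire this up, as a sketch. The Packer-built AMI ID arrives as a variable, and a changed AMI produces a new launch template version. The ASG's instance_refresh then rolls instances onto it. This uses ASG instance refresh rather than create_before_destroy, and sizes, names, and variables are hypothetical:

```hcl
variable "app_ami_id" {
  description = "AMI ID produced by the Packer build for this release."
  type        = string
}

variable "private_subnet_ids" {
  type = list(string)
}

resource "aws_launch_template" "app" {
  name_prefix   = "app-"
  image_id      = var.app_ami_id # new AMI -> new launch template version
  instance_type = "t3.small"
}

resource "aws_autoscaling_group" "app" {
  name_prefix         = "app-"
  min_size            = 2
  max_size            = 4
  vpc_zone_identifier = var.private_subnet_ids

  launch_template {
    id      = aws_launch_template.app.id
    version = aws_launch_template.app.latest_version
  }

  # A launch template change triggers a rolling replacement of instances.
  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 90
    }
  }
}
```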

49. How do you debug a failing apply?

TF_LOG=DEBUG terraform apply 2>&1 | tee tf.log (or set TF_LOG_PATH to send logs straight to a file). Read provider error messages carefully - they usually point at the exact API response. For Error: Provider produced inconsistent final plan, you've hit a provider bug; pin the provider to a known-good version and check its issue tracker.

50. What's the biggest mistake you've seen in real Terraform codebases?

The same one everywhere: a single root module managing prod, staging, and dev with conditionals, with a state file in someone's home directory or a public S3 bucket, no locking, and a 90-minute plan that nobody reads. Fix that pattern and you've fixed 70% of Terraform problems.


Final thoughts

Terraform interviews in 2026 are less about "do you know HCL syntax" and more about "have you operated this in anger." Be ready to talk about state corruption recovery, blast radius, drift, and how you would split up a monorepo of 800 resources without breaking apply times.

The candidates who get hired are the ones who can describe the failure modes, not just the happy path.