Top 50 GCP Interview Questions in 2026 (With Real Answers)
Google Cloud has carved out a real share of the cloud market by 2026, especially for AI/ML workloads, data platforms, and the "we're not on AWS" cohort. The interview format mirrors AWS but the primitives are different - and the IAM model is genuinely better.
These are the 50 GCP questions that come up most often in 2026.
Fundamentals (1-10)
1. How does GCP's project structure work?
Resources live inside projects. Projects roll up to folders, folders roll up to an organization. Billing accounts are linked to projects. IAM policies attach at any level and are inherited downward. The hierarchy is: org → folders → projects → resources.
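To make the inheritance rule concrete, here is a minimal sketch in plain Python (not the real IAM API). The `Node` class, the example names, and the roles chosen are all illustrative assumptions; it models allow policies only and ignores deny policies and IAM conditions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node in the hierarchy: organization, folder, project, or resource."""
    name: str
    parent: Optional["Node"] = None
    # role -> principals granted that role directly on this node
    bindings: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, role: str, principal: str) -> None:
        self.bindings.setdefault(role, set()).add(principal)


def effective_roles(node: Node, principal: str) -> set[str]:
    """Union of the roles granted on the node itself and on every ancestor."""
    roles: set[str] = set()
    current: Optional[Node] = node
    while current is not None:
        for role, members in current.bindings.items():
            if principal in members:
                roles.add(role)
        current = current.parent
    return roles


# Hypothetical hierarchy: org -> folder -> project
org = Node("organizations/example")
prod = Node("folders/prod", parent=org)
project = Node("projects/myapp-prod", parent=prod)

org.grant("roles/viewer", "user:alice@example.com")
prod.grant("roles/logging.viewer", "user:alice@example.com")
project.grant("roles/run.developer", "user:alice@example.com")

print(sorted(effective_roles(project, "user:alice@example.com")))
```

The point to make in an interview: a grant at the org or folder level cannot be removed lower down by an allow policy, which is why broad roles near the top of the tree are risky.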
2. Project vs folder vs organization - when do you use each?
- Project - the unit of isolation. One per app or environment.
- Folder - groups projects by team, business unit, or environment.
- Organization - the top of the tree, tied to a Workspace or Cloud Identity domain.
A small startup might be all in one project. A real company has folders like prod/, staging/, dev/, with team-specific projects under each.
3. Compare GCP IAM to AWS IAM.
GCP IAM is identity-and-role-based: you grant a principal (user, service account, group) a role on a resource. There's no equivalent to AWS resource policies for most services. GCP has fewer primitives but cleaner inheritance through the resource hierarchy.
4. What's a service account in GCP?
A non-human identity that workloads use to authenticate. Each service account has an email like name@project.iam.gserviceaccount.com. They can be impersonated, granted roles, and bound to compute resources (Cloud Run, GKE pods, GCE VMs).
5. Should you use service account keys?
Avoid them. They're long-lived credentials that are hard to rotate and easy to leak. Prefer Workload Identity (for GKE), service account impersonation (for short-lived tokens), or attached service accounts on compute resources. For off-GCP workloads, try Workload Identity Federation first; keys are the last resort when federation isn't an option.
6. What is Workload Identity?
The GKE feature (documented today as Workload Identity Federation for GKE) that lets pods authenticate as GCP service accounts without keys. In the classic setup you bind a Kubernetes service account (KSA) to a GCP service account; pods using that KSA get IAM credentials automatically via the metadata server.
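The KSA-to-GSA binding comes down to two pieces of configuration: an IAM binding on the GCP service account and an annotation on the Kubernetes service account. The sketch below only builds those values as data. The function name and example identifiers are hypothetical, and it assumes the cluster and the GCP service account live in the same project.

```python
def workload_identity_binding(project_id: str, namespace: str,
                              ksa_name: str, gsa_name: str) -> dict:
    """Build the IAM binding and KSA annotation for the classic WI setup."""
    gsa_email = f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    return {
        # Grant the KSA permission to act as the GCP service account.
        "iam_binding": {
            "resource": gsa_email,
            "role": "roles/iam.workloadIdentityUser",
            "member": f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]",
        },
        # Annotation on the Kubernetes ServiceAccount object.
        "ksa_annotation": {"iam.gke.io/gcp-service-account": gsa_email},
    }


# Hypothetical names for illustration only.
binding = workload_identity_binding("myapp-prod", "payments", "api-ksa", "api-gsa")
print(binding["iam_binding"]["member"])
print(binding["ksa_annotation"])
```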
7. Compare Cloud Run, GKE, GCE, and App Engine.
- Cloud Run - serverless containers, scale to zero, billed only while handling requests by default. Default for stateless services in 2026.
- GKE - managed Kubernetes. For multi-service systems that need K8s primitives.
- GCE - raw VMs. When you need full OS control or specific instance types.
- App Engine - the original PaaS. Standard environment is mostly legacy; use Cloud Run for new work.
8. What's the difference between Cloud Run and Cloud Functions?
Cloud Functions is event-driven, single-function-per-deploy. Cloud Run runs any container image and is more flexible. Cloud Functions 2nd gen runs on Cloud Run under the hood and has been rebranded as Cloud Run functions. For new work, use Cloud Run unless you specifically want the function-style ergonomics.
9. Explain GCP regions and zones.
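Region = geographic area (e.g., us-central1). Zone = isolated deployment area within a region (e.g., us-central1-a) that acts as its own failure domain. Most resources are zonal or regional. Some services also offer multi-region locations (Cloud Storage, BigQuery); Cloud Storage multi-regions store data redundantly across regions for durability, but check each service's docs before assuming cross-region replication.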
10. What's a VPC in GCP and how is it different from AWS?
A VPC is a global resource in GCP, not regional. Subnets are regional. So one VPC can span every region without peering. This drastically simplifies multi-region setups compared to AWS where each VPC is regional.
Compute and Containers (11-20)
11. How does Cloud Run scale?
By concurrent requests. Each container instance handles up to N concurrent requests (configurable, default 80). When concurrency saturates, Cloud Run starts new instances. When traffic drops, it scales to zero (or to the minimum instance count you configured).
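A back-of-the-envelope model of that behavior is below. It is a deliberate simplification: the real autoscaler also weighs signals such as CPU utilization and smooths over time. The function name and the `max_instances=100` default are assumptions for illustration.

```python
import math


def instances_needed(in_flight_requests: int, concurrency: int = 80,
                     min_instances: int = 0, max_instances: int = 100) -> int:
    """Estimate instance count from in-flight requests and per-instance concurrency."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    wanted = math.ceil(in_flight_requests / concurrency)
    # Clamp between the configured floor and ceiling.
    return max(min_instances, min(wanted, max_instances))


print(instances_needed(0))                    # idle service with no minimum
print(instances_needed(200))                  # 200 in-flight requests at concurrency 80
print(instances_needed(0, min_instances=1))   # warm floor kept during idle periods
```

Lowering concurrency (for CPU-heavy handlers) raises the instance count for the same traffic, which is the cost lever interviewers usually want you to mention.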
12. Cloud Run cold starts - how do you mitigate them?
(1) Set min-instances to keep N warm. (2) Use slim base images (distroless, alpine). (3) Lazy-load heavy dependencies. (4) Enable startup CPU boost, which temporarily allocates extra CPU while the container starts. For latency-sensitive APIs, min-instances=1 is usually worth the cost.
13. What are Cloud Run jobs?
Containers that run to completion, not as a server. Triggered manually, on a schedule (Cloud Scheduler), or via API. Replaces "spin up a Compute Engine VM to run a script" for batch workloads.
14. GKE Standard vs GKE Autopilot - which do you pick?
- Autopilot - Google manages nodes, you only see pods. Per-pod billing. Less flexibility but less ops.
- Standard - you manage node pools. Fine-grained control over instance types, taints, and autoscaling.
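Default to Autopilot unless you have a reason. The usual reasons: custom node configurations, node-level tuning, or daemon sets that need privileged access. Autopilot supports GPU and TPU workloads, so accelerators alone are not a reason to pick Standard.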
15. How does GKE auto-scaling work?
Two layers. (1) Horizontal Pod Autoscaler scales pods based on CPU/memory/custom metrics. (2) Cluster Autoscaler scales nodes when pending pods don't fit. In Autopilot, node scaling is automatic; you still configure the HPA for your own workloads.
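The pod layer follows the replica formula documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1. The sketch below applies it to a single metric; the function name and the min/max defaults are assumptions, and the real controller additionally handles missing metrics and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Single-metric version of the HPA replica calculation."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(desired, max_replicas))


# 4 pods averaging 90% CPU against a 60% target.
print(desired_replicas(4, current_metric=90, target_metric=60))
```

If the extra pods do not fit on existing nodes they go Pending, and that is the signal the node layer reacts to.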
16. What's a node pool in GKE?
A group of nodes with the same configuration. You can have multiple pools per cluster (e.g., general, gpu, spot). Workloads are steered to specific pools with node selectors or node affinity, and kept off others with taints and tolerations.
17. Spot VMs vs Preemptible VMs?
Preemptible was the old name. Spot VMs are the modern replacement - same idea (cheap, can be terminated with 30s notice) but no 24-hour cap. Use them for batch jobs and stateless workloads that can tolerate interruption.
18. Compute Engine vs Bare Metal Solution?
Compute Engine - normal cloud VMs. Bare Metal Solution - dedicated hardware for workloads that need it (Oracle DB, certain compliance requirements). Almost no one needs Bare Metal.
19. What's a Custom Machine Type?
Compute Engine lets you pick exact vCPU and memory amounts instead of choosing from a fixed catalog. Useful when standard sizes waste either CPU or memory for your workload.
20. Explain confidential computing in GCP.
VMs with hardware-level memory encryption (AMD SEV, Intel TDX). Memory is encrypted with a key the hypervisor can't read. Used for processing sensitive data when you don't trust the cloud provider's hypervisor implicitly. Real adoption: regulated industries, multi-party computation.
Data and Storage (21-30)
21. Cloud Storage classes - which do you use?
- Standard - hot data, no minimum storage duration
- Nearline - 30-day minimum, lower cost, occasional access
- Coldline - 90-day minimum
- Archive - 365-day minimum, cheapest storage, highest retrieval cost (access latency is still milliseconds, unlike tape-style archive tiers)
Lifecycle rules transition objects automatically based on age.
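As a sketch, a tiering policy can be generated as a lifecycle configuration document. The JSON shape (`rule`, `action.type`, `action.storageClass`, `condition.age`) follows the Cloud Storage lifecycle format; the function name and the 30/90/365-day thresholds are assumptions chosen to line up with the minimum durations above, not a recommendation.

```python
import json


def tiering_lifecycle(nearline_after: int = 30, coldline_after: int = 90,
                      archive_after: int = 365) -> dict:
    """Build a lifecycle config that moves objects to colder classes by age (days)."""
    steps = [
        ("NEARLINE", nearline_after),
        ("COLDLINE", coldline_after),
        ("ARCHIVE", archive_after),
    ]
    return {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": storage_class},
                "condition": {"age": days},
            }
            for storage_class, days in steps
        ]
    }


print(json.dumps(tiering_lifecycle(), indent=2))
```

Transitioning too early can cost more than it saves, because early deletion or rewrite of an object in a colder class still incurs the minimum-duration charge.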
22. Cloud Storage vs Filestore vs Persistent Disk?
- Cloud Storage - object store, accessed by HTTP API.
- Filestore - managed NFS, mounted by VMs.
- Persistent Disk - block storage attached to VMs.
GCS for app data, PD for databases and VM disks, Filestore when you genuinely need NFS semantics (legacy apps, shared writes).
23. What's BigQuery and when is it the right choice?
A serverless data warehouse. Stores data in columnar format, scales query compute on demand. Right for analytical queries on large datasets. Wrong for OLTP workloads (point lookups, frequent updates) - that's Cloud SQL or Spanner territory.
24. BigQuery pricing - on-demand vs editions?
- On-demand - pay per TB scanned. Easy to start, hard to budget.
- Editions (Standard/Enterprise/Enterprise Plus) - reserved slots, pay for compute capacity. Better at scale.
Most teams start on-demand and migrate to slots when monthly bills cross ~$5K.
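The break-even is simple arithmetic, sketched below. The prices passed in (6.25 USD per TiB and 0.06 USD per slot-hour) are placeholder assumptions, not quoted rates; check the current price list for your region and edition before using the numbers.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def on_demand_cost(bytes_scanned: int, usd_per_tib: float) -> float:
    """Monthly on-demand cost for a given number of bytes scanned."""
    return bytes_scanned / TIB * usd_per_tib


def slot_cost(slots: int, hours: float, usd_per_slot_hour: float) -> float:
    """Cost of running a fixed number of slots for a number of hours."""
    return slots * hours * usd_per_slot_hour


# Assumed workload: 800 TiB scanned per month vs. 100 slots running all month.
scan = on_demand_cost(800 * TIB, usd_per_tib=6.25)
slots = slot_cost(100, hours=730, usd_per_slot_hour=0.06)
print(f"on-demand: ${scan:,.0f}  slots: ${slots:,.0f}")
```

The comparison only holds if the slot count can actually serve the workload, so validate it against observed slot usage rather than bytes scanned alone.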
25. How do you reduce BigQuery costs?
(1) Partition tables by date. (2) Cluster on common filter columns. (3) Don't SELECT * - only the columns you need. (4) Use materialized views for repeated aggregations. (5) Set query usage limits per user.
26. Cloud SQL vs AlloyDB vs Spanner?
- Cloud SQL - managed Postgres/MySQL/SQL Server. Standard relational workloads.
- AlloyDB - Google's Postgres-compatible, optimized for analytical and transactional mixed workloads.
- Spanner - globally distributed, strongly consistent, horizontally scalable. The big one.
Default to Cloud SQL. Move to AlloyDB if you need higher performance or analytical SQL on operational data. Use Spanner for genuine global scale or financial-grade consistency.
27. What is Spanner's superpower?
Strong external consistency at global scale. TrueTime gives every transaction a globally ordered timestamp using GPS+atomic clocks in datacenters. You get cross-region ACID transactions that still scale horizontally; the trade-off is write latency, because commits wait out clock uncertainty and replicate across regions.
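The core trick, as described in Google's public Spanner paper, is commit wait: pick a commit timestamp at the top of the clock-uncertainty interval, then wait until that timestamp is definitely in the past before making the write visible. Below is a toy simulation with a fake clock. Every class and function name is hypothetical, and the 4 ms uncertainty is an assumed value.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    earliest: float
    latest: float


class FakeTrueTime:
    """Simulated clock that reports an uncertainty interval around true time."""

    def __init__(self, epsilon_ms: float):
        self.true_now_ms = 0.0
        self.epsilon_ms = epsilon_ms

    def now(self) -> Interval:
        return Interval(self.true_now_ms - self.epsilon_ms,
                        self.true_now_ms + self.epsilon_ms)

    def advance(self, ms: float) -> None:
        self.true_now_ms += ms


def commit(tt: FakeTrueTime, step_ms: float = 1.0) -> tuple[float, float]:
    """Return (commit_timestamp, time_waited) using the commit-wait rule."""
    commit_ts = tt.now().latest
    waited = 0.0
    # Wait until every clock in the system must agree commit_ts has passed.
    while tt.now().earliest <= commit_ts:
        tt.advance(step_ms)
        waited += step_ms
    return commit_ts, waited


ts, waited = commit(FakeTrueTime(epsilon_ms=4.0))
print(f"commit_ts={ts}ms waited={waited}ms")
```

The wait is roughly twice the uncertainty bound, which is why tight clock synchronization matters so much to Spanner's write latency.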
28. Firestore vs Datastore vs Bigtable?
- Firestore - document database, scales to zero, good for app backends.
- Datastore - older mode of Firestore, mostly legacy.
- Bigtable - wide-column, petabyte-scale, single-digit ms latency. For time-series, IoT, and analytics ingest.
29. What's Pub/Sub?
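GCP's managed messaging service (publish/subscribe topics with per-subscription delivery). At-least-once delivery by default, with exactly-once available on pull subscriptions; scales horizontally and integrates with most services. Used for event-driven architectures, log fanout, and async processing. The 2026 default for "I need a queue."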
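At-least-once delivery means a subscriber can see the same message more than once, so handlers should be idempotent. A minimal sketch of deduplicating on message ID is below, with no Pub/Sub client involved. The class name is hypothetical, and the in-memory set is an assumption that only works for a single process; a real service would use a durable store with expiry.

```python
from typing import Callable


class IdempotentHandler:
    """Wrap a processing function so redelivered messages are skipped."""

    def __init__(self, process: Callable[[bytes], None]):
        self._process = process
        self._seen: set[str] = set()

    def handle(self, message_id: str, payload: bytes) -> bool:
        """Return True if the payload was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._process(payload)
        # Record the ID only after processing succeeds, so failures get retried.
        self._seen.add(message_id)
        return True


processed: list[bytes] = []
handler = IdempotentHandler(processed.append)
handler.handle("m-1", b"order created")
handler.handle("m-1", b"order created")  # redelivery of the same message
print(processed)
```

Message IDs only cover redelivery; if the publisher itself can send the same event twice, deduplicate on a business key carried in the payload instead.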
30. Pub/Sub Lite - when do you use it?
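A cheaper, lower-feature variant. Pre-provisioned capacity, no global routing, no exactly-once. Google has deprecated it and announced a shutdown for March 2026, so in an interview treat it as a migration question: existing Lite workloads move to standard Pub/Sub or Managed Service for Apache Kafka, and new work should not start on Lite.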
Networking and Security (31-40)
31. Explain GCP's VPC network model.
VPCs are global. Subnets are regional. Routes and firewall rules are defined on the VPC network. Auto mode creates a subnet in every region from a predefined IP range; custom mode requires you to define each subnet explicitly. Always use custom mode in production.
32. VPC peering vs Cloud Interconnect vs VPN?
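With custom mode you own the IP plan, and subnet ranges within a VPC must not overlap. A small check using Python's standard `ipaddress` module is sketched below; the function name, region choices, and CIDR ranges are made up for illustration.

```python
import ipaddress
from itertools import combinations


def overlapping_subnets(subnets: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of subnet names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical plan: the third range sits inside the first one.
plan = {
    "us-central1": "10.0.0.0/20",
    "europe-west1": "10.0.16.0/20",
    "asia-east1": "10.0.8.0/22",
}
print(overlapping_subnets(plan))
```

The same check is worth running across VPCs you intend to peer, since peering also fails on overlapping ranges.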
- VPC peering - two VPCs talk directly, no transit charges, no transitive peering.
- Cloud Interconnect - dedicated physical link between on-prem and GCP. High bandwidth.
- VPN - encrypted tunnel over the public internet. Cheap, low bandwidth.
33. What's a Shared VPC?
A network architecture where one "host" project owns the VPC and other "service" projects attach to it. Centralizes network management while keeping per-team project isolation. Standard for any organization with more than five projects.
34. Explain Private Service Connect.
A way to expose services privately across VPCs or organizations without peering. You publish a service via PSC, consumers connect via a private endpoint. Used for managed services (Cloud SQL, third-party SaaS) without traversing the internet.
35. What is Cloud Armor?
Google's WAF + DDoS protection. Attaches to global load balancers. Provides geographic blocking, OWASP Top 10 rules, and rate limiting. Standard front door for any internet-facing service.
36. How does Identity-Aware Proxy work?
IAP terminates connections at Google's edge and authenticates users via Cloud Identity before forwarding to your backend. Replaces traditional VPNs for accessing internal apps. You grant roles/iap.httpsResourceAccessor to users; they hit the URL, sign in with Google, and reach the app.
37. What is BeyondCorp?
Google's zero-trust access model. No corporate network perimeter; every access decision is based on user identity and device posture. IAP is one of the building blocks. It's a model, not a product.
38. KMS in GCP - what does it manage?
Cryptographic keys, organized into key rings. Customer-Managed Encryption Keys (CMEK) replace Google's defaults for services like GCS, Compute Engine disks, and BigQuery. Customer-Supplied Encryption Keys (CSEK) let you provide raw key bytes, but only Cloud Storage and Compute Engine support them. CMEK is the standard.
39. Secret Manager vs storing secrets in env?
Secret Manager is encrypted at rest, IAM-controlled, audit-logged, and supports versioning. Env vars in deployments are visible to anyone with deployment-read permissions and don't rotate. Always use Secret Manager.
40. How do Cloud Audit Logs work?
Four log types. Admin Activity (always on), System Event (always on), Policy Denied (generated by default), and Data Access (off by default for most services, expensive). Routed to Cloud Logging, exportable to BigQuery, GCS, or Pub/Sub. Required for compliance; expensive if you turn on Data Access for every service.
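Each audit log type, including Policy Denied, is stored under its own log name, which is what you filter on in Cloud Logging or in a sink. The sketch below builds those filter strings; the log IDs are the documented `cloudaudit.googleapis.com/...` names, while the function name and project ID are hypothetical.

```python
from urllib.parse import quote

AUDIT_LOG_IDS = {
    "admin_activity": "cloudaudit.googleapis.com/activity",
    "data_access": "cloudaudit.googleapis.com/data_access",
    "system_event": "cloudaudit.googleapis.com/system_event",
    "policy_denied": "cloudaudit.googleapis.com/policy",
}


def audit_log_filter(project_id: str, kind: str) -> str:
    """Build a Cloud Logging filter for one audit log type in a project."""
    # The slash inside the log ID must be URL-encoded as %2F in logName.
    log_id = quote(AUDIT_LOG_IDS[kind], safe="")
    return f'logName="projects/{project_id}/logs/{log_id}"'


print(audit_log_filter("myapp-prod", "admin_activity"))
print(audit_log_filter("myapp-prod", "data_access"))
```

Filtering sinks by log name is also how you keep Data Access volume out of expensive destinations while still retaining Admin Activity.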
AI/ML and Real-World Patterns (41-50)
41. What is Vertex AI?
GCP's unified ML platform. Training, model registry, online/batch prediction, feature store, pipelines, and now hosting for foundation models. By 2026, it's also where Gemini models are served for enterprise customers.
42. What's Gemini in GCP and how do you call it?
Google's frontier model family. On GCP you call it through the Gemini API in Vertex AI, typically with the Google Gen AI SDK. Authentication is standard GCP IAM, and billing rolls up to your project. The same model family you can call via the consumer-facing Gemini Developer API is available with enterprise SLAs through Vertex.
43. TPU vs GPU - when do you use TPUs?
TPUs are Google's custom AI accelerators. Best when you're training or serving Google-favored architectures (transformer-heavy) at scale. GPUs are more general-purpose. If your code already runs on CUDA and you're not hitting cost walls, GPUs are easier.
44. How do you set up a CI/CD pipeline on GCP?
Cloud Build for containerized builds. Artifact Registry for image storage. Cloud Deploy for progressive rollouts to GKE/Cloud Run. Trigger from GitHub, GitLab, or Bitbucket (Cloud Source Repositories is closed to new customers). Replaces a self-managed Jenkins or external CI for many teams.
45. Cloud Build triggers vs Cloud Deploy?
Cloud Build Triggers - run on commit. Builds, tests, pushes images. Cloud Deploy - manages progressive deployment of those images through environments (dev → staging → prod) with approval gates and canary support.
46. How do you handle multi-environment deployments?
Separate projects per environment (myapp-dev, myapp-staging, myapp-prod). Folder structure groups them. The CI service account is granted a deploy role in each project, or impersonates a per-environment deployer service account. Same Terraform code with different tfvars per environment.
47. What's the standard pattern for cross-project IAM?
Service account in project A is granted a role in project B. Workloads in project A authenticate as that SA, get permissions in B. Common case: shared CI project deploys into multiple environment projects.
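In policy terms, the grant is just one more binding in project B's allow policy whose member is the service account from project A. The sketch below edits a policy document as plain data. The helper name, project names, and role are illustrative assumptions, and a real update would go through a read-modify-write with the policy's etag.

```python
import copy


def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an allow policy with member added to role (idempotent)."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the same role.
        if binding["role"] == role and "condition" not in binding:
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Hypothetical policy on project B (myapp-prod).
project_b_policy = {
    "bindings": [{"role": "roles/viewer", "members": ["group:devs@example.com"]}]
}
ci_sa = "serviceAccount:ci-deployer@shared-ci.iam.gserviceaccount.com"
print(add_binding(project_b_policy, "roles/run.developer", ci_sa))
```

Note the member string carries project A's ID in the service account email, which is how reviewers can spot cross-project grants when auditing a policy.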
48. How does GCP billing work?
Billing accounts are separate from projects. One billing account can pay for many projects. Budgets and alerts attach to billing accounts. Enable detailed billing export to BigQuery for cost analysis - the default UI is not enough.
49. What's the right way to investigate a cost spike?
Billing export to BigQuery, then query by service, project, and SKU. Look for: new resources, regional egress, BigQuery scans, Cloud Logging volume. Most spikes are one of those four.
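The analysis itself is a group-by and a diff between two periods. Here it is sketched in plain Python over already-exported rows rather than as a BigQuery query. The row shape, SKU labels, and every cost figure are made-up sample data, not real billing output.

```python
from collections import defaultdict


def biggest_increases(rows: list[dict], before: str, after: str, top: int = 3):
    """Rank (project, service, sku) groups by cost increase between two months."""
    totals = defaultdict(lambda: {before: 0.0, after: 0.0})
    for row in rows:
        if row["month"] in (before, after):
            key = (row["project"], row["service"], row["sku"])
            totals[key][row["month"]] += row["cost"]
    deltas = [(key, t[after] - t[before]) for key, t in totals.items()]
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:top]


# Fabricated sample rows for illustration.
rows = [
    {"month": "2026-01", "project": "myapp-prod", "service": "BigQuery", "sku": "Analysis", "cost": 100.0},
    {"month": "2026-02", "project": "myapp-prod", "service": "BigQuery", "sku": "Analysis", "cost": 1000.0},
    {"month": "2026-01", "project": "myapp-prod", "service": "Cloud Logging", "sku": "Log volume", "cost": 50.0},
    {"month": "2026-02", "project": "myapp-prod", "service": "Cloud Logging", "sku": "Log volume", "cost": 60.0},
    {"month": "2026-02", "project": "myapp-dev", "service": "Compute Engine", "sku": "Egress", "cost": 200.0},
]
for key, delta in biggest_increases(rows, "2026-01", "2026-02"):
    print(key, delta)
```

Groups that appear only in the later month show up with their full cost as the increase, which is how brand-new resources surface in the ranking.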
50. What's the most common GCP architecture mistake?
Either (1) one giant project with everything in it (no IAM isolation, no blast radius limits), or (2) using default networks instead of custom VPCs. Both bite you within the first year of growth and are painful to fix once you have running workloads.
Final thoughts
GCP interviews in 2026 reward people who understand the resource hierarchy, can reason about IAM at scale, and have actually shipped on Cloud Run or GKE. The shallow questions ("name three GCP services") are warmups - the depth shows in how you talk about service accounts, networking, and cost control.
If you're coming from AWS, the IAM model alone takes a few weeks to internalize. Once you have it, GCP starts to feel cleaner.