September 4, 2025 · Mike Vanbuskirk

Provisioning Google Cloud Infrastructure with Terraform (GKE Cluster Example)

What you'll learn: This hands-on guide walks through creating a production-ready GKE cluster using Terraform, including VPC networking, GitHub Actions automation, and Workload Identity Federation. You'll build a complete infrastructure-as-code foundation that eliminates manual configuration and scales across environments.

Introduction

Managing Kubernetes infrastructure through point-and-click interfaces creates undocumented configuration that then becomes a problem months later. Your GKE cluster runs smoothly until someone needs to recreate it.

Nobody documented which IAM roles they assigned, what firewall rules they created, or why they chose specific node pool settings. Now you're reverse-engineering a production cluster by clicking through dozens of Cloud Console screens, trying to identify every setting that matters. Each manual configuration represents technical debt that accumulates whenever you need to replicate, modify, or troubleshoot your infrastructure.

Terraform changes this dynamic by expressing your GKE cluster as code. Your networking setup, node pool specifications, and security policies exist as reviewable, version-controlled code. When you need a new cluster, you run terraform apply with different variables, rather than spending hours in the console. Changes go through pull requests where teammates can spot that your pod CIDR range overlaps with an existing VPC before it causes production issues.

This guide walks through building a GKE cluster with Terraform, automating deployments through GitHub Actions, and implementing Workload Identity Federation to remove long-lived credentials from your CI/CD pipeline.

You'll handle the specific challenges GCP presents:

  • APIs that must be explicitly enabled before use
  • Quota limits that vary by project age
  • Networking configurations that fail silently when misconfigured.

The end result is infrastructure you can confidently modify, replicate, and scale without manual configuration guesswork.

⚡ ⚡ ⚡

Creating a GKE Cluster with Terraform

Before writing any Terraform code, you need to configure your Google Cloud project and authentication. GCP's permission model requires explicit API enablement, which catches out many engineers moving from AWS or Azure, where services activate on first use.

Setting Up Your Google Cloud Service Account

Terraform needs a service account with specific permissions to create resources in your GCP project. Start by creating a dedicated service account rather than using your personal credentials:

# Set your project ID
export PROJECT_ID="your-project-id"
gcloud config set project $PROJECT_ID

# Create the service account
gcloud iam service-accounts create terraform-gke \
  --display-name="Terraform GKE Service Account"

# Grant necessary roles
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:terraform-gke@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/container.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:terraform-gke@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/compute.networkAdmin"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:terraform-gke@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/iam.serviceAccountUser"

These roles give Terraform the minimum permissions needed to create a GKE cluster with its networking. The container.admin role manages Kubernetes Engine resources, compute.networkAdmin handles VPC and subnet creation, and iam.serviceAccountUser lets Terraform attach other service accounts - such as the default compute service account that node pools run as - to the resources it creates.

Each role serves a specific purpose:

  • Without container.admin, you can't create the cluster itself
  • Without compute.networkAdmin, the VPC and subnet creation fails
  • Without iam.serviceAccountUser, the node pools can't use the default compute service account.
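
Before moving on, you can confirm the bindings landed. This uses standard gcloud filtering to list only the roles granted to the new service account:

# List the roles granted to the Terraform service account
gcloud projects get-iam-policy $PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:terraform-gke@$PROJECT_ID.iam.gserviceaccount.com" \
  --format="table(bindings.role)"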

Generate a key for local testing, though we'll replace this with Workload Identity Federation in the GitHub Actions section:

gcloud iam service-accounts keys create ~/terraform-gke-key.json \
  --iam-account=terraform-gke@$PROJECT_ID.iam.gserviceaccount.com

export GOOGLE_APPLICATION_CREDENTIALS=~/terraform-gke-key.json

With your service account configured, you need a place to store Terraform's state file that isn't your local development machine. State files track which resources Terraform manages and their current configuration. Losing this file means Terraform can't update or destroy existing resources.

Configuring Terraform State Storage

By default, Terraform stores state locally in a file named terraform.tfstate. When working with Terraform in a team, each developer must ensure they have the latest state data before running Terraform and coordinate to avoid concurrent runs. Remote state solves this coordination problem by storing the state file in a shared location where all team members can access it, with automatic locking to prevent conflicts when multiple people run Terraform simultaneously.

Google Cloud Storage works well as a remote backend for Terraform state:

# Create the state bucket with versioning
gsutil mb -p $PROJECT_ID -c STANDARD -l us-central1 gs://${PROJECT_ID}-terraform-state/
gsutil versioning set on gs://${PROJECT_ID}-terraform-state/

Configure Terraform to use this bucket:

# backend.tf
terraform {
  backend "gcs" {
    bucket = "your-project-id-terraform-state"
    prefix = "terraform/gke-cluster"
  }
}

# Enable required APIs before creating resources
resource "google_project_service" "required_apis" {
  for_each = toset([
    "container.googleapis.com",
    "compute.googleapis.com",
    "iam.googleapis.com",
    "storage.googleapis.com",
  ])
  
  service            = each.key
  disable_on_destroy = false
}

The state bucket needs to exist before running terraform init. The prefix parameter organizes state files when you're managing multiple environments or clusters from the same bucket. Each unique prefix gets its own state file, so your production and staging clusters maintain separate state. The GCS backend automatically handles state locking - when someone runs terraform apply, it acquires a lock that prevents others from making concurrent changes until the operation completes.

Remote state also enables infrastructure decomposition across teams. Your networking team might manage the VPC and subnets in one Terraform configuration, then expose subnet IDs and CIDR ranges through remote state outputs. Your application teams can reference these values using the terraform_remote_state data source, building on top of the core infrastructure without needing direct access to modify it. This separation lets different teams work independently while maintaining clear dependencies between infrastructure layers.
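
As a sketch, a consuming configuration could read another team's state like this (the bucket prefix and output name here are hypothetical - they must match whatever that configuration actually exposes):

# Read the networking team's remote state
data "terraform_remote_state" "network" {
  backend = "gcs"
  config = {
    bucket = "your-project-id-terraform-state"
    prefix = "terraform/network"
  }
}

# Then reference an exposed output, for example:
# subnetwork = data.terraform_remote_state.network.outputs.gke_subnet_self_link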

With Terraform's backend configured and APIs enabled, you can start defining the actual infrastructure, beginning with the network layer that your GKE cluster requires.
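
The configurations that follow also reference var.project_id and var.region, so declare the provider and those variables up front. A minimal sketch (the default region is an assumption - adjust it to your own deployment):

# provider.tf
provider "google" {
  project = var.project_id
  region  = var.region
}

variable "project_id" {
  type        = string
  description = "GCP project that will host the cluster"
}

variable "region" {
  type        = string
  description = "Region for the VPC subnet and GKE cluster"
  default     = "us-central1"
}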

Building the Network Foundation

GKE clusters need a properly configured VPC with subnets for both nodes and pods. The secondary IP ranges for pods and services prevent IP exhaustion as your cluster grows:

# network.tf
resource "google_compute_network" "gke_vpc" {
  name                    = "gke-vpc"
  auto_create_subnetworks = false
  project                 = var.project_id
  
  depends_on = [
    google_project_service.required_apis["compute.googleapis.com"]
  ]
}

resource "google_compute_subnetwork" "gke_subnet" {
  name          = "gke-subnet"
  ip_cidr_range = "10.0.0.0/20"
  region        = var.region
  network       = google_compute_network.gke_vpc.id
  
  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.4.0.0/14"
  }
  
  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.8.0.0/20"
  }
  
  private_ip_google_access = true
}

These CIDR ranges give you roughly 4,000 node IPs, 262,144 pod IPs, and 4,096 service IPs. The pod range sizing requires careful planning because GKE reserves a /24 block (256 IPs) for each node by default, regardless of actual pod count - which makes the /14 pod range the real ceiling on cluster size, at 1,024 nodes. A node running 10 pods still consumes 256 IP addresses from your pod range. This reservation model means a 100-node cluster needs 25,600 pod IPs even if you only run 1,000 pods total.

Setting auto_create_subnetworks to false gives you explicit control over IP allocation. Google's auto-created subnets use a /20 range in every region, which wastes address space and complicates VPC peering with other projects. Manual subnet creation lets you allocate larger ranges where you need them and smaller ranges in regions with minimal infrastructure. The private_ip_google_access flag allows nodes to reach Google APIs and services without requiring external IP addresses, reducing your attack surface and egress costs.

Secondary ranges keep pod and service IPs separate from node IPs, which simplifies firewall rules and routing. Without secondary ranges, pods share the node subnet, making it difficult to distinguish pod traffic from node traffic in firewall rules. The named ranges ("pods" and "services") get referenced in the GKE cluster configuration, creating an explicit link between your network architecture and cluster setup.
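
Because pod traffic always carries addresses from the secondary range, firewall rules can match on it directly. A minimal sketch (the rule name and protocol are assumptions, not part of the configuration above):

# Allow TCP traffic originating from pods within the VPC
resource "google_compute_firewall" "allow_pod_traffic" {
  name    = "allow-pod-traffic"
  network = google_compute_network.gke_vpc.name

  allow {
    protocol = "tcp"
  }

  source_ranges = ["10.4.0.0/14"] # the "pods" secondary range
}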

Defining the GKE Cluster

Let's deploy a GKE cluster with Terraform that incorporates production-ready settings.

The cluster configuration separates the control plane from the node pools, giving you flexibility to modify nodes without recreating the entire cluster:

# gke.tf
resource "google_container_cluster" "primary" {
  name     = "primary-gke-cluster"
  location = var.region
  
  # We'll manage the node pool separately
  remove_default_node_pool = true
  initial_node_count       = 1
  
  network    = google_compute_network.gke_vpc.self_link
  subnetwork = google_compute_subnetwork.gke_subnet.self_link
  
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }
  
  cluster_autoscaling {
    enabled = true
    resource_limits {
      resource_type = "cpu"
      minimum       = 4
      maximum       = 100
    }
    resource_limits {
      resource_type = "memory"
      minimum       = 16
      maximum       = 400
    }
  }
  
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }
  
  depends_on = [
    google_project_service.required_apis["container.googleapis.com"]
  ]
}

resource "google_container_node_pool" "primary_nodes" {
  name     = "primary-node-pool"
  location = var.region
  cluster  = google_container_cluster.primary.name
  
  autoscaling {
    min_node_count = 3
    max_node_count = 10
  }
  
  management {
    auto_repair  = true
    auto_upgrade = true
  }
  
  node_config {
    preemptible     = false
    machine_type    = "e2-standard-4"
    disk_size_gb    = 100
    disk_type       = "pd-standard"
    
    metadata = {
      disable-legacy-endpoints = "true"
    }
    
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]
    
    workload_metadata_config {
      mode = "GKE_METADATA"
    }
  }
}

The regional deployment (using var.region instead of a specific zone) spreads nodes across multiple availability zones automatically. This protects against zone failures, but it multiplies node costs: node pool counts in a regional cluster apply per zone, so a minimum of 3 nodes becomes 9 nodes across three zones. The cluster management fee itself is the same for zonal and regional clusters. For development environments, you might prefer a zonal cluster or lower per-zone counts to keep node costs down. The trade-off is that a zone failure takes down your entire cluster rather than just reducing capacity.

Separating the node pool from the cluster definition allows you to modify node configurations without touching the control plane. You can change machine types, disk sizes, or autoscaling parameters by updating just the node pool resource. Creating the cluster with remove_default_node_pool = true and initial_node_count = 1 starts a minimal cluster that Terraform immediately scales down, avoiding the cost of unused default nodes.

The cluster autoscaler adjusts the node count based on pod resource requests: when pods can't be scheduled due to insufficient CPU or memory, it adds nodes up to your limits. Two levels are at work here. The autoscaling block on the node pool scales that pool between 3 and 10 nodes (per zone in a regional cluster), while the cluster_autoscaling block on the cluster enables node auto-provisioning, with resource_limits capping the cluster as a whole at 100 CPUs and 400GB of memory. Each e2-standard-4 node contributes 4 vCPUs and 16GB of memory, so the cluster can grow to roughly 25 nodes before hitting those resource limits.

Workload Identity appears in both the cluster and node pool configuration. The cluster-level setting creates the workload identity pool, while the node-level GKE_METADATA mode configures the metadata server on each node. This combination allows pods to authenticate as Google Cloud service accounts without storing keys in Kubernetes secrets. We'll use this capability when configuring GitHub Actions to deploy without long-lived credentials.
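
Once terraform apply finishes, you can point kubectl at the new cluster. The names below match the configuration above; the region is an assumption:

# Fetch kubeconfig credentials for the regional cluster
gcloud container clusters get-credentials primary-gke-cluster \
  --region us-central1 \
  --project $PROJECT_ID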

⚡ ⚡ ⚡

Integrating GitHub Actions to Automate the Deployment

Manual Terraform runs from developer laptops introduce risk and inconsistency. Different engineers might run different Terraform versions, set environment variables differently, or carry local changes that haven't been committed. GitHub Actions creates a controlled environment for infrastructure changes with consistent tooling and proper authentication through Workload Identity Federation.

Configuring Workload Identity Federation

Workload Identity Federation allows GitHub Actions to authenticate directly to GCP using OIDC tokens instead of storing service account keys in GitHub secrets:

# workload_identity.tf
resource "google_iam_workload_identity_pool" "github" {
  workload_identity_pool_id = "github-actions"
  display_name              = "GitHub Actions Pool"
  description               = "Identity pool for GitHub Actions"
}

resource "google_iam_workload_identity_pool_provider" "github" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-provider"
  display_name                        = "GitHub Provider"
  
  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.actor"      = "assertion.actor"
    "attribute.repository" = "assertion.repository"
  }
  
  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}

resource "google_service_account_iam_binding" "github_actions" {
  service_account_id = google_service_account.terraform.name
  role               = "roles/iam.workloadIdentityUser"
  
  members = [
    "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository/your-org/your-repo"
  ]
}

Replace your-org/your-repo with your actual GitHub repository path. This configuration restricts authentication to workflows running in your specific repository. Even if someone copies your workflow file, they can't authenticate from a different repository. The repository attribute in the attribute_mapping enforces this security boundary at the GCP level, not just in your workflow configuration.

The attribute mapping connects GitHub's OIDC token claims to Google Cloud attributes. GitHub's OIDC token includes information about the workflow run: which repository it's from, which branch triggered it, and who initiated the run. The mapping tells Google Cloud how to interpret these claims. The repository attribute is particularly important since it's what enforces the repository-level access control.
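
Depending on when the provider is created, Google Cloud may also require it to carry an attribute condition, and it's a sensible guardrail either way because it rejects tokens from outside your organization before any IAM bindings are evaluated. A one-line sketch to add inside the provider block (the organization name is a placeholder):

  attribute_condition = "assertion.repository_owner == 'your-org'"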

This setup replaces the traditional approach of creating a service account key, downloading the JSON file, and storing it in GitHub secrets. Those keys never expire unless you rotate them manually, and anyone with repository admin access can view them. With Workload Identity Federation, GitHub receives temporary credentials that expire after an hour, and there's no long-lived secret to steal or leak.

Creating the GitHub Actions Workflow

The workflow runs Terraform commands in response to pull requests and merges to main:

# .github/workflows/terraform.yml
name: 'Terraform GKE Deployment'

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

env:
  TF_VERSION: '1.5.7'
  GOOGLE_REGION: 'us-central1'

jobs:
  terraform:
    name: 'Terraform'
    runs-on: ubuntu-latest
    
    permissions:
      contents: read
      id-token: write
      pull-requests: write
    
    steps:
    - name: Checkout
      uses: actions/checkout@v4
    
    - name: Authenticate
      uses: google-github-actions/auth@v3
      with:
        workload_identity_provider: 'projects/${{ secrets.GCP_PROJECT_NUMBER }}/locations/global/workloadIdentityPools/github-actions/providers/github-provider'
        service_account: 'terraform-gke@${{ secrets.GCP_PROJECT_ID }}.iam.gserviceaccount.com'
    
    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v2
      with:
        terraform_version: ${{ env.TF_VERSION }}
    
    - name: Terraform Init
      run: terraform init
    
    - name: Terraform Format Check
      run: terraform fmt -check
    
    - name: Terraform Plan
      id: plan
      run: terraform plan -no-color -out=tfplan
    
    - name: Terraform Apply
      if: github.ref == 'refs/heads/main' && github.event_name == 'push'
      run: terraform apply tfplan

The id-token: write permission allows the workflow to request OIDC tokens from GitHub. The authentication step exchanges this token for temporary Google Cloud credentials. No long-lived keys exist in your repository or GitHub secrets - only your project ID and project number, which aren't sensitive on their own.

On pull requests, the workflow runs terraform plan and stops, letting reviewers see infrastructure changes before they merge. Only pushes to main trigger the actual apply. The -out=tfplan flag saves the plan so that the apply step executes exactly what was planned earlier in the same run, rather than computing a fresh plan at apply time. Keep in mind that the plan reviewers saw on the pull request is regenerated after the merge; if other changes land in between, the post-merge plan (and therefore the apply) can differ from what was approved, so it's worth requiring branches to be up to date before merging.

The format check ensures consistent code style across your team. Terraform fmt reformats configuration files to a canonical style, making diffs cleaner and code reviews easier. Running this check in CI catches formatting issues before they create noisy pull requests. If the check fails, developers can run terraform fmt locally to fix formatting before pushing again.
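
The local fix is a one-liner:

# Reformat every Terraform file in the repository, including subdirectories
terraform fmt -recursive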

⚡ ⚡ ⚡

Common GCP-Specific Challenges

GCP has unique behaviors around API enablement, quota management, and network configuration that differ from other cloud providers. Understanding these differences prevents deployment failures and debugging sessions that stretch into hours.

API Enablement Dependencies

GCP requires explicit API enablement before you can create resources. Terraform can enable APIs, but timing issues cause failures when resources try to create before their APIs finish activating. Add explicit dependencies to prevent this:

resource "google_container_cluster" "primary" {
  # ... cluster configuration ...

  depends_on = [
    google_project_service.required_apis["container.googleapis.com"],
    google_project_service.required_apis["compute.googleapis.com"],
  ]
}

The depends_on ensures Terraform waits for API activation to complete. Without this, you'll see errors about APIs being disabled even though your configuration enables them. The error message "googleapi: Error 403: Kubernetes Engine API has not been used in project" means Terraform tried to create the cluster before the API finished enabling. This happens because API enablement is eventually consistent - the google_project_service resource completes successfully, but the API isn't immediately available across all GCP regions and zones.

The eventual consistency delay varies from seconds to several minutes depending on the API and current GCP load. Some APIs like Cloud Run or Cloud Functions can take up to five minutes to fully propagate. Adding disable_on_destroy = false to your API resources prevents another common issue: accidentally disabling APIs when running terraform destroy, which can break other resources in the project that Terraform doesn't manage.
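
If you still hit propagation races, one common workaround is an explicit delay using the hashicorp/time provider. A sketch - the 60-second duration is an arbitrary assumption, not a documented requirement:

# Wait for API enablement to propagate before dependent resources are created
resource "time_sleep" "wait_for_apis" {
  depends_on      = [google_project_service.required_apis]
  create_duration = "60s"
}

# Point the cluster's depends_on at time_sleep.wait_for_apis instead of the APIs directly.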

Project Quota Limitations

GCP enforces quotas that vary by project age and billing history. New projects often hit CPU quota limits when creating GKE clusters. Check your quotas before deployment:

# CPU quota is enforced per region; check the region you plan to deploy into
gcloud compute regions describe us-central1 \
  --project=$PROJECT_ID \
  --format="yaml(quotas)" | grep -B1 -A1 "metric: CPUS$"

A basic three-node cluster with e2-standard-4 machines needs 12 CPUs minimum. Add the cluster autoscaler maximum and you might need 40 CPUs or more. New GCP projects typically start with a 24 CPU quota in each region, which seems sufficient until you realize that both your staging and production clusters count against the same regional quota. Preemptible instances have a separate quota pool, so you might be able to create preemptible nodes when standard nodes fail due to quota limits.

Request quota increases through the console at least 24 hours before production deployments. The approval process runs automatically for reasonable increases but can take time. Requesting 100 CPUs for an established project with a billing history usually gets approved within minutes. The same request for a brand-new project might require manual review and take 2-3 business days. Include justification text explaining your use case - "Running GKE cluster with autoscaling for production workload" generally approves faster than leaving the field blank.

Network Peering and IP Range Conflicts

GKE clusters in different regions or projects often need network connectivity. VPC peering fails silently when IP ranges overlap. The peering establishes successfully but pods can't actually communicate. Document your IP allocations in variables to prevent conflicts:

# variables.tf - Centralize IP management
variable "network_cidrs" {
  default = {
    us_central1 = {
      nodes    = "10.0.0.0/20"
      pods     = "10.4.0.0/14"
      services = "10.8.0.0/20"
    }
    europe_west1 = {
      nodes    = "10.16.0.0/20"
      pods     = "10.20.0.0/14"
      services = "10.24.0.0/20"
    }
  }
}

This approach makes IP planning visible in code reviews. Reviewers can spot conflicts before they cause production issues. The pod and service ranges count for overlap checking too - a common mistake is only checking node ranges. Overlapping pod ranges between peered clusters breaks cross-cluster pod communication, even though nodes can reach each other fine.
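
The subnet defined earlier can then consume the map instead of hard-coded strings. A sketch assuming the us_central1 key (private_ip_google_access omitted for brevity):

resource "google_compute_subnetwork" "gke_subnet" {
  name          = "gke-subnet"
  ip_cidr_range = var.network_cidrs["us_central1"].nodes
  region        = var.region
  network       = google_compute_network.gke_vpc.id

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = var.network_cidrs["us_central1"].pods
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = var.network_cidrs["us_central1"].services
  }
}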

Private Google Access adds another consideration for network design. When enabled, Google API requests from private nodes route through Google's private network instead of the internet. This works well until you need to access Google APIs from on-premises networks connected via VPN or Interconnect. Those requests need routes to 199.36.153.4/30 (restricted.googleapis.com) or 199.36.153.8/30 (private.googleapis.com), along with custom DNS configuration so that Google API hostnames resolve to those ranges. Many teams discover this during production deployment when their on-premises applications can't reach GCS buckets or BigQuery datasets from the connected VPC.
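
The route itself is small if you standardize on the restricted VIP. A sketch (the route name is an assumption, and you still need DNS that resolves *.googleapis.com to this range):

resource "google_compute_route" "restricted_google_apis" {
  name             = "restricted-google-apis"
  network          = google_compute_network.gke_vpc.name
  dest_range       = "199.36.153.4/30"
  next_hop_gateway = "default-internet-gateway"
  priority         = 1000
}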

⚡ ⚡ ⚡

Conclusion

Following this guide, you've built a GKE cluster with Terraform that can be deployed consistently across environments. Your infrastructure now exists as version-controlled code where changes go through review. The Workload Identity Federation setup removes long-lived credentials from your CI/CD pipeline, while the structured Terraform configuration remains reusable across projects and regions.

The patterns demonstrated here scale to more complex architectures. The same service account configuration, state management, and GitHub Actions workflows support multi-region deployments and complex microservices architectures. When you need to add Redis clusters on GKE or implement multi-cluster ingress, you'll extend these foundations rather than starting from scratch. The network configuration you've built supports VPC peering for multi-cluster setups, and the autoscaling configuration adapts to varying workload demands.

For teams managing multiple GKE environments or needing more sophisticated deployment workflows, Terrateam adds pull request automation, cost estimation, and policy enforcement on top of your existing Terraform code. Terrateam integrates directly with your GitHub repositories, adding plan locks that prevent conflicting infrastructure changes and drift detection that alerts when manual changes bypass Terraform. These capabilities build on the infrastructure-as-code foundation you've created, adding governance without changing how you write Terraform.

⚡ ⚡ ⚡

GKE Glossary

What is a GKE cluster?

A GKE cluster represents a managed Kubernetes environment running on Google Cloud Platform. Google handles the control plane components (the API server, scheduler, and controller manager) while you manage the worker nodes that run your applications. Each GKE cluster provides a complete Kubernetes API implementation with additional Google Cloud integrations for load balancing, persistent storage, and identity management. Regional clusters spread control plane replicas across multiple zones for high availability, while zonal clusters run in a single zone for development environments.

The cluster architecture separates the control plane from worker nodes, allowing independent scaling and management. Your applications run as pods on the worker nodes, which are organized into node pools based on machine type and configuration. GKE automatically handles tasks like certificate rotation, control plane upgrades, and etcd backups that would require significant operational overhead in self-managed Kubernetes. The managed control plane also integrates with Google Cloud's networking to provide features like private clusters, authorized networks, and VPC-native networking without manual configuration.

How much does a GKE cluster cost?

GKE cluster pricing includes a management fee of $0.10 per cluster per hour (roughly $74/month) plus the cost of your worker nodes. The management fee applies to both zonal and regional clusters; the extra cost of a regional cluster comes from its node pools, which replicate nodes across zones. Your worker nodes are charged as regular Compute Engine instances based on their machine type, so a three-node cluster of e2-standard-4 instances costs about $97 per node monthly, plus the cluster management fee. Autopilot clusters carry the same per-cluster fee but bill for pod resource requests rather than nodes, typically costing 20-30% more than standard clusters for equivalent workloads.

Cost optimization strategies vary based on workload requirements. Preemptible nodes reduce costs by up to 80%, but can be terminated with 30 seconds notice, making them suitable for batch jobs and fault-tolerant services. Spot VMs offer similar savings with slightly better availability. Committed use discounts provide 37% savings for one-year commitments or 57% for three-year commitments on the node compute costs. The cluster autoscaler helps control costs by scaling down during low usage periods, though you still pay the base management fee even with zero nodes.

What is the best way to use Google Cloud managed services from an application running in GKE?

Applications running in GKE connect to Google Cloud services through Workload Identity, which maps Kubernetes service accounts to Google Cloud service accounts without storing keys in your pods. A pod that needs to access Cloud Storage or BigQuery receives temporary credentials automatically through the GKE metadata server. Configure this by annotating your Kubernetes service account with the corresponding Google Cloud service account email, then grant the Google service account appropriate IAM permissions for the services your application needs.

The implementation requires coordination between Kubernetes and Google Cloud configurations. First, create a Kubernetes service account in your namespace and annotate it with iam.gke.io/gcp-service-account=YOUR-GSA@PROJECT.iam.gserviceaccount.com. Then bind the Google Cloud service account to the Kubernetes service account using the iam.workloadIdentityUser role. Your pods must reference the Kubernetes service account in their pod spec.
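
A condensed example with placeholder names (app-ksa, app-gsa, and the default namespace are assumptions):

# Create and annotate the Kubernetes service account
kubectl create serviceaccount app-ksa --namespace default
kubectl annotate serviceaccount app-ksa --namespace default \
  iam.gke.io/gcp-service-account=app-gsa@${PROJECT_ID}.iam.gserviceaccount.com

# Allow that Kubernetes service account to impersonate the Google service account
gcloud iam service-accounts add-iam-policy-binding \
  app-gsa@${PROJECT_ID}.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:${PROJECT_ID}.svc.id.goog[default/app-ksa]"

# Pods opt in by referencing the Kubernetes service account in their spec:
#   spec:
#     serviceAccountName: app-ksa
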

When the application makes API calls, the GKE metadata server intercepts the request and provides temporary credentials based on the mapping. This approach works for all Google Cloud services including Cloud SQL, Pub/Sub, and Firestore, maintaining security boundaries between different applications in your cluster.