November 6, 2025 · josh-pollara

How to Deploy an EKS Cluster with Terraform

What you'll learn: This guide walks you through provisioning production-ready EKS clusters with Terraform, from networking setup through worker node deployment. You'll configure IAM roles for secure cluster access, build VPCs with proper subnet tagging for load balancers, deploy managed node groups with autoscaling, and avoid the common pitfalls that break clusters under production load.

Introduction

Manually creating an EKS (Elastic Kubernetes Service) cluster through the AWS console seems straightforward until you're clicking through dozens of screens, each hiding critical settings that only surface as problems when your production deployment fails. You think you've configured everything correctly, but your load balancers can't provision because you missed a subnet tag, or your nodes can't pull images because the IAM role lacks one permission.

The inconsistency between environments compounds the problem. Your development cluster works perfectly, but staging fails because someone selected different instance types or forgot to enable DNS support in the VPC.

By the time you reach production, you're deploying to three unique snowflakes that behave differently under load, making debugging hard when things break.

Terraform replaces this with manageable, version-controlled infrastructure. Your entire cluster configuration (networking, security, compute) lives in code that deploys identically whether you're creating development environments or disaster recovery regions. When production incidents occur, you can trace the exact change through Git history instead of relying on someone's memory of what they modified last week.

This guide shows you how to create EKS clusters using Terraform that work effectively. You'll understand why each setting matters and what breaks when you get it wrong.

Understanding AWS EKS

EKS is AWS's managed Kubernetes offering that handles the control plane while you focus on deploying applications. AWS maintains the master nodes, etcd clusters, and API servers across multiple availability zones, while you manage the worker nodes that run your workloads.

It eliminates operational headaches (like wrestling with certificate rotation, etcd backups, and control plane upgrades) through a managed control plane that AWS keeps patched, scaled, and compatible with the Kubernetes ecosystem.

It also integrates seamlessly with AWS services. IAM roles map directly to Kubernetes service accounts through IRSA (IAM Roles for Service Accounts), so your pods authenticate to DynamoDB or S3 using temporary credentials that rotate automatically – the same security model you use everywhere else in AWS.

The cost structure reflects this split: you pay $0.10 per hour for the control plane, plus standard EC2 charges for worker nodes. The control plane costs a few dollars per day, with real expenses coming from the compute resources you provision.

EKS cluster architecture

An EKS cluster consists of a managed control plane in AWS's account that connects to worker nodes in your account through elastic network interfaces (ENIs) in your VPC. The control plane runs across multiple availability zones but appears as a single Kubernetes API endpoint.

Worker nodes come in three flavors:

  • Managed node groups (AWS handles EC2 lifecycle)
  • Self-managed nodes (for custom AMIs or configurations)
  • Fargate profiles (serverless pods without EC2 management)

Most teams start with managed node groups.

Critical add-ons make the cluster functional: AWS VPC CNI for pod networking, CoreDNS for service discovery, and kube-proxy for service routing. These components break your cluster if misconfigured, which is why EKS manages their lifecycle and compatibility.

Authentication flows through AWS IAM first, then maps to Kubernetes RBAC. This means you grant AWS users cluster access, and then use standard Kubernetes roles to control their permissions within the cluster.
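
As a concrete illustration, here is a minimal sketch using the newer EKS access entries API. It assumes the cluster's access_config sets authentication_mode to API or API_AND_CONFIG_MAP, references the aws_eks_cluster.main resource defined later in this guide, and uses a hypothetical team role ARN:

resource "aws_eks_access_entry" "dev_team" {
  cluster_name  = aws_eks_cluster.main.name
  principal_arn = "arn:aws:iam::123456789012:role/dev-team" # hypothetical IAM role
  type          = "STANDARD"
}

resource "aws_eks_access_policy_association" "dev_team_view" {
  cluster_name  = aws_eks_cluster.main.name
  principal_arn = aws_eks_access_entry.dev_team.principal_arn
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy"

  # Limit the read-only grant to a single namespace.
  access_scope {
    type       = "namespace"
    namespaces = ["dev"]
  }
}

Older clusters do the same mapping through the aws-auth ConfigMap instead.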

The case for deploying EKS with Terraform

Manual EKS deployment through the console means clicking through dozens of screens, with every click being a chance to misconfigure something that only breaks under production load.

Terraform turns EKS deployment into reproducible code. Your VPC configuration, subnet tagging, IAM roles, and node groups live in version-controlled files that deploy identically across environments. When production breaks, you can trace the exact change that caused it through Git history.

Terraform also handles EKS's web of dependencies: clusters need VPCs, node groups need clusters, and IAM roles must exist before either. It creates resources in the right order and, more importantly, destroys them correctly when you tear down environments.

Cluster upgrades also become easy via pull requests, especially when integrated with CI/CD pipelines. Update the cluster version in your Terraform configuration, run a plan to review the changes, test in development, and then promote the same change through staging to production. A consistent process.

For teams running multiple clusters, Terraform modules let you standardize configurations while allowing environment-specific overrides (for more on structuring your Terraform, see our guide on code organization), ensuring security policies and networking patterns stay consistent while letting each team tune their cluster sizing and node types.
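
For instance, a wrapper module (the path and input names here are hypothetical) lets each environment differ only in the values it passes in:

module "eks_dev" {
  source = "./modules/eks-cluster" # hypothetical internal module

  cluster_name   = "platform-dev"
  environment    = "development"
  instance_types = ["t3.medium"]
  desired_size   = 2
}

module "eks_prod" {
  source = "./modules/eks-cluster"

  cluster_name   = "platform-prod"
  environment    = "production"
  instance_types = ["m5.large"]
  desired_size   = 4
}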

What you need before deploying

Your AWS account requires specific IAM permissions to create EKS resources, including full access to EKS, EC2 for node groups, IAM for role creation, and VPC for networking. Without ec2:CreateNetworkInterface or iam:CreateServiceLinkedRole, your deployment will fail halfway through with cryptic errors.

Install Terraform (version 1.3 or higher for stable EKS provider features), AWS CLI for verification, and kubectl to interact with your cluster once it's running. Configure AWS credentials through environment variables or AWS CLI profiles; never hardcode them in Terraform files.

Plan your network architecture before writing any code. Choose CIDR ranges that won't overlap with existing VPCs or on-premises networks you might peer with later (changing VPC CIDR ranges means rebuilding everything from scratch). A /16 VPC with /24 subnets gives you room to grow without waste.

Decide on your cluster version (use the latest version EKS supports unless you have specific compatibility requirements; the examples in this guide use 1.29), instance types for node groups (t3.medium minimum for development, larger for production workloads), and whether you need public endpoint access (convenient but less secure) or private-only (requires VPN or bastion access).

Budget for hidden costs: NAT gateways run $45 per month each, data transfer between availability zones adds up quickly, and don't forget the $72 monthly control plane fee. A basic production setup with redundancy (two NAT gateways at roughly $90, the control plane at $72, and a couple of t3.medium nodes at around $60) typically costs $200–$300 per month before you run any actual workloads.

How to deploy an EKS cluster with Terraform

Note: These code examples illustrate the essential EKS resources. You'll need to add supporting infrastructure (route tables, data sources, etc.) and organize resources in the correct dependency order for actual deployment.

Setting up the provider and variables

Start with provider configuration that pins versions to avoid surprises when HashiCorp releases updates. The AWS provider changes frequently, and the Kubernetes provider you'll need later has its own compatibility matrix with EKS versions.

terraform {
  required_version = ">= 1.3"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "Terraform"
      Project     = var.cluster_name
    }
  }
}

Define variables that make your configuration reusable across environments. These are essential because hardcoding values means maintaining separate configurations for dev, staging, and production.

variable "cluster_name" {
  description = "EKS cluster name"
  type        = string
}

variable "cluster_version" {
  description = "Kubernetes version"
  type        = string
  default     = "1.29"
}

variable "aws_region" {
  type    = string
  default = "us-east-1"
}

variable "environment" {
  type    = string
  default = "development"
}
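
One common way to feed these per environment is a small variable file selected at apply time with terraform apply -var-file=dev.tfvars; the values below are illustrative:

# dev.tfvars (illustrative)
cluster_name = "platform-dev"
environment  = "development"
aws_region   = "us-east-1"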

The default_tags block automatically applies tags to every resource, critical for cost tracking and compliance. Without proper tagging, you'll never figure out which team's cluster is burning through your AWS budget.

Creating the networking foundation

EKS has specific networking requirements that break your cluster if misconfigured. Your VPC needs DNS support enabled, subnets must be tagged for load balancer discovery, and private subnets need NAT gateways for nodes to pull container images.

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.cluster_name}-vpc"
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 1}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "${var.cluster_name}-private-${count.index + 1}"
    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
    "kubernetes.io/role/internal-elb"           = "1"
  }
}

resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.${count.index + 10}.0/24"
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.cluster_name}-public-${count.index + 1}"
    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
    "kubernetes.io/role/elb"                    = "1"
  }
}

Without kubernetes.io/role/elb on public subnets, the AWS Load Balancer Controller can't provision internet-facing load balancers. Likewise, missing kubernetes.io/role/internal-elb on private subnets means internal services can't get load balancers. These tags tell Kubernetes where to place different types of load balancers.
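
The subnets above also reference an availability zones data source, one of the pieces the note at the start of this section leaves out. A minimal version looks like this:

data "aws_availability_zones" "available" {
  state = "available"
}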

Create internet and NAT gateways for connectivity:

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
  tags   = { Name = "${var.cluster_name}-igw" }
}

resource "aws_eip" "nat" {
  count  = 2
  domain = "vpc"
  tags   = { Name = "${var.cluster_name}-nat-eip-${count.index + 1}" }
}

resource "aws_nat_gateway" "main" {
  count         = 2
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
  tags          = { Name = "${var.cluster_name}-nat-${count.index + 1}" }
}

Two NAT gateways cost more but provide availability zone isolation. When one AZ fails, nodes in the other zone keep running. Using a single NAT gateway saves $45 monthly but creates a single point of failure.
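
Route tables are another piece the earlier note omits. A minimal sketch of the routing this layout assumes, with public subnets sending traffic to the internet gateway and each private subnet using its zone's NAT gateway:

# Public subnets route directly to the internet gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = { Name = "${var.cluster_name}-public" }
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Each private subnet sends outbound traffic through the NAT gateway in its own AZ.
resource "aws_route_table" "private" {
  count  = 2
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }

  tags = { Name = "${var.cluster_name}-private-${count.index + 1}" }
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}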

Configuring IAM roles and policies

EKS needs two separate IAM roles: one for the cluster control plane and another for worker nodes. These roles use different trust policies because they're assumed by different AWS services.

resource "aws_iam_role" "cluster" {
  name = "${var.cluster_name}-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "eks.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.cluster.name
}

resource "aws_iam_role" "node_group" {
  name = "${var.cluster_name}-node-group-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}

Node groups need three managed policies to function: WorkerNodePolicy for kubelet API access, CNI_Policy for pod networking, and ContainerRegistryReadOnly to pull images from ECR.

resource "aws_iam_role_policy_attachment" "node_policies" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  ])

  policy_arn = each.value
  role       = aws_iam_role.node_group.name
}

Set up OIDC provider for pod-level IAM roles (IRSA) to let pods authenticate to AWS services without storing credentials:

data "tls_certificate" "cluster" {
  url = aws_eks_cluster.main.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "cluster" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.cluster.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.main.identity[0].oidc[0].issuer
}
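
With the OIDC provider registered, a pod-level role is just an IAM role whose trust policy matches a specific service account. A sketch, assuming a hypothetical app-namespace/app-sa service account that only needs read access to S3:

# Trust policy restricts the role to one service account via the OIDC subject claim.
resource "aws_iam_role" "app_irsa" {
  name = "${var.cluster_name}-app-irsa"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRoleWithWebIdentity"
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.cluster.arn
      }
      Condition = {
        StringEquals = {
          "${replace(aws_iam_openid_connect_provider.cluster.url, "https://", "")}:sub" = "system:serviceaccount:app-namespace:app-sa"
          "${replace(aws_iam_openid_connect_provider.cluster.url, "https://", "")}:aud" = "sts.amazonaws.com"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "app_irsa_s3" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
  role       = aws_iam_role.app_irsa.name
}

The role's ARN then goes on the service account's eks.amazonaws.com/role-arn annotation so pods using it receive temporary credentials.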

Creating the EKS cluster resource

The cluster resource ties everything together. VPC configuration determines which subnets host the control plane ENIs and where worker nodes can run.

resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  role_arn = aws_iam_role.cluster.arn
  version  = var.cluster_version

  vpc_config {
    subnet_ids              = concat(aws_subnet.private[*].id, aws_subnet.public[*].id)
    endpoint_private_access = true
    endpoint_public_access  = true
    public_access_cidrs    = ["0.0.0.0/0"]
  }

  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  encryption_config {
    provider {
      key_arn = aws_kms_key.eks.arn
    }
    resources = ["secrets"]
  }

  depends_on = [
    aws_iam_role_policy_attachment.cluster_policy,
    aws_cloudwatch_log_group.cluster
  ]
}

resource "aws_kms_key" "eks" {
  description = "EKS cluster encryption key"
}

resource "aws_cloudwatch_log_group" "cluster" {
  name              = "/aws/eks/${var.cluster_name}/cluster"
  retention_in_days = 7
}

The endpoint_public_access setting determines whether you can reach the API server from the internet. Setting it to false means you need VPN or bastion access to manage the cluster. The public_access_cidrs list restricts which IP addresses can connect; use your office IP ranges for better security than 0.0.0.0/0.
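
For example, to lock the endpoint down to a known corporate range (the CIDR below is a documentation placeholder), you would tighten the vpc_config like this:

  vpc_config {
    subnet_ids              = concat(aws_subnet.private[*].id, aws_subnet.public[*].id)
    endpoint_private_access = true
    endpoint_public_access  = true
    public_access_cidrs     = ["203.0.113.0/24"] # replace with your office or VPN egress range
  }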

Cluster creation takes 10–15 minutes. If it fails, check CloudWatch logs for details, though the error messages often require deep Kubernetes knowledge to decipher.
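
A few outputs also make the cluster easier to consume from later steps such as kubectl configuration or CI pipelines; the output names here are just a suggestion:

output "cluster_name" {
  value = aws_eks_cluster.main.name
}

output "cluster_endpoint" {
  value = aws_eks_cluster.main.endpoint
}

output "cluster_certificate_authority" {
  value     = aws_eks_cluster.main.certificate_authority[0].data
  sensitive = true
}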

Deploying managed node groups

Node groups provide the compute that actually runs your workloads. Managed node groups handle EC2 lifecycle, automatic security patching, and node updates.

resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.cluster_name}-main"
  node_role_arn   = aws_iam_role.node_group.arn
  subnet_ids      = aws_subnet.private[*].id

  scaling_config {
    desired_size = 2
    max_size     = 4
    min_size     = 1
  }

  update_config {
    max_unavailable_percentage = 33
  }

  instance_types = ["t3.medium"]

  disk_size = 20

  labels = {
    role = "general"
  }

  tags = {
    "k8s.io/cluster-autoscaler/${var.cluster_name}" = "owned"
    "k8s.io/cluster-autoscaler/enabled"             = "true"
  }

  depends_on = [
    aws_iam_role_policy_attachment.node_policies
  ]
}

Use t3.medium at minimum; t3.small nodes run out of memory with standard Kubernetes system pods. The disk_size of 20GB handles basic workloads, but increase it if you're running image-heavy applications or need local storage.

For production, create multiple node groups with different purposes:

resource "aws_eks_node_group" "spot" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.cluster_name}-spot"
  node_role_arn   = aws_iam_role.node_group.arn
  subnet_ids      = aws_subnet.private[*].id
  capacity_type   = "SPOT"

  scaling_config {
    desired_size = 1
    max_size     = 3
    min_size     = 0
  }

  instance_types = ["t3.medium", "t3a.medium"]

  taint {
    key    = "spot"
    value  = "true"
    effect = "NO_SCHEDULE"
  }
}

Spot instances cost 70% less but can be terminated with two minutes' notice. The taint prevents critical workloads from landing on spot nodes unless they explicitly tolerate the instability.

Post-deployment configuration

After Terraform creates your cluster, configure kubectl access and install essential add-ons:

resource "null_resource" "kubectl_config" {
  provisioner "local-exec" {
    command = "aws eks update-kubeconfig --region ${var.aws_region} --name ${var.cluster_name}"
  }
  depends_on = [aws_eks_cluster.main]
}

resource "aws_eks_addon" "vpc_cni" {
  cluster_name = aws_eks_cluster.main.name
  addon_name   = "vpc-cni"
  addon_version = "v1.15.4-eksbuild.1"
}

resource "aws_eks_addon" "ebs_csi_driver" {
  cluster_name = aws_eks_cluster.main.name
  addon_name   = "aws-ebs-csi-driver"
  addon_version = "v1.26.1-eksbuild.1"
}

The EBS CSI driver is mandatory for persistent volumes on newer Kubernetes versions. Without it, your StatefulSets can't provision storage.
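
CoreDNS and kube-proxy, called out earlier as critical add-ons, can be pinned the same way. The versions below are placeholders, so check aws eks describe-addon-versions for releases compatible with your cluster version; in many setups the EBS CSI driver add-on also needs an IRSA role supplied through service_account_role_arn.

resource "aws_eks_addon" "coredns" {
  cluster_name  = aws_eks_cluster.main.name
  addon_name    = "coredns"
  addon_version = "v1.11.1-eksbuild.4" # placeholder version
}

resource "aws_eks_addon" "kube_proxy" {
  cluster_name  = aws_eks_cluster.main.name
  addon_name    = "kube-proxy"
  addon_version = "v1.29.0-eksbuild.1" # placeholder version
}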

Verify your cluster works:

kubectl get nodes
kubectl get pods -n kube-system

If nodes show NotReady, check security groups and subnet routing. System pods in CrashLoopBackOff usually indicate IAM permission problems or missing cluster add-ons.

Alternative cloud providers

Google Kubernetes Engine (GKE)

Google Kubernetes Engine (GKE) simplifies a few things: automatic node upgrades, built-in workload identity instead of IRSA configuration, and networking that works without subnet tagging gymnastics.

GKE clusters provision faster (5 minutes versus EKS's 15), and Google handles add-on management without you defining each one in Terraform. The trade-off is less control over the underlying infrastructure and weaker integration with non-Google services.

Azure Kubernetes Service (AKS)

Azure Kubernetes Service (AKS) sits between GKE's simplicity and EKS's flexibility.

Azure Active Directory integration means your existing enterprise identities work without translation layers, and the Azure CNI provides better network performance than EKS's default CNI. However, AKS's node pool concept differs from EKS node groups, meaning migration between them requires architectural changes, not just provider swaps.

Terraform somewhat abstracts these differences: all three providers use similar resource patterns for clusters and node pools, but each cloud's peculiarities leak through. EKS needs explicit IAM role configuration that GKE handles automatically, AKS requires different networking decisions than either AWS or Google, and version upgrade strategies vary wildly between providers.

Which alternative cloud provider is right for you?

  • EKS: You're already invested in AWS services, need deep integration with Route53, ALB, or IAM, or require the flexibility to customize every aspect of cluster behavior.
  • GKE: You want developer-friendly defaults and faster iteration.
  • AKS: You're a Microsoft shop with existing Azure infrastructure.

Alternative orchestration tools

HashiCorp Nomad

HashiCorp Nomad strips away Kubernetes complexity while keeping the core scheduling benefits.

You write job files instead of manifests and get a single binary that runs anywhere without the supporting cast of DNS, networking, and storage plugins that Kubernetes demands.

Nomad handles mixed workloads (containers, VMs, Java apps, and batch jobs) that would require separate Kubernetes operators, making it ideal when your infrastructure isn't yet container-native.

K3s

K3s compresses Kubernetes into a 50MB binary that runs on Raspberry Pis, which works well for edge deployments where EKS's control plane cost and overhead aren't practical.

It includes the batteries most teams need (Traefik ingress, a local-path storage provisioner, ServiceLB) while removing cloud provider integrations you won't use outside AWS.

Development teams love K3s for local testing because it starts in seconds.

Amazon ECS

Consider Amazon ECS if you're AWS-native and don't need Kubernetes ecosystem tools.

Task definitions are simpler than Kubernetes manifests, Fargate integration works better than EKS's version, and you skip the cognitive overhead of translating between AWS and Kubernetes concepts.

The downside is that you're locked to AWS, and the ecosystem of third-party tools is a fraction of what Kubernetes offers.

Which orchestration tool is right for you?

Choose full Kubernetes (EKS) when you need portability, extensive ecosystem support, or have teams already familiar with kubectl. Pick alternatives when Kubernetes complexity outweighs its benefits.

Conclusion

With your foundation ready, deploy applications with Helm or Kubernetes manifests, set up GitOps with ArgoCD for continuous deployment (see our guide on Deploying an AWS EKS Cluster with Terraform and GitHub Actions for automated workflows), implement Prometheus and Grafana for observability, and add service mesh capabilities with Istio when your microservices architecture demands it.

Your cluster will evolve as you add node groups for different workload types, implement cluster autoscaling for cost optimization, enable backup strategies for stateful applications, and eventually expand to multi-region deployments for disaster recovery.

For teams ready to scale beyond manual Terraform workflows, Terrateam automates the operational overhead of managing infrastructure as code.

Instead of coordinating plan and apply commands across team members, Terrateam provides automated workflows that run plans on pull requests, enforce security policies before applying, detect and alert on infrastructure drift, and maintain audit logs for compliance requirements.

Sign up for Terrateam to automate your infrastructure deployment workflow.