Deploying an AWS EKS Cluster with Terraform and GitHub Actions
Introduction to deploying an AWS EKS Cluster
Spinning up an EKS cluster manually through the AWS console is easy. Building a production-ready cluster that won't break under load is hard. You need proper networking, security groups, IAM roles, node groups with autoscaling, and monitoring. Dozens of interdependent resources must work together perfectly.
Manual provisioning creates inconsistency. Every environment is a unique snowflake with enough subtle differences to cause deployment failures. Your staging cluster works fine, but production mysteriously fails because someone clicked the wrong subnet during setup.
This article discusses ways to reduce the chaos by using infrastructure as code. You'll write Terraform configurations that create everything from VPCs to worker nodes, then automate deployments with GitHub Actions.
Provisioning a production-ready EKS cluster using Terraform
To run EKS clusters in production, you need networking, security, worker nodes, and monitoring, all configured correctly; get any of them wrong and you'll face outages and security holes.
Essential components include:
- A dedicated VPC with public/private subnets across multiple AZs
- Security groups restricting traffic to necessary ports
- IAM roles with least-privilege access
- Managed node groups with autoscaling
- Key add-ons like AWS Load Balancer Controller
Terraform's strength is consistency. It also handles complexity better than manual provisioning, especially when managing dependency relationships. EKS clusters depend on VPCs, node groups depend on clusters, and security groups reference each other. If your infrastructure is large, you can be certain you'll spend hours figuring out the correct sequence if you try to build it manually.
The AWS provider includes dedicated EKS resources: `aws_eks_cluster`, `aws_eks_node_group`, and `aws_eks_addon`. These understand EKS requirements like proper subnet tagging and load balancer discovery. When someone modifies your cluster through the console, Terraform detects the drift and lets you fix it.
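In practice, drift detection comes down to running a plan on a schedule and checking its exit code; a minimal sketch (assumes Terraform is installed and the working directory is initialized):

```bash
# -detailed-exitcode returns 0 for no changes, 2 when the real
# infrastructure has drifted from the configuration, 1 on error
terraform plan -detailed-exitcode -out=tfplan
case $? in
  0) echo "No drift detected" ;;
  2) echo "Drift detected; review tfplan before applying" ;;
  *) echo "Plan failed" >&2; exit 1 ;;
esac
```

Running this as a scheduled CI job turns console changes into visible, reviewable diffs instead of silent divergence.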
Integrating the Amazon EKS cluster into a CI pipeline
As mentioned previously, manual EKS deployments create configuration drift and contribute to human errors. Your staging and production clusters diverge over time, causing mysterious application failures.
A CI pipeline treats EKS infrastructure like application code. Changes go through pull requests, get reviewed, and deploy automatically:
Code Change → PR → Plan → Review → Merge → Deploy → Verify
Benefits include predictable deployments, identical environments across stages, and simple rollbacks through Git reverts. Advanced workflows, such as security scanning, add-on validation, and automated testing, are then easy to implement.
GitHub Actions works well here because it integrates with both Terraform and kubectl:
```yaml
- name: Deploy Infrastructure
  run: terraform apply

- name: Configure Cluster
  run: |
    aws eks update-kubeconfig --name ${{ vars.CLUSTER_NAME }}
    kubectl apply -f manifests/
```
The pipeline handles everything from VPC creation to running workloads in a single automated workflow. But first, let's see what the Terraform configuration for an EKS cluster looks like.
Writing Terraform configurations for an EKS cluster
For a full guide on how to organize your Terraform code, refer to this article. For this guide, you can keep everything in one file.
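The configurations below reference a `cluster_name` variable; a minimal `variables.tf` sketch (the default value is an assumption, pick any valid name):

```hcl
# variables.tf
variable "cluster_name" {
  description = "Name of the EKS cluster"
  type        = string
  default     = "demo-eks" # assumption: replace with your own name
}
```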
Your EKS cluster needs a foundation of networking resources before the cluster itself can exist. Start with a VPC that provides isolation and proper subnet layout:
```hcl
# vpc.tf

# Referenced by the subnets below to spread them across AZs
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name                                        = "eks-vpc"
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 1}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name                                        = "eks-private-${count.index + 1}"
    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
    "kubernetes.io/role/internal-elb"           = "1"
  }
}

resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.${count.index + 10}.0/24"
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name                                        = "eks-public-${count.index + 1}"
    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
    "kubernetes.io/role/elb"                    = "1"
  }
}
```
The subnet tags tell AWS Load Balancer Controller where to place load balancers.
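The VPC and subnets alone aren't enough: private nodes need outbound internet access for image pulls and updates, which requires an internet gateway, a NAT gateway, and route tables. A sketch of those routing pieces (resource names are assumptions; a single NAT gateway keeps costs down, one per AZ is more resilient):

```hcl
# routing.tf -- sketch of the routing layer
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id
  depends_on    = [aws_internet_gateway.main]
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}
```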
Create the EKS cluster with proper IAM configuration:
```hcl
# eks.tf
resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  role_arn = aws_iam_role.cluster.arn
  version  = "1.29"

  vpc_config {
    subnet_ids              = concat(aws_subnet.private[*].id, aws_subnet.public[*].id)
    endpoint_private_access = true
    endpoint_public_access  = true
    # Consider restricting this to office/VPN CIDRs in production
    public_access_cidrs     = ["0.0.0.0/0"]
  }

  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  depends_on = [
    aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy,
  ]
}

resource "aws_iam_role" "cluster" {
  name = "${var.cluster_name}-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "eks.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.cluster.name
}
```
Add managed node groups for your worker nodes:
```hcl
# node-groups.tf
resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "main-nodes"
  node_role_arn   = aws_iam_role.node_group.arn
  subnet_ids      = aws_subnet.private[*].id
  instance_types  = ["t3.medium"]

  scaling_config {
    desired_size = 2
    max_size     = 4
    min_size     = 1
  }

  update_config {
    max_unavailable = 1
  }

  depends_on = [
    aws_iam_role_policy_attachment.node_group_AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.node_group_AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.node_group_AmazonEC2ContainerRegistryReadOnly,
  ]
}

resource "aws_iam_role" "node_group" {
  name = "${var.cluster_name}-node-group-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}

# The three managed policies referenced in depends_on above
resource "aws_iam_role_policy_attachment" "node_group_AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.node_group.name
}

resource "aws_iam_role_policy_attachment" "node_group_AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.node_group.name
}

resource "aws_iam_role_policy_attachment" "node_group_AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.node_group.name
}
```
Using this configuration, you can create a production-ready cluster with proper networking, security, and autoscaling. The node groups run in private subnets for security and reach the internet through the NAT gateways in your routing layer.
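EKS creates and manages a cluster security group automatically, but if you want the tighter traffic restrictions listed earlier, you can attach an additional security group of your own; a hedged sketch (names and rules are assumptions, adjust to your network):

```hcl
# security.tf -- sketch; supplements the security group EKS manages
resource "aws_security_group" "cluster_extra" {
  name_prefix = "${var.cluster_name}-extra-"
  vpc_id      = aws_vpc.main.id

  # Allow HTTPS to the API endpoint from inside the VPC only
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.main.cidr_block]
  }

  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

You would then reference it from the cluster's `vpc_config` block via `security_group_ids`.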
Using GitHub Actions to plan and apply changes
GitHub Actions automates your EKS deployments the same way it handles any Terraform infrastructure. The key difference is adding Kubernetes verification steps after the cluster deploys.
The workflow builds on the same OIDC patterns from our CI/CD pipeline guide, but adds EKS-specific verification:
```yaml
name: EKS Deployment

on:
  pull_request:
    paths: ['**.tf']
  push:
    branches: [main]

permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ github.ref == 'refs/heads/main' && 'production' || 'development' }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/GitHubActionsTerraformRole
          aws-region: ${{ vars.AWS_REGION }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init

      - name: Terraform Plan
        id: plan
        run: terraform plan -var="cluster_name=${{ vars.CLUSTER_NAME }}"

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: terraform apply -auto-approve -var="cluster_name=${{ vars.CLUSTER_NAME }}"

      - name: Verify Cluster
        if: github.ref == 'refs/heads/main'
        run: |
          aws eks update-kubeconfig --name ${{ vars.CLUSTER_NAME }}
          kubectl get nodes
          kubectl get pods -A
```
Environment-specific deployments work through GitHub repository variables. Set `CLUSTER_NAME` to `dev-cluster` in development and `prod-cluster` in production. The same workflow code handles both environments with different configurations.
You get:
- Consistent cluster configuration across all environments
- Built-in approval gates through GitHub Environments
- Automatic rollback capability through Git reverts
- Integration with existing code review processes
The verification step makes sure your cluster is functional before marking the deployment successful. Failed cluster creation gets caught immediately rather than discovered later during application deployment.
For multiple environments, you can duplicate the workflow with different branch triggers.
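Alternatively, one file can serve several environments by mapping branches to GitHub Environments; a hedged sketch (branch and environment names are assumptions):

```yaml
on:
  push:
    branches: [main, staging, develop]

jobs:
  deploy:
    runs-on: ubuntu-latest
    # Map the pushed branch to a GitHub Environment; each environment
    # carries its own CLUSTER_NAME, AWS_REGION, and approval rules
    environment: ${{ github.ref_name == 'main' && 'production' || github.ref_name == 'staging' && 'staging' || 'development' }}
```

Each GitHub Environment can then require its own reviewers before the apply step runs.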
Handling secrets and verification steps
OIDC handles AWS authentication, but your applications running in EKS need their own secrets, like database passwords, API keys, and third-party service credentials. Never store these in your Terraform code or GitHub repository.
Instead, use AWS Secrets Manager and integrate it with Kubernetes:
```yaml
- name: Create Application Secrets
  run: |
    aws secretsmanager create-secret \
      --name "eks-app-secrets" \
      --secret-string '{"db_password":"${{ secrets.DB_PASSWORD }}"}'

    # Install AWS Secrets Store CSI Driver provider
    kubectl apply -f https://raw.githubusercontent.com/aws/secrets-store-csi-driver-provider-aws/main/deployment/aws-provider-installer.yaml
```
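With the driver installed, pods consume secrets through a SecretProviderClass that points at Secrets Manager; a hedged sketch (the metadata name is an assumption, the `objectName` must match the secret created above):

```yaml
# secret-provider.yaml -- sketch; mount this class in a pod via a
# CSI volume of type secrets-store.csi.k8s.io
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-secrets
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "eks-app-secrets"
        objectType: "secretsmanager"
```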
Never commit kubeconfig files to version control:
```yaml
- name: Configure kubectl
  run: |
    aws eks update-kubeconfig --name ${{ vars.CLUSTER_NAME }} --region ${{ vars.AWS_REGION }}
    kubectl config current-context
```
Here, the `aws eks update-kubeconfig` command generates a kubeconfig that authenticates using your AWS credentials. The token it produces expires with your OIDC session, which maintains security.
Next, you can add a few verification steps to confirm your cluster works before declaring success:
```yaml
- name: Cluster Health Check
  run: |
    # Verify nodes are ready
    kubectl wait --for=condition=Ready nodes --all --timeout=300s

    # Check system pods
    kubectl get pods -n kube-system

    # Verify DNS resolution
    kubectl run test-pod --image=busybox --rm -it --restart=Never -- nslookup kubernetes.default

    # Test load balancer controller
    kubectl get deployment -n kube-system aws-load-balancer-controller
```
You can also add connectivity verification to make sure your cluster can reach external services:
```bash
# Test internet connectivity from nodes
kubectl run connectivity-test --image=curlimages/curl --rm -it --restart=Never -- curl -Is https://www.google.com

# Verify ECR access for image pulls
kubectl create secret docker-registry ecr-secret --docker-server=${{ vars.AWS_ACCOUNT_ID }}.dkr.ecr.${{ vars.AWS_REGION }}.amazonaws.com
```
These checks catch common issues like misconfigured security groups, broken DNS, or missing IAM permissions before your applications try to deploy.
Tips to deploy and manage your cluster
`kubeconfig` in CI/CD requires different handling than local development. Store cluster connection details in GitHub secrets rather than committing config files:
```yaml
- name: Setup Cluster Access
  run: |
    aws eks update-kubeconfig --name ${{ vars.CLUSTER_NAME }}
    export KUBECONFIG=$HOME/.kube/config
```
Also, bootstrap essential add-ons through Terraform, not manual kubectl commands. Your cluster needs the AWS Load Balancer Controller and EBS CSI driver to function properly:
```hcl
resource "aws_eks_addon" "ebs_csi" {
  cluster_name = aws_eks_cluster.main.name
  addon_name   = "aws-ebs-csi-driver"
}

resource "helm_release" "argocd" {
  name             = "argocd"
  repository       = "https://argoproj.github.io/argo-helm"
  chart            = "argo-cd"
  namespace        = "argocd"
  create_namespace = true
}
```
Your Terraform creates the cluster and core add-ons. ArgoCD or similar tools handle application deployments. This separation prevents application changes from triggering expensive infrastructure plans.
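On the application side, ArgoCD watches a Git repository and syncs manifests into the cluster; a hedged sketch of an Application manifest (the repository URL, path, and names are placeholders):

```yaml
# app.yaml -- sketch; points ArgoCD at a hypothetical manifests repo
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: demo-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/app-manifests  # placeholder
    targetRevision: main
    path: k8s
  destination:
    server: https://kubernetes.default.svc
    namespace: demo
  syncPolicy:
    automated: {}
```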
Applications need cluster endpoints and certificate data that change between deployments. You can use Terraform data sources to pull dynamic cluster information:
```hcl
data "aws_eks_cluster_auth" "main" {
  name = aws_eks_cluster.main.name
}
```
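This auth data source is typically wired into the Kubernetes and Helm providers so Terraform resources like the `helm_release` above can reach the new cluster; a sketch:

```hcl
# providers.tf -- sketch; connects provider auth to the cluster outputs
provider "kubernetes" {
  host                   = aws_eks_cluster.main.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.main.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.main.token
}

provider "helm" {
  kubernetes {
    host                   = aws_eks_cluster.main.endpoint
    cluster_ca_certificate = base64decode(aws_eks_cluster.main.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.main.token
  }
}
```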
Finally, test cluster functionality immediately after creation. Deploy a simple nginx pod and expose it through a LoadBalancer service. If simple workloads can't run, fix the cluster before deploying complex applications.
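A minimal smoke test along those lines (assumes kubectl is pointed at the new cluster; the deployment name is arbitrary):

```bash
# Deploy nginx, expose it via a LoadBalancer, and wait for rollout
kubectl create deployment nginx-test --image=nginx
kubectl expose deployment nginx-test --port=80 --type=LoadBalancer
kubectl rollout status deployment/nginx-test --timeout=120s
kubectl get service nginx-test  # EXTERNAL-IP should populate within a few minutes

# Clean up once verified
kubectl delete service nginx-test
kubectl delete deployment nginx-test
```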
The goal is a cluster that works immediately without manual configuration steps. Everything your applications need should be automated through Terraform or ArgoCD manifests.
AWS EKS cluster version upgrade best practices
EKS upgrades are high-risk operations that can break your entire workload. Kubernetes changes APIs between versions, deprecates features, and introduces new security policies that might reject your existing pods.
To plan upgrades, check the Kubernetes changelog for breaking changes affecting your workloads. Also, test application compatibility in development clusters running the target version before upgrading production.
Don't skip over versions. For example, go from 1.27 to 1.28, then 1.28 to 1.29. Never jump directly from 1.27 to 1.29. The upgrade sequence matters too. Upgrade the cluster first, then managed add-ons, and finally the node groups:
```hcl
resource "aws_eks_cluster" "main" {
  version = "1.29" # Update this first
}

resource "aws_eks_addon" "vpc_cni" {
  addon_version = "v1.15.0-eksbuild.2" # Update after the cluster
}

resource "aws_eks_node_group" "main" {
  version = "1.29" # Update last
}
```
Instead of clicking through the console and hoping you remember all the steps, drive upgrades through code:
```yaml
- name: Upgrade Cluster
  run: |
    terraform plan -var="cluster_version=1.29"
    terraform apply -auto-approve -var="cluster_version=1.29"
```
Test upgrades in lower environments first. Your pipeline can automatically upgrade your development environment when you merge version changes, but you should require manual approval for staging and production deployments.
The key is treating upgrades like any other infrastructure change: planned, tested, and deployed through your existing automation rather than emergency console operations.
Conclusion
Your Terraform configurations create consistent clusters with proper networking, security, and autoscaling. GitHub Actions automates the entire process from code changes to running workloads. When you need a new environment, you modify variables and let the pipeline handle deployment.
With such a setup, your clusters deploy predictably across development, staging, and production. Version upgrades happen through pull requests rather than risky console operations. Applications get consistent infrastructure regardless of who deploys them or when.
Terrateam takes this foundation further by adding enterprise-grade features. Instead of maintaining custom workflows for dependency management between networking and application layers, Terrateam automatically orchestrates complex deployments. Policy enforcement, drift detection, and compliance reporting work out of the box.
For EKS specifically, Terrateam understands Kubernetes deployment patterns and integrates with tools like ArgoCD and Helm. Your infrastructure and application deployments coordinate automatically without custom scripting.
Start with the GitHub Actions approach to learn the fundamentals. When your team needs advanced orchestration and enterprise controls, Terrateam provides the next level of automation without throwing away your existing Terraform code.