November 3, 2025 · Josh Pollara

How to Deploy Grafana with Terraform on AWS and Kubernetes

What you'll learn: How to build production-ready monitoring infrastructure with persistent storage, IAM authentication, and automated configuration, and then configure data sources programmatically using the Grafana Terraform provider. The guide covers importing pre-built dashboards for CloudWatch and Prometheus, plus setting up alerting rules that catch issues before they become outages.

Introduction

Monitoring infrastructure without visualization creates blind spots. You collect metrics from dozens of services, but when something breaks, you're stuck piecing together data from CloudWatch, Prometheus, application logs, and custom dashboards scattered across different tools. By the time you find the issue, users have already noticed.

Grafana consolidates metrics into a single interface where patterns and anomalies become immediately visible. But deploying it manually wastes time and introduces inconsistency. Your staging environment uses different settings from production, making it harder to catch issues before they affect users.

This article discusses deploying Grafana as code on AWS EC2 and Kubernetes, using Terraform to eliminate manual setup and ensure your monitoring infrastructure deploys the same way every time.

What is Grafana?

Grafana is an open-source visualization platform that transforms metrics and logs into understandable dashboards. It connects to dozens of data sources, such as Prometheus, InfluxDB, and AWS CloudWatch, to let you query and visualize data from anywhere in your infrastructure. It excels at creating real-time dashboards that make patterns and anomalies immediately visible, especially when you're trying to spot issues before they become outages.

Grafana also handles alerting and notification. You can define thresholds on any metric and route alerts through email, Slack, PagerDuty, or other channels. Teams use it for infrastructure monitoring, application performance management, business analytics, and IoT monitoring. The common thread is turning raw time-series data into actionable insights that help you understand what's happening in your systems.

Benefits of using Grafana with Terraform

Unified monitoring across platforms

Grafana pulls data from multiple sources into a single interface. You can visualize AWS CloudWatch metrics alongside Prometheus data from your Kubernetes cluster, add application logs from Elasticsearch, and overlay business metrics in one dashboard. This unified monitoring approach eliminates the context-switching that slows down troubleshooting.

Version-controlled infrastructure

Deploying Grafana through Terraform means your monitoring setup lives in Git. You can track changes, review modifications through pull requests, and roll back problematic updates. When configuration breaks, you have a clear audit trail. For more on structuring your configurations, see our guide on Terraform code organization.

Consistent environments

Terraform deploys identical Grafana instances across development, staging, and production. Your staging environment mirrors production's monitoring setup, so you catch dashboard issues before they affect production.

Configure through code

The Grafana Terraform provider configures data sources, dashboards, and alert rules through code. This prevents configuration drift and simplifies replicating complex setups. When you need to spin up a new Grafana instance, you apply the same configuration instead of manually recreating dozens of data sources.

Deploying Grafana to AWS

Before you start, you'll need:

  • Terraform installed (version 1.0 or later)
  • AWS CLI configured with valid credentials
  • An SSH key pair created in your AWS region

The following configuration creates everything needed to run Grafana on an EC2 instance with CloudWatch access.

Notes:

  • Replace the AMI ID with the latest Amazon Linux 2 AMI for your region.
  • The code uses local state for illustration. For team collaboration, configure remote state.
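If you want remote state from the start, here's a minimal sketch of an S3 backend (the bucket and table names are placeholders you'd provision separately):

terraform {
  backend "s3" {
    bucket         = "your-terraform-state-bucket" # placeholder
    key            = "grafana/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "your-terraform-locks" # placeholder; enables state locking
  }
}

With that noted, here's the full configuration: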
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# Security group for Grafana access
resource "aws_security_group" "grafana" {
  name        = "grafana-access"
  description = "Allow Grafana and SSH access"

  ingress {
    from_port   = 3000
    to_port     = 3000
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# IAM role for CloudWatch access
resource "aws_iam_role" "grafana" {
  name = "grafana-cloudwatch-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "grafana_cloudwatch" {
  role       = aws_iam_role.grafana.name
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchReadOnlyAccess"
}

resource "aws_iam_instance_profile" "grafana" {
  name = "grafana-instance-profile"
  role = aws_iam_role.grafana.name
}

# EC2 instance with Grafana installed
resource "aws_instance" "grafana" {
  ami                    = "ami-0000000000000000" # Replace with the latest Amazon Linux 2 AMI for your region
  instance_type          = "t3.small"
  key_name               = "your-key-pair-name"
  vpc_security_group_ids = [aws_security_group.grafana.id]
  iam_instance_profile   = aws_iam_instance_profile.grafana.name

  user_data = <<-EOF
    #!/bin/bash
    # Add Grafana repository
    cat > /etc/yum.repos.d/grafana.repo <<'REPO'
    [grafana]
    name=grafana
    baseurl=https://packages.grafana.com/oss/rpm
    repo_gpgcheck=1
    enabled=1
    gpgcheck=1
    gpgkey=https://packages.grafana.com/gpg.key
    sslverify=1
    sslcacert=/etc/pki/tls/certs/ca-bundle.crt
    REPO

    # Install and start Grafana
    yum install -y grafana
    systemctl daemon-reload
    systemctl enable grafana-server
    systemctl start grafana-server
  EOF

  tags = {
    Name = "grafana-server"
  }
}

# Elastic IP for consistent access
resource "aws_eip" "grafana" {
  instance = aws_instance.grafana.id
  domain   = "vpc"
}

output "grafana_url" {
  value = "http://${aws_eip.grafana.public_ip}:3000"
}

In this configuration, the security group opens port 3000 for Grafana's web interface and port 22 for SSH access (you should restrict the SSH CIDR block to your IP address in production). The IAM role grants CloudWatchReadOnlyAccess, which lets Grafana query AWS metrics without requiring additional credentials.
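One way to do that, sketched here with a hypothetical variable, is to parameterize the SSH ingress instead of hardcoding 0.0.0.0/0:

variable "ssh_allowed_cidr" {
  description = "CIDR block allowed SSH access, e.g. your office IP as a /32"
  type        = string
  default     = "203.0.113.10/32" # placeholder; replace with your own address
}

Then reference it in the security group's SSH rule with cidr_blocks = [var.ssh_allowed_cidr].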

The user_data script runs when the instance launches, installing Grafana from the official repository and configuring it to start automatically. This way, the installation is simple and repeatable. The Elastic IP ensures your Grafana instance keeps the same public IP address even if you need to replace the EC2 instance.

Apply this configuration with standard Terraform commands: terraform init and terraform apply. To automate these deployments through CI/CD, check out our guide on best practices for CI/CD pipelines.
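From the directory containing the configuration:

terraform init   # download the AWS provider
terraform plan   # preview the resources before creating anything
terraform apply  # create the security group, IAM role, instance, and Elastic IP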

Once Terraform finishes, grab the URL from the output and open it in your browser. Grafana takes about 30 seconds to fully start after the instance launches, so give it a moment if the page doesn't load immediately.

The default credentials are:

  • Username: admin
  • Password: admin

Grafana prompts you to change the password on first login. Do this immediately; leaving default credentials on a public-facing instance is a security risk.

[Screenshot: Grafana login page]

Monitoring AWS services with Grafana

Setting up CloudWatch data source

The Grafana Terraform provider lets you configure Grafana resources through code instead of clicking through the web interface. Point the provider at your instance's URL and credentials, and you can add data sources programmatically:

terraform {
  required_providers {
    grafana = {
      source  = "grafana/grafana"
      version = "~> 2.0"
    }
  }
}

provider "grafana" {
  url  = "http://${aws_eip.grafana.public_ip}:3000"
  auth = "admin:your-new-password"
}

resource "grafana_data_source" "cloudwatch" {
  type = "cloudwatch"
  name = "CloudWatch"

  json_data_encoded = jsonencode({
    defaultRegion = "us-east-1"
    authType      = "default"
  })
}

The provider connects to your Grafana instance using basic authentication. In production, you'd use an API key instead of the admin password, but this works for getting started. The authType = "default" setting tells Grafana to use the EC2 instance's IAM role for CloudWatch authentication, the same role you attached to the instance earlier. This eliminates the need to manage AWS access keys.
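As a sketch of that production setup, you could pass a service account token (or legacy API key) through a sensitive variable instead of embedding admin credentials:

variable "grafana_auth" {
  description = "Grafana service account token or API key" # hypothetical variable
  type        = string
  sensitive   = true
}

provider "grafana" {
  url  = "http://${aws_eip.grafana.public_ip}:3000"
  auth = var.grafana_auth # a bare token here replaces the admin:password pair
}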

For teams using Grafana Cloud instead of self-hosted instances, the Grafana Cloud Terraform provider offers similar capabilities with managed infrastructure. The configuration pattern stays the same, but you authenticate against Grafana's cloud API rather than a local instance.

Creating your first dashboard

Rather than building dashboards from scratch, import pre-built ones from Grafana's community library. Add this resource to your Terraform configuration:

resource "grafana_dashboard" "cloudwatch_ec2" {
  config_json = jsonencode({
    dashboard = jsondecode(file("${path.module}/dashboards/ec2-monitoring.json"))
    overwrite = true
  })
}

Download the EC2 monitoring dashboard JSON from Grafana's website. Save it to a dashboards/ directory in your Terraform project. The configuration loads the JSON and creates the dashboard automatically. You can also create custom dashboards through the Grafana UI and export them as JSON to version control them.
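If you accumulate several exported dashboards, one pattern (a sketch assuming a flat dashboards/ directory of JSON files) is to load them all with for_each:

resource "grafana_dashboard" "all" {
  for_each    = fileset("${path.module}/dashboards", "*.json")
  config_json = file("${path.module}/dashboards/${each.value}")
}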


Basic alerting

Set up a simple alert to notify you when EC2 CPU usage exceeds 80%:

resource "grafana_contact_point" "email" {
  name = "Email Alerts"

  email {
    addresses = ["ops@yourcompany.com"]
  }
}

resource "grafana_rule_group" "ec2_alerts" {
  name             = "EC2 Alerts"
  folder_uid       = grafana_folder.alerts.uid
  interval_seconds = 60

  rule {
    name      = "High CPU Usage"
    condition = "A"

    data {
      ref_id = "A"

      relative_time_range {
        from = 600
        to   = 0
      }

      datasource_uid = grafana_data_source.cloudwatch.uid
      model = jsonencode({
        refId      = "A"
        queryType  = "timeSeriesQuery"
        namespace  = "AWS/EC2"
        metricName = "CPUUtilization"
        statistic  = "Average"
        dimensions = {
          InstanceId = aws_instance.grafana.id
        }
      })
    }

    no_data_state  = "NoData"
    exec_err_state = "Error"

    for = "5m"

    annotations = {
      description = "EC2 instance CPU usage is above 80%"
    }

    labels = {
      severity = "warning"
    }
  }
}

resource "grafana_folder" "alerts" {
  title = "Alert Rules"
}

Here you're creating an email contact point and an alert rule that evaluates every minute. Query A pulls the CloudWatch CPU metric, expression B reduces the series to its latest value, and expression C compares that value against the 80% threshold, mirroring the query, reduce, threshold pipeline Grafana's UI builds. The alert fires only if the CPU stays above 80% for five consecutive minutes, which prevents false alarms from brief spikes. The for duration gives transient issues time to resolve themselves before triggering notifications.

You can add more complex notification channels like SNS, Slack, or PagerDuty by changing the contact point configuration. The pattern stays the same (see the sketch after this list):

  1. Define the channel
  2. Create the rule
  3. Specify thresholds that make sense for your infrastructure
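For example, a PagerDuty contact point follows the same shape as the email one (the integration key is a placeholder):

resource "grafana_contact_point" "pagerduty" {
  name = "PagerDuty Alerts"

  pagerduty {
    integration_key = "your-pagerduty-integration-key" # placeholder
    severity        = "warning"
  }
}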

Deploying Grafana to Kubernetes

Before deploying to Kubernetes, make sure you have:

  • A running Kubernetes cluster (EKS, GKE, or any other distribution)
  • kubectl configured and able to access your cluster
  • Terraform installed with the Kubernetes provider

If you're deploying to AWS EKS, see our detailed guide on deploying an AWS EKS cluster with Terraform.

Below is an example configuration that deploys Grafana with persistent storage and production-ready settings.

Notes:

  • The kubeconfig-based authentication below won't work in CI/CD. For pipelines, use service account tokens or OIDC authentication instead of local kubeconfig files (see the sketch after these notes).
  • The code uses local state for illustration. For team collaboration, configure remote state.
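A sketch of token-based Kubernetes provider auth for CI/CD, using hypothetical variables for the cluster endpoint and credentials:

provider "kubernetes" {
  host                   = var.cluster_endpoint              # e.g., the EKS API server URL
  cluster_ca_certificate = base64decode(var.cluster_ca_cert) # cluster CA certificate, base64-encoded
  token                  = var.service_account_token         # short-lived token issued by your pipeline
}

For local use, the kubeconfig-based configuration below is enough: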
terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.23"
    }
  }
}

provider "kubernetes" {
  config_path = "~/.kube/config"
}

resource "kubernetes_namespace" "grafana" {
  metadata {
    name = "grafana"
  }
}

resource "kubernetes_secret" "grafana_admin" {
  metadata {
    name      = "grafana-admin"
    namespace = kubernetes_namespace.grafana.metadata[0].name
  }

  data = {
    admin-user     = "admin"
    admin-password = "your-secure-password"
  }
}

resource "kubernetes_persistent_volume_claim" "grafana" {
  metadata {
    name      = "grafana-storage"
    namespace = kubernetes_namespace.grafana.metadata[0].name
  }

  spec {
    access_modes = ["ReadWriteOnce"]
    resources {
      requests = {
        storage = "10Gi"
      }
    }
  }
}

resource "kubernetes_deployment" "grafana" {
  metadata {
    name      = "grafana"
    namespace = kubernetes_namespace.grafana.metadata[0].name
  }

  spec {
    replicas = 1

    selector {
      match_labels = {
        app = "grafana"
      }
    }

    template {
      metadata {
        labels = {
          app = "grafana"
        }
      }

      spec {
        container {
          name  = "grafana"
          image = "grafana/grafana:10.2.0" # Use latest stable version

          port {
            container_port = 3000
          }

          env {
            name = "GF_SECURITY_ADMIN_USER"
            value_from {
              secret_key_ref {
                name = kubernetes_secret.grafana_admin.metadata[0].name
                key  = "admin-user"
              }
            }
          }

          env {
            name = "GF_SECURITY_ADMIN_PASSWORD"
            value_from {
              secret_key_ref {
                name = kubernetes_secret.grafana_admin.metadata[0].name
                key  = "admin-password"
              }
            }
          }

          resources {
            requests = {
              memory = "256Mi"
              cpu    = "250m"
            }
            limits = {
              memory = "512Mi"
              cpu    = "500m"
            }
          }

          volume_mount {
            name       = "grafana-storage"
            mount_path = "/var/lib/grafana"
          }

          liveness_probe {
            http_get {
              path = "/api/health"
              port = 3000
            }
            initial_delay_seconds = 30
            period_seconds        = 10
          }

          readiness_probe {
            http_get {
              path = "/api/health"
              port = 3000
            }
            initial_delay_seconds = 5
            period_seconds        = 5
          }
        }

        volume {
          name = "grafana-storage"
          persistent_volume_claim {
            claim_name = kubernetes_persistent_volume_claim.grafana.metadata[0].name
          }
        }
      }
    }
  }
}

resource "kubernetes_service" "grafana" {
  metadata {
    name      = "grafana"
    namespace = kubernetes_namespace.grafana.metadata[0].name
  }

  spec {
    selector = {
      app = "grafana"
    }

    port {
      port        = 80
      target_port = 3000
    }

    type = "LoadBalancer"
  }
}

output "grafana_loadbalancer" {
  value = "Check status: kubectl get svc grafana -n grafana"
}

The persistent volume claim is important here. Without it, every time the Grafana pod restarts, whether from an upgrade, node failure, or cluster reschedule, you lose all dashboards, data sources, and alert configurations. The 10Gi volume stores Grafana's SQLite database, which contains everything you've configured through the interface.

Resource limits prevent Grafana from consuming excessive cluster resources. The requests tell Kubernetes the minimum resources needed to run Grafana, while limits cap maximum usage. These values work well for most small to medium deployments, but you'll want to adjust them based on your dashboard complexity and query volume.

The liveness and readiness probes ensure Kubernetes routes traffic only to healthy pods and restarts containers that become unresponsive. The /api/health endpoint returns a 200 status when Grafana is ready to serve traffic. The initial delay gives Grafana time to start up before Kubernetes begins health checks.

When you apply the configuration and Terraform completes, get the LoadBalancer IP address:

kubectl get svc grafana -n grafana

Cloud providers typically take 1–2 minutes to provision the load balancer, so you might see <pending> initially. Once the IP appears, access Grafana at http://<EXTERNAL-IP>.

Log in using the credentials you set in the Kubernetes secret:

  • Username: admin
  • Password: your-secure-password (whatever you specified in the secret)

Change the default password immediately after first login, even though it's stored in a Kubernetes secret. Secrets provide better security than plain environment variables, but rotating credentials regularly is still good practice.
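To avoid committing a literal password at all, one option, sketched here with the hashicorp/random provider, is to generate it at apply time:

resource "random_password" "grafana_admin" {
  length  = 24
  special = true
}

# Reference it in the secret instead of a hardcoded string:
#   admin-password = random_password.grafana_admin.result

Keep in mind the generated value still lands in Terraform state, so remote state should be encrypted and access-controlled.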

Monitoring Kubernetes with Grafana

Connecting to Prometheus

Most Kubernetes clusters run Prometheus for metrics collection. If you don't have Prometheus installed yet, the kube-prometheus-stack Helm chart provides a complete monitoring setup. For this guide, we'll assume Prometheus is already running in your cluster, typically in a monitoring namespace.
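If you need to install it, a minimal sketch using the Terraform helm provider looks like this (release name and namespace are conventions, not requirements):

resource "helm_release" "kube_prometheus_stack" {
  name             = "kube-prometheus-stack"
  repository       = "https://prometheus-community.github.io/helm-charts"
  chart            = "kube-prometheus-stack"
  namespace        = "monitoring"
  create_namespace = true
}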

Configure Grafana to connect to Prometheus using the Grafana Terraform provider:

provider "grafana" {
  url  = "http://${kubernetes_service.grafana.status[0].load_balancer[0].ingress[0].ip}"
  auth = "admin:your-secure-password"
}

resource "grafana_data_source" "prometheus" {
  type = "prometheus"
  name = "Prometheus"
  url  = "http://prometheus-server.monitoring.svc.cluster.local:80"

  json_data_encoded = jsonencode({
    httpMethod    = "POST"
    timeInterval  = "30s"
  })
}

The URL uses Kubernetes DNS service discovery: prometheus-server.monitoring.svc.cluster.local. This pattern follows service-name.namespace.svc.cluster.local, which lets Grafana reach Prometheus without exposing it outside the cluster. If your Prometheus service has a different name or namespace, adjust the URL accordingly. The httpMethod = "POST" setting handles large queries more efficiently than GET requests.

Essential Kubernetes dashboards

Rather than building dashboards manually, manage them as code. A minimal dashboard skeleton defined inline looks like this:

resource "grafana_dashboard" "kubernetes_cluster" {
  config_json = jsonencode({
    dashboard = {
      id    = null
      uid   = "kubernetes-cluster"
      title = "Kubernetes Cluster Monitoring"
      tags  = ["kubernetes"]
    }
    overwrite = true
  })
}

For a complete dashboard with pre-configured panels, download dashboard 315 from grafana.com/grafana/dashboards and reference it similarly to the CloudWatch example:

resource "grafana_dashboard" "k8s_monitoring" {
  config_json = file("${path.module}/dashboards/kubernetes-315.json")
}

This dashboard tracks the metrics that matter most for cluster health:

  • Node CPU and memory usage identify nodes under pressure before they affect workloads
  • Pod resource consumption shows which pods consume the most resources
  • Network I/O detects unusual traffic patterns or bandwidth saturation
  • Pod restart counts highlight unstable applications that crash frequently

These metrics give you early warning signs. High node memory usage means you need to scale your cluster or optimize workloads. Frequent pod restarts indicate application issues that need investigation.


Kubernetes-specific alerts

Set up an alert for pods that restart too frequently, which usually signals application problems or resource constraints:

resource "grafana_contact_point" "slack" {
  name = "Slack Notifications"

  slack {
    url = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
  }
}

resource "grafana_rule_group" "kubernetes_alerts" {
  name             = "Kubernetes Alerts"
  folder_uid       = grafana_folder.k8s_alerts.uid
  interval_seconds = 60

  rule {
    name      = "Pod Restart Alert"
    condition = "A"

    data {
      ref_id = "A"

      relative_time_range {
        from = 600
        to   = 0
      }

      datasource_uid = grafana_data_source.prometheus.uid
      model = jsonencode({
        expr    = "rate(kube_pod_container_status_restarts_total[5m]) > 0"
        instant = true # evaluate as an instant query so each series yields a single value
        refId   = "A"
      })
    }

    for = "5m"

    annotations = {
      description = "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is restarting"
    }

    labels = {
      severity = "warning"
    }
  }
}

resource "grafana_folder" "k8s_alerts" {
  title = "Kubernetes Alerts"
}

This alert checks the restart rate over five minutes and fires if any pod restarts during that window. The PromQL query rate(kube_pod_container_status_restarts_total[5m]) > 0 calculates restart frequency, and the for = "5m" duration prevents alerts from brief one-time restarts. The annotation includes pod and namespace names, making it immediately clear which workload needs attention.

You can expand this pattern to monitor other conditions like pods stuck in a pending state, high container memory usage, or persistent volume claim capacity. The key is choosing thresholds that signal real problems without creating alert fatigue from normal operational noise.
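For instance, a pending-pods check could drop a kube-state-metrics query into the rule's data block (a sketch; kube_pod_status_phase assumes kube-state-metrics is running in the cluster):

data {
  ref_id = "A"

  relative_time_range {
    from = 600
    to   = 0
  }

  datasource_uid = grafana_data_source.prometheus.uid
  model = jsonencode({
    expr    = "sum by (namespace, pod) (kube_pod_status_phase{phase=\"Pending\"}) > 0"
    instant = true
    refId   = "A"
  })
}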

Conclusion

Deploying Grafana through Terraform gives you repeatable, version-controlled monitoring infrastructure. The AWS approach works well for simpler setups or teams already running EC2-based infrastructure. The Kubernetes deployment fits containerized environments where you want Grafana running alongside your applications. Both methods use the Grafana Terraform provider to configure data sources and dashboards as code.

The configurations in this guide provide production-ready starting points. You'll want to adjust resource sizes and placeholder IDs, add authentication beyond basic passwords, and expand alerting rules based on your specific infrastructure. However, the foundation remains the same: infrastructure as code that deploys consistently.

Managing Terraform deployments manually gets messy as teams grow. Terrateam automates the workflow, handles secrets through OIDC, and adds approval gates for production. When you're deploying Grafana across multiple environments, it removes the manual coordination work.
