Scaling Terraform with Terragrunt | Multi-Environment Management
Managing hundreds of Terraform modules across multiple AWS accounts starts as organized infrastructure-as-code. Six months later, it becomes a maze of configuration drift and copy-paste errors.
Your production backend configuration accidentally points to the staging state file because someone copied the wrong block. Development deploys with production's database password because variables were duplicated across thirty files. The VPC module fails because someone forgot to deploy the security groups first.
Terraform doesn't allow variables in backend configurations, so every module needs its own hardcoded backend block. Every environment requires you to copy the same variable definitions and hope you update them all consistently. And because dependency management is manual, you have to run terraform apply in precisely the right order across dozens of directories.
Terragrunt solves these problems by adding the orchestration layer that Terraform lacks.
It eliminates repetition through configuration inheritance, automates dependency management, and enables infrastructure-as-code at scale. Instead of managing hundreds of similar configurations, you manage one root configuration that propagates to every module.
What is Terragrunt?
Terragrunt is a thin wrapper around Terraform that adds the orchestration capabilities Terraform lacks for enterprise-scale deployments. While Terraform provides the building blocks for infrastructure as code, Terragrunt supplies the blueprint for assembling those blocks into manageable systems.
Terraform excels at managing individual pieces of infrastructure, but when you're juggling dozens of environments across multiple AWS accounts, you need something to coordinate the complexity.
The gap it fills
Native Terraform forces you to repeat yourself constantly. Every module needs its own backend configuration, and every environment requires you to copy and paste the same variable definitions. When your infrastructure grows to hundreds of modules, this repetition becomes a maintenance burden.
Here's what you face with pure Terraform at scale:
# In every module, you write this:
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "prod/networking/vpc/terraform.tfstate" # Hardcoded, error-prone
    region = "us-east-1"
  }
}
Terragrunt eliminates this repetition by letting you define configuration once and inherit it everywhere. Key features include:
- Remote state configuration inheritance. Define your S3 backend once at the root, and every module automatically inherits it with dynamically generated state paths.
- Variable management across environments. Set common variables at the root level, then override only what changes per environment.
- Dependency orchestration. Tell Terragrunt that your app module depends on your database module, and it will handle the deployment order automatically.
- Multi-module execution. Deploy your entire infrastructure with terragrunt run-all apply instead of running Terraform in 20 different directories.
- Hooks for automation. Run scripts before or after Terraform commands, perfect for validation, notifications, or custom workflows.
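As a rough illustration of the hooks feature from the list above, the sketch below runs one command before plan and apply and another after apply. The hook names and the scripts/validate-inputs.sh path are hypothetical; before_hook, after_hook, commands, execute, and run_on_error are the block types and attributes Terragrunt provides for hooks.

```hcl
# terragrunt.hcl (sketch; hook names and script path are assumptions)
terraform {
  # Runs before Terraform for the listed commands; a non-zero exit stops the run
  before_hook "validate_inputs" {
    commands = ["plan", "apply"]
    execute  = ["bash", "${get_terragrunt_dir()}/scripts/validate-inputs.sh"] # hypothetical script
  }

  # Runs after apply, even when apply failed, for example to send a notification
  after_hook "notify" {
    commands     = ["apply"]
    execute      = ["echo", "terragrunt apply finished"]
    run_on_error = true
  }
}
```

Placing hooks like these in the root configuration means every module that includes the root picks them up.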
When to consider Terragrunt
Start evaluating Terragrunt when you encounter these scenarios:
- Managing 10+ similar environments that share most of their configuration
- Orchestrating multi-account or multi-region AWS deployments
- Finding configuration drift between environments
- Needing standardized patterns across all your Terraform modules
If you're managing three VPCs that are 95% identical except for CIDR blocks and regions, you're ready for Terragrunt.
Terragrunt vs. Terraform: when to use them together
Pure Terraform works well for single environments or simple setups, but if you scale up to multiple environments, you'll hit limitations that Terraform alone can't overcome.
Pure Terraform limitations at scale
The most frustrating Terraform limitation is that backend configurations reject variables entirely. You must hardcode every value:
# Can't do this - Terraform will error:
terraform {
  backend "s3" {
    bucket = var.state_bucket
    key    = "${var.env}/terraform.tfstate"
  }
}

# Must do this instead:
terraform {
  backend "s3" {
    bucket = "my-terraform-state"     # Hardcoded
    key    = "prod/terraform.tfstate" # Hardcoded
  }
}
This means copying backend blocks across every module, risking state file overwrites if you make a mistake.
Module sources face the same restriction. You can't deploy different module versions to different environments without separate code copies. When you lack native dependency management, you must manually run terraform apply in the correct order across dozens of modules. If you miss one dependency, your deployment fails partway through.
Where Terragrunt adds value
Terragrunt transforms these limitations into solved problems:
DRY backend configuration. Define your backend once in the root terragrunt.hcl:
remote_state {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "${path_relative_to_include()}/terraform.tfstate" # Dynamic path
  }
}
Variable inheritance. Common variables live at the root, and environments specify only their differences:
# root terragrunt.hcl
inputs = {
  instance_type = "t3.medium"
  region        = "us-east-1"
}

# prod/terragrunt.hcl - override only what changes
include "root" {
  path = find_in_parent_folders()
}
inputs = {
  instance_type = "t3.xlarge" # Bigger instances for prod
}
Module versioning. Each environment can pin a different module version:
# prod/vpc/terragrunt.hcl
terraform {
  source = "git::https://github.com/myorg/modules.git//vpc?ref=v2.0.0" # Prod on stable
}

# dev/vpc/terragrunt.hcl
terraform {
  source = "git::https://github.com/myorg/modules.git//vpc?ref=main" # Dev tests latest
}
Dependency management. Declare dependencies explicitly, and Terragrunt works out the order:
dependency "vpc" {
  config_path = "../vpc"
}
inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id # The vpc module is applied before this one
}
Terragrunt vs. Terraform: A trade-off analysis and decision framework
Terragrunt adds another tool to learn and another layer to debug when things break. Your team will need training, and you're committing to Terragrunt's opinionated directory structure, but consider the alternative: untangling configuration drift, manually orchestrating deployments, and fixing copy-paste errors in backend configurations. The structured approach Terragrunt enforces prevents these issues from occurring.
You can decide how you want to implement your IaC setup by considering the following:
Terraform vs. Terragrunt: When to use each
| Stay with pure Terraform when: | Adopt Terragrunt when: |
|---|---|
| Managing single environments or proof-of-concepts | Managing 3+ similar environments |
| Your infrastructure rarely changes | Operating across multiple AWS accounts |
| The team is new to Infrastructure as Code | Configuration management consumes significant team time |
| | You need different module versions per environment |
The tipping point is when you spend more time managing Terraform configurations than managing actual infrastructure.
How to use Terragrunt with Terraform
Let's look at a multi-account AWS infrastructure example using Terragrunt. We'll create a setup that manages networking, databases, and applications across development, staging, and production accounts.
Terragrunt uses a hierarchical structure that mirrors your infrastructure organization. Here's a production-ready layout:
infrastructure/
├── terragrunt.hcl          # Root configuration
├── _envcommon/             # Shared environment configs
│   ├── vpc.hcl
│   └── rds.hcl
├── dev/
│   ├── account.hcl         # Dev account settings
│   ├── vpc/
│   │   └── terragrunt.hcl
│   ├── rds/
│   │   └── terragrunt.hcl
│   └── eks/
│       └── terragrunt.hcl
├── staging/
│   ├── account.hcl
│   └── [same structure as dev]
└── prod/
    ├── account.hcl
    └── [same structure as dev]
Each terragrunt.hcl file contains only the unique configuration for that specific deployment. Everything else gets inherited. The _envcommon folder holds configurations that are shared across environments but vary by component type.
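The layout does not show what an _envcommon file contains. A minimal sketch, assuming the VPC module lives in the placeholder myorg/modules repository and accepts the two DNS inputs shown (the ref and the input names are assumptions, not something the layout prescribes):

```hcl
# _envcommon/vpc.hcl (sketch; module ref and input names are assumptions)
# Settings every environment's VPC shares. Each <env>/vpc/terragrunt.hcl
# includes this file and adds only what differs, such as the CIDR block.
terraform {
  source = "git::https://github.com/myorg/modules.git//vpc?ref=v2.0.0"
}

inputs = {
  enable_dns_hostnames = true
  enable_dns_support   = true
}
```

An environment that needs a different module version can declare its own terraform block, which takes precedence over the included one.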
Creating the root configuration
Your root terragrunt.hcl is the single source of truth for backend configuration and shared variables:
# infrastructure/terragrunt.hcl
locals {
  account_vars   = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  account_name   = local.account_vars.locals.account_name
  aws_account_id = local.account_vars.locals.aws_account_id
  aws_region     = local.account_vars.locals.aws_region
}

remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
  config = {
    bucket         = "terraform-state-${local.aws_account_id}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = local.aws_region
    encrypt        = true
    dynamodb_table = "terraform-locks-${local.aws_account_id}"
  }
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite"
  contents  = <<EOF
provider "aws" {
  region = "${local.aws_region}"
  assume_role {
    role_arn = "arn:aws:iam::${local.aws_account_id}:role/TerraformExecutionRole"
  }
  default_tags {
    tags = {
      Environment = "${local.account_name}"
      ManagedBy   = "Terraform"
    }
  }
}
EOF
}

inputs = {
  project_name = "myapp"
  common_tags = {
    Project = "MyApp"
    Owner   = "Platform Team"
  }
}
This configuration automatically generates unique state paths for each module, creates backend configuration files, sets up provider configuration with account-specific roles, and applies consistent tagging.
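To make the dynamic state path concrete: for dev/vpc/terragrunt.hcl, path_relative_to_include() evaluates to dev/vpc. With the dev account values from dev/account.hcl (account ID 123456789012, region us-east-1), the backend.tf that Terragrunt writes into the module's working directory would look roughly like this. The header comment and attribute order are illustrative, not the exact generated output.

```hcl
# backend.tf as generated for dev/vpc (illustrative)
terraform {
  backend "s3" {
    bucket         = "terraform-state-123456789012"
    dynamodb_table = "terraform-locks-123456789012"
    encrypt        = true
    key            = "dev/vpc/terraform.tfstate"
    region         = "us-east-1"
  }
}
```

Because the key is derived from the directory path, moving or renaming a module directory also changes where Terragrunt looks for its state.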
Environment-specific configurations
Each environment folder contains an account.hcl with account-specific settings:
# dev/account.hcl
locals {
  account_name   = "dev"
  aws_account_id = "123456789012"
  aws_region     = "us-east-1"
  instance_types = {
    web = "t3.small"
    db  = "db.t3.micro"
  }
}
Individual modules inherit and override as needed:
# dev/vpc/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}
include "envcommon" {
  path = "${dirname(find_in_parent_folders())}/_envcommon/vpc.hcl"
}
inputs = {
  vpc_cidr = "10.0.0.0/16" # Dev-specific CIDR
}
For production, you might use a different module version:
# prod/vpc/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}
terraform {
  source = "git::https://github.com/terraform-aws-modules/terraform-aws-vpc.git?ref=v5.1.0"
}
inputs = {
  cidr               = "10.10.0.0/16" # terraform-aws-vpc names its CIDR input "cidr"
  enable_nat_gateway = true           # Production needs NAT
  enable_vpn_gateway = true           # Production needs VPN
}
Multi-account AWS pattern
Managing multiple AWS accounts requires careful role and permission setup. Each account has its own IAM role that Terragrunt assumes:
# prod/account.hcl
locals {
  account_name   = "prod"
  aws_account_id = "987654321098"
  aws_region     = "us-east-1"
}

# prod/rds/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}
dependency "vpc" {
  config_path = "../vpc"
}
inputs = {
  vpc_id     = dependency.vpc.outputs.vpc_id
  subnet_ids = dependency.vpc.outputs.database_subnets
  # Production-specific configuration
  instance_class          = "db.t3.large"
  allocated_storage       = 100
  backup_retention_period = 30
}
Dependency management in action
Dependencies ensure modules deploy in the correct order. Here's how to set up an EKS cluster that needs networking and database resources first:
# dev/eks/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}
# Locals from an included file are not available as local.* here,
# so read account.hcl again for the values this file uses
locals {
  account_vars   = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  account_name   = local.account_vars.locals.account_name
  instance_types = local.account_vars.locals.instance_types
}
terraform {
  source = "git::https://github.com/terraform-aws-modules/terraform-aws-eks.git?ref=v19.0.0"
}
dependency "vpc" {
  config_path = "../vpc"
  # Default value, shown for clarity: read the VPC's real outputs,
  # which fails if the VPC has not been applied yet
  skip_outputs = false
}
dependency "rds" {
  config_path = "../rds"
  # Optional: mock outputs so validate and plan work before RDS exists
  mock_outputs = {
    db_endpoint = "mock-db.cluster-xyz.us-east-1.rds.amazonaws.com"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}
inputs = {
  cluster_name    = "${local.account_name}-eks-cluster"
  cluster_version = "1.28"
  vpc_id          = dependency.vpc.outputs.vpc_id
  subnet_ids      = dependency.vpc.outputs.private_subnets
  # Attach the VPC's default security group to the cluster
  cluster_additional_security_group_ids = [dependency.vpc.outputs.default_security_group_id]
  # v19 of the EKS module names this input eks_managed_node_groups
  eks_managed_node_groups = {
    main = {
      desired_size   = local.account_name == "prod" ? 3 : 1
      max_size       = local.account_name == "prod" ? 10 : 3
      min_size       = local.account_name == "prod" ? 3 : 1
      instance_types = [local.instance_types.web]
    }
  }
}
When you run terragrunt run-all apply from the dev folder, Terragrunt applies the VPC first, runs modules that don't depend on each other in parallel, and applies EKS only after both of its dependencies complete. State locking still comes from the S3 backend's DynamoDB table, so this orchestration needs no manual coordination.
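The rds dependency in the EKS example exists mainly to force ordering, since none of its outputs are passed as inputs. When a module needs ordering but reads nothing from another module, Terragrunt also offers a dependencies block that takes only paths. A minimal sketch, reusing the directory names from the layout above:

```hcl
# dev/eks/terragrunt.hcl (alternative sketch)
# Ordering only: run-all applies ../vpc and ../rds before this module,
# but no outputs are read from them.
dependencies {
  paths = ["../vpc", "../rds"]
}
```

Keep using dependency blocks wherever another module's outputs feed this module's inputs.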
Running Terragrunt in a CI pipeline
Automating Terragrunt deployments through GitHub Actions eliminates manual errors and enforces consistent deployment patterns. Your pipeline handles authentication, dependency resolution, and multi-module deployments across different AWS accounts. Before implementing Terragrunt workflows, review these CI/CD best practices for foundational automation patterns.
GitHub Actions workflow setup
Structure your repository to match your deployment strategy. Each environment gets its own workflow trigger:
.github/
└── workflows/
    └── terragrunt.yml
infrastructure/
├── terragrunt.hcl
├── dev/
├── staging/
└── prod/
Branch protection rules enforce your deployment flow. Development deploys from feature branches, staging from the staging branch, and production only from main with required reviews.
Authentication and permissions
OIDC eliminates long-lived credentials. Each environment assumes a specific role with least-privilege permissions. Here's a complete pipeline:
name: Terragrunt Deploy

on:
  push:
    branches: ['main', 'dev']
    paths: ['infrastructure/**']
  pull_request:
    paths: ['infrastructure/**']

env:
  AWS_REGION: us-east-1
  TF_VERSION: 1.5.0
  TG_VERSION: 0.54.0

permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Determine Environment
        id: env
        run: |
          if [[ "${{ github.ref_name }}" == "main" ]]; then
            echo "environment=prod" >> $GITHUB_OUTPUT
            echo "aws_account=${{ secrets.PROD_AWS_ACCOUNT }}" >> $GITHUB_OUTPUT
          else
            echo "environment=dev" >> $GITHUB_OUTPUT
            echo "aws_account=${{ secrets.DEV_AWS_ACCOUNT }}" >> $GITHUB_OUTPUT
          fi
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ steps.env.outputs.aws_account }}:role/GitHubActionsRole
          aws-region: ${{ env.AWS_REGION }}
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
          # The wrapper script can interfere with Terragrunt reading Terraform output
          terraform_wrapper: false
      - name: Setup Terragrunt
        run: |
          wget -q https://github.com/gruntwork-io/terragrunt/releases/download/v${TG_VERSION}/terragrunt_linux_amd64
          chmod +x terragrunt_linux_amd64
          sudo mv terragrunt_linux_amd64 /usr/local/bin/terragrunt
      - name: Terragrunt Plan
        id: plan
        run: |
          cd infrastructure/${{ steps.env.outputs.environment }}
          terragrunt run-all plan --terragrunt-non-interactive
      - name: Terragrunt Apply
        if: github.event_name == 'push'
        run: |
          cd infrastructure/${{ steps.env.outputs.environment }}
          terragrunt run-all apply --terragrunt-non-interactive -auto-approve
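This workflow maps main to prod and every other ref, including pull requests, to dev. It assumes the matching AWS role through OIDC, runs terragrunt run-all plan on every trigger, and applies only on pushes to main or dev. Staging is not wired in here; it would need its own branch in the trigger and in the environment mapping.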
For production environments, add a manual approval step:
  deploy-prod:
    if: github.ref == 'refs/heads/main'
    environment: production # Creates manual approval gate
    runs-on: ubuntu-latest
    steps:
      # ... previous setup steps ...
      - name: Terragrunt Apply Production
        run: |
          cd infrastructure/prod
          # Run with reduced parallelism for safety
          terragrunt run-all apply \
            --terragrunt-non-interactive \
            --terragrunt-parallelism 1 \
            -auto-approve
The environment: production setting requires manual approval before the job runs. Configure this in your repository settings under Environments.
Handling run-all commands
The run-all command deploys multiple modules while respecting dependencies. Control parallelism based on your AWS API limits:
# Fast but aggressive - good for dev
terragrunt run-all apply --terragrunt-parallelism 4
# Slow but safe - good for production
terragrunt run-all apply --terragrunt-parallelism 1
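If one module fails, Terragrunt skips the modules that depend on it and continues with the rest. Leave --terragrunt-ignore-dependency-errors unset in production: passing it makes Terragrunt run dependent modules even after one of their dependencies has failed.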
Advanced patterns
For selective deployments when only certain modules change:
- name: Deploy Changed Modules Only
  run: |
    ENVIRONMENT="${{ steps.env.outputs.environment }}"
    cd "infrastructure/${ENVIRONMENT}"
    # Module directories changed under this environment in the last commit
    # (HEAD^ requires fetch-depth: 2 or more on actions/checkout)
    CHANGED_DIRS=$(git diff --name-only HEAD^ | grep "^infrastructure/${ENVIRONMENT}/" | cut -d/ -f3 | sort -u)
    for dir in $CHANGED_DIRS; do
      if [ -d "$dir" ]; then
        echo "Deploying $dir"
        terragrunt apply --terragrunt-working-dir "$dir" \
          -auto-approve --terragrunt-non-interactive
      fi
    done
Schedule drift detection to catch manual changes:
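on:
  schedule:
    - cron: '0 2 * * *' # Daily at 02:00 UTC
jobs:
  drift-detection:
    runs-on: ubuntu-latest
    steps:
      # ... setup steps ...
      - name: Check for Drift
        run: |
          cd infrastructure/prod
          EXIT_CODE=0
          terragrunt run-all plan -detailed-exitcode --terragrunt-non-interactive || EXIT_CODE=$?
          # With -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present
          if [ "${EXIT_CODE}" = "2" ]; then
            echo "::error::Drift detected in production!"
            exit 1
          elif [ "${EXIT_CODE}" != "0" ]; then
            echo "::error::Plan failed with exit code ${EXIT_CODE}"
            exit "${EXIT_CODE}"
          fi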
Troubleshooting common issues
State lock conflicts occur when deployments overlap. Extend the lock timeout:
- name: Apply with Extended Lock Timeout
  run: |
    terragrunt run-all apply \
      --terragrunt-non-interactive \
      -auto-approve \
      -lock-timeout=30m
If locks get stuck, you may need to force unlock the state file.
Module download failures happen with private repositories. Clear the cache and retry:
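- name: Clean and Retry
  if: failure()
  run: |
    # Each module directory has its own .terragrunt-cache, so remove them all
    find . -type d -name ".terragrunt-cache" -prune -exec rm -rf {} +
    terragrunt run-all init --terragrunt-non-interactive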
IAM permission errors need debugging. Enable detailed logging:
- name: Apply with Debug Output
  if: failure()
  env:
    TF_LOG: DEBUG
    TERRAGRUNT_LOG_LEVEL: debug
  run: |
    terragrunt apply --terragrunt-working-dir problem-module
Best practices and troubleshooting
When you manage dozens of Terragrunt deployments, a few patterns separate smooth operations from constant firefighting. Before diving into Terragrunt-specific practices, ensure you understand how to organize your Terraform code effectively.
Directory structure
Keep your hierarchy shallow (three levels maximum). Deep nesting makes navigation harder and slows down run-all commands:
infrastructure/
├── terragrunt.hcl
├── dev/
│   ├── us-east-1/
│   │   ├── vpc/
│   │   └── eks/
│   └── us-west-2/
└── prod/
Name your modules consistently. For example, if it's vpc in dev, it's vpc in prod, not network or virtual-private-cloud. This consistency enables automation like terragrunt run-all apply --terragrunt-include-dir "*/vpc".
Finally, pin module versions explicitly. Development can test newer versions while production stays stable:
# dev uses latest
source = ".../modules//vpc?ref=main"
# prod uses pinned version
source = ".../modules//vpc?ref=v2.1.0"
State management
Use one state file per component per environment, and never share state between environments. Terragrunt generates unique paths automatically if you use path_relative_to_include() in your backend configuration.
While S3 versioning helps with reverting to older versions of the state file, making explicit backups before risky operations is safer:
terragrunt state pull > backup-$(date +%Y%m%d).tfstate
When migrating existing Terraform to Terragrunt, carefully migrate the existing state. Run terragrunt init first to create the new state location, then use Terraform's built-in migration prompts.
Variable management
Use locals for computed values instead of duplicating logic:
locals {
  environment = split("/", path_relative_to_include())[0]
  region      = split("/", path_relative_to_include())[1]
  name_prefix = "${local.environment}-${local.region}"
}
Document variable inheritance in your root terragrunt.hcl. Future team members need to understand what comes from where. Add validation where possible to catch errors early.
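One place to add such validation is the Terraform module itself, so a bad value inherited from any level fails at plan time. A sketch, assuming the module declares an environment variable (the variable name and the allowed values are illustrative):

```hcl
# variables.tf inside a module (sketch; variable name and values are assumptions)
variable "environment" {
  type        = string
  description = "Deployment environment, passed in through Terragrunt inputs"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment value must be one of: dev, staging, prod."
  }
}
```

Validating in the module keeps the check next to the code that relies on it, whichever terragrunt.hcl supplied the value.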
Performance optimization
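Terragrunt does not limit how many modules run-all processes at once unless you pass --terragrunt-parallelism, and each Terraform run adds its own default of 10 concurrent resource operations. Together these can overwhelm AWS APIs. Production should use --terragrunt-parallelism 2 or lower to avoid rate limits, while development can push to between 4 and 5.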
The download cache grows indefinitely. Add a weekly cleanup job to your CI pipeline. Use selective execution with --terragrunt-include-dir or --terragrunt-exclude-dir for targeted deployments.
Team collaboration
Document your Terragrunt patterns in a team runbook. Include directory structure conventions, how to add new environments, variable inheritance hierarchy, and troubleshooting steps.
Code reviews should check for circular dependencies, hardcoded values that belong in variables, and consistent naming.
New team members need Terragrunt training before touching production. The abstraction layer helps in the long run but can be confusing at first. Pair on the first few deployments to accelerate learning.
Common pitfalls and solutions
Even experienced teams encounter Terragrunt issues. Here's how to recognize and resolve them quickly.
Circular dependencies kill deployments instantly. Terragrunt detects them and fails with a dependency cycle error that names the modules involved. This typically happens when Module A depends on Module B, which depends on Module C, which depends back on Module A. Security groups are frequent culprits.
Solution: restructure your modules so dependencies flow in one direction, or combine related security groups into a single module.
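A sketch of the one-directional layout, assuming a hypothetical security-groups module that owns every group and the rules between them (the directory and output names are illustrative). Both app and database depend on it, and nothing depends back:

```hcl
# app/terragrunt.hcl (sketch; directory and output names are assumptions)
dependency "security_groups" {
  config_path = "../security-groups"
}

inputs = {
  # database/terragrunt.hcl reads its own group ID from the same module,
  # so app and database never need to reference each other
  security_group_ids = [dependency.security_groups.outputs.app_sg_id]
}
```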
Cache corruption manifests as "module not found" errors or version mismatches, especially after switching branches. The fix is straightforward:
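# Each module directory has its own .terragrunt-cache, so remove them all
find . -type d -name ".terragrunt-cache" -prune -exec rm -rf {} +
terragrunt run-all init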
Add .terragrunt-cache/ to your .gitignore file – this cache should never be committed.
Version conflicts between Terraform and Terragrunt cause cryptic errors like "invalid character" or "unknown function." Terragrunt 0.54+ requires Terraform 1.0+. Always pin both versions in your CI pipeline and document them in your README.
Module source authentication with private repositories fails silently. For local development, configure Git to use SSH. For CI/CD, use GitHub Apps or deploy keys instead of personal access tokens.
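For the SSH route, the module source can use the git-over-SSH form. A sketch that reuses the placeholder myorg/modules repository from the earlier examples:

```hcl
# terragrunt.hcl (sketch; repository and ref are placeholder values)
terraform {
  # Authenticates with your SSH agent or a CI deploy key instead of a token
  source = "git::ssh://git@github.com/myorg/modules.git//vpc?ref=v2.0.0"
}
```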
Conclusion
Terragrunt transforms Terraform from a powerful tool into a scalable platform. Through inheritance, automated dependency management, and simplified multi-account deployments, it eliminates configuration duplication and gives your modules standardized patterns that scale.
The investment pays off quickly when managing complex infrastructures. Teams juggling 100+ modules across multiple AWS accounts save more time on manual orchestration than they spend learning Terragrunt's patterns. The opinionated structure that seems restrictive at first becomes the foundation for reliable deployments.
Start by picking development as your proof of concept, where mistakes are cheap. Migrate existing modules gradually, establishing team standards for naming conventions and directory structures before they become implicit knowledge.
While Terragrunt solves orchestration, Terrateam adds enterprise automation. It understands Terragrunt's dependency graphs, provides policy enforcement across all modules, and adds pull request automation that respects your module relationships.
Ready to scale your infrastructure without the complexity? Sign up for Terrateam and transform your Terragrunt deployments with automated workflows.