Organizing Terraform Code for Scalability and Maintainability

The Importance of Code Organization in Terraform Projects

Managing Infrastructure as Code (IaC) at scale is fundamentally different from handling a handful of resources. While getting started with Terraform/OpenTofu is relatively straightforward, maintaining large-scale infrastructure deployments introduces challenges that can’t be solved by simply writing more code. Code organization becomes a critical factor in whether your infrastructure remains manageable or devolves into a maintenance nightmare.

The difference between well-organized and poorly-organized infrastructure code becomes apparent as soon as you need to:

  • Make changes across multiple environments
  • Onboard new team members
  • Troubleshoot issues in production
  • Implement compliance requirements
  • Manage infrastructure across multiple regions or accounts

This is where proper code organization transitions from a “nice to have” to a critical requirement. Without it, even simple changes can become risky and time-consuming operations.

In this article, we’ll examine practical approaches to organizing Terraform/OpenTofu code in complex environments. We’ll look at specific techniques for building maintainable, scalable infrastructure code that works in real-world scenarios. Rather than focusing on basic concepts, we’ll explore how to implement patterns that help manage complexity in production environments.

You’ll learn:

  • How to structure modules for reusability without overcomplicating your codebase
  • Practical approaches to configuration management across environments
  • Techniques for managing state and shared data between components
  • Methods for scaling your infrastructure code across multiple regions and accounts

This isn’t just about following best practices—it’s about implementing organizational patterns that make your infrastructure code more maintainable, more reliable, and easier to work with as your software environment scales.

Note: For the concepts discussed in this article, OpenTofu and Terraform should be interchangeable. Within the article, configurations and concepts will generally be referred to as “Terraform” or “terraform”.

Building Reusable and Modular Components

One of the fastest ways to accumulate technical debt in your infrastructure code is to stuff everything into massive root modules. We’ve all seen it happen: what starts as a simple configuration grows into a sprawling 1000+ line file that everyone’s afraid to touch. Changes become risky, plan times stretch longer, and eventually, even simple updates turn into nerve-wracking operations.

The key to managing this complexity is breaking infrastructure code into focused, reusable modules. While IaC has its own unique characteristics, we can borrow proven software engineering principles to guide how we structure these modules.

Understanding Terraform/OpenTofu Modules

A Terraform/OpenTofu module is essentially a container for multiple resources that are used together. Think of it like a function in traditional programming - it encapsulates logic, accepts input variables, and returns outputs that other code can use. Every TF configuration is a module, including the root module: the directory containing your primary .tf files.

Modules consist of a collection of .tf and/or .tf.json files in a directory. A basic module structure typically includes:

Terminal window
modules/
└── network/
    ├── main.tf
    └── outputs.tf

modules/network/main.tf
resource "aws_vpc" "this" {
cidr_block = "10.0.0.0/16"
enable_dns_support = true
enable_dns_hostnames = true
tags = {
Name = "example-vpc"
}
}
resource "aws_subnet" "public" {
count = 2
vpc_id = aws_vpc.this.id
cidr_block = cidrsubnet("10.0.0.0/16", 8, count.index)
map_public_ip_on_launch = true
availability_zone = ["us-east-1a", "us-east-1b"][count.index]
tags = {
Name = "public-subnet-${count.index}"
}
}
resource "aws_subnet" "private" {
count = 2
vpc_id = aws_vpc.this.id
cidr_block = cidrsubnet("10.0.0.0/16", 8, 2 + count.index)
availability_zone = ["us-east-1a", "us-east-1b"][count.index]
tags = {
Name = "private-subnet-${count.index}"
}
}
resource "aws_internet_gateway" "this" {
vpc_id = aws_vpc.this.id
tags = {
Name = "example-igw"
}
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.this.id
tags = {
Name = "public-route-table"
}
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.this.id
}
}
resource "aws_route_table_association" "public" {
count = 2
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
modules/network/outputs.tf
output "vpc_id" {
description = "The VPC ID"
value = aws_vpc.this.id
}
output "public_subnet_ids" {
description = "List of public subnet IDs"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "List of private subnet IDs"
value = aws_subnet.private[*].id
}

This basic module would be used as follows in a typical root module configuration:

main.tf
provider "aws" {
region = "us-east-1"
}
module "network" {}
output "vpc_id" {
value = module.network.vpc_id
}
output "public_subnets" {
value = module.network.public_subnet_ids
}
output "private_subnets" {
value = module.network.private_subnet_ids
}

While you could write all your Terraform configuration in a single directory, modules provide three key benefits that become critical as your infrastructure grows:

  1. Modules enable code reuse. Instead of copying and pasting similar resource configurations across different parts of your infrastructure, you define the pattern once and reuse it with different input variables. This means less code to maintain and fewer opportunities for errors to creep in.

  2. Modules provide consistent abstractions. Rather than working directly with low-level resources everywhere, you can create modules that represent higher-level concepts like “application cluster” or “database platform” - concepts that align with how you actually think about your infrastructure.

  3. Modules help manage complexity by encapsulating implementation details. Teams working with your infrastructure don’t need to understand every resource configuration - they just need to know what inputs the module expects and what outputs it provides.

Implementing Parameterization for Flexibility

A module without parameters is like a blueprint that can only build one specific house - useful, but limited. Parameterization transforms your module into a flexible blueprint that can create different houses based on the owner’s needs, while maintaining structural integrity. In Terraform and OpenTofu, we achieve this flexibility through input variables.

Let’s take our network module from the previous section and make it adaptable to different scenarios. First, we’ll set up our module structure:

Terminal window
modules/
└── network/
    ├── main.tf
    ├── variables.tf
    └── outputs.tf

The key is the file variables.tf. This is where we define what “knobs and dials” users can adjust when they use our module:

modules/network/variables.tf
variable "vpc_cidr_block" {
description = "The VPC CIDR block"
type = string
}
variable "public_subnet_count" {
description = "Number of public subnets to create"
type = number
default = 2
}
variable "private_subnet_count" {
description = "Number of private subnets to create"
type = number
default = 2
}
variable "azs" {
description = "List of availability zones to use"
type = list(string)
}
variable "tags" {
description = "Resource tags"
type = map(string)
default = {}
}

Notice how we’ve made some strategic decisions here. Some variables have default values, while others don’t. This isn’t random - we’re requiring users to explicitly specify critical values like the VPC CIDR block and availability zones, while providing sensible defaults for less critical options like the number of subnets. This balance between flexibility and convenience is key to creating user-friendly modules.

Here’s how someone would use our parameterized module:

main.tf
provider "aws" {
region = "us-east-1"
}
module "network" {
source = "./modules/network"
vpc_cidr_block = "10.0.0.0/16"
public_subnet_count = 2
private_subnet_count = 2
azs = ["us-east-1a", "us-east-1b"]
tags = {
Environment = "dev"
Project = "ExampleProject"
}
}
output "vpc_id" {
value = module.network.vpc_id
}
output "public_subnets" {
value = module.network.public_subnet_ids
}
output "private_subnets" {
value = module.network.private_subnet_ids
}

This module can now adapt to different environments and requirements. Need three public subnets in production? Just change the count. Want different CIDR blocks for different environments? Just pass in a different value. The module’s internal logic stays the same, but its output can vary based on the inputs it receives.
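
For instance, a production root module might call the same module with a larger footprint; a sketch with illustrative values:

module "network" {
  source               = "./modules/network"
  vpc_cidr_block       = "10.1.0.0/16"
  public_subnet_count  = 3
  private_subnet_count = 3
  azs                  = ["us-east-1a", "us-east-1b", "us-east-1c"]

  tags = {
    Environment = "prod"
    Project     = "ExampleProject"
  }
}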

Remember: good parameterization isn’t about making everything configurable. It’s about identifying what truly needs to be flexible and what can be standardized. Each parameter you add is something users need to understand and maintain, so choose them thoughtfully.

Using Interfaces and Abstract Base Classes

While Terraform and OpenTofu don’t directly support object-oriented concepts like interfaces and abstract base classes, we can apply these principles to create more maintainable infrastructure code. The goal is to establish consistent patterns that help prevent technical debt and maintenance challenges as your infrastructure grows.

Consider a common scenario in growing organizations: EC2 instances that serve the same function end up with inconsistent names across different teams:

  • web-app-us-east-1-prod
  • app-web-use1-prd
  • wa-use1-prod

This naming inconsistency might seem minor initially, but it compounds over time. It complicates troubleshooting, makes automation more difficult, and often requires disruptive refactoring to standardize. By implementing interface-like patterns in our Terraform code, we can prevent this drift before it begins.

Defining Standard Interfaces Through Conventions

While we can’t enforce interfaces as strictly as in languages like Java or C#, Terraform’s built-in validation capabilities allow us to establish and maintain consistent patterns. In this section we’ll look at how to implement these guardrails.

Input Validation

Terraform’s validation blocks enable us to enforce standards during the plan phase, catching issues before they’re actually deployed to infrastructure:

variable "environment" {
type = string
description = "Environment name (e.g., prod, staging, dev)"
validation {
condition = contains(["prod", "staging", "dev"], var.environment)
error_message = "Environment must be one of: prod, staging, dev"
}
}
variable "resource_name" {
type = string
description = "Resource name following org conventions"
validation {
condition = can(regex("^[a-z][a-z0-9-]{2,30}[a-z0-9]$", var.resource_name))
error_message = "Resource names must be lowercase alphanumeric with hyphens, 4-32 chars, start with a letter, end with letter/number"
}
}

These validation blocks create effective guardrails by enforcing standardized environment names, consistent naming patterns with length requirements, and constraints that are meaningful for the specific resources they target. The same approach can enforce a required tag format, as sketched below.
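
Here’s a hedged sketch of a validation requiring a minimal set of organization tag keys (the required keys are illustrative):

variable "tags" {
  type        = map(string)
  description = "Resource tags; must include the organization's required keys"

  validation {
    condition = alltrue([
      for key in ["Environment", "Owner", "CostCenter"] : contains(keys(var.tags), key)
    ])
    error_message = "Tags must include the Environment, Owner, and CostCenter keys."
  }
}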

Documentation Generation

Maintaining consistent documentation is crucial if you want to provide interface-like patterns for your modules. The terraform-docs utility automates this process by analyzing your code and generating standardized documentation. It captures:

  • Variable definitions and their constraints
  • Output specifications
  • Provider requirements
  • Module dependencies

You can configure documentation generation with a .terraform-docs.yml file in your module directory to define your desired format:

formatter: markdown table

content: |
  # {{ .Name }}

  {{ .Header }}

  ## Interface

  This module implements the standard resource interface:

  - Accepts environment, region, and application inputs
  - Provides standardized name and tag outputs
  - Follows organizational naming conventions

  {{ .Inputs }}
  {{ .Outputs }}

output:
  file: README.md
  mode: replace

Running terraform-docs (ideally in your CI/CD pipeline) ensures documentation stays current with your code, which alleviates cognitive load and provides up-to-date guidance on correct implementation of the module.
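
In its simplest form, that CI step is a one-liner; a minimal sketch (terraform-docs discovers the .terraform-docs.yml in the target module directory):

Terminal window
# Regenerate README.md from the module's code and config
terraform-docs ./modules/network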

Create a Base Module for Common Metadata

A base module can serve as the foundation for consistent resource naming and tagging across your infrastructure.

Cloud Posse’s terraform-null-label is the gold standard pattern for this type of module, and it provides far more functionality than we need here. For the sake of this article, we’re going to build a much simpler example to highlight the utility this implementation pattern offers:

modules/base/metadata/variables.tf
variable "environment" {
type = string
description = "Environment name (e.g., prod, staging, dev)"
validation {
condition = contains(["prod", "staging", "dev"], var.environment)
error_message = "Environment must be one of: prod, staging, dev"
}
}
variable "region_short" {
type = string
description = "Short region name (e.g., use1 for us-east-1)"
validation {
condition = can(regex("^[a-z]{3}[1-2]$", var.region_short))
error_message = "Region short name must be in format: use1, usw2, etc"
}
}
variable "application" {
type = string
description = "Application name"
validation {
condition = can(regex("^[a-z0-9-]+$", var.application))
error_message = "Application name must contain only lowercase letters, numbers, and hyphens"
}
}
variable "component" {
type = string
description = "Component name (e.g., web, api, db)"
validation {
condition = can(regex("^[a-z0-9-]+$", var.component))
error_message = "Component name must contain only lowercase letters, numbers, and hyphens"
}
}

Outputs:

modules/base/metadata/outputs.tf
output "resource_name" {
description = "Standardized resource name"
value = "${var.application}-${var.component}-${var.region_short}-${var.environment}"
}
output "tags" {
description = "Standardized tag map"
value = {
Environment = var.environment
Application = var.application
Component = var.component
Region = var.region_short
ManagedBy = "terraform"
}
}

This base module becomes particularly valuable when implementing specific resource modules. Here’s how it ensures consistency in a web server configuration:

module "metadata" {
source = "./modules/base/metadata"
environment = "prod"
region_short = "use1"
application = "portal"
component = "web"
}
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
tags = module.metadata.tags
tags_all = merge(
module.metadata.tags,
{
Name = module.metadata.resource_name
}
)
}

It takes some investment and effort up front, but if you implement these patterns as early as possible you create a foundation for consistent resource management that scales with your infrastructure. The initial investment in establishing these conventions prevents costly standardization efforts later and makes your infrastructure more maintainable over time.

Applying Design Principles to Module Development

After establishing our base module patterns, we need to consider how broader software engineering principles guide the development of our module ecosystem. Just as in traditional software development, principles like DRY (Don’t Repeat Yourself) and appropriate abstraction levels help create maintainable, scalable infrastructure code.

Finding the Right Balance

While it’s tempting to aggressively modularize every piece of infrastructure code, experience shows that both over-modularization and under-modularization can create problems. Consider this example of over-modularization:

# Too granular - separate modules for each security group rule
module "app_ingress_http" {
  source            = "./modules/security-group-rule"
  security_group_id = aws_security_group.app.id
  type              = "ingress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
}

module "app_ingress_https" {
  source            = "./modules/security-group-rule"
  security_group_id = aws_security_group.app.id
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
}

Instead, a more balanced approach consolidates related functionality while maintaining flexibility:

module "app_security" {
source = "./modules/security-groups"
vpc_id = var.vpc_id
name = "app-security"
ingress_rules = {
http = {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
https = {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
}
}

This consolidated approach groups related resources logically, provides flexibility through configuration rather than requiring new, overly granular modules, and makes the infrastructure’s intent clearer.

Applying DRY Principles Effectively

The DRY principle suggests that “every piece of knowledge must have a single, unambiguous, authoritative representation within a system.” In Terraform and OpenTofu, this manifests in several practical ways.

First, use local values to centralize repeated calculations:

locals {
  # Define once, use many times
  common_tags = merge(var.tags, {
    Environment = var.environment
    ManagedBy   = "terraform"
  })

  # Derive related values in one place
  name_prefix    = "${var.project}-${var.environment}"
  container_name = "${local.name_prefix}-container"
  task_name      = "${local.name_prefix}-task"
  service_name   = "${local.name_prefix}-service"
}

Second, create reusable expression patterns for common logic:

locals {
  # Define common conditional logic once
  is_production = var.environment == "prod"

  instance_type = local.is_production ? "t3.large" : "t3.small"
  min_capacity  = local.is_production ? 3 : 1
  max_capacity  = local.is_production ? 10 : 3
}

Third, leverage dynamic blocks for repeated resource configurations:

resource "aws_security_group" "app" {
name = "app-security-group"
description = "Application security group"
vpc_id = var.vpc_id
dynamic "ingress" {
for_each = var.ingress_rules
content {
from_port = ingress.value.from_port
to_port = ingress.value.to_port
protocol = ingress.value.protocol
cidr_blocks = ingress.value.cidr_blocks
}
}
}

Structuring for Growth

As your infrastructure code grows, maintain clarity through thoughtful organization:

Terminal window
.
├── modules/
│   ├── base/
│   │   ├── metadata/        # Our standard metadata module
│   │   └── networking/      # Common network patterns
│   ├── compute/
│   │   ├── ecs-service/     # ECS service patterns
│   │   └── ec2-instance/    # EC2 patterns
│   └── data/
│       ├── rds-cluster/     # Database patterns
│       └── elasticache/     # Cache patterns
├── environments/
│   ├── prod/
│   ├── staging/
│   └── dev/
└── examples/                # Working examples of module usage
    ├── basic/
    └── complete/

This organizational structure creates clear boundaries between reusable modules and environment-specific code while providing examples that demonstrate proper module usage.

Balancing Flexibility and Constraints

When designing modules, strive to make them flexible enough to be reusable but constrained enough to enforce standards. Give users some room for customization with sensible guardrails:

module "web_cluster" {
source = "./modules/compute/ecs-service"
# Required standardized inputs
environment = var.environment
region = var.region
service = "web"
# Optional customization with sensible defaults
container_port = var.container_port != null ? var.container_port : 8080
desired_count = local.is_production ? 3 : 1
# Required but flexible inputs
container_image = var.container_image # No default - must be specified
# Standardized tags with custom additions allowed
tags = merge(local.common_tags, var.custom_tags)
}

Successful module development requires striking a careful balance between standardization and flexibility. By requiring certain inputs while making others optional, and by providing sensible defaults while allowing overrides, you create modules that guide users toward best practices without being overly restrictive. This ensures that your infrastructure remains maintainable as it scales, while still providing the flexibility needed to handle unique requirements and edge cases.

Configuration Management and Environment Isolation

Configuration management and environment isolation are both about separation: keeping configuration separate from “code”, and keeping our environments separate from each other. For environment isolation, Terraform provides a couple of different implementations for us to choose from.

For configuration, we’ll once again borrow another concept from software engineering in this section. Specifically, the Config principle from the Twelve-Factor app framework:

…Apps sometimes store config as constants in the code. This is a violation of twelve-factor, which requires strict separation of config from code. Config varies substantially across deploys, code does not.

We’ll look at how to apply the principle of separation in our configurations and environment.

Separating Configuration from Code

One of the fundamental principles of Twelve-Factor is to separate configuration data from your actual code. This separation allows you to maintain different configurations for various environments while keeping your core infrastructure code DRY, maintainable, and more secure.

Here’s an example of a well-structured configuration setup:

variables.tf
variable "ami_id" {
type = string
description = "The ID of the AMI to use for the instance"
validation {
condition = length(var.ami_id) > 4 && substr(var.ami_id, 0, 4) == "ami-"
error_message = "The ami_id value must be a valid AMI ID, starting with \"ami-\"."
}
}
variable "instance_type" {
type = string
description = "The type of instance to start"
validation {
condition = can(regex("^t[23].+|m[456].+", var.instance_type))
error_message = "Instance type must be a valid AWS instance type."
}
}
variable "environment" {
type = string
description = "Environment name (e.g., dev, staging, prod)"
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be one of: dev, staging, prod."
}
}
variable "tags" {
type = map(string)
description = "A map of tags to add to all resources"
default = {}
}

The main configuration file builds upon these variables, implementing common patterns and environment-specific logic:

main.tf
locals {
  # Common tag structure
  required_tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Project     = "example-app"
  }

  # Merge required tags with user-provided tags
  all_tags = merge(local.required_tags, var.tags)
}

resource "aws_instance" "web_server" {
  ami           = var.ami_id
  instance_type = var.instance_type
  tags          = local.all_tags

  # Add environment-specific configurations
  root_block_device {
    volume_size = var.environment == "prod" ? 100 : 20
    encrypted   = true
  }
}

With this structure in place, you can create separate variable files for each environment:

environments/dev.tfvars
ami_id = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
environment = "dev"
tags = {
CostCenter = "development"
Team = "platform"
}
environments/prod.tfvars
ami_id = "ami-0947d2ba12ee1ff75"
instance_type = "t2.medium"
environment = "prod"
tags = {
CostCenter = "production"
Team = "platform"
Backup = "daily"
}

You can then apply these configurations using the -var-file flag:

Terminal window
# For development
tofu apply -var-file=environments/dev.tfvars
# For production
tofu apply -var-file=environments/prod.tfvars

Environment Isolation Strategies

When it comes to isolating environments, you have two primary approaches to consider; in most cases, the generally accepted best practice is the first: separate state files.

Separate State Files

The ideal approach is to maintain completely separate state files for each environment:

environments/prod/backend.tf
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

environments/dev/backend.tf
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "dev/terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

This provides complete isolation between environments, allowing different access controls per environment and enabling separate state locking. Most importantly, it makes it impossible to accidentally affect other environments during operations.
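
Separate state keys also make per-environment access control straightforward. As a hedged sketch, an IAM policy document like the following (names are illustrative) could restrict a development role to the dev state path:

data "aws_iam_policy_document" "dev_state_access" {
  statement {
    effect  = "Allow"
    actions = ["s3:GetObject", "s3:PutObject"]

    resources = [
      "arn:aws:s3:::my-terraform-state/dev/*",
    ]
  }
}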

Workspaces

For simpler setups, Terraform workspaces can provide environment isolation:

main.tf
locals {
  workspace_config = {
    dev = {
      instance_type = "t2.micro"
      min_capacity  = 1
      max_capacity  = 2
    }
    prod = {
      instance_type = "t2.medium"
      min_capacity  = 2
      max_capacity  = 10
    }
  }

  # Get configuration for current workspace
  config = local.workspace_config[terraform.workspace]
}

resource "aws_instance" "example" {
  instance_type = local.config.instance_type
  ami           = var.ami_id

  tags = {
    Environment = terraform.workspace
    Name        = "example-${terraform.workspace}"
  }
}

resource "aws_autoscaling_group" "example" {
  min_size = local.config.min_capacity
  max_size = local.config.max_capacity
  # ... other configuration ...
}

Workspace management is straightforward:

Terminal window
# Create environments
tofu workspace new prod
tofu workspace new dev
# Switch environments
tofu workspace select prod
tofu apply

While workspaces are simpler to manage, they come with some serious limitations. All state is stored in the same backend, which increases the risk of accidental cross-environment changes. They also provide less granular access control and aren’t recommended for production use at scale.

Environment-Specific Variable Interpolation

Using locals effectively can help manage environment-specific configurations while keeping your code DRY:

locals {
  environment = var.environment
  app_name    = "myapp"

  # Environment-specific configurations
  config = {
    domain_name = local.environment == "prod" ? "${local.app_name}.com" : "${local.app_name}-${local.environment}.com"

    monitoring = local.environment == "prod" ? {
      evaluation_periods = 2
      period             = 60
      threshold          = 90
    } : {
      evaluation_periods = 3
      period             = 120
      threshold          = 80
    }

    backup = local.environment == "prod" ? {
      retention_days = 30
      frequency      = "daily"
    } : {
      retention_days = 7
      frequency      = "weekly"
    }
  }
}

# Use the configurations
resource "aws_route53_record" "www" {
  zone_id = aws_route53_zone.primary.zone_id
  name    = local.config.domain_name
  type    = "A"
  ttl     = 300
  records = [aws_eip.web.public_ip]
}

resource "aws_cloudwatch_metric_alarm" "cpu" {
  alarm_name         = "cpu-utilization"
  evaluation_periods = local.config.monitoring.evaluation_periods
  period             = local.config.monitoring.period
  threshold          = local.config.monitoring.threshold
  # ... other configuration ...
}

The success of environment-specific configurations depends on maintaining consistent, sane naming conventions and carefully documenting the reasoning behind environmental differences. Using the input validation principles we discussed earlier, you can catch configuration errors early before they reach live environments.

State file isolation should be your default approach when setting up environment separation. While Terraform workspaces provide a simpler option that may be useful during local development, they introduce unnecessary risks in production environments due to their shared state backend. Implementing proper state file isolation from the start helps establish well-defined security boundaries and makes your infrastructure easier to maintain as it grows. This upfront investment in proper state management prevents the need for complex refactoring later and provides a solid foundation for scaling your infrastructure.

Inheritance, Overrides, and Data Sharing

As your infrastructure grows, you’ll likely find yourself facing a familiar challenge: how to share information between different parts of your infrastructure without creating a tangled web of dependencies. Think of it like building a large software application - you need different components to work together while remaining maintainable and flexible.

Discovering and Sharing Infrastructure Data

When you’re working with modules, you’ll often need one piece of infrastructure to know about another. For instance, your application servers might need to know which subnets they can use, or your load balancer needs to know which instances to route traffic to. There are two main ways to handle this in Terraform: data sources and remote state. Let’s look at how to choose between them and use them effectively.

Using Data Sources: The Flexible Approach

Data sources are like infrastructure queries - they let your Terraform code discover existing resources at runtime without creating hard dependencies. This is particularly powerful when you want to keep your modules loosely coupled. Here’s a practical example:

data "aws_vpc" "selected" {
filter {
name = "tag:Environment"
values = [var.environment]
}
filter {
name = "tag:Project"
values = [var.project_name]
}
state = "available"
}
data "aws_subnets" "private" {
filter {
name = "vpc-id"
values = [data.aws_vpc.selected.id]
}
filter {
name = "tag:Tier"
values = ["Private"]
}
}

This code shows how you can discover VPCs and subnets based on their tags rather than hardcoding identifiers. It’s similar to how you might use a database query instead of hardcoding values in an application. This approach has several benefits:

  1. Your modules remain flexible - they work with any VPC that matches your criteria
  2. You can refactor underlying infrastructure without updating every dependent module
  3. Testing becomes easier since you can create different test environments with the same tags

But what if the resource you’re looking for doesn’t exist? Just like with database queries, you need to handle that case:

locals {
  subnet_ids = tolist(data.aws_subnets.private.ids)

  # tobool() fails when handed an arbitrary string, so this surfaces a
  # readable error at plan time if no matching subnets are found
  validate_subnets = length(local.subnet_ids) > 0 ? true : tobool(
    "No private subnets found matching the specified criteria"
  )
}

resource "aws_instance" "app_server" {
  count     = var.instance_count
  subnet_id = element(local.subnet_ids, count.index % length(local.subnet_ids))
  # Other configuration...
}

This pattern validates that you found the resources you need before trying to use them, providing clear error messages when things go wrong.

Remote State: When Data Sources Aren’t Enough

While data sources are usually the best choice, sometimes you need information that isn’t available through your cloud provider’s API. This is where remote state comes in: the state file primarily gives Terraform a record of the infrastructure it manages, but it can also serve as a data source.

Here’s when you might need to use remote state:

  • Accessing custom values that only exist in your Terraform configuration
  • Sharing complex data structures between different parts of your infrastructure
  • Reading outputs from one piece of infrastructure that aren’t exposed via APIs

Here’s how you might use remote state to access network configuration:

data "terraform_remote_state" "network" {
backend = "s3"
config = {
bucket = "my-terraform-state"
key = "network/terraform.tfstate"
region = "us-west-2"
}
}
locals {
# Access custom outputs not available via provider API
custom_network_configs = data.terraform_remote_state.network.outputs.custom_configs
}

However, use remote state sparingly. It creates a tight coupling between your Terraform configurations - change the output in one place, and you’ll need to update every configuration that references it. It’s like having a direct dependency on another module’s internal implementation.
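
If you do adopt this pattern, remember that the producing configuration has to publish values explicitly as outputs. A sketch of what the network configuration might expose - the output name matches the consumer above, and the values are illustrative:

network/outputs.tf
output "custom_configs" {
  description = "Values consumed by downstream configurations via remote state"

  value = {
    transit_cidr_blocks = ["10.10.0.0/16", "10.20.0.0/16"]
    internal_dns_zone   = "internal.example.com"
  }
}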

Managing Default Configurations

One of the trickiest parts of managing infrastructure at scale is handling configuration across different environments and use cases. You want sensible defaults, but you also need the flexibility to override them when necessary. Let’s look at how to build a robust configuration system.

Building a Configuration Hierarchy

Configuration flows through multiple layers: organization-wide defaults form the foundation, environment-specific settings build upon those, and specific overrides provide final customization:

variable "organization_defaults" {
type = object({
region = string
tags = map(string)
backup_config = object({
retention_days = number
frequency = string
})
monitoring = object({
enabled = bool
thresholds = map(number)
})
})
default = {
region = "us-west-2"
tags = {
Organization = "MyCompany"
ManagedBy = "terraform"
}
backup_config = {
retention_days = 7
frequency = "daily"
}
monitoring = {
enabled = true
thresholds = {
cpu = 80
memory = 85
}
}
}
validation {
condition = can(regex("^[a-z]{2}-[a-z]+-[1-9]$", var.organization_defaults.region))
error_message = "Region must be a valid AWS region identifier."
}
}

This creates a strongly typed configuration object with built-in validation. The type definitions serve as documentation, making it clear what configuration options are available. The validation ensures that values meet your requirements before Terraform tries to use them.

Environment-Specific Configuration

Different environments naturally require different settings. Development environments might need minimal resources to control costs, while production demands high-reliability configurations optimized for performance. Here’s how to implement environment-specific defaults:

variable "environment_name" {
type = string
validation {
condition = contains(["dev", "staging", "prod"], var.environment_name)
error_message = "Environment must be one of: dev, staging, prod."
}
}
locals {
environment_defaults = {
dev = {
instance_type = "t3.micro"
capacity = {
min = 1
max = 3
}
backup_config = {
retention_days = 7
frequency = "weekly"
}
}
prod = {
instance_type = "t3.medium"
capacity = {
min = 2
max = 10
}
backup_config = {
retention_days = 30
frequency = "daily"
}
}
}
# Merge configurations with environment-specific overrides
base_config = merge(
var.organization_defaults,
local.environment_defaults[var.environment_name]
)
}

This configuration system allows for clear separation between organization-wide standards and environment-specific requirements. Variables cascade from broad defaults to specific implementations, making it easy to understand and maintain your infrastructure’s configuration.
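
One caveat worth knowing: merge() is shallow. A nested object supplied by an environment replaces the organization-wide object wholesale rather than combining with it:

locals {
  # merge() does not recurse into nested objects - the second map's
  # backup_config replaces the first one entirely
  shallow = merge(
    { backup_config = { retention_days = 7, frequency = "daily" } },
    { backup_config = { retention_days = 30 } }
  )

  # local.shallow.backup_config is { retention_days = 30 } - "frequency" is gone
}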

Implementing Flexible Overrides

Sometimes you need to deviate from the standard configurations for specific use cases. A well-designed override system lets you make these adjustments without compromising type safety or validation:

variable "config_overrides" {
type = object({
instance_type = optional(string)
capacity = optional(object({
min = number
max = number
}))
tags = optional(map(string))
})
default = {}
validation {
condition = var.config_overrides.capacity == null ? true : var.config_overrides.capacity.max > var.config_overrides.capacity.min
error_message = "Maximum capacity must be greater than minimum capacity."
}
}

Bringing It All Together

The real power of this configuration system emerges when you combine the different layers. Here’s how to create a flexible, type-safe configuration that handles both common cases and exceptions:

locals {
  # Merge configurations with type safety
  final_config = {
    instance_type = coalesce(var.config_overrides.instance_type, local.base_config.instance_type)
    capacity      = var.config_overrides.capacity != null ? var.config_overrides.capacity : local.base_config.capacity
    tags          = merge(local.base_config.tags, var.config_overrides.tags != null ? var.config_overrides.tags : {})
  }
}

# Example usage
resource "aws_instance" "example" {
  instance_type = local.final_config.instance_type
  tags          = local.final_config.tags

  dynamic "root_block_device" {
    for_each = var.config_overrides.root_volume != null ? [var.config_overrides.root_volume] : []
    content {
      volume_size = root_block_device.value.size
      volume_type = root_block_device.value.type
    }
  }
}

This implementation provides type safety throughout the configuration chain while offering clear override paths that maintain validation constraints. It supports partial overrides, allowing you to change only what you need, and creates a predictable configuration resolution process.

For example, you might use these configurations like this:

module "standard_web_server" {
source = "./modules/web-server"
environment = "prod"
}
module "special_case_server" {
source = "./modules/web-server"
environment = "prod"
config_overrides = {
instance_type = "t3.large" # Override just the instance type
capacity = {
min = 3
max = 15
}
}
}

The configuration system gracefully handles both standard deployments and special cases while maintaining consistency in your infrastructure definitions. Organization defaults establish your baseline standards, while environment configurations handle common variations between development and production. Targeted overrides complete the system by addressing specific needs without compromising the overall structure.

Scaling and Managing Complex Environments

Infrastructure requirements rarely stay simple. As your organization expands, you’ll find yourself managing multiple regions, accounts, and even cloud providers. The practices that worked for a single environment need to evolve to handle this increased complexity effectively.

Multi-Region and Multi-Account Architecture

Your infrastructure code’s organization should mirror your team’s structure rather than your cloud provider’s layout. This architectural approach simplifies access control management, cost allocation, and compliance.

Consider this pattern for organizing regional infrastructure:

patterns/region/variables.tf
variable "region" {
type = string
description = "The AWS region this infrastructure will be created in"
validation {
condition = can(regex("^[a-z]{2}-[a-z]+-[1-9]$", var.region))
error_message = "Must be a valid AWS region identifier."
}
}
variable "environment" {
type = string
description = "Environment name"
}
variable "network_config" {
type = object({
vpc_cidr = string
public_subnet_count = number
private_subnet_count = number
})
validation {
condition = can(cidrhost(var.network_config.vpc_cidr, 0))
error_message = "VPC CIDR must be a valid IPv4 CIDR block."
}
}

This validation ensures configuration consistency across your regions while keeping the implementation flexible. Here’s how you’d implement the networking component:

patterns/region/main.tf
module "networking" {
source = "./modules/networking"
region = var.region
environment = var.environment
vpc_cidr = var.network_config.vpc_cidr
subnet_counts = {
public = var.network_config.public_subnet_count
private = var.network_config.private_subnet_count
}
}
# Additional infrastructure components...

This structure separates regional infrastructure concerns from business logic while maintaining consistent configurations across your deployment regions. Teams can work independently on their components without stepping on each other’s toes, and new regions can be added without extensive rework.
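
In practice, adding a region can be as simple as instantiating the regional pattern with a region-specific provider alias; a sketch under the layout above (the alias and CIDR are illustrative):

provider "aws" {
  alias  = "euc1"
  region = "eu-central-1"
}

module "region_euc1" {
  source = "./patterns/region"

  providers = {
    aws = aws.euc1
  }

  region      = "eu-central-1"
  environment = "prod"

  network_config = {
    vpc_cidr             = "10.30.0.0/16"
    public_subnet_count  = 2
    private_subnet_count = 2
  }
}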

Automation-First Design Principles

Manual infrastructure management doesn’t scale. Successful infrastructure code needs to work reliably in automated environments, which influences how we structure our configurations and handle sensitive data.

State and Authentication Management

Automated infrastructure deployments require particular attention to state management and authentication. Here’s a base configuration that supports automated workflows:

automation/backend.tf
terraform {
  backend "s3" {
    # These values should be provided via automation
    # bucket = "my-terraform-state"
    # key    = "path/to/state"
    # region = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

Successful automation requires several core practices:

  • Backend configurations should come from your automation system, not hardcoded values (see the init sketch after this list)
  • State locking prevents concurrent modifications that could corrupt your infrastructure
  • Authentication credentials belong in environment variables or instance roles, never in code
  • Separate state files provide isolation between environments and components
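
For the first of those practices, partial backend configuration lets the pipeline inject the values at init time; a minimal sketch:

Terminal window
# Backend values injected by the pipeline rather than committed to code
tofu init \
  -backend-config="bucket=my-terraform-state" \
  -backend-config="key=prod/network.tfstate" \
  -backend-config="region=us-west-2"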

Handling Sensitive Data

Sensitive data like database credentials requires special handling in automated systems. Here’s a pattern that safely manages database configuration:

variable "database_config" {
type = object({
instance_class = string
storage_gb = number
# Note: These should be provided by a secrets manager
master_username = optional(string)
master_password = optional(string)
})
sensitive = true
}
resource "aws_db_instance" "main" {
instance_class = var.database_config.instance_class
allocated_storage = var.database_config.storage_gb
# Use secrets manager if credentials aren't provided
username = var.database_config.master_username != null ? var.database_config.master_username : data.aws_secretsmanager_secret_version.db_creds.secret_string["username"]
password = var.database_config.master_password != null ? var.database_config.master_password : data.aws_secretsmanager_secret_version.db_creds.secret_string["password"]
}

This pattern integrates with secrets management services, so storage and rotation are handled outside the IaC configuration.

Resource Management at Scale

Our earlier base module pattern becomes especially valuable when managing resources across multiple environments. Instead of creating naming conventions from scratch, we can extend our established patterns:

module "metadata" {
source = "./modules/base/metadata"
environment = var.environment
region_short = local.region_map[var.region]
application = var.application_name
component = "infrastructure"
}
locals {
# Region mapping for standardization
region_map = {
"us-east-1" = "use1"
"us-west-2" = "usw2"
"eu-central-1" = "euc1"
}
# Common configurations using our standardized naming
common_config = {
monitoring = {
interval = var.environment == "prod" ? 60 : 300
retention = var.environment == "prod" ? 30 : 7
insights_enabled = var.environment == "prod"
}
backup = {
enabled = true
retention = var.environment == "prod" ? 30 : 7
}
}
}
# Example resource using standardized naming and tagging
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
alarm_name = "${module.metadata.resource_name}-cpu-high"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = local.common_config.monitoring.interval
threshold = "80"
tags = module.metadata.tags
}

Consistent resource naming can now extend across your entire infrastructure, with standardized tagging simplifying resource tracking and cost allocation. Environment-specific configurations follow established patterns, and the structure creates clear relationships between resources and their parent applications.

Managing Multiple Environments

When scaling across environments, each with its own requirements, centralizing environment-specific configurations helps to ensure uniformity. Here’s how to structure environment configurations that work with our base module pattern:

locals {
  environment_config = {
    dev = {
      instance_type = "t3.small"
      auto_scaling = {
        min     = 1
        max     = 2
        desired = 1
      }
    }
    prod = {
      instance_type = "t3.large"
      auto_scaling = {
        min     = 2
        max     = 10
        desired = 4
      }
    }
  }

  # Merge environment config with overrides
  config = merge(
    local.environment_config[var.environment],
    var.overrides
  )
}
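
For completeness, the var.overrides referenced above needs a definition; a minimal sketch (a permissive any type for brevity - a typed object with optional attributes would be stricter):

variable "overrides" {
  description = "Environment-specific configuration overrides"
  type        = any
  default     = {}
}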

This configuration integrates cleanly with our metadata module:

module "metadata" {
source = "./modules/base/metadata"
environment = var.environment
region_short = local.region_map[var.region]
application = var.application_name
component = "app"
}
module "application_cluster" {
source = "./modules/application-cluster"
name = module.metadata.resource_name
instance_type = local.config.instance_type
min_size = local.config.auto_scaling.min
max_size = local.config.auto_scaling.max
desired_capacity = local.config.auto_scaling.desired
tags = module.metadata.tags
}

Automation Considerations

Automation is essential for any Terraform deployment at scale. If you’re going to treat Terraform like a first-class citizen in a software environment, then it needs to be given the same considerations as application code, which means deployment via CI/CD.

Specific configuration patterns for individual platforms are beyond the scope of this article, so we’ll lay out some high-level principles that should at least help provide a foundation for how to think about employing Terraform in automated deployments.

  1. Idempotency: Ensure that multiple runs with the same inputs produce the same result
  2. Failure Handling: Design for graceful handling of failures and partial completions
  3. State Management: Implement proper state locking and version control
  4. Access Control: Use minimal-privilege service accounts and clear role boundaries
  5. Monitoring: Include detailed logging and monitoring of automation processes

Mastering Terraform/OpenTofu Code Organization

OpenTofu and Terraform help us wrangle complex infrastructure into maintainable code, but success depends on solid organizational patterns. In this article, we’ve walked through several core strategies that work at scale.

We started with the foundation: a base module pattern that enforces consistent naming and tagging across your infrastructure. This pattern scales naturally to handle more complex requirements, from basic resource creation to multi-region deployments.

Building on that foundation, we explored how to manage configuration across different environments without sacrificing type safety or validation. The data sharing patterns we covered, from data sources to remote state, help you balance flexibility with maintainability as your infrastructure grows. For teams managing infrastructure across multiple regions and accounts, we demonstrated patterns that promote independence while maintaining consistency. These patterns work together: the base module provides naming standards, environment configurations handle common variations, and specific overrides address unique requirements.

While every organization’s needs are different, these patterns provide a solid starting point for managing infrastructure at scale. Focus on establishing these practices early as they are much easier to implement from the start than to retrofit into existing infrastructure.
