How to detect, manage, and prevent Terraform drift: The ultimate guide

What you'll learn: In this article, you'll learn what Terraform drift is, some of the common causes of drift, strategies for Terraform drift detection, and how to manage and control infrastructure drift before it becomes a production problem.

This guide covers understanding, detecting, managing, and preventing Terraform drift across real-world cloud infrastructure, so that the state defined in your Terraform configuration files matches the actual state of your deployed resources.

You'll see how drift occurs, the common Terraform drift causes, how automation tools and Terraform commands fit into a healthy Terraform workflow, and how to keep your Terraform state file aligned with your real infrastructure instead of diverging over months of manual configuration changes and emergency fixes.

We walk through some built-in Terraform commands such as the terraform plan command, terraform apply, and terraform refresh, then explore additional practices and drift detection tools that help you detect drift earlier and keep Terraform management under control.

Along the way, we talk about backend configuration, remote Terraform state, and how to track infrastructure over time so that infrastructure as code remains the single source of truth.

By the end, you'll have a clear understanding of Terraform drift and practical strategies for addressing drift through both proactive practices and regular maintenance. Supported by automated drift detection, you and your team can catch drift, remediate drift safely, and prevent unknown and untracked changes from turning your infrastructure code into a liability.

What is Terraform Drift?

To properly understand Terraform drift, it helps to revisit the basics of how Terraform as a configuration management tool models your cloud resources.

Terraform relies on persisted Terraform state, which is stored in a Terraform state file and represents all the provisioned cloud resources and their corresponding outputs at the moment the last terraform apply completed successfully.

That state file can live on your local filesystem, in a remote backend such as AWS S3 or Terraform Cloud, or in whatever other remote state store your team has chosen. We'll also soon be launching Stategraph to help you manage state and make plans and applies even faster.

Every time you run terraform plan, Terraform first refreshes the state by querying your cloud providers to get the current actual state of the infrastructure managed by Terraform. It then compares that refreshed state to your Terraform configuration, which might be a monolith of Terraform config or a carefully modularized set of Terraform configuration files stored in a version control system such as GitHub.

This refresh step ensures that Terraform has the most up-to-date view of your managed infrastructure before it decides what infrastructure changes to perform next.

Terraform drift is the gap between the desired state captured in your Terraform code and the real infrastructure that exists when you interrogate your cloud infrastructure.

That gap can live in the code, when Terraform configuration diverges from reality, or inside the cached Terraform state, when the Terraform state file no longer reflects what is actually running. When that happens and drift is detected, your infrastructure's actual state aligns with neither the state file nor the configuration, and the entire system starts to behave in ways that your infrastructure as code never predicted.
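One way to picture the gap is as a three-way comparison between what the configuration asks for, what the state file remembers, and what the provider reports. The sketch below is only a mental model, not how Terraform is implemented; the `ResourceViews` type, the `classify` function, and the instance sizes are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceViews:
    """Three views of one attribute of one resource (illustrative values only)."""
    configured: str  # what the Terraform configuration asks for
    recorded: str    # what the Terraform state file remembers
    actual: str      # what the cloud provider reports right now


def classify(views):
    """Name the kind of gap, if any, between the three views."""
    if views.configured == views.recorded == views.actual:
        return "in sync"
    if views.recorded != views.actual:
        # Something changed outside Terraform since the state was last refreshed.
        return "state out of date"
    # State matches reality, but the configuration asks for something else:
    # either a pending code change or drift that was already accepted into state.
    return "configuration and reality disagree"


# Someone resized an instance by hand; the state file has not caught up yet.
print(classify(ResourceViews("t3.micro", "t3.micro", "t3.large")))
```

Both non-trivial outcomes count as drift in the sense used here: in one the cached state is wrong, in the other the code is.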

What causes Terraform drift?

Before you can resolve a drift issue, you need to understand how it happened; otherwise, drift remediation can introduce challenges of its own, such as regressions in your environment. The last thing you want to do is cause an outage for your users.

The worst case is that you try to fix drift and instead introduce drift somewhere else, or you unwind a critical hotfix that was applied directly in the AWS console while trying to be helpful.

ClickOps

One of the most obvious and common causes of Terraform drift is ClickOps, when someone logs into one of the cloud consoles and clicks around editing configuration by hand instead of using Terraform. These are manual changes, not infrastructure code changes, and they bypass whatever Terraform workflow you had in mind.

ClickOps can have a huge impact or a relatively small one. A developer might jump into the AWS console or another provider's UI just to review configuration, then tweak memory settings or change a security group without thinking about how that change relates to the Terraform state.

Adjusting security group rules so that a security group is effectively open from anywhere looks harmless in the interface, but in practice it can enable unauthorized access, introduce security architecture vulnerabilities, and amount to granting unintended public access to something that was supposed to stay private. That kind of unintended public access is a very real cause of Terraform drift, because Terraform is still convinced that the desired state is more restrictive.

In cases like this, it's very important to reconcile your drift to ensure that your applications are properly secured, observing the principle of least privilege, instead of the principle of whoever last clicked in the UI.

S3 buckets flipped from private to public, IAM users or roles created outside Terraform, or security group rules edited directly in the console are all examples of infrastructure drift that quietly erodes any guarantee that state defined in Terraform matches what the network is actually exposing.
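As a rough illustration of what "open from anywhere" means in data terms, the sketch below flags ingress rules that allow every IPv4 or IPv6 address. The rule dictionaries are an assumption: they borrow the `cidr_blocks` and `ipv6_cidr_blocks` attribute names used by the AWS provider's security group ingress blocks, and the `world_open_rules` helper is hypothetical rather than part of any tool.

```python
# CIDR ranges that match every IPv4 or IPv6 address.
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def world_open_rules(ingress_rules):
    """Return the ingress rules that allow traffic from any address."""
    flagged = []
    for rule in ingress_rules:
        cidrs = set(rule.get("cidr_blocks") or [])
        cidrs |= set(rule.get("ipv6_cidr_blocks") or [])
        if cidrs & OPEN_CIDRS:
            flagged.append(rule)
    return flagged


# Illustrative rules: SSH was opened to the world by hand, HTTPS stays internal.
rules = [
    {"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]},
    {"from_port": 443, "to_port": 443, "cidr_blocks": ["10.0.0.0/16"]},
]
for rule in world_open_rules(rules):
    print("open to the world:", rule["from_port"], "-", rule["to_port"])
```

A check like this could run against the drifted values a plan reports, so that security-relevant drift is escalated ahead of cosmetic drift.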

CI pipelines (managing infrastructure outside Terraform)

ClickOps is not the only way drift occurs.

Drift also appears when CI/CD pipelines manage parts of your infrastructure outside Terraform. Terraform is an excellent configuration management tool for provisioning infrastructure, but teams often update certain resources using other automation tools so that they can keep deployments fast and focused.

Imagine that you have provisioned an Elastic Container Registry and Elastic Container Service using Terraform on AWS. Those resources are managed by Terraform configuration files and everything looks tidy in Git.

However, your team runs GitHub Actions as part of your CI/CD pipeline to build and publish new container images, and then updates ECS task definitions using the AWS CLI. Instead of running Terraform to change the task definition, the pipeline downloads the current definition, patches the container image tag, and pushes the updated definition directly.

From Terraform's perspective, that ECS service is drifting.

This is, by definition, Terraform drift, because the image recorded in the Terraform state is no longer the image that is actually deployed. It may not seem very significant, but depending on the lifecycle policies attached to your ECR repository, the container tag persisted in Terraform state may no longer exist in the registry.

Managing infrastructure outside Terraform is sometimes necessary, but every one of these patterns is a place where unknown and untracked changes can sneak in and introduce drift.
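The pipeline step described above can be sketched as a small pure function. This is a simplified model under stated assumptions: the task definition is trimmed to the `containerDefinitions`, `name`, and `image` fields of the ECS task definition JSON, the image names are made up, and `patch_image` is a hypothetical helper standing in for whatever script your pipeline runs before calling the AWS CLI.

```python
import copy


def patch_image(task_definition, container_name, new_image):
    """Return a copy of the task definition with one container's image replaced."""
    patched = copy.deepcopy(task_definition)
    for container in patched["containerDefinitions"]:
        if container["name"] == container_name:
            container["image"] = new_image
    return patched


# What Terraform recorded in state when it last applied (illustrative values).
recorded = {
    "family": "web",
    "containerDefinitions": [{"name": "app", "image": "example/web:v1"}],
}

# What the pipeline registers after building a new image.
deployed = patch_image(recorded, "app", "example/web:v2")

# Terraform's view and reality now disagree about the image tag.
print("drifted:", recorded != deployed)
```

Nothing in that flow touches the Terraform state, which is exactly why the next plan reports a difference.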

Automatic changes

Another common cause of Terraform drift is automatic reconfiguration of managed infrastructure by cloud providers themselves. Cloud resources do not sit still.

In AWS, auto-scaling groups automatically add or remove EC2 instances based on demand, which can change the actual number of instances compared to what's in your Terraform state.

Similarly, RDS instances may receive automatic maintenance updates during scheduled maintenance windows, potentially altering database engine versions or parameter group settings.

EC2 instances configured for auto recovery might be silently replaced, generating new instance IDs that Terraform did not expect. Other resources behave similarly across cloud providers, changing gradually as managed services evolve or heal themselves.

These automatic behaviors happen behind the scenes and do not care whether infrastructure is managed by Terraform. Over time they accumulate into infrastructure drift that the Terraform state does not know about yet.

Until you run terraform plan or terraform refresh and reconcile the difference, the infrastructure managed in reality and the infrastructure managed in code are no longer the same.

The cost of drift to your business

Terraform drift has two major negative effects on a business, and they show up both in time and in misconfiguration – both of which, ultimately, cost your business money.

Most businesses have limited resources, so having engineers review Terraform plans to compare expected state to real state is not the best use of time, and is rarely a cost saving measure. Depending on the number of provisioned resources, this can take an engineer hours to reconcile.

When drift has piled up for months, reconciling drift means scanning through pages of infrastructure changes, trying to work out which resources Terraform should fix, which changes to keep, and whether a particular manual configuration change was intentional during an incident. Every minute spent on that detective work increases operational costs, because it's time not spent on product work.

Unwinding critical changes can be even more expensive.

Imagine a situation where, during an incident, a developer logs into your AWS console and makes an ALB publicly accessible from everywhere instead of just from a specific subnet. Later, someone else runs terraform apply from an old branch, with an out-of-date state file, and unknowingly reverses the fix that kept production healthy.

That kind of accidental rollback can cause a second outage, and every outage has direct financial implications, from missed revenue to brand damage. In other words, unmanaged drift does not just clutter your Terraform management story, it can increase operational costs in ways that are hard to predict but very easy to feel on the balance sheet.

Why it's important to detect Terraform drift early

The costs of Terraform drift compound over time.

A single ClickOps tweak that goes undetected for days or weeks can cascade into multiple discrepancies as your team continues making legitimate infrastructure changes.

When you finally run terraform plan weeks later, you're not just reconciling one change, but untangling a web of interdependent drift that makes it difficult to determine which changes were intentional, which were accidental, and which might break critical functionality if reverted.

If you only detect drift occasionally, every reconciliation is painful. You're not just trying to detect drift in one resource; you are trying to reconstruct a history of unknown and untracked changes, decide which ones are safe to keep, and guess what will happen when Terraform attempts drift remediation by force.

The result is that people avoid running terraform plan, and once that habit sets in, it is only a matter of time before a terraform apply performs a sweeping set of updates that nobody fully understands.

Early Terraform drift detection transforms drift from a crisis into a routine maintenance task.

If you can catch drift within hours or days using automated drift detection, reconciling it is a small, low-risk job rather than a high-risk event. When you catch a security group that's open to the world within hours of the change, you can quickly assess the risk and close it before it becomes a security incident. When a CI/CD pipeline updates an ECS task definition outside Terraform, you can either import the change or push a new Terraform config that restores control, while the context is still fresh in your team's memory.

The longer the drift goes undetected, the more likely it is that someone has built dependencies on those changes, making reconciliation risky and time-consuming.

Moreover, undetected drift creates a false sense of security in your infrastructure as code (IaC) practices. Your Terraform configuration appears to be the source of truth, but your actual infrastructure has quietly diverged, meaning your next terraform apply could introduce unexpected changes or outages.

Regular drift detection ensures that your Terraform state remains a reliable snapshot of real infrastructure. The promise of infrastructure as code is that infrastructure managed in Git matches what is running in production.

That promise only holds if the infrastructure's actual state aligns with the desired state encoded in Terraform, and that alignment only holds when you actively manage drift instead of hoping nothing has changed behind your back.

How to detect Terraform drift

There is a straightforward way to reconcile existing Terraform-managed infrastructure with your Terraform state file using native Terraform commands.

You can run terraform plan -refresh-only, which is a variant of the terraform plan command that focuses solely on updating state. This command will perform an interrogation against your provider(s) and create a plan showing how your state file would be updated to match the current infrastructure.

To actually update the state file (whether local, S3, Stategraph or other), you then need to run terraform apply -refresh-only and update the stored state without touching configuration.

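In older workflows you might see terraform refresh used for similar purposes. That command is now deprecated because it writes the refreshed state immediately, with no chance to review the changes first, so -refresh-only is the more explicit fit for modern Terraform workflow design.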
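If you want to consume a refresh-only plan from a script rather than read it by eye, one option is to save the plan to a file and render it as JSON with terraform show -json. The sketch below assumes a recent Terraform release whose JSON plan output includes a resource_drift list; the file name drift.tfplan, both function names, and the sample document are illustrative, and the sample is hand-written and heavily trimmed rather than real Terraform output.

```python
import json


def refresh_only_commands(plan_file="drift.tfplan"):
    """Commands that save a refresh-only plan and then render it as JSON."""
    return [
        ["terraform", "plan", "-refresh-only", "-input=false", f"-out={plan_file}"],
        ["terraform", "show", "-json", plan_file],
    ]


def drifted_resources(plan_json):
    """List (address, actions) pairs from the plan's resource_drift section."""
    plan = json.loads(plan_json)
    drifted = []
    for entry in plan.get("resource_drift", []):
        actions = entry.get("change", {}).get("actions", [])
        if actions and actions != ["no-op"]:
            drifted.append((entry["address"], actions))
    return drifted


# Hand-written, trimmed sample of a JSON plan (illustrative, not real output).
sample = json.dumps({
    "resource_drift": [
        {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
    ],
})
for address, actions in drifted_resources(sample):
    print(address, actions)
```

Parsing the JSON instead of the human-readable plan keeps the check stable across changes in how Terraform formats its terminal output.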

To adopt unmanaged resources, you still need to use terraform import so that Terraform knows they exist.

If your team relies heavily on manual configuration changes in cloud consoles or third-party automation software that offers drift detection only as a side effect rather than a core feature, then you'll regularly discover resources that are not represented in your configuration management tool. Until you define appropriate infrastructure in your Terraform config and import those resources, you cannot truly manage drift for those pieces of your environment.

How to manage and fix Terraform drift

While you can never eliminate drift entirely in a dynamic environment, you can minimize it, manage drift more confidently, and fix drift before it hurts you. A handful of deliberate practices around state, imports, and planning frequency go a long way toward drift management that feels boring instead of terrifying.

Remote Terraform state

The first move you can make to prevent Terraform drift is to take advantage of a feature-rich remote Terraform state provider.

Terraform's default behavior is to write state to a local state file, which means every engineer has their own divergent view of what infrastructure managed by Terraform looks like. Even without ClickOps, this increases the risk that someone runs an old terraform plan and applies it, accidentally overwriting changes that happened elsewhere.

Remote state, configured through a proper backend configuration, gives everyone a single Terraform state to work with. S3-based remote state is a common baseline. Adding state locking, either through the S3 backend's native lock file support (the use_lockfile argument in recent Terraform releases) or through a DynamoDB table, prevents multiple people from running terraform apply against the same state simultaneously, which minimizes drift and avoids certain race conditions.
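To make the backend settings concrete, here is a sketch that builds an S3 backend definition with locking enabled. It emits Terraform's JSON configuration syntax (files ending in .tf.json) only so the example stays in one language; the same arguments would sit in a backend "s3" block in HCL. The bucket, key, region, and table names are placeholders, `s3_backend_config` is a hypothetical helper, and whether use_lockfile is available depends on your Terraform version.

```python
import json


def s3_backend_config(bucket, key, region, lock_table=None):
    """Build a Terraform JSON-syntax document for an S3 backend with locking."""
    s3 = {"bucket": bucket, "key": key, "region": region, "encrypt": True}
    if lock_table:
        # DynamoDB-based locking, for setups that already have a lock table.
        s3["dynamodb_table"] = lock_table
    else:
        # S3-native lock file, available in recent Terraform releases.
        s3["use_lockfile"] = True
    return {"terraform": {"backend": {"s3": s3}}}


# Placeholder names; substitute your own bucket, key, and region.
config = s3_backend_config("example-tf-state", "prod/terraform.tfstate", "us-east-1")
print(json.dumps(config, indent=2))
```

Changing a backend definition requires re-running terraform init before the next plan, so a change like this is best rolled out deliberately rather than slipped into an unrelated pull request.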

Terraform Cloud and other remote backends also provide locking, versioning, and history so that you can see how Terraform management evolved over time and how state changed between runs.

Remote state alone does not eliminate drift, but it creates a stable foundation where the state defined in your Terraform backend is at least consistent.

When that backend is combined with clear processes around who can apply changes and when, and when it is integrated with a version control system so that Terraform code is always reviewed, you greatly reduce the chance that local experiments quietly leak into production.

Remote state is a foundation rather than a complete fix. Combining it with other techniques, such as terraform import, helps you keep your infrastructure managed by a single configuration.

Import infrastructure

Unless you operate in an extremely locked down environment, ClickOps and out of band automation are facts of life.

Developers will occasionally update environment variables in a console to test something, or a DevOps engineer will adjust a setting during an incident and forget to circle back to the Terraform configuration files later. These actions can introduce drift, but not all of them can be avoided in the moment.

Terraform's import capability exists to reconcile these situations with the infrastructure code. When a resource has been created or heavily modified outside Terraform, you can create the matching resource block in your Terraform config and run terraform import to associate the existing resource ID with that configuration. Once the import succeeds, future runs of terraform plan and terraform apply will treat that resource as fully managed infrastructure.
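When several resources need adopting at once, it can help to generate the import commands from a reviewed list instead of typing them one by one. The sketch below only builds the command lines; it assumes the matching resource blocks already exist in your configuration, and the addresses, IDs, and helper names are hypothetical. The ID format differs per resource type (a bucket name for an S3 bucket, a group ID for a security group), so check the provider documentation for each one.

```python
import shlex


def import_command(address, resource_id):
    """Build the argument list for importing one existing resource."""
    return ["terraform", "import", address, resource_id]


def import_commands(discovered):
    """Build one import command per (resource address, provider ID) pair."""
    return [import_command(address, rid) for address, rid in discovered.items()]


# Hypothetical resources that were created by hand and now need adopting.
discovered = {
    "aws_s3_bucket.logs": "example-logs-bucket",
    "aws_security_group.web": "sg-0123456789abcdef0",
}
for command in import_commands(discovered):
    # shlex.join quotes any argument that would need quoting in a shell.
    print(shlex.join(command))
```

Printing the commands first gives a reviewer something concrete to approve before any state is modified.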

Imports are not a replacement for creating resources with Terraform from the beginning. Some APIs intentionally do not expose all original creation data, especially secrets.

For example, if you create an AWS IAM access key with Terraform, the Terraform state file records both key and secret at creation time. If you instead import an existing key, the provider cannot reconstruct the secret, and Terraform will never know it. This kind of limitation is intentional, to reduce security vulnerabilities, but it also reminds you that imports are a pragmatic tool for drift remediation, not a magic trick to recreate perfect history after the fact. For a worked example, see our article on importing an S3 bucket.

Plan often

Another strategy for drift management is simply to run Terraform plans regularly so that you detect drift quickly instead of waiting for a major feature release.

The act of running terraform plan compares Terraform configuration against stored state and makes fresh queries to your cloud providers to see where real infrastructure disagrees. Even if you do not intend to apply the plan immediately, you now know which resources have diverged and where drift occurs.

Because standard plans show both intentional changes from your code and out of band changes from reality, many teams also run periodic terraform plan -refresh-only jobs, often as part of CI/CD schedules, to isolate infrastructure drift.

A refresh-only plan focuses on the relationship between the Terraform state and the actual state of each resource, without considering new configuration changes. To actually update your state with these changes, you need to run terraform apply -refresh-only after reviewing the plan.

Over time, running plans frequently gives you a much clearer idea of how often drift appears and which parts of your cloud infrastructure are most volatile. It also minimizes drift size, turning massive reconciliations into small, predictable updates that fit neatly into a GitHub pull request and a human review.
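A scheduled planning job needs a machine-readable signal, and for a standard plan the -detailed-exitcode flag provides one: exit code 0 means the plan succeeded with no changes, 1 means an error, and 2 means the plan succeeded and changes are present. The sketch below wraps that in a small helper; `run_plan`, `interpret_exit_code`, and the default binary name are assumptions for illustration, and `run_plan` is defined but deliberately not called here because it needs Terraform installed and an initialized working directory.

```python
import subprocess

# Meaning of terraform plan exit codes when -detailed-exitcode is set.
EXIT_MEANINGS = {
    0: "no changes",
    1: "error",
    2: "changes present",
}


def interpret_exit_code(code):
    """Translate a plan exit code into a short status string."""
    return EXIT_MEANINGS.get(code, "unexpected exit code")


def run_plan(workdir, terraform_bin="terraform"):
    """Run a plan in workdir and return (status, plan output)."""
    result = subprocess.run(
        [terraform_bin, "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_exit_code(result.returncode), result.stdout


print(interpret_exit_code(2))
```

A nightly job could call run_plan for each workspace directory and open a ticket or post a message whenever the status is anything other than "no changes".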

Use Terrateam

Terrateam is a TACOS (Terraform Automation and Collaboration Software) that allows you to manage and deploy Terraform infrastructure changes using GitOps workflows.

With Terrateam, you can schedule automated drift detection checks on your IaC workspaces to make sure your Terraform configuration and actual resources haven't drifted apart.

You can schedule drift detection during specific time windows and per environment, with complete audit trails of your drift history recorded and accessible.

Choose to get notified when drift is detected, or set up automatic applies that occur on drift for immediate remediation. Book a demo to see Terrateam in action.

How to prevent Terraform drift

Completely eliminating Terraform drift is nearly impossible in any environment that changes and grows.

However, you can build habits and guardrails that minimize drift, make it easier to address, and ensure that your infrastructure's actual state aligns as closely as possible with what you keep in Git.

Most strategies for preventing drift start with access control and process.

When you enforce the principle of least privilege and limit write access in cloud consoles, you reduce the number of people who can introduce drift through manual configuration changes.

When you require that all infrastructure changes land through code, reviewed as part of regular pull requests and passed through CI/CD, you turn Terraform code into the single front door for infrastructure changes instead of an optional convenience.

Regular, automated planning jobs, whether built with GitHub Actions, other CI/CD systems, or specialized tools, are also part of prevention.

Jobs that run terraform plan -refresh-only on a schedule and report differences help you catch drift early, and jobs that run full plans against feature branches keep configuration honest. When these jobs are connected to a robust remote backend, whether Terraform Cloud, S3 with DynamoDB locking, or another backend configuration that supports locking and history, you get reliable coordination so that only one apply happens at a time.

State locking minimizes drift that comes from concurrent applies. Careful separation of workspaces and environments minimizes blast radius when drift does sneak in.

Clear ownership of modules and environments minimizes drift around neglected corners of the system that nobody feels responsible for. Together, those practices form a Terraform management story in which infrastructure as code remains trustworthy and infrastructure drift is a rare event rather than a constant background hazard.

Conclusion

Terraform drift is a reality of modern infrastructure management. Whether it stems from ClickOps, automated provider updates, or managing resources outside of Terraform, it introduces risks ranging from security vulnerabilities to application downtime.

Understanding the causes is the first step, but having a robust strategy for Terraform drift detection and management is what keeps your infrastructure reliable and repeatable.

While native Terraform commands like plan and import are powerful, relying on manual processes often isn't enough for scaling teams. This is where specialized Terraform drift detection tools shine. By automating the detection process and enforcing strict state management policies, you can manage discrepancies before they become outages.

Terrateam helps solve this problem by integrating directly into your existing workflow. It automates drift detection, ensuring you are alerted to changes as they happen, not just when you remember to run a plan.

If you're looking to streamline your IaC operations and sleep better at night knowing your state matches your reality, sign up for Terrateam today and take control of your infrastructure.
