What about GitHub Actions?


Why can’t I just use GitHub Actions for Terraform?

I recently stumbled upon this FAQ from Cloud Posse about Terraform continuous delivery. They ask the following question from a user’s perspective:

What about using GitHub Actions?

Cloud Posse FAQ

Cloud Posse dives into a question and answer explanation on why running Terraform with a homegrown GitHub Actions solution is not a good idea. All of the answers are 100% on point. I was excited to read this post because Terrateam addresses all of the issues that Cloud Posse calls out as being a problem with a homegrown GitHub Actions solution.

The devil is in the details

One of the reasons we started down the path of building Terrateam is that it is in fact very difficult to create a homegrown Terraform solution for GitHub Actions. The problem is that users often think a bare-minimum plan and apply workflow is enough. This is simply not the case. The devil is in the details.

The mission

There are a lot of edge cases when it comes to running Terraform with GitHub Actions. Terrateam is on a mission to fix those edge cases while providing a superb developer experience. I wanted to respond to all of the points Cloud Posse makes as they relate to Terrateam.

What’s the problem with GitHub Actions?

Simply put, you need state, as in stored data, for safe Terraform continuous delivery. When I say state, I’m not talking about the Terraform state file. I mean the operational data that continuous delivery itself depends on, such as which plans are stored, which directory/workspace combinations are locked, and which plans have been invalidated.

The Terrateam backend has sophisticated tracking mechanisms that are necessary to store Terraform operational state for safe continuous delivery.

The core issue is that GitHub Actions on its own has no way to store this state, which makes it a dangerous solution for Terraform continuous delivery.

How does Terrateam do it?

Before going into the question and answer portion of this blog post, it’s important to understand how Terrateam works. We solve all of the problems Cloud Posse outlines by having our custom GitHub Actions workflow talk to the Terrateam backend. The backend’s API provides a secure way for each Terrateam installation to initiate and track Terraform operations backed by GitHub Actions and to store encrypted Terraform plan files.

Terrateam is a GitHub application that translates GitHub events into Terraform executions. There are two major components of the Terrateam service:

  • The backend which receives GitHub events and makes decisions using the event payload
  • The Terrateam runner which executes the jobs that the backend creates

For each repository event (open, close, comment, etc.), the Terrateam GitHub application receives a webhook from GitHub. Each webhook contains a payload detailing the event. For example, if a user opens a pull request with a Terraform code change, the Terrateam backend receives a payload in JSON format with the details of the open pull request (organization, repository, pull request number, branch, files changed, who created the pull request, etc.).
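As a rough illustration of the event-to-decision step, the sketch below maps a pull request webhook payload to an operation. The field names follow GitHub's `pull_request` webhook payload. The `Decision` type, the `decide` function, and the mapping of actions to operations are hypothetical and are not Terrateam's actual logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    """Hypothetical record of what the backend wants a runner to do."""
    operation: str  # "plan" or "cleanup" in this sketch
    repo: str
    pull_number: int
    branch: str


def decide(event: str, payload: dict) -> Optional[Decision]:
    """Turn a GitHub webhook into a decision, or None if nothing should run."""
    if event != "pull_request":
        return None
    repo = payload["repository"]["full_name"]
    number = payload["number"]
    branch = payload["pull_request"]["head"]["ref"]
    action = payload["action"]
    # Assumption: new or updated pull requests trigger a plan.
    if action in ("opened", "synchronize", "reopened"):
        return Decision("plan", repo, number, branch)
    # Assumption: closing a pull request triggers cleanup of its stored data.
    if action == "closed":
        return Decision("cleanup", repo, number, branch)
    return None
```

A homegrown workflow usually reacts to each event in isolation. The point of a backend is that decisions like these can also consult what it has already recorded about earlier events.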


The Terrateam backend server is able to keep track of all of these events and decide when to perform certain operations with GitHub Actions, store encrypted plan files, lock directory/workspace combinations, and invalidate plan files.

When developing a homegrown Terraform solution with GitHub Actions, most people do not have a service providing this level of sophistication.

On to the answers…

The questions and answers

Cloud Posse didn't ask us these questions directly. I'm answering them because they relate to Terrateam. The intent of answering these questions is to give readers clarity on why creating a homegrown Terraform solution with GitHub Actions can be a dangerous rabbit hole to fall into.

1. Plan file storage

Where will you store plan files? Plan files, which are required for approvals in a plan -> approve -> apply workflow need to be saved. These plan files may contain credentials for resources such as RDS databases which cannot be avoided.

Terrateam stores customer plan files in encrypted storage. Access is strictly controlled and logged. A plan file is accessible only to the GitHub organization, repository, and pull request it belongs to.

In the future, we’ll give customers the option to store plan files in customer-owned S3 buckets, further isolating customer data.

2. Plan file cleanup

How will you clean up plan files? Should they persist after a terraform apply succeeds or crashes?

Terrateam plan files are deleted as soon as they are used by their respective apply, after an unlock is performed, or after 14 days.

If an apply operation fails, the plan file is still destroyed, so the user has to re-plan. We feel this is the safest course of action.
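The cleanup rules above can be sketched as a single predicate. This is a minimal illustration only: the `StoredPlan` record and its field names are hypothetical, and the real backend's bookkeeping is not described in this post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_PLAN_AGE = timedelta(days=14)  # retention window stated above


@dataclass
class StoredPlan:
    """Hypothetical metadata kept alongside an encrypted plan file."""
    created_at: datetime
    apply_attempted: bool = False  # set once an apply has used the plan, pass or fail
    unlocked: bool = False         # set when an unlock is performed


def should_delete(plan: StoredPlan, now: datetime) -> bool:
    """Return True when any of the three cleanup conditions holds."""
    expired = now - plan.created_at >= MAX_PLAN_AGE
    return plan.apply_attempted or plan.unlocked or expired
```

Treating a failed apply the same as a successful one keeps the rule simple: a plan is never reused once an apply has touched it.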

3. Approvals

How will you implement approval steps? If the approval is denied, how will you clean up the terraform plan file?

Approval steps are controlled by Apply Requirements and Access Control rules.

Users have the ability to implement the following requirements for Apply operations:

  • Number of approvals
  • Merge conflicts
  • Status checks

Additionally, directory and workspace combinations may have granular access control rules. This allows teams to have distinct access control policies for Terraform resources and environments in a monorepo.
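To make the three apply requirements concrete, here is a small sketch of a gate that refuses an apply until they are met. The `PullRequestStatus` type and `unmet_apply_requirements` function are hypothetical names used for illustration, not part of Terrateam.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PullRequestStatus:
    """Hypothetical snapshot of the pull request state."""
    approvals: int
    has_merge_conflicts: bool
    failing_checks: List[str]


def unmet_apply_requirements(pr: PullRequestStatus, required_approvals: int = 1) -> List[str]:
    """Return human-readable reasons an apply should be refused (empty if none)."""
    reasons = []
    if pr.approvals < required_approvals:
        reasons.append(f"needs {required_approvals} approval(s), has {pr.approvals}")
    if pr.has_merge_conflicts:
        reasons.append("pull request has merge conflicts")
    if pr.failing_checks:
        reasons.append("failing status checks: " + ", ".join(pr.failing_checks))
    return reasons
```

Returning the reasons rather than a bare boolean makes it easy to post them back to the pull request as a comment.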

4. Multiple PRs and Plans against overlapping resources

If you have multiple open PRs (e.g. many plans) for one workspace, after applying one, all other plans need to be invalidated. How will you implement that invalidation?

Planning on a directory is allowed by any number of Pull Requests. We feel that users should be able to review plans alongside existing pull requests.

If a change is applied, a lock is taken on that Terraform directory or set of resources, which invalidates any plans against it from other open pull requests.

The locks are released only when the pull request that was applied is merged into the default branch.

This ensures consistency between your Terraform repository and resources.

After a successful merge to the default branch, Terrateam will re-plan any invalidated plans based on the repo-level autoplan configuration.
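The sketch below follows one reading of this flow, consistent with section 6: an apply holds a lock on a directory/workspace pair, plans from other pull requests against that pair become invalid, and merging the applied pull request releases the lock. The `LockTable` class is a hypothetical in-memory stand-in, not the Terrateam backend.

```python
from typing import Dict, Set, Tuple

Target = Tuple[str, str]  # (directory, workspace)


class LockTable:
    """Hypothetical in-memory stand-in for the backend's lock tracking."""

    def __init__(self) -> None:
        self.plans: Dict[Target, Set[int]] = {}  # target -> PRs with a valid plan
        self.locks: Dict[Target, int] = {}       # target -> PR holding the lock

    def plan(self, target: Target, pr: int) -> None:
        # Any number of pull requests may plan against the same target.
        self.plans.setdefault(target, set()).add(pr)

    def apply(self, target: Target, pr: int) -> bool:
        owner = self.locks.get(target)
        if owner is not None and owner != pr:
            return False  # another pull request holds the lock: deny
        if pr not in self.plans.get(target, set()):
            return False  # no valid plan: a re-plan is required first
        self.locks[target] = pr
        # The apply uses up this PR's plan and invalidates everyone else's.
        self.plans[target] = set()
        return True

    def merge(self, pr: int) -> None:
        # Merging the applied pull request releases the locks it holds.
        for target in [t for t, owner in self.locks.items() if owner == pr]:
            del self.locks[target]
```

For example, if pull requests 1 and 2 both plan against `("aws/production", "default")` and 1 applies first, then 2 is denied until 1 is merged, and 2 must re-plan before its own apply succeeds.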

We go into this in our blog post: Safety First.

5. Source of truth

Git is only one source of truth for infrastructure as code. Data sources is another (e.g. terraform remote state). How will you reconcile that your state is current and update it when it drifts? When it drifts, how will you be notified?

Terrateam recently released Drift Detection and Reconciliation.

Run Drift Detection and Reconciliation on a schedule and, when drift is found, automatically create a GitHub issue, send an email or Slack notification, execute a custom command, or send a webhook.
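This post does not describe how Terrateam detects drift internally. One common approach, sketched here as an assumption, is to run `terraform plan -detailed-exitcode` against the default branch on a schedule: exit code 0 means no changes, 1 means an error, and 2 means the plan contains changes. The `run_plan` and `drifted_dirs` names are hypothetical.

```python
import subprocess
from typing import Callable, List

PLAN_CMD = ["terraform", "plan", "-detailed-exitcode", "-input=false"]


def run_plan(directory: str) -> int:
    """Run terraform plan in `directory` and return its exit code."""
    return subprocess.run(PLAN_CMD, cwd=directory).returncode


def drifted_dirs(directories: List[str], runner: Callable[[str], int] = run_plan) -> List[str]:
    """Return directories whose plan shows pending changes (exit code 2)."""
    drifted = []
    for directory in directories:
        code = runner(directory)
        if code == 1:
            raise RuntimeError(f"terraform plan failed in {directory}")
        if code == 2:
            drifted.append(directory)
    return drifted
```

The `runner` parameter lets the check be exercised without Terraform installed. The list it returns is what a notification step (issue, email, Slack, webhook) would report on.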

6. Failed Apply after the merge

How will you know that your infrastructure changes are applied everywhere? If a build fails, but the code is already merged, how do you escalate and ensure it’s resolved?

Terrateam locks are only released when the pull request that is supposed to be applied is successfully applied and merged into the default branch.

If an apply fails then the Terrateam locks are not released. Terrateam has special logic for this scenario because it’s common and dangerous.

Any number of custom notifications can be created in this scenario. Additionally, if a user attempts to apply another pull request, they will immediately receive a comment with an access denied notification explaining the situation and the steps to take to resolve it.

We go into this in our blog post: Safety First.

7. Locking environments

If you need to lock an environment from being updated, how will you do it?

Terrateam is configuration as code. Here’s an explanation using our Terrateam YAML configuration file:

workflows:
  - tag_query: "dir:production"
    apply:
      - type: run
        cmd: ["echo", "Environment", "locked"]
        capture_output: true

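To lock an environment (a directory/workspace combination, optionally matched using regular expressions), a user can create a pull request with a change similar to the one above, which replaces the apply step for matching directories with a command that only prints a message.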

To unlock an environment, the pull request can be reverted.

8. Large comments and noisy notifications

How will you suggest the changes? If the plan is to comment on the PR, that gets VERY noisy and everyone subscribed will receive the notification. Runs may also accidentally leak secrets in the output. GitHub comments are limited to 65K bytes, which means large plans will need to be split across multiple comments.

Terrateam Plan and Apply operations are posted back to the GitHub Pull Request as comments. If the Terrateam output is too large for a single comment, then the user is given a link to the full GitHub Actions execution log file.
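A minimal sketch of the size check follows. The limit value, the `render_comment` name, and the choice to keep the start of the output ahead of the link are all assumptions made for illustration. The post itself only says that the user is given a link to the full log when the output is too large.

```python
MAX_COMMENT_CHARS = 65_536  # assumed limit; the FAQ above quotes roughly 65K


def render_comment(output: str, log_url: str, limit: int = MAX_COMMENT_CHARS) -> str:
    """Return the full output if it fits, otherwise a shortened body plus a log link."""
    if len(output) <= limit:
        return output
    notice = f"\n\nOutput truncated. Full log: {log_url}"
    keep = max(0, limit - len(notice))  # room left for the start of the output
    return output[:keep] + notice
```

Posting one bounded comment per operation, rather than splitting a large plan across several, also keeps notification noise down for everyone subscribed to the pull request.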

9. Multiple PRs merged against the same environment

What happens if you have multiple PRs merged that want to modify the same environment? How will you enforce an ordered consistency?

If you follow the apply-after-merge workflow, then the first PR that comes into the queue is applied.

Other plans that touch the same set of resources are invalidated and must be re-planned. This creates a Terrateam lock, which should be released by applying the missing changes.

We go into this in our blog post: Safety First.

10. Terraform restrictions for plan and applies

How will you restrict who can run terraform plans and applies? Furthermore, how will you restrict it to specific environments?

Terrateam is configuration as code. Here’s an example of how to restrict plans and applies:

dirs:
  aws/qa:
    tags: [aws, qa]
  aws/production:
    tags: [aws, production]
access_control:
  policies:
    - tag_query: aws qa
      plan: ["*"]
      apply: ["*"]
    - tag_query: aws production
      plan: ["team:engineering"]
      apply: ["team:sre"]

The above configuration is very basic. You can easily create granular access control rules using directory, subdirectory, workspace, and regular expression combinations.

See the Terrateam Access Control documentation for details.
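To show how a configuration like the one above could be evaluated, here is a small sketch. It rests on several assumptions that should be checked against the Access Control documentation: a tag query matches when the directory carries every listed tag, the first matching policy decides, and no match means deny. The `is_allowed` function is hypothetical.

```python
from typing import Dict, Iterable, List, Set

# The two policies from the example configuration, as plain data.
POLICIES: List[Dict] = [
    {"tag_query": "aws qa", "plan": ["*"], "apply": ["*"]},
    {"tag_query": "aws production", "plan": ["team:engineering"], "apply": ["team:sre"]},
]


def is_allowed(action: str, dir_tags: Set[str], identities: Iterable[str],
               policies: List[Dict] = POLICIES) -> bool:
    """Check `action` ("plan" or "apply") against the first matching policy."""
    who = set(identities)
    for policy in policies:
        # Assumption: a query matches when the directory carries every listed tag.
        if set(policy["tag_query"].split()) <= dir_tags:
            allowed = policy.get(action, [])
            return "*" in allowed or bool(who & set(allowed))
    return False  # assumption: deny when no policy matches
```

With these policies, a member of `team:engineering` can plan against `aws/production` but cannot apply there, while anyone can plan and apply in `aws/qa`.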

11. Short-lived credentials

How will you provide the short-lived IAM credentials to the terraform processes? e.g. any hard coded credentials exposed will be a major liability

Terrateam supports OIDC for AWS and GCP, which makes it easy to use short-lived credentials.

See the Terrateam OIDC documentation for details.

Conclusion

This Cloud Posse article really spoke to us. It points out the problems with a homegrown GitHub Actions solution.

It’s common for users to not think about all of the edge cases when it comes to Terraform and continuous delivery.

Terrateam is striving to solve these problems for all GitHub users so they can enjoy Terraform with pull requests in a safe way.
