Why we built a better Atlantis
Atlantis deserves respect. It pioneered GitOps for infrastructure, brought Terraform automation to thousands of teams, and did it as open source. For a long time, it was the only game in town if you wanted PR-based infrastructure workflows without selling your soul to a vendor. But here's the thing about pioneering technologies. They establish patterns, prove concepts, and then (inevitably) they reveal their limitations. Atlantis did exactly that. It showed us what was possible, and in doing so, it showed us what was necessary.
So we built Terrateam, not because we wanted to build "Atlantis but slightly better," but because we had fundamentally different opinions about how this problem should be solved. Opinions rooted in painful experience, informed by modern distributed systems thinking, and yes, driven by values about what software should be.
The problem wasn't Atlantis. The problem was reality.
When Atlantis launched, most teams had a handful of engineers making infrastructure changes, Terraform was young, state files were small, and the idea of running dozens of concurrent infrastructure operations seemed like premature optimization. Then reality happened. Teams grew, infrastructure exploded, monorepos became the norm, and suddenly you had 50 engineers, 200 Terraform workspaces, and a single-threaded bottleneck called Atlantis sitting in the middle of everything like a traffic cop who fell asleep at the intersection.
I've talked to teams who calculated they lost 15 engineering hours per week waiting for Atlantis to finish processing the queue. That's not a tool problem; that's a systemic tax on engineering productivity. And the response from the Atlantis project? Essentially, "Sequential execution is the safe choice to avoid state corruption." Which, fine, it's a choice. But it's not the only choice, and for infrastructure at scale, it's increasingly the wrong one.
Concurrency is not the enemy. Unsafe concurrency is.
The technical heart of this is straightforward enough: Atlantis chose simplicity over scale, opting for one operation at a time, globally, across your entire organization. It's the distributed systems equivalent of wrapping your entire application in a single giant mutex. Does it prevent race conditions? Sure. Does it scale? Absolutely not.
The fundamental insight that drove Terrateam's architecture is that Terraform operations have dependencies, and those dependencies can be tracked. If you're changing aws/production/vpc and I'm changing gcp/staging/kubernetes, there is zero reason those operations should block each other: they touch completely different state files, different cloud providers, different blast radii. They should run in parallel.
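To put rough numbers on that, here is a minimal sketch comparing a single global queue with per-workspace serialization. The run times are made up for illustration, and the model assumes enough runners that distinct workspaces never wait on each other; it is not a measurement of either tool.

```python
from collections import defaultdict


def global_queue_minutes(ops):
    """Wall-clock time when every operation waits behind one global queue."""
    return sum(minutes for _, minutes in ops)


def per_workspace_minutes(ops):
    """Wall-clock time when only same-workspace operations are serialized.

    Assumes distinct workspaces never wait on each other (enough runners).
    """
    busy = defaultdict(int)
    for workspace, minutes in ops:
        busy[workspace] += minutes
    return max(busy.values(), default=0)


# Hypothetical workload: (workspace directory, run time in minutes).
ops = [
    ("aws/production/vpc", 6),
    ("gcp/staging/kubernetes", 4),
    ("aws/production/vpc", 3),
    ("gcp/staging/kubernetes", 5),
]

print(global_queue_minutes(ops))   # 18
print(per_workspace_minutes(ops))  # 9
```

With two independent workspaces the wait is halved in this toy case; the gap widens as the number of unrelated workspaces grows.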
The trick (and this is where the engineering gets interesting) is building a system that understands workspace dependencies, tracks them deterministically, and only serializes operations when they actually conflict: apply-only locks, not plan-and-apply locks; dependency graphs, not global queues. We built this in OCaml for a reason. Not because OCaml is trendy (it isn't), but because we needed deterministic concurrency through our async runtime (abb), type safety that eliminates entire classes of state management bugs, and predictable behavior under load. This isn't about showing off the type system; it's about building infrastructure that you can trust. When you're running concurrent Terraform applies across production infrastructure, "mostly works" isn't good enough.
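The "dependency graphs, not global queues" idea can be sketched in a few lines. This is an illustration in Python using the standard library's graphlib, not Terrateam's OCaml implementation; the function name parallel_batches and the eks and rds directories are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_batches(depends_on):
    """Group workspaces into batches; each batch can run concurrently.

    depends_on maps a workspace to the set of workspaces it must wait for.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        # Sort the ready set so the same graph always yields the same batches.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical layout: the cluster and the database both need the VPC first.
depends_on = {
    "aws/production/vpc": set(),
    "aws/production/eks": {"aws/production/vpc"},
    "aws/production/rds": {"aws/production/vpc"},
    "gcp/staging/kubernetes": set(),
}

for batch in parallel_batches(depends_on):
    print(batch)
# ['aws/production/vpc', 'gcp/staging/kubernetes']
# ['aws/production/eks', 'aws/production/rds']
```

Only workspaces with a real dependency wait for each other; everything else lands in the earliest batch it can.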
GitHub is not a deploy log. It's your infrastructure control plane.
Here's where Atlantis started to show its age. It predates the modern GitHub Actions era. It was built when Jenkins was still king, when CI/CD meant "spin up a server and wire up some webhooks." We took a different view: GitHub itself is the interface, not as a UI convenience, but as the actual security boundary.
Every Terraform operation in Terrateam happens in a pull request, uses GitHub's native review system for approvals, leverages GitHub secrets (no long-lived credentials on servers), integrates with OIDC for cloud provider authentication, and respects GitHub's RBAC model. This isn't a "nice to have"; this is treating your version control system as the source of truth for who can do what, which is exactly what it should be. Your infrastructure permissions should be declared in the same place as your infrastructure code.
And here's the kicker. When you treat GitHub as your control plane, you don't need to build a parallel universe of user management, permission systems, and audit logs. You use GitHub's. They've already solved this problem at scale for millions of developers.
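For readers who have not seen the OIDC flow mentioned above, here is a minimal sketch of how a GitHub Actions job can request a short-lived identity token instead of storing cloud credentials. The two environment variables and the "value" response field are GitHub's documented mechanism; the function names are hypothetical, and this is not a description of Terrateam's internals.

```python
import json
import os
import urllib.parse
import urllib.request


def build_oidc_token_request(audience, env=None):
    """Build the HTTP request a GitHub Actions job sends to get an OIDC token.

    GitHub sets both variables below only when the workflow grants
    `permissions: id-token: write`.
    """
    env = os.environ if env is None else env
    base_url = env["ACTIONS_ID_TOKEN_REQUEST_URL"]
    bearer = env["ACTIONS_ID_TOKEN_REQUEST_TOKEN"]
    # The request URL already carries a query string, so append with "&".
    url = f"{base_url}&audience={urllib.parse.quote(audience, safe='')}"
    return urllib.request.Request(url, headers={"Authorization": f"bearer {bearer}"})


def fetch_oidc_token(audience):
    """Fetch the short-lived token (only works inside a running job)."""
    with urllib.request.urlopen(build_oidc_token_request(audience)) as resp:
        return json.load(resp)["value"]
```

The cloud provider then exchanges that token for temporary credentials (for AWS, the conventional audience is sts.amazonaws.com), so nothing long-lived sits on a server.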
Open source isn't a license. It's a value system.
Look, I get it. "Open source" has been diluted to meaninglessness. Every vendor slaps an MIT license on their SDK and calls themselves "open source." But open source as a value system means something different: you can read every line of code that controls your infrastructure, you can run it yourself on your own hardware with your own policies, you can fork it if we make decisions you disagree with, and you're not locked into pricing models that quintuple when you hit scale.
Atlantis got this right. They built an open-source tool that teams could actually use without vendor lock-in, and we wanted to preserve that (and extend it). Terrateam's open-source core gives you the full platform. We have a cloud offering because some teams prefer managed services, but you are never forced into it. Self-hosting is a first-class citizen, not an afterthought. This matters because infrastructure is too important to be held hostage by vendor pricing. I've seen teams pay $500k+/year for what is, fundamentally, "run Terraform safely." That's not a feature set. That's a racket.
Determinism is not optional.
There's a technical point that might seem esoteric but is absolutely critical: infrastructure automation must be deterministic. What does that mean? Given the same inputs, you get the same outputs, every time, with no race conditions, no "it worked on my machine," no "try running it again." This is why we built our own async runtime in OCaml, not because we love building infrastructure (okay, we do), but because the alternative was accepting non-deterministic behavior in a system that controls production infrastructure.
Think about what happens when concurrency is non-deterministic. Plans run in unpredictable order, lock acquisition becomes a lottery, debugging failures requires reconstructing timing-dependent state, and retries might succeed or fail based on factors you can't control. When you're managing stateful infrastructure with Terraform, non-determinism isn't a minor bug; it's a fundamental architectural flaw. Our OCaml stack (abb for the async runtime, brtl for the web framework, and terrat for the application) forms a cohesive, type-safe system where concurrency is explicit, tracked, and deterministic. This isn't academic computer science. This is practical reliability engineering.
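One small piece of that property, removing the "lottery" from ordering, can be illustrated without any runtime at all. The sketch below is a simplified assumption about what deterministic ordering could look like, not how abb schedules work: it orders queued operations by their content rather than by arrival time, so every arrival order produces the same schedule. The tuple layout and PR numbers are hypothetical.

```python
from itertools import permutations


def canonical_order(pending):
    """Return queued operations in an order independent of arrival order.

    Each operation is (workspace, pull_request_number, kind). Sorting on the
    full tuple means ties never fall back to arrival timing.
    """
    return sorted(pending)


# Hypothetical queue contents.
pending = [
    ("gcp/staging/kubernetes", 41, "apply"),
    ("aws/production/vpc", 57, "apply"),
    ("aws/production/vpc", 12, "apply"),
]

# Try every possible arrival order and collect the resulting schedules.
schedules = {tuple(canonical_order(p)) for p in permutations(pending)}
print(len(schedules))  # 1
```

Because the schedule depends only on what is queued, a failure can be replayed from the inputs alone instead of from reconstructed timing.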
Why not just fix Atlantis?
This is the obvious question, and it deserves an honest answer. We could have contributed patches to Atlantis, we could have proposed architectural changes, we could have forked it and made incremental improvements. But here's the reality. The changes we needed were architectural: they touched the concurrency model, the execution engine, the state management, and the security model. At that point, you're not fixing a codebase, you're rewriting it. And if you're going to rewrite it, you might as well make the choices you actually believe in: use the languages and tools that give you the properties you need, build the abstractions that match your mental model, and design for the scale and security requirements you know are coming. So that's what we did.
What we actually built
Terrateam is GitHub-native (your PRs, your reviews, your security model), concurrent by default (parallel plans, apply-only locks, dependency tracking), deterministically concurrent (OCaml-powered runtime with predictable behavior), policy-driven (RBAC, apply requirements, OPA/Rego support), observable (audit trails, operation history, structured logs), self-hostable (open source core, run it anywhere), and actually scalable (horizontally scalable clusters, high availability). It does what it says it does: it runs Terraform so you don't have to. It doesn't lock you in, doesn't charge you per-workspace like you're feeding quarters into an arcade machine, and doesn't require a sales call to see pricing.
The values underneath
At the end of the day, this isn't really about Atlantis vs. Terrateam. It's about the decisions we make when building infrastructure software, and the values those decisions reflect. We believe that concurrency is a feature (not a risk to be avoided), that security is mandatory (not an "enterprise upsell"), that open source means self-hostable (not just "visible source code"), that determinism is achievable (with the right architectural choices), and that fair pricing matters (because running infrastructure automation shouldn't be a major budget line item). These aren't just principles. They're the technical and economic foundation of how we built Terrateam.
Try it yourself
If you're running Atlantis and hitting its limits, try Terrateam. You can run both in parallel (Terrateam in shadow mode) and see the difference. Most teams migrate in about 30 minutes, and if you don't like it, you can revert in 5. If you're on a TACOS (Terraform Automation and Collaboration Software) platform paying $30k/month for what is fundamentally "run plan and apply safely," talk to us about self-hosting. If you just want to see how we built this, architecturally, technically, and philosophically, dig into the open-source repository. It's all there.
We built a better Atlantis because Atlantis showed us what was possible, and what was necessary. Now it's time to build what should exist.
Disclosure. I'm on the team that built Terrateam, so yes, I'm biased. But I'm also an engineer who's spent two decades in systems software, and I wouldn't have built this if I didn't think the architecture was sound and the problem was real. Draw your own conclusions.