System Initiative: Not So Far From IaC

System Initiative went GA last week. We have been following SI for a while, partly as competitors but also out of curiosity about a different approach to infrastructure. Adam Jacob, the CEO of SI, has influenced how Terrateam thinks about building a business, and he has been incredibly generous with his time, to the point of jumping on calls with me in their Discord to give advice.

For those who aren't sure what System Initiative is, I would describe it as "Visual Smalltalk for Infrastructure". It's a visual editor, kind of like how Blender lets you define filters visually, as well as a real editor (you can write code in it). It's focused on managing infrastructure, and everything exists inside of its black box.

System Initiative certainly is bold. If you choose to use it, it will replace your editor, your VCS, your pull request workflow, everything. But despite describing itself as "a revolutionary technology that is the future of how you will do DevOps", System Initiative does not represent a new model of DevOps. It's an interesting take on what we're already doing.

That is not to say that they aren't doing things differently and, in some cases, better, or that there is no innovation. But it's not a fundamental change to infrastructure management.

Conceptualizing Infrastructure as Code

To follow the comparison below, it's important to understand what I think Infrastructure as Code is. In the talk "What if Infrastructure as Code never existed", Adam Jacob repeatedly shows a slide that looks like:

Infrastructure as Code applies only to the "code" rectangle in the picture. Everything after that is fundamental to infrastructure management. It doesn't matter how you feed input into the interpreter, whether as code or as data: it performs some operations against some service and manages some state. There are some details you can modify in there, but nothing changes the fundamental model of infrastructure management.

As things are right now, we in the IaC world value describing our infrastructure as code. But that isn't required. We could have an interface that reads .tf files, displays them in a GUI, lets us manipulate them, writes them back out, and executes OpenTofu. We have chosen to approach this model with code as the input.

So, when I say that System Initiative is not a fundamental change from how we do infrastructure now, if you believe the "code" part is a requirement for managing infrastructure, then you are going to disagree with me. But if you agree with me that the interpreter -> service -> state loop is the fundamental part, then I think you will agree.

There are clearly details that can be different here. The interpreter can expose a more interactive interface or it can be closer to a batch job, but you can't get away from this relationship between the data, the interpreter, the state, and reality.
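To make the model concrete, here is a minimal sketch of that relationship. Every name in it (`Service`, `plan`, `apply`) is hypothetical and stands in for the general shape of the loop, not for OpenTofu's or SI's actual internals. Note that `desired` is just a dictionary: it could have been parsed from a .tf file or read out of a database behind a GUI, and the interpreter would not know the difference.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical stand-in for a cloud API: what really exists."""
    resources: dict = field(default_factory=dict)


def plan(desired: dict, state: dict) -> dict:
    """Diff the desired input against the recorded state."""
    changes = {}
    for name, props in desired.items():
        if name not in state:
            changes[name] = ("create", props)
        elif state[name] != props:
            changes[name] = ("update", props)
    for name in state:
        if name not in desired:
            changes[name] = ("delete", None)
    return changes


def apply(changes: dict, service: Service, state: dict) -> None:
    """Perform each change against the service and record it in state."""
    for name, (action, props) in changes.items():
        if action == "delete":
            service.resources.pop(name, None)
            state.pop(name, None)
        else:
            service.resources[name] = dict(props)
            state[name] = dict(props)


# The input is plain data; where it came from is irrelevant to the loop.
desired = {"bucket": {"acl": "private"}}
state: dict = {}
service = Service()

first_plan = plan(desired, state)
apply(first_plan, service, state)
second_plan = plan(desired, state)  # nothing left to do
```

Swapping the front end (code, GUI, database) changes how `desired` is produced, but not the plan, apply, and state steps that follow it.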

Digital Twins

The core of SI's contribution is this concept of digital twins, also referred to as simulations or models. The rest of SI flows from this foundation. And, while I don't know the specifics, a digital twin can be implemented at different scales of your infrastructure. You can define a digital twin of an EC2 instance, but you could also define a digital twin of your entire infrastructure. In theory, the digital twin has high fidelity to the thing it is modeling.

For example, if you update the access control list of the digital twin of your S3 bucket, the rest of your infrastructure defined in SI can act on that change. Maybe you have a public website built on an S3 bucket, and that resource can inspect the access control of the bucket it's built on to validate that it's public. All of this happens in the hypothetical land of digital twins.
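As an illustration only, the S3 example might look something like the sketch below. The classes `BucketTwin` and `WebsiteTwin` and the `qualify` method are invented for this post and are not SI's API; the only real-world detail is `public-read`, which is one of S3's canned ACLs.

```python
from dataclasses import dataclass


@dataclass
class BucketTwin:
    """Hypothetical model of an S3 bucket. No cloud calls are made."""
    name: str
    acl: str = "private"


@dataclass
class WebsiteTwin:
    """Hypothetical model of a public website served from a bucket."""
    bucket: BucketTwin

    def qualify(self) -> list[str]:
        """Return problems found purely by inspecting the model."""
        if self.bucket.acl != "public-read":
            return [
                f"bucket {self.bucket.name!r} must be public-read, "
                f"got {self.bucket.acl!r}"
            ]
        return []


bucket = BucketTwin(name="site-assets")
website = WebsiteTwin(bucket=bucket)

problems_before = website.qualify()  # the bucket is still private
bucket.acl = "public-read"           # change the twin, not the real bucket
problems_after = website.qualify()
```

The website reacts to the bucket's ACL without anything being created or queried in AWS.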

This is all great, but if you've been using Terraform or OpenTofu, this sounds a lot like the planning step. Except for being dressed up in new language, it's not really clear how this is meaningfully different from planning.

What you do in SI:

Make a change
Evaluate the change
Get feedback on what will happen

What you do in OpenTofu:

Make a change
Evaluate the change
Get feedback on what will happen

There are clearly differences in the details but the point remains: SI is not a fundamentally new way to develop infrastructure. But it is new.

Feedback Loop

One stated benefit of the digital twin is that, assuming it is a high-fidelity twin, the iteration speeds improve. No longer are you bound to running a whole planning step, just for it to tell you that you made a mistake. With SI, the model runs when you make a change and you get instant feedback.

There is some truth to this, I believe, not because digital twins are so different from a planning step, but because we in the IaC world don't trust our digital twins.

By default, when planning, OpenTofu queries the service your provider represents to determine the difference between the real world, your state file, and your code. Most IaC users I've met want that extra layer of safety. But with the flip of a switch, OpenTofu can compare your state file to your code, which takes only as long as it takes to get the state file.

There is nothing fundamental about OpenTofu that prevents faster feedback loops. It just so happens that, as a community, we've been conditioned to always want to compare our code and state to the real world, and that takes time because of the cloud API calls.
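The trade-off can be sketched in a few lines. This is a toy model, not OpenTofu's planner: `plan` and `read_remote` are hypothetical names, and the "remote" is a function that counts how often it is called. (In OpenTofu itself, the switch in question is presumably the `-refresh=false` option to `tofu plan`.)

```python
def plan(code: dict, state: dict, read_remote=None) -> dict:
    """Return {resource: (current, wanted)} for resources that would change.

    Without read_remote, the code is compared to the state file only.
    With read_remote, every resource in state is first refreshed from the
    service, which costs one API call per resource.
    """
    current = dict(state)
    if read_remote is not None:
        for name in state:
            current[name] = read_remote(name)
    return {
        name: (current.get(name), wanted)
        for name, wanted in code.items()
        if current.get(name) != wanted
    }


api_calls = []


def read_remote(name: str) -> dict:
    """Hypothetical cloud lookup; someone changed the ACL by hand."""
    api_calls.append(name)
    return {"acl": "public-read"}


code = {"bucket": {"acl": "private"}}
state = {"bucket": {"acl": "private"}}

fast = plan(code, state)               # no API calls, drift goes unnoticed
safe = plan(code, state, read_remote)  # one API call, drift is detected
```

The fast plan is only as trustworthy as the state file, which is exactly the trust question raised above.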

One difference between SI and OpenTofu in this area is that OpenTofu is designed more as a batch interface: here's my change, run the whole thing, and get back to me. SI is reactive and offers a more fine-grained, iterative experience. The difference is real, but the gap is not that wide. OpenTofu exposes all of the tools needed for a more fine-grained experience (Terrateam offers a planning strategy called fast-and-loose that does this), but as a community that's just not how we see the problem.

Bidirectional

SI implements these digital twins as code + data. The digital twin itself is code, but its parameters are pure data stored in a database. There are no source files written by a user.

This makes reconciling drift from ClickOps a breeze in SI, while it is still an unsolved problem in OpenTofu. Generating code from an arbitrary ClickOps operation is just a really hard problem. Some companies, like Control Monkey, are solving this for a particular set of resources and slowly expanding, but it's non-trivial.

There are a few reasons for this, but at the end of the day, because our infrastructure is defined as code, there is an element of human artistry that makes it hard to generate automatically in a consumable way. Think of decompilers: turning an executable back into its original Rust source code is nearly impossible because the source code was written for humans to read, and all of that is lost in the compilation process. IaC is similar. Our artifact is the infrastructure, and how we organized our code is lost in the realization of that infrastructure.

SI doesn't quite have that problem, though. It is data and, assuming the digital twin exactly matches the infrastructure it is modeling, the properties can be updated to match reality. How far this goes in SI, I'm not sure. Can you import whole chunks of real infrastructure and get what you would have created in SI by hand? I don't know. It depends, probably.
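A rough sketch of why the data-only case is easier, under the assumption that a twin's parameters are a flat set of properties (the `reconcile` function is hypothetical and says nothing about how SI actually does it):

```python
def reconcile(twin: dict, observed: dict) -> dict:
    """Overwrite twin properties with observed values.

    Returns {property: (old, new)} for everything that had drifted.
    """
    drift = {
        key: (twin.get(key), value)
        for key, value in observed.items()
        if twin.get(key) != value
    }
    twin.update(observed)
    return drift


# The twin as stored, and what the cloud reports after a ClickOps change.
twin = {"acl": "private", "versioning": True}
observed = {"acl": "public-read", "versioning": True}

drift = reconcile(twin, observed)
```

With code as the source of truth, the equivalent step has to work out which file, module, or variable produced `acl` and rewrite it in a way a human would accept, which is the decompiler problem again.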

This is definitely a different experience from existing infrastructure tooling but mostly because of decisions we have made to prefer to define our infrastructure as code.

Validation

Where I think SI genuinely blows what we're currently doing, in the OpenTofu world, out of the water is the ability to easily attach arbitrary code to resources. Want to ensure some property is true of a particular resource? Just write up some code and it'll get executed when that resource changes. I am jealous.
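To give a feel for the idea, here is a small sketch of attaching checks to a resource type and running them whenever a resource of that type changes. The `check` decorator, the `on_change` hook, and the `s3_bucket` type name are all made up for illustration; this is neither SI's API nor anything OpenTofu offers today.

```python
from collections import defaultdict
from typing import Callable, Optional

# A check takes a resource's properties and returns an error message or None.
Check = Callable[[dict], Optional[str]]

_checks: dict[str, list[Check]] = defaultdict(list)


def check(resource_type: str):
    """Register a function to run whenever a resource of this type changes."""
    def register(fn: Check) -> Check:
        _checks[resource_type].append(fn)
        return fn
    return register


@check("s3_bucket")
def must_be_encrypted(props: dict) -> Optional[str]:
    return None if props.get("encrypted") else "bucket is not encrypted"


@check("s3_bucket")
def must_have_owner_tag(props: dict) -> Optional[str]:
    return None if "owner" in props.get("tags", {}) else "missing owner tag"


def on_change(resource_type: str, props: dict) -> list[str]:
    """Run every attached check and collect the failures."""
    results = (fn(props) for fn in _checks[resource_type])
    return [message for message in results if message is not None]


failures = on_change("s3_bucket", {"encrypted": False, "tags": {"owner": "infra"}})
```

The appeal is that the check lives next to the resource and runs on every change, rather than in a separate test suite that runs after the fact.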

Fidelity

All of this depends on the quality of these digital twins. Can they truly capture all of the complex interactions that a service provider has? Time will tell. SI makes it sound easy. In the OpenTofu world, we have struggled with this. Some things are just big and complicated (k8s, anyone?), and modeling them, as well as the specifics of how the cloud providers serve them, is no small task.

We Welcome the Disruption

I hope that SI's push for digital twins pays off because I think we, in the OpenTofu world, are not so far off in terms of technology. We have chosen this path of preferring to compare to the real world when planning. If SI forces its users out of this paradigm, and it works, that would be a great benefit to infrastructure management.

I'm also curious how bidirectional digital twins work out. Does this end up being a killer feature? I think supporting bidirectional resources in OpenTofu would be a pretty big lift, so we'll have to see if it's worth it.

OpenTofu has added provider-defined functions, which is a step toward making the whole system more programmable. I think the next step is to add the ability to attach functions to resources. Testing IaC has improved over the last year, but if we could easily validate that our resources are what we expect, that would be a big step forward, not only in simplifying testing but also in gaining more trust in our plans.

While I don't believe SI to be revolutionary, I'm hopeful that System Initiative will disrupt the status quo. It has some great ideas that OpenTofu should enthusiastically steal and say thank you for. Since Terraform was released, we've learned a lot but have also been slow to adopt those learnings. SI, in being bold, is taking those learnings, applying them, and challenging us to rethink how we manage infrastructure. It's an exciting time to be in DevOps.
