Ocaml to PostgreSQL: Building Dynamic Queries with Type Safety
Malcolm Matalka
On this page
Introduction
We recently released the Audit Trail feature of Terrateam. Audit Trail is part of the Terrateam Console, allowing users to view and query their Terrateam operations. Audit Trail is a new type of feature for us. Most of Terrateam is driven by reacting to GitHub Webhooks: receive an event, evaluate it, and initiate a GitHub Action. All of these database queries are under our control and optimized. Audit Trail gives users the ability to query the database with their own queries. This is a more challenging feature to implement, communicate, and ensure we maintain quality of service. The development and implementation of Audit Trail is a good view into how Terrateam, as a company, works.
Our Technology Stack
Terrateam is different from a lot of other companies in this space in the technology choices we make. The biggest standout is that our backend is implemented entirely in Ocaml. I’ve spoken at length about why we chose Ocaml, see here and here.
We’ve also written a lot of custom frameworks, libraries, and tooling for Ocaml. Our rule of thumb is: no to frameworks, yes to libraries. This is because we want control. Ocaml is such a powerful and expressive language, we are able to deliver features quickly despite taking on so much development. A lot of the underlying protocols and technology we interact with are quite stable, so while we might have a higher initial development cost, our maintenance cost is essentially zero as we do not have to modify our code as frameworks we do not control change over time.
Dynamic Querying with Type Safety
One of our core libraries is Pgsql_io
, an implementation of the PostgreSQL wire protocol in Ocaml. A major feature of Pgsql_io
is that it supports adding type annotations to SQL queries. In Pgsql_io
, one writes a SQL query string and then provides extra information to specify what kind of values it takes and what kind of values it returns.
Example of Using Pgsql_io
The SQL is straightforward, with placeholders for values that will be bound later. The annotated query in Pgsql_io
specifies the types of these values and how they are processed.
Type Safety at Compile Time
What’s powerful about this is that it all gets verified at compile time. That is to say, before we run the program, we know that all uses of the SQL expression match the type annotation. If we refactor an SQL expression, we change the type annotation to match, and the compiler will tell us every call site that we need to fix.
Dynamic Queries and Audit Trail
With that background, on to the interesting part: we had decided to implement a new feature called Audit Trail. While Terrateam is designed to be used within GitHub for the standard workflow, reporting operations live inside of the Terrateam Console. A common request is that users want to be able to see what operations Terrateam has performed. In a busy organization, it’s hard to do this just by scrolling through the list of runs. We wanted to give users a powerful and robust…
Terrateam already makes use of a simple query language we call the Tag Query Language. It’s used in many places in the product. The nice thing about it is that it has a syntax that is straight forward to learn and it can be used in different contexts. We just needed to define which attributes of an operation we wanted to expose to users.
The way Audit Trail works is a user types a query into the UI which makes an API call passing the query as a string. The backend parses the query and translates it to SQL and executes it. For example, the query pr:123 and user:joe
would translate to an SQL query like pull_number = 123 and username = 'joe'
.
Our Solution
At Terrateam, we generally approach software development problems with types. This is one reason we like our own frameworks: we are willing to pay the price of a less ergonomic API if it lets us use more expressive types. We look at developer productivity not just as how quickly we can develop the code, but also how difficult it is to use the code incorrectly, and how easy it is to refactor. Types are the tool that we’ve chosen to enforce these.
Implementing Dynamic Queries in Audit Trail
Now, back to our original problem: how can we implement Audit Trail queries without losing type safety?
Another way to look at this problem is that we want to generate a query where the query references values passed in by the user, for example the pull request number or the username (something like pr:123 and user:joe
). When we turn a user’s query into SQL, we need to do it in such a way that it doesn’t change the type annotation that we gave the query with Pgsql_io
.
Using SQL Arrays for Dynamic Queries
SQL arrays to the rescue! SQL supports passing arrays. We actually use this in another context: bulk inserts. Rather than construct an insert statement with a bunch of entries in the values
section, we pass in an array for each part of the row and turn it into a set of rows and pass it into insert.
Applying the Solution to Audit Trail
How does this help us with creating dynamic queries? For the types of values users can pass in, there are only a few: strings and integers. So, what if, for each value a user passes in, we put it into an array of the correct type, and then reference that array in the query. For a pull request, that would be an integer value, and for users it would be a string.