Cohttp vs. libcurl: Why Terrateam switched to libcurl

Malcolm Matalka

HTTP/1.1 is easy and that’s about it
Terrateam uses a lot of in-house frameworks. For a long time, we have used Cohttp, a library for creating both HTTP/1.1 clients and servers. While it has served us well on the server side, as a client it has begun to show its limits. Terrateam has to operate in a range of environments and, while HTTP/1.1 is relatively straightforward, everything around it gets complicated quickly. In particular, security can impose a wide range of requirements on an application that needs to reach out to the internet.
Cohttp is bare-bones
Cohttp is pretty bare-bones: it does not implement TLS or any proxy interfaces. A customer was deploying Terrateam behind an HTTPS proxy with a custom certificate, and despite my attempts to implement support for that in Cohttp (it worked in every test environment I could create), we simply could not get it to work in their environment. Because of security constraints, we were limited in the debugging information we could obtain. But we knew that curl worked in that environment.
We decided that the best path forward would be to switch our HTTP requests from Cohttp to curl. Not only would it solve this problem, it comes with some other benefits:
- While most people know curl from piping it into bash, curl supports a huge range of protocols and proxies. This would make Terrateam much more robust.
- Curl implements HTTP pipelining and caching, giving an efficiency benefit. Terrateam makes a lot of requests to GitHub and GitLab, so making the most out of those connections would be great.
- Curl has a paid support channel, so if we run into an issue, we have a fallback option (more on this later).
So, with that, we started working on replacing our Cohttp client with a curl-based one. Luckily, curl bindings for OCaml already exist in the opam package called ocurl, so we just needed to implement the integration into the Terrateam concurrency toolkit, Abb.
* Abb (Asynchronous Building Blocks) is Terrateam’s concurrency framework.
Curl: easy and multi
Curl has two modes: easy and multi. The easy interface is, as the name implies, meant to be easy to use. It’s not the most versatile. The multi interface allows for integrating curl into an existing program’s event loop. The easy interface requires using multi-threading to do concurrent requests. The multi interface scales very well as it allows doing any number of requests in a single thread.
easy
Doing a request in the easy interface:
- Create an easy handle.
- Configure the handle for the request, such as setting the URL and request type.
- Tell it to perform the operation.
- Read the results.
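To make that concrete, here is roughly what those steps look like with the ocurl bindings (a minimal sketch: only the URL and response-body handling are configured, error handling is omitted, and Curl.perform blocks the calling thread until the transfer finishes):

let fetch url =
  let buf = Buffer.create 1024 in
  (* Create an easy handle. *)
  let h = Curl.init () in
  (* Configure the handle for the request: the URL and where to put the
     response body. *)
  Curl.set_url h url;
  Curl.set_writefunction h (fun s -> Buffer.add_string buf s; String.length s);
  (* Tell it to perform the operation. This blocks until the transfer is
     done and raises Curl.CurlException on failure. *)
  Curl.perform h;
  (* Read the results and clean up. *)
  let body = Buffer.contents buf in
  Curl.cleanup h;
  body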
multi
Doing a request with the multi interface:
- Create a multi handle. This is created once and is used for multiple requests.
- Set up the socketfunction callback. This is used by libcurl to register file descriptors to be tracked by the caller. This is done once as well.
- Create an easy handle.
- Set up the easy handle.
- Add the easy handle to the multi handle.
- Tell the multi handle to perform any work it can. As part of that, it will call the socketfunction callback to ask the caller to track any file descriptors.
- The caller registers the file descriptors however it does that (in the case of Terrateam, that's kqueue).
- When an event on a file descriptor happens, the caller calls the action function on the multi handle with the file descriptor that is ready.
- The caller calls the perform function on the multi handle to perform any work that can be done.
- The caller asks the multi handle for any completed easy handles.
- Remove the easy handle from the multi handle.
- Clean up the easy handle.
- Clean up the multi handle at the end of the program.
Certainly there is more work involved in the multi interface, but that is because it exposes a more expressive interface in order to scale better.
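To make the recipe concrete, here is a condensed, illustrative sketch using ocurl's Curl.Multi module. It is not how Abb integrates it: Unix.select stands in for the caller's event loop (Terrateam uses kqueue), a Hashtbl stands in for the caller's file descriptor registry, there is no real timer handling beyond a fixed select timeout, and error handling is omitted. Note that it unregisters descriptors inside the socketfunction callback, which is exactly what libcurl expects and, as described below, exactly what is hard to do inside Abb.

let fetch_with_multi url =
  let buf = Buffer.create 1024 in
  (* Create the multi handle (once per program). *)
  let mt = Curl.Multi.create () in
  (* Register the socketfunction callback; libcurl uses it to tell us which
     file descriptors to watch and when to stop watching them. *)
  let watched = Hashtbl.create 8 in
  Curl.Multi.set_socket_function mt (fun fd poll ->
      match poll with
      | Curl.Multi.POLL_REMOVE | Curl.Multi.POLL_NONE -> Hashtbl.remove watched fd
      | _ -> Hashtbl.replace watched fd ());
  (* Create and configure an easy handle, then add it to the multi handle. *)
  let h = Curl.init () in
  Curl.set_url h url;
  Curl.set_writefunction h (fun s -> Buffer.add_string buf s; String.length s);
  Curl.Multi.add mt h;
  (* Kick libcurl so it starts the transfer and registers its descriptors. *)
  ignore (Curl.Multi.action_timeout mt);
  (* Wait for events and call action for each ready descriptor until the
     easy handle completes. *)
  let rec loop () =
    match Curl.Multi.remove_finished mt with
    | Some _ -> ()
    | None ->
      let fds = Hashtbl.fold (fun fd () acc -> fd :: acc) watched [] in
      (match Unix.select fds fds [] 1.0 with
       | [], [], [] -> ignore (Curl.Multi.action_timeout mt)
       | rd, wr, _ ->
         List.iter
           (fun fd -> ignore (Curl.Multi.action mt fd Curl.Multi.EV_AUTO))
           (List.sort_uniq compare (rd @ wr)));
      loop ()
  in
  loop ();
  (* Remove the easy handle and clean up. *)
  Curl.Multi.remove mt h;
  Curl.cleanup h;
  Curl.Multi.cleanup mt;
  Buffer.contents buf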
Try #1: The multi interface in Abb
Abb is the event loop that Terrateam uses, so it seemed like the multi interface would be a great fit. The implementation is around 900 lines of code and the test program works under basic load. We have an extensive system test suite, so I started running it and noticed some really weird errors. Abb does not like file descriptors disappearing without being told first, and that was the error I was getting. Debugging this sort of error is where having a monorepo shines: I added some logging to the scheduler, which is the lowest-level library in Abb, recompiled, and ran the test suite again.
socketfunction
To understand what was going on, it's necessary to understand how the socketfunction callback is meant to work and how it's implemented in Abb.
The socketfunction callback is part of the multi interface and is used by libcurl to tell the caller when to start and stop monitoring a file descriptor. The callback is called with a file descriptor and an action:
- POLL_IN - Monitor the file descriptor for reading.
- POLL_OUT - Monitor the file descriptor for writing.
- POLL_REMOVE - Stop monitoring the file descriptor.
libcurl expects the operation to be done in the callback. That is, for POLL_IN and POLL_OUT, the file descriptor should be registered in the callback, and for POLL_REMOVE, it should be unregistered in the callback. The problem is, that's really hard to do in Abb.
Every operation in Abb secretly threads the scheduler through the program. In a read operation, like the following, the >>| operator is actually suspending the execution of the program, then later getting called with the scheduler as an input, registering the operation with the scheduler, and then calling the function once the read is ready.
Buffered.read ic ~buf ~pos:0 ~len:n
>>| function
| Ok n -> Bytes.sub_string buf 0 n
Problems integrating libcurl with Abb
There is no way to make this work in the socketfunction callback; we just don't have any way to get access to the scheduler there. To get around this, the implemented solution adds the operation to a queue that is processed after the callback:
Curl.Multi.set_socket_function t.Server.mt (fun fd poll -> Queue.add (Event.Socket (fd, poll)) t.Server.ev_queue);
For POLL_IN and POLL_OUT, that works fine, because libcurl will not be able to do anything with the file descriptor until we call the corresponding action function on it. But POLL_REMOVE is a problem, because libcurl may close the socket after calling socketfunction. And we haven't actually done the work to handle that yet! We've just added the event to our queue. So the socket is closed before we actually unregister it. And that's why Abb is getting confused about missing file descriptors.
Those familiar with libcurl might be saying "but there is a closesocketfunction callback, you could use that to close the socket when you're ready". True! However, libcurl, as it's compiled on most operating systems, opens more than just sockets. By default, for some operations, it uses a file descriptor for inter-thread communication. Those file descriptors are not part of the easy handle (closesocketfunction is an easy handle callback). So libcurl calls socketfunction with those file descriptors but does not provide any way to override the close operation. And, in reality, using closesocketfunction for this is a hack. That's really not what the callback is for. It's for handling open and close in special ways, not for changing when the operation is performed.
To get around this, we could compile libcurl with --disable-socketpair, but I don't really like that option because there is no guarantee this will be the only case of its kind, and I don't want to force users to compile their own libcurl and deal with all that management. What's the alternative?
What are the semantics here anyways?
Part of the problem here is that I have a different view of the semantics of the socketfunction callback than libcurl. And I understand where the difference comes from, even if I disagree.
In the libcurl view, the multi interface is very recipe-like. Do this, do that, call action when the file descriptor is ready.
In my view, it’s more of a question of ownership. socketfunction
is either asking the caller to take ownership of the file descriptor (POLL_IN
and POLL_OUT
) or asking it to give ownership back (POLL_REMOVE
). When it comes to POLL_IN
and POLL_OUT
, the caller is expected to call action
, which tells libcurl that it can perform some operations on file descriptor during the action
call. In this view, what’s missing is that there is no corresponding action
call for POLL_REMOVE
. I believe the correct solution here is treat POLL_REMOVE
as telling the caller to unregister the socket and wait for the corresponding action
call to do the next work. That is not backwards compatible, of course, so another option would have to be added to the multi interface to support it.
I talked to the folks at libcurl about sponsoring the work. Unfortunately, our usage of libcurl is not common enough to really motivate such a large change to libcurl. I don't know the libcurl codebase very well, but I can understand that it's probably a heavy lift to add a whole different way to interact with socketfunction, just for one use case that has not been seen in 20+ years of development.
Try #2: The easy interface in a thread pool
We really needed to get this functionality out to unblock our customer, so we decided to go with a less optimal solution for now: we just run the easy interface in a thread pool. It's less efficient, but it works. Customer unlocked. Great.
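This is not Terrateam's actual code, but the shape of the idea is simple. The sketch below uses OCaml's Thread module as a stand-in for Abb's thread pool: each request runs a blocking easy-handle transfer on a worker thread and hands the result to a callback.

let fetch_in_thread url k =
  ignore
    (Thread.create
       (fun () ->
         let buf = Buffer.create 1024 in
         let h = Curl.init () in
         Curl.set_url h url;
         Curl.set_writefunction h (fun s -> Buffer.add_string buf s; String.length s);
         (* Run the blocking transfer on this worker thread. *)
         let result =
           match Curl.perform h with
           | () -> Ok (Buffer.contents buf)
           | exception Curl.CurlException (_, _, msg) -> Error msg
         in
         Curl.cleanup h;
         (* In a real integration the result would be handed back to the
            main event loop rather than invoked on the worker thread. *)
         k result)
       ())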
Soon to be try #3: Multi interface in a thread
libcurl won’t change for us, which is understandable. We won’t change for libcurl either, as there is no real way to get access to the scheduler inside the socketfunction callback. We’re at somewhat of a stalemate. We also know that the thread pool solution is not what we want long-term. You can see the extra resource consumption in the graphs during moderate load, and Terrateam is only growing.
I thought for a long time about how to integrate the multi interface into the main event loop. The problem is that Abb really wants to unregister any file descriptors it’s monitoring before they are closed. The fact that file descriptors are reused in a Unix-like OS makes it hard to work in an uncoordinated environment. You’re guaranteed to get the same file descriptor the next time one is made.
But, we have to work with what we have, and despite any complaints I may have about libcurl, it’s an amazingly powerful and robust library, so I want to use it.
The next iteration
The plan for the next iteration is to implement a secondary, curl-specific event loop that runs in its own thread. We will interact directly with kqueue rather than Abb. There will be some overhead in this, as we’ll need to use our own pair of file descriptors to trigger communication between the main event loop and the curl event loop, but it won’t be bad, and it’s very little compared to the thread pool. We’ll get all of the benefits of pipelining and caching.
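To illustrate the kind of cross-thread signalling this requires (the names and structure below are guesses at the shape, not the final design): the main event loop enqueues work and writes a byte to a pipe; the dedicated curl thread blocks on that pipe, alongside whatever descriptors libcurl registers via socketfunction, and drains the queue when it wakes.

let request_queue : (unit -> unit) Queue.t = Queue.create ()
let queue_mutex = Mutex.create ()
let wake_r, wake_w = Unix.pipe ()

(* Main event loop side: hand work to the curl thread and wake it up. *)
let submit job =
  Mutex.lock queue_mutex;
  Queue.add job request_queue;
  Mutex.unlock queue_mutex;
  ignore (Unix.write wake_w (Bytes.of_string "x") 0 1)

(* Curl thread side: in the real design this wait would also cover the
   descriptors libcurl registers (via kqueue). *)
let curl_thread () =
  let buf = Bytes.create 16 in
  while true do
    (match Unix.select [ wake_r ] [] [] (-1.0) with
     | _ :: _, _, _ -> ignore (Unix.read wake_r buf 0 (Bytes.length buf))
     | _ -> ());
    Mutex.lock queue_mutex;
    let jobs = List.rev (Queue.fold (fun acc job -> job :: acc) [] request_queue) in
    Queue.clear request_queue;
    Mutex.unlock queue_mutex;
    List.iter (fun job -> job ()) jobs
  done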
Final score: libcurl 1, Terrateam 0.5. We’re getting a robust HTTP client out of it, albeit not implemented exactly how we want, but that’s OK. The power of curl is worth it.