Status update, December 2022

The days are getting colder and shorter here in Germany. Christmas preparations need to be made, cookies have been baked. The next month will likely see reduced activity while I spend time with the family over Christmas and New Year’s. So let’s take a look at what happened in the last one!

The forge

A lot of time was dedicated to the forge last month, unfortunately much of that is not really (yet) visible. One of the time killers was the issue of builds not being triggered by patches sent to mailing lists. Even just attempting to reproduce this issue was quite involved: you need lists.sr.ht and builds.sr.ht (obviously), meta.sr.ht (always), git.sr.ht (for the build manifests), and hub.sr.ht (for the webhooks). Add in various problems with Python dependencies and you’ve got yourself a “busy for a while”.

All this was in vain, too, because it turned out the issue could not be reproduced locally. While frustrating, that is of course a strong signal for what type of things to look for. So I started poking around in production, for which the GraphQL API was absolutely fantastic. After improving the error handling of the relevant webhooks a bit we were able to zoom in on one specific action (creating a job group for the potentially multiple builds to be started) that was simply taking too long, causing a timeout. As always, once the issue was found the fix was rather simple: change the SQL query to utilize an already existing index (though we will also add another index for other queries).

The lesson learned? I look at it this way: it had worked before, so the failure was essentially caused by SourceHut’s growth, which is something to celebrate! ;)

Another scaling issue - though of a different kind - that I looked into concerns pages.sr.ht. It was reported that uploading tarballs that are well below our size limit but contain a significant number of small files will often time out.

This is due to a classic problem from the micro-services world: fan-out. I don’t consider sr.ht to be micro-services, but the principle is the same. The user makes a single request, uploading a tarball. The service (pages.sr.ht) unpacks the tarball and has to make one request to the blob storage for every single file. This is obviously something that cannot be trivially changed. In fact, the real situation is even a little worse: for valid reasons, pages.sr.ht has to make three requests to storage for every file. This, however, is something that can be worked around by making some changes to the service architecture (hopefully fully transparent to the user). So, while the problem per se will remain, we hope to reduce its impact three-fold. That might just be good enough for reasonable uploads to no longer fail.

Some other things I worked on, mostly still pending review, include a shot at switching the invoice generation from Python to Go, addressing the infamous inline favicon, and letting people use <del> tags in their markdown again.

Kubernetes

Our experimental Kubernetes cluster gained a Ceph “cluster” - currently using only one disk on one server. This is just so that we have something for proofs of concept that require it. We will work in parallel on building a more realistic setup. But having this allowed me, for example, to deploy a Postgres cluster using the Kubegres controller.

Just from an infrastructure point-of-view, I am quite happy with Kubegres. You only need Kubegres itself and it works with the upstream Postgres Docker images, both of which are easy to reproduce.
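For the curious, deploying such a cluster boils down to a single custom resource, along the lines of the Kubegres getting-started documentation. The names, sizes, and secret reference below are placeholders, not our actual setup:

```yaml
apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
  name: my-postgres
spec:
  replicas: 3                 # one primary, two replicas
  image: postgres:14.1        # upstream Postgres image, no custom build
  database:
    size: 200Mi               # backed by the Ceph storage class
  env:
    - name: POSTGRES_PASSWORD
      valueFrom:
        secretKeyRef:
          name: my-postgres-secret
          key: superUserPassword
```

The fact that this is all you need - no operator-specific images, no sidecars to reproduce - is precisely what makes it easy to audit.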

Ceph integration, on the other hand, gave me the shivers. Apparently there was once a built-in plugin, but it is deprecated and I could not get it to work. Instead, you now need a “Container Storage Interface” plugin. This is provided by ceph-csi, but in addition to that it wants to run no less than five images from registry.k8s.io/sig-storage. That’s Kubernetes, alright…

With this in place, however, I am now working on the actual sr.ht services.

Tokidoki

In lighter news, I’d like to mention Tokidoki again. Over the past couple of months, I have pestered my colleague emersion a great deal (<3) to merge lots of changes into go-webdav, the underlying library.

Tokidoki itself is a simple-to-use, performant front-end and storage layer for go-webdav. The main limitation (in go-webdav) right now is probably that you can still only have one calendar/address book per user. But we did complete a major refactoring that will make adding support for multiple resources easy. If that’s not an issue for you then it works great as a personal CalDAV/CardDAV server.

Its initial development was sponsored by Migadu, a privacy-focused email provider. I will of course continue extending and maintaining Tokidoki. I will also do a separate write-up of problems past and future. But in the meantime, if you are interested in a simple CalDAV/CardDAV server, why not give it a try?

Vomit & vsync

Just when I thought I had vsync in a pretty good state, I was painfully reminded that IMAP is a very “flexible” protocol. Sometimes dog-fooding your own software just doesn’t cut it - it’s the diversity that brings all the fun :)

I did rectify this immediately though, and I am confident that it is now in a pretty good state and should be usable by the general public. I’ve also picked the IMAP LIST-STATUS extension as the next target for making vsync even faster (thanks, Tim, for pointing that out); however, that needs to go into rust-imap first.

Meanwhile, the vomit CLI - now in its own crate - is undergoing a major refactoring. Using the concept of URIs for addressing (parts of) mails turned out to be flawed and has been replaced by a more path-like approach. I also decided to start using the Message-Id to identify an email instead of its file name. It makes for a much more recognizable identifier. The only drawback is, of course, that there are occasionally mails that don’t have one. So far, this has been rare enough that I just ignored it. We’ll see how this works after using it for a while.

Finding myself a new rabbit hole

As you may have noticed from my passion for talking about the initramfs: I have a strong interest in the early boot process. In large-scale deployments, the initramfs is often loaded from a remote host through a process called PXE. But why not go deeper? That process in turn is often performed using the open source iPXE firmware. iPXE offers some nice features, for example basic scripting and the ability to download the initramfs over HTTP instead of TFTP. In fact, iPXE even implements some basic TLS for HTTPS.
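A minimal iPXE script shows off both features mentioned above - scripting and HTTP downloads. The host name is a placeholder:

```
#!ipxe
dhcp
kernel http://boot.example.org/vmlinuz console=ttyS0
initrd http://boot.example.org/initramfs.img
boot
```

Swap the `http://` for `https://` and you are in the territory the rest of this section is about.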

I can actually imagine a few really neat things one could do with this. However, the TLS support (which is not enabled in the default builds) is limited to begin with, and unfortunately has a problem with your average Let’s Encrypt certificate. The latter basically breaks it for half the internet, so that’s not great.

At this point, I must emphasize that you have to appreciate what iPXE is doing here. iPXE compiles to firmware that is essentially being run by the BIOS. There is no operating system, no file system. No sockets. No threads. Not even processes! Nothing!

Hence, iPXE includes actual network drivers, its own networking stack, and its own “standard library” - if that’s what you want to call it. As you can maybe start to imagine, this is not the kind of environment where you can just “use some library” and expect it to work. I hope that in this context, the lack of modern TLS appears as what it is: a simple lack of significant engineering resources.

Now, I have some engineering resources, but I wouldn’t dare call them significant. I have also never written a TLS library before, so no experience to make up for that, either. What I do have a lot of experience with, though, is shoehorning stuff into other stuff. So I decided to take a look at just how hard it might be to get an existing TLS library to work on iPXE.

Given the unique constraints of the environment, I hope it’s obvious that OpenSSL wouldn’t be the right choice here. But I figured a library aimed at embedded devices just might work (with sufficient shoehorning applied).

My first try was WolfSSL. It seemed pretty well-suited, as you can remove just about any feature at build time. Remember, there is no file system, hence there is no such support in the iPXE build environment. So not only can we not run any code that would access the file system - we cannot even build it. Fun. WolfSSL fared well on that front, though: I was able to compile an iPXE image that included it, with very minor modifications to both iPXE and WolfSSL. I was even able to perform a handshake. However, I then ran into something that looked very much like a stack corruption. Most likely of my own doing, but it was extremely hard to debug, so I got frustrated. I decided to try something else.

The second try was Mbed-TLS. And it turns out that bug of mine may just have been fate, as I actually ended up liking Mbed-TLS a bit better than WolfSSL. It, too, can be heavily customized by means of #ifdefs at build time. The integration into iPXE was much quicker, thanks to my previous efforts. And this time I actually managed to complete a handshake, transfer a file, and send and receive the final close-notify. In Wireshark, it looks perfect.
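That build-time customization happens through Mbed-TLS’s configuration header. The excerpt below is purely illustrative of the kind of options a freestanding target like iPXE needs to toggle - it is not my actual patch, and the real set of defines is much longer:

```c
/* Illustrative excerpt of an Mbed-TLS config header for a freestanding
 * environment: no OS, no file system, no sockets. */
#undef  MBEDTLS_FS_IO               /* no file system to load certs/keys from */
#undef  MBEDTLS_NET_C               /* no BSD sockets; iPXE has its own stack */
#undef  MBEDTLS_TIMING_C            /* no OS timing facilities */
#define MBEDTLS_NO_PLATFORM_ENTROPY /* no /dev/urandom; entropy is fed in by hand */
```

Disabling a module removes its code entirely, which is what makes it feasible to fit the library into a firmware image in the first place.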

The only problem? I have not yet figured out how to close the connection without making iPXE think the transfer was aborted. I know it sounds silly, but the abstractions that iPXE uses are pretty unique. I am, however, 99% sure that this problem is not related to Mbed-TLS. If only I had had two more days…

So there, this is my new rabbit hole. At this point, I really hope that next month I will be able to present a solution. I think it would be a great step forward for the iPXE project if this were to work: it would provide something extremely valuable, and the resources currently consumed by developing a custom TLS stack could probably be put to good use in many other parts of the project.

Celebrate

This will be my last post of the year, so I wish you happy calendar wrapping. And whichever holidays, if any, you celebrate around this time of year - do take some time to reflect and celebrate the little things in life.

As always, you can send any feedback to my public inbox, or come talk to bitfehler on libera.chat!