A strongly consistent index for email using git and MirageOS

This idea was proposed in 2019 as a Cambridge Computer Science Part II project and was completed by Oliver Hope. It was supervised by David Allsopp and Anil Madhavapeddy as part of my Unikernels project.

Summary

Maildir is a widely used format for storing email. Its main benefit is that it uses the filesystem in a way that spares client programs from handling locking themselves: each message is a separate file, and new mail is written into tmp/ and then atomically renamed into new/. The downside is that this makes it hard to build a consistent index. Other clients may be adding, moving or deleting message files at any moment, so we cannot guarantee that the maildir is in a consistent state while the index is being updated. A consistent index would allow safer concurrent access and the implementation of new features.
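Maildir's lock-free delivery relies on writing each message in full under tmp/ and then renaming it into new/; because a rename within one filesystem is atomic on POSIX systems, readers never see a half-written file. The following is a minimal sketch of that delivery step using only the OCaml standard library. The function names and the simplified unique-name scheme are illustrative assumptions, and real Maildir writers also fsync the file before renaming, which this sketch omits.

```ocaml
(* Sketch of Maildir delivery using only the OCaml standard library.
   Assumptions: [deliver], [create_maildir] and the unique-name scheme are
   hypothetical; real Maildir writers also fsync before renaming. *)

let () = Random.self_init ()

(* Create [path] if it does not already exist. *)
let ensure_dir path =
  if not (Sys.file_exists path) then Sys.mkdir path 0o700

(* A maildir is a directory containing tmp/, new/ and cur/. *)
let create_maildir root =
  ensure_dir root;
  List.iter (fun sub -> ensure_dir (Filename.concat root sub)) [ "tmp"; "new"; "cur" ]

(* Hypothetical unique name; real clients use a time.pid.host style name. *)
let counter = ref 0

let unique_name () =
  incr counter;
  Printf.sprintf "%d.%d.sketch" (Random.bits ()) !counter

(* Deliver [contents]: write the whole file into tmp/, then publish it with
   a single rename into new/. Readers only look in new/ and cur/, so they
   see either no file or the complete message. *)
let deliver root contents =
  let name = unique_name () in
  let tmp_path = Filename.concat (Filename.concat root "tmp") name in
  let new_path = Filename.concat (Filename.concat root "new") name in
  let oc = open_out_bin tmp_path in
  output_string oc contents;
  close_out oc;
  Sys.rename tmp_path new_path;
  name

(* List the filenames currently in new/ (unread messages). *)
let list_new root =
  Array.to_list (Sys.readdir (Filename.concat root "new"))
```

The same layout hints at the indexing problem: another client can rename a file from new/ to cur/ between an indexer listing new/ and listing cur/, so two successive listings need not describe one consistent state of the maildir.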

The aim of this project is therefore to solve the consistency problem. It does so by using git, the version control system, to build an overlay on top of the maildir in the filesystem, so that multiple filesystem operations can be bundled into a single commit. Commits record every change made to the maildir, and because each commit is an immutable snapshot, an index built from a given commit is strongly consistent. Since git also provides branching, the same model can be extended to support new features.
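The OCaml sketch below models the overlay idea in memory: each commit is an immutable snapshot of the maildir, several primitive operations are bundled into one commit, and an index is computed from exactly one commit. The types, operation names and integer commit ids are hypothetical stand-ins chosen for illustration; this is not the gitmaildir API, and a real implementation would store trees and commits in a git repository rather than in an OCaml Map.

```ocaml
(* Hypothetical in-memory model of a git overlay on a maildir.
   Not the gitmaildir API: integer ids stand in for commit hashes and an
   OCaml Map stands in for a git tree. *)
module SMap = Map.Make (String)

(* A snapshot maps a maildir path (e.g. "new/1") to the message contents. *)
type commit = {
  id : int;
  parent : commit option;
  tree : string SMap.t;
  message : string;
}

(* Primitive filesystem operations a mail client might perform. *)
type op =
  | Add of string * string  (* path, contents *)
  | Move of string * string (* source, destination, e.g. new/ to cur/ *)
  | Delete of string

let apply_op tree = function
  | Add (path, body) -> SMap.add path body tree
  | Move (src, dst) -> (
      match SMap.find_opt src tree with
      | Some body -> SMap.add dst body (SMap.remove src tree)
      | None -> failwith ("move: no such message " ^ src))
  | Delete path -> SMap.remove path tree

let root = { id = 0; parent = None; tree = SMap.empty; message = "empty maildir" }
let next_id = ref 0

(* Bundle several operations into one commit. If any operation fails, the
   exception escapes before a commit is built, so either every operation
   is recorded or none is; [parent] itself is never modified. *)
let commit parent message ops =
  let tree = List.fold_left apply_op parent.tree ops in
  incr next_id;
  { id = !next_id; parent = Some parent; tree; message }

(* An index computed from one commit reflects exactly one snapshot:
   here, the sorted list of unread messages under new/. *)
let unread c =
  SMap.fold
    (fun path _ acc ->
      if String.length path >= 4 && String.sub path 0 4 = "new/" then path :: acc
      else acc)
    c.tree []
  |> List.rev
```

Because a commit is never modified after it is created, an index computed from it stays valid even after later commits are made, and a batch that fails part-way leaves the previous commit untouched.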

The project successfully implemented this git overlay using MirageOS libraries that provide git functionality, maildir operations, and email parsing. With the overlay in place, and with it a consistent index, the project could make much stronger guarantees about the state of the maildir at any point in time. This made conflicting operations easier and more reliable to handle. The overlay also made it straightforward to implement novel features such as roll-back and separate branches for different use cases.
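One common way to detect conflicting operations with git-style storage is optimistic concurrency on the branch head: a writer records which commit it started from and only moves the branch if it still points there, which is what `git update-ref <ref> <newvalue> <oldvalue>` does. The sketch below assumes that approach, which may differ from how gitmaildir actually resolves conflicts, and also shows roll-back (resetting a branch to its parent) and forking a separate branch. All names are hypothetical.

```ocaml
(* Hypothetical sketch: branch heads updated with compare-and-swap, roll-back
   by resetting a head to its parent, and forking a separate branch. *)
module SMap = Map.Make (String)

type commit = { id : int; parent : commit option; tree : string SMap.t }

(* Branch name -> current head commit. *)
let branches : (string, commit) Hashtbl.t = Hashtbl.create 8
let next_id = ref 0

let make_commit parent tree =
  incr next_id;
  { id = !next_id; parent; tree }

(* Move [branch] to [proposed] only if it still points at [expected].
   Returns false when another writer got there first, i.e. a conflict.
   In this single-threaded sketch the check-and-set cannot be interleaved;
   git gets the same effect across processes by locking the ref. *)
let update_ref branch ~expected proposed =
  match Hashtbl.find_opt branches branch with
  | Some head when head.id = expected.id ->
      Hashtbl.replace branches branch proposed;
      true
  | _ -> false

(* Roll back one commit by pointing the branch at the head's parent. *)
let rollback branch =
  match Hashtbl.find_opt branches branch with
  | Some { parent = Some p; _ } ->
      Hashtbl.replace branches branch p;
      true
  | _ -> false

(* Start branch [name] at the current head of [from]. *)
let fork ~from name =
  match Hashtbl.find_opt branches from with
  | Some head ->
      Hashtbl.replace branches name head;
      true
  | None -> false
```

A writer whose `update_ref` returns false has lost a race: it must rebuild its change on top of the new head (or merge the two trees) and retry, rather than silently overwriting the other writer's operations.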

Oliver Hope has published his dissertation repository and the gitmaildir source code online.
