Under the hood with Apple's new Containerization framework / Jun 2025
Apple made a notable announcement at WWDC 2025: a new Containerization framework in the macOS Tahoe beta. This took me right back to the early Docker for Mac days in 2016, when we announced the first mainstream use of the Hypervisor framework, so I couldn't resist taking a quick peek under the hood.
There were two separate things announced: a Containerization framework and a container CLI tool that aims to be an OCI-compliant way to manipulate and execute container images. The former is a general-purpose framework that could be used by Docker, but it wasn't clear to me where the new CLI tool fits in among the existing layers of runc, containerd and of course Docker itself. The only way to find out is to take the new release for a spin, since Apple open-sourced everything (well done!).
[…1934 words]
ZFS replication strategies with encryption / Jun 2025
This is an idea proposed as a good starter project, and is currently being worked on by Becky Terefe-Zenebe. It is co-supervised with Mark Elvers.
We are using ZFS in much of our Planetary Computing infrastructure due to its ease of remote replication. Therefore, its performance characteristics when used as a local filesystem are particularly interesting. Some questions that we need to answer about our uses of ZFS are:
- We intend to have encrypted remote backups in several locations, but only a few of those hosts should hold keys; the rest should receive raw ZFS send streams.
- Does encryption add a significant overhead when used locally?
- Is replication faster if the source and target are both encrypted vs a raw send?
- We would typically have a snapshot schedule, such as hourly snapshots with a retention of 48 hours, daily snapshots with a retention of 14 days, and weekly snapshots with a retention of 8 weeks. As these snapshots build up over time, is there a performance degradation?
- Should we minimise the number of snapshots held locally, as this would allow faster purging of deleted files?
- How does ZFS send/receive compare to a peer-to-peer backup solution like Borg Backup, which allows a free choice of source and target backup filesystems and supports encryption?
- ZFS should have the advantage of knowing which blocks have changed between two backups, but this potentially adds overhead to day-to-day use.
- On the other hand, ZFS replicas can be brought online much more quickly, whereas Borg backup archives need to be reconstructed into a usable filesystem.
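To make the snapshot-schedule question above concrete, here is a minimal sketch of the retention policy described (hourly for 48 hours, daily for 14 days, weekly for 8 weeks). This is a hypothetical illustration, not tied to Sanoid or any particular tool; it assumes daily snapshots are the midnight ones and weekly snapshots the Sunday-midnight ones.

```python
from datetime import datetime, timedelta

def keep_snapshot(taken: datetime, now: datetime) -> bool:
    """Decide whether a snapshot survives under an
    hourly/48h, daily/14d, weekly/8w retention policy."""
    age = now - taken
    if age <= timedelta(hours=48):
        return True                                      # keep every hourly snapshot
    if age <= timedelta(days=14):
        return taken.hour == 0                           # keep one per day (midnight)
    if age <= timedelta(weeks=8):
        return taken.hour == 0 and taken.weekday() == 6  # keep Sunday midnight weekly
    return False

def prune(snapshots, now):
    """Partition a list of snapshot timestamps into (kept, pruned)."""
    kept = [s for s in snapshots if keep_snapshot(s, now)]
    pruned = [s for s in snapshots if not keep_snapshot(s, now)]
    return kept, pruned
```

At steady state this holds roughly 48 + 14 + 8 = 70 snapshots per dataset, which bounds how much snapshot metadata ZFS has to walk when purging deleted files.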
Solving Package Management via Hypergraph Dependency Resolution / Jun 2025
New preprint survey on energy-aware deep learning on embedded hardware / May 2025
Josh Millar has just released the latest survey paper he led on energy-aware approaches to optimising deep-learning training and inference on embedded devices, such as those benchmarked in "Benchmarking Ultra-Low-Power µNPUs" recently.
We present an overview of such approaches, outlining their methodologies, implications for energy consumption and system-level efficiency, and their limitations in terms of supported network types, hardware platforms, and application scenarios. We hope our review offers a clear synthesis of the evolving energy-aware DL landscape and serves as a foundation for future research in energy-constrained computing.
If you have any comments, please do let any of us know!
Energy-Aware Deep Learning on Resource-Constrained Hardware
Josh Millar, Hamed Haddadi, and Anil Madhavapeddy.
Working paper at arXiv.
Talks from LOCO24 are now available online / Apr 2025
The sister conference to PROPL was held late last year in Scotland with a bumper attendance from Cambridge. All of the talks from it are now available on YouTube, or on our ad-free EEG video site. The keynote from Anne Currie was fantastic and wide-ranging (she is the author of the eerily predictive Panopticon series):
[…197 words]
WebAssembly on exotic architectures (a 2025 roundup) / Apr 2025
It's about the time of the academic year to come up with project ideas! KC Sivaramakrishnan, Andy Ray and I have been looking into FPGA/OCaml matters recently, so I thought I'd review the latest in the land of WebAssembly for non-traditional hardware targets. It turns out that there are very fun systems projects going on to turn wasm into a "real" target architecture on several fronts: a native port of Linux to run in wasm, a port of wasm to run in kernel space, a POSIX mapping of wasm, and fledgling wasm-CPUs-on-FPGAs.
[…1130 words]
Lineage first computing: towards a frugal userspace for Linux / Apr 2025
Unikernels wins the ASPLOS most influential paper award / Apr 2025
I was gobsmacked to get a note from the SIGARCH ASPLOS steering committee that our 2013 paper "Unikernels: library operating systems for the cloud" won the most influential paper award at the conference last week! I couldn't make it to Rotterdam myself due to the travel time, but Richard Mortier was already there and so accepted the award on the whole team's behalf!
[…1524 words]
Semi distributed filesystems with ZFS and Sanoid / Apr 2025
Over in my EEG group, we have a lot of primary and secondary datasets lying around: 100s of terabytes of satellite imagery, biodiversity data, academic literature, and the intermediate computations that go along with them. Our trusty central shared storage server running TrueNAS stores data in ZFS and serves it over NFSv4 to a bunch of hosts. This is rapidly becoming a bottleneck as our group and datasets grow, and Mark Elvers has been steadily adding lots more raw capacity. The question now is how to configure this raw SSD capacity into a more nimble storage setup. If anyone's seen any systems similar to the one sketched out below, I'd love to hear from you.
[…1676 words]
Autoscaling geospatial computation with Python and Yirgacheffe / Apr 2025
This is an idea proposed as a good starter project, and is available for being worked on. It may be co-supervised with Michael Dales.
Python is a popular tool for geospatial data science, but both it and the GDAL library handle resource management poorly. Python does not deal well with parallelism, and GDAL can be a memory hog when parallelised. Geospatial workloads -- working on global maps at metre-level resolution -- can easily exceed the resources available on a given host when run using conventional schedulers.
To that end, we've been building Yirgacheffe, a geospatial library for Python that attempts both to hide the tedious parts of geospatial work (aligning different data sources, for instance) and to tackle the resource management issues, so that ecologists don't have to become computer scientists to scale their work. Yirgacheffe can:
- chunk data in memory automatically, to avoid common issues around memory overcommitment
- do limited forms of parallelism to use multiple cores.
Yirgacheffe has been deployed in multiple geospatial pipelines, underpinning work like Mapping LIFE on Earth, as well as an implementation of the IUCN STAR metric, and a methodology for assessing tropical forest interventions.
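The chunking idea can be sketched in plain Python. To be clear, this is an illustration of the concept rather than Yirgacheffe's actual API: the raster is processed one window of rows at a time, so only a bounded slice of pixels is ever resident in memory, whatever the size of the underlying map.

```python
def chunked_rows(height, chunk_rows):
    """Yield (row_offset, rows) windows covering a raster of the given
    height, so at most chunk_rows of pixels are resident at once."""
    for off in range(0, height, chunk_rows):
        yield off, min(chunk_rows, height - off)

def apply_chunked(read_window, write_window, height, fn, chunk_rows=256):
    """Read, transform, and write one window at a time.
    read_window(off, rows) fetches a block (e.g. a GDAL ReadAsArray call);
    write_window(off, block) persists the transformed block incrementally."""
    for off, rows in chunked_rows(height, chunk_rows):
        block = read_window(off, rows)
        write_window(off, fn(block))
```

A scheduler built on top of this only needs to decide how many windows to process concurrently, which is where the limited parallelism mentioned above comes in.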
[…453 words]
A hardware description language using OCaml effects / Mar 2025
This is an idea proposed as a Cambridge Computer Science Part III or MPhil project, and is available for being worked on. It may be co-supervised with KC Sivaramakrishnan and Andy Ray.
Programming FPGAs using functional programming languages is a very good fit for the problem domain. OCaml has the HardCaml ecosystem to express hardware designs in OCaml, make generic designs using the power of the language, then simulate designs and convert them to Verilog or VHDL.
HardCaml is very successfully used in production at places like Jane Street, but needs quite a lot of prerequisite knowledge about the full OCaml language. In particular, it makes very heavy use of the module system in order to build up the circuit description as an OCaml data structure.
Instead of building up a circuit as the output of the OCaml program, it would be very cool if we could directly implement the circuit as OCaml code by evaluating it. This is an approach that works very successfully in the Clash Haskell HDL, as described in this thesis. Clash uses a number of advanced Haskell type-level features to encode fixed-length vectors (very convenient for hardware description) and has an interactive REPL that allows for exploration without requiring a separate test bench.
[…296 words]
Towards a frugal userspace for Linux / Dec 2024
All the work we've been doing on biodiversity (such as LIFE) comes at a fairly large computation and storage cost due to the amount of data that we churn through. This gets worse when you consider the exploratory nature of science -- we sometimes just need to mess around with the large dataset to test hypotheses which are often shown to be wrong. So then, when the LOCO conference came around, we wrote up our thoughts on what a frugal Linux userspace might look like.
The key insight is that the Linux kernel already exposes a number of namespace mechanisms (that we use in Docker, for example), and so we explore a new OS architecture which defaults to deterministic, reusable computation with careful recording of side-effects. This in turn allows Linux to guide complex computations towards previously acquired intermediate results, while still allowing recomputation when required by the user. We're putting this together into a new shell known as "Shark", and this first abstract describes our early results.
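The reuse idea can be sketched very simply (a hypothetical illustration, not Shark's actual design): key each computation by a content hash of its command and inputs, and return the recorded result whenever the same key is seen again, only recomputing on a miss.

```python
import hashlib
import json

class Memo:
    """Content-addressed cache: identical (command, inputs) pairs return
    the previously recorded result instead of recomputing."""
    def __init__(self):
        self.store = {}
        self.hits = 0

    def key(self, command, inputs):
        # Canonical JSON so logically identical invocations hash identically.
        blob = json.dumps({"cmd": command, "in": inputs}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def run(self, command, inputs, compute):
        k = self.key(command, inputs)
        if k in self.store:
            self.hits += 1          # previously acquired intermediate result
        else:
            self.store[k] = compute(command, inputs)
        return self.store[k]
```

The hard part in practice, and the reason side-effect recording matters, is making sure the key really captures everything the computation depends on.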
Prototyping carbon-aware domain name resolution / Dec 2024
Ryan Gibb and I have been thinking about how the current Internet architecture fails to treat the carbon emissions associated with networked services as a first-class metric. So when the LOCO conference came up, we tried extending the DNS with load balancing techniques to consider the carbon cost of scheduling decisions. A next step was then to build a custom DNS server written in OCaml to actively wake machines running networked services as a side effect of the name resolution.
Extending DNS means that we maintain compatibility with existing Internet infrastructure, unlocking the ability for existing applications to be carbon-aware. This is very much a spiritual follow-on to the Signposts project that I worked on back in 2013, and one I have always wanted to return to!
Ryan Gibb, Patrick Ferris, and Anil Madhavapeddy.
Abstract in the 1st International Workshop on Low Carbon Computing.
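The scheduling idea can be sketched as an answer-ordering policy (a hypothetical illustration in Python, not the actual OCaml server): given the candidate addresses for a name and a carbon-intensity figure for each host's grid region, return the answers ordered greenest-first, so clients that take the first answer land on the lowest-carbon replica.

```python
def resolve(name, records, intensity):
    """Order the candidate A records for `name` by the carbon intensity
    (gCO2/kWh) of the region each host runs in, greenest first;
    ties are broken by address order for determinism."""
    candidates = records[name]  # list of (address, region) pairs
    ordered = sorted(candidates, key=lambda ar: (intensity[ar[1]], ar[0]))
    return [addr for addr, _region in ordered]
```

A real resolver would refresh the intensity figures from a grid-data feed and could, as described above, wake a sleeping low-carbon host as a side effect of resolution before returning its address.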
Paper on scheduling for reduced tail task latencies / Nov 2024
Smita Vijayakumar went to Seattle for SOCC 2024 to present her PhD research on Murmuration, a new scheduler for Kubernetes that achieves 15%--25% faster job completion times than the default scheduler across a range of job arrival characteristics in heavily loaded datacenters.
[…71 words]
Mapping greener futures with planetary computing / Oct 2024
I got invited by Sertaç Sehlikoglu to deliver a lecture to the Masters students down at the UCL Institute for Global Prosperity. I talked about the recent work on planetary computing, with an overview of the LIFE and FOOD papers.