Towards a frugal userspace for Linux / Dec 2024
All the work we've been doing on biodiversity (such as LIFE) comes at a fairly large computational and storage cost due to the amount of data that we churn through. This gets worse when you consider the exploratory nature of science -- we sometimes just need to mess around with large datasets to test hypotheses, which are often shown to be wrong. So when the LOCO conference came around, we wrote up our thoughts on what a frugal Linux userspace might look like.
The key insight is that the Linux kernel already exposes a number of namespace mechanisms (used by Docker, for example), so we explore a new OS architecture that defaults to deterministic, reusable computation with careful recording of side effects. This in turn allows Linux to guide complex computations towards previously acquired intermediate results, while still allowing recomputation when the user requires it. We're putting this together into a new shell known as "Shark", and this first abstract describes our early results.
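The reuse of previously acquired intermediate results can be sketched as content-addressed memoisation of a command: hash the command and the contents of its inputs, and serve the recorded output when the hash matches. This is a minimal illustration in Python; the `run_cached` helper and the on-disk cache layout are hypothetical, not Shark's actual design.

```python
import hashlib
import json
import os
import subprocess
import tempfile

# Hypothetical cache location for the demo.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "shark-cache-demo")

def run_cached(cmd, input_files=()):
    """Run cmd deterministically: reuse the recorded stdout if the
    command and the contents of its input files are unchanged."""
    h = hashlib.sha256(json.dumps(cmd).encode())
    for path in input_files:
        with open(path, "rb") as f:
            h.update(f.read())
    cache_path = os.path.join(CACHE_DIR, h.hexdigest())
    if os.path.exists(cache_path):          # previously acquired result
        with open(cache_path, "rb") as f:
            return f.read()
    out = subprocess.run(cmd, capture_output=True, check=True).stdout
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(cache_path, "wb") as f:       # record the side effect
        f.write(out)
    return out

first = run_cached(["echo", "hello"])
second = run_cached(["echo", "hello"])      # served from the cache
assert first == second == b"hello\n"
```

A real implementation would of course have to capture *all* side effects (filesystem writes, network access) via the kernel namespace mechanisms, not just stdout.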
Prototyping carbon-aware domain name resolution / Dec 2024
Ryan Gibb and I have been thinking about how the current Internet architecture fails to treat the carbon emissions associated with networked services as a first-class metric. So when the LOCO conference came up, we tried extending the DNS with load balancing techniques to consider the carbon cost of scheduling decisions. A next step was then to build a custom DNS server written in OCaml to actively wake machines running networked services as a side effect of the name resolution.
Extending DNS means that we maintain compatibility with existing Internet infrastructure, unlocking the ability for existing applications to be carbon-aware. This is very much a spiritual follow-on to the Signposts project that I worked on back in 2013, and have always wanted to return to!
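The scheduling decision at the heart of the idea can be illustrated with a tiny resolver policy: given candidate replicas for a name, each annotated with a grid carbon-intensity estimate, answer with the address of the cleanest one. The names, addresses, and intensity figures below are made up for illustration, and a real server (ours is written in OCaml) would speak the actual DNS wire protocol.

```python
# Hypothetical replicas for one service name, annotated with the
# grid carbon intensity (gCO2/kWh) of the region hosting each one.
REPLICAS = {
    "service.example.com": [
        ("192.0.2.10", 450),   # coal-heavy grid
        ("192.0.2.20", 30),    # hydro-heavy grid
        ("192.0.2.30", 120),
    ],
}

def resolve_carbon_aware(name):
    """Return the A record for the replica on the lowest-carbon grid,
    mimicking a DNS load balancer that weighs carbon cost."""
    candidates = REPLICAS.get(name)
    if not candidates:
        return None  # NXDOMAIN in a real server
    addr, _intensity = min(candidates, key=lambda r: r[1])
    return addr

assert resolve_carbon_aware("service.example.com") == "192.0.2.20"
```

The wake-on-resolution side effect would hang off the same lookup path: if the chosen replica is asleep, the server wakes it before answering.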
Ryan Gibb, Patrick Ferris and Anil Madhavapeddy.
Abstract in the 1st International Workshop on Low Carbon Computing.
Paper on scheduling for reduced tail task latencies / Nov 2024
Smita Vijayakumar went along to SOCC 2024 in Seattle to present her PhD research on Murmuration, a new scheduler for Kubernetes that achieves 15%--25% faster job completion times than the default scheduler across different job arrival characteristics in highly utilised datacenters. […71 words]
Mapping greener futures with planetary computing / Oct 2024
I got invited by Sertaç Sehlikoglu to deliver a lecture to the Masters students down at the UCL Institute for Global Prosperity. I talked about the recent work on planetary computing, with an overview of the LIFE and FOOD papers.
Towards security specifications for agentic AIs / Sep 2024
A very fun talk at ACM HOPE 2024 on some new work with Cyrus Omar and Patrick Ferris on how we can formally specify systems to be robust to code generation by AI agents. For instance, if you were to ask GitHub Copilot to generate code to filter endangered animals out of a folder of images, it might interpret that as deleting the image, moving it to another folder (which might be public), or simply removing it from the index. Any of those options are potentially valid, so what do we do? Our idea is to use F* to specify a rich set of allowable behaviours which can then be dynamically enforced in less expressive languages, thus offering layers of protection against over-eager (or rogue) AI agents. […183 words]
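The dynamic-enforcement layer can be illustrated (in Python rather than F*, and with entirely hypothetical names) as a policy that whitelists which effects a generated "filter" routine may perform on an image:

```python
# A toy policy: AI-generated code may MOVE_PRIVATE or UNINDEX an
# endangered-species image, but must never DELETE it. The action
# names and checker are illustrative, not our F* specification.
ALLOWED_ACTIONS = {"MOVE_PRIVATE", "UNINDEX"}

class PolicyViolation(Exception):
    pass

def enforce(action, image):
    """Dynamically check a proposed effect against the policy
    before letting the generated code perform it."""
    if action not in ALLOWED_ACTIONS:
        raise PolicyViolation(f"{action} forbidden for {image}")
    return (action, image)

assert enforce("UNINDEX", "tiger.jpg") == ("UNINDEX", "tiger.jpg")
try:
    enforce("DELETE", "tiger.jpg")   # over-eager agent blocked
    blocked = False
except PolicyViolation:
    blocked = True
assert blocked
```

In the actual proposal the set of allowable behaviours would be derived from a rich F* specification rather than hand-written, with the runtime check acting as the last layer of defence.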
A Case for Planetary Computing / Mar 2024
Revision of planetary computing preprint
Composable diffing for heterogeneous file formats / Jan 2024
This is an idea proposed as a Cambridge Computer Science Part III or MPhil project, and is available to be worked on. It will be supervised by Patrick Ferris and Anil Madhavapeddy.
When dealing with large-scale geospatial data, we also have to deal with a variety of file formats, such as CSV, JSON, GeoJSON, and GeoTIFF. Each of these formats has its own structure and semantics, and it is often necessary to compare and merge data across them. The conventional solution for source code would be to use a tool such as Git, but this approach is not always feasible: it requires the data to be in a text-based format, structured in a way that can be compared line by line.
This project explores the design of a composable diffing specification that can compare and merge data across heterogeneous file formats. The project will involve designing a domain-specific language for specifying diffing rules, and implementing a prototype tool based on it. Crucially, the tool should be composable: it should be possible to combine different diffing rules to handle different file formats. […309 words]
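To make "composable" concrete, here is a minimal sketch (hypothetical API, in Python rather than the eventual DSL) where per-format diff functions are combined into one tool that dispatches on file extension:

```python
import json

# Each differ understands one format's structure; `compose` merges
# them into a single dispatcher keyed on file extension. All names
# here are illustrative, not the project's actual design.
def diff_json(a, b):
    """Structural diff: report keys whose values changed."""
    da, db = json.loads(a), json.loads(b)
    return {k: (da.get(k), db.get(k))
            for k in set(da) | set(db) if da.get(k) != db.get(k)}

def diff_csv(a, b):
    """Row-wise diff: report line numbers whose rows changed."""
    la, lb = a.splitlines(), b.splitlines()
    return [(i, x, y) for i, (x, y) in enumerate(zip(la, lb)) if x != y]

def compose(**differs):
    def diff(path, a, b):
        ext = path.rsplit(".", 1)[-1]
        return differs[ext](a, b)
    return diff

diff = compose(json=diff_json, csv=diff_csv)
assert diff("x.json", '{"a": 1}', '{"a": 2}') == {"a": (1, 2)}
assert diff("x.csv", "a,1\nb,2", "a,1\nb,3") == [(1, "b,2", "b,3")]
```

The project's DSL would go further, letting users define and compose rules declaratively (including for binary formats like GeoTIFF) rather than writing each differ by hand.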
Where on Earth is the Spatial Name System? / Nov 2023
Paper on spatial networks on DNS at HotNets 2023
Information Flow Tracking for Heterogeneous Compartmentalized Software / Oct 2023
Paper on DIFC Deluminator interface at RAID 2023
Enabling Lightweight Privilege Separation in Applications with MicroGuards / Oct 2023
Paper on the MicroGuards memory API at ACNSW
Eio 1.0 – Effects-based IO for OCaml 5 / Sep 2023
An update on the OCaml EIO library at the OCaml Workshop 2023
A Case for Planetary Computing / Mar 2023
Preprint of planetary computing paper
Scheduling for Reduced Tail Latencies in Highly Utilised Datacenters / Jan 2023
This is an idea proposed as a Cambridge Computer Science PhD topic, and has been completed by Smita Vijayakumar. It was supervised by Evangelia Kalyvianaki and Anil Madhavapeddy.
Modern datacenters have become the backbone for running diverse workloads that increasingly comprise data-parallel computational jobs. Due to the ease of use and diversity of resources they host, there has been an exponential rise in the demand for datacenters, leading to a high volume of traffic. Datacenters execute thousands of jobs by scheduling billions of tasks every day. To meet these demands, datacenter providers operate their clusters at high levels of utilisation. We show that under such conditions existing scheduling designs impose large wait times on tail tasks. This leads to large tail task completion times and consequently elevated job completion times that can potentially cost datacenter providers millions of dollars in total cost of operations.
This PhD explores a new decentralised scheduling model, Murmuration, that uses multiple communicating scheduler instances to ensure tasks are scheduled in a manner that reduces their total wait times. It achieves this by scheduling all tasks of a job such that their start times are as close together as possible, thereby ensuring small tail task completion times and better average job completion times.
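The placement goal -- keeping all of a job's task start times close together -- can be illustrated with a toy model: greedily assigning each task to the machine that frees up soonest bounds the spread between the first and last (tail) task start times. This is a drastic simplification of Murmuration's decentralised design, with made-up numbers.

```python
# Toy model: each machine's load is the time until a slot frees up.
# Placing each task of one job on the currently least-loaded machine
# keeps the job's start times (and hence its tail task) tight.
def place_job(machine_free_at, num_tasks):
    """Greedily assign tasks to the machines that free up soonest;
    return each task's start time."""
    loads = list(machine_free_at)
    starts = []
    for _ in range(num_tasks):
        i = loads.index(min(loads))
        starts.append(loads[i])
        loads[i] += 1          # task occupies the slot for 1 time unit
    return starts

starts = place_job([0, 0, 5, 9], num_tasks=4)
# The tail task's start (the max) governs job completion time;
# here the spread between first and last start stays small.
assert max(starts) - min(starts) <= 1
```

Murmuration itself achieves this with multiple communicating scheduler instances rather than a single global view, which is what makes it viable at datacenter scale.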
Computational Models for Scientific Exploration / Jan 2023
This is an idea proposed as a Cambridge Computer Science PhD topic, and is currently being worked on by Patrick Ferris. It is supervised by Anil Madhavapeddy and Srinivasan Keshav.
The modern scientific method has become highly computational, but computer science hasn't entirely caught up and sometimes hinders research progress.
Using the computational needs of climate science and ecology as a case study, we are conducting a systematic study of the sources of uncertainty in these fields. We are also designing and implementing a specification language and hermetic computation environment that empowers climate scientists and ecologists to create less ambiguous, more precise, and testable scientific methodologies and results, while preserving the ability to explore and introspect intermediate results. […125 words]
Trusted Carbon Credits / May 2022
With the recent controversies over low-integrity carbon credits, I spoke to Vox magazine about my skepticism regarding Adam Neumann's new startup.
"The problem with the current markets is nothing to do with how we can trade these more effectively," said Anil Madhavapeddy, who is an associate professor of computer science and technology at Cambridge University and the director of the Cambridge Center for Carbon Credits. "We just do not have enough supply." -- Vox
The Cambridge Centre for Carbon Credits is an initiative I started with Andrew Balmford, David A Coomes, Srinivasan Keshav and Thomas Swinfield, aimed at issuing trusted and verifiable carbon credits to prevent the destruction of nature by anthropogenic actions. We researched a combination of large-scale data processing (satellite and sensor networks) and decentralised Tezos smart contracts to design a carbon marketplace with verifiable transactions that link back to trusted primary observations. […230 words]