Personal Containers
As cloud computing empowered the creation of vast data silos, I investigated how decentralised technologies might be deployed to allow individuals more vertical control over their own data. Personal containers was the prototype we built to learn how to stem the flow of our information out to the ad-driven social tarpits. We also deployed personal containers in an experimental data locker system at the University of Cambridge in order to incentivise lower-carbon travel schemes.
I've had a passion for self-hosted, decentralised computing for many years since Nick Ludlam and I set up the recoil.org collective in the late 90s. In late 2008, I'd been working on early cloud computing as part of the Xen Hypervisor project and was already seeing the rapid rise of centralised data gathering in the early cloud providers. When I left Citrix in 2009, I joined Derek McAuley and Jon Crowcroft in their new Horizon Digital Economy centre to lead a charge into building more privacy-centred digital infrastructure. I had the huge privilege of receiving a strings-free 5-year postdoctoral fellowship in Cambridge. It's rare to see such long-term postdoc opportunities these days, but they are something I hugely support for new projects.
My hacking first began with Nick Ludlam in 2008 on a prototype of a lifedb server and app, which we envisioned as a place to aggregate all the messages from disparate sources (for example, to mirror the then-new Twitter service into my IMAP email). I worked on Privacy Butler: A Personal Privacy Rights Manager for Online Presence to add a policy engine to this prototype. While the prototype worked well enough for me, it was largely a negative result since it was just too risky to put all that private data in one location (especially aggregated).
Now back at Cambridge in 2010, I began working with Thomas Gazagnaire on a more robust implementation of data aggregation that would have stronger end-to-end security and privacy. We started coding up an implementation in OCaml to follow up on my Functional Internet Services work, and built out infrastructure like an OCaml ORM in Dynamics for ML using Meta-Programming to make it easier to work with databases. It became obvious pretty quickly that having this much data in one place required end users to become sysadmins, and so I started to lay out a new architecture for this sort of end-user managed data in Multiscale not multicore: efficient heterogeneous cloud computing.
Our first prototype of a personal container running as a unikernel was published in Turning Down the LAMP: Software Specialisation for the Cloud, and would form the basis of the MirageOS project. To this day, the MirageOS community remains passionate about decentralised systems from these origins! We explored a number of directions in the early days:
- Using Dust Clouds to Enhance Anonymous Communication looked into spawning tiny unikernels on public cloud infrastructure to form a "fast flux" for onion routing. This remains a pretty good idea and something I'd like to see implemented on modern public clouds!
- The personal container, or your life in bits was the evolution of the lifedb into the "personal container". Although its domain name is now offline, you can still find the original perscon.net blog repository. I worked pretty hard on a perscon prototype that you can read about in Pulling together a user interface and Yurts for Digital Nomads.
- CIEL: A universal execution engine for distributed data-flow computing investigated what a distributed dataflow engine might look like to help with processing the vast amounts of personal data we were working with. The primary author of CIEL, Derek Murray, went on to develop Naiad and other influential systems in this space, but I still like CIEL's very simple model. I built a simple continuation-based implementation in DataCaml: distributed dataflow programming in OCaml, and as of 2021 am continuing this work again with OCaml's multicore effects in OCaml Labs.
- From an Internet architecture perspective, another fascinating line of thought we came up with was the notion of giving every user their own domain name server that would give them fine-grained control over network connectivity. The Signposts: end-to-end networking in a world of middleboxes and Lost in the Edge: Finding Your Way with DNSSEC Signposts papers both lay out an architecture for a DNSSEC-based dynamic DNS server that users can control. We explored how a "polyversal TCP" might look for making p2p connections from this in Evolving TCP: how hard can it be?, as well as a software OpenFlow switch to route data from cloud to edge devices in Cost, Performance & Flexibility in OpenFlow: Pick three.
- Exploring Compartmentalisation Hypotheses with SOAAP was the result of my collaboration with the just-established CHERI project at the Computer Lab on compartmentalisation interfaces, another area of programming that continues to need improvement.
One of the main drivers for personal containers was to enable applications that would otherwise be too invasive from a privacy perspective. Ian Leslie and I worked on the "c-aware" project in Confidential carbon commuting: exploring a privacy-sensitive architecture for incentivising 'greener' commuting to figure out if personal containers could help influence user behaviour to reduce carbon usage. Overall, this project taught us just how much effort it would take to deploy real-world infrastructure in corporate environments like the University of Cambridge. We also struggled to get any users to deploy our prototype servers, something explored more in user studies with colleagues in Horizon Nottingham in Perceived risks of personal data sharing.
My work on personal data processing petered out from a research perspective around 2013, since the underlying infrastructure I had built really started gathering steam with Unikernels and OCaml Labs. We hadn't quite cracked the problem of how to break the cloud hegemony, but (as with XenoServers and Xen) the pieces that succeeded emerged from the research questions we asked. However, I don't consider this project permanently closed by any means -- after all, I've been self-hosting my email since 1997! We've been working steadily over the past decade of MirageOS (as of 2021) to build out a really solid, self-hosted protocol stack that will work as a unikernel. I am revisiting the question of decentralisation in the form of physical infrastructure in the Interspatial OS project, and you can read my early thoughts in An architecture for interspatial communication.
Related News
- An architecture for interspatial communication / Apr 2018
- Interspatial OS / Jan 2018
- Using Dust Clouds to Enhance Anonymous Communication / Mar 2014
- Lost in the Edge: Finding Your Way with DNSSEC Signposts / Aug 2013
- Perceived risks of personal data sharing / Feb 2013
- Evolving TCP: how hard can it be? / Dec 2012
- Signposts: end-to-end networking in a world of middleboxes / Sep 2012
- Exploring Compartmentalisation Hypotheses with SOAAP / Sep 2012
- Cost, Performance & Flexibility in OpenFlow: Pick three / Jun 2012
- Confidential carbon commuting: exploring a privacy-sensitive architecture for incentivising 'greener' commuting / Apr 2012
- OCaml Labs / Jan 2012
- Dynamics for ML using Meta-Programming / Jul 2011
- DataCaml: distributed dataflow programming in OCaml / Jun 2011
- CIEL: A universal execution engine for distributed data-flow computing / Mar 2011
- The personal container, or your life in bits / Oct 2010
- Turning Down the LAMP: Software Specialisation for the Cloud / Jun 2010
- Yurts for Digital Nomads (via perscon.net) / Apr 2010
- Pulling together a user interface (via perscon.net) / Apr 2010
- Multiscale not multicore: efficient heterogeneous cloud computing / Apr 2010
- Privacy Butler: A Personal Privacy Rights Manager for Online Presence / Mar 2010
- Unikernels / Jan 2010
- Functional Internet Services / Jan 2003
- Xen Hypervisor / Jan 2002
Relevant Research Ideas
An interest in self-hosting data and in developing local-first processing approaches is essential.
Improving Resilience of ActivityPub Services
Completed (Part II) by Gediminas Lelešius in 2023
Simulating XMPP Group Communication
Completed (Part II) by Farhān Mannān in 2011