Planetary computing is our research into the systems required to handle the
ingestion, transformation, analysis and publication of global data products for
furthering environmental science and enabling better-informed policy-making. We
apply computer science to problem domains such as forest carbon and
biodiversity preservation (see Trusted Carbon Credits and Remote Sensing of Nature), and design solutions that can
scalably process geospatial data while building trust in the results via
traceability and reproducibility. Key problems include how to handle
continuously changing datasets that are often collected across decades and
require careful access and version control.
"Planetary computing" originated as a term back in 2020 when a merry band of us
from Computer Science (Srinivasan Keshav and me, later joined by Sadiq Jaffer, Patrick Ferris,
Michael Dales, and now the bigger EEG group) began working on Trusted Carbon Credits and implementing
the large-scale computing infrastructure required for processing remote sensing
data. Our early thoughts on how computer science could help were captured in
"How Computer Science Can Aid Forest Restoration", which laid out the vision for bringing
computational techniques to bear on forest restoration.
By 2024, we'd developed enough of a research programme to write up our approach
in "Planetary computing for data-driven environmental policy-making", which describes the systems architecture
we've been building. The core insight is that environmental science needs the
same level of computational rigour that we've brought to other domains, but with
unique challenges around data provenance, reproducibility, and scale.
The Programming for the Planet Community
Then in early 2024, Dominic Orchard and I decided to find others interested in the
problem domain, and organised the first "Programming for the Planet" (PROPL) workshop in London, co-located with
POPL 2024. This turned out to be a fully subscribed event, with extra chairs having to be brought in for some of the more popular
talks! It convinced us that there's genuine momentum and a real need for planetary
computing research as a distinct discipline.
The PROPL 2024 invitation poster
The second PROPL workshop in October 2025 was co-located with
ICFP/SPLASH in Singapore, and we were thrilled to have enough quality
submissions to publish proceedings in the ACM
Digital Library for the first time! The workshop covered everything from
climate model verification and GPU-accelerated hydrology to our own work on
declarative geospatial programming with Yirgacheffe: A Declarative Approach to Geospatial Data and the
vision for a FAIR computational commons in A FAIR Case for a Live Computational Commons.
The diversity of the community (spanning climate scientists, ecologists,
systems researchers, and programming language theorists) reinforces that we're
tackling problems that genuinely need this kind of cross-disciplinary
collaboration.
Core Systems Research
I'm working on various systems involved with the ingestion, processing,
analysis and publication of global geospatial data products. To break them
down:
Data Ingestion and Processing. Ingesting satellite data is a surprisingly
tricky process, usually involving lots of manual curation and trying not to
crash nasa.gov or the ESA websites with too many parallel requests. We're
working on systems that can ingest data from multiple sources while keeping
track of provenance, including satellite imagery (see Remote Sensing of Nature), ground-based
sensors (see Terracorder: Sense Long and Prosper), and citizen science data gathering. This
involves a lot of data cleaning and transformation as well as parallel and
clustered code. The challenge is similar to what we're tackling with Conservation Evidence Copilots for
literature scanning - how do you build trust in automatically processed data
pipelines at scale?
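As a flavour of what "keeping track of provenance" means in practice, here's a minimal sketch of a provenance-recording fetch step. The record format and helper function are hypothetical, invented for illustration rather than taken from our actual pipeline code:

```python
# Illustrative only: a minimal provenance-recording fetch step. The record
# format and helper name are hypothetical, not our actual ingestion code.
import hashlib
import json
import time
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

def fetch_with_provenance(url: str, dest: Path, delay_s: float = 1.0) -> dict:
    """Download one granule politely and record where it came from."""
    time.sleep(delay_s)  # rate-limit so we don't hammer the upstream archive
    data = urllib.request.urlopen(url).read()
    dest.write_bytes(data)
    record = {
        "source_url": url,
        "sha256": hashlib.sha256(data).hexdigest(),
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "bytes": len(data),
    }
    # A sidecar provenance record travels with the data through the pipeline.
    dest.with_name(dest.name + ".prov.json").write_text(json.dumps(record, indent=2))
    return record
```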
Our recent work on geospatial foundation models (see TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis) is opening up
new possibilities here, allowing us to work with rich embeddings rather than
raw satellite data. This "embedding-as-data" approach could democratise access
to advanced remote sensing analytics, though it creates new programming
challenges around how to work with these planetary-scale embeddings
effectively.
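To make the "embedding-as-data" idea concrete, here's a hedged sketch of what working with per-pixel embeddings can look like: a land-cover query becomes a similarity computation against a reference vector rather than raw-band processing. The file names, shapes and threshold are illustrative assumptions, not TESSERA's actual formats:

```python
# Sketch of "embedding-as-data": classify pixels by comparing per-pixel
# embeddings against a reference vector, instead of handling raw
# multi-band imagery. Shapes and thresholds are illustrative; the real
# TESSERA embeddings and tooling may differ.
import numpy as np

tile = np.load("embeddings_tile.npy")        # (H, W, D) per-pixel embeddings
reference = np.load("forest_reference.npy")  # (D,) embedding from known forest pixels

# Cosine similarity between every pixel embedding and the reference.
norms = np.linalg.norm(tile, axis=-1) * np.linalg.norm(reference)
similarity = (tile @ reference) / np.maximum(norms, 1e-12)

forest_mask = similarity > 0.8  # (H, W) boolean map from one dot product per pixel
```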
Developer Workflow and Reproducibility. Once data is available, we're
building a next-generation "Docker for geospatial" system that can package up
precisely versioned data, code and OS environment into a single container that
can be run anywhere. This is a key part of our reproducibility story, and is a
work-in-progress at quantifyearth/shark.
The core idea,
described in our Lineage first computing: towards a frugal userspace for Linux paper, is "lineage-first computing"; we
put the workflow graph containing relationships between tools, provenance and
labelling at the core of the system. By tracking how data pipelines evolve from
experimental practice and what data has already been built, we can avoid needless
re-execution both during development and after publication. This builds on
years of experience with unikernels and containers (see Functional Networking for Millions of Docker Desktops)
but adapts the model specifically for the frugal, reproducible computing that
environmental science demands.
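To illustrate the lineage-first idea, here's a toy memoisation scheme that keys each pipeline step on a hash of its code and inputs, and skips re-execution when that exact lineage has already been built. This is a sketch of the concept from the paper, not the shark implementation:

```python
# Toy illustration of lineage-first computing: key each pipeline step on a
# hash of its code and input artefacts, and reuse the output when the exact
# same lineage has been built before. A concept sketch, not shark itself.
import hashlib
import inspect
from pathlib import Path

CACHE = Path("lineage-cache")

def run_step(func, *input_paths: Path) -> Path:
    """Execute func(*inputs) -> bytes, memoised on code + input contents."""
    h = hashlib.sha256(inspect.getsource(func).encode())
    for p in input_paths:
        h.update(p.read_bytes())
    out = CACHE / h.hexdigest()
    if out.exists():           # same code, same inputs: reuse the prior result
        return out
    CACHE.mkdir(exist_ok=True)
    out.write_bytes(func(*input_paths))
    return out
```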
However, building trustworthy computational pipelines at planetary scale
introduces profound challenges around uncertainty propagation and
reproducibility. Our Uncertainty at scale: how CS hinders climate research work explores how computer science
assumptions can inadvertently hinder climate research - from non-determinism in
floating-point operations to subtle differences in library versions affecting
satellite data processing. These issues become critical when climate scientists
need to quantify uncertainty bounds on their models, yet standard CS tools
often obscure rather than illuminate sources of variability.
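The floating-point issue is easy to demonstrate: addition is not associative, so a parallel reduction that happens to sum in a different order can return a different answer for identical data. A minimal, self-contained illustration (not taken from the paper):

```python
# Floating-point addition is not associative, so summation order changes
# the result: exactly the kind of silent variability described above.
import math
import random

values = [random.uniform(-1e6, 1e6) for _ in range(100_000)]

forward = sum(values)             # left-to-right summation
backward = sum(reversed(values))  # same data, different order

print(forward == backward)        # frequently False
print(math.fsum(values))          # correctly-rounded sum, order-independent
```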
Frugality extends beyond reproducibility to carbon awareness. In
Carbon-aware Name Resolution, we explore how DNS name resolution could become
carbon-aware, treating emissions as a first-class metric for scheduling
decisions. By extending DNS with load balancing that considers carbon costs, we
can maintain compatibility with existing infrastructure while enabling
applications to minimise their environmental footprint - particularly important
for planetary-scale computations that may run repeatedly over decades.
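As a sketch of the idea (with invented intensity figures and documentation-range addresses, not the actual mechanism from the paper), a resolver could pick among candidate replicas according to the current carbon intensity of each region's grid:

```python
# Hypothetical sketch of carbon-aware resolution: given candidate replicas
# for a service name, prefer the region whose grid is currently cleanest.
# Intensity numbers and resolver integration are invented for illustration.
REPLICAS = {
    "198.51.100.1": "eu-north",   # candidate A records for service.example
    "203.0.113.7": "us-east",
}

# gCO2/kWh per region, e.g. polled periodically from a grid-intensity API.
CARBON_INTENSITY = {"eu-north": 45, "us-east": 390}

def resolve_carbon_aware(replicas: dict[str, str]) -> str:
    """Return the replica address in the lowest-carbon region right now."""
    return min(replicas, key=lambda ip: CARBON_INTENSITY[replicas[ip]])

print(resolve_carbon_aware(REPLICAS))  # -> "198.51.100.1"
```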
Specification Languages and Pipelines. We're also working on domain-specific languages for specifying geospatial data processing pipelines, which can be compiled down to efficient code that can run on our planetary computing infrastructure. Our Yirgacheffe: A Declarative Approach to Geospatial Data library for Python, developed by Michael Dales, Patrick Ferris and colleagues, allows spatial algorithms to be implemented concisely while automatically handling resources (cores, memory, GPUs) and supporting parallel execution. This avoids common errors and makes it possible for ecologists to write robust pipelines without being systems programming experts.
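To show the declarative style without claiming Yirgacheffe's exact API (which may differ between versions), here's a self-contained sketch of the underlying technique: raster expressions are built lazily and evaluated strip-by-strip, so memory use stays bounded however large the rasters are:

```python
# A hypothetical mini-Layer illustrating the declarative idea behind
# Yirgacheffe, not its actual API: expressions are composed lazily and
# only evaluated in horizontal strips, keeping memory use bounded.
import numpy as np

class Layer:
    def __init__(self, compute):
        self._compute = compute          # row-range -> 2D numpy block

    def __mul__(self, other):
        return Layer(lambda r: self._compute(r) * other._compute(r))

    def __gt__(self, threshold):
        return Layer(lambda r: self._compute(r) > threshold)

    def sum(self, height, chunk=256):
        """Evaluate the expression strip-by-strip and accumulate."""
        total = 0.0
        for y in range(0, height, chunk):
            total += self._compute(range(y, min(y + chunk, height))).sum()
        return total

# Area-of-habitat style query: elevation suitability masked by land cover,
# summed without ever materialising the full intermediate rasters.
elevation = Layer(lambda r: np.random.rand(len(r), 1024) * 3000)  # stand-in data
landcover = Layer(lambda r: np.random.rand(len(r), 1024) > 0.5)   # stand-in data

habitat = (elevation > 1500) * landcover
print(habitat.sum(height=4096))
```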
Ideally, these languages would also capture elements of the specification of the data at different levels of precision, so that we can swap out different data sources or processing steps without having to rewrite the entire pipeline or lose the intent of the domain expert who wrote the code. You can see an example of a manually written and extremely detailed pipeline in our PACT Tropical Moist Forest Accreditation Methodology v2.1 whitepaper - converting this to readable, maintainable code is a pretty big challenge! The vision laid out in A FAIR Case for a Live Computational Commons of notebooks that can reference each other as libraries in a planetary-scale computational commons is one direction we're exploring.
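One small, hypothetical sketch of the separation we're after: the pipeline body is written against logical dataset names, and a separate binding table maps those names to concrete products, so a source can be swapped without touching the expert's intent:

```python
# Hypothetical sketch of keeping intent separate from bindings: the pipeline
# names logical inputs, and a binding table maps them to concrete datasets,
# so a source can be swapped without rewriting the steps themselves.
from typing import Callable

def deforestation_risk(load: Callable[[str], object]) -> float:
    """The domain expert's intent, written against logical dataset names."""
    landcover = load("landcover")
    elevation = load("elevation")
    # ... the real analysis would go here; we only show the shape of it.
    return 0.0

# Two interchangeable bindings of logical names to concrete products.
BINDINGS_V1 = {"landcover": "esa-cci/2020", "elevation": "srtm/v3"}
BINDINGS_V2 = {"landcover": "esa-worldcover/2021", "elevation": "copernicus-dem"}

def loader_for(bindings: dict[str, str]) -> Callable[[str], object]:
    return lambda name: f"open({bindings[name]})"   # stand-in for real I/O

risk_v1 = deforestation_risk(loader_for(BINDINGS_V1))
risk_v2 = deforestation_risk(loader_for(BINDINGS_V2))
```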
Looking Forward
There's a lot more to say about ongoing projects, but the overall message is:
if you're interested in contributing to some part of the planetary computing
ecosystem, either as a collaborator or a student, get in touch! The community
we've built through PROPL and related work (see also Nine changes needed to deliver a radical transformation in biodiversity measurement
for our broader recommendations) shows
there's real momentum behind making computational environmental science more
rigorous, reproducible, and accessible.
Related Reading
Cyrus Omar and his team behind the Hazel language have also been working on a
similar problem domain, and we're looking forward to collaborating with them.
Read A FAIR Case for a Live Computational Commons here or watch their PROPL 2024 talk.
I've also given several talks on planetary computing, including a keynote at ICFP 2023 and a more recent one at LambdaDays; both are linked below.