Summary. Planetary computing is our research into the systems required to handle the ingestion, transformation, analysis and publication of global data products for furthering environmental science and enabling better informed policy-making. We apply computer science to problem domains such as forest carbon and biodiversity preservation, and design solutions that can scalably process geospatial data while building trust in the results through traceability and reproducibility. Key problems include how to handle continuously changing datasets that are often collected across decades and require careful access and version control.
"Planetary computing" originated as a term back in 2020 when a merry band of us from Computer Science (Srinivasan Keshav and me, later joined by Sadiq Jaffer, Patrick Ferris, Michael Dales and bigger EEG group now) began working on Trusted Carbon Credits and implementing the large-scale computing infrastructure required for processing remote sensing data. We wrote up our thoughts in Planetary computing for data-driven environmental policy-making.
Background. Then in early 2024, Dominic Orchard and I decided to find others interested in the problem domain, and organised the first "Programming for the Planet" (PROPL) workshop in London, co-located with POPL 2024. It turned out to be a fully subscribed event, with extra chairs having to be brought in at one point for some of the more popular talks! It convinced us that there is genuine momentum behind, and a real need for, planetary computing research as a distinct discipline.
Projects. I'm working on various systems involved with the ingestion, processing, analysis and publication of global geospatial data products. To break them down:
Data Ingestion. Ingesting satellite data is a surprisingly tricky process, usually involving lots of manual curation and trying not to crash nasa.gov or the ESA websites with too many parallel requests. We're working on a system that can ingest data from multiple sources while keeping track of provenance, including satellite imagery (see Remote Sensing of Nature), ground-based sensors (see Terracorder: Sense Long and Prosper), and citizen science data gathering. This involves a lot of data cleaning and transformation as well as parallel and clustered code, and we're investigating how to make this process more efficient and scalable.
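To make the provenance-tracking idea a little more concrete, here is a minimal OCaml sketch of the sort of metadata record that could be attached to every ingested file. The field names and the `record_provenance` helper are illustrative assumptions rather than our actual ingestion code.

```ocaml
(* Illustrative sketch only: a provenance record attached to each ingested
   file, so downstream results can be traced back to the exact bytes and
   code that produced them. Requires the unix library for the timestamp. *)
type provenance = {
  source_url : string;        (* where the file was fetched from *)
  retrieved_at : float;       (* Unix timestamp of the download *)
  checksum : string;          (* content hash for later verification *)
  pipeline_commit : string;   (* git commit of the ingestion code *)
}

(* Digest gives MD5 in the OCaml stdlib; a production system would use a
   stronger hash, but the shape of the record is the point here. *)
let record_provenance ~source_url ~pipeline_commit path = {
  source_url;
  retrieved_at = Unix.gettimeofday ();
  checksum = Digest.to_hex (Digest.file path);
  pipeline_commit;
}
```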
Developer Workflow. Once data is available, we are also building a next-generation "Docker for geospatial" system that can package up precisely versioned data, code and OS environment into a single container that can be run anywhere. This is a key part of our reproducibility story, and is a work-in-progress at quantifyearth/shark.
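As a rough illustration of what "precisely versioned data, code and OS environment in a single unit" means, here is a hypothetical manifest type in OCaml. It is not shark's actual format, just a sketch of the idea that a run's identity is derived from everything it depends on.

```ocaml
(* Hypothetical manifest (not shark's real format): everything a run
   depends on is pinned by an immutable identifier. *)
type manifest = {
  base_image : string;              (* OS environment, pinned by digest *)
  code_commit : string;             (* git commit of the analysis code *)
  inputs : (string * string) list;  (* dataset path -> content hash *)
  command : string list;            (* entrypoint that reproduces the run *)
}

(* The identity of a run is a hash of its manifest, so two runs with
   identical inputs, code and environment are recognisably the same.
   Marshal keeps the sketch dependency-free; a real system would hash a
   canonical serialisation instead. *)
let run_id (m : manifest) =
  Digest.to_hex (Digest.string (Marshal.to_string m []))
```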
Specification Languages. We're also working on a domain-specific language for specifying geospatial data processing pipelines, which can be compiled down to efficient code that runs on our planetary computing infrastructure. Ideally, this language would also capture elements of the data specification at different levels of precision, so that we can swap out data sources or processing steps without rewriting the entire pipeline or altering the intent expressed by the domain expert who wrote it. You can see an example of a manually written and extremely detailed pipeline in our PACT Tropical Moist Forest Accreditation Methodology v2.1 whitepaper -- converting this to readable code is a pretty big challenge!
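To illustrate the separation between intent and execution that such a language is after, here is a small, purely hypothetical OCaml sketch in which a pipeline is just data. The step constructors and dataset names are made up; the point is that an interpreter can substitute sources or backends without touching the description itself.

```ocaml
(* Purely illustrative: a pipeline described as data, independent of how
   (or where) it is executed. Dataset names are placeholders. *)
type step =
  | Load of string                  (* a named, versioned dataset *)
  | Reproject of string             (* target CRS, e.g. "EPSG:4326" *)
  | Mask of string                  (* mask by another named layer *)
  | Zonal_stat of [ `Mean | `Sum ]  (* reduce over project polygons *)

type pipeline = { intent : string; steps : step list }

(* Because the pipeline is a value, swapping a data source or running the
   same steps on a different backend does not change the written intent. *)
let forest_loss =
  { intent = "Annual forest loss per project polygon";
    steps = [ Load "tmf_deforestation_2023";
              Reproject "EPSG:4326";
              Mask "project_boundaries";
              Zonal_stat `Sum ] }
```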
There's a lot more to say about ongoing projects, but the overall message is: if you're interested in contributing to some part of the planetary computing ecosystem, either as a collaborator or a student: get in touch!
Cyrus Omar and his team behind the Hazel language have also been working on a similar problem domain, and we're looking forward to collaborating with them. Read Toward a Live, Rich, Composable, and Collaborative Planetary Compute Engine here or watch their PROPL 2024 talk:
I've also given several talks on planetary computing, including a keynote at ICFP 2023 and one at LambdaDays. Both are linked below; the latter is the more recent.
[»] Emission Impossible: privacy-preserving carbon emissions claims
[»] Cooperative Sensor Networks for Long-Term Biodiversity Monitoring
[»] Lineage first computing: towards a frugal userspace for Linux |
[»] Modularizing Reasoning about AI Capabilities via Abstract Dijkstra Monads |
[»] Planetary computing for data-driven environmental policy-making |
[»] Uncertainty at scale: how CS hinders climate research |
[»] Homogeneous Builds with OBuilder and OCaml |