Programming for the Planet at ICFP/SPLASH 2025 / Oct 2025
This is part 1 of 5 in the ICFP25 series; see also: chairing PROPL25, the OxCaml tutorial, multicore at Jane Street and Docker, post-POSIX IO, and what I learnt.
The first outing of PROPL was last year in London, and this time around we gathered in Singapore as part of ICFP/SPLASH 2025.

The workshop itself had slightly lower in-person attendance than last year, but this was due to the heavily multi-track structure of ICFP. We also had less total time than last year, as the morning slot was taken up by the ICFP keynote.
The papers were exactly what I'd dreamed would happen -- a variety of practitioners describing their computational challenges mixed together with solutions. I'll summarise the day's proceedings next!

Computational challenges
The first batch of talks I'll cover were about how to specify some core computational models related to climate and biodiversity science.
First up was climate modelling, with Chinmayi Prabhu presenting a paper on climate model coupler verification that discussed the difficulty of folding multiple global climate models into combined ones, something that is normally done via (underspecified and somewhat black magic) coupler components. Chinmayi had found some bugs in production coupler code, and described a hybrid verification strategy that used both static and runtime techniques to improve the state of affairs.
The continuous exchange of data through couplers creates the risk of subtle errors propagating across components, potentially distorting scientific conclusions. In this paper, we argue for lightweight formal verification techniques applied at the coupler interface to improve both coupler and model correctness. -- Towards Modelling and Verification of Coupler Behaviour in Climate Models
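A lightweight runtime check at the coupler interface can catch exactly the kind of conservation bug this verification strategy targets. Here is a minimal sketch in Python (hypothetical function and field names, not the paper's actual implementation) asserting that a conservative regridding step preserves the area-integrated flux exchanged between two model grids:

```python
import numpy as np

def check_conservative_regrid(src_flux, src_area, dst_flux, dst_area, rtol=1e-6):
    """Runtime invariant for a coupler exchange: a conservative regridding
    scheme must preserve the area-integrated flux between source and
    destination grids."""
    total_src = np.sum(src_flux * src_area)
    total_dst = np.sum(dst_flux * dst_area)
    if not np.isclose(total_src, total_dst, rtol=rtol):
        raise AssertionError(
            f"coupler flux not conserved: {total_src:.6e} vs {total_dst:.6e}"
        )

# Toy example: two fine-grid cells of equal area merged into one coarse cell.
src = np.array([1.0, 3.0]); src_a = np.array([0.5, 0.5])
dst = np.array([2.0]);      dst_a = np.array([1.0])
check_conservative_regrid(src, src_a, dst, dst_a)  # passes: both integrate to 2.0
```

Checks like this are cheap enough to leave enabled in production coupler runs, complementing the static side of the hybrid strategy.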

Then we heard about hydrology modelling using GPUs over in India, where algorithms to trace the path of surface water flows (e.g. flow accumulation, watershed delineation or runoff simulation) are hard to execute for large areas at reasonably fine spatial and temporal resolutions.
Libraries like GDAL that use multi-threaded CPU-based implementations running on a single host may be slow, and distributed infrastructures like Google Earth Engine may not support the kind of computational primitives required by these algorithms.
We have developed a GPU-accelerated framework that re-engineers these four algorithms and is able to process areas as large as river basins of 250,000 km2 on commodity GPU workstations. -- GPU-Accelerated Hydrology Algorithms for On-Prem Computation
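To make the computational shape concrete, here is a single-threaded reference sketch of D8 flow accumulation in Python (illustrative only; it is this dependency structure that the paper's GPU framework re-engineers for parallel execution):

```python
import numpy as np

def flow_accumulation(dem):
    """Reference D8 flow accumulation: each cell drains to its
    steepest-descent neighbour, carrying its accumulated upstream area.
    Cells are visited from highest to lowest elevation so that upstream
    totals are final before they are passed downstream."""
    rows, cols = dem.shape
    acc = np.ones_like(dem, dtype=np.int64)  # each cell contributes itself
    order = sorted(np.ndindex(rows, cols), key=lambda rc: -dem[rc])
    for r, c in order:
        best, target = 0.0, None
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) != (0, 0) and 0 <= nr < rows and 0 <= nc < cols:
                    drop = dem[r, c] - dem[nr, nc]
                    if drop > best:
                        best, target = drop, (nr, nc)
        if target is not None:
            acc[target] += acc[r, c]
    return acc

dem = np.array([[3.0, 2.0, 1.0]])  # water flows left to right
print(flow_accumulation(dem))      # → [[1 2 3]]
```

The sequential sort-then-sweep here is precisely what becomes the bottleneck at river-basin scale, motivating the GPU reformulation.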

Continuing the theme of novel computation models, Michael Dales presented Yirgacheffe, a declarative approach to geospatial data; Michael used our work on the LIFE extinction-risk metric as a driving example.

Geospatial data management
Switching tack from computation to managing large-scale datasets, we had a number of papers discussing how to orchestrate these both for full execution and in a developer-friendly way for local use.
The first extremely ambitious talk was about BON-in-a-Box.

Jean-Michel described how off-the-shelf software could (almost) be enough to integrate the world's biodiversity dataset pipelines, but needed some help from their maintainers. Most notably, BON-in-a-box facilitates peer review of computation pipelines (as opposed to the science underpinning them), which is the first time I've seen peer review applied to scientific code. This "connecting the dots" across diverse biodiversity datasets is vital towards building a comprehensive model of life on this planet, and computer science is a crucial piece of the puzzle to make sense of all the data.
We propose STACD (STAC extension with DAGs), an extension to STAC specifications that incorporates Directed Acyclic Graph (DAG) representations along with defining algorithms and version changes in the workflows. We also provide a reference implementation on Apache Airflow to demonstrate STACD capabilities such as selective recomputation when some datasets or algorithms in a DAG are updated, complete lineage construction for a dataset, and opportunities for improved collaboration and distributed processing that arise with this standard. -- STAC Extension with DAGs
This one really reminded me of the work I did ages ago with
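The selective-recomputation idea in STACD is essentially reachability over the DAG: an update to one dataset or algorithm only dirties its descendants, while everything else can be served from cache. A hedged sketch of that core operation (hypothetical node names; the actual reference implementation sits on Apache Airflow):

```python
def downstream(dag, changed):
    """Return every node reachable from `changed` in a DAG given as an
    adjacency dict -- the set that must be recomputed after an update;
    all other nodes keep their cached outputs."""
    dirty, stack = set(), [changed]
    while stack:
        node = stack.pop()
        for child in dag.get(node, []):
            if child not in dirty:
                dirty.add(child)
                stack.append(child)
    return dirty

# Toy pipeline: raw imagery feeds one branch, the DEM feeds another.
dag = {
    "sentinel2": ["cloud_mask"],
    "dem": ["slope"],
    "cloud_mask": ["ndvi_composite"],
    "slope": ["erosion_risk"],
}
print(sorted(downstream(dag, "dem")))  # → ['erosion_risk', 'slope']
```

Updating the DEM leaves the Sentinel-2 branch untouched, which is exactly the lineage-aware caching the STACD extension standardises.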
The third talk was from the Catalysts Foundation in India, and highlighted the importance of data science for promoting health and wellbeing in some of the most vulnerable rural communities, who face risks of increasing intensity due to climate change.
Climate change presents multifaceted public health challenges, from heat-related mortality and vector-borne disease expansion to water contamination and respiratory ailments. The 2022 Lancet Countdown Report demonstrates a host of health effects of climate change ranging from heat-related illness and mortality to the spread of vector-borne and water-borne pathogens, to rising food insecurity as cropping patterns change. Current public health systems lack integrated, real-time data capabilities to identify vulnerable populations and coordinate timely responses to these climate-induced health threats, particularly in resource-constrained settings. -- Precision Action Towards Climate and Health (PATCH)

Prerak talked about the difficulty of combining geospatial data with machine learning inference, and keeping track of the resulting outputs in a systematic way. What I found particularly interesting about their "PATCH" system is that it not only has core computing facilities (a health reporting platform, a spatial counterfactual map for interventions and a communications channel for different stakeholders), but they also extensively partner with local state governments in India (like the India Meteorological Department and the All India Institutes of Medical Sciences).

All these talks highlighted the difficulty of managing large and often very messy datasets in practice. So how do we move towards more principled platforms to fix the situation? That's what the next group of talks covered!
Towards a giant planetary wiki of code
By far my favourite aspect of PROPL was the sheer ambition on display when it comes to leveraging the network effects around computer technology to accelerate the pace of environmental action. The next batch of papers is all about evolving notebooks to global scale!
This paper proposes Fairground, a computational commons designed as a collaborative notebook system where thousands of scientific artifacts are authored, collected, and maintained together in executable form in a manner that is FAIR, reproducible, and live by default. Unlike existing platforms, Fairground notebooks can reference each other as libraries, forming a single planetary-scale live program executed by a distributed scheduler. -- A FAIR Case for a Live Computational Commons, Omar et al 2025
Many of the answers to how to do this lay in the talks at this week's ICFP/SPLASH: programming languages with clean semantics for incremental compilation, purely functional with effect tracking, and mergeable semantics for managing large scale data structures. Cyrus' own Hazel language is a perfect example of an ergonomic interactive language that has clean, functional semantics while retaining usability.

With traditional print media, the figures, text and other content are disconnected from the underlying data, making them hard to understand, evaluate and trust. Digital media, such as online papers and articles, present an opportunity to make visual artifacts which are connected to data and able to reveal those fine-grained relationships to an interested user. This would enable research outputs, news articles and other data-driven artifacts to be more transparent, self-explanatory and explorable. -- fluid, explorable, self-explanatory research outputs
Roly showed in his talk how the latest advances in Fluid helped to automate providing "drill down" explanations for topics in the energy transition and decarbonisation, adaptation to climate change, or risk mitigation strategies that required policy changes justified by data.

The Fluid website has loads of interactive examples for you to explore, so continue on there if interested in this topic.
Current scientific computing practices pose major barriers to entry, particularly for interdisciplinary researchers and those in low and middle-income countries (LMICs). Challenges include steep learning curves, limited access to expert support, and difficulties with legacy or under-documented software. Drawing on real-world experiences, we identify recurring obstacles in the usability, accessibility, and sustainability of scientific software. -- Bridging Disciplinary Gaps in Climate Research, 2025
I greatly enjoyed this paper's framing of "reimagining scientific programming as a shared public good", a point also made by Roberto Di Cosmo recently in his Nature comment that we must stop treating code like an afterthought, and instead record, share and value it.
Onto Programming the Planet!
Once we have all these planetary scale notebooks, what sorts of new programs might we run on them? The last group of talks covered some radically different ideas here, and I was involved with all of them!
First up was Sadiq Jaffer, on building a usable library for planetary-scale embeddings:
Remote sensing observations from satellites are critical for scientists to understand how our world is changing in the face of climate change, biodiversity loss, and desertification. However, working directly with this data is difficult. For any given satellite constellation, there are a multitude of processed products, data volume is considerable, and for optical imagery, users must contend with data sparsity due to cloud cover. This complexity creates a significant barrier for domain experts who are not specialists.
Pre-trained, self-supervised foundation models such as TESSERA aim to solve this by offering pre-computed global embeddings. These rich embeddings can be used in-place of raw remote sensing data in a powerful “embedding-as-data” approach. For example, a single 128-dimensional TESSERA embedding for a 10-meter point on Earth can substitute for an entire year of optical and radar imagery, representing its temporal and spectral characteristics. While this could democratise access to advanced remote sensing-derived analytics, it also creates a new programming challenge: a lack of tools designed for this new approach.
-- Building a Usable Library for Planetary-Scale Embeddings, Jaffer 2025
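The "embedding-as-data" approach means downstream analysis collapses to lightweight models over 128-dimensional vectors rather than terabytes of imagery. Here is a sketch of what that workflow looks like, using synthetic stand-ins for TESSERA embeddings (this is not the real library's API, just an illustration of the pattern):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for per-pixel embeddings: 128-d vectors that already summarise
# a year of optical + radar imagery (synthetic data here, not real TESSERA
# output; class separation is exaggerated for clarity).
labelled = {
    "forest": rng.normal(0.0, 0.1, (50, 128)),
    "water":  rng.normal(1.0, 0.1, (50, 128)),
}
centroids = {cls: v.mean(axis=0) for cls, v in labelled.items()}

def classify(embedding):
    """Nearest-centroid land-cover classification directly in embedding
    space -- no raw imagery or cloud masking needed."""
    return min(centroids, key=lambda c: np.linalg.norm(embedding - centroids[c]))

print(classify(rng.normal(1.0, 0.1, 128)))  # → water
```

Even this trivial classifier illustrates why pre-computed embeddings lower the barrier for domain experts: the remote-sensing heavy lifting has already happened upstream.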

Sadiq did a live demo of the GeoTessera Python library I've been working on for accessing these embeddings.
After this, we heard about using airborne data to map urban nature access across England:
Using high-resolution LiDAR (Vegetation Object Model), Sentinel 2 imagery, and open geospatial datasets for over 28 million buildings across England, we integrate raster, vector, and socioeconomic data within a scalable computational framework. Tree segmentation was performed using adaptive local-maximum filtering, canopy cover estimated at 1 m resolution, and park accessibility derived from network-based walking distances.
Inequality in access to nature was quantified via Gini coefficients and modelled with spatial error regressions against socioeconomic deprivation. Our results reveal that while most urban areas meet the 3-tree proximity rule, fewer than 3% achieve 30% canopy cover, and only a minority satisfy all three components simultaneously. -- Airborne assessment uncovers socioeconomic stratification of urban nature in England

You can read more about this in his arXiv preprint.
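The Gini computation over access metrics is simple enough to sketch directly. Here is a minimal version with toy numbers (not the paper's data), using the closed form over sorted values:

```python
import numpy as np

def gini(values):
    """Gini coefficient of a distribution, via the closed form over
    ascending-sorted values: G = sum_i (2i - n - 1) x_i / (n * sum x).
    0 means perfect equality; values near 1 mean one unit holds nearly all."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    return (2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum())

print(round(gini([10, 10, 10, 10]), 3))  # → 0.0  (perfectly equal access)
print(round(gini([0, 0, 0, 40]), 3))     # → 0.75 (one ward has all the canopy)
```

Applied per-neighbourhood over canopy cover or park accessibility, this is the inequality measure the spatial regressions then relate to deprivation.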
And last but definitely not least, we heard about an architecture for spatial networking.

It was also wonderful to see
Reflections on the 2nd PROPL
As always, the corridor track of discussions after the conference was the most valuable part of attending PROPL. We had the opportunity to put up some posters during the main banquet session, and it was busy!
One thing that leapt out at me from the discussions was the need for a hosted service with the ergonomics of Docker, the interactive flexibility of Jupyter, the peer community of Wikipedia, and the semantic cleanliness of Hazel. This is overwhelmingly difficult to do in a top-down manner, but we now have a growing community of practitioners and computer scientists who share a vision of making this happen. I had long conversations with most of the attendees of PROPL about how we might make this a reality, and my plan is to spend a good chunk of my sabbatical time this year hacking on this.
The other really energizing thing was seeing all the side hacking going on around the workshop.
All of these strands could weave together powerfully into ground-up systems that solve the problems we've been defining at PROPL! I'm feeling energized and tired, and look forward to turning the discussions started in London (2024) and Singapore (2025) into real systems in 2026!


References
- Eyres et al (2025). LIFE: A metric for mapping the impact of land-cover change on global extinctions. 10.1098/rstb.2023.0327
- Madhavapeddy (2025). What I learnt at the National Academy of Sciences US-UK Forum on Biodiversity. 10.59350/j6zkp-n7t82
- Dales et al (2025). Yirgacheffe: A Declarative Approach to Geospatial Data. Association for Computing Machinery. 10.1145/3759536.3763806
- Madhavapeddy (2025). Holding an OxCaml tutorial at ICFP/SPLASH 2025. 10.59350/55bc5-x4p75
- Madhavapeddy (2025). What I learnt at ICFP/SPLASH 2025 about OCaml, Hazel and FP. 10.59350/w1jvt-8qc58
- Millar et al (2025). An Architecture for Spatial Networking. arXiv. 10.48550/arXiv.2507.22687
- Madhavapeddy et al (2025). Proceedings of the 2nd ACM SIGPLAN International Workshop on Programming for the Planet. 10.1145/3759536
- Madhavapeddy (2025). It's time to go post-POSIX at ICFP/SPLASH 2025. 10.59350/mch1m-8a030
- Madhavapeddy (2025). A Roundup of ICFP/SPLASH 2025 happenings. 10.59350/4jf5k-01n91
- Madhavapeddy (2025). Programming for the Planet at ICFP/SPLASH 2025. 10.59350/hasmq-vj807
- Omar et al (2025). A FAIR Case for a Live Computational Commons. Association for Computing Machinery. 10.1145/3759536.3763802
- Madhavapeddy (2025). 2nd Programming for the Planet workshop CFP out. 10.59350/728q9-5ct54
- Madhavapeddy (2025). GeoTessera Python library released for geospatial embeddings. 10.59350/7hy6m-1rq76
- Madhavapeddy (2025). Jane Street and Docker on moving to OCaml 5 at ICFP/SPLASH 2025. 10.59350/3jkaq-d3398
- Zuniga-Gonzalez et al (2025). Airborne assessment uncovers socioeconomic stratification of urban nature in England. arXiv. 10.48550/arXiv.2510.13861
- Romanello et al (2022). The 2022 report of the Lancet Countdown on health and climate change: health at the mercy of fossil fuels. The Lancet. 10.1016/S0140-6736(22)01540-9
- Baramashetru et al (2025). Towards Modelling and Verification of Coupler Behaviour in Climate Models. 10.1145/3759536.3763801
- Kumar et al (2025). GPU-Accelerated Hydrology Algorithms for On-Prem Computation: Flow Accumulation, Drainage Lines, Watershed Delineation, Runoff Simulation. 10.1145/3759536.3763805
- Laud et al (2025). STACD: STAC Extension with DAGs for Geospatial Data and Algorithm Management. 10.1145/3759536.3763803
- Urlea et al (2025). Bridging Disciplinary Gaps in Climate Research through Programming Accessibility and Interdisciplinary Collaboration. 10.1145/3759536.3763804
- Shaw (2020). Myths and mythconceptions: what does it mean to be a programming language, anyhow?. Proceedings of the ACM on Programming Languages. 10.1145/3480947
- Di Cosmo et al (2025). Stop treating code like an afterthought: record, share and value it. Nature. 10.1038/d41586-025-03196-0