Anil Madhavapeddy, Professor of Planetary Computing

Autoscaling geospatial computation with Python and Yirgacheffe

This is an idea proposed in 2025 as a good starter project, and is available to be worked on. It may be co-supervised with Michael Dales.

Python is a popular tool for geospatial data science, but it, along with the GDAL library, handles resource management poorly. Python does not deal with parallelism well, and GDAL can be a memory hog when parallelised. Geospatial workloads, which operate on global maps at metre-level resolution, can easily exceed the resources available on a given host when run using conventional schedulers.

To that end, we've been building Yirgacheffe, a geospatial library for Python that attempts both to hide the tedious parts of geospatial work (aligning different data sources, for instance) and to tackle the resource management issues, so that ecologists don't also have to become computer scientists to scale their work. Yirgacheffe can:

Yirgacheffe has been deployed in multiple geospatial pipelines, underpinning work like Mapping LIFE on Earth, as well as an implementation of the IUCN STAR metric, and a methodology for assessing tropical forest interventions.

The summer project

Whilst Yirgacheffe solves some of the resource management problems involved in geospatial coding, it does so conservatively and statically. It does not yet assess the state of the host on which it is running: how much memory is free, and how many CPU cores? How much memory is each thread using? How should it react if someone else fires up a big job on the same machine?
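To make those questions concrete, here is a minimal sketch of how a process might probe its host using only the Python standard library. It is not part of Yirgacheffe: the function names `parse_meminfo` and `host_snapshot` are hypothetical, and the approach assumes a Linux host that exposes `/proc/meminfo` (on other systems the memory figure is simply reported as unknown).

```python
import os
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into a mapping of field name to bytes."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue
        value = int(parts[0])
        # Most fields carry a "kB" unit; fields without one are plain counts.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        fields[name.strip()] = value
    return fields


def host_snapshot(meminfo_path: str = "/proc/meminfo") -> dict:
    """Return a rough picture of what is free on this host right now."""
    snapshot = {"cpu_count": os.cpu_count() or 1}
    try:
        # The 1-minute load average approximates how many cores are busy.
        snapshot["load_1min"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        snapshot["load_1min"] = None
    path = Path(meminfo_path)
    if path.exists():
        # MemAvailable is the kernel's estimate of memory usable without swapping.
        info = parse_meminfo(path.read_text())
        snapshot["mem_available_bytes"] = info.get("MemAvailable")
    else:
        snapshot["mem_available_bytes"] = None
    return snapshot
```

A snapshot like this is only valid for an instant, so a scheduler would need to re-sample it periodically to notice when another user starts a large job.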

If it gets this wrong by overcommitting resources, then the dreaded Linux OOM killer can (at best) take down your job or (at worst) take down the entire system, including other users' work. We therefore want Yirgacheffe to be cleverer about scaling up resource usage on a large host without compromising overall system stability.
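One simple way to picture the trade-off is a worker-count calculation that is capped by both free memory and idle cores, with some memory held back as a safety margin. The sketch below is illustrative only: `choose_worker_count`, the 20% headroom default, and the idea that a per-worker memory estimate is known in advance are all assumptions, not how Yirgacheffe currently behaves.

```python
import math


def choose_worker_count(
    mem_available_bytes: int,
    mem_per_worker_bytes: int,
    cpu_count: int,
    load_1min: float = 0.0,
    headroom: float = 0.2,
) -> int:
    """Pick how many workers to run without overcommitting the host.

    Returns 0 when not even one worker fits, meaning "wait and re-check".
    """
    if mem_per_worker_bytes <= 0:
        raise ValueError("mem_per_worker_bytes must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    # Hold back a fraction of free memory so a misestimate does not
    # immediately push the host towards the OOM killer.
    usable = mem_available_bytes * (1.0 - headroom)
    by_memory = int(usable // mem_per_worker_bytes)
    # Treat cores already accounted for by the load average as taken.
    by_cpu = cpu_count - math.ceil(load_1min)
    return max(0, min(by_memory, by_cpu))


GiB = 1024**3
# 64 GiB free, 4 GiB per worker, 32 cores with a load of about 2.5:
# memory is the limiting factor here.
print(choose_worker_count(64 * GiB, 4 * GiB, 32, load_1min=2.5))  # prints 12
```

A static calculation like this is still conservative; the interesting part of the project is revisiting the answer as the host's state changes while the job runs.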

In this project we'd like to:

This would be a good summer project for a student interested in both operating systems and scientific computing who wants to help enable real sustainability and environmental research.

For background reading:

You can also watch a (slightly tangential but on the same topic of geospatial processing) talk from Michael Dales at LOCO24.

# 1st Apr 2025   ideas biodiversity idea-available idea-beginner python spatial systems urop
