Effective geospatial code in OCaml

This is an idea proposed in 2024 as a Cambridge Computer Science Part II project, and is currently being worked on by George Pool. It is supervised by Michael Dales, Patrick Ferris and Anil Madhavapeddy as part of my Planetary Computing project.

Summary

Geospatial data processing is a critical component of many scientific and engineering workflows, from environmental monitoring to urban planning. However, writing geospatial code that scales to multiple cores and makes the best use of available memory can be challenging due to the scale of the data involved. To deal with this, we have been developing some domain-specific tools that take on this work for the programmer.

Yirgacheffe is a wrapper around the GDAL library that provides high-level Python APIs. It takes care of figuring out whether datasets overlap and whether vector layers need to be rasterised, and it manages memory efficiently for large layers. There is only one problem: we would like to write similar code, but in a high-level functional language rather than an imperative one!
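To make the "do these datasets overlap" step concrete, here is a minimal sketch in OCaml of intersecting the map-space extents of two layers. The `area` record and the `intersection` function are hypothetical names chosen for illustration; they are not taken from Yirgacheffe or GDAL, and the sketch assumes north-up layers that share a projection.

```ocaml
(* Hypothetical type for illustration; not part of Yirgacheffe or GDAL.
   Extents are in map units, for north-up layers (top > bottom). *)
type area = { left : float; top : float; right : float; bottom : float }

(* Intersection of two extents, or None when the layers do not overlap.
   Layers that only touch along an edge count as not overlapping. *)
let intersection a b =
  let left = Float.max a.left b.left
  and right = Float.min a.right b.right
  and top = Float.min a.top b.top
  and bottom = Float.max a.bottom b.bottom in
  if left < right && bottom < top then Some { left; top; right; bottom }
  else None

let () =
  let a = { left = 0.0; top = 10.0; right = 10.0; bottom = 0.0 }
  and b = { left = 5.0; top = 15.0; right = 20.0; bottom = 5.0 } in
  match intersection a b with
  | Some i ->
    Printf.printf "overlap: left=%g top=%g right=%g bottom=%g\n"
      i.left i.top i.right i.bottom
  | None -> print_endline "no overlap"
```

A library along these lines would presumably compute this once per operation, so that user code can combine layers of different extents without doing the window arithmetic by hand.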

OCaml has recently gained support for multicore parallelism, and is also one of the first mainstream languages with support for effect handlers. This project will involve writing a library in OCaml that provides similar functionality to Yirgacheffe, but with a focus on high-level functional programming. This will involve interfacing with the GDAL library, and also writing some high-level abstractions for geospatial data processing. As an alternative to depending on GDAL, you may also choose to contribute to the emerging GeoCaml ecosystem which Patrick Ferris created.
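As a rough illustration of what multicore OCaml offers here, the sketch below sums the pixels of a raster by splitting it into bands and reducing each band on its own domain. It assumes OCaml 5 and uses only the standard library's `Domain.spawn` and `Domain.join`; the `parallel_sum` function and the flat `float array` raster representation are assumptions made for the example, not part of any existing library.

```ocaml
(* Requires OCaml 5. A raster is modelled as a flat float array (an
   assumption for this sketch). [domains] must be at least 1. *)
let parallel_sum ~domains (pixels : float array) =
  let n = Array.length pixels in
  (* Band size, rounded up so that every pixel falls in some band. *)
  let band = (n + domains - 1) / domains in
  let sum_band i =
    let lo = i * band in
    let hi = min n (lo + band) in
    let acc = ref 0.0 in
    for j = lo to hi - 1 do
      acc := !acc +. pixels.(j)
    done;
    !acc
  in
  (* Spawn one domain per band, then join them and combine the results. *)
  List.init domains (fun i -> Domain.spawn (fun () -> sum_band i))
  |> List.map Domain.join
  |> List.fold_left ( +. ) 0.0

let () =
  let pixels = Array.make 1_000 1.0 in
  Printf.printf "sum = %g\n" (parallel_sum ~domains:4 pixels)
```

Spawning raw domains like this is the lowest-level option; a real library would more likely hide the band splitting and the choice of domain count behind its own interface.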

A successful project will demonstrate a direct-style, readable interface to geospatial code, with the scheduling of parallel operations and memory management delegated to a separate library written in OCaml which can be customised to the local computing environment (e.g. a large local multicore machine, or a cloud computing cluster).
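One way this separation could look, sketched with OCaml 5 effect handlers: the user code below asks for raster chunks in direct style by performing an effect, and a handler installed elsewhere decides how each request is served. The `Read_chunk` effect, `total`, and `run_with_in_memory_source` are all hypothetical names for illustration. The handler here simply reads from an in-memory array, where a real one might schedule reads across domains or bound memory use.

```ocaml
(* Requires OCaml 5. All names below are hypothetical. *)
open Effect
open Effect.Deep

(* A request for the i-th chunk of a raster. *)
type _ Effect.t += Read_chunk : int -> float array Effect.t

(* Direct-style user code: no callbacks, no knowledge of how chunks
   are fetched or scheduled. *)
let total chunks =
  let sum = ref 0.0 in
  for i = 0 to chunks - 1 do
    let data = perform (Read_chunk i) in
    Array.iter (fun v -> sum := !sum +. v) data
  done;
  !sum

(* A trivial handler that serves chunks from memory. A different handler
   could be swapped in without changing [total]. *)
let run_with_in_memory_source (source : float array array) f =
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Read_chunk i ->
          Some (fun (k : (a, _) continuation) -> continue k source.(i))
        | _ -> None) }

let () =
  let source = [| [| 1.0; 2.0 |]; [| 3.0 |] |] in
  let result = run_with_in_memory_source source (fun () -> total 2) in
  Printf.printf "total = %g\n" result
```

The design point is that `total` stays the same whether it runs on a laptop or a cluster; only the handler wrapped around it changes.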
