Effective geospatial code in OCaml

This is an idea proposed in 2024 as a Cambridge Computer Science Part II project, and is currently being worked on by George Pool. It is co-supervised with Michael Dales and Patrick Ferris.

Geospatial data processing is a critical component of many scientific and engineering workflows, from environmental monitoring to urban planning. However, writing geospatial code that scales to multiple cores and makes the best use of available memory can be challenging because of the scale of the data involved. To deal with this, we have been developing some domain-specific tools that take care of these concerns on the programmer's behalf.

Yirgacheffe is a wrapper around the GDAL library that provides high-level Python APIs. It works out whether datasets overlap and whether vector layers need to be rasterised, and it manages memory efficiently for large layers. There is only one problem: we would like to write similar code in a high-level functional language rather than an imperative one!
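To make the "do these datasets overlap?" question concrete, here is a minimal OCaml sketch that intersects two bounding boxes. The `extent` record and the `intersect` function are hypothetical names invented for this illustration; they are not part of Yirgacheffe or GDAL, and a real library would also need to account for projections and pixel alignment.

```ocaml
(* Hypothetical type for illustration only: an axis-aligned extent in map
   coordinates. This is not Yirgacheffe's or GDAL's representation. *)
type extent = { west : float; south : float; east : float; north : float }

(* Return the overlapping region of two extents, or None if they are
   disjoint or only touch along an edge. *)
let intersect a b =
  let west = Float.max a.west b.west
  and east = Float.min a.east b.east
  and south = Float.max a.south b.south
  and north = Float.min a.north b.north in
  if west < east && south < north then Some { west; south; east; north }
  else None
```

Returning an `option` forces the caller to handle the non-overlapping case explicitly, which is one of the places where a typed functional interface could differ from the Python one.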

OCaml has recently gained support for multicore parallelism, and is also one of the first mainstream languages with support for effects. This project will involve writing a library in OCaml that provides similar functionality to Yirgacheffe, but with a focus on high-level functional programming. This will involve interfacing with the GDAL library, and also writing some high-level abstractions for geospatial data processing. As an alternative to depending on GDAL, you may also choose to contribute to the emerging GeoCaml ecosystem created by Patrick Ferris.
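As a rough illustration of the multicore support mentioned above, the sketch below sums the pixels of a raster across several domains using `Domain.spawn` and `Domain.join` from the OCaml 5 standard library. The function name `parallel_sum` and the choice to represent a raster as a flat `float array` are assumptions made for this example, not a proposed design.

```ocaml
(* Sum a flat array of pixel values using [n_domains] OCaml 5 domains.
   Assumes n_domains >= 1. Each domain sums one contiguous chunk. *)
let parallel_sum ~n_domains (pixels : float array) =
  let len = Array.length pixels in
  (* Chunk size, rounded up so that every pixel is covered. *)
  let chunk = (len + n_domains - 1) / n_domains in
  let sum_range lo hi =
    let acc = ref 0.0 in
    for i = lo to hi - 1 do
      acc := !acc +. pixels.(i)
    done;
    !acc
  in
  let workers =
    List.init n_domains (fun d ->
        let lo = min len (d * chunk) in
        let hi = min len (lo + chunk) in
        Domain.spawn (fun () -> sum_range lo hi))
  in
  (* Wait for every domain and combine the partial sums. *)
  List.fold_left (fun total w -> total +. Domain.join w) 0.0 workers
```

Note that the caller still has to pick `n_domains` by hand here, which is exactly the kind of decision the project aims to move out of user code.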

A successful project will demonstrate a direct-style, readable interface to geospatial code. The scheduling of parallel operations and memory management will be delegated to a separate library, written in OCaml, that can be customised to the local computing environment (e.g. a large local multicore machine, or a cloud computing cluster).
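One way to picture this separation is with an effect handler: user code performs an effect in direct style, and an interchangeable handler decides how the work is scheduled. The sketch below uses the `Effect` module from the OCaml 5 standard library. The `Map_tiles` effect, the two handlers and the tile representation are all hypothetical names chosen for illustration, not the interface the project will necessarily adopt.

```ocaml
open Effect
open Effect.Deep

(* Hypothetical effect: ask whichever scheduler is installed to apply a
   function to every tile and return the per-tile results in order. *)
type _ Effect.t +=
  | Map_tiles : (float array -> float) * float array list -> float list Effect.t

(* Direct-style user code: no mention of domains or scheduling policy. *)
let sum_all_tiles tiles =
  let sums = perform (Map_tiles (Array.fold_left ( +. ) 0.0, tiles)) in
  List.fold_left ( +. ) 0.0 sums

(* Scheduler 1: process the tiles one after another on this domain. *)
let run_sequential f =
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Map_tiles (g, tiles) ->
          Some (fun (k : (a, _) continuation) ->
              continue k (List.map g tiles))
        | _ -> None) }

(* Scheduler 2: spawn one domain per tile (fine for a handful of tiles;
   a real scheduler would use a bounded pool). *)
let run_parallel f =
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Map_tiles (g, tiles) ->
          Some (fun (k : (a, _) continuation) ->
              let workers =
                List.map (fun t -> Domain.spawn (fun () -> g t)) tiles
              in
              continue k (List.map Domain.join workers))
        | _ -> None) }
```

The same `sum_all_tiles` function can then be run as `run_sequential (fun () -> sum_all_tiles tiles)` or `run_parallel (fun () -> sum_all_tiles tiles)` without modification; a handler targeting a cluster could in principle be swapped in the same way.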

Planetary computing for data-driven environmental policy-making covers the data processing pipelines we need to integrate into.
Retrofitting effect handlers onto OCaml (PLDI 2021) describes how the effect system in OCaml works.
EIO is the high-performance direct-style IO library we have been developing for OCaml.

# 1st Jan 2024

Anil Madhavapeddy, Professor of Planetary Computing

Related News

Planetary computing for data-driven environmental policy-making / Mar 2024

Remote Sensing of Nature / Jan 2023

Mapping LIFE on Earth / Jan 2023

Planetary Computing / Jan 2022

Retrofitting effect handlers onto OCaml / Jun 2021
