Anil Madhavapeddy, Professor of Planetary Computing

Using graph theory to define data-driven ecoregion and bioregion maps / Apr 2025

This is an idea proposed in 2025 as a good starter project, and is available for being worked on. It may be co-supervised with Daniele Baisero and Michael Dales.

Maps of biologically driven regionalization (e.g. ecoregions and bioregions) are useful in conservation science and policy as they help identify areas with similar ecological characteristics, allowing for more targeted, efficient, and ecosystem-specific management strategies. These regions provide a framework for prioritizing conservation efforts, monitoring biodiversity, and aligning policies across political boundaries based on ecological realities rather than arbitrary lines. However, these products have historically been "hand drawn" by experts and are mostly based on plant distribution data only.   […270 words]

# 1st Apr 2025   iconideas biodiversity conservation idea-available idea-beginner spatial urop

Runtimes à la carte: crossloading native and bytecode OCaml / Apr 2025

This is an idea proposed in 2025 as a good starter project, and is available for being worked on. It may be co-supervised with David Allsopp.

In 1998, Fabrice Le Fessant released Efuns ("Emacs for Functions"), an implementation of an Emacs-like editor written entirely in OCaml, which included a library for loading bytecode within native code programs[^1].

This was nearly a decade before OCaml 3.11 introduced Alain Frisch's native Dynlink support (natdynlink). Natdynlink means that this original work has been largely forgotten, but there remain two interesting applications for being able to "cross-load" code compiled for the OCaml bytecode runtime in an OCaml native code application and vice versa:

  1. Native code OCaml applications could use OCaml as a scripting language without needing to include an assembler toolchain or solutions such as ocaml-jit.
  2. The existing bytecode REPL could use OCaml natdynlink plugins (.cmxs files) directly, allowing more dynamic programming and exploration of high-performance libraries with the ease of the bytecode interpreter, but retaining the runtime performance of the libraries themselves.   […310 words]
# 1st Apr 2025   iconideas effects functional idea-available idea-beginner ocaml urop

Effects based scheduling for the OCaml compiler pipeline / Apr 2025

This is an idea proposed in 2025 as a good starter project, and is available for being worked on. It may be co-supervised with David Allsopp.

In order to compile the OCaml program foo.ml containing:

Stdlib.print_endline "Hello, world"

the OCaml compilers only require the compiled stdlib.cmi interface to exist in order to determine the type of Stdlib.print_endline. This separate compilation technique allows modules of code to be compiled before the code they depend on has necessarily been compiled. When OCaml was first written, this technique was critical to reduce recompilation times. As CPU core counts increased through the late nineties and early 2000s, separate compilation also provided a parallelisation benefit: modules which did not depend on each other could be compiled at the same time, benefitting compilation as well as recompilation.
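The parallelisation benefit can be illustrated with a minimal sketch that groups modules into "waves", where every module in a wave has all of its dependencies already compiled and so could be built concurrently. This is only an illustration of the scheduling idea, not how dune, make or the OCaml compiler actually implement it; the module names and the `compilation_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def compilation_waves(deps):
    """Group modules into waves that could be compiled in parallel.

    `deps` maps a module name to the set of modules whose compiled
    interfaces it needs before it can be compiled itself.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        # Every module returned here has all of its dependencies done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical project: foo and bar need only stdlib; main needs both.
deps = {
    "stdlib": set(),
    "foo": {"stdlib"},
    "bar": {"stdlib"},
    "main": {"foo", "bar"},
}
waves = compilation_waves(deps)
print(waves)  # [['stdlib'], ['bar', 'foo'], ['main']]
```

Here `foo` and `bar` land in the same wave because neither depends on the other, which is exactly the case where separate compilation allows both to be compiled at once.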

For OCaml, as in many programming languages, the compilation of large code bases is handled by a separate build system (for example, dune, make or ocamlbuild) with the compiler driver (ocamlc or ocamlopt) being invoked by that build system as required. In this project, we'll investigate how to get the OCaml compiler itself to be responsible for exploiting available parallelism.   […697 words]

# 1st Apr 2025   iconideas effects functional idea-available idea-beginner ocaml urop

Bidirectional Hazel to OCaml programming / Apr 2025

This is an idea proposed in 2025 as a good starter project, and is under discussion with a student but not yet confirmed. It may be co-supervised with Patrick Ferris and Cyrus Omar.

Hazel is a pure subset of OCaml with a live functional programming environment that is able to typecheck, manipulate, and even run incomplete programs. As a pure language with no effects, Hazel is a great choice for domains such as configuration languages where some control flow is needed, but not the full power of a general purpose programming language. On the other hand, Hazel currently only has an interpreter and so is fairly slow to evaluate compared to a full programming language such as OCaml.   […277 words]

# 1st Apr 2025   iconideas functional hazel idea-beginner idea-discuss javascript ocaml types wasm

Battery-free wildlife monitoring with Riotee / Apr 2025

This is an idea proposed in 2025 as a good starter project, and is available for being worked on. It may be co-supervised with Josh Millar.

Monitoring wildlife in the field today relies heavily on battery-powered devices, like GPS collars or acoustic recorders. However, such devices are often deployed in remote environments, where battery replacement and data retrieval can be labour-intensive and time-consuming. Moving away from battery-powered field devices could radically reduce the environmental footprint and labour cost of wildlife monitoring. The rise of battery-less energy-harvesting platforms could enable ultra-low-power, long-term, maintenance-free deployments. However, existing battery-less devices are severely constrained, often unable to perform meaningful on-device computation such as ML inference or high-frequency audio capture.

This project explores the development of next-generation, battery-less wildlife monitoring platforms using Riotee, an open-source platform purpose-built for intermittent computing. Riotee integrates energy harvesting with a powerful Cortex-M4 MCU and full SDK for managing state-saving, redundancy, and graceful resume from power failures.   […273 words]

# 1st Apr 2025   iconideas biodiversity conservation embedded idea-available idea-beginner sensing urop

Autoscaling geospatial computation with Python and Yirgacheffe / Apr 2025

This is an idea proposed in 2025 as a good starter project, and is available for being worked on. It may be co-supervised with Michael Dales.

Python is a popular tool for geospatial data-science, but it, along with the GDAL library, handles resource management poorly. Python does not deal with parallelism well and GDAL can be a memory hog when parallelised. Geospatial workloads -- working on global maps at metre-level resolutions -- can easily exceed the resources available on a given host when run using conventional schedulers.

To that end, we've been building Yirgacheffe, a geospatial library for Python that attempts both to hide the tedious parts of geospatial work (aligning different data sources, for instance) and to tackle the resource management issues, so that ecologists don't have to also become computer scientists to scale their work. Yirgacheffe can:

  • chunk data in memory automatically, to avoid common issues around memory overcommitment
  • perform limited forms of parallelism to use multiple cores.
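
To make the chunking idea concrete, here is a minimal sketch of processing a raster in horizontal bands so that only one band is resident in memory at a time. It does not use Yirgacheffe's API; `chunked_sum`, `read_rows` and `fake_reader` are hypothetical names, and plain Python lists stand in for real raster data.

```python
def chunked_sum(read_rows, total_rows, rows_per_chunk):
    """Sum every pixel of a raster while holding one band of rows at a time.

    `read_rows(start, stop)` is a hypothetical reader that returns rows
    [start, stop) as a list of lists of numbers.
    """
    total = 0.0
    for start in range(0, total_rows, rows_per_chunk):
        stop = min(start + rows_per_chunk, total_rows)
        chunk = read_rows(start, stop)  # only this band is in memory
        total += sum(sum(row) for row in chunk)
    return total


def fake_reader(start, stop):
    """Stand-in for a large raster: row r holds four pixels of value r."""
    return [[float(r)] * 4 for r in range(start, stop)]


result = chunked_sum(fake_reader, total_rows=10, rows_per_chunk=3)
print(result)  # 180.0
```

The peak memory use depends on `rows_per_chunk` rather than on the size of the whole raster, which is the property that avoids overcommitting memory on very large maps.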

Yirgacheffe has been deployed in multiple geospatial pipelines, underpinning work like Mapping LIFE on Earth, as well as an implementation of the IUCN STAR metric, and a methodology for assessing tropical forest interventions.   […453 words]

# 1st Apr 2025   iconideas biodiversity idea-available idea-beginner python spatial systems urop

An access library for the world crop, food production and consumption datasets / Apr 2025

This is an idea proposed in 2025 as a good starter project, and is available for being worked on. It may be co-supervised with Alison Eyres and Thomas Ball.

Agricultural habitat degradation is a leading threat to global biodiversity. To make informed decisions, it's crucial to understand the biodiversity impacts of various foods, their origins, and potential mitigation strategies. Insights can drive actions from national policies to individual dietary choices. Key factors include knowing where crops are grown, their yields, and food sourcing by country.

The FAOSTAT trade data offers comprehensive import and export records since 1986, but its raw form is complex, including double counting, hindering the link between production and consumption.   […372 words]

# 1st Apr 2025   iconideas biodiversity conservation food idea-available idea-beginner urop

3D printing the planet (or bits of it) / Apr 2025

This is an idea proposed in 2025 as a good starter project, and is available for being worked on. It may be co-supervised with Michael Dales.

Thanks to a combination of satellite information, remote sensors and data-science, we are now able to reason about places all over the globe from the comfort of our desks and offices. But sometimes, you just want to be able to see or touch an area to understand it properly: the flat 2D-projection on a screen doesn't necessarily reveal the subtle geography of a landscape, and data locked into a computer feels less immediate than even a physical model of the same area.

In recent work, Michael Dales has experimented with making 3D-printed models of surface terrain to make some areas of study more relatable. By combining high-resolution Digital Elevation Models (DEMs) and CAD software, we were able to scale and print this section of a Swedish forest used to observe moose migrations.

  […403 words]

# 1st Apr 2025   iconideas 3dprinting biodiversity conservation idea-available idea-beginner spatial urop

A hardware description language using OCaml effects / Mar 2025

This is an idea proposed in 2025 as a Cambridge Computer Science Part III or MPhil project, and is available for being worked on. It may be co-supervised with KC Sivaramakrishnan and Andy Ray.

Programming FPGAs using functional programming languages is a very good fit for the problem domain. OCaml has the HardCaml ecosystem to express hardware designs in OCaml, make generic designs using the power of the language, then simulate designs and convert them to Verilog or VHDL.

HardCaml is very successfully used in production at places like Jane Street, but needs quite a lot of prerequisite knowledge about the full OCaml language. In particular, it makes very heavy use of the module system in order to build up the circuit description as an OCaml data structure.

Instead of building up a circuit as the output of the OCaml program, it would be very cool if we could directly implement the circuit as OCaml code by evaluating it. This is an approach that works very successfully in the Clash Haskell HDL, as described in this thesis. Clash uses a number of advanced Haskell type-level features to encode fixed-length vectors (very convenient for hardware description) and has an interactive REPL that allows for exploration without requiring a separate test bench.   […296 words]

# 1st Mar 2025   iconideas fpga idea-available idea-hard ocaml systems

Using computational SSDs for vector databases / Feb 2025

This is an idea proposed in 2025 as a Cambridge Computer Science Part III or MPhil project, and is available for being worked on. It may be co-supervised with Sadiq Jaffer.

Large pre-trained models can be used to embed media/documents into concise vector representations with the property that vectors that are "close" to each other are semantically related. ANN (Approximate Nearest Neighbour) search on these embeddings is used heavily already in RAG systems for LLMs or search-by-example for satellite imagery.
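
As a small illustration of what "close" means here, the sketch below scores every stored vector against a query and returns the best matches. It assumes cosine similarity as the closeness measure and performs an exact, brute-force search; ANN indexes approximate this result without scanning everything. The tile names and vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, index, k=1):
    """Exact search: score every stored vector and keep the top k names."""
    scored = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Made-up three-dimensional embeddings for three image tiles.
index = {
    "forest_tile": [0.9, 0.1, 0.0],
    "wetland_tile": [0.1, 0.9, 0.1],
    "urban_tile": [0.0, 0.1, 0.9],
}
top = nearest([0.8, 0.2, 0.1], index, k=2)
print(top)  # ['forest_tile', 'wetland_tile']
```

This exhaustive scan costs time proportional to the number of stored vectors, which is why large collections rely on approximate indexes instead.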

Right now, most ANN databases rely almost exclusively on memory-resident indexes to accelerate this searching. This is a showstopper for larger datasets, such as the terabytes of PDFs we have for our big evidence synthesis project, each of which generates dozens of embeddings. For global satellite datasets for remote sensing of nature at 10m scale, this is easily petabytes per year (the raw data here would need to come from tape drives).   […398 words]

# 1st Feb 2025   iconideas data fpga idea-available idea-hard spatial storage

Affordable digitisation of insect collections using photogrammetry / Feb 2025

This is an idea proposed in 2025 as a Cambridge Computer Science Part III or MPhil project, and is currently being worked on by Beatrice Spence and Arissa-Elena Rotunjanu. It is co-supervised with Tiffany Ki and Edgar Turner.

Insects dominate animal biodiversity and are sometimes called "the little things that run the world". They play a disproportionate role in ecosystem functioning, are highly sensitive to environmental change and are often considered to be early indicators of responses in other taxa. There is widespread concern about global insect declines[^1], yet the evidence behind such declines is highly biased towards the Global North and much is drawn from short-term biodiversity datasets[^2] [^3].

The Insect Collection at the University Museum of Zoology, Cambridge holds over 1.2 million specimens. These include specimens collected from the early 19th century to the present day. Most specimens remain undocumented and unavailable for analysis. However, they contain data that are critical to understanding long-term species and community responses to anthropogenic change, and vital to evaluating whether short-term declines are representative of longer-term trends[^4] [^5]. As such, unlocking these insect collections is of paramount importance, and the large-scale nature of these collections necessitates the development of an efficient and effective digitisation process.

The 3D digitisation of specimens using current methods is either highly time-intensive or expensive, rendering it impossible to achieve across the collection in a reasonable time-frame. Yet, 3D models of specimens have huge potential for investigating species morphological responses to anthropogenic changes over time and identification of trade-offs in morphological responses within a 3D morphospace.   […540 words]

# 1st Feb 2025   iconideas 3d biodiversity conservation idea-hard idea-ongoing insects urop

Parallel traversal effect handlers for OCaml / Sep 2024

This is an idea proposed in 2024 as a Cambridge Computer Science Part II project, and is currently being worked on by Sky Batchelor. It is co-supervised with Patrick Ferris.

Most existing uses of effect handlers perform synchronous execution of handled effects. Xie et al. proposed a traverse handler for parallelisation of independent effectful computations whose effect handlers are outside the parallel part of the program. The paper[^1] gives a sample implementation as a Haskell library with an associated λp calculus that formalises the parallel handlers.   […162 words]

# 1st Sep 2024   iconideas effects fp idea-medium idea-ongoing multicore ocaml scheduling

Gradually debugging type errors / Sep 2024

This is an idea proposed in 2024 as a Cambridge Computer Science Part II project, and is currently being worked on by Max Carroll. It is co-supervised with Patrick Ferris.

Reasoning about type errors is very difficult, and requires shifting between static and dynamic types. In OCaml, the type checker asserts ill-typedness but provides little in the way of understanding why the type checker inferred such types. These direct error messages are difficult to understand even for experienced programmers working on larger codebases.

This project will explore how to use gradual types to reason more effectively about such ill-typed programs, by introducing more dynamic types to help some users build an intuition about the problem in their code. The intention is to enable a more exploratory approach to constructing well-typed programs.   […131 words]

# 1st Sep 2024   iconideas functional hazel idea-medium idea-ongoing javascript ocaml types

Using wasm to locally explore geospatial layers / Aug 2024

This is an idea proposed in 2024 as a Cambridge Computer Science Part II project, and is currently being worked on by Sam Forbes. It is co-supervised with Michael Dales.

Some of my projects like Mapping LIFE on Earth or Remote Sensing of Nature involve geospatial base maps with gigabytes or even terabytes of data. This data is usually split up into multiple GeoTIFFs, each of which has a slice of information. For example, the LIFE persistence maps have around 30000 maps for individual species, and then an aggregated GeoTIFF for mammals, birds, reptiles and so forth.

This project will explore how to build a WebAssembly-based visualisation tool for geospatial ecology data. This existing data is in the form of GeoTIFF files, which are image files with embedded georeferencing information. The application will be applied to files which include information on the prevalence of species in an area, consisting of a global map at 100 m² scale. An existing tool, QGIS, allows ecologists to visualise this data across the entire world, collated by types of species, but this is difficult to work with because of the scale of the data involved.   […341 words]

# 1st Aug 2024   iconideas idea-medium idea-ongoing spatial wasm web

Towards reproducible URLs with provenance / Aug 2024

This is an idea proposed in 2024 as a Cambridge Computer Science Part II project, and is available for being worked on. It may be co-supervised with Patrick Ferris.

Vurls are an attempt to add versioning to URI resolution. For example, what should happen when we request https://doi.org/10.1109/SASOW.2012.14 and how do we track the chain of events that leads to an answer coming back? The prototype vurl library written in OCaml outputs the following:   […323 words]

# 1st Aug 2024   iconideas distributed idea-available idea-medium ocaml provenance web

Displaying the 15 most recent items out of 69 in total.