Using wasm to locally explore geospatial layers

This is an idea proposed in 2024 as a Cambridge Computer Science Part II project, and is currently being worked on by Sam Forbes. It is supervised by Michael Dales and Anil Madhavapeddy as part of the Mapping LIFE on Earth project.

Summary

Some of my projects, such as Mapping LIFE on Earth or Remote Sensing of Nature, involve geospatial base maps with gigabytes or even terabytes of data. This data is usually split up into multiple GeoTIFFs, each of which holds a slice of the information. For example, the LIFE persistence maps comprise around 30,000 maps for individual species, plus aggregated GeoTIFFs for mammals, birds, reptiles and so forth.

This project will explore how to build a WebAssembly-based visualisation tool for geospatial ecology data. The existing data takes the form of GeoTIFF files, which are image files with embedded georeferencing information. The tool will be applied to files that record the prevalence of species in an area, in the form of a global map at 100 m² scale. An existing tool, QGIS, allows ecologists to visualise this data across the entire world, collated by type of species, but it is difficult to work with because of the scale of the data involved.

Therefore, it would be useful to have a tool that works on a smaller subset of locations and species, so that ecologists can more quickly and easily visualise the data they are working with. Additionally, using WebAssembly means the tool can run entirely in the browser. This enables offline use in a cross-platform environment and avoids the need for a central webserver. It should also make the project easier to extend to online applications.

The files will be requested from a local server process, as WebAssembly is unable to manipulate local files directly. This server will be implemented as a separate JavaScript-based process. The application will then collate and crop information from the files, as specified by the user through the interface, to display the desired species distribution map.
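The crop-and-collate step can be sketched in a few lines. This is a minimal illustration in Python rather than the project's actual WebAssembly code, and everything in it is an assumption made for the example: the `Layer` type, the function names, the use of a per-pixel sum to combine species, and the simplification that all layers share one north-up grid with square pixels.

```python
import math
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical in-memory raster: north-up, single band, row-major."""
    origin_x: float    # x coordinate of the left edge
    origin_y: float    # y coordinate of the top edge
    pixel_size: float  # map units per pixel (square pixels assumed)
    pixels: list[list[float]]


def window_for_bbox(layer, west, south, east, north):
    """Convert a geographic bounding box into a (row0, row1, col0, col1) window."""
    height, width = len(layer.pixels), len(layer.pixels[0])
    col0 = max(0, math.floor((west - layer.origin_x) / layer.pixel_size))
    col1 = min(width, math.ceil((east - layer.origin_x) / layer.pixel_size))
    # Rows count downwards from the top edge, so north maps to the first row.
    row0 = max(0, math.floor((layer.origin_y - north) / layer.pixel_size))
    row1 = min(height, math.ceil((layer.origin_y - south) / layer.pixel_size))
    return row0, row1, col0, col1


def crop(layer, window):
    """Return only the pixels inside the window."""
    row0, row1, col0, col1 = window
    return [row[col0:col1] for row in layer.pixels[row0:row1]]


def collate(layers, west, south, east, north):
    """Crop every layer to the same window and sum them pixel by pixel."""
    window = window_for_bbox(layers[0], west, south, east, north)
    crops = [crop(layer, window) for layer in layers]
    return [[sum(values) for values in zip(*rows)] for rows in zip(*crops)]


# Two toy 4x4 species layers covering x in [0, 4] and y in [0, 4].
species_a = Layer(0.0, 4.0, 1.0, [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]])
species_b = Layer(0.0, 4.0, 1.0, [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]])

combined = collate([species_a, species_b], west=1.0, south=1.0, east=3.0, north=3.0)
print(combined)  # [[2, 2], [1, 0]]
```

Cropping before combining keeps the work proportional to the area the user has selected rather than to the size of the global map, which is the main reason a subset-focused tool can stay responsive.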

To ensure that the application can process the data quickly enough for real-time use, the implementation will exploit the inherent parallelism of the data through concurrency. This can be at the file level, by processing multiple files concurrently, or at the pixel level, by generating independent parts of the map concurrently.
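Both levels of concurrency can be illustrated with a small sketch. It is written in Python with a thread pool purely to show the structure of the decomposition; the function names and the toy workloads are assumptions for the example, and a browser-based implementation would need a different mechanism (plausibly Web Workers) to get real parallel speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def summarise_file(pixels):
    """Stand-in for per-file work: count cells where the species is present."""
    return sum(1 for row in pixels for value in row if value > 0)


def render_rows(pixels, row0, row1):
    """Stand-in for per-pixel work: map a band of rows to 0-255 grey levels."""
    return [[min(255, int(value * 255)) for value in row] for row in pixels[row0:row1]]


def render_in_bands(pixels, band_height, pool):
    """Split the map into horizontal bands and render each band as its own task."""
    starts = range(0, len(pixels), band_height)
    # pool.map yields results in submission order, so the bands reassemble correctly.
    bands = pool.map(lambda row0: render_rows(pixels, row0, row0 + band_height), starts)
    return [row for band in bands for row in band]


# Toy inputs: two "files" and one small map of values in [0, 1].
files = [[[0, 1], [1, 1]], [[0, 0], [0, 1]]]
grid = [[0.0, 0.5], [1.0, 0.25], [0.0, 0.0]]

with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(summarise_file, files))          # file-level concurrency
    image = render_in_bands(grid, band_height=2, pool=pool)  # pixel-level concurrency

print(counts)  # [3, 1]
print(image)   # [[0, 127], [255, 63], [0, 0]]
```

Each task reads shared input and writes only its own output, so no locking is needed; that independence is what makes both the per-file and the per-band split safe to run concurrently.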

Related Ideas