Anil Madhavapeddy, Professor of Planetary Computing

GeoTessera Python library released for geospatial embeddings / Aug 2025

We've been having great fun at the EEG recently releasing embeddings of our new TESSERA geospatial foundation model.

TESSERA is a foundation model for Earth observation that processes Sentinel-1 and Sentinel-2 satellite data to generate representation (embedding) maps. It compresses a full year of Sentinel-1 and Sentinel-2 data and learns useful temporal-spectral features. -- Temporal Embeddings of Surface Spectra for Earth Representation and Analysis

A foundation model is designed to be used for downstream tasks without having to retrain a full model for every individual task. Our preprint paper describes the sorts of geospatial tasks you can solve more quickly, including crop type classification, forest canopy height estimation, above-ground biomass calculations, wildfire detection, forest stocks, and many more.

Parametric UMAP false colour visualisation of TESSERA embeddings for Cambridgeshire

TESSERA is an open model that is trained only on public satellite data (thanks ESA!), and we make tiled embeddings available for download that have 128-dimensional vectors precomputed for every 10m × 10m patch of the planet's surface. This makes using TESSERA really simple in existing GIS workflows.

Taking the GeoTessera CLI for a spin

To make it even simpler, I've just published a new Python library called geotessera which provides both a programmatic and a CLI interface for accessing these embeddings.

The full set of TESSERA embeddings runs to petabytes when generated[1], so it's important that you download just the ones you need for a given region of interest. We chunked up the embeddings into image tiles where each pixel represents a 10m × 10m area, and are hosting these at the Computer Lab on dl.geotessera.org.[2] GeoTessera uses Pooch to build up a registry of manifests in Git, and provides helper functions to calculate which tiles you need.
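To make the tile selection idea concrete, here is a minimal sketch of how one might work out which tiles touch a bounding box. It assumes tiles sit on a 0.1° grid with centres offset by 0.05° and follow the naming seen in `tessera_2024_lat51.95_lon0.05.tif` (both inferred from the example tile later in this post, not confirmed for every region). The names `tiles_for_bbox` and `tile_filename` are hypothetical and not part of the geotessera API; in practice the library's own helpers are the thing to use.

```python
import math

def tiles_for_bbox(lon_min, lat_min, lon_max, lat_max):
    """Return (lat, lon) centres of assumed 0.1-degree tiles touching a bbox."""
    # Convert to integer grid indices so the loop itself is exact.
    lon_lo, lon_hi = math.floor(lon_min * 10), math.floor(lon_max * 10)
    lat_lo, lat_hi = math.floor(lat_min * 10), math.floor(lat_max * 10)
    centres = []
    for lat_i in range(lat_lo, lat_hi + 1):
        for lon_i in range(lon_lo, lon_hi + 1):
            # Assumed convention: the tile centre is the middle of the grid cell.
            centres.append((round((lat_i + 0.5) / 10, 2),
                            round((lon_i + 0.5) / 10, 2)))
    return centres

def tile_filename(year, lat, lon):
    """Hypothetical helper mirroring the file naming seen in this post."""
    return f"tessera_{year}_lat{lat:.2f}_lon{lon:.2f}.tif"

# A small box straddling the Greenwich meridian.
names = [tile_filename(2024, lat, lon)
         for lat, lon in tiles_for_bbox(-0.02, 51.93, 0.12, 52.04)]
print(len(names), names)
```

Even this small box pulls in six tiles, which is why downloading by region rather than wholesale matters.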

You can take this for a spin very quickly if you have uv installed. First, let's check global coverage of what's available:

uvx geotessera coverage

This will drop a figure like the below into tessera_coverage.png. You can also refine the map to an area of interest by passing in a GeoJSON, shapefile or manual bounding box to the command-line arguments.

We're still churning through the inference (and prioritising areas of interest for our early adopters), so the green spots represent full coverage from 2017-2024, blue represents just 2024, and orange represents partial coverage in between. Now, we want to download the embeddings themselves. Let's do Cambridgeshire:

uvx geotessera download \
  --output cb \
  --region-file https://raw.githubusercontent.com/ucam-eo/geotessera/refs/heads/main/example/CB.geojson

This will drop a bunch of GeoTIFFs into the cb/ directory which you can inspect using GDAL in the normal way. Note that the local UTM coordinates are preserved (the UTM zone varies with longitude) and that there are 128 bands per TIFF.

% gdalinfo -stats cb/tessera_2024_lat51.95_lon0.05.tif
<...>
Pixel Size = (10.000000000000000,-10.000000000000000)
Metadata:
  GEOTESSERA_VERSION=0.5.1
  TESSERA_DATASET_VERSION=v1
  TESSERA_DESCRIPTION=GeoTessera satellite embedding tile
  TESSERA_TILE_LAT=51.95
  TESSERA_TILE_LON=0.05
  TESSERA_YEAR=2024
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (  293612.221, 5765288.255) (  0d 0'24.03"W, 51d59'59.39"N)
Lower Left  (  293612.221, 5753888.255) (  0d 0' 0.61"E, 51d53'50.90"N)
Upper Right (  300932.221, 5765288.255) (  0d 5'59.33"E, 52d 0' 9.01"N)
Lower Right (  300932.221, 5753888.255) (  0d 6'23.09"E, 51d54' 0.49"N)
Center      (  297272.221, 5759588.255) (  0d 2'59.76"E, 51d56'59.99"N)
Band 1 Block=256x256 Type=Float32, ColorInterp=Gray
  Description = Tessera_Band_0
  Minimum=-4.017, Maximum=11.446, Mean=3.649, StdDev=2.089
  Metadata:
    STATISTICS_MINIMUM=-4.0171508789062
    STATISTICS_MAXIMUM=11.445509910583
    STATISTICS_MEAN=3.6492421642927
    STATISTICS_STDDEV=2.088705775134
    STATISTICS_VALID_PERCENT=100
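As a quick sanity check on the gdalinfo output above, the corner coordinates and pixel size give the tile's dimensions, and the band count and data type give its uncompressed size. This is a back-of-the-envelope sketch using only numbers copied from that output; the variable names are mine.

```python
# Corner coordinates (metres, local UTM) copied from the gdalinfo output.
upper_left = (293612.221, 5765288.255)
lower_right = (300932.221, 5753888.255)
pixel_size_m = 10.0    # from "Pixel Size"
bands = 128            # embedding dimensions per pixel
bytes_per_value = 4    # Float32

# Round to absorb floating point noise in the subtraction.
width_px = round((lower_right[0] - upper_left[0]) / pixel_size_m)
height_px = round((upper_left[1] - lower_right[1]) / pixel_size_m)
raw_bytes = width_px * height_px * bands * bytes_per_value

print(f"{width_px} x {height_px} pixels, {raw_bytes / 1e6:.0f} MB uncompressed")
```

So a single tile of this shape holds roughly 427 MB of Float32 values before LZW compression, which gives a feel for how quickly a large region adds up.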

Once you have the GeoTIFFs locally, you can drop them into your normal GIS workflows. But you can also continue to use the CLI to do false colour visualisations, for example using PCA, to help visualise what's going on.

uvx geotessera visualize cb cb.tiff
uvx geotessera serve cb.tiff --open

These two commands will first output an RGB mosaic of the tiles (false colour, like the one at the start of this post), and then tile them using LeafletJS so you can explore them with OpenStreetMap in the background.

A geospatial classification workflow

At this point you are probably itching to do some actual machine learning. You can try out the Tessera interactive Jupyter notebook next!

git clone https://github.com/ucam-eo/tessera-interactive-map
cd tessera-interactive-map
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
code app.ipynb

This will spin up the environment in VS Code as a notebook, where if you run the cells you get an interactive bounding box that you can use to do manual classification by simply marking labels. Here's a video that demonstrates this, courtesy of Robin Young:

Have fun with this! All of this is really bleeding edge stuff, so if you run into issues (likely) then please do file an issue and let us know. In fact, let us know if you build something where you didn't find any bugs, too!
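For a feel of why a handful of manually marked labels can go a long way with embeddings, here is a minimal nearest-centroid sketch. It is not what the notebook does (I'm not describing its internals here), and the toy 3-dimensional vectors and class names are made up to stand in for real 128-dimensional TESSERA embeddings.

```python
import math

def centroid(vectors):
    """Mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fit_centroids(labelled):
    """labelled: dict mapping class name -> list of embedding vectors."""
    return {name: centroid(vecs) for name, vecs in labelled.items()}

def classify(embedding, centroids):
    """Return the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda name: math.dist(embedding, centroids[name]))

# Toy vectors standing in for embeddings of pixels a user has labelled.
labelled = {
    "water": [[0.1, 0.2, 0.9], [0.0, 0.3, 1.0]],
    "crop": [[0.9, 0.8, 0.1], [1.0, 0.7, 0.2]],
}
centroids = fit_centroids(labelled)
prediction = classify([0.8, 0.9, 0.0], centroids)
print(prediction)
```

The point is that no model retraining happens: the labelled pixels only define reference points in embedding space, and every other pixel is assigned to the closest one.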

What next?

Coding in Python again after a few years has been a fun experience for me, but I'm yearning to return to OCaml again. Accordingly, I've been building out an implementation of GeoTessera in native OCaml, using eio. This is also a perfect use case for oxcaml extensions to speed up floating point processing, and Thibaut Mattio has just published Raven for handling numpy format arrays in OCaml. Stay tuned for more on that...

There's also plenty to be done on improvements to GeoTessera; I'll be adding in modules to help with machine learning workflows as soon as the external uses of them have stabilised a bit more. I'm also enjoying getting re-familiar with modern Python tooling. uv is a remarkable piece of work, but I'm still figuring out how to (e.g.) run notebooks directly using it without running into package issues.

And finally, storage management remains a real headache as we are striping and syncing hundreds of terabytes of storage and keeping it performant. As we go back to generate embeddings for earlier years, we'll be hitting petabytes easily. While the normal answer is to store this on a cloud, the problem is that the egress bandwidth is hugely expensive, and it's important we have a local storage cluster for this. Any tips on how to build out such a cluster cheaply are welcome!
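A rough back-of-the-envelope calculation shows why petabytes arrive quickly. This sketch assumes land-only coverage (about 149 million km², a round figure of mine), one 128-dimensional vector per 10m × 10m pixel, and no compression or tile overlap. The bytes per value is a parameter because the on-disk encoding may differ from the Float32 seen in the GeoTIFFs.

```python
LAND_AREA_KM2 = 149_000_000   # approximate global land area (assumption)
PIXEL_AREA_M2 = 10 * 10       # one embedding per 10m x 10m pixel
DIMENSIONS = 128              # values per embedding
YEARS = 8                     # 2017 to 2024 inclusive

def petabytes_per_year(bytes_per_value):
    """Uncompressed size of one year of land-only embeddings, in PB (1e15 bytes)."""
    pixels = LAND_AREA_KM2 * 1_000_000 // PIXEL_AREA_M2
    return pixels * DIMENSIONS * bytes_per_value / 1e15

float32_pb = petabytes_per_year(4)
int8_pb = petabytes_per_year(1)
print(f"float32: {float32_pb:.2f} PB/year, {float32_pb * YEARS:.1f} PB for {YEARS} years")
print(f"int8:    {int8_pb:.2f} PB/year, {int8_pb * YEARS:.1f} PB for {YEARS} years")
```

Under these assumptions a single year at full precision is already around three quarters of a petabyte, so going back through earlier years multiplies that accordingly.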

  1. We ran the inference on a combination of AMD MI300X and the Dawn cluster you may have seen me talking about on the BBC a while back.
  2. The humungous number of SSDs are jury-rigged onto machines kindly donated to the Lab by Jane Street.
# 31st Aug 2025 notes ai satellite spatial tessera
