TESSERA, a pixelwise geospatial foundation model

TESSERA is an open and pixel-wise foundation model for multi-modal (Sentinel-1/2) earth observation time series that learns robust, label-efficient embeddings.

Our goal with TESSERA is to make working with global satellite intelligence as easy as conventional programming. To that end, we release global, annual, 10m, pixel-wise embeddings together with open weights, code, and lightweight adaptation heads. We also develop practical tooling for retrieval and inference at planetary scale.

As with any good foundation model, there is a staggering array of downstream tasks that can benefit. TESSERA embeddings deliver state-of-the-art accuracy with high label efficiency across diverse classification, segmentation, and regression tasks.

1 Storage, Zarr, and cloud-native distribution

A lot of the early 2026 work has been on the plumbing needed to actually use TESSERA at scale. We restructured the store around a Zarr v3 layout and a shared geo-embeddings convention, iterating on the chunking after community feedback and shipping it through geotessera 0.8 with multi-year support and a browser-based TZE explorer backed by HTTP range requests.
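To illustrate why a sharded layout pairs well with HTTP range requests, here is a minimal Python sketch of the index arithmetic a reader might perform for a single-pixel lookup. The dimension order (year, y, x, band) and the chunk and shard shapes are hypothetical values chosen for illustration, not the actual TESSERA store parameters, and `locate` and `range_header` are illustrative helpers rather than part of geotessera or any Zarr library.

```python
from typing import NamedTuple


class ChunkLocation(NamedTuple):
    shard: tuple            # which shard object holds the pixel
    chunk_in_shard: tuple   # which inner chunk within that shard
    offset_in_chunk: tuple  # position of the pixel inside the inner chunk


def locate(index, chunk_shape, shard_shape):
    """Map an array index to its shard, inner chunk, and in-chunk offset."""
    if any(s % c for s, c in zip(shard_shape, chunk_shape)):
        raise ValueError("shard shape must be a multiple of chunk shape")
    shard = tuple(i // s for i, s in zip(index, shard_shape))
    chunk_in_shard = tuple(
        (i % s) // c for i, s, c in zip(index, shard_shape, chunk_shape)
    )
    offset = tuple(i % c for i, c in zip(index, chunk_shape))
    return ChunkLocation(shard, chunk_in_shard, offset)


def range_header(offset, nbytes):
    """Build an HTTP Range header for nbytes bytes starting at offset."""
    # HTTP byte ranges are inclusive at both ends.
    return {"Range": f"bytes={offset}-{offset + nbytes - 1}"}


# Hypothetical layout: dimensions are (year, y, x, band).
CHUNK = (1, 256, 256, 128)
SHARD = (1, 4096, 4096, 128)

loc = locate((2, 5000, 12345, 0), CHUNK, SHARD)
print(loc.shard, loc.chunk_in_shard, loc.offset_in_chunk)
print(range_header(1024, 512))
```

In a sharded Zarr v3 store, each shard carries an index recording the byte offset and length of every inner chunk. A reader can therefore fetch that index with one range request and the single chunk it needs with a second, rather than downloading the whole shard.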

On the storage side, we expanded the Cambridge Ceph cluster to 1.4PB just in time to mirror the full half-petabyte to AWS Open Data, with the sync finishing a week or so later. The geotessera client now discovers tiles from multiple registries so consumers can pull from whichever copy is closest. In parallel, Mark Elvers has been porting Brotli/Zstd/Snappy to OxCaml and building ocaml-zarr as the basis for native OCaml access to the cloud-native stores.
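As a rough illustration of pulling from whichever copy is closest, the sketch below picks the mirror with the lowest measured latency. It is not the geotessera implementation: `pick_closest` and `http_latency` are hypothetical helpers, the mirror URLs are placeholders, and timing a single HEAD request is an assumed stand-in for however the client actually ranks registries.

```python
import time
import urllib.request
from typing import Callable, Iterable, Optional


def http_latency(url: str, timeout: float = 5.0) -> float:
    """Time one HEAD request to url; raises OSError on network failure."""
    request = urllib.request.Request(url, method="HEAD")
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout):
        pass
    return time.perf_counter() - start


def pick_closest(
    mirrors: Iterable[str],
    measure: Callable[[str], float] = http_latency,
) -> Optional[str]:
    """Return the mirror with the lowest latency, skipping unreachable ones."""
    best_url, best_latency = None, float("inf")
    for url in mirrors:
        try:
            latency = measure(url)
        except OSError:
            continue  # unreachable mirror: try the next one
        if latency < best_latency:
            best_url, best_latency = url, latency
    return best_url


# Usage with simulated latencies (placeholder URLs, no network access).
fake = {"https://mirror-a.example.org": 0.180, "https://mirror-b.example.org": 0.025}
print(pick_closest(fake, measure=fake.__getitem__))
```

Passing the measurement function as a parameter keeps the selection logic testable offline and lets a caller substitute a different notion of "closest", such as a configured region preference.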

Activity

Sadiq Jaffer speaks at Pint of Science at the Cambridge Station Tavern about TESSERA geospatial foundation modelling (slides).
Louise Hulland from BBC Cambridgeshire interviews Anil Madhavapeddy about spotting hedgehogs from space using TESSERA. Mirror of <https://x.com/BBCCambs/status/2057760666266558867>
Behind the scenes of a week of BBC/ITV news and radio appearances about hedgehogs and TESSERA, but also what to expect when a research story catches the news cycle.
Consolidating my OCaml trees for easier OxCaml deployment, shipping native system packages for OxCaml which then got into space, and remembering Peter Neumann
Celebrating David Attenborough's 100th birthday at a Conservation Research Institute retreat in Norwich, a Parliament POST briefing on Evidence for Nature Recovery lands, and a TESSERA talk at the Cambridge Ring alumni evening at Jane Street.
Welcoming Akshay to Cambridge, TESSERA AWS sync done, oi now self-hosts this site, and a new 4C forest leakage preprint appears.
Notes from a Royal Society policy meeting with the European Commission on responsible AI, interoperable data and UK–EU alignment in AI for science; covering AI-poisoned literature, federated TESSERA-scale infrastructure, disclosure standards and the practical value of sustained UK–EU dialogue.
A week of hops between Chennai, Cambridge and Belfast for the FP Launchpad takeoff at IIT Madras, a surprise Publication of the Year at the Cambridge Ring Hall of Fame, meeting the VC on the upcoming Rokos School of Governance, mirroring half a petabyte of TESSERA tiles and hacking on oi
A day at the launch of the FP Launchpad at IIT Madras, covering talks on hardware design, trusted execution on Shakti, verifiable Indian tax law, precise JIT analysis, AI-assisted Lean metatheory, constraint-based diagramming, and my own TESSERA talk.
Travelling from Ireland to IIT Madras for the FP Launchpad launch, mirroring half a petabyte of TESSERA embeddings to AWS Open Data, antibotty discussions, and Tangled trust boundaries for AI code review.
Mythos Preview and the urgent need for internet immune systems, cognitive DDoS and AI screen time for code, a proposal for voluntary disclosure in OCaml, desktop focus and printed papers, iOS misery, GeoTessera 0.8, Ceph at 1.4PB, OCaml CI migration, hardware perf counters for OxCaml, and the FP Launchpad launch at IIT Madras.
Publishing the OxCaml Labs year-one review, POSSE and AI content disclosure for the web, adopting the geo-embeddings Zarr convention for TESSERA, action PROPL at PLDI, the death of the grant application, and NASA's new swathe lidar mission.
Community feedback reshaped our Zarr store layout — years became a dimension, shards got bigger, and we retired the TESSERA-specific convention in favour of a shared geo-embeddings standard that also covers other models.
Reworking the TESSERA Zarr store layout after community feedback, Springer's API woes for evidence synthesis, vibecoding introspection, and git remote helpers for ATProto.
Evidence synthesis at the DEFRA science conference, TESSERA transcoding and building a new SPA, OpenStreetMap/DuckDB bindings in OxCaml, and early thoughts on vibecoding etiquette.
How we restructured TESSERA's geospatial embeddings from millions of individual numpy files into sharded Zarr v3 stores for efficient HTTP streaming, enabling everything from single-pixel mobile lookups to regional-scale analysis with just a couple of range requests.
A little screencast of a fully browser based streaming interface to manipulate TESSERA embeddings. All the classification and UMAPs run directly in a browser, with no server required aside from static HTTP serving of the embeddings!
TESSERA streaming in the browser, planetary programming at WG2.8, biodiversity action papers, FP Launchpad opens, and Docker CACM buzz
Summary of the Nine Recommendations and Biodiversity Monitoring Standards Framework papers from the NAS/Royal Society US-UK Forum in summer 2025, and how they connect to my work on collective knowledge systems, TESSERA, and evidence synthesis.
Mark Elvers. Mainly for my future reference, here is a walk-through of the Tessera pipeline.
Trip report from the Indian AI Impact Summit in New Delhi, covering the massive expo, a conversation with Yann LeCun, a hackathon/talk at IIT-Delhi, networking at the British High Commission, and reflections on the summit declaration's shift from safety to progress and equitable access.
First TESSERA hackathon held at the Indian AI Impact Summit in Delhi, exploring integration with IIT-Delhi's CoRE Stack for geospatial analysis and testing TESSERA labeling workflows.
Growing the Ceph cluster for TESSERA embeddings, a Lego brainstorming session for the Evidence TAP, hosting Echo Labs from ARIA, and Shane's IUCN Red List seminar.
Mark Elvers. The Tessera pipeline is written in Python. What would it take to have an OCaml version?
Andres Zuñiga-Gonzalez. Introduction: This is quite a large update as it includes everything I’ve done for the past two weeks. I’ll talk about the LCZ classification and road mapping projects as well as my first actual experience with Claude Code and a cool toy example. LCZ Classification: It turns out that getting the r…
Andrew Gonzalez, Tom August et al. — Proceedings of the National Academy of Sciences
Release of GeoTessera Python library and CLI for accessing TESSERA geospatial foundation model embeddings with interactive visualization tools.
OxCaml Labs, Jan 2025