A scorching CNG London during Climate Action Week

#tessera #biodiversity #conservation #climate #ai #nature #academia

24 Jun 2026

My notes from the first Cloud-Native Geospatial Forum gathering outside the US, held on the top floor of the Jellicoe, covering Source Cooperative's open data economics, Argentina's invisible settlements, and provenance and trust for geospatial decision-making.

It's London Climate Action Week in the midst of a searing heatwave, which made a fitting backdrop for the Cloud-Native Geospatial Forum meeting (the first outside the US!). The venue was the top floor of the Jellicoe, where ARIA is based, with a panoramic view over the City. CNG is a Radiant Earth initiative that I joined last year when I heard about their drive to make geospatial data available as a public good. The gathering was a rather excellent collection of 50 practitioners geeking out on coordinate systems and Zarr access patterns.

My talk notes run through unlocking the geospatial commons; going from maps to decision systems; making EO embeddings actionable; wrangling multidimensional data; getting ground-level nature data into the cloud; my favourite talk, on Argentina's invisible settlements; compressing Earth embeddings and making them more accessible to agents; and finally a round of lightning talks and closing thoughts.

1 Unlocking the value of geospatial data in the commons

Jed Sundwall is the CEO of Radiant Earth, and previously founded AWS Open Data while on Amazon's social responsibility team. He opened proceedings by explaining that the CNG consortium aims to bring together data users with wildly different budgets (big corps, individual contributors, etc.) who all share the same geospatial problems. His view is that "data has a lot of potential energy stored inside it, and the sweet spot is how we maximise the potential value of that data."

CNG exists to turn this data into something useful by promoting modern cloud-native methods to drive down the cost of the public good overall. If you've read my writing recently, you'll know that the biggest professional problem in my life is juggling TESSERA embeddings without running out of disk space and/or crashing the Cambridge University egress bandwidth from eager downloaders! So I'm enormously excited at the idea of having a public data commons to help share the load.

Jed Sundwall opens proceedings on the top floor of the Jellicoe

Jed then showed us the actual unit economics of their operation. Source Cooperative is Radiant Earth's data-publishing utility built on cloud object storage. As of a few months ago, it held 6.16 PB across 739M objects and served over a billion requests a month, at a blended cost of around $20/terabyte/month. Most vendors guard their cost base for competitive reasons, but as a not-for-profit, Radiant Earth shares theirs so the community can reason about what it actually costs to share data at scale.
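To put those numbers in perspective, here's a back-of-envelope sketch in Python. It rests on my own simplifying assumptions rather than anything Jed stated: that the ~$20/TB/month blended rate applies uniformly to every stored terabyte (in reality a blended rate folds in requests and egress too), and that units are decimal (1 PB = 1,000 TB).

```python
# Back-of-envelope bill for Source Cooperative's reported footprint.
# Assumptions (mine, not Radiant Earth's): the ~$20/TB/month blended rate
# applies to every stored terabyte, and units are decimal (1 PB = 1000 TB).

TB_PER_PB = 1000


def monthly_cost_usd(stored_pb: float, usd_per_tb_month: float) -> float:
    """Estimate the monthly cost of storing `stored_pb` petabytes at a blended rate."""
    return stored_pb * TB_PER_PB * usd_per_tb_month


def annual_cost_usd(stored_pb: float, usd_per_tb_month: float) -> float:
    """The same estimate scaled to a year."""
    return 12 * monthly_cost_usd(stored_pb, usd_per_tb_month)


if __name__ == "__main__":
    print(f"monthly: ~${monthly_cost_usd(6.16, 20.0):,.0f}")
    print(f"annual:  ~${annual_cost_usd(6.16, 20.0):,.0f}")
```

With the figures above this comes to roughly $123k a month, or about $1.5M a year, so treat it as an order-of-magnitude yardstick rather than Source Cooperative's actual bill.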

An important technical point is that Source Coop uses Cloudflare R2 as their CDN, so hot data is edge-cached around the world. This was a big feature request for making TESSERA easier to access at our first Indian hackathon recently, as international egress bandwidth there was pretty bad.

Source Cooperative's publishers include NASA, CarbonPlan, Planet and Asterisk Labs.

2 From maps to decision systems

Luca Budello, Geospatial Lead at Innovate UK Business Connect and formerly of the CCI, gave a policy perspective informed by running the GeoAI Festival. He noted that decision-makers are interested in "reliable answers to real problems, grounded in trustworthy data", so data provenance and trustworthiness matter enormously.

Luca traced how every wave of geospatial innovation has changed the way we interpret the world: first maps drew the world, then dashboards queried it, then cloud APIs let us program it, then reporting and analytics informed decisions, and now the next step is decision systems that automate them.

Luca's arc from first using data to draw the world through to automating decisions

The worrying part is the jump from analytical workflows we can predict (deterministic, linear, static data) to future decision systems that are probabilistic, dynamic, and run continuous loops that act directly on the world. AI is being imposed on us fast, so the obvious question is what checks and balances these decision systems will have. His answer was to treat trust as a federated property, and to engineer the federation protocols so that they are explainable, accountable, auditable and trackable.

The analogy Luca used was that "GeoAI needs its open banking moment" to unlock the value of data the way open banking did for finance, and that the UK has a genuine advantage here with decades of authoritative, temporally rich, high-integrity data. The policy scaffolding is arriving (AI growth zones, BridgeAI and a sovereign AI fund), roughly mirroring the EU's approach but unfortunately with rather less funding. He noted that a recent major AI policy report (I missed exactly which one) mentioned geospatial only once (for urban planning), which is a strange omission given how foundational land-use planning is to government.

3 Making Earth observation embeddings actionable

Earth Genome is a mission-driven non-profit out of California, behind ClimateTRACE and a dozen-plus geospatial products, funded by a hybrid of targeted projects and philanthropic donations for R&D. Noelia Jiménez Martínez and Glen Low walked us through their Earth Index work to make foundation-model embeddings usable by non-experts:

In the short time that Earth Index has been available, we’ve been amazed by the impact our users have made. Just to highlight a few: they’ve exposed narcotrafficking airstrips in the Peruvian Amazon; mapped illegal palm oil expansion in Brazil; uncovered hazardous quarries in the Balkans; and even mapped how rose farming is contributing to wetland loss in Uganda. -- Earth Genome, 2026

Their worked example was quantifying Jamaican seagrass with The Nature Conservancy. They used 70 datasets across four regions, blending field surveys, drone and high-resolution imagery with interpolation and modelling. These are turned into model inputs that each need a geolocation, a class (seagrass or not) and a density estimate.

I was of course delighted to see our own Tessera embeddings on the slide alongside AlphaEarth (Google, 64-dim) and OLMo Earth (Allen AI, a 768-dim vision transformer). Reassuringly, all three embedding models comfortably beat the no-embedding baseline on a benthic-environment case (using Allen Coral Atlas ground truth). Their Tessera tests used v1.0, and afterwards I explained how v1.1 has better coastal maps and so should perform even better.

The embedding landscape: AlphaEarth, OLMo Earth and Tessera (yay)

Every embedding outperforms the no-embedding baseline on the benthic case.
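To make "beats the no-embedding baseline" a bit more concrete, here's a minimal sketch of that style of comparison. It is not Earth Genome's evaluation: the data are synthetic 16-dimensional vectors standing in for per-pixel embeddings, the probe is a simple nearest-centroid classifier, and the "baseline" is just a majority-class predictor, all chosen by me for illustration.

```python
import numpy as np


def make_toy_pixels(n: int, dim: int, rng: np.random.Generator):
    """Synthetic stand-in for labelled pixels: class 1 ('seagrass') embeddings
    are shifted away from class 0. Real vectors would come from a model."""
    y = (rng.random(n) < 0.4).astype(int)
    X = rng.normal(size=(n, dim)) + 0.5 * y[:, None]
    return X, y


def fit_centroids(X: np.ndarray, y: np.ndarray):
    """Nearest-centroid probe: one mean embedding per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])


def predict(model, X: np.ndarray) -> np.ndarray:
    classes, centroids = model
    # Squared Euclidean distance from every sample to every class centroid.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]


def majority_baseline(y_train: np.ndarray, n: int) -> np.ndarray:
    """Crude 'no-embedding' baseline: always predict the most common label."""
    values, counts = np.unique(y_train, return_counts=True)
    return np.full(n, values[counts.argmax()])


def compare(seed: int = 0, n: int = 2000, dim: int = 16):
    """Train on the first 75% of samples, report (probe, baseline) test accuracy."""
    rng = np.random.default_rng(seed)
    X, y = make_toy_pixels(n, dim, rng)
    split = int(0.75 * n)
    model = fit_centroids(X[:split], y[:split])
    probe_acc = float((predict(model, X[split:]) == y[split:]).mean())
    base_acc = float((majority_baseline(y[:split], n - split) == y[split:]).mean())
    return probe_acc, base_acc


if __name__ == "__main__":
    probe, base = compare()
    print(f"embedding probe: {probe:.2f}, majority baseline: {base:.2f}")
```

On this toy data the probe lands well above the majority baseline; with real embeddings you'd sample actual pixel vectors at the ground-truth locations, and probably use a stronger probe such as logistic regression.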

Glen's closing thought was that these systems should be globally comprehensive but also locally useful. In practice this comes down to partnerships and clear roles: choose where you add the most (multi-benefit) value, and ensure that data and AI have actual users, with a human in the loop. Jamaican seagrass is an example of reaching a practical outcome rather than pursuing frontier AI for its own sake.

4 Wrangling multidimensional data

Sol Cotton from Open Climate Fix used an excellent terminal/markdown presentation to guide us through the challenge of handling large multidimensional datasets in cloud-native workflows, and how careful Zarr chunking strategies have transformed access efficiency across multiple dimensions.

This topic aligned well with the Icechunk discussions at PROPL last week. The geospatial community looks to be converging on chunked/sharded, compressed, cloud-native ndarray storage. My main remaining questions are how to find optimal chunking strategies for a given mix of queries and data appends, and how to layer a query interface over the store (though I believe Icechunk has this capability).
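The chunking trade-off shows up with a little arithmetic. This sketch counts how many chunks (and so, roughly, how many object-store reads) a query touches for a given chunk shape. The (time, y, x) array, chunk shapes and queries are hypothetical illustrations, not Open Climate Fix's or Tessera's actual layouts, and it ignores Zarr v3 sharding, which can pack many chunks into one object.

```python
def chunks_touched(query, chunk_shape):
    """Number of chunks intersected by a query.

    `query` holds one half-open (start, stop) index range per dimension;
    `chunk_shape` gives the chunk length along each dimension.
    """
    n = 1
    for (start, stop), c in zip(query, chunk_shape):
        first = start // c          # index of the first chunk hit
        last = (stop - 1) // c      # index of the last chunk hit
        n *= last - first + 1
    return n


# Hypothetical (time, y, x) array: 8760 hourly steps over a 1000 x 1000 grid.
time_series_query = [(0, 8760), (500, 501), (500, 501)]  # one pixel, all time
map_query = [(100, 101), (0, 1000), (0, 1000)]           # one timestep, full grid

layouts = {
    "time-oriented": (8760, 10, 10),
    "space-oriented": (1, 1000, 1000),
    "balanced": (168, 100, 100),
}

if __name__ == "__main__":
    for name, chunk in layouts.items():
        ts = chunks_touched(time_series_query, chunk)
        m = chunks_touched(map_query, chunk)
        print(f"{name}: pixel time series -> {ts} chunks, single map -> {m} chunks")
```

Here the time-oriented layout serves a single-pixel time series from 1 chunk but needs 10,000 chunks to draw one map; the space-oriented layout is the mirror image (8,760 versus 1); and the balanced layout needs 53 and 100 respectively. Which layout wins depends entirely on the query and append mix.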

I asked in the Q&A how coordinate transforms are handled, since in Tessera we use a MegaZarr layout with one group per UTM zone, which requires some client-side stitching for ROIs that span UTM zones. There's no satisfactory answer for this yet, except perhaps shifting to equal-size projections in the future.
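As an illustration of why per-UTM-zone storage needs client-side stitching, here's a sketch that works out which zones a bounding box touches, using the standard 6° UTM zone formula and the WGS 84 / UTM EPSG codes (326xx north, 327xx south). It ignores the Norway/Svalbard exceptions and antimeridian-crossing boxes, and these helpers are my own, not part of geotessera's API.

```python
def utm_zone(lon: float) -> int:
    """Standard 6-degree UTM zone (1-60) for a longitude in [-180, 180].

    Ignores the Norway and Svalbard exceptions to the regular grid.
    """
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range")
    # lon = 180 would give zone 61, so clamp it into zone 60.
    return min(int((lon + 180.0) // 6.0) + 1, 60)


def utm_epsg(lon: float, lat: float) -> int:
    """EPSG code for WGS 84 / UTM: 326xx north of the equator, 327xx south."""
    return (32600 if lat >= 0 else 32700) + utm_zone(lon)


def zones_for_bbox(min_lon, min_lat, max_lon, max_lat):
    """EPSG codes of every UTM zone a bounding box touches.

    Assumes the box does not cross the antimeridian.
    """
    zones = range(utm_zone(min_lon), utm_zone(max_lon) + 1)
    bases = {32600 if lat >= 0 else 32700 for lat in (min_lat, max_lat)}
    return sorted(base + z for base in bases for z in zones)


if __name__ == "__main__":
    # A box around Greater London straddles the Greenwich meridian.
    print(zones_for_bbox(-0.5, 51.3, 0.3, 51.7))
```

Even a box around London straddles the meridian and so spans zones 30 and 31 (EPSG:32630 and EPSG:32631), which means two stores to read and a reprojection step before the pieces can be mosaicked.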

Sol Cotton on taming multidimensional data with cloud-native Zarr chunking.

I also gave a talk of my own in this session on Tessera, first explaining our global 10m pixel-wise embeddings with open weights, then our move to Zarr v3 and a preview of the 1.1 and (forthcoming) 2.0 models. It was a happy coincidence to follow Earth Genome, who had just independently benchmarked us, as it made it much easier to motivate some of our recent improvements on coastal regions!

The audience's reception to my talk was awesome; I spent most of the rest of my time there chatting with people about it all. Questions ranged from whether we could help with weather (answer: yes, coming soon!), ice caps (nope, but a similar approach might work), the ocean (nope, but see Laure Zanna's work) and ecological modelling (see below).

It's quite difficult to point people to our eeg.zulipchat.com chat service in person, so I'll look into printing some Tessera 'project cards' with QR codes and links that we can hand out. This seems more useful than personal business cards (which I haven't used in years!).

5 Getting ground-level nature data into the cloud

Echo Labs (represented by the wonderful Molly Blank and Kaja Wasik) were up next, talking about how to turn ecological complexity into useful signal. As background, Echo is an FRO (Focused Research Organisation) backed by ARIA and Convergent Research, and the team visited the CCI earlier in the year. They've also just launched their shiny new website this week!

Echo want to take fragmented, multimodal ground-level ecological data and transform it into representations of ecosystem condition ('ecosystem vectors'), as a shared foundation for measuring change and evaluating the impact of interventions on the ground.

Echo Labs: turning ecological complexity into useful signal, ants and all.

Their proposed primitive for the representation is an ecosystem state vector, which is a compact representation that fuses different ground-level modalities (camera traps, acoustics, sensors) into one object the client can compute over.

Tessera is an obvious input here, but many other sensor modalities and ground-truth data from the CLR would be useful to them as well. David Coomes has been discussing this with them since their visit to Cambridge!

A new data primitive from Echo is the 'ecosystem state vector'.
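Echo haven't (as far as I know) pinned down a concrete format, so the following is purely a hypothetical sketch of one way to fuse per-modality embeddings into a single fixed-length state vector: L2-normalise each observed modality so no single sensor dominates, zero-fill missing ones, and append a presence mask so downstream models can tell "absent" from "zero". The modality names and widths are placeholders I've made up.

```python
import numpy as np

# Hypothetical modality layout: name -> embedding width. Illustrative
# placeholders only, not Echo Labs' actual design.
MODALITIES = {"camera_trap": 8, "acoustic": 6, "sensor": 4, "satellite": 12}


def fuse_state_vector(observations: dict) -> np.ndarray:
    """Fuse whichever modalities were observed into one fixed-length vector.

    Layout: each modality's slot in MODALITIES order, then one presence bit
    per modality (1.0 = observed, 0.0 = missing and zero-filled).
    """
    parts, mask = [], []
    for name, width in MODALITIES.items():
        v = observations.get(name)
        if v is None:
            parts.append(np.zeros(width))
            mask.append(0.0)
            continue
        v = np.asarray(v, dtype=float)
        if v.shape != (width,):
            raise ValueError(f"{name}: expected shape ({width},), got {v.shape}")
        norm = np.linalg.norm(v)
        # L2-normalise so a loud modality can't swamp the others.
        parts.append(v / norm if norm > 0 else v)
        mask.append(1.0)
    return np.concatenate(parts + [np.array(mask)])


if __name__ == "__main__":
    # A site with only an acoustic recorder deployed.
    vec = fuse_state_vector({"acoustic": np.ones(6)})
    print(vec.shape, vec[-len(MODALITIES):])
```

The presence mask matters because ground-level coverage is patchy: most sites will only ever have a subset of sensors.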

Their roadmap seems sensibly staged to me, starting with a first sprint to prove that multimodal ground signals carry useful information in the first place. They're then working on mid-term pilot projects that ground that utility in an ecological intervention context.

Longer-term, they want to release a shared resource/benchmark of embedded multimodal sensor data for research, policy and industry. I'm extremely excited to see other people intending to work on benchmarks in this space, as it's really difficult to evaluate techniques right now.

Where Echo Labs is heading from proof-of-concept to a shared multimodal resource.

In the Q&A, I raised a topic that Mike Harfoot and I have been discussing: whether synthetic data generation (e.g. from a process model like Madingley) could help accelerate the training of their ecosystem model, since ground-truth data is quite sparse. This isn't on their near-term roadmap, but it is one of the things they're considering.

I had a quick chat with Stefan Istrate (who has been working with Silviu Petrovan on frog vision models) and am delighted to see that he's recently joined Echo as their head of machine learning! They're shaping up to have a very classy team indeed, and I look forward to seeing how their ecosystem vectors progress.

6 The invisible settlements of Argentina

The talk I enjoyed most was from Nissim Lebovits (Radiant Earth), who explained the Barrios Visibles project (see also the paper). They used building-footprint data to surface a systematic population undercount in Argentina's informal settlements. And not a small one: he reported that some 3.4 million people were missing from the national record, a significant fraction of the country's estimated 45 million inhabitants!

RENABAP, Argentina's official registry of barrios populares, lists 1.24 million families across 6,467 settlements. But satellite imagery reveals 1.97 million buildings within those same boundaries—59% more structures than recorded families.

This isn't about the registry being outdated. RENABAP's own quality-control protocol requires that family counts match dwellings visible in satellite imagery. The gap documented here is a departure from that standard. Closing it requires methodological change, not just updated data. -- Barrios Visibles explainer, 2026

3.4 million people missing from Argentina's national record of informal settlements.

The talk was (to me anyway) an incredible demonstration of cloud-native open data doing politically consequential work. He ran queries directly over the Parquet files hosted on Source Coop, performing in just a few queries a full spatial cross-reference of combined Google, Microsoft and OpenStreetMap building footprints against Argentina's official registries.

The point of doing this over the hosted Parquet is that there's no need to download everything to run the query (hence the importance of the cloud-native approach).
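The core of that cross-referencing is a spatial join: count the building footprints that fall inside each registry polygon and compare the count against the registered number of families. The sketch below shows the logic in plain Python with a ray-casting point-in-polygon test on building centroids and toy square polygons; it is my illustration, not Nissim's code. At national scale you'd push this into a spatial engine instead (for example DuckDB's spatial extension reading the Parquet remotely, or GeoPandas' `sjoin`).

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, poly: List[Point]) -> bool:
    """Ray casting: inside if a ray to the right of pt crosses an odd number of edges."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def buildings_vs_families(
    settlements: Dict[str, Tuple[List[Point], int]],
    building_centroids: List[Point],
) -> Dict[str, Tuple[int, int, float]]:
    """For each settlement: (buildings inside, registered families, ratio)."""
    out = {}
    for sid, (poly, families) in settlements.items():
        count = sum(point_in_polygon(c, poly) for c in building_centroids)
        ratio = count / families if families else float("inf")
        out[sid] = (count, families, ratio)
    return out


if __name__ == "__main__":
    # Toy polygons and centroids in arbitrary planar units.
    settlements = {
        "A": ([(0, 0), (4, 0), (4, 4), (0, 4)], 3),
        "B": ([(10, 10), (12, 10), (12, 12), (10, 12)], 2),
    }
    centroids = [(1, 1), (2, 3), (3, 1), (3.5, 3.5), (11, 11), (20, 20)]
    print(buildings_vs_families(settlements, centroids))
```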

As an example, in La Plata alone, roughly 72,000 building footprints intersect polygons for which the registry lists only 34,000 families. This kind of gap seems very important to account for when budgeting services and infrastructure development in the country.
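The headline ratios follow directly from the quoted counts; as a quick check:

```python
def excess_pct(buildings: float, families: float) -> float:
    """Percentage by which detected buildings exceed registered families."""
    return (buildings / families - 1.0) * 100.0


if __name__ == "__main__":
    national = excess_pct(1_970_000, 1_240_000)  # RENABAP-wide figures quoted above
    la_plata = excess_pct(72_000, 34_000)        # La Plata figures quoted above
    print(f"national: +{national:.0f}%, La Plata: +{la_plata:.0f}%")
```

That's about 59% more buildings than registered families nationally, and roughly 2.1 buildings per registered family (+112%) in La Plata.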

Improving on the census: 72,000 buildings detected against 34,000 families registered.

Another point he made (shown in the talk video) is that debugging and visualising this dataset is pretty easy, since the entire map is zoomable. To validate a given region, officials can navigate straight there, find the polygons marked as settlements, and use ordinary satellite imagery to verify that there are in fact settlements there.

This left me wondering about the role of OpenStreetMap here: it was used as an input, but what's the mechanism for proposing updates back to it so that the crowdsourced database stays accurate? I met another attendee, Petya Kangalova, who works for the Humanitarian OpenStreetMap Team (HOT), a community of mappers focused on the disaster-response utility of the database.

Petya explained that HOT has a bunch of specialised tech products for disaster response. Two cool ones are a multi-user coordination layer for planning how to update an area, and OpenAerialMap for exploring openly licensed imagery.

7 Compressing the Earth

Jacqueline Campbell of Asterisk Labs talked next. She's a planetary scientist who came to Earth's oceans via searching for life in Martian dust, and presented "Earth Compress".

Their goal is to build open-source, publicly owned compression infrastructure for a variety of Earth data, in partnership with the National Oceanography Centre. The domain-aware compression stack aims to make petabyte-scale analysis accessible to everyone rather than only to those with the biggest egress budgets:

The critical bottleneck limiting high-impact environmental research is how difficult it has become to process increasingly huge and complex datasets. To overcome this bottleneck we will build open source, AI-powered software infrastructure and data-as-AI models that are trustworthy and publicly-owned. We will not only build technology, but establish a multi-institutional cooperative, reducing current fragmentation and complexity to massively increase the number of organisations that can access and manipulate Earth-scale environmental data.

Our software infrastructure will simultaneously empower data producers (so they can easily create Earth Embedding models) and data users (so they can easily access and manipulate them). Therefore, we will enable a transformative shift, massively reducing the compute costs and complexity for all. -- Future of Environment Data Collective

This describes the problems we're having in Tessera pretty accurately. Srinivasan Keshav has also been leading an effort on our side to use residual vector quantization to dramatically shrink the size of the Tessera embeddings, so we've started a direct conversation with the Asterisk team to see how we can join forces!

In theory, this will allow for hugely faster 'sketches' of global analyses without much loss in accuracy for many downstream tasks. And because the Asterisk team is also applying the same trick to other embeddings, it'll make fusion of multimodal data sources much easier as well!
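For anyone who hasn't met residual vector quantization (RVQ): each stage quantizes whatever error the previous stages left behind against a small codebook, so every vector is stored as a handful of codebook indices. Here's a minimal numpy sketch using plain k-means codebooks; it illustrates the general technique on random data and is not the code that Keshav's effort or Asterisk Labs actually use.

```python
import numpy as np


def nearest(X: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Index of the closest codeword (squared Euclidean) for each row of X."""
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X: np.ndarray, k: int, iters: int, rng: np.random.Generator) -> np.ndarray:
    """Plain Lloyd's k-means, initialised from k random rows of X."""
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = nearest(X, centroids)
        for j in range(k):
            members = X[assign == j]
            if len(members):  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    return centroids


def train_rvq(X: np.ndarray, n_stages: int, k: int, iters: int = 10, seed: int = 0):
    """Train one codebook per stage on whatever the earlier stages failed to capture."""
    rng = np.random.default_rng(seed)
    residual = X.copy()
    codebooks = []
    for _ in range(n_stages):
        cb = kmeans(residual, k, iters, rng)
        residual = residual - cb[nearest(residual, cb)]
        codebooks.append(cb)
    return codebooks


def encode(X: np.ndarray, codebooks) -> np.ndarray:
    """Greedy RVQ encoding: one small integer per stage per vector (k <= 256)."""
    residual = X.copy()
    codes = []
    for cb in codebooks:
        idx = nearest(residual, cb)
        codes.append(idx)
        residual = residual - cb[idx]
    return np.stack(codes, axis=1).astype(np.uint8)


def decode(codes: np.ndarray, codebooks) -> np.ndarray:
    """Reconstruct each vector as the sum of its chosen codeword at every stage."""
    return sum(cb[codes[:, s]] for s, cb in enumerate(codebooks))


if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(2000, 16))
    books = train_rvq(X, n_stages=4, k=64)
    codes = encode(X, books)
    for s in range(1, 5):
        mse = ((X - decode(codes[:, :s], books[:s])) ** 2).mean()
        print(f"{s} stage(s): {s} byte(s)/vector, MSE {mse:.3f}")
```

With 64 codewords per stage, each 16-dimensional float64 vector (128 bytes) shrinks to one byte per stage plus the shared codebooks, and each extra stage should reduce the reconstruction error further. Random Gaussian data is a pessimistic stand-in here; real embeddings typically have more structure for the codebooks to exploit.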

Earth Compress is a publicly-owned, domain-aware compression stack for Earth data.

Their architecture starts with a compression toolbox (classical compression, AI-based data fields, and AI embedding models like Tessera) that feeds into tailored data archives (file, columnar and vector databases) on the server side, with a transmission protocol that streams dynamic data through a manager out to the client side as decompressed data and embeddings.

I don't think there are many standards yet for what this custom variable-bitrate (VBR) decoder might look like, so this seems a good opportunity to establish one, much like the Zarr conventions community is doing.

The Earth Compress architecture, from compression toolbox to client-side transmission.

8 When the developer is an agent

My laptop started running out of juice (too many demos), so my remaining notes are a bit sketchy.

Stefan Amberger, co-founder of Tilebox, made the case that Tilebox is the "operating loop" for geospatial data workflows, and asked what changes when the developer is a semi-autonomous LLM agent.

Their answer is to establish a single workflow loop ("discover -> define -> run -> observe -> improve") that's shared across three kinds of callers: humans on a console, LLM agents over MCP, and conventional software via APIs. The work is orchestrated to wherever the data is, whether in the cloud, on-prem or on edge devices.

Tilebox's one workflow loop for people, agents and software.
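One way to read Tilebox's pitch is that the loop itself is the interface and the caller is interchangeable. Here's a hypothetical Python sketch of that shape (the class and field names are mine, not Tilebox's SDK): the same five steps run regardless of whether a human console, an MCP-speaking agent or an API client is driving, and every iteration is recorded so the loop stays auditable.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Run:
    """One observed iteration: the spec that ran and what came back."""
    spec: dict
    metrics: dict


@dataclass
class WorkflowLoop:
    """discover -> define -> run -> observe -> improve, whoever the caller is.

    Hypothetical structure for illustration; not Tilebox's actual API.
    """
    discover: Callable[[], List[str]]
    define: Callable[[List[str]], dict]
    execute: Callable[[dict], dict]
    improve: Callable[[dict, dict], dict]
    history: List[Run] = field(default_factory=list)

    def iterate(self, rounds: int) -> List[Run]:
        datasets = self.discover()                   # discover
        spec = self.define(datasets)                 # define
        for _ in range(rounds):
            metrics = self.execute(spec)             # run
            self.history.append(Run(spec, metrics))  # observe: keep an audit trail
            spec = self.improve(spec, metrics)       # improve
        return self.history


if __name__ == "__main__":
    # A human console, an MCP tool call or an API client could supply these.
    loop = WorkflowLoop(
        discover=lambda: ["sentinel-2-l2a"],
        define=lambda ds: {"collection": ds[0], "max_cloud": 60},
        execute=lambda spec: {"scenes": 100 - spec["max_cloud"]},  # stand-in for real work
        improve=lambda spec, m: {**spec, "max_cloud": spec["max_cloud"] - 20},
    )
    for run in loop.iterate(3):
        print(run.spec["max_cloud"], run.metrics["scenes"])
```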

As with the discussion at last week's PROPL, there's quite a wide consensus that agents will join the coding loop whether we like it or not. So the focus needs to shift to how we keep not only the data sources auditable, but also the coding loop verifiable.

I did a quick poll of the audience to find out which geotessera users coded by hand and who used agents. I couldn't find a single person who used my lovely library by hand: every single person used some mix of Claude and Codex. There were no local-agent users and no Copilot users, so that's a sign of a rarefied crowd.

9 Lightning talks

The afternoon lightning round was a tour of practical pipelines. Jake Wilkins (Epoch Blue) showed how they go from days to minutes with a just-in-time pipeline for plot-level supply-chain analytics, aimed at helping companies comply with the forthcoming EUDR deforestation-compliance deadline.

I had a chance to chat to Jake afterwards and show him our FOOD provenance paper and its interactive explorer. What's really cool about Jake's work is that they're using global embeddings to calculate the probability that a commodity is produced at 10 m pixel resolution, whereas our (pre-Tessera) work depends on FAO provenance data, which is only available at a national level.

Jake Wilkins on Epoch Blue's just-in-time supply-chain pipeline.

The Epoch Blue process runs customer-supplied locations and addresses through a geocode-and-verify loop, calculates probabilistic supply sheds down to delineated commodity plots, and then merges this with environmental metrics (deforestation, emissions, biodiversity, water use). Jake also wrote a nice piece on using AlphaEarth embeddings to detect palm-oil mill effluent lagoons. I really want to try this with Tessera as well...

Epoch Blue's process ranges from addresses to environmental metrics.
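A "probabilistic supply shed" amounts to weighting per-plot environmental metrics by the probability that each plot supplied the commodity. Here's a minimal sketch of that merge step; it reflects my reading of the idea rather than Epoch Blue's implementation, and the plot IDs and numbers are made-up placeholders.

```python
from typing import Dict


def normalise(probs: Dict[str, float]) -> Dict[str, float]:
    """Rescale supply-shed probabilities so they sum to 1."""
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("supply-shed probabilities must sum to a positive value")
    return {k: v / total for k, v in probs.items()}


def expected_exposure(supply_shed: Dict[str, float],
                      plot_metrics: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Probability-weighted environmental metrics across candidate plots."""
    shed = normalise(supply_shed)
    out: Dict[str, float] = {}
    for plot, p in shed.items():
        for metric, value in plot_metrics[plot].items():
            out[metric] = out.get(metric, 0.0) + p * value
    return out


if __name__ == "__main__":
    # Hypothetical supply shed for one mill: three candidate plots.
    shed = {"plot-a": 0.5, "plot-b": 0.3, "plot-c": 0.2}
    metrics = {
        "plot-a": {"deforested_ha": 0.0, "emissions_tco2e": 10.0},
        "plot-b": {"deforested_ha": 2.0, "emissions_tco2e": 40.0},
        "plot-c": {"deforested_ha": 5.0, "emissions_tco2e": 25.0},
    }
    print(expected_exposure(shed, metrics))
```

For these placeholder numbers the mill's expected exposure works out to 1.6 ha of deforestation and 22 tCO2e.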

The other lightning talks were great: Alper Dincer (Climingo) spoke on global drought mapping with H3, GeoParquet and DuckDB; Ross Slater (Leeds) on going cloud-native without the cloud for Antarctic ice dynamics; and Petya Kangalova (HOT), who I mentioned earlier, on cloud-native open imagery for disaster response. Ross has a really interesting use case that could benefit from a Tessera-style Barlow Twins approach, but using different satellite data (S1/S2 don't go that far south), which I need to think about more.

10 Panels and closing thoughts

The day closed with a panel with David Eaves (UCL), Jack Kelly (dynamical.org and Open Climate Fix), Niall Robinson (NVIDIA) and Kaja Wasik (Echo Labs). Frustratingly, the heatwave had thrown the trains into the usual chaos and I had to leg it to King's Cross to get back to Cambridge, so I missed it entirely.

I did have a chinwag with Niall though, as he's been helping us train Tessera v2 on the Isambard-AI cluster (part of the UK's AI Research Resource). Jack's dynamical.org is also publishing weather data via Icechunk, which I'm planning to use in some weather forecasting research we're currently doing with Tessera.

Jed, Luca, Niall and I all talked about how many of the day's talks came back to provenance and trust. The encouraging thing is that I think we now have many of the pieces in place to do something concrete about it, especially as last week's PROPL living document showed how many PL researchers want to dive into this problem alongside systems people. An ATProto-native trust graph like Tangled's evidence-backed vouching (which I wrote about a few weeks ago) could also anchor data provenance to an identity graph that's reusable across different services (see Semble for example), and support evidence-driven practice.
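To make "anchoring data provenance" slightly more concrete, here's a toy content-addressed provenance record in Python: each derived dataset records the hashes of its inputs plus the step and agent that produced it, and the record's own ID is a hash over that canonicalised record, so editing anything changes the ID. This is a generic hash-chain sketch with hypothetical field names; it is not the ATProto record format or Tangled's vouching scheme, and a real system would also sign each record with the agent's identity key.

```python
import hashlib
import json
from typing import List


def content_id(data: bytes) -> str:
    """Content address for a blob of bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def _canonical(record: dict) -> bytes:
    # Sorted keys and fixed separators keep the hash stable across serialisations.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def provenance_record(output: bytes, inputs: List[str], step: str, agent: str) -> dict:
    """Describe how `output` was derived; `inputs` are content IDs of its parents."""
    record = {
        "output": content_id(output),
        "inputs": sorted(inputs),
        "step": step,
        "agent": agent,  # e.g. a DID for a person or service (hypothetical)
    }
    record["record_id"] = content_id(_canonical(record))
    return record


def verify(record: dict, output: bytes) -> bool:
    """Check the output bytes match and the record hasn't been edited since creation."""
    body = {k: v for k, v in record.items() if k != "record_id"}
    return (record["output"] == content_id(output)
            and record["record_id"] == content_id(_canonical(body)))


if __name__ == "__main__":
    footprints = b"building footprints v1"
    rec = provenance_record(b"joined table", [content_id(footprints)],
                            "spatial-join", "did:example:alice")
    print(rec["record_id"], verify(rec, b"joined table"))
```

Because each record lists its parents by content ID, a chain of these records can be walked back from any derived product to the raw inputs.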

This was a great first experience of the cloud-native geospatial community in London for me! Thanks to Jed and Radiant Earth for convening it; next time, ideally, in slightly cooler weather, but the coffee at the Jellicoe was top notch, so that made up for the burns!

Granary Square was a full on fountain spraying experience for adults, kids and pets

References

[1] Madhavapeddy (2026). Tessera v1.1 released, with smoother and temporally stable embeddings. 10.59350/vcqjp-24y05

[2] Madhavapeddy (2025). Publish, Review, Curate to upend scholarly publishing. 10.59350/fpc9w-ccj82

[3] Madhavapeddy (2026). TESSERA now supports the Zarr geo-embeddings convention proposal. 10.59350/c3hrq-zsx02

[4] Madhavapeddy (2026). Streaming millions of TESSERA tiles over HTTP with Zarr v3. 10.59350/tk0er-ycs46

[5] Madhavapeddy (2026). 1st TESSERA/CoRE hackathon at the Indian AI Summit. 10.59350/1na80-7ak85

[6] Madhavapeddy (2026). .plan-26-25: Planetary scale plans, Windows file-descriptor scale problems. 10.59350/b3vvx-n70

[7] Madhavapeddy (2025). Foundational AI for Ecosystem Resilience workshop. 10.59350/26hy6-rry61

[8] Madhavapeddy (2025). GeoTessera Python library released for geospatial embeddings. 10.59350/7hy6m-1rq76

[9] Lebovits (2026). Barrios Visibles: Building Footprint Evidence of Systematic Population Undercount in Argentina's Informal Settlements. SSRN. 10.2139/ssrn.6588819
