.plan-26-11: Bins, bollards, bots and biodiversity boffins

Anil Madhavapeddy

doi:10.59350/kg2a8-10w32

#tessera #packages #defra #policy #evidence #openstreetmap #ocaml #oxcaml · 15 Mar 2026

Evidence synthesis at the DEFRA science conference, TESSERA transcoding and building a new SPA, OpenStreetMap/DuckDB bindings in OxCaml, and early thoughts on vibecoding etiquette.

1 Evidence Synthesis at the DEFRA science conference

Completely crammed event; lots of interest in AI-driven evidence gathering!

Starting with evidence synthesis, we got invited after our previous meetings with DEFRA to run a session about our evidence TAP at their annual "DEFRA Science, Analysis and Data Professions Conference" in London. I couldn't make this one in person, but Sadiq Jaffer and Sam Reynolds did their usual brilliant double act with help from the rest of the CE team.

One little shocking fact I learnt is that DEFRA aren't covered by the same sorts of publishing agreements we enjoy via JISC, which means they need to separately negotiate with and pay all the publishers. While I'm all for sustainable publishing, it's incredibly inefficient to have public funds support a bunch of research which then needs to be repurchased by the government agency seeking to take nature-positive decisions. I'm feeling the call of COAR and open publishing more and more...

Also congratulations to 15-year-old Jens Kromdijk, who interned with us in a school placement last summer, and had his OpenGL knowledge graph visualiser showcased in front of DEFRA in last week's talk by Sam! The video of this in action is above, and it's a very cool piece of game engine programming repurposed for interactive visualisation of the academic literature. Nice work Jens!

Sam shows off Jens' OpenGL viewer on stage

2 TESSERA hacking

I spent a bunch of time figuring out TESSERA and Zarr v3 layouts, and working on getting adequate parallel performance out of them. The embeddings for 2024 and 2025 are now transcoding and pyramiding, so it'll be a few more days before I can test these out properly.

2.1 Learning SPA and TypeScript

I figure I need to learn 'modern' web programming while building the TZE explorer so I knocked up a website to aggregate information about TESSERA. I used the latest Vite v8 and Rolldown along with Svelte, and resisted the urge to do this in OCaml so I could learn about another language ecosystem!

The experience of building the SPA with Claude was straightforward, except for the usual problem of versioning going wrong. I had to manually intervene to do the fairly complex npm version upgrade (the agent picked Vite 6 and Rollup, and I needed Vite 8 and Rolldown). Deploying the SPA to GitHub Pages was a bit complicated as all unknown routes need to be redirected to a single index file, and how to do this depends on the static page provider. Since GitHub Pages serves a 404.html, I had to patch the site to generate stub 404.html files in all the subdirectories so that they could be navigated to.
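The 404.html workaround can be sketched as a small post-build step. This is a minimal illustration rather than the script used for geotessera.org: the function name write_spa_fallbacks is hypothetical, and it assumes that a copy of the built index.html is a suitable stub in every directory.

```python
import shutil
import tempfile
from pathlib import Path


def write_spa_fallbacks(build_dir: Path) -> list[Path]:
    """Copy the SPA entry point to 404.html in build_dir and every subdirectory."""
    index = build_dir / "index.html"
    if not index.is_file():
        raise FileNotFoundError(f"no index.html in {build_dir}")
    # Collect the directories up front so files written below don't affect traversal.
    directories = [build_dir] + sorted(p for p in build_dir.rglob("*") if p.is_dir())
    written = []
    for directory in directories:
        target = directory / "404.html"
        shutil.copyfile(index, target)
        written.append(target)
    return written


if __name__ == "__main__":
    # Demonstrate on a throwaway tree rather than a real build output.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "news" / "2026").mkdir(parents=True)
        (root / "index.html").write_text("<!doctype html><div id='app'></div>")
        for path in write_spa_fallbacks(root):
            print(path.relative_to(root))
```

In a real deployment this would run against the bundler's output directory after the build and before the upload to the static host.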

The geotessera.org website, as a single page application

Anyway, after all this we ended up with a nice geotessera.org site, which is a convenient place where we can post news such as the v1 model weights being released. I'd like to switch to using the shiny new Tangled Pages but am just waiting on custom domains support first. Congratulations to Akshay and Anirudh on shipping this complicated feature in Tangled!

I've added TESSERA Atom feeds for all posts and for original posts only (i.e. those not federated from elsewhere), so get your feed readers subscribed!

2.2 Relevant geospatial papers

I ran across a nice whitepaper from Element 84 on a vector embeddings marketplace from last summer. Then "From Pixels to Patches: Pooling Strategies for Earth Embeddings" is a nice systematic view on how to aggregate embeddings:

As geospatial foundation models shift from patch-level to pixel-level embeddings, practitioners must aggregate thousands of pixel vectors into patch representations that preserve class-discriminative signal while matching downstream label resolution.

Tessera (512-d) shows the largest mean-pooling gap (12%), suggesting higher-dimensional embeddings benefit most from summary statistics. -- From Pixels to Patches: Pooling Strategies for Earth Embeddings, Corley et al, 2026
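To make the pooling distinction concrete, here is a minimal pure-Python sketch contrasting mean pooling with a summary-statistics pooling over the pixel embeddings in one patch. The particular statistics (mean, standard deviation, min, max) and the function names are assumptions for illustration; they are not taken from the paper or from the TESSERA codebase.

```python
from statistics import fmean, pstdev


def mean_pool(pixels):
    """Average each embedding dimension across all pixels in the patch."""
    return [fmean(column) for column in zip(*pixels)]


def stats_pool(pixels):
    """Concatenate per-dimension mean, standard deviation, min and max.

    The result is four times the length of a single pixel embedding.
    """
    columns = list(zip(*pixels))
    pooled = []
    for summarise in (fmean, pstdev, min, max):
        pooled.extend(summarise(column) for column in columns)
    return pooled


if __name__ == "__main__":
    # Two toy 2-d "pixel embeddings" standing in for thousands of 512-d vectors.
    patch = [[1.0, 0.0], [3.0, 4.0]]
    print(mean_pool(patch))   # [2.0, 2.0]
    print(stats_pool(patch))  # [2.0, 2.0, 1.0, 2.0, 1.0, 0.0, 3.0, 4.0]
```

Mean pooling discards the within-patch spread, whereas the concatenated statistics retain some of it at the cost of a longer patch vector.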

This is particularly timely as we're working on Matryoshka embeddings which should bring a fresh (and more information rich) twist to this when they're ready.

James G. C. Ball also pushed out an excellent preprint on data-efficient tree species mapping in temperate mountain forests, which is essential reading if you're building classifiers or segmenters using models like TESSERA!

3 Working on OxCaml in OxMono

I've continued working on a bunch of side OCaml libraries in OxMono. Firstly, thanks to Jon Ludlam for getting oxdoc HTML working for me in my repo!

3.1 OpenStreetMap and DuckDB in OxCaml

Both TZE and the Enki dashboards that Shane Weisz is working on could also use vector tiles for human activities, ideally via local services. I built an OxCaml OpenStreetMap converter from their compressed protobuf format so that I can run queries via an in-process DuckDB from OCaml. This allows me to do rapid queries for various vector tags, such as finding all the bollards or horse-friendly gates in the world:

> select * from nodes where map_contains(nodes.tags, 'barrier');
┌────────┬────────────────────┬─────────────────────┬─────────────────────────────────────────────────┐
│   id   │        lat         │         lon         │                      tags                       │
│  i64   │       double       │       double        │              map(varchar, varchar)              │
├────────┼────────────────────┼─────────────────────┼─────────────────────────────────────────────────┤
│ 291281 │         51.8116825 │ -0.8396488000000001 │ {access=private, barrier=gate}                  │
│ 291711 │         51.8161508 │          -0.8351806 │ {barrier=bollard, motor_vehicle=no}             │
│ 155806 │  60.65463260000001 │   7.881729600000001 │ {barrier=cattle_grid}                           │
│ 402757 │ 50.917684400000006 │          -1.4037637 │ {barrier=bollard, bollard=fixed}                │
│ 198617 │         59.3074218 │          17.9634278 │ {barrier=yes, bicycle=yes, foot=yes, horse=yes} │
│ 392357 │ 45.082863100000004 │  2.7073557000000004 │ {barrier=gate, bicycle=yes, foot=yes}           │
│ 424724 │  51.60528540000001 │           -0.178978 │ {barrier=cycle_barrier, foot=yes}               │

I'll write more about this next week when it's more fleshed out, but the basic bindings and CLI tools for OSM are in my oxmono#duckdb branch for the very curious. I'm still running performance matrices to figure out the best parallelisation strategy for importing the millions of records involved, but the local/stack/unboxing support is already a significant performance boost: it takes about an hour to import the full 250GB database, which is then very usable for queries.
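For readers without the OxCaml bindings to hand, the tag filter in the query above can be mimicked in plain Python over the sample rows it returned. This is only a stand-in to show the shape of the data: the helper nodes_with_tag is hypothetical and is not part of the oxmono bindings or of DuckDB.

```python
from collections import Counter

# Sample rows copied from the query output above: (id, lat, lon, tags).
NODES = [
    (291281, 51.8116825, -0.8396488000000001, {"access": "private", "barrier": "gate"}),
    (291711, 51.8161508, -0.8351806, {"barrier": "bollard", "motor_vehicle": "no"}),
    (155806, 60.65463260000001, 7.881729600000001, {"barrier": "cattle_grid"}),
    (402757, 50.917684400000006, -1.4037637, {"barrier": "bollard", "bollard": "fixed"}),
    (198617, 59.3074218, 17.9634278,
     {"barrier": "yes", "bicycle": "yes", "foot": "yes", "horse": "yes"}),
    (392357, 45.082863100000004, 2.7073557000000004,
     {"barrier": "gate", "bicycle": "yes", "foot": "yes"}),
    (424724, 51.60528540000001, -0.178978, {"barrier": "cycle_barrier", "foot": "yes"}),
]


def nodes_with_tag(nodes, key, value=None):
    """Return nodes whose tags contain key, optionally requiring an exact value."""
    return [
        node for node in nodes
        if key in node[3] and (value is None or node[3][key] == value)
    ]


# Narrow the 'barrier' results down to bollards only.
bollards = nodes_with_tag(NODES, "barrier", "bollard")

# Tally how often each barrier type appears in the sample.
barrier_counts = Counter(node[3]["barrier"] for node in nodes_with_tag(NODES, "barrier"))

if __name__ == "__main__":
    print([node[0] for node in bollards])
    print(barrier_counts.most_common())
```

Calling nodes_with_tag(NODES, "horse", "yes") picks out the one horse-friendly node in the sample.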

3.2 CPU and GPU inference

Mark Elvers meanwhile has been figuring out how to do efficient CPU inference using OxCaml SIMD and also GPU vs CPU NUMA (which threw me back 13 years to my FOSDEM 2013 talk on NUMA).

We have a lot of spare CPU compared to GPU, so this is a direction we'll likely go down to soak up spare CPU for relatively slow and steady inference of TESSERA tiles.

My monorepo is once again vastly diverging from colleagues'; for example Thomas Gazagnaire has his agentic monopampam monorepo chock-a-block full of exciting new developments. None of the tooling we are building can quite manage to keep things in sync due to the unbelievable throughput of a well-prompted LLM.

But on the other hand, none of the code we're building is fit for third-party release without extensive code review (which I've yet to do!). We're risking getting stuck in an uncanny valley of 'almost there' code, which feels a bit like a return to the land of untyped JavaScript!

While I've been punting having a firm opinion on this down the road as I'm not sharing much code yet, Patrick Ferris has put together an excellent piece on vibecoding etiquette which I agree with. Jon Ludlam has also adopted a slightly different (but compatible) policy of having a separate commit email for his agent and depending on a rebase to 'own' the code:

It's always slightly alarming to see my own name on the output of the bots, assigning me (or sometimes someone else (!!)) copyright over code I've never seen. This is, of course, a whole other pandora's box that I really don't want to open right now - but I think the point is that I'll feel a lot more comfortable if the commits are all by Jon's Agent <jon+claude@recoil.org> rather than by me! -- Containers vs accounts, 2026-04, Jon Ludlam

Both of these seem right to me. When I'm back in Cambridge in a few weeks, I'm going to take a serious look at locally hosted models as well. Luke Marsden has been doing brilliant work on open agent infrastructure that I've not had the bandwidth to try out yet.

4 Fun Links

I've been watching Tom Loosemore doing epic hacking with UK government websites in the past few months. After seeing his latest post about bin collection days I couldn't resist pitching in to build a script for him. There was a push for this sort of thing back in 2010 (via ScraperWiki or Dapper), and perhaps resurrecting these in an agentic world would be a nice way for collaboration on providing programmatic access to otherwise fragmented local government interfaces.
"The University of Illinois Just Released a Popcorn So Good It Doesn’t Need Butter" is my science story of the week.
We Distribute bridged onto ATProto last week and is a nice collection of news about the fediverse, ATProto and Matrix ecosystems.
It turns out that our UK spend on net zero is in total less than a single oil crisis. Energy independence through renewables is just sound fiscal policy.
Perplexity is building a Databox equivalent to meet all the Claws, and Docker and Nanoclaw joined up with sandboxes too. No one's quite built the kind of temporal and spatially aware personal database we proposed back in 2015 or earlier. I guess I'm glad that my OCaml LifeDB prototype is now 18 years old and finally relevant.
OpenUK has issued its recommendation to UKRI for a UK Foundation for Open Source with a comprehensive report. I remain enthusiastic about the prospect of a variant of the UK National Data Library and this is a good report!

References

[1] Madhavapeddy (2026). Discussing effective conservation with all the UK Chief Scientists. doi:10.59350/qjrmv-38130

[2] Madhavapeddy (2025). Royal Society's Future of Scientific Publishing meeting. doi:10.59350/nmcab-py710

[3] Madhavapeddy (2025). Holding an OxCaml tutorial at ICFP/SPLASH 2025. doi:10.59350/55bc5-x4p75

[4] Madhavapeddy (2025). Thoughts on the National Data Library and private research data. doi:10.59350/fk6vy-5q841

[5] Jaffer et al (2025). AI-assisted Living Evidence Databases for Conservation Science. Cambridge Open Engage. doi:10.33774/coe-2025-rmsqf

[6] Madhavapeddy (2025). Publish, Review, Curate to upend scholarly publishing. doi:10.59350/fpc9w-ccj82

[7] Chaudhry et al (2015). Personal Data: Thinking Inside the Box. doi:10.7146/aahcc.v1i1.21312

[8] Ball et al (2026). Geospatial foundation models enable data-efficient tree species mapping in temperate mountain forests. bioRxiv. doi:10.64898/2026.02.23.707022

[9] Madhavapeddy (2026). Streaming millions of TESSERA tiles over HTTP with Zarr v3. doi:10.59350/tk0er-ycs46

[10] Madhavapeddy (2025). Using AT Proto for more than just Bluesky posts. doi:10.59350/32rdt-zny05

[11] Corley et al (2026). From Pixels to Patches: Pooling Strategies for Earth Embeddings. arXiv. doi:10.48550/arXiv.2603.02080
