DOI: 10.59350/kg2a8-10w32

.plan-26-11: Bins, bollards, bots and biodiversity boffins

Evidence synthesis at the DEFRA science conference, TESSERA transcoding and building a new SPA, OpenStreetMap/DuckDB bindings in OxCaml, and early thoughts on vibecoding etiquette.

1 Evidence Synthesis at the DEFRA science conference

Completely crammed event; lots of interest in AI-driven evidence gathering!

Starting with evidence synthesis: after our previous meetings with DEFRA, we were invited to run a session about our evidence TAP at their annual "DEFRA Science, Analysis and Data Professions Conference" in London. I couldn't make this one in person, but Sadiq Jaffer and Sam Reynolds did their usual brilliant double act with help from the rest of the CE team.

One slightly shocking fact I learnt is that DEFRA aren't covered by the same sorts of publishing agreements we enjoy via JISC, which means they need to negotiate with and pay each publisher separately. While I'm all for sustainable publishing, it's incredibly inefficient for public funds to support a bunch of research that then needs to be repurchased by the government agency seeking to take nature-positive decisions. I'm feeling the call of COAR and open publishing more and more...

Also congratulations to 15-year-old Jens Kromdijk, who interned with us in a school placement last summer, and had his OpenGL knowledge graph visualiser showcased in front of DEFRA in last week's talk by Sam! The video of this in action is above, and it's a very cool piece of game engine programming repurposed for interactive visualisation of the academic literature. Nice work Jens!

Sam shows off Jens' OpenGL viewer on stage

2 TESSERA hacking

I spent a lot of time figuring out TESSERA's Zarr v3 layouts and getting adequate parallel performance out of them. The embeddings for 2024 and 2025 are now being transcoded and pyramided, so in a few more days I can test them out properly.
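Pyramiding means building successively coarser copies of each embedding mosaic, so that a viewer can fetch a zoomed-out view without reading every full-resolution chunk. The arithmetic of such a pyramid can be sketched as follows. This is a minimal sketch under stated assumptions: it assumes each level halves the resolution and every level is split into fixed-size square chunks. The 256-pixel chunk size and the name `pyramid_levels` are hypothetical, not the actual TESSERA layout.

```python
import math


def pyramid_levels(width: int, height: int, chunk: int = 256):
    """Return (level, width, height, n_chunks) for each pyramid level.

    Hypothetical scheme: level 0 is full resolution, and each later level
    halves both dimensions (rounding up) until the image fits in one chunk.
    """
    levels = []
    level, w, h = 0, width, height
    while True:
        n_chunks = math.ceil(w / chunk) * math.ceil(h / chunk)
        levels.append((level, w, h, n_chunks))
        if w <= chunk and h <= chunk:
            break  # the whole level fits in a single chunk: stop here
        w, h = math.ceil(w / 2), math.ceil(h / 2)
        level += 1
    return levels


for lvl in pyramid_levels(4000, 3000):
    print(lvl)
```

Because each extra level holds a quarter of the pixels of the one below, the whole pyramid costs only about a third more storage than the base level (1/4 + 1/16 + ... = 1/3).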

2.1 Learning SPA and TypeScript

I figured I needed to learn 'modern' web programming while building the TZE explorer, so I knocked up a website to aggregate information about TESSERA. I used the latest Vite v8 and Rolldown along with Svelte, and resisted the urge to do this in OCaml so that I could learn about another language ecosystem!

The experience of building the SPA with Claude was straightforward, except for the usual problem of versioning going wrong. I had to intervene manually to do the fairly complex npm version upgrade (the agent picked Vite 6 and Rollup, and I needed Vite 8 and Rolldown). Deploying the SPA to GitHub Pages was a bit complicated, because all unknown routes need to be redirected to a single index file, and how to do that is specific to each static page provider. Since GitHub Pages serves a 404.html for missing pages, I had to patch the site to generate stub 404.html files in all the subdirectories so that they could be navigated to directly.
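The 404.html workaround can be sketched as a small post-build step. This sketch makes several assumptions. It assumes the build writes to a `dist/` directory with the SPA shell in `dist/index.html`, and that the client-side routes are known up front. The route list and the helper name `write_404_stubs` are hypothetical. Here each stub is simply a copy of the SPA shell, and the real geotessera.org stubs may well differ.

```python
import shutil
from pathlib import Path


def write_404_stubs(dist: Path, routes: list[str]) -> list[Path]:
    """Copy the SPA shell into a 404.html at the site root and in each
    route's subdirectory (hypothetical post-build step)."""
    shell = dist / "index.html"
    written = []
    for route in ["", *routes]:  # "" is the site root itself
        target_dir = dist / route.strip("/")
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / "404.html"
        shutil.copyfile(shell, target)
        written.append(target)
    return written
```

After `vite build`, a call such as `write_404_stubs(Path("dist"), ["/news", "/docs/api"])` would create `dist/404.html`, `dist/news/404.html` and `dist/docs/api/404.html`.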

The geotessera.org website, as a single page application

Anyway, after all this we ended up with a nice geotessera.org site, which is a convenient place to post news such as the release of the v1 model weights. I'd like to switch to the shiny new Tangled Pages, but am waiting on custom domain support first. Congratulations to Akshay and Anirudh on shipping such a complicated feature in Tangled!

I've added two TESSERA Atom feeds, one with all posts and one with only original posts (i.e. those not federated from elsewhere), so get your feed readers subscribed!
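Producing both feeds from one set of posts is mostly a filter. The sketch below uses Python's standard `xml.etree.ElementTree`. It assumes each post records whether it was federated from elsewhere. The `Post` fields, feed ids and author name are all hypothetical, not the actual geotessera.org schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace


@dataclass
class Post:
    title: str
    url: str
    updated: str     # RFC 3339 in UTC, e.g. "2026-03-12T09:00:00Z"
    federated: bool  # True if the post was federated from elsewhere


def _el(parent, tag, text=None, **attrs):
    """Append an Atom-namespaced child element to `parent`."""
    e = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
    if text is not None:
        e.text = text
    return e


def atom_feed(feed_id: str, title: str, posts: list) -> str:
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _el(feed, "id", feed_id)
    _el(feed, "title", title)
    # Feed-level <updated> is the newest entry; comparing strings is only
    # valid because every timestamp uses the same UTC "Z" format.
    _el(feed, "updated", max((p.updated for p in posts), default="1970-01-01T00:00:00Z"))
    author = _el(feed, "author")
    _el(author, "name", "TESSERA project")  # hypothetical author name
    for p in sorted(posts, key=lambda p: p.updated, reverse=True):
        entry = _el(feed, "entry")
        _el(entry, "id", p.url)
        _el(entry, "title", p.title)
        _el(entry, "updated", p.updated)
        _el(entry, "link", href=p.url)
    return ET.tostring(feed, encoding="unicode")


def build_feeds(posts: list) -> dict:
    """One feed with every post, one with only original (non-federated) posts."""
    originals = [p for p in posts if not p.federated]
    return {
        "all.xml": atom_feed("urn:example:tessera:all", "TESSERA: all posts", posts),
        "original.xml": atom_feed("urn:example:tessera:original", "TESSERA: original posts", originals),
    }
```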

2.2 Relevant geospatial papers

I ran across a nice whitepaper from Element 84, published last summer, on a vector embeddings marketplace. "From Pixels to Patches: Pooling Strategies for Earth Embeddings" also offers a nice systematic view of how to aggregate embeddings:

As geospatial foundation models shift from patch-level to pixel-level embeddings, practitioners must aggregate thousands of pixel vectors into patch representations that preserve class-discriminative signal while matching downstream label resolution.

Tessera (512-d) shows the largest mean-pooling gap (12%), suggesting higher-dimensional embeddings benefit most from summary statistics. -- From Pixels to Patches: Pooling Strategies for Earth Embeddings, Corley et al, 2026
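Here is the distinction the paper draws, made concrete. Mean pooling collapses a patch of per-pixel embeddings to a per-dimension average. A summary-statistics pool instead concatenates several statistics per dimension. The pure-Python sketch below illustrates both ideas. The choice of mean, standard deviation, min and max is an illustrative assumption, not necessarily the exact set evaluated by Corley et al.

```python
from statistics import fmean, pstdev


def mean_pool(pixels: list[list[float]]) -> list[float]:
    """Average each embedding dimension over all pixels in the patch."""
    return [fmean(col) for col in zip(*pixels)]  # zip(*...) transposes to per-dimension columns


def stats_pool(pixels: list[list[float]]) -> list[float]:
    """Concatenate per-dimension mean, population std, min and max
    (so the output is four times as wide as each pixel embedding)."""
    cols = list(zip(*pixels))
    return (
        [fmean(c) for c in cols]
        + [pstdev(c) for c in cols]
        + [min(c) for c in cols]
        + [max(c) for c in cols]
    )


patch = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]  # 3 pixels with 2-d embeddings
print(mean_pool(patch))   # [2.0, 1.0]
print(stats_pool(patch))
```

The wider representation keeps information about the spread of values inside a patch, which the mean alone throws away.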

This is particularly timely, as we're working on Matryoshka embeddings, which should bring a fresh (and more information-rich) twist to this when they're ready.
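Matryoshka embeddings are trained so that the leading dimensions of each vector work as a usable lower-dimensional embedding on their own. A consumer can therefore trade accuracy for size by truncating a vector and re-normalising it. The sketch below shows only that consumer-side step. It assumes nothing about how TESSERA's Matryoshka variant is trained or which prefix sizes it will support, and the function name is hypothetical.

```python
import math


def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` coordinates and L2-normalise the result.

    Only meaningful for Matryoshka-trained models, where each prefix was
    optimised to be a good embedding by itself.
    """
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be in 1..{len(vec)}, got {dims}")
    prefix = vec[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix] if norm > 0 else prefix


print(truncate_embedding([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
```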

James G. C. Ball also pushed out an excellent preprint on data-efficient tree species mapping in temperate mountain forests, which is essential reading if you're building classifiers or segmenters using models like TESSERA!

3 Working on OxCaml in OxMono

I've continued working on a bunch of side OCaml libraries in OxMono. Firstly, thanks to Jon Ludlam for getting oxdoc HTML working in my repo!

3.1 OpenStreetMap and DuckDB in OxCaml

Both TZE and the Enki dashboards that Shane Weisz is working on could also use vector tiles of human activity, ideally via local services. I built an OxCaml converter from OpenStreetMap's compressed protobuf format so that I can query the data via an in-process DuckDB from OCaml. This allows me to do rapid queries for various vector tags, such as finding all the bollards or horse-friendly gates in the world:

> select * from nodes where map_contains(nodes.tags, 'barrier');
┌────────┬────────────────────┬─────────────────────┬─────────────────────────────────────────────────┐
│   id   │        lat         │         lon         │                      tags                       │
│  i64   │       double       │       double        │              map(varchar, varchar)              │
├────────┼────────────────────┼─────────────────────┼─────────────────────────────────────────────────┤
│ 291281 │         51.8116825 │ -0.8396488000000001 │ {access=private, barrier=gate}                  │
│ 291711 │         51.8161508 │          -0.8351806 │ {barrier=bollard, motor_vehicle=no}             │
│ 155806 │  60.65463260000001 │   7.881729600000001 │ {barrier=cattle_grid}                           │
│ 402757 │ 50.917684400000006 │          -1.4037637 │ {barrier=bollard, bollard=fixed}                │
│ 198617 │         59.3074218 │          17.9634278 │ {barrier=yes, bicycle=yes, foot=yes, horse=yes} │
│ 392357 │ 45.082863100000004 │  2.7073557000000004 │ {barrier=gate, bicycle=yes, foot=yes}           │
│ 424724 │  51.60528540000001 │           -0.178978 │ {barrier=cycle_barrier, foot=yes}               │
└────────┴────────────────────┴─────────────────────┴─────────────────────────────────────────────────┘

I'll write more about this next week when it's more fleshed out, but the basic OSM bindings and CLI tools are in my oxmono#duckdb branch for the very curious. I'm still running performance matrices to figure out the best parallelisation strategy for importing the millions of records involved, but the local/stack/unboxing support already gives a significant performance boost: it takes about an hour to import the full 250GB database, which is then very usable for queries.
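The fiddliest part of reading OpenStreetMap's PBF format is the `DenseNodes` encoding. In it, node ids and coordinates are delta-coded, and tags are packed into a single `keys_vals` array of string-table indices, with a `0` ending each node's key/value pairs. Per the format description, latitude in degrees is `1e-9 * (lat_offset + granularity * lat)`, with granularity defaulting to 100 nanodegrees. The converter itself is written in OxCaml. What follows is only a Python sketch of that decoding step, operating on already-parsed protobuf fields rather than raw bytes. It also includes a filter equivalent to the `map_contains(nodes.tags, 'barrier')` query. The function names are hypothetical.

```python
from itertools import accumulate


def decode_dense_nodes(ids, lats, lons, keys_vals, stringtable,
                       granularity=100, lat_offset=0, lon_offset=0):
    """Yield (id, lat, lon, tags) from one DenseNodes group.

    ids/lats/lons are delta-coded, so running sums recover absolute values.
    keys_vals holds string-table indices as (key, value) pairs, with 0
    terminating each node's tags (string-table entry 0 is always empty).
    """
    kv = iter(keys_vals)
    for node_id, lat, lon in zip(accumulate(ids), accumulate(lats), accumulate(lons)):
        tags = {}
        for k in kv:  # resumes where the previous node's tags stopped
            if k == 0:
                break
            v = next(kv)
            tags[stringtable[k]] = stringtable[v]
        yield (
            node_id,
            1e-9 * (lat_offset + granularity * lat),
            1e-9 * (lon_offset + granularity * lon),
            tags,
        )


def with_tag(nodes, key):
    """Pure-Python analogue of: select * from nodes where map_contains(tags, key)."""
    return [n for n in nodes if key in n[3]]
```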

3.2 CPU and GPU inference

Meanwhile, Mark Elvers has been figuring out how to do efficient CPU inference using OxCaml SIMD, and has also been investigating GPU vs CPU NUMA behaviour (which threw me back well over a decade to my FOSDEM 2013 talk on NUMA).

We have far more spare CPU capacity than GPU, so we'll likely go down this path and use the spare CPU for relatively slow and steady inference of TESSERA tiles.

3.3 LLM protocols for sharing code

My monorepo is once again diverging vastly from my colleagues'; for example, Thomas Gazagnaire's agentic monopampam monorepo is chock-a-block with exciting new developments. None of the tooling we're building can quite keep things in sync, due to the unbelievable throughput of a well-prompted LLM.

On the other hand, none of the code we're building is fit for third-party release without extensive code review (which I've yet to do!). We risk getting stuck in an uncanny valley of 'almost there' code, which feels a bit like a return to the land of untyped JavaScript!

I've been putting off forming a firm opinion on this since I'm not sharing much code yet, but Patrick Ferris has put together an excellent piece on vibecoding etiquette that I agree with. Jon Ludlam has also adopted a slightly different (but compatible) policy of using a separate commit email for his agent and relying on a rebase to 'own' the code:

It's always slightly alarming to see my own name on the output of the bots, assigning me (or sometimes someone else (!!)) copyright over code I've never seen. This is, of course, a whole other pandora's box that I really don't want to open right now - but I think the point is that I'll feel a lot more comfortable if the commits are all by Jon's Agent <jon+claude@recoil.org> rather than by me! -- Containers vs accounts, 2026-04, Jon Ludlam
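A separate agent identity is straightforward to set up, because git reads the author and committer identity from the `GIT_AUTHOR_NAME`/`GIT_AUTHOR_EMAIL` and `GIT_COMMITTER_NAME`/`GIT_COMMITTER_EMAIL` environment variables. Below is a minimal sketch of a wrapper an agent harness might call. It is not Jon's actual setup, and the agent name and address are placeholders.

```python
import os
import subprocess
from typing import Optional


def agent_env(name: str, email: str,
              base: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Return a copy of the environment in which git attributes commits to the agent."""
    env = dict(os.environ if base is None else base)
    env.update({
        "GIT_AUTHOR_NAME": name,
        "GIT_AUTHOR_EMAIL": email,
        "GIT_COMMITTER_NAME": name,
        "GIT_COMMITTER_EMAIL": email,
    })
    return env


def agent_commit(message: str, repo: str = ".",
                 name: str = "Example Agent",        # placeholder identity
                 email: str = "agent@example.org") -> None:
    """Commit whatever is staged in `repo`, authored by the agent identity."""
    subprocess.run(["git", "commit", "-m", message], cwd=repo,
                   env=agent_env(name, email), check=True)
```

Once a human has reviewed the agent's commits and wants to take ownership, one option is `git rebase -x 'git commit --amend --no-edit --reset-author' <base>`. This rewrites each commit with the current user as its author.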

Both of these seem right to me. When I'm back in Cambridge in a few weeks, I'm going to take a serious look at locally hosted models as well. Luke Marsden has been doing brilliant work on open agent infrastructure that I've not had the bandwidth to try out yet.

References

[1] Madhavapeddy (2026). Discussing effective conservation with all the UK Chief Scientists. 10.59350/qjrmv-38130
[2] Madhavapeddy (2025). Royal Society's Future of Scientific Publishing meeting. 10.59350/nmcab-py710
[3] Madhavapeddy (2025). Holding an OxCaml tutorial at ICFP/SPLASH 2025. 10.59350/55bc5-x4p75
[4] Madhavapeddy (2025). Thoughts on the National Data Library and private research data. 10.59350/fk6vy-5q841
[5] Jaffer et al (2025). AI-assisted Living Evidence Databases for Conservation Science. Cambridge Open Engage. 10.33774/coe-2025-rmsqf
[6] Madhavapeddy (2025). Publish, Review, Curate to upend scholarly publishing. 10.59350/fpc9w-ccj82
[7] Chaudhry et al (2015). Personal Data: Thinking Inside the Box. 10.7146/aahcc.v1i1.21312
[8] Ball et al (2026). Geospatial foundation models enable data-efficient tree species mapping in temperate mountain forests. bioRxiv. 10.64898/2026.02.23.707022
[9] Madhavapeddy (2026). Streaming millions of TESSERA tiles over HTTP with Zarr v3. 10.59350/8mwjg-b4513
[10] Madhavapeddy (2025). Using AT Proto for more than just Bluesky posts. 10.59350/32rdt-zny05
[11] Corley et al (2026). From Pixels to Patches: Pooling Strategies for Earth Embeddings. arXiv. 10.48550/arXiv.2603.02080