1 Evidence Synthesis at the DEFRA science conference

Starting with evidence synthesis, we got invited after our previous meetings with DEFRA to run a session about our evidence TAP at their annual "DEFRA Science, Analysis and Data Professions Conference" in London. I couldn't make this one in person, but Sadiq Jaffer and Sam Reynolds did their usual brilliant double act with help from the rest of the CE team.

Also congratulations to 15-year-old Jens Kromdijk, who interned with us in a school placement last summer, and had his OpenGL knowledge graph visualiser showcased in front of DEFRA in last week's talk by Sam! The video of this in action is above, and it's a very cool piece of game engine programming repurposed for interactive visualisation of the academic literature. Nice work Jens!

2 TESSERA hacking
I spent a bunch of time on figuring out TESSERA and Zarr v3 layouts, and working on getting adequate parallel performance out of it. The embeddings for 2024 and 2025 are now transcoding and pyramiding, so it'll be a few more days before I can test these out properly.
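As a rough illustration of what pyramiding involves, the sketch below computes the shape of each overview level by repeatedly halving a raster until it fits in a single chunk. The 2x factor, the stopping rule and the `pyramidLevels` name are assumptions for illustration; the actual TESSERA Zarr layout may well differ.

```typescript
// Shape of one pyramid level, in pixels. Level 0 is full resolution.
interface Level {
  level: number;
  width: number;
  height: number;
}

// Halve the raster until it fits inside one chunk.
// ASSUMPTION: 2x downsampling per level and a square chunk size; neither is
// taken from the TESSERA pipeline.
function pyramidLevels(width: number, height: number, chunk: number): Level[] {
  if (width < 1 || height < 1 || chunk < 1) {
    throw new Error("width, height and chunk must all be at least 1");
  }
  const levels: Level[] = [];
  let w = width;
  let h = height;
  let level = 0;
  while (true) {
    levels.push({ level, width: w, height: h });
    if (w <= chunk && h <= chunk) break; // coarsest level fits in one chunk
    w = Math.max(1, Math.ceil(w / 2));
    h = Math.max(1, Math.ceil(h / 2));
    level += 1;
  }
  return levels;
}
```

For example, `pyramidLevels(4096, 2048, 1024)` yields three levels, ending at 1024 by 512.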
2.1 Learning SPA and TypeScript
I figure I need to learn 'modern' web programming while building the TZE explorer so I knocked up a website to aggregate information about TESSERA. I used the latest Vite v8 and Rolldown along with Svelte, and resisted the urge to do this in OCaml so I could learn about another language ecosystem!
The experience of building the SPA with Claude was straightforward, except for the usual problem of versioning going wrong: I had to manually intervene to do the fairly complex npm version upgrade (the agent picked Vite 6 and Rollup, and I needed Vite 8 and Rolldown). Deploying the SPA to GitHub Pages was a bit complicated, as all unknown routes need to be redirected to a single index file, and how to do this depends on the static page provider. Since GitHub Pages serves a 404.html, I had to patch the site to generate stub 404.html files in all the subdirectories so that they could be navigated to.
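A minimal sketch of the stub-generation step, assuming the build knows the list of client-side routes: it computes where a copy of the SPA shell would need to be written so that every subdirectory has its own fallback page. The `stubPaths` helper and the example routes are hypothetical, and this is not the actual patch applied to the site.

```typescript
// Split a route such as "/news/" into its non-empty path segments.
function routeSegments(route: string): string[] {
  return route.split("/").filter((part) => part.length > 0);
}

// List every file path that should receive a copy of the SPA shell.
// ASSUMPTION: the fallback file is called "404.html" and one copy is wanted
// in the site root and in every directory along each route.
function stubPaths(routes: string[], stubName: string = "404.html"): string[] {
  const dirs = new Set<string>([""]); // "" stands for the site root
  for (const route of routes) {
    const parts = routeSegments(route);
    // Cover each ancestor too, so "papers/2026" also covers "papers".
    for (let i = 1; i <= parts.length; i++) {
      dirs.add(parts.slice(0, i).join("/"));
    }
  }
  return Array.from(dirs)
    .sort()
    .map((dir) => (dir === "" ? stubName : `${dir}/${stubName}`));
}
```

A build script would then copy the generated index file to each returned path inside the output directory.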

Anyway, after all this we ended up with a nice geotessera.org site. I'd like to switch to using the shiny new Tangled Pages but am just waiting on custom domains support first. Congratulations to Akshay and Anirudh on shipping the complicated feature in Tangled! The site is a convenient place where we can post news such as the v1 model weights being released.
I've added a TESSERA Atom feed, with variants for all posts and for original posts only (i.e. those not federated from elsewhere), so get your feed readers subscribed!
2.2 Relevant geospatial papers
I ran across a nice whitepaper from Element 84 on a vector embeddings marketplace from last summer. Then "From Pixels to Patches: Pooling Strategies for Earth Embeddings" is a nice systematic view on how to aggregate embeddings:
As geospatial foundation models shift from patch-level to pixel-level embeddings, practitioners must aggregate thousands of pixel vectors into patch representations that preserve class-discriminative signal while matching downstream label resolution.
Tessera (512-d) shows the largest mean-pooling gap (12%), suggesting higher-dimensional embeddings benefit most from summary statistics. -- From Pixels to Patches: Pooling Strategies for Earth Embeddings, Corley et al, 2026
This is particularly timely as we're working on Matryoshka embeddings, which should bring a fresh (and more information-rich) twist to this when they're ready.
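To make the aggregation step concrete, here is a minimal sketch of two ways to pool per-pixel vectors into one patch vector: plain mean pooling, and a concatenation of per-dimension summary statistics. The choice of statistics (mean, population standard deviation, min, max) and the names `meanPool` and `statsPool` are assumptions for illustration, and not necessarily what the paper evaluates.

```typescript
type Vec = number[];

// Mean pooling: average each dimension over all pixel vectors in the patch.
function meanPool(pixels: Vec[]): Vec {
  if (pixels.length === 0) throw new Error("cannot pool an empty patch");
  const dim = pixels[0].length;
  const sum: Vec = new Array<number>(dim).fill(0);
  for (const p of pixels) {
    if (p.length !== dim) throw new Error("pixel vectors differ in length");
    for (let d = 0; d < dim; d++) sum[d] += p[d];
  }
  return sum.map((s) => s / pixels.length);
}

// Summary-statistics pooling: concatenate mean, std, min and max, giving a
// patch vector four times the pixel dimension.
// ASSUMPTION: this particular set of statistics is chosen for illustration.
function statsPool(pixels: Vec[]): Vec {
  const mean = meanPool(pixels);
  const dim = mean.length;
  const std: Vec = new Array<number>(dim).fill(0);
  const min: Vec = new Array<number>(dim).fill(Infinity);
  const max: Vec = new Array<number>(dim).fill(-Infinity);
  for (const p of pixels) {
    for (let d = 0; d < dim; d++) {
      const diff = p[d] - mean[d];
      std[d] += diff * diff; // accumulate squared deviations
      min[d] = Math.min(min[d], p[d]);
      max[d] = Math.max(max[d], p[d]);
    }
  }
  for (let d = 0; d < dim; d++) std[d] = Math.sqrt(std[d] / pixels.length);
  return mean.concat(std, min, max);
}
```

Mean pooling keeps the output the same size as a pixel embedding but discards how values are spread within the patch; the statistics variant trades a larger vector for some of that spread.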
James G. C. Ball also pushed out an excellent preprint on data-efficient tree species mapping in temperate mountain forests, which is essential reading if you're building classifiers or segmenters using models like TESSERA!
3 Working on OxCaml in OxMono
I've continued working on a bunch of side OCaml libraries in OxMono. Firstly, thanks to Jon Ludlam for getting oxdoc HTML working in my repo!
3.1 OpenStreetMap and DuckDB in OxCaml
Both TZE and the Enki dashboards that Shane Weisz is working on could also use vector tiles for human activities, ideally via local services. I built an OxCaml OpenStreetMap converter from their compressed protobuf format so that I can run queries via an in-process DuckDB from OCaml. This allows me to do rapid queries for various vector tags, such as finding all the bollards or horse-friendly gates in the world:
> select * from nodes where map_contains(nodes.tags, 'barrier');
┌────────┬─────────────────────┬──────────────────────┬───────────────────────────────────────────────
│ id     │ lat                 │ lon                  │ tags
│ i64    │ double              │ double               │ map(varchar, varchar)
├────────┼─────────────────────┼──────────────────────┼───────────────────────────────────────────────
│ 291281 │ 51.8116825          │ -0.8396488000000001  │ {access=private, barrier=gate}
│ 291711 │ 51.8161508          │ -0.8351806           │ {barrier=bollard, motor_vehicle=no}
│ 155806 │ 60.65463260000001   │ 7.881729600000001    │ {barrier=cattle_grid}
│ 402757 │ 50.917684400000006  │ -1.4037637           │ {barrier=bollard, bollard=fixed}
│ 198617 │ 59.3074218          │ 17.9634278           │ {barrier=yes, bicycle=yes, foot=yes, horse=yes}
│ 392357 │ 45.082863100000004  │ 2.7073557000000004   │ {barrier=gate, bicycle=yes, foot=yes}
│ 424724 │ 51.60528540000001   │ -0.178978            │ {barrier=cycle_barrier, foot=yes}
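The query above keeps every node whose tag map has a `barrier` key, whatever its value. A small in-memory sketch shows how that narrows down to bollards or horse-friendly barriers. The `OsmNode` type and `withTag` helper are hypothetical and not part of the OxCaml bindings; the sample rows are copied from the output above.

```typescript
// One OSM node with its free-form key/value tags.
interface OsmNode {
  id: number;
  lat: number;
  lon: number;
  tags: Map<string, string>;
}

// Keep nodes that carry `key`; if `value` is given, it must match as well.
// With only a key this mirrors map_contains(nodes.tags, key).
function withTag(nodes: OsmNode[], key: string, value?: string): OsmNode[] {
  return nodes.filter((node) => {
    const found = node.tags.get(key);
    return found !== undefined && (value === undefined || found === value);
  });
}

// Three rows taken from the query output.
const sampleNodes: OsmNode[] = [
  { id: 291281, lat: 51.8116825, lon: -0.8396488000000001,
    tags: new Map([["access", "private"], ["barrier", "gate"]]) },
  { id: 291711, lat: 51.8161508, lon: -0.8351806,
    tags: new Map([["barrier", "bollard"], ["motor_vehicle", "no"]]) },
  { id: 198617, lat: 59.3074218, lon: 17.9634278,
    tags: new Map([["barrier", "yes"], ["bicycle", "yes"], ["foot", "yes"], ["horse", "yes"]]) },
];

// Bollards only, then barriers that also allow horses.
const bollards = withTag(sampleNodes, "barrier", "bollard");
const horseFriendly = withTag(withTag(sampleNodes, "barrier"), "horse", "yes");
```

Here `bollards` contains node 291711 and `horseFriendly` contains node 198617.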
I'll write more about this next week when it's more fleshed out, but the basic bindings and CLI tools for OSM are in my oxmono#duckdb branch for the very curious. I'm still running performance matrices to figure out the best parallelisation strategy for importing the millions of records involved, but the local/stack/unboxing support is already a significant performance boost: it takes about an hour to import the full 250GB database, which is then very usable for queries.
3.2 CPU and GPU inference
Mark Elvers meanwhile has been figuring out how to do efficient CPU inference using OxCaml SIMD and also GPU vs CPU NUMA (which threw me back 12 years to my FOSDEM 2013 talk on NUMA).
We have a lot of spare CPU compared to GPU, so this is a direction we'll likely go down to soak up spare CPU for relatively slow and steady inference of TESSERA tiles.
3.3 LLM protocols for sharing code
My monorepo is once again vastly diverging from colleagues'; for example Thomas Gazagnaire has his agentic monopampam monorepo chock-a-block full of exciting new developments. None of the tooling we are building can quite manage to keep things in sync due to the unbelievable throughput of a well-prompted LLM.
But on the other hand, none of the code we're building is fit for third-party release without extensive code review (which I've yet to do!). We're risking getting stuck in an uncanny valley of 'almost there' code, which feels a bit like a return to the land of untyped JavaScript!
While I've been punting having a firm opinion on this down the road as I'm not sharing much code yet, Patrick Ferris has put together an excellent piece on vibecoding etiquette which I agree with. Jon Ludlam has also adopted a slightly different (but compatible) policy of having a separate commit email for his agent and depending on a rebase to 'own' the code:
It's always slightly alarming to see my own name on the output of the bots, assigning me (or sometimes someone else (!!)) copyright over code I've never seen. This is, of course, a whole other pandora's box that I really don't want to open right now - but I think the point is that I'll feel a lot more comfortable if the commits are all by Jon's Agent <jon+claude@recoil.org> rather than by me! -- Containers vs accounts, 2026-04, Jon Ludlam
Both of these seem right to me. When I'm back in Cambridge in a few weeks, I'm going to take a serious look at locally hosted models as well. Luke Marsden has been doing brilliant work on open agent infrastructure that I've not had the bandwidth to try out yet.
4 Fun Links
- I've been watching Tom Loosemore doing epic hacking with UK government websites in the past few months. After seeing his latest post about bin collection days I couldn't resist pitching in to build a script for him. There was a push for this sort of thing back in 2010 (via ScraperWiki or Dapper), and perhaps resurrecting these in an agentic world would be a nice way for collaboration on providing programmatic access to otherwise fragmented local government interfaces.
- "The University of Illinois Just Released a Popcorn So Good It Doesn’t Need Butter" is my science story of the week.
- We Distribute bridged onto ATProto last week and is a nice collection of news about the fediverse, ATProto and Matrix ecosystems.
- It turns out that our UK spend on net zero is in total less than a single oil crisis. Energy independence through renewables is just sound fiscal policy.
- Perplexity is building a Databox equivalent to meet all the Claws, and Docker and Nanoclaw joined up with sandboxes too. No one's quite built the kind of temporal and spatially aware personal database we proposed back in 2015 or earlier. I guess I'm glad that my OCaml LifeDB prototype is now 18 years old and finally relevant.
- OpenUK has issued its recommendation to UKRI for a UK Foundation for Open Source with a comprehensive report. I remain enthusiastic about the prospect of a variant of the UK National Data Library and this is a good report!
