.plan-26-15: Banyan trees, (anti)botnets and Bose-Einstein bases

Travelling from Ireland to IIT Madras for the FP Launchpad launch, mirroring half a petabyte of TESSERA embeddings to AWS Open Data, antibotty discussions, and Tangled trust boundaries for AI code review.

I spent this week shuttling from Ireland to Cambridge to India for the FP Launchpad takeoff at IIT Madras next week. This new centre is possibly even more exciting than a roundtrip to the moon for the PL geeks among you!

Certainly the nicest management school campus I've ever been to

I'm hosted by KC Sivaramakrishnan and staying in the amazing Bose-Einstein guest house on campus. There are banyan trees, monkeys, huge bats and deer everywhere! Ilya Sergey and I wandered up to the local student coffee shop, ran into Durwasa Chakraborty and Vimala Soundarapandian, and learnt more about PhD life out here. Durwasa has a fun blog post on "How Agentic LLMs Almost Destroyed My Academic Career", which came up as we chatted about how easy access to frontier coding LLMs is in India (answer: not very) and also my recent musings about AI provenance.

This is my second trip to India this year after the AI Impact Summit and the TESSERA hackathon in Delhi, but my first time back to Tamil Nadu in a while. I went to school here in PSBB back in 1990 and it's fantastic touring my old haunts again!

1 Antibotties and Mythos fallout

My antibotty immune system post got some discussion on Lobsters after the launch of Mythos. I thought that lcamtuf (of afl-fuzz fame) offered the most nuanced take:

I'm less convinced that this is a fundamental inflection point for software security. The bulk of vulnerability research - I'd not hesitate to say 80%+ - was already automated with fuzzers and related tools. Fuzzers discovered tens of thousands of bugs in critical software and put some visible strain on the OSS ecosystem. We now have a new tool that will uncover even more bugs, but I don't think this is how the world ends. When I came up with afl-fuzz, I could've put out a press release saying that it's too dangerous to share. We survived. So, there's a combination of impressive results and marketing at play.

As a tangent, I worry a lot more about enterprise security. We now have a tool that can be deployed on the cheap to pull on every door handle at any large organization; solving this attack surface is less tractable than finding software bugs, in part because, to stick to my tortured metaphor, the "handles" the bots can pull on include every human working at the company. Worse, with vuln discovery, there's this inherent symmetry: you can use the same tools to beat the bad guys to the punch. Using LLMs to secure an enterprise is considerably harder, in part because you need much higher decision fidelity to automatically stop bad things without getting in the way of normal work. -- lcamtuf, Lobste.rs, 2026

In connected news, I've also been triaging bugs in various projects that have come out of Mythos-related disclosures, but more on that in a few weeks when it's public...

1.1 FP Launchpad preview

The FP Launchpad kickoff is on Monday, headed by KC Sivaramakrishnan, with a goal of bringing more assurance to systems construction. The speaking lineup is cracking: we've got Rishiyur Nikhil on FP for hardware design at Bluespec (which our Cambridge CHERI CPU uses), Chester Rebeiro on trusted hardware on RISC-V Shakti, Krishnan Raghavan on verifiable governance with LLMs and Lean, Shriram Krishnamurthi on lightweight diagramming languages, Ilya Sergey on mechanizing a massive real world type checker with AI-assisted metatheory, and I'll be talking about TESSERA and FP.

Others have been passing through Chennai as well; I just missed Nik Swamy's talk last week on agentic proof-oriented programming. He's produced 200KLOC+ of verified F*/Pulse code in a few weeks using Copilot CLI (10x harder than a good coding CLI, if I'm being snarky), including specifications of concurrent data structures through to proofs of production C code. KC Sivaramakrishnan conjectured to me during the week that the same LLM coding agents driving remote vulnerability discovery could also drive proof generation. Nik's results are the first evidence I've seen that this might actually scale, and Ilya has also been speeding along with his Lean code on this topic.

2 TESSERA: half a petabyte to the cloud

This week in TESSERA was mainly spent rationalising the storage setup. We've now got access to AWS Open Data (yay!) to mirror the TESSERA embeddings, just as we run out of space in the University cluster again, so we've started a giant transfer of everything to the cloud. This will take around 7 days (half a petabyte is a fair bit of data!), but I can satisfyingly watch progress over S3:

> s3cmd ls s3://tessera-embeddings/v1/global_0.1_degree_representation/
    DIR  s3://tessera-embeddings/v1/global_0.1_degree_representation/2017/
    DIR  s3://tessera-embeddings/v1/global_0.1_degree_representation/2018/
    DIR  s3://tessera-embeddings/v1/global_0.1_degree_representation/2019/
    DIR  s3://tessera-embeddings/v1/global_0.1_degree_representation/2020/
    DIR  s3://tessera-embeddings/v1/global_0.1_degree_representation/2024/
    DIR  s3://tessera-embeddings/v1/global_0.1_degree_representation/2025/
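As a back-of-envelope check on that 7-day estimate (taking "half a petabyte" as a nominal 0.5 PB in decimal units, which is my assumption, not an exact byte count), the sustained throughput needed works out to several gigabits per second:

```python
# Back-of-envelope: sustained throughput needed to move ~0.5 PB in 7 days.
# The 0.5 PB figure is the post's rough number, not an exact measurement.
total_bytes = 0.5 * 10**15          # half a petabyte (decimal units)
seconds = 7 * 24 * 60 * 60          # one week in seconds

bytes_per_sec = total_bytes / seconds
gbit_per_sec = bytes_per_sec * 8 / 10**9

print(f"{bytes_per_sec / 10**6:.0f} MB/s ≈ {gbit_per_sec:.1f} Gbit/s sustained")
# → 827 MB/s ≈ 6.6 Gbit/s sustained
```

So the week-long estimate implies a steady ~6-7 Gbit/s out of Cambridge, which is why the transfer is worth babysitting.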

Last week's Ceph expansion to 1.4PB raw was well timed: without that headroom, the Cambridge side of this transfer would have been pretty squeezed, but we're all good now. Since we're short of GPUs for inference as always, Mark Elvers has been investigating Intel AMX for inference, which turns out to be much cheaper on the cloud spot market.

Next week I'm adding geotessera support for multiple registries so clients can discover tiles from both Cambridge and AWS. This builds on the Zarr v3 restructuring and geo-embeddings convention work from the past few weeks. Once the sync finishes, I'll kick off the Zarr conversions directly on AWS as well to actually make this usable with geotessera 0.8.
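The discovery logic itself is simple enough to sketch. This is a hypothetical illustration of the multi-registry idea, not geotessera's actual API: the registry names, tile identifiers and URLs below are all made up, and the real implementation will differ.

```python
# Hypothetical sketch of multi-registry tile discovery: try each configured
# registry in order and return the first one that holds the requested tile.
# Registry names, tile ids and URLs are illustrative, not geotessera's API.
from typing import Optional

# Each registry maps tile identifiers to download URLs; Cambridge is listed
# first so it takes priority when both registries hold the same tile.
REGISTRIES = [
    ("cambridge", {"2024/50N_000E": "https://tiles.example.cam.ac.uk/2024/50N_000E.zarr"}),
    ("aws", {"2024/50N_000E": "https://tessera.example.s3.amazonaws.com/2024/50N_000E.zarr",
             "2025/50N_000E": "https://tessera.example.s3.amazonaws.com/2025/50N_000E.zarr"}),
]

def discover(tile_id: str) -> Optional[tuple[str, str]]:
    """Return (registry_name, url) for the first registry holding tile_id."""
    for name, index in REGISTRIES:
        if tile_id in index:
            return name, index[tile_id]
    return None  # tile not mirrored anywhere yet
```

The ordered-fallback design means a client near Cambridge keeps hitting the local mirror, while tiles that only exist on AWS (e.g. newly converted years) still resolve transparently.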

3.1 Tangled trust boundaries for AI code review

I've been thinking about reputation systems again and how to manage our Tangled PDSes, which have the nice property that our own git repos are hosted by us with the social metadata living elsewhere on ATProto. I'd like to extend our group's use of Tangled for better tracking of what's AI code or not, following my OCaml AI Disclosure proposal.

What I want are trust boundaries (within our group, or between individuals) where we can trade AI-generated code and help each other review it iteratively while remaining informed about where the code came from. I want to avoid what Avery Pennarun calls the sloth of too many review layers (every layer of approval makes you 10x slower in wall-clock time) by engineering quality in, taking advantage of OCaml's module interfaces to enforce boundaries. This connects to the provenance thinking I've been doing of late: if we can see at a glance what's AI-generated and who has reviewed what, we can direct limited human attention (and energy) much more efficiently without burning out.
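One way to make "what's AI-generated and who has reviewed what" machine-checkable is to attach a small provenance record to each change crossing a trust boundary. This is a minimal hypothetical sketch of that bookkeeping; the field names and the review rule are my illustrative assumptions, not the actual AI Disclosure schema.

```python
# Hypothetical provenance record for a change crossing a trust boundary.
# Field names and the one-independent-reviewer rule are illustrative
# assumptions, not the actual OCaml AI Disclosure proposal.
from dataclasses import dataclass, field

@dataclass
class Change:
    author: str                                 # human who submitted the change
    ai_generated: bool                          # disclosed as (partly) AI-written?
    reviewers: list[str] = field(default_factory=list)

def needs_human_review(change: Change) -> bool:
    """AI-generated code needs at least one reviewer other than the author."""
    independent = [r for r in change.reviewers if r != change.author]
    return change.ai_generated and not independent
```

With records like this, "direct limited human attention efficiently" becomes a query: the review queue is just the changes where `needs_human_review` is true.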

This post by Meri also evaluates ATProto's proposed approach to permissioned data and works through ownership transferability. What happens when a personal project (e.g. anil.recoil.org) grows into a community resource and the original creator wants to hand off control (say to eeg.cl.cam.ac.uk)? Lots of pesky details to get right here.

3.2 Chinese open source and India

Kevin Xu's definitive history of Chinese open source is a good read.

DeepSeek changed everything. Along with Qwen, Kimi, GLM, MiniMax, Stepfun, and even Unitree, Chinese open weight AI models took the world by storm. The traction even earned “open source” a shoutout in the Chinese Premier’s all-important address at the Two Sessions, with an explicit commitment to continue the growth of open source communities and ecosystems as part of the country’s next five-year plan. -- Kevin Xu, 2026

I'd love to see someone write an equivalent history for Indian open source; given my increasing time in the country, it feels like there's a story to tell as well, even though there hasn't been (to my knowledge) a breakout project from here with impact as big as DeepSeek's in recent years.

3.3 Computing's environmental responsibility

I'm going to be proposing a Dagstuhl seminar with Manuel Rigger and Sara Beery, and ran across this previous Dagstuhl manifesto on "What is Computing's Responsibility?" from a 2025 workshop. It argues that computing contributes to planetary boundary overshoot rather than solving it: ICT accounts for ~3% of global CO2 emissions (comparable to aviation), and its energy demand is accelerating. They also invoke Jevons' Paradox: efficiency gains from computing lead to rebound effects rather than net reductions.

I'm not sure I entirely agree with their conclusions though; our planetary computing work and the arxiv paper argue for pragmatic data-driven environmental policy-making which doesn't strike me as digital exceptionalism. The Dagstuhl summary is right that computing does have an overall responsibility problem, but I don't think the answer is to retreat from computational approaches to environmental activism. The upcoming third 'action PROPL' at PLDI in Boulder and the Rewilding the Web workshop in Edinburgh next month are both attempts at this.

References

[1] Madhavapeddy (2026). Connecting the dots for biodiversity action from the NAS/Royal Society Forum. 10.59350/dy7d3-hdt43
[2] Madhavapeddy (2026). A Proposal for Voluntary AI Disclosure in OCaml Code. 10.59350/cxypn-ysv27
[3] Madhavapeddy (2026). At the AI Impact Summit in Delhi: people, planet, progress. 10.59350/6vc5q-mbk23
[4] Madhavapeddy (2026). TESSERA now supports the Zarr geo-embeddings convention proposal. 10.59350/c3hrq-zsx02
[5] Madhavapeddy (2026). Streaming millions of TESSERA tiles over HTTP with Zarr v3. 10.59350/tk0er-ycs46
[6] Madhavapeddy (2025). Socially self-hosting source code with Tangled on Bluesky. 10.59350/r80vb-7b441
[7] Madhavapeddy (2025). Programming for the Planet at ICFP/SPLASH 2025. 10.59350/hasmq-vj807
[8] Madhavapeddy (2026). 1st TESSERA/CoRE hackathon at the Indian AI Summit. 10.59350/1na80-7ak85
[9] Madhavapeddy (2025). Four Ps for Building Massive Collective Knowledge Systems. 10.59350/418q4-gng78
[10] Ferris et al (2024). Planetary computing for data-driven environmental policy-making. arXiv. 10.48550/arXiv.2303.04501
[11] Madhavapeddy (2026). The Internet needs an antibotty immune system, stat. 10.59350/snnnf-asc02
[12] Madhavapeddy (2025). mlgpx is the first Tangled-hosted package available on opam. 10.59350/7267y-nj702