.plan-26-15: Banyan trees, (anti)botnets and Bose-Einstein bases

Travelling from Ireland to IIT Madras for the FP Launchpad launch, mirroring half a petabyte of TESSERA embeddings to AWS Open Data, antibotty discussions, and Tangled trust boundaries for AI code review.

I spent this week shuttling from Ireland to Cambridge to India for the FP Launchpad takeoff at IIT Madras next week. This new centre is possibly even more exciting than a roundtrip to the moon for the PL geeks among you!

Certainly the nicest management school campus I've ever been to

I'm hosted by KC Sivaramakrishnan and staying in the amazing Bose-Einstein guest house on campus. There are banyan trees, monkeys, huge bats and deer everywhere! Ilya Sergey and I wandered up to the local student coffee shop, ran into Durwasa Chakraborty and Vimala Soundarapandian, and learnt more about PhD life out here. Durwasa has a fun blog post on "How Agentic LLMs Almost Destroyed My Academic Career", which came up as we chatted about how easy access to frontier coding LLMs is in India (answer: not very) and also my recent musings about AI provenance.

This is my second trip to India this year after the AI Impact Summit and the TESSERA hackathon in Delhi, but my first time back to Tamil Nadu in a while. I went to school here in PSBB back in 1990 and it's fantastic touring my old haunts again!

1 Antibotties and Mythos fallout

My antibotty immune system post got some discussion on Lobsters after the launch of Mythos. I thought that lcamtuf (of afl-fuzz fame) offered the most nuanced take:

I'm less convinced that this is a fundamental inflection point for software security. The bulk of vulnerability research - I'd not hesitate to say 80%+ - was already automated with fuzzers and related tools. Fuzzers discovered tens of thousands of bugs in critical software and put some visible strain on the OSS ecosystem. We now have a new tool that will uncover even more bugs, but I don't think this is how the world ends. When I came up with afl-fuzz, I could've put out a press release saying that it's too dangerous to share. We survived. So, there's a combination of impressive results and marketing at play.

As a tangent, I worry a lot more about enterprise security. We now have a tool that can be deployed on the cheap to pull on every door handle at any large organization; solving this attack surface is less tractable than finding software bugs, in part because, to stick to my tortured metaphor, the "handles" the bots can pull on include every human working at the company. Worse, with vuln discovery, there's this inherent symmetry: you can use the same tools to beat the bad guys to the punch. Using LLMs to secure an enterprise is considerably harder, in part because you need much higher decision fidelity to automatically stop bad things without getting in the way of normal work. -- lcamtuf, Lobste.rs, 2026

In connected news, I've also been triaging bugs in various projects that have come out of Mythos-related disclosures, but more on that in a few weeks when it's public...

1.1 FP Launchpad preview

The FP Launchpad kickoff is on Monday, headed by KC Sivaramakrishnan, with a goal of bringing more assurance to systems construction. The speaking lineup is cracking: we've got Rishiyur Nikhil on FP for hardware design at Bluespec (which our Cambridge CHERI CPU uses), Chester Rebeiro on trusted hardware on RISC-V Shakti, Krishnan Raghavan on verifiable governance with LLMs and Lean, Shriram Krishnamurthi on lightweight diagramming languages, Ilya Sergey on mechanizing a massive real world type checker with AI-assisted metatheory, and I'll be talking about TESSERA and FP.

Others have been passing through Chennai as well; I just missed Nik Swamy's talk last week on agentic proof-oriented programming. He's produced 200KLOC+ of verified F*/Pulse code in a few weeks using Copilot CLI (10x harder than a good coding CLI, if I'm being snarky), including specifications of concurrent data structures through to proofs of production C code. KC Sivaramakrishnan conjectured to me during the week that the same LLM coding agents driving remote vulnerability discovery could also drive proof generation. Nik's results are the first evidence I've seen that this might actually scale, and Ilya has also been speeding along with his Lean code on this topic.

2 TESSERA: half a petabyte to the cloud

This week in TESSERA was mainly spent rationalising the storage setup. We've now got access to AWS Open Data (yay!) to mirror the TESSERA embeddings, just as we run out of space in the University cluster again, so we've started a giant transfer of everything to the cloud. This will take around 7 days (half a petabyte is a fair bit of data!), but I can satisfyingly watch progress over S3:

> s3cmd ls s3://tessera-embeddings/v1/global_0.1_degree_representation/
    DIR  s3://tessera-embeddings/v1/global_0.1_degree_representation/2017/
    DIR  s3://tessera-embeddings/v1/global_0.1_degree_representation/2018/
    DIR  s3://tessera-embeddings/v1/global_0.1_degree_representation/2019/
    DIR  s3://tessera-embeddings/v1/global_0.1_degree_representation/2020/
    DIR  s3://tessera-embeddings/v1/global_0.1_degree_representation/2024/
    DIR  s3://tessera-embeddings/v1/global_0.1_degree_representation/2025/
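As a back-of-envelope check on that 7-day estimate (taking "half a petabyte" as a nominal 0.5 PB in decimal units, which is my assumption, not an exact byte count), the sustained throughput needed works out to several gigabits per second:

```python
# Back-of-envelope: sustained throughput needed to move ~0.5 PB in 7 days.
# The 0.5 PB figure is the post's rough number, not an exact measurement.
total_bytes = 0.5 * 10**15          # half a petabyte (decimal units)
seconds = 7 * 24 * 60 * 60          # one week in seconds

bytes_per_sec = total_bytes / seconds
gbit_per_sec = bytes_per_sec * 8 / 10**9

print(f"{bytes_per_sec / 10**6:.0f} MB/s ≈ {gbit_per_sec:.1f} Gbit/s sustained")
# → 827 MB/s ≈ 6.6 Gbit/s sustained
```

So the week-long estimate implies a steady ~6-7 Gbit/s out of Cambridge, which is why the transfer is worth babysitting.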

Last week's Ceph expansion to 1.4PB raw was well timed: without that headroom, the Cambridge side of this transfer would have been pretty squeezed, but we're all good now. Since we're short of GPUs for inference as always, Mark Elvers has been investigating Intel AMX for inference, which turns out to be much cheaper on the cloud spot market.

Next week I'm adding geotessera support for multiple registries so clients can discover tiles from both Cambridge and AWS. This builds on the Zarr v3 restructuring and geo-embeddings convention work from the past few weeks. Once the sync finishes, I'll kick off the Zarr conversions directly on AWS as well to actually make this usable with geotessera 0.8.
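The discovery logic itself is simple enough to sketch. This is a hypothetical illustration of the multi-registry idea, not geotessera's actual API: the registry names, tile identifiers and URLs below are all made up, and the real implementation will differ.

```python
# Hypothetical sketch of multi-registry tile discovery: try each configured
# registry in order and return the first one that holds the requested tile.
# Registry names, tile ids and URLs are illustrative, not geotessera's API.
from typing import Optional

# Each registry maps tile identifiers to download URLs; Cambridge is listed
# first so it takes priority when both registries hold the same tile.
REGISTRIES = [
    ("cambridge", {"2024/50N_000E": "https://tiles.example.cam.ac.uk/2024/50N_000E.zarr"}),
    ("aws", {"2024/50N_000E": "https://tessera.example.s3.amazonaws.com/2024/50N_000E.zarr",
             "2025/50N_000E": "https://tessera.example.s3.amazonaws.com/2025/50N_000E.zarr"}),
]

def discover(tile_id: str) -> Optional[tuple[str, str]]:
    """Return (registry_name, url) for the first registry holding tile_id."""
    for name, index in REGISTRIES:
        if tile_id in index:
            return name, index[tile_id]
    return None  # tile not mirrored anywhere yet
```

The ordered-fallback design means a client near Cambridge keeps hitting the local mirror, while tiles that only exist on AWS (e.g. newly converted years) still resolve transparently.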

3.1 Tangled trust boundaries for AI code review

I've been thinking about reputation systems again and how to manage our Tangled PDSes, which have the nice property that our own git repos are hosted by us with the social metadata living elsewhere on ATProto. I'd like to extend our group's use of Tangled for better tracking of what's AI code or not, following my OCaml AI Disclosure proposal.

What I want are trust boundaries (within our group, or between individuals) where we can trade AI-generated code and help each other review it iteratively while remaining informed about where the code came from. I want to avoid what Avery Pennarun calls the sloth of too many review layers (every layer of approval makes you 10x slower in wall-clock time) by engineering quality in, taking advantage of OCaml's module interfaces to enforce boundaries. This connects to the provenance thinking I've been doing of late: if we can see at a glance what's AI-generated and who has reviewed what, we can direct limited human attention (and energy) much more efficiently without burning out.
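One way to make "what's AI-generated and who has reviewed what" machine-checkable is to attach a small provenance record to each change crossing a trust boundary. This is a minimal hypothetical sketch of that bookkeeping; the field names and the review rule are my illustrative assumptions, not the actual AI Disclosure schema.

```python
# Hypothetical provenance record for a change crossing a trust boundary.
# Field names and the one-independent-reviewer rule are illustrative
# assumptions, not the actual OCaml AI Disclosure proposal.
from dataclasses import dataclass, field

@dataclass
class Change:
    author: str                                 # human who submitted the change
    ai_generated: bool                          # disclosed as (partly) AI-written?
    reviewers: list[str] = field(default_factory=list)

def needs_human_review(change: Change) -> bool:
    """AI-generated code needs at least one reviewer other than the author."""
    independent = [r for r in change.reviewers if r != change.author]
    return change.ai_generated and not independent
```

With records like this, "direct limited human attention efficiently" becomes a query: the review queue is just the changes where `needs_human_review` is true.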

This post by Meri also evaluates ATProto's proposed approach to permissioned data and works through ownership transferability. What happens when a personal project (e.g. anil.recoil.org) grows into a community resource and the original creator wants to hand off control (say to eeg.cl.cam.ac.uk)? Lots of pesky details to get right here.

3.2 Chinese open source and India

Kevin Xu's definitive history of Chinese open source is a good read.

DeepSeek changed everything. Along with Qwen, Kimi, GLM, MiniMax, Stepfun, and even Unitree, Chinese open weight AI models took the world by storm. The traction even earned “open source” a shoutout in the Chinese Premier’s all-important address at the Two Sessions, with an explicit commitment to continue the growth of open source communities and ecosystems as part of the country’s next five-year plan. -- Kevin Xu, 2026

I'd love to see someone write an equivalent history for Indian open source; given my increasing time in the country, it feels like there's a story to tell as well, even though there hasn't been (to my knowledge) a breakout project from here with impact as big as DeepSeek's in recent years.

3.3 Computing's environmental responsibility

I'm going to be proposing a Dagstuhl seminar with Manuel Rigger and Sara Beery, and ran across this previous Dagstuhl manifesto on "What is Computing's Responsibility?" from a 2025 workshop. It argues that computing contributes to planetary boundary overshoot rather than solving it: ICT accounts for ~3% of global CO2 emissions (comparable to aviation), and its energy demand is accelerating. They also invoke Jevons' Paradox: efficiency gains from computing lead to rebound effects rather than net reductions.

I'm not sure I entirely agree with their conclusions though; our planetary computing work and the arxiv paper argue for pragmatic data-driven environmental policy-making which doesn't strike me as digital exceptionalism. The Dagstuhl summary is right that computing does have an overall responsibility problem, but I don't think the answer is to retreat from computational approaches to environmental activism. The upcoming third 'action PROPL' at PLDI in Boulder and the Rewilding the Web workshop in Edinburgh next month are both attempts at this.

References

[1] Madhavapeddy (2026). Connecting the dots for biodiversity action from the NAS/Royal Society Forum. 10.59350/dy7d3-hdt43
[2] Madhavapeddy (2026). A Proposal for Voluntary AI Disclosure in OCaml Code. 10.59350/cxypn-ysv27
[3] Madhavapeddy (2026). At the AI Impact Summit in Delhi: people, planet, progress. 10.59350/6vc5q-mbk23
[4] Madhavapeddy (2026). TESSERA now supports the Zarr geo-embeddings convention proposal. 10.59350/c3hrq-zsx02
[5] Madhavapeddy (2026). Streaming millions of TESSERA tiles over HTTP with Zarr v3. 10.59350/tk0er-ycs46
[6] Madhavapeddy (2025). Socially self-hosting source code with Tangled on Bluesky. 10.59350/r80vb-7b441
[7] Madhavapeddy (2025). Programming for the Planet at ICFP/SPLASH 2025. 10.59350/hasmq-vj807
[8] Madhavapeddy (2026). 1st TESSERA/CoRE hackathon at the Indian AI Summit. 10.59350/1na80-7ak85
[9] Madhavapeddy (2025). Four Ps for Building Massive Collective Knowledge Systems. 10.59350/418q4-gng78
[10] Ferris et al (2024). Planetary computing for data-driven environmental policy-making. arXiv. 10.48550/arXiv.2303.04501
[11] Madhavapeddy (2026). The Internet needs an antibotty immune system, stat. 10.59350/snnnf-asc02
[12] Madhavapeddy (2025). mlgpx is the first Tangled-hosted package available on opam. 10.59350/7267y-nj702