.plan-26-08: At AI summit, Shriram's PL opinions, Zarr hacking

Anil Madhavapeddy

doi:10.59350/wvx71-0na91

#tessera #ai #policy #india #zarr #teaching #ocaml #oxcaml · 22 Feb 2026

TESSERA paper accepted at CVPR 2026, went to the AI Impact Summit, OCaml Zarr hacking, Shriram's talk on human factors of formal methods, and discussions on teaching OxCaml to agents.

Most of the week was taken up by hopping over to New Delhi to host a TESSERA hackathon and also to attend the AI Impact Summit. I redeyed back to host Shriram Krishnamurthi in Cambridge as he does his UK tour.

1 TESSERA

The best news of the week was that the TESSERA paper got accepted into CVPR 2026, out of a whopping 16000+ (!) submissions. This has been a giant amount of work for the whole team, but particular props go to lead author and PhD student Frank Feng, who has led the whole effort with perseverance and a big smile the whole time!

It looks like CVPR is in Denver right before PLDI in Boulder (where I have an OxCaml tutorial to help run), so I guess a chunk of my June will be spent in Colorado this year.

I also spent some time porting Mark Elvers' OCaml Zarr implementation over to OxCaml, and started adding Zarr zone support to geotessera so we can start converting the registry over.

2 Literature downloader

Robin Message and Sadiq Jaffer have restarted the literature downloader, and Robin has been manually classifying DOI prefixes into a two-level tree so we can easily dispatch download logic on a per-publisher basis (we have individual agreements via the University library with many publishers).

I'm surprised that DOIs are not a two-level tree to start with. As it stands, there is no central source of detailed DOI prefix metadata, so if a journal is sold to another publisher (as just happened with JFP), you either have to forward a portion of your DOI space or continue to resolve old journal article DOIs forever.
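A minimal sketch of what prefix-based dispatch could look like, assuming the two levels are "kind of source" and "publisher". That grouping, the tree contents and all function names are illustrative assumptions rather than the downloader's actual design; the three prefixes are simply the ones that appear with a named publisher in this post's reference list.

```python
# Hypothetical two-level tree: group -> publisher -> DOI prefixes.
# The grouping is an assumption made for illustration only.
PREFIX_TREE = {
    "preprints": {
        "arXiv": ["10.48550"],
        "Cambridge Open Engage": ["10.33774"],
    },
    "journals": {
        "BMJ": ["10.1136"],
    },
}


def doi_prefix(doi: str) -> str:
    """Return the registrant prefix: everything before the first '/'."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix


def build_index(tree: dict) -> dict:
    """Flatten the tree into prefix -> (group, publisher) for direct lookup."""
    index = {}
    for group, publishers in tree.items():
        for publisher, prefixes in publishers.items():
            for prefix in prefixes:
                index[prefix] = (group, publisher)
    return index


INDEX = build_index(PREFIX_TREE)


def classify(doi: str):
    """Return (group, publisher) for a DOI, or None for an unclassified prefix."""
    return INDEX.get(doi_prefix(doi))
```

Per-publisher download logic would then hang off the `(group, publisher)` pair, with unclassified prefixes falling back to generic resolution. Keeping the tree as plain data means a journal changing hands only requires moving a prefix between branches.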

I also started migrating a lot of datasets over to our new Ceph cluster, including full syncs of GBIF, OpenAlex, and Crossref. This should set us up nicely for Shane's dashboard using a locally hosted database for fast queries. Also on the queue once the storage settles is iNaturalist open data, along with mirroring the TESSERA embeddings to our Ceph cluster so that local Cambridge users such as Andres Zuñiga-Gonzalez can access them more easily to do global analyses directly without a full local copy.

2.1 Figuring out what a URI really is

I also had a really fun discussion with Jon Sterling over High Table dinner at Pembroke about whether it was a good idea for me to get into Lean to start to specify the semantics of URI resolution.

Jon published a design for canonical URLs in Forester last year, and as I'm getting slightly obsessed with managing Atom, RSS and JSONFeeds at the moment (the /network view above is powered by this) this seems relevant to both that and also the literature downloader. In return for Jon's help, I will happily code up an OCaml monorepo script for him!
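To make "URI resolution" a little more concrete, here is one small piece of it: resolving a relative reference against a base URI, as RFC 3986 defines and as Python's standard `urllib.parse.urljoin` implements. The feed URL is hypothetical, and this is only a sketch of the kind of behaviour a formal specification would need to pin down, not anything from Forester or the downloader.

```python
from urllib.parse import urljoin

# Hypothetical feed location, used purely for illustration.
BASE = "https://example.org/feeds/weekly/atom.xml"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a possibly relative reference against a base URI."""
    return urljoin(base, reference)


# A sibling document replaces the last path segment of the base.
sibling = resolve("entry-1.html")
# -> "https://example.org/feeds/weekly/entry-1.html"

# ".." climbs one directory before descending again.
parent = resolve("../images/logo.png")
# -> "https://example.org/feeds/images/logo.png"

# A query-only reference keeps the whole base path.
query = resolve("?page=2")
# -> "https://example.org/feeds/weekly/atom.xml?page=2"

# An absolute URI ignores the base entirely.
absolute = resolve("https://other.example/post")
# -> "https://other.example/post"
```

Even this fragment has enough cases (dot segments, empty paths, query-only references) that a machine-checked specification starts to look attractive.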

3 Shriram's PL opinions

Shriram passed through Cambridge on Friday on his UK lecture tour, so I leapt at the chance to host him after leaping off the redeye from India. I last chatted to Shriram at ICFP over the summer, and this time we got to hear him speak about The Human Factors of Formal Methods in the Logic & Semantics seminar here in the CL.

The talk was fantastic and I can't recommend watching it enough; I have so many papers to follow up on now:

Perceptual learning: Differentiation or enrichment? (1955)
Sexing day-old chicks: A case study and expert systems analysis of a difficult perceptual-learning task (1987). The twist being it involves WWII era tanks as well.
Practicing versus inventing with contrasting cases: The effects of telling first on learning and transfer (2011)

3.1 Can LLMs learn the Stroop effect?

Shriram used the Stroop effect in his talk, which naturally led to Neel Krishnaswami and me wondering if LLMs could learn the Stroop effect too! I found one paper on this topic:

Moreover, as in humans, age is a key determinant of cognitive decline: “older” chatbots, like older patients, tend to perform worse on the MoCA test. These findings challenge the assumption that artificial intelligence will soon replace human doctors, as the cognitive impairment evident in leading chatbots may affect their reliability in medical diagnostics and undermine patients’ confidence. -- Age against the machine, 2024

I find the analogy between human age and 'model age' a bit incongruous, since of course models don't age; newer ones simply have improved training regimes. So the basic takeaway is that the human-style cognitive impairment these tests measure is decreasing as frontier LLMs advance.

3.2 Adversarial experiments to teach OxCaml?

When chatting about how to teach our agents OxCaml better, Shriram pointed me to his 2017 paper on Teaching Programming Languages by Experimental and Adversarial Thinking:

Its essence is to view programming language learning as a natural science activity, where students probe languages experimentally to understand both the normal and extreme behaviors of their features. [...] The approach is modular (with minimal dependencies), incremental (it can be introduced slowly into existing classes), interoperable (it does not need to push out other, existing methods), and complementary (since it introduces a new mode of thinking). -- J. Pombrio et al 2017

There are obvious parallels here to how the OCaml to OxCaml translation process works, whereby we typically add in mode annotations once the OCaml version is working. The only practical twist is that shifting to OxCaml also requires porting code to Base/Core, since the stdlib doesn't have mode annotations.

4 Fun Reading

I discovered that Saurabh Sharma taught OCaml as the first year course in IIT-Delhi for quite some time!
I enjoyed the Full Disclosure episode with Rutger Bregman as a followup to reading his book.
Nice episode of MCJ covering Turning Wasted Renewable Power into AI Compute with Rune. Lots of geeking out about the physics of using all that power. Dave Scott also pointed out to me that the reason we can't just build AI datacenters up north in Scotland, where the renewable power is cheap and plentiful, is that there's a requirement for a constant national electricity price.
Welcome Thomas Gazagnaire back to the blogosphere with a banging post about porting NASA's reusable flight software framework to OCaml.
Scientists can't agree on where the world's forests are: would be fun to cross-check the datasets mentioned here against TESSERA.

Extremely random feature: I added finger support to my website, so you can just do finger @anil.recoil.org (it is installed by default on macOS) to see my latest weekly.
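If you don't have a finger client to hand, the protocol (RFC 1288) is simple enough to speak directly: open a TCP connection to port 79, send one line, and read until the server hangs up. The sketch below assumes the server replies with UTF-8 text, and the function names are made up for illustration.

```python
import socket

FINGER_PORT = 79  # port assigned to the finger protocol (RFC 1288)


def finger_request(user: str = "") -> bytes:
    """Build the one-line query: the user name (empty for `finger @host`) plus CRLF."""
    return user.encode("ascii") + b"\r\n"


def finger(host: str, user: str = "", timeout: float = 10.0) -> str:
    """Send the query to `host` and read the reply until the server closes."""
    with socket.create_connection((host, FINGER_PORT), timeout=timeout) as conn:
        conn.sendall(finger_request(user))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read means the server closed the connection
                break
            chunks.append(data)
    # Assumption: the reply is UTF-8; undecodable bytes are replaced, not fatal.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling `finger("anil.recoil.org")` should then be roughly equivalent to the command above, network permitting.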

5 Next Week

I need to get TESSERA Zarr in shape. This will fix so many infrastructure issues with using the embeddings! I'm also going to vibe code up a cool website for the project, using the feed aggregation logic from my own website and these Threejs Claude skills I just stumbled across.

I'm also off to WG2.8 the week after, so I need to figure out what functional programming goodness I will present there!

Thanks Jon Sterling for letting me look around Clare College and see the restored buildings; the scaffolding just came down!

References

[1] Madhavapeddy (2026). At the AI Impact Summit in Delhi: people, planet, progress. 10.59350/6vc5q-mbk23

[2] Feng et al (2025). TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis. arXiv. 10.48550/arXiv.2506.20380

[3] Madhavapeddy (2025). What I learnt at ICFP/SPLASH 2025 about OCaml, Hazel and FP. 10.59350/w1jvt-8qc58

[4] Jaffer et al (2025). AI-assisted Living Evidence Databases for Conservation Science. Cambridge Open Engage. 10.33774/coe-2025-rmsqf

[5] Madhavapeddy (2026). Happy new year and my fave readings of the year. 10.59350/y9f0e-raa45

[6] Madhavapeddy (2026). 1st TESSERA/CoRE hackathon at the Indian AI Summit. 10.59350/1na80-7ak85

[7] Gibson et al (1955). Perceptual learning: Differentiation or enrichment? 10.1037/h0048826

[8] Biederman et al (1987). Sexing day-old chicks: A case study and expert systems analysis of a difficult perceptual-learning task. 10.1037/0278-7393.13.4.640

[9] Schwartz et al (2011). Practicing versus inventing with contrasting cases: The effects of telling first on learning and transfer. 10.1037/a0025140

[10] Dayan et al (2024). Age against the machine—susceptibility of large language models to cognitive impairment: cross sectional analysis. British Medical Journal Publishing Group. 10.1136/bmj-2024-081948
