Most of the week was taken up by hopping over to New Delhi to host a TESSERA hackathon and also to attend the AI Impact Summit. I redeyed back to host Shriram Krishnamurthi in Cambridge as he does his UK tour.
1 TESSERA
The best news of the week was that the TESSERA paper got accepted into CVPR 2026, out of a whopping 16000+ (!) submissions. This has been a giant amount of work for the whole team, but particular props to lead author and PhD student Frank Feng who has led the whole effort with perseverance and a big smile the whole time!
It looks like CVPR is in Denver right before PLDI in Boulder (where I have an OxCaml tutorial to help hold) so I guess a chunk of my June will be spent in Colorado this year.
I also spent some time porting Mark Elvers' OCaml Zarr implementation over to OxCaml, and also started adding Zarr zone support to geotessera so we can start converting the registry over.
2 Literature downloader
Robin Message and Sadiq Jaffer have restarted the literature downloader, and Robin has been manually classifying DOI prefixes into a two-level tree so we can easily dispatch download logic on a per-publisher basis (we have individual agreements via the University library with many publishers).
I'm surprised that DOIs are not a two-level tree to start with. With no central source of detailed DOI prefix metadata, if a journal is sold to another publisher (as just happened with JFP), you either have to forward a portion of your DOI space or continue to resolve old journal article DOIs forever.
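As a rough illustration of the two-level classification described above, here is a minimal OCaml sketch. It assumes the first level is the publisher and the second level is the set of DOI prefixes that publisher owns. The publisher names, prefixes, and the `strategy` type are all hypothetical placeholders, not the actual data or code used in the literature downloader.

```ocaml
(* Hypothetical sketch: every name and table entry here is illustrative. *)

(* How a download might be attempted; these cases are assumptions. *)
type strategy =
  | Open_access        (* fetch directly, no credentials needed *)
  | Library_agreement  (* go via an institutional agreement *)
  | Unknown            (* no rule yet; needs manual classification *)

(* Level 1: the publisher. Level 2: the DOI prefixes it owns. *)
type publisher = {
  name : string;
  strategy : strategy;
  prefixes : string list;
}

(* Made-up registry; a real one would come from the manual classification. *)
let registry = [
  { name = "Example Press"; strategy = Library_agreement;
    prefixes = [ "10.1234"; "10.5678" ] };
  { name = "Sample Society"; strategy = Open_access;
    prefixes = [ "10.5555" ] };
]

(* A DOI is "prefix/suffix": take everything before the first '/'. *)
let prefix_of_doi doi =
  match String.index_opt doi '/' with
  | Some i when i > 0 -> Some (String.sub doi 0 i)
  | _ -> None

(* Walk the tree: find the publisher that owns this DOI's prefix. *)
let find_publisher doi =
  match prefix_of_doi doi with
  | None -> None
  | Some p -> List.find_opt (fun pub -> List.mem p pub.prefixes) registry

(* Dispatch on the publisher, falling back to Unknown. *)
let strategy_for doi =
  match find_publisher doi with
  | Some pub -> pub.strategy
  | None -> Unknown
```

With this shape, a journal moving between publishers would amount to moving a prefix from one `prefixes` list to another, which only works cleanly when the whole prefix moves rather than a portion of it.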
I also started migrating a lot of datasets over to our new Ceph cluster, including full syncs of GBIF, OpenAlex, and Crossref. This should set us up nicely for Shane's dashboard using a locally hosted database for fast queries. Also on the queue once the storage settles are iNaturalist open data, and mirroring the TESSERA embeddings to our Ceph so that local Cambridge users such as Andres Zuñiga-Gonzalez can access them more easily to do global analyses directly without a full local copy.
2.1 Figuring out what a URI really is
I also had a really fun discussion with Jon Sterling over High Table dinner at Pembroke about whether it was a good idea for me to get into Lean to start to specify the semantics of URI resolution.
Jon published a design for canonical URLs in Forester last year, and as I'm getting slightly obsessed with managing Atom, RSS and JSONFeeds at the moment (the /network view above is powered by this) this seems relevant to both that and also the literature downloader. In return for Jon's help, I will happily code up an OCaml monorepo script for him!
3 Shriram's PL opinions

The talk was fantastic and I can't recommend watching it enough; I have so many papers to follow up on now:
- Perceptual learning; differentiation or enrichment (1955)
- Sexing day-old chicks: A case study and expert systems analysis of a difficult perceptual-learning task (1987). The twist being it involves WWII era tanks as well.
- Practicing versus inventing with contrasting cases: The effects of telling first on learning and transfer (2011)
3.1 Can LLMs learn the Stroop effect?
Shriram used the Stroop effect in his talk, which naturally led to Neel Krishnaswami and me wondering if LLMs could learn the Stroop effect too! I found one paper on this topic:
Moreover, as in humans, age is a key determinant of cognitive decline: “older” chatbots, like older patients, tend to perform worse on the MoCA test. These findings challenge the assumption that artificial intelligence will soon replace human doctors, as the cognitive impairment evident in leading chatbots may affect their reliability in medical diagnostics and undermine patients’ confidence. -- Age against the machine, 2024
I find the analogy between human age and 'model age' a bit incongruous, since of course models don't age -- there are improved training regimes. So the basic takeaway is that human-like cognitive impairment is decreasing as frontier LLMs advance.
3.2 Adversarial experiments to teach OxCaml?
When chatting about how to teach our agents OxCaml better, Shriram pointed me to his 2017 paper on Teaching Programming Languages by Experimental and Adversarial Thinking:
Its essence is to view programming language learning as a natural science activity, where students probe languages experimentally to understand both the normal and extreme behaviors of their features. [...] The approach is modular (with minimal dependencies), incremental (it can be introduced slowly into existing classes), interoperable (it does not need to push out other, existing methods), and complementary (since it introduces a new mode of thinking). -- J. Pombrio et al 2017
There are obvious parallels here to how the OCaml to OxCaml translation process works, whereby we typically add in mode annotations once the OCaml version is working. The only practical twist is that shifting to OxCaml also requires porting code to Base/Core, since the stdlib doesn't have mode annotations.
4 Fun Reading

- I discovered that Saurabh Sharma taught OCaml as the first year course in IIT-Delhi for quite some time!
- I enjoyed the Full Disclosure episode with Rutger Bregman as a followup to reading his book.
- Nice episode of MCJ covering Turning Wasted Renewable Power into AI Compute with Rune. Lots of geeking about the physics of using all that power. Dave Scott also pointed out to me that the reason we can't just build AI datacenters up north in Scotland, where the renewable power is cheap and plentiful, is that there's a requirement for a constant national electricity price.
- Welcome Thomas Gazagnaire back to the blogosphere with a banging post about porting NASA's reusable flight software framework to OCaml.
- Scientists can't agree on where the world's forests are: would be fun to cross-check the datasets mentioned here against TESSERA.

finger @anil.recoil.org (it is installed by default on macOS) to see my latest weekly.
5 Next Week
I need to get TESSERA Zarr in shape. This will fix so many infrastructure issues with using the embeddings! I'm also going to vibe code up a cool website for the project, using the feed aggregation logic from my own website and these Threejs Claude skills I just stumbled across.
I'm also off to WG2.8 the week after, so I need to figure out what functional programming goodness I will present there!

