.plan-26-21: Pint of Science, OxCaml dissertations, and TESSERA 1.1 stirring

After the BBC/ITV media run we had a talk at Pint of Science, two cracking Part II dissertations on CE/TESSERA and OxCaml vector RAG, and put TESSERA v1.1 weights on HuggingFace.

Most of my week has been dominated by being on the BBC, ITV and so on, so this is just a short weeknote this time around! Aside from all the media coverage, there have been some fun events and hacking going on.

1 Pint of Science Cambridge about TESSERA

Sadiq Jaffer spoke at the Cambridge Pint of Science, which was held in the congenial surroundings of the Station Tavern near the train station. The room itself was quite small and long, and was hugely noisy due to being right beside the actual pub, but it was a wonderfully quirky and informal way to present science to a generalist audience. The specific theme of the event we attended was "How is AI accelerating science?", and other speakers ranged from Moe Vali talking about ultrasound detection of adenomyosis to Anna Breger talking about reconstructing medieval music from ancient transcripts! Sadiq had the pressure piled on as the last speaker, at a point when most of the audience had had their third pint.

As always though, Sadiq pulled off explaining these complex ideas just brilliantly! His slides were some of the best I've seen yet, as they were made with a combination of his expert knowledge about TESSERA and the use of Claude Design (the latest iteration of an AI vision model from Anthropic).

Gorgeous slides about how TESSERA works

You can also see another iteration of Sadiq speaking about TESSERA in this news piece on ITV the day after this talk.

2 Cambridge Part II projects due in

The Cambridge undergrads had to get their Part II dissertation projects in, and in particular I really enjoyed two (which I couldn't directly mark/supervise as I'm on sabbatical, but I cheered on from the sidelines).

2.1 Conservation actions using multimodal foundation models

The first is by Radhika Iyer, on applying TESSERA to test the effectiveness of conservation actions. This was a very bold foray into the unknown that she did a brilliant job of writing up. I'll talk more about this one in a while, but the execution in combining two very separate (and cutting edge) projects into a cohesive thesis was remarkable work.

2.2 Hybrid vector databases in O(x)Caml

The other was just as ambitious but in a totally different dimension: the first OxCaml project we've supervised here with an undergraduate, with Ryan Gibb keeping a close eye. Oliver Fogelin went off and built a hybrid vector RAG database in OxCaml, including embedding a big stack of arXiv papers and building a beautiful interactive browser visualisation of the embeddings. You can try it for yourself on his site.

A rapt audience for Oli's demo

Try out Oli's explorer for yourself by clicking on the image!

We had an entertaining session in my office with Oli demonstrating it to us. I almost got it to run out of the box via oix --toolchain=oxcaml --with=https://github.com/olifog/gvecdb-ocaml gvecdb-server, except we got scuppered by a few (valid) build failures in dependent libraries that he had patched locally but not committed. It's very close to being able to work, though!

This is some of the most impressive hacking I've seen in a Part II project in a while, and I'm very much going to try to replace my shaky website search with Oli's code when things calm down a bit this summer!

3 TESSERA v1.1 updates

We've had a flurry of activity on preparing a TESSERA 1.1 model update, with Frank Feng pulling out all the stops on training with help from Nvidia. This will be the first updated iteration of the TESSERA model that we've issued. More on the cool new features next week, but in the meantime some headlines are:

References

[1] Feng et al. (2025). TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis. arXiv. doi:10.48550/arXiv.2506.20380