Dear ACM, you're doing AI wrong but you can still get it right / Dec 2025 / DOI

There's outrage in the computer science community over a new feature rolled out by the ACM Digital Library that generates often inaccurate AI summaries. To make things worse, this is hidden behind a 'premier' paywall, so authors without access (for example, having graduated from University) can't even see what is being said.

Read full note... (2016 words)

# 18th Dec 2025 / DOI: 10.59350/c84g4-5zt58 / ai, policy, publishing

2025 Advent of Agentic Humps: Building a useful O(x)Caml library every day / Dec 2025

Agentic programming has been getting a hilariously bad rap in the OCaml community recently, but it's definitely here to stay despite the security and legal concerns. I realised that to form a useful opinion on all this, I needed to really get into using Claude with OCaml for real outputs and not just toy code. So this holiday month, I'm going to release a new useful OCaml library per day until Christmas using Claude Code: the advent of agentic humps is here!

Read full note... (1481 words)

# 18th Dec 2025 / agents, ai, aoah, llms, ocaml, oxcaml

AoAH Day 17: OCaml JMAP to plaster my painful email papercuts / Dec 2025

After building a JSON Pointer library yesterday, I proceeded to complete my OCaml JMAP library today so that I could wrestle my overflowing email inbox under control. Email is central to our digital lives and yet we have mostly ceded control to third-party services for something that unlocks access to almost any service we use.

Luckily, I've been self-hosting my own email for some time, so I do have full local access to about three decades worth of messages. However, I've been hampered by existing email clients which are mostly geared towards a temporal view and not towards easy programmability. So today's exercise has been to build an ocaml-jmap that lets me write little agentic programs to help me manage my ever overflowing inbox!

Read full note... (1707 words)

# 17th Dec 2025 / agents, ai, aoah, email, llms, ocaml

AoAH Day 16: Vibesplaining JSON Pointers using OCaml/Javascript / Dec 2025

After the successful HTML5 translation yesterday, I realised that I know next to nothing about HTML5 parsing and had leant extremely heavily on agentic coding. This approach has also been useful to help me explore diverse codebases in a combination of languages. So today I set my sights on understanding the pedagogical impacts of agentic coding a bit more. Can we use coding agents to help us iteratively explore complex protocols?

I decided to build a JMAP email client implementation in OCaml that I need for myself but with the added twist of seeing how I could engineer agents to "vibesplain" a protocol to me that I'm unfamiliar with.

OCaml has superb tooling to help with this; it can not only compile to efficient native code but also to JavaScript and WASM that runs standalone in the browser. I turned to my colleagues Jon Ludlam, Patrick Ferris and Arthur Wendling for help with the tooling, since they've been leading the way on scientific programming, visualisations and webcomponents in OCaml.

Today's work resulted in an ocaml-json-pointer (RFC6901) implementation along with an interactive notebook tutorial that bundles the entire OCaml compiler toolchain alongside it. There's even another one for Yaml just to illustrate how easy this is to replicate once we've built the first one.
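For readers who have not met RFC 6901, the core of a JSON Pointer is small enough to sketch in a few lines of standard-library OCaml. This is only an illustration of the pointer syntax, not the API of ocaml-json-pointer: the names `unescape` and `parse` are hypothetical, and a stray `~` that is not followed by `0` or `1` is passed through rather than rejected as the RFC would require.

```ocaml
(* Decode one reference token: "~1" becomes "/" and "~0" becomes "~".
   A single left-to-right scan avoids decoding "~01" twice.
   Assumption: a stray '~' is copied through instead of being an error. *)
let unescape token =
  let n = String.length token in
  let buf = Buffer.create n in
  let rec go i =
    if i < n then
      if token.[i] = '~' && i + 1 < n && token.[i + 1] = '1' then begin
        Buffer.add_char buf '/';
        go (i + 2)
      end else if token.[i] = '~' && i + 1 < n && token.[i + 1] = '0' then begin
        Buffer.add_char buf '~';
        go (i + 2)
      end else begin
        Buffer.add_char buf token.[i];
        go (i + 1)
      end
  in
  go 0;
  Buffer.contents buf

(* Split a JSON Pointer into its reference tokens.
   The empty string refers to the whole document; any other pointer
   must start with '/'. *)
let parse pointer =
  if pointer = "" then Ok []
  else if pointer.[0] <> '/' then
    Error "a JSON Pointer must be empty or start with '/'"
  else
    (* Drop the leading '/', then split on the remaining separators. *)
    let rest = String.sub pointer 1 (String.length pointer - 1) in
    Ok (List.map unescape (String.split_on_char '/' rest))
```

Under these assumptions, `parse "/a~1b/m~0n"` yields the two tokens `a/b` and `m~n`, which a caller would then use to walk down a JSON value one object key or array index at a time.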

Read full note... (1573 words)

# 16th Dec 2025 / agents, ai, aoah, email, llms, ocaml

AoAH Day 15: Porting a complete HTML5 parser and browser test suite / Dec 2025

After my success with Yaml 1.2 in pure OCaml, I found JustHTML, a new Python library for parsing HTML5 by Emil Stenström (via Simon Willison posting about it). Emil wrote JustHTML using coding agents as well, and then Simon ported it to JavaScript in a few hours.

My question, though, is how difficult it is to go in the other direction and move towards a strongly typed interface like OCaml's. Could we ultimately distill down the extremely complex set of rules around parsing HTML all the way into a proof assistant like Lean, but hopping via OCaml and Haskell to provide convenient executable pitstops?

Today's task was to vibespile the Python into ocaml-html5rw, a pure OCaml HTML5 parser and serialiser that passes the browser test suite 100%.

Read full note... (1387 words)

# 15th Dec 2025 / agents, ai, aoah, llms, ocaml, web

AoAH Day 14: Debugging a Karakeep CLI against the live service / Dec 2025

With the Requests library under my belt, I finally got to what I actually need for myself: vibe coding OCaml library interfaces to my #selfhosted services that contain most of my data.

To start with, I use Karakeep across all my devices to bookmark things, and I'd like to be able to programmatically search through tags, for example by taking all outbound links from the blogs that I read and autosynching them with my remote service. Karakeep on the server side does some cool things like screenshot links and create local webarchives.

Unfortunately, Karakeep doesn't publish an OCaml interface. Fortunately, my new bestie Claude helped me build ocaml-karakeep without much input from me!

Read full note... (813 words)

# 14th Dec 2025 / agents, ai, aoah, llms, ocaml

AoAH Day 13: Heckling an OCaml HTTP client from 50 implementations in 10 languages / Dec 2025

Now that I had some prerequisite libraries, I turned my attention to having a batteries-included OCaml HTTP tool with features like request throttling and redirect loop detection. I've hacked on OCaml HTTP protocol libraries since 2011, but these higher level features weren't necessary in things like Docker's VPNKit. The problem with building one now is that there are loads of random quirks needed in real-world HTTP, which would take ages to figure out if I started from scratch.

Luckily, there's an entire ecology of HTTP clients built in other languages that I could use for inspiration as well! Today, I gathered fifty open-source HTTP clients from a variety of other language ecosystems, and agentically synthesised a specification across all of them into one OCaml client using Eio.

I'm not sure what the collective noun is for a group of HTTP clients, so I dubbed this whole process a 'heckle' of HTTP coding!

Read full note... (2006 words)

# 13th Dec 2025 / agents, ai, aoah, ecology, llms, ocaml

AoAH Day 12: Eio Connection pooling and event tracing / Dec 2025

After yesterday's library bonanza for HTTP cookie handling, I implemented a TCP/TLS connection pooling library. This is useful for an HTTP client as it provides the network-level mechanisms for keeping track of outgoing network connections by their DNS name. This allows for more flexible outgoing connection management without worrying about overloading remote endpoints.

For example, github.io has four A records:

> host github.io
github.io has address 185.199.110.153
github.io has address 185.199.109.153
github.io has address 185.199.108.153
github.io has address 185.199.111.153

With this new connection pooling library, my application should be able to connect to the github.io name and keep track of all the outgoing connections on the basis of it being called github.io and load balance the number of outgoing connections accordingly.
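The bookkeeping idea can be sketched with nothing but the standard library: count connections per DNS name rather than per IP address, and refuse new ones once a per-host limit is reached. This is a minimal sketch and not the library's actual interface. The `pool` type, `max_per_host` and the `acquire`/`release` functions are hypothetical names, and real socket handling with Eio and TLS is left out entirely.

```ocaml
(* Hypothetical pool state: how many connections are in use per hostname. *)
type pool = {
  max_per_host : int;
  active : (string, int) Hashtbl.t; (* hostname -> connections in use *)
}

let create ~max_per_host = { max_per_host; active = Hashtbl.create 16 }

(* Number of connections currently counted against [host]. *)
let in_use pool host =
  Option.value (Hashtbl.find_opt pool.active host) ~default:0

(* Try to reserve a slot for [host]. The key is the name, so all four
   A records of a name such as github.io share one budget. *)
let acquire pool host =
  let n = in_use pool host in
  if n >= pool.max_per_host then false
  else begin
    Hashtbl.replace pool.active host (n + 1);
    true
  end

(* Give a slot back; releasing an idle host is a no-op. *)
let release pool host =
  let n = in_use pool host in
  if n > 0 then Hashtbl.replace pool.active host (n - 1)
```

A caller would wrap each outgoing request in `acquire` and `release`, choosing whichever resolved address it likes once a slot has been granted.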

In the interests of exploring something new, I also decided to add in visualisation support to figure out what the library is spending its time on. I decided to generate self-contained visualisations, inspired by Jon Ludlam rediscovering the joy of SVGs yesterday!

Read full note... (792 words)

# 12th Dec 2025 / agents, ai, aoah, llms, ocaml

AoAH Day 11: HTTP Cookies and vibing RFCs for breakfast / Dec 2025

I'm switching focus for a few days to build a complete HTTP(S) client to use in my literature downloader. This requires building a few support libraries before we build the full client, so I figured I'd dive into them over the next few days. First up is RFC 6265 HTTP Cookie support. There are some excellent existing cookie libraries already on opam, notably http-cookie and ocaml-cookie, but I wasn't sure what their coverage of the protocol is, and there's no Eio serialisation support.

So I thought I'd have a go at a different approach today using agentic coding: can we synthesise a complete HTTP Cookie implementation purely from the RFC 6265 prose itself, and then differentially compare this OCaml implementation against the others? In theory, running a single test suite across all three libraries might be a good way of discovering how to improve the existing implementations. In the long-term, http-cookie is probably the upstream library I want to use, but I don't want to generate a giant diff against it today due to my groundrules of not disturbing other maintainers.

Read full note... (1631 words)

# 10th Dec 2025 / agents, ai, aoah, llms, ocaml, rfcs

AoAH Day 10: Building a TUI for Sortal using Mosaic / Dec 2025

After building a reasonably complete Sortal contacts manager and trying out OxCaml's Bonsai_term, I thought I'd have a second go at a terminal UI using a newly announced Mosaic library by Thibaut Mattio.

I first noticed this library when Thibaut presented his OCaml coding with AI talk at FunOCaml. It's quite different from Bonsai in that Mosaic uses OCaml's effects to provide a more direct-style API, and so seems worth experimenting with. So today's task is to port Sortal to use Mosaic and see what this terminal UI looks like!

Read full note... (646 words)

# 10th Dec 2025 / agents, ai, aoah, llms, ocaml

AoAH Day 9: Adding a Bonsai terminal UI to Sortal / Dec 2025

After building a reasonably complete Sortal contacts manager, I decided to try to do a proper job of a terminal user interface. The first option for a modern UI is something that Yaron Minsky announced last week: bonsai_term, which also gives me a chance to dip into the OxCaml ecosystem with my agentic hacking!

Read full note... (1032 words)

# 9th Dec 2025 / agents, ai, aoah, llms, ocaml, oxcaml

Publish, Review, Curate to upend scholarly publishing / Dec 2025 / DOI

I was not expecting to find a bunch of activist librarians at the lovely spires of King's College Chapel last week, but I was very glad that I did! I gave a talk to the Confederation of Open Access Repositories group that was having a meeting about "Turning scholarly publishing on its head". Luckily, I had my budding Four Ps for Collective Intelligence fresh on my brain, so I discussed it with the assembled librarians. The crowd was a really interesting mix of the open research team at Cambridge, their French equivalents in CNRS, academic researchers like myself and Albert Cardona interested in non-traditional outputs, and of course digital librarians from all over the world.

Read full note... (1562 words)

# 8th Dec 2025 / DOI: 10.59350/fpc9w-ccj82 / ai, atproto, networks, opensource, publishing

AoAH Day 8: Building a contacts CLI manager with Sortal / Dec 2025

I've been accumulating a lot of contacts that I use to write cross references on my website. This works by using Cmarkit to parse my custom Markdown, and spot entries like [@sadiqj] and convert those into a full reference like Sadiq Jaffer.
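To make the cross-reference idea concrete, here is a rough sketch of the expansion step on its own. It swaps in plain string scanning for the Cmarkit-based Markdown parsing described above, and the function name `expand_mentions` and its `lookup` parameter are hypothetical, chosen only for illustration.

```ocaml
(* Replace every "[@handle]" in [text] with the full name returned by
   [lookup]. Unknown handles and unterminated brackets are left as-is.
   Assumption: plain string scanning, not a real Markdown parser. *)
let expand_mentions ~lookup text =
  let n = String.length text in
  let buf = Buffer.create n in
  let rec go i =
    if i >= n then ()
    else if i + 1 < n && text.[i] = '[' && text.[i + 1] = '@' then
      (match String.index_from_opt text (i + 2) ']' with
       | Some close ->
         let handle = String.sub text (i + 2) (close - i - 2) in
         (match lookup handle with
          | Some name -> Buffer.add_string buf name
          | None -> Buffer.add_string buf (String.sub text i (close - i + 1)));
         go (close + 1)
       | None ->
         (* No closing bracket: copy the remainder unchanged. *)
         Buffer.add_string buf (String.sub text i (n - i)))
    else begin
      Buffer.add_char buf text.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf
```

Passing the contact database in as a `lookup` function keeps the expansion independent of where the contacts are stored, whether that is an in-memory table or a directory of files.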

Today, I want to build a full CLI application that stores all my contacts as Yaml files in my home directory using XDG conventions, and gives me a simple search interface so I can quickly autocomplete these posts from my editor. I call this little application "Sortal".

Read full note... (915 words)

# 8th Dec 2025 / agents, ai, aoah, llms, ocaml

AoAH Day 7: Converting between JSON and Yaml with yamlt / Dec 2025

After the excitement of building an entire Yaml 1.2 parser yesterday, I began to put it to use. Since I've been steadily converting all my JSON parsers to use jsont codecs, it would be convenient if a single jsont codec definition could also convert that schema to Yaml. In theory, Yaml is a superset of JSON, except it isn't actually. But it's close enough that we should be able to build a yamlt library that can accept a jsont codec and spit out Yaml (or the reverse).

Read full note... (706 words)

# 7th Dec 2025 / agents, ai, aoah, llms, ocaml

AoAH Day 6: Getting a Yaml 1.2 implementation in pure OCaml / Dec 2025

I did the palate cleanser of Bytesrw-eio yesterday for a good reason. Back in 2017, I wrote the OCaml Yaml bindings that a lot of projects use in the OCaml ecosystem, and I'm having trouble maintaining it.

Since Yaml is a monstrously convoluted spec, I opted back then to bind to the C libyaml using ocaml-ctypes. This was a good decision a decade ago, but maintaining this has been a nightmare due to the complexity of vendoring the C library, dealing with security issues there, and exposing a reasonable OCaml interface. The ocaml-yaml implementation also doesn't pass the full Yaml test suite.

And the worst thing is, I cannot find the motivation to figure out how Yaml really works. It's the world's worst serialisation format, with lots of corner cases and memory blowups inherent in how it works. So I decided to dive in and see if I could build a pure OCaml Yaml 1.2 implementation using bytesrw and the source spec.

TL;DR: it worked. It actually seems to have come up with a reasonable, pure OCaml implementation that I'm now using! It needs more validation and external code review, but this has been on my TODO list for years now.

Read full note... (897 words)

# 6th Dec 2025 / agents, ai, aoah, llms, ocaml, opam

AoAH Day 5: Bytesrw Eio adapters and automating opam metadata / Dec 2025

After the Claude exertions of yesterday, I needed something easier to cool my laptop down. I wanted to learn how to use another new library from Daniel Bünzli called Bytesrw, which provides composable byte stream readers and writers. It supplies ways to serialise Bytesrw to Unix file descriptors, so I figured I'd add in an Eio library for this. Along the way though, I was generating a growing number of opam packages, so I also learnt how to use Claude Skills to automate my opam metadata on Tangled as well.

Read full note... (928 words)

# 5th Dec 2025 / agents, ai, aoah, llms, ocaml, opam

AoAH Day 4: Going recursive with Claudeio for Claude / Dec 2025

By this point, I've got three useful libraries and my use of Claude is getting better. So naturally I want to automate my invocations of the claude CLI, but I hit a roadblock: there are no OCaml SDK bindings! However, there appear to be SDKs in Python, Go and many others. So today will involve having a stab at generating Claude OCaml bindings using Eio, so I can use Claude to write more OCaml!

Read full note... (745 words)

# 4th Dec 2025 / agents, ai, aoah, llms, ocaml

Foundational AI for Ecosystem Resilience workshop / Dec 2025 / DOI

As part of the ARIA Engineering Ecosystem Resilience program, we've been convening a series of workshops here at the Cambridge Conservation Initiative to explore the potential of combining two radically different approaches to modeling. Joe Millard wrote this to frame the discussion:

Ecology and ecosystems are inherently agent-based. In other words, patterns in biodiversity in both space and time emerge as a function of the local interaction of many types of individual organisms, both with each other and with their abiotic environment.

Generative agent-based models, such as Concordia enable the simulation of multiple interacting large language models. Given LLMs now possess significant ecological knowledge, it is possible that models such as Concordia will enable the meaningful simulation of ecological interactions.

The biotic and abiotic environment in which ecological agents interact in a given ecosystem is likely measurable via remotely monitored earth-observation data. Raw EO data, however, is unwieldy, containing large quantities of information that can be difficult to interpret. Earth-system models, such as TESSERA or AlphaEarth are foundational AI models which compress large quantities of EO data into "embeddings", unambiguous and consistent digital representations of the structure of the Earth’s surface. -- Foundational AI to forecast ecosystem resilience, J. Millard, A. Pili, K. Berthon, R. Fletcher, L. Dicks

We held two separate workshops to explore this: one for a deep-dive into the technical details, and another to invite conservation practitioners to steer our modeling in a realistic and positive direction. This was all led by Lynn Dicks, with stellar organisation from Joe Millard, Katherine Berthon, Arman Pili and Rob Fletcher, and input from me, Srinivasan Keshav and David Coomes. I'll go into each talk next, or you can watch the playlist yourself.

Read full note... (1500 words)

# 3rd Dec 2025 / DOI: 10.59350/26hy6-rry61 / ai, aria, ecology, nature, sensing, tessera

AoAH Day 3: XDG filesystem paths using Eio capabilities / Dec 2025

By Day 3 of the Advent of Agentic Humps, I now have the confidence to build a slightly more complex library that uses Eio to implement the XDG Base Directory Specification with a twist: let's use Eio capabilities to sandbox XDG paths by default.
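The lookup rules in the XDG Base Directory Specification are simple to sketch: use the `XDG_*_HOME` variable if it holds an absolute path, and otherwise fall back to a default under `$HOME`. The snippet below is an illustration under stated assumptions and is not the library described here. It uses no Eio at all, the function names are hypothetical, and passing `getenv` in explicitly is only a loose nod to the capability style, in that the code sees just the environment it is handed. Paths are assumed to be Unix-style.

```ocaml
(* Resolve one XDG base directory.
   [getenv] is supplied by the caller, for example [Sys.getenv_opt].
   The spec says relative paths in these variables are invalid and
   should be ignored, so they fall through to the default. *)
let xdg_dir ~getenv ~var ~default =
  match getenv var with
  | Some dir when dir <> "" && not (Filename.is_relative dir) -> Some dir
  | _ ->
    (match getenv "HOME" with
     | Some home when home <> "" -> Some (Filename.concat home default)
     | _ -> None)

(* Defaults taken from the XDG Base Directory Specification. *)
let config_home ~getenv =
  xdg_dir ~getenv ~var:"XDG_CONFIG_HOME" ~default:".config"

let data_home ~getenv =
  xdg_dir ~getenv ~var:"XDG_DATA_HOME" ~default:".local/share"

let cache_home ~getenv =
  xdg_dir ~getenv ~var:"XDG_CACHE_HOME" ~default:".cache"
```

In a program this would be called as `config_home ~getenv:Sys.getenv_opt`, while tests can pass a fake environment built from an association list.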

Read full note... (1030 words)

# 3rd Dec 2025 / agents, ai, aoah, llms, ocaml

AoAH Day 2: Building an OCaml JSONFeed library / Dec 2025

Day 2 of the Advent of Agentic Humps dawns with building a slightly more complex library than before, via the JSONFeed specification that is a more modern version of Atom.

JSONFeed is a successor to Atom for website feeds, with a nice informal specification about how to parse it. However, it also has a growing number of extensions which need to be implemented somehow, as well as some informal rules to map RSS/Atom to JSONFeed.

There is no existing OCaml implementation that I could find, and I need it to integrate my website with Rogue Scholar more easily for permanent DOIs.

Read full note... (991 words)

# 2nd Dec 2025 / agents, ai, aoah, llms, ocaml

The AI French Connection to the Practice of Science / Dec 2025 / DOI

Our neighbours France and the UK announced a Franco-British AI collaboration a few months ago dubbed the Entente CordIAle. Last week we held a couple of days of workshops with our Oxford and French buddies deep diving into details of what a partnership might actually involve; a particular pleasure with France given my group's long history of working with Inria on OCaml and other open source projects.

I sprinted back from Birmingham to speak about our research on connecting the dots on terrestrial life via TESSERA, LIFE, food and the scholarly literature. The other talks were vastly ambitious, best summarised in the Scriberia visual:

The scriberia summary of the Entente Cordiale Workshop (credit: AI@CAM). My talk is on the bottom left!

Read full note... (710 words)

# 1st Dec 2025 / DOI: askdv-e9z43 / ai, evidence, llms

AoAH Day 1: Building a Base32 Crockford library in OCaml / Dec 2025

Let's start day 1 of the Advent of Agentic Humps with a gentle introduction to agentic coding. Firstly, I've chosen to exclusively use Claude Code for this since it's CLI driven. I tried some of the other Copilot and Cursor IDEs, but I just couldn't adjust to how busy the displays were.

With Claude, my setup first involved a custom devcontainer using Docker on a Linux host, and my local Mac laptop. I coordinate both of these via Git repositories hosted up at Tangled with a self-hosted knot.

Read full note... (572 words)

# 1st Dec 2025 / agents, ai, aoah, llms, ocaml

Four Ps for Building Massive Collective Knowledge Systems / Nov 2025 / DOI

I've been building some big collective knowledge systems recently, both for scholarly literature and to power large-scale observational foundation models. While the modalities of knowledge in these systems are very different, they share a common set of design principles I've noticed while building individual pieces. A good computer architecture is one that can be re-used, and I've been mulling over what this exactly is for some time.

I found the perfect place to codify this at the ARIA Workshop on Collective Flourishing that Sadiq Jaffer and I attended in Birmingham last week. I posit there are "4 P's" needed for any collective knowledge system to be robust and accurate: permanence, provenance, permission and placement. If these properties exist throughout our knowledge graph, we can make robust networks for rapid evidence-based decision making. They also form a dam against the wave of agentic AI that is going to dominate the Internet next year in a big way.

Will building these collective knowledge systems be a transformative capability for human society? Hot on the heels of COP30 concluding indecisively, I've been getting excited by decision making towards biodiversity going down a more positive path in IPBES. We could empower decisionmakers at all scales (local, country, international) to be able to move five times faster on actions about global species extinctions, unsustainable wildlife trade and food security, while rapidly assimilating extraordinarily complex evidence chains. I'll talk about this more while explaining the principles...

Read full note... (4169 words)

# 23rd Nov 2025 / DOI: 10.59350/418q4-gng78 / ai, biodiversity, networking, policy, spatial

GeoTessera 0.7 out with efficient sampling and Zarr support / Nov 2025 / DOI

I've just released geotessera 0.7 to pypi for our TESSERA geospatial foundation model, following on from the first release earlier this year. To recap:

TESSERA is a foundation model for Earth observation that processes Sentinel-1 and Sentinel-2 satellite data to generate representation (embedding) maps. It compresses a full year of Sentinel-1 and Sentinel-2 data and learns useful temporal-spectral features. -- Temporal Embeddings of Surface Spectra for Earth Representation and Analysis

With this new release, there's convenient documentation to show how you can freely access 150TB+ of CC-BY-licensed embeddings of the earth's surface. We've been getting a growing influx of requests for diverse regions of the world, and so our focus for the next few months is attaining complete coverage of our v1 model on the whole planet.

Read full note... (1228 words)

# 17th Nov 2025 / DOI: 10.59350/nagwp-tnw89 / ai, satellite, spatial, tessera

On the path to the UK/India AI Summit with OpenUK and the ATI / Nov 2025 / DOI

There's a buzz forming around the upcoming AI Impact Summit next year in India, following up the AI Safety Summit here and the France Action Summit earlier this year. I headed down to a couple of events in London this week to help set the agenda, particularly around the importance of FAIR and ethical AI for sustainability being on the political agenda.

Read full note... (1433 words)

# 11th Nov 2025 / DOI: 10.59350/x6rea-1g262 / ai, india, opensource, policy, uk, zfs