AoAH Day 20: Human language detection in native code, JS and wasm / Dec 2025

I took a break from yesterday's bot hacking to continue the HTML5 parsing in OCaml adventure. Vibespiling seems to have taken off, with Simon Willison reporting that there's a Swift version now as well. I got curious about how far I could push the vibespiling support: could we go beyond "just" parsing to also do complete HTML5 validation? The Nu HTML Validator is where I went next, which is a bunch of Java code used by the W3C to apply some seriously complex rules for HTML5 validation.

I decided to split this work into two days, and started with a simple problem: HTML5 validation includes the need for automated language detection to validate that the lang attribute on HTML elements matches the actual content. This is important for accessibility, as screen readers use language hints to select the correct pronunciation.

The W3C validator uses the Cybozu langdetect algorithm, so I vibespiled this into pure OCaml code as ocaml-langdetect. However, I decided to push harder by compiling this to three different backends: native code OCaml, JavaScript via js_of_ocaml and then into modern WebAssembly using wasm_of_ocaml. As a fun twist, I got the regression tests running as interactive "vibesplained" online notebooks that can do language detection in the browser.
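Detectors in the langdetect family score text against per-language character n-gram frequency profiles. As a rough illustration of that first step only, here is a minimal sketch that counts n-grams of length 1 to 3. The function name `ngram_counts` is hypothetical and is not the ocaml-langdetect API, and it works on bytes where a real detector would work on normalised Unicode characters.

```ocaml
(* Hypothetical helper, not part of ocaml-langdetect: count byte-level
   n-grams of length 1..3 in a string. Real detectors operate on
   Unicode characters and normalise the text first. *)
let ngram_counts (s : string) : (string, int) Hashtbl.t =
  let counts = Hashtbl.create 64 in
  let len = String.length s in
  for n = 1 to 3 do
    (* When the string is shorter than n this inner loop does not run. *)
    for i = 0 to len - n do
      let gram = String.sub s i n in
      let seen = Option.value (Hashtbl.find_opt counts gram) ~default:0 in
      Hashtbl.replace counts gram (seen + 1)
    done
  done;
  counts

let () =
  let counts = ngram_counts "banana" in
  Printf.printf "an=%d ana=%d\n"
    (Hashtbl.find counts "an") (Hashtbl.find counts "ana")
```

A classifier would then compare these counts with stored per-language profiles. That scoring step is left out here.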

Read full note... (1397 words)

# 20th Dec 2025 / agents, ai, aoah, llms, ocaml, wasm, web

2025 Advent of Agentic Humps: Building a useful O(x)Caml library every day / Dec 2025

Agentic programming has been getting a hilariously bad rap in the OCaml community recently, but it's definitely here to stay despite the security and legal concerns. I realised that to form a useful opinion on all this, I needed to really get into using Claude with OCaml for real outputs and not just toy code. So this holiday month, I'm going to release a new useful OCaml library per day until Christmas using Claude Code: the advent of agentic humps is here!

Read full note... (1525 words)

# 20th Dec 2025 / agents, ai, aoah, llms, ocaml, oxcaml

AoAH Day 19: Zulip bot framework to bring Vicuna the friendly camel back / Dec 2025

After building tomlt yesterday for TOML 1.1 parsing, I proceeded to integrate it with my group's Zulip chat server. I then discovered that Zulip actually uses Python's configparser INI format for its .zuliprc files rather than TOML, whoops! But this gave me the perfect opportunity to attempt to quickly replicate the tomlt experience with a third config format codec library for Windows-style INI files as well.

So today I released both ocaml-zulip for Zulip API integration and ocaml-init for INI file codecs that are compatible with Pythonic features such as variable interpolation. Along the way, I developed a new regression test mechanism by writing a Zulip bot that tests the Zulip API using OCaml Zulip!

Read full note... (1591 words)

# 19th Dec 2025 / agents, ai, aoah, functional, llms, ocaml

AoAH Day 18: TOML 1.1 codecs directly from the spec and paper / Dec 2025

After getting my email interfaces automated yesterday, I turned my attention to Zulip integration. But first, I took a segue into another format that it required, known as TOML. I noticed that TOML 1.1.0 was released today, and so I built ocaml-tomlt.

What I wanted to explore with this library is whether I could use a coding agent to build a complex functional abstraction from scratch. After building yamlrw and yamlt, I settled on the technique Daniel Bünzli developed with jsont in his paper.

Read full note... (1136 words)

# 18th Dec 2025 / agents, ai, aoah, functional, llms, ocaml

AoAH Day 17: OCaml JMAP to plaster my painful email papercuts / Dec 2025

After building a JSON Pointer library yesterday, I proceeded to complete my OCaml JMAP library today so that I could wrestle my overflowing email inbox under control. Email is central to our digital lives and yet we have mostly ceded control to third-party services for something that unlocks access to almost any service we use.

Luckily, I've been self-hosting my own email for some time, so I do have full local access to about three decades worth of messages. However, I've been hampered by existing email clients which are mostly geared towards a temporal view and not towards easy programmability. So today's exercise has been to build an ocaml-jmap that lets me write little agentic programs to help me manage my ever overflowing inbox!

Read full note... (1707 words)

# 17th Dec 2025 / agents, ai, aoah, email, llms, ocaml

AoAH Day 16: Vibesplaining JSON Pointers using OCaml/Javascript / Dec 2025

After the successful HTML5 translation yesterday, I realised that I know next to nothing about HTML5 parsing and had leant extremely heavily on agentic coding. This approach has also been useful to help me explore diverse codebases in a combination of languages. So today I set my sights on understanding the pedagogical impacts of agentic coding a bit more. Can we use coding agents to help us iteratively explore complex protocols?

I decided to build a JMAP email client implementation in OCaml that I need for myself but with the added twist of seeing how I could engineer agents to "vibesplain" a protocol to me that I'm unfamiliar with.

OCaml has superb tooling to help with this; it can not only compile to efficient native code but also to JavaScript and WASM that runs standalone in the browser. I turned to my colleagues Jon Ludlam, Patrick Ferris and Arthur Wendling for help with the tooling, since they've been leading the way on scientific programming, visualisations and webcomponents in OCaml.

Today's work resulted in an ocaml-json-pointer (RFC6901) implementation along with an interactive notebook tutorial that bundles the entire OCaml compiler toolchain alongside it. There's even another one for Yaml just to illustrate how easy this is to replicate once we've built the first one.
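To give a flavour of what RFC 6901 asks for, here is a minimal sketch of splitting a JSON Pointer string into its reference tokens. The names `unescape` and `parse_pointer` are hypothetical and are not taken from ocaml-json-pointer. The sketch also passes malformed escapes (a `~` not followed by `0` or `1`) through unchanged, where a strict parser would reject them.

```ocaml
(* Decode one reference token: "~1" becomes "/" and "~0" becomes "~".
   Anything else is copied through unchanged. *)
let unescape (token : string) : string =
  let len = String.length token in
  let buf = Buffer.create len in
  let rec go i =
    if i < len then
      if token.[i] = '~' && i + 1 < len && token.[i + 1] = '1' then
        (Buffer.add_char buf '/'; go (i + 2))
      else if token.[i] = '~' && i + 1 < len && token.[i + 1] = '0' then
        (Buffer.add_char buf '~'; go (i + 2))
      else (Buffer.add_char buf token.[i]; go (i + 1))
  in
  go 0;
  Buffer.contents buf

(* A pointer is either empty (the whole document) or a sequence of
   '/'-prefixed reference tokens. *)
let parse_pointer (p : string) : (string list, string) result =
  if p = "" then Ok []
  else if p.[0] <> '/' then Error "pointer must be empty or start with '/'"
  else
    (* split_on_char yields an empty piece before the first '/'; drop it. *)
    Ok (List.map unescape (List.tl (String.split_on_char '/' p)))

let () =
  match parse_pointer "/a~1b/m~0n" with
  | Ok tokens -> List.iter print_endline tokens
  | Error msg -> prerr_endline msg
```

Escaping exists so that keys containing `/` or `~` can still be addressed: the pointer `/a~1b` refers to the key `a/b`, not to `b` nested inside `a`.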

Read full note... (1573 words)

# 16th Dec 2025 / agents, ai, aoah, email, llms, ocaml

AoAH Day 15: Porting a complete HTML5 parser and browser test suite / Dec 2025

After my success with Yaml 1.2 in pure OCaml, I found JustHTML, a new Python library for parsing HTML5 by Emil Stenström (via Simon Willison posting about it). Emil wrote JustHTML using coding agents as well, and then Simon ported it to JavaScript in a few hours.

My question, though, is how difficult is it to go in the other direction and move towards a strongly typed interface like OCaml's. Could we ultimately distill down the extremely complex set of rules around parsing HTML all the way into a proof assistant like Lean, but hopping via OCaml and Haskell to provide convenient executable pitstops?

Today's task was to vibespile the Python into ocaml-html5rw, a pure OCaml HTML5 parser and serialiser that passes the browser test suite 100%.

Read full note... (1387 words)

# 15th Dec 2025 / agents, ai, aoah, llms, ocaml, web

AoAH Day 14: Debugging a Karakeep CLI against the live service / Dec 2025

With the Requests library under my belt, I finally got to what I actually need for myself: vibe coding OCaml library interfaces to my #selfhosted services that contain most of my data.

To start with, I use Karakeep across all my devices to bookmark things, and I'd like to be able to programmatically search through tags, for example by taking all outbound links from the blogs that I read and autosynching them with my remote service. Karakeep on the server side does some cool things like screenshot links and create local webarchives.

Unfortunately, Karakeep doesn't publish an OCaml interface. Fortunately, my new bestie Claude helped me build ocaml-karakeep without much input from me!

Read full note... (813 words)

# 14th Dec 2025 / agents, ai, aoah, llms, ocaml

AoAH Day 13: Heckling an OCaml HTTP client from 50 implementations in 10 languages / Dec 2025

Now that I had some prerequisite libraries, I turned my attention to having a batteries-included OCaml HTTP tool with features like request throttling and redirect loop detection. I've hacked on OCaml HTTP protocol libraries since 2011, but these higher level features weren't necessary in things like Docker's VPNKit. The problem with building one now is that there are loads of random quirks needed in real-world HTTP, which would take ages to figure out if I start from scratch.

Luckily, there's an entire ecology of HTTP clients built in other languages that I could use for inspiration as well! Today, I gathered fifty open-source HTTP clients from a variety of other language ecosystems, and agentically synthesised a specification across all of them into one OCaml client using Eio.

I'm not sure what the collective noun is for a group of HTTP clients, so I dubbed this whole process a 'heckle' of HTTP coding!

Read full note... (2006 words)

# 13th Dec 2025 / agents, ai, aoah, ecology, llms, ocaml

AoAH Day 12: Eio Connection pooling and event tracing / Dec 2025

After yesterday's library bonanza for HTTP cookie handling, I implemented a TCP/TLS connection pooling library. This is useful for an HTTP client as it provides the network-level mechanisms for keeping track of outgoing network connections by their DNS name. This allows for more flexible outgoing connection management without worrying about overloading remote endpoints.

For example, github.io has four A records:

> host github.io
github.io has address 185.199.110.153
github.io has address 185.199.109.153
github.io has address 185.199.108.153
github.io has address 185.199.111.153

With this new connection pooling library, my application should be able to connect to the github.io name and keep track of all the outgoing connections on the basis of it being called github.io and load balance the number of outgoing connections accordingly.
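The idea of accounting for connections by DNS name rather than by IP address can be sketched in a few lines. This is a hypothetical, stdlib-only illustration and not the library's actual interface: the `pool` type, `acquire` and `release` are invented names, and it only does the bookkeeping, with no sockets, TLS or Eio involved.

```ocaml
(* Hypothetical sketch: count outgoing connections per DNS name, with a
   cap per name, regardless of which A record each connection used. *)
type pool = { max_per_host : int; active : (string, int) Hashtbl.t }

let create ~max_per_host = { max_per_host; active = Hashtbl.create 16 }

(* Number of connection slots currently reserved for [host]. *)
let in_use pool host =
  Option.value (Hashtbl.find_opt pool.active host) ~default:0

(* Try to reserve a slot for [host]; returns false when at the cap. *)
let acquire pool host =
  let n = in_use pool host in
  if n >= pool.max_per_host then false
  else (Hashtbl.replace pool.active host (n + 1); true)

(* Give back a slot previously reserved with [acquire]. *)
let release pool host =
  let n = in_use pool host in
  if n > 0 then Hashtbl.replace pool.active host (n - 1)

let () =
  let pool = create ~max_per_host:2 in
  let a = acquire pool "github.io" in
  let b = acquire pool "github.io" in
  let c = acquire pool "github.io" in
  Printf.printf "%b %b %b\n" a b c
```

Keying on the name means all four addresses behind github.io share one budget, which is what makes it possible to avoid overloading a remote endpoint.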

In the interests of exploring something new, I also decided to add in visualisation support to figure out what the library is spending its time on. I decided to generate self-contained visualisations, inspired by Jon Ludlam rediscovering the joy of SVGs yesterday!

Read full note... (792 words)

# 12th Dec 2025 / agents, ai, aoah, llms, ocaml

AoAH Day 11: HTTP Cookies and vibing RFCs for breakfast / Dec 2025

I'm switching focus for a few days to build a complete HTTP(S) client to use in my literature downloader. This requires building a few support libraries before we build the full client, so I figured I'd dive into them over the next few days. First up is RFC 6265 HTTP Cookie support. There are some excellent existing cookie libraries already on opam, notably http-cookie and ocaml-cookie, but I wasn't sure what their coverage of the protocol is, and there's no Eio serialisation support.

So I thought I'd have a go at a different approach today using agentic coding: can we synthesise a complete HTTP Cookie implementation purely from the RFC 6265 prose itself, and then differentially compare this OCaml implementation against the others? In theory, running a single test suite across all three libraries might be a good way of discovering how to improve the existing implementations. In the long term, http-cookie is probably the upstream library I want to use, but I don't want to generate a giant diff against it today due to my ground rules of not disturbing other maintainers.

Read full note... (1631 words)

# 10th Dec 2025 / agents, ai, aoah, llms, ocaml, rfcs

AoAH Day 10: Building a TUI for Sortal using Mosaic / Dec 2025

After building a reasonably complete Sortal contacts manager and trying out OxCaml's Bonsai_term, I thought I'd have a second go at a terminal UI using a newly announced Mosaic library by Thibaut Mattio.

I first noticed this library when Thibaut presented his OCaml coding with AI talk at FunOCaml. It's quite different from Bonsai in that Mosaic uses OCaml's effects to provide a more direct-style API, and so seems worth experimenting with. So today's task is to port Sortal to use Mosaic and see what this terminal UI looks like!

Read full note... (646 words)

# 10th Dec 2025 / agents, ai, aoah, llms, ocaml

AoAH Day 9: Adding a Bonsai terminal UI to Sortal / Dec 2025

After building a reasonably complete Sortal contacts manager, I decided to try to do a proper job of a terminal user interface. The first option for a modern UI is something that Yaron Minsky announced last week: bonsai_term, which also gives me a chance to dip into the OxCaml ecosystem with my agentic hacking!

Read full note... (1032 words)

# 9th Dec 2025 / agents, ai, aoah, llms, ocaml, oxcaml

AoAH Day 8: Building a contacts CLI manager with Sortal / Dec 2025

I've been accumulating a lot of contacts that I use to write cross references on my website. This works by using Cmarkit to parse my custom Markdown, and spot entries like [@sadiqj] and convert those into a full reference like Sadiq Jaffer.
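The mention expansion itself is simple enough to sketch without a Markdown parser. The following is a hypothetical, stdlib-only illustration: it scans raw text instead of going through Cmarkit, and the function name `expand_mentions` and the association-list contact store are assumptions made for the example.

```ocaml
(* Hypothetical sketch: replace "[@handle]" with the full name found in
   [contacts]. Unknown handles and unterminated brackets are left as-is. *)
let expand_mentions (contacts : (string * string) list) (text : string) =
  let len = String.length text in
  let buf = Buffer.create len in
  let rec go i =
    if i >= len then ()
    else if i + 1 < len && text.[i] = '[' && text.[i + 1] = '@' then begin
      match String.index_from_opt text (i + 2) ']' with
      | Some close ->
        let handle = String.sub text (i + 2) (close - i - 2) in
        (match List.assoc_opt handle contacts with
         | Some name -> Buffer.add_string buf name; go (close + 1)
         | None -> Buffer.add_char buf '['; go (i + 1))
      | None -> Buffer.add_char buf '['; go (i + 1)
    end
    else (Buffer.add_char buf text.[i]; go (i + 1))
  in
  go 0;
  Buffer.contents buf

let () =
  let contacts = [ ("sadiqj", "Sadiq Jaffer") ] in
  print_endline (expand_mentions contacts "Thanks to [@sadiqj] for the help")
```

Working on the parsed Markdown tree, as the real setup does, avoids rewriting mentions that appear inside code spans or link targets, which this plain-text scan cannot tell apart.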

Today, I want to build a full CLI application that stores all my contacts as Yaml files in my home directory using XDG conventions, and gives me a simple search interface so I can quickly autocomplete these posts from my editor. I call this little application "Sortal".

Read full note... (915 words)

# 8th Dec 2025 / agents, ai, aoah, llms, ocaml

AoAH Day 7: Converting between JSON and Yaml with yamlt / Dec 2025

After the excitement of building an entire Yaml 1.2 parser yesterday, I began to put it to use. Since I've been steadily converting all my JSON parsers to use jsont codecs, it would be convenient if a single jsont codec definition could also convert that schema to Yaml. In theory, Yaml is a superset of JSON, except it isn't actually. But it's close enough that we should be able to build a yamlt library that can accept a jsont codec and spit out Yaml (or the reverse).

Read full note... (706 words)

# 7th Dec 2025 / agents, ai, aoah, llms, ocaml

AoAH Day 6: Getting a Yaml 1.2 implementation in pure OCaml / Dec 2025

I did the palate cleanser of Bytesrw-eio yesterday for a good reason. Back in 2017, I wrote the OCaml Yaml bindings that a lot of projects use in the OCaml ecosystem, and I'm having trouble maintaining it.

Since Yaml is a monstrously convoluted spec, I opted back then to bind to the C libyaml using ocaml-ctypes. This was a good decision a decade ago, but maintaining this has been a nightmare due to the complexity of vendoring the C library, dealing with security issues there, and exposing a reasonable OCaml interface. The ocaml-yaml implementation also doesn't pass the full Yaml test suite.

And the worst thing is, I cannot find the motivation to figure out how Yaml really works. It's the world's worst serialisation format, with lots of corner cases and memory blowups inherent in how it works. So I decided to dive in and see if I could build a pure OCaml Yaml 1.2 implementation using bytesrw and the source spec.

TL;DR: it worked. It actually seems to have come up with a reasonable, pure OCaml implementation that I'm now using! It needs more validation and external code review, but this has been on my TODO list for years now.

Read full note... (897 words)

# 6th Dec 2025 / agents, ai, aoah, llms, ocaml, opam

AoAH Day 5: Bytesrw Eio adapters and automating opam metadata / Dec 2025

After the Claude exertions of yesterday, I needed something easier to cool my laptop down. I wanted to learn how to use another new library from Daniel Bünzli called Bytesrw, which provides composable byte stream readers and writers. It supplies ways to serialise Bytesrw to Unix file descriptors, so I figured I'd add in an Eio library for this. Along the way though, I was generating a growing number of opam packages, so I also learnt how to use Claude Skills to automate my opam metadata on Tangled as well.

Read full note... (928 words)

# 5th Dec 2025 / agents, ai, aoah, llms, ocaml, opam

AoAH Day 4: Going recursive with Claudeio for Claude / Dec 2025

By this point, I've got three useful libraries and my use of Claude is getting better. So naturally I want to automate my invocations of the claude CLI, but I hit a roadblock: there are no OCaml SDK bindings! However, there appear to be SDKs in Python, Go and many others. So today will involve having a stab at generating Claude OCaml bindings using Eio, so I can use Claude to write more OCaml!

Read full note... (745 words)

# 4th Dec 2025 / agents, ai, aoah, llms, ocaml

AoAH Day 3: XDG filesystem paths using Eio capabilities / Dec 2025

By Day 3 of the Advent of Agentic Humps, I now have the confidence to build a slightly more complex library that uses Eio to implement the XDG Base Directory Specification with a twist: let's use Eio capabilities to sandbox XDG paths by default.
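For readers unfamiliar with the spec, the core lookup rule is small. Below is a minimal sketch of resolving a per-application config directory following the XDG Base Directory Specification. It is plain stdlib code and does not model Eio capabilities at all. The function name `config_dir`, its labelled arguments and the "sortal" application name in the usage are assumptions for illustration. The environment lookup is passed in as a function so that it can be swapped out or restricted.

```ocaml
(* Hypothetical helper: resolve the XDG config directory for [app].
   [getenv] is a parameter so callers control where values come from. *)
let config_dir ~(getenv : string -> string option) ~app : string option =
  let base =
    match getenv "XDG_CONFIG_HOME" with
    | Some dir when dir <> "" && not (Filename.is_relative dir) -> Some dir
    | _ ->
      (* The spec falls back to $HOME/.config when the variable is unset
         or empty, and says relative paths should be ignored. *)
      Option.map (fun home -> Filename.concat home ".config") (getenv "HOME")
  in
  Option.map (fun dir -> Filename.concat dir app) base

let () =
  match config_dir ~getenv:Sys.getenv_opt ~app:"sortal" with
  | Some dir -> print_endline dir
  | None -> print_endline "neither XDG_CONFIG_HOME nor HOME is usable"
```

A capability-based version would return a directory handle scoped to that path in place of a string, so that code holding it cannot wander elsewhere in the filesystem.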

Read full note... (1030 words)

# 3rd Dec 2025 / agents, ai, aoah, llms, ocaml

AoAH Day 2: Building an OCaml JSONFeed library / Dec 2025

Day 2 of the Advent of Agentic Humps dawns with building a slightly more complex library than before, via the JSONFeed specification that is a more modern version of Atom.

JSONFeed is a successor to Atom for website feeds that has a nice informal specification about how to parse it. However, it also has a growing number of extensions which also need to be implemented somehow, as well as some informal rules to map RSS/Atom to JSONFeed.

There is no existing OCaml implementation that I could find, and I need it to integrate my website with Rogue Scholar more easily for permanent DOIs.

Read full note... (991 words)

# 2nd Dec 2025 / agents, ai, aoah, llms, ocaml

AoAH Day 1: Building a Base32 Crockford library in OCaml / Dec 2025

Let's start day 1 of the Advent of Agentic Humps with a gentle introduction to agentic coding. Firstly, I've chosen to exclusively use Claude Code for this since it's CLI driven. I tried some of the other Copilot and Cursor IDEs, but I just couldn't adjust to how busy the displays were.

With Claude, my setup first involved a custom devcontainer using Docker on a Linux host, and my local Mac laptop. I coordinate both of these via Git repositories hosted up at Tangled with a self-hosted knot.

Read full note... (572 words)

# 1st Dec 2025 / agents, ai, aoah, llms, ocaml

What I learnt at ICFP/SPLASH 2025 about OCaml, Hazel and FP / Oct 2025 / DOI

This is part 5 of a series of posts about ICFP 2025.

In addition to giving a bunch of talks about Docker, post-POSIX and planetary computing, the greatest fun at a huge conference like ICFP and SPLASH is seeing talks given by my students (they grow up so fast!) and collaborators, and generally floating around random talks trying to decipher ancient Greek lambdas floating on a projector.

Read full note... (2017 words)

# 9th Oct 2025 / DOI: 10.59350/w1jvt-8qc58 / docker, functional, icfp, multicore, ocaml, oxcaml, programming

It's time to go post-POSIX at ICFP/SPLASH 2025 / Oct 2025 / DOI

This is part 4 of 5 of a series of posts about ICFP 2025.

After the excitement of presenting my Docker experience report, I went straight into giving a keynote talk at VMIL 2025. This talk bubbled up intrusive thoughts I've had over the past 25 years: every system I've worked on, ranging from Xen to Docker, seems to boil down to "make shared memory go fast".

I'd started to believe it was time for change in the way we approach IO about 12 years ago when I talked about weird IO behaviour to a packed audience at FOSDEM, and now I believe it's even more true in 2025.

So I made one key argument to the audience: it's time to accept that standards such as POSIX are now holding back the development of good language runtimes, and we need to embrace the diversity of highly concurrent, shared-memory interfaces. And unfortunately, there's no portable subset amongst these, and so this may require a rethink of our frontend language interfaces as well.

The leaning tower of operating system layers

Read full note... (852 words)

# 8th Oct 2025 / DOI: 10.59350/mch1m-8a030 / functional, icfp, iouring, ocaml, programming, tutorial

Jane Street and Docker on moving to OCaml 5 at ICFP/SPLASH 2025 / Oct 2025 / DOI

This is part 3 of 5 of a series of posts about ICFP 2025.

It's been about six years since we wrote the papers on parallelism and effects, and four years since we helped to release upstream OCaml 5.0 with multicore support, a mammoth effort that took up years of work for my OCaml Labs and Tarides crew. After the release came out, I focussed on building applications using OCaml 5 for my own work on planetary computing, for example on using the new features with the fledgling Eio library to get some experience with direct-style OCaml programming.

Meanwhile, big OCaml users have also been adapting their codebases to shift from OCaml 4 to 5. Jane Street have expanded their tools and compiler team and driven through their production switch to the multicore runtime, and Docker for Desktop is progressing with their switch to direct-style code via Eio for hundreds of millions of users! Read on to learn more...

Read full note... (1839 words)

# 7th Oct 2025 / DOI: 10.59350/3jkaq-d3398 / docker, icfp, multicore, ocaml, oxcaml, programming

Holding an OxCaml tutorial at ICFP/SPLASH 2025 / Oct 2025 / DOI

This is part 2 of 5 of a series of posts about ICFP 2025.

Several extensions to "oxidize" OCaml (Rust performance with ML ergonomics!) have been developing rapidly in a fork called OxCaml. I helped an intrepid crew from Jane Street, IIT-M, Tarides, Brown and Cambridge pull together a really fun tutorial at ICFP 2025 that you can try out too! TL;DR: Work through the online slides, try the activities, and take the quiz to give us feedback.

Just click on the tutorial repo to get an online environment

Read full note... (1485 words)

# 6th Oct 2025 / DOI: 10.59350/55bc5-x4p75 / icfp, ocaml, oxcaml, programming, tutorial