AoAH Day 6: Getting a Yaml 1.2 implementation in pure OCaml / Dec 2025
I did the palate cleanser of
Since Yaml is a monstrously convoluted spec, I opted back then to bind to the C libyaml using
And the worst thing is, I cannot find the motivation to figure out how Yaml really works. It's the world's worst serialisation format, with lots of corner cases and memory blowups inherent in how it works. So I decided to dive in and see if I could build a pure OCaml Yaml 1.2 implementation using bytesrw and the source spec.
TL;DR: it worked. It actually seems to have come up with a reasonable, pure OCaml implementation that I'm now using! It needs more validation and external code review, but this has been on my TODO list for years now.
Approach
As with previous projects, I carefully set up the source directory using all the previous libraries to act as style guides, along with the source code to the key dependency of bytesrw and the associated
In a slight twist, I also instructed the agent to look at the Git history for ocaml-yaml, since it holds a decade of bug reports about incorrect interpretations of Yaml, including one from Martin Jambon that I haven't gotten around to looking at yet for the main library. There have also been frequent requests for a pure OCaml version to make cross-compilation to iOS easier, as well as compilation on OpenBSD (which fails due to the vendoring of the C library), and of course reports of C memory leaks and spec violations. All of these would disappear with a spec-compliant implementation, so I fed the agent these user bug reports to use as regression examples.

Tests
The key to the development loop succeeding was integrating an external test suite. Yaml has the yaml-test-suite, which has thousands of little examples of valid and invalid Yaml, along with the expected outputs in both JSON and Yaml. For example, test 36F6 takes this input yaml:
plain: a
 b

 c
which is labelled "Multiline plain scalar with empty line", and expects the following parser events in a custom DSL exposed by the test suite:
+STR
+DOC
+MAP
=VAL :plain
=VAL :a b\nc
-MAP
-DOC
-STR
So I instructed the agent to also build up a test DSL that could emit events in a format compatible with the test suite. A simple custom loader and JSON converter then produce output in the exact format required by the checked-in test suite files.
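To make the event format concrete, here is a minimal sketch of what such a printer could look like. The `event` type and function names are made up for illustration and are not Yamlrw's actual API; it only covers plain scalars and leaves out anchors, tags and the quoted and block scalar styles.

```ocaml
(* Hypothetical event type for illustration only; not Yamlrw's real API. *)
type event =
  | Stream_start
  | Stream_end
  | Document_start of { explicit : bool }
  | Document_end of { explicit : bool }
  | Mapping_start
  | Mapping_end
  | Sequence_start
  | Sequence_end
  | Scalar of string (* plain scalars only in this sketch *)
  | Alias of string

(* The event notation keeps each event on one line, so control
   characters inside scalar values are written as backslash escapes. *)
let escape s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '\\' -> Buffer.add_string b "\\\\"
      | '\n' -> Buffer.add_string b "\\n"
      | '\t' -> Buffer.add_string b "\\t"
      | '\r' -> Buffer.add_string b "\\r"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* One event becomes one line of the test-suite notation. *)
let to_line = function
  | Stream_start -> "+STR"
  | Stream_end -> "-STR"
  | Document_start { explicit } -> if explicit then "+DOC ---" else "+DOC"
  | Document_end { explicit } -> if explicit then "-DOC ..." else "-DOC"
  | Mapping_start -> "+MAP"
  | Mapping_end -> "-MAP"
  | Sequence_start -> "+SEQ"
  | Sequence_end -> "-SEQ"
  | Scalar v -> "=VAL :" ^ escape v
  | Alias name -> "=ALI *" ^ name

let render events = String.concat "\n" (List.map to_line events)
```

Rendering the event stream of a parsed document then gives text that can be compared line by line against the expected events stored with each test case.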
I also added in some of the pathological tests from my original OCaml yaml, including an implementation of the Yaml bomb:
a: &a ["lol","lol","lol","lol","lol","lol","lol","lol","lol"]
b: &b [*a,*a,*a,*a,*a,*a,*a,*a,*a]
c: &c [*b,*b,*b,*b,*b,*b,*b,*b,*b]
d: &d [*c,*c,*c,*c,*c,*c,*c,*c,*c]
e: &e [*d,*d,*d,*d,*d,*d,*d,*d,*d]
f: &f [*e,*e,*e,*e,*e,*e,*e,*e,*e]
g: &g [*f,*f,*f,*f,*f,*f,*f,*f,*f]
h: &h [*g,*g,*g,*g,*g,*g,*g,*g,*g]
i: &i [*h,*h,*h,*h,*h,*h,*h,*h,*h]
This simple Yaml file exponentially allocates lols into a billion laughs, since every level references the previous one nine times. So I prompted the agent to also add in depth tracking to terminate parsing once a configurable number of nodes or nesting depth has been exceeded.
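The idea behind that guard can be sketched in a few lines. This is a toy model under my own assumptions, not the code in Yamlrw: the `node` and `value` types, the `expand` function and its `max_nodes` and `max_depth` defaults are all hypothetical names chosen for the example.

```ocaml
(* Hypothetical, simplified document model; not Yamlrw's real types. *)
type node =
  | Scalar of string
  | Seq of node list
  | Alias of string (* a *name reference to an anchored node *)

type value = Str of string | Items of value list

exception Limit_exceeded of string

(* Resolve aliases against [anchors], giving up once more than
   [max_nodes] nodes have been visited or nesting exceeds [max_depth]. *)
let expand ?(max_nodes = 10_000) ?(max_depth = 64) anchors root =
  let visited = ref 0 in
  let rec go depth node =
    if depth > max_depth then raise (Limit_exceeded "depth");
    incr visited;
    if !visited > max_nodes then raise (Limit_exceeded "nodes");
    match node with
    | Scalar s -> Str s
    | Seq items -> Items (List.map (go (depth + 1)) items)
    | Alias name ->
      (match List.assoc_opt name anchors with
       | Some target -> go depth target
       | None -> failwith ("undefined alias: " ^ name))
  in
  go 0 root

(* Build a bomb like the one above: anchor "a" holds [width] scalars and
   every later anchor holds [width] aliases to the previous anchor. *)
let bomb levels width =
  let name i = String.make 1 (Char.chr (Char.code 'a' + i)) in
  let anchors =
    List.init levels (fun i ->
      let item = if i = 0 then Scalar "lol" else Alias (name (i - 1)) in
      (name i, Seq (List.init width (fun _ -> item))))
  in
  (anchors, Alias (name (levels - 1)))
```

With this sketch, expanding `bomb 9 9` raises `Limit_exceeded` after a bounded amount of work instead of trying to materialise hundreds of millions of strings, while small documents expand normally. Counting alias dereferences as visited nodes also stops a chain of aliases that point back at each other.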
Results
With the test structure set up, it was plain sailing. I prompted the agent to maintain a strict separation between a pure Yaml parser (with a single dependency on bytesrw), and then have Unix and Eio converters built on the bytesrw Unix reader, or the bytesrw-eio one I coded up yesterday.
As a useful aid to debugging, I also prompted the test suite to output nice HTML, which you can browse here. It's convenient to have a rendered version of the entire test suite!
I also coded up a quick core-bench library to differentially test both the original Yaml library and this new one on the full yaml test suite, and the pure OCaml one seems around 20% faster. I'm going to do a bit more memory benchmarking in addition to performance before I'm confident in these results, but the vibe-coded smoke test reassured me that the new library isn't dramatically slower. Memory usage could still turn out to be high, though; something to look at on a future day.
Reflections
This was the first day of the advent adventure where I really felt like I'd hit a breakthrough! Yamlrw is a drop-in replacement for all of my uses of ocaml-yaml now, and I can't find any regressions. After a bit more code review, I'm going to post on the OCaml forums to ask existing users to test this one and see if they can find any regressions.
It's also nice having a streaming Eio Yaml parser, which will be convenient
for some projects in the remaining days, like my contacts manager. But first,
in