AoAH Day 18: TOML 1.1 codecs directly from the spec and paper / Dec 2025

After getting my email interfaces automated yesterday, I turned my attention to Zulip integration. But first, I took a segue into TOML, a format that it requires. I noticed that TOML 1.1.0 was released today, and so I built ocaml-tomlt.

What I wanted to explore with this library is whether I could use a coding agent to build a complex functional abstraction from scratch. After building yamlrw and yamlt, I settled on the technique Daniel Bünzli developed with jsont in his paper.

Daniel wrote a nice paper about the combinator magic behind jsont

Why TOML instead of Yaml or JSON?

TOML has become a popular configuration format for other language ecosystems like Rust and Python. Unlike Yaml 1.2, TOML is actually a reasonable human-editable format without the terrifying corner cases and denial of service traps hidden in Yaml.

Since TOML 1.1 was only released today, there are no existing OCaml libraries that fully support it. In addition, I need one that is pure OCaml with no C dependencies (like yamlrw) and that uses Bytesrw for streaming I/O, so that it composes well with my other libraries from this month's coding.

The data soup paper

The implementation of tomlt was prompted by Daniel's paper "An Alphabet for Your Data Soups", which accompanies his jsont library. Working with untyped data formats like TOML in strongly-typed languages like OCaml requires a lot of tedious dynamic marshalling, and I'd like to switch to conventional OCaml records or other static types as soon as possible.

Daniel's solution is to define a generalised algebraic datatype whose values represent bidirectional mappings between subsets of the wire format and my chosen OCaml types. Waaay back in 2010 when Thomas Gazagnaire and I worked on camlp4-based serialisation, we converted into a generic intermediate representation for OCaml types and values. More recently Jeremy Yallop has been working on MacoCaml which performs this transformation at compile time via hygienic macros.

Unlike any of these approaches, the functional pearl Daniel came up with allows the programmer to define direct functional transformations that work in both directions. It's a bit more work at runtime and so a bit slower, but in return you get excellent error messages for malformed messages. The core Toml type therefore becomes:

(* A codec encapsulates both decoding and encoding *)
type 'a t = {
  kind : string;
  doc : string;
  dec : Toml.t -> ('a, codec_error) result;
  enc : 'a -> Toml.t;
}

This means you write your schema once and get both directions for free, and user functions can be placed at every coding step to allow the programmer to interpose custom functionality such as transformation or validation.
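To make the "user functions at every coding step" idea concrete, here is a minimal standalone sketch using only the OCaml standard library. The toy `toml` type, the `string`-typed errors and the `map` combinator are assumptions for illustration and are not tomlt's actual definitions; the point is only that a validated codec can be derived from a base one while keeping both directions.

```ocaml
(* Toy TOML value type standing in for the library's Toml.t (an assumption). *)
type toml =
  | TString of string
  | TInt of int64
  | TBool of bool
  | TTable of (string * toml) list

(* Same shape as the record above, with plain string errors for brevity. *)
type 'a codec = {
  kind : string;
  dec : toml -> ('a, string) result;
  enc : 'a -> toml;
}

let int : int codec = {
  kind = "integer";
  dec = (function TInt i -> Ok (Int64.to_int i) | _ -> Error "expected integer");
  enc = (fun i -> TInt (Int64.of_int i));
}

(* Hypothetical combinator: interpose user functions on both directions.
   The decoding function may reject values, which gives validation. *)
let map ~kind ~(dec : 'a -> ('b, string) result) ~(enc : 'b -> 'a)
    (c : 'a codec) : 'b codec = {
  kind;
  dec = (fun v -> Result.bind (c.dec v) dec);
  enc = (fun b -> c.enc (enc b));
}

(* A validated TCP port derived from the int codec. *)
let port : int codec =
  map int ~kind:"port"
    ~dec:(fun p ->
      if p > 0 && p < 65536 then Ok p
      else Error (Printf.sprintf "port %d out of range" p))
    ~enc:Fun.id
```

With this sketch, decoding `TInt 70000L` through `port` fails with a range error, while encoding goes straight back through the underlying `int` codec.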

Using tomlt in practice

A Toml config file might look something like this:

[server]
  host = "localhost"
  port = 8080

[database]
  connection_max = 5000

Here's what using tomlt to parse this looks like in practice:

type config = { host : string; port : int; debug : bool }

let config_codec =
  Tomlt.(Table.(
    obj (fun host port debug -> { host; port; debug })
    |> mem "host" string ~enc:(fun c -> c.host)
    |> mem "port" int ~enc:(fun c -> c.port)
    |> mem "debug" bool ~enc:(fun c -> c.debug) ~dec_absent:false
    |> finish
  ))

let () =
  match Tomlt.decode_string config_codec {|
    host = "localhost"
    port = 8080
  |} with
  | Ok config -> Printf.printf "Host: %s\n" config.host
  | Error e -> prerr_endline (Tomlt.Toml.Error.to_string e)
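For readers wondering how an `obj |> mem |> finish` pipeline can typecheck at all, here is a rough standalone sketch of one way to build it with the standard library. It is not how tomlt or jsont implement it (jsont uses a GADT); the `obj_map` record and the string errors are assumptions. The trick is that the constructor's remaining function type shrinks by one argument with every `mem`.

```ocaml
(* Toy TOML value type and codec record (assumptions, not tomlt's own). *)
type toml =
  | TString of string
  | TInt of int64
  | TBool of bool
  | TTable of (string * toml) list

type 'a codec = {
  kind : string;
  dec : toml -> ('a, string) result;
  enc : 'a -> toml;
}

let string = { kind = "string"; enc = (fun s -> TString s);
               dec = (function TString s -> Ok s | _ -> Error "expected string") }
let int = { kind = "integer"; enc = (fun i -> TInt (Int64.of_int i));
            dec = (function TInt i -> Ok (Int64.to_int i) | _ -> Error "expected integer") }
let bool = { kind = "boolean"; enc = (fun b -> TBool b);
             dec = (function TBool b -> Ok b | _ -> Error "expected boolean") }

(* 'o is the final record type; 'dec is what is still left to apply. *)
type ('o, 'dec) obj_map = {
  dec_fields : (string * toml) list -> ('dec, string) result;
  enc_fields : 'o -> (string * toml) list;
}

let obj (f : 'dec) : ('o, 'dec) obj_map =
  { dec_fields = (fun _ -> Ok f); enc_fields = (fun _ -> []) }

(* Decode one member and feed it to the constructor; encode by projection. *)
let mem ?dec_absent name (c : 'a codec) ~(enc : 'o -> 'a)
    (m : ('o, 'a -> 'b) obj_map) : ('o, 'b) obj_map =
  let dec_member fields =
    match List.assoc_opt name fields, dec_absent with
    | Some v, _ -> c.dec v
    | None, Some default -> Ok default
    | None, None -> Error ("missing member " ^ name)
  in
  { dec_fields = (fun fields ->
      Result.bind (m.dec_fields fields) (fun f ->
          Result.map f (dec_member fields)));
    enc_fields = (fun o -> m.enc_fields o @ [ (name, c.enc (enc o)) ]) }

(* Once every argument has been supplied, 'dec equals 'o. *)
let finish (m : ('o, 'o) obj_map) : 'o codec = {
  kind = "table";
  dec = (function TTable fs -> m.dec_fields fs | _ -> Error "expected table");
  enc = (fun o -> TTable (m.enc_fields o));
}

type config = { host : string; port : int; debug : bool }

let config_codec =
  obj (fun host port debug -> { host; port; debug })
  |> mem "host" string ~enc:(fun c -> c.host)
  |> mem "port" int ~enc:(fun c -> c.port)
  |> mem "debug" bool ~enc:(fun c -> c.debug) ~dec_absent:false
  |> finish
```

The sketch stops at the first decoding error and drops all location information, which is exactly the part a real library spends its effort on.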

The functional pattern is almost identical to the yamlt or jsont codecs I've been building. You don't have to define a codec, as tomlt also provides custom index operators to navigate tables directly:

let config = Toml.of_string {|
  [server]
  host = "localhost"
  port = 8080

  [database]
  connection_max = 5000
|} in
(* Navigate nested tables with .%{} *)
let host = Toml.(config.%{["server"; "host"]} |> to_string) in
let port = Toml.(config.%{["server"; "port"]} |> to_int) in
Printf.printf "Server: %s:%Ld\n" host port;

(* Update values *)
let config' = Toml.(config.%{["database"; "enabled"]} <- bool true) in
print_endline (Toml.to_string config')

The syntax is a little verbose due to the module opening, but it's still a pretty nice way to poke around TOML files interactively!
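Custom index operators are ordinary OCaml (available since 4.06), so the navigation style is easy to reproduce. The following is a small sketch over a toy value type, not tomlt's implementation: the `toml` type, `get` and `set` are hypothetical names, and the update is functional, returning a new value rather than mutating in place.

```ocaml
(* Toy TOML value type (an assumption, standing in for the library's). *)
type toml =
  | TString of string
  | TInt of int64
  | TBool of bool
  | TTable of (string * toml) list

(* Follow a path of keys; raises Not_found if a key is missing. *)
let rec get (t : toml) (path : string list) : toml =
  match path, t with
  | [], _ -> t
  | key :: rest, TTable fields -> get (List.assoc key fields) rest
  | _ :: _, _ -> raise Not_found

(* Return a copy of [t] with [v] at [path], creating tables as needed. *)
let rec set (t : toml) (path : string list) (v : toml) : toml =
  match path, t with
  | [], _ -> v
  | key :: rest, TTable fields ->
      let child =
        Option.value (List.assoc_opt key fields) ~default:(TTable []) in
      let updated = set child rest v in
      if List.mem_assoc key fields then
        TTable (List.map
                  (fun (k, x) -> if k = key then (k, updated) else (k, x))
                  fields)
      else TTable (fields @ [ (key, updated) ])
  | _ :: _, _ -> invalid_arg "set: path goes through a non-table value"

(* Bind the helpers to index-operator syntax. *)
let ( .%{} ) = get
let ( .%{}<- ) = set

let config =
  TTable [ ("server", TTable [ ("host", TString "localhost");
                               ("port", TInt 8080L) ]) ]

let config' = config.%{["database"; "enabled"]} <- TBool true
```

Because the path is a list literal, the semicolons sit inside the square brackets and the operator still receives a single index argument.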

Datetime handling

Another area where TOML differs from other formats is that it has four distinct datetime formats: offset datetimes, local datetimes, local dates, and local times. tomlt tries to unify this a little via a single codec that normalises everything to Ptime.t, but allows the codec to supply sensible defaults (e.g. for a missing timezone, or a missing date).

(* All of these decode to Ptime.t with sensible defaults *)
(* when = 2024-01-15T10:30:00Z       -> offset datetime *)
(* when = 2024-01-15T10:30:00        -> local datetime *)
(* when = 2024-01-15                 -> date at midnight *)
(* when = 10:30:00                   -> time on today's date *)

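(* "when" is an OCaml keyword, hence the trailing underscore *)
type event = { name : string; when_ : Ptime.t }

let event_codec = Tomlt.(Table.(
  obj (fun name when_ -> { name; when_ })
  |> mem "name" string ~enc:(fun e -> e.name)
  |> mem "when" (ptime ()) ~enc:(fun e -> e.when_)
  |> finish
))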

For applications that need to preserve the exact format, there's also a ptime_full function which returns a polymorphic variant indicating precisely what was present in the source config file.
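The two views (exact form versus normalised timestamp) can be sketched without any dependencies. The variant tags, the tuple representation and the `normalise` function below are hypothetical and chosen only for illustration; they are not the constructors that ptime_full actually returns, and a real implementation would produce a Ptime.t rather than a tuple.

```ocaml
type date = int * int * int   (* year, month, day *)
type time = int * int * int   (* hour, minute, second *)

(* One tag per TOML datetime form (tag names are assumptions). *)
type datetime =
  [ `Offset_datetime of date * time * int   (* offset from UTC in seconds *)
  | `Local_datetime of date * time
  | `Local_date of date
  | `Local_time of time ]

(* Fill in whatever the source omitted, using caller-supplied defaults:
   a missing offset becomes [tz_offset_s], a missing time becomes midnight,
   and a missing date becomes [today]. *)
let normalise ?(tz_offset_s = 0) ~(today : date) (dt : datetime)
    : date * time * int =
  match dt with
  | `Offset_datetime (d, t, off) -> (d, t, off)
  | `Local_datetime (d, t) -> (d, t, tz_offset_s)
  | `Local_date d -> (d, (0, 0, 0), tz_offset_s)
  | `Local_time t -> (today, t, tz_offset_s)
```

Keeping the variant around until the last moment means an encoder can write back the same form it read, while application code only ever sees the normalised triple.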

Testing

The secret to vibing seems to be having a specification oracle to guide the agent, and TOML has a toml-test suite that's perfect for this purpose:

toml-test is a language-agnostic test suite to verify the correctness of TOML parsers and writers.

Tests are divided into two groups: "invalid" and "valid". Decoders or encoders that reject "invalid" tests pass the tests, and decoders that accept "valid" tests and output precisely what is expected pass the tests. The output format is JSON, described below. -- Toml-test GitHub, 2021
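The valid/invalid split maps onto a very small harness. This is a simplified sketch rather than the real toml-test runner: it only checks whether a decoder accepts or rejects each input, whereas the actual suite also compares the JSON output of valid cases. `toy_decode` is a deliberately naive stand-in decoder, and all names here are hypothetical.

```ocaml
type expectation = Valid | Invalid
type outcome = Pass | Fail of string

(* A case passes when the decoder's verdict matches the expectation. *)
let run_case (decode : string -> ('a, string) result)
    (expect : expectation) (input : string) : outcome =
  match expect, decode input with
  | Valid, Ok _ -> Pass
  | Invalid, Error _ -> Pass
  | Valid, Error msg -> Fail ("rejected valid input: " ^ msg)
  | Invalid, Ok _ -> Fail "accepted invalid input"

(* Fraction of cases that pass, the number an agent can iterate against. *)
let pass_rate decode cases =
  let passed =
    List.filter (fun (expect, input) -> run_case decode expect input = Pass)
      cases
  in
  float_of_int (List.length passed) /. float_of_int (List.length cases)

(* Naive stand-in decoder: accepts anything containing an '='. *)
let toy_decode (s : string) : (string, string) result =
  if String.contains s '=' then Ok s else Error "no key/value pair"
```

A single pass-rate number plus the list of failing cases gives the agent an unambiguous target on every iteration.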

The Claude coding agent iterated overnight to reach a 100% pass rate on the third-party tests

Reflections

After building yamlrw, yamlt, and now tomlt, I'm convinced that the bidirectional codec pattern is a good approach for agentic OCaml programming. It's a little verbose to express by hand, which leads down the ppx route for most. But with agentic generation and oracle specification testing, the coding agent was particularly helpful with both figuring out the TOML grammar and exposing all the variations of codecs required for parsing all those datetime variants.

Having the TOML 1.1 specification as context and my earlier Claude OCaml RFC skill helped a lot as well, to allow the ocamldoc to be cross referenced. And of course, the key design insights at the heart of the library came from Daniel Bünzli publishing jsont and also uploading his paper. This Tomlt library is a generative clone of his ideas, but a useful one to my personal workflows this advent!

Tomorrow in Day 19, I'll continue with my original goal of getting a Zulip bot working!

# 18th Dec 2025 / agents, ai, aoah, functional, llms, ocaml
