AoAH Day 21: Complete dynamic HTML5 validation in OCaml and the browser / Dec 2025

With language detection now working in OCaml, I completed vibespiling the Nu HTML Validator from Java to OCaml. This is the official W3C validator used to check HTML5 conformance, and it's a substantial codebase with thousands of validation rules. I set out to see what a few days of agentic processing would do to transform the complex Java codebase into a more functionally structured pure OCaml codebase.

The result is a pure OCaml HTML5 conformance checker that integrates with the parser I built last week, all published as ocaml-html5rw. Having the logic in pure OCaml meant that I could also compile it into standalone JavaScript and WASM. Dynamic conformance checking works even better than server-side filtering, since live JavaScript executing on the page (and modifying the DOM) can also be checked. I published this to NPM using a new Claude skill, and coded a panel overlay for live-debugging HTML5 issues, which I now use on my own website.

My conformance checker now runs the OCaml straight in the browser on my dev website and highlights errors along with explanations.

Full HTML5 validation in OCaml/JavaScript/WASM

I'm going to talk about my results in reverse today, since I thought the outcomes were so unexpectedly useful. I took yesterday's session and asked Claude to build an ocaml-to-npm skill, which I used to publish the html5rw JavaScript and WASM to NPM.

This runs the HTML5 validation OCaml code by serialising the live DOM tree and then collecting the various validation errors along with the source. This is sufficient to populate an overlay panel that not only lists the errors, but also highlights the offending DOM node with a red box. This spotted lots of dynamic errors in my website!

Publishing on NPM is quite convenient as there are several CDNs that serve the JavaScript directly. I integrate this into the development version of my blog as simply as:

<script src="https://cdn.jsdelivr.net/npm/html5rw-jsoo@1.0.0/htmlrw.js"></script>
<script>
function validateWithPanel() {
  const result = html5rw.validateAndShowPanel(document.documentElement, {
    // Annotation options
    annotation: {
      addDataAttrs: true,
      addClasses: true,
      showTooltips: true,
      tooltipPosition: 'auto',
      highlightOnHover: true
    },
    // Panel options
    panel: {
      initialPosition: 'topRight',
      draggable: true,
      collapsible: true,
      groupBySeverity: true,
      clickToHighlight: true,
      showSelectorPath: true,
      theme: 'auto'
    }
  });
  return result;
}
</script>

I'll probably wrap this in a web component in the future, as Arthur Wendling did with x-ocaml, but for now this is already useful on my own website. If anyone has any pointers for what the right CSS patterns are for adding these debug overlay panels to websites with minimal intrusion, I'd be grateful. I'm extremely unfamiliar with how modern frontend programming works...

Did you know that you're not really supposed to have more than one h1 tag? Neither did I...

And of course, if you do prefer to stick to the server-side, then you get fast native code OCaml via a command-line binary provided by the package:

$ dune exec -- html5check test.html
test.html:126.73: error [no-p-element-in-scope]: No “p” element in scope but a
“p” end tag seen.
test.html:113.72: error [missing-alt]: An “img” element must have an “alt”
attribute, except under certain conditions. For details, consult guidance on
providing text alternatives for images.
test.html:120.27: error [duplicate-id]: Duplicate ID “duplicate-id”.
test.html:123.36: error [disallowed-child]: Element “div” not allowed as child
of element “span” in this context. (Suppressing further errors from this
subtree.)
test.html:152.8: info [multiple-h1]: Consider using only one “h1” element per
document (or, if using “h1” elements multiple times is required, consider using
the “headingoffset” attribute to indicate that these “h1” elements are not all
top-level headings).

A few days of vibespiling

The reason this took a few days of background vibespiling comes down to the sheer size of the problem. The Nu Validator is a mature Java application built around Java's SAX event model, which I last used in 2000 when I worked on Chello's website. Looking through the validator's Java code brought back "fond" memories of building factories of factory makers. In the Nu Validator, there are lots of rule checkers that iterate through an HTML5 parse tree and extend a base Checker class:

public final class TableChecker extends Checker {
  private Table current;
  private final LinkedList<Table> stack = new LinkedList<>();

  @Override
  public void startElement(String uri, String localName,
                           String qName, Attributes atts)
      throws SAXException {
    if ("http://www.w3.org/1999/xhtml".equals(uri)) {
      if ("table".equals(localName)) {
        push();
      } else if (current != null) {
        if ("td".equals(localName) || "th".equals(localName)) {
          current.cell(atts, localName);
        }
        // ... more element handling
      }
    }
  }
}

Since the number of rules was massive, a single run of the agent wasn't enough. Instead, I knocked up a Claudeio wrapper that ran the agent iteratively, exhorting it to continually sample the rules and iterate on a good architecture. This was only possible because the validator has a massive test suite with expected outputs. The goal of the agent was therefore to try OCaml code architectures that maximised the number of passing rules, and then to combine them to reach a 100% pass rate.

After a few days, this converged and hit a 100% pass rate with a bit of human prompt massaging from me. The OCaml version replaces inheritance with first-class modules, and each checker implements the Checker.S signature:

module type S = sig
  type state
  val create : unit -> state
  val reset : state -> unit
  val start_element : state -> element:Element.t -> Message_collector.t -> unit
  val end_element : state -> tag:Tag.element_tag -> Message_collector.t -> unit
  val characters : state -> string -> Message_collector.t -> unit
  val end_document : state -> Message_collector.t -> unit
end

This gives us the same flexibility to compose checkers, but with abstract state types rather than hidden mutable fields scattered across a class hierarchy.
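To make the composition point concrete, here is a minimal sketch of two toy checkers run side by side. It assumes a cut-down version of the signature: Message_collector is a stand-in that just accumulates strings, elements are reduced to their tag name, and H1_checker, Img_checker, instantiate and run are hypothetical names for illustration rather than anything from html5rw.

```ocaml
(* Stand-in for the real collector: just accumulates message strings. *)
module Message_collector = struct
  type t = { mutable messages : string list }
  let create () = { messages = [] }
  let add t msg = t.messages <- msg :: t.messages
  let to_list t = List.rev t.messages
end

(* Cut-down checker signature: elements are reduced to their tag name. *)
module type S = sig
  type state
  val create : unit -> state
  val start_element : state -> name:string -> Message_collector.t -> unit
  val end_document : state -> Message_collector.t -> unit
end

(* Counts h1 elements and reports once at the end of the document. *)
module H1_checker : S = struct
  type state = { mutable h1_count : int }
  let create () = { h1_count = 0 }
  let start_element st ~name _mc =
    if name = "h1" then st.h1_count <- st.h1_count + 1
  let end_document st mc =
    if st.h1_count > 1 then Message_collector.add mc "multiple-h1"
end

(* Stateless toy checker that reports every img element it sees. *)
module Img_checker : S = struct
  type state = unit
  let create () = ()
  let start_element () ~name mc =
    if name = "img" then Message_collector.add mc "img-seen"
  let end_document () _mc = ()
end

(* A running checker: the abstract state is captured inside the closures,
   so checkers with different state types can live in one list. *)
type running = {
  on_start : name:string -> Message_collector.t -> unit;
  on_end : Message_collector.t -> unit;
}

let instantiate (module C : S) =
  let st = C.create () in
  { on_start = C.start_element st; on_end = C.end_document st }

(* Feed a stream of start tags to every checker, then finish the document. *)
let run checkers names =
  let mc = Message_collector.create () in
  let running = List.map instantiate checkers in
  List.iter (fun name -> List.iter (fun r -> r.on_start ~name mc) running) names;
  List.iter (fun r -> r.on_end mc) running;
  Message_collector.to_list mc

let messages =
  run [ (module H1_checker : S); (module Img_checker : S) ] [ "h1"; "img"; "h1" ]
```

Each checker's state never leaks outside its module; the driver only ever sees the closures, which is roughly the property the abstract state type buys over a class hierarchy.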

Browsing the giant HTML5 test suite

I extended the visual HTML test suite generator I built a few days ago to include the thousands of validation tests, and the library now outputs one epic HTML file that lists each of the thousands of tests.

Having these tests was essential when doing refactoring, as a small change in one checker affected others. Without the comprehensive test oracle, the agents would quickly diverge out of control.

It's quite fun just browsing around the expect tests to see what's going on with HTML5

A case study on the table checker

The table checker is one of the more complex validators, tracking cell spans, detecting overlaps, and validating that headers attributes reference valid th elements. The Java version spreads this across multiple files, but the OCaml version consolidates everything into a single module with explicit types.

type cell = {
  mutable left : int;
  mutable right : int;
  mutable bottom : int;
  headers : string list;
  element_name : string;
}

type row_group = {
  mutable current_row : int;
  mutable insertion_point : int;
  cells_in_effect : ((int * int), cell) Hashtbl.t;
  mutable cells_on_current_row : cell array;
  row_group_type : string option;
}

type table = {
  mutable state : table_state;
  mutable column_count : int;
  header_ids : (string, unit) Hashtbl.t;
  cells_with_headers : cell list ref;
  mutable current_row_group : row_group option;
  (* ... *)
}

However, the agent didn't fundamentally change the algorithmic structure; we still have the same basic state machine but with much more succinct variant types.
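As a rough illustration of the span tracking, the sketch below checks a newly placed cell against cells from earlier rows that still reach the current row through a rowspan. It is not the html5rw code: the cell record is trimmed down and immutable, and I am assuming half-open ranges, so right is the first column a cell does not occupy and bottom is the first row it does not occupy. The names columns_overlap and find_overlaps are hypothetical.

```ocaml
(* Trimmed-down cell; the ranges are ASSUMED to be half-open. *)
type cell = {
  left : int;    (* first column occupied *)
  right : int;   (* first column not occupied *)
  bottom : int;  (* first row not occupied *)
  element_name : string;
}

(* Two half-open column ranges intersect iff each starts before the other ends. *)
let columns_overlap a b = a.left < b.right && b.left < a.right

(* Cells from earlier rows that are still in effect on [row] and share
   at least one column with [new_cell]. *)
let find_overlaps ~row ~in_effect new_cell =
  in_effect
  |> List.filter (fun c -> c.bottom > row)
  |> List.filter (columns_overlap new_cell)

(* A cell in columns 0-1 spanning rows 0-2, and a cell landing in column 1. *)
let spanning = { left = 0; right = 2; bottom = 3; element_name = "td" }
let incoming = { left = 1; right = 2; bottom = 2; element_name = "td" }

let clash_on_row_1 = find_overlaps ~row:1 ~in_effect:[ spanning ] incoming
let clash_on_row_3 = find_overlaps ~row:3 ~in_effect:[ spanning ] incoming
```

On row 1 the spanning cell is still in effect, so the incoming cell collides with it; by row 3 it has expired and the same columns are free again.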

Typed error codes

One significant quality of life improvement came from refactoring how the error messages are collected for rendering. The Java code uses string formatting throughout to directly output messages, but the OCaml error_code.mli module defines a polymorphic variant hierarchy that's exhaustively checkable:

type table_error = [
  | `Cell_overlap
  | `Cell_spans_rowgroup
  | `Row_no_cells of [`Row of int]
  | `Column_no_cells of [`Column of int] * [`Elem of string]
]

type attr_error = [
  | `Not_allowed of [`Attr of string] * [`Elem of string]
  | `Missing of [`Elem of string] * [`Attr of string]
  | `Bad_value of [`Elem of string] * [`Attr of string] *
                  [`Value of string] * [`Reason of string]
  | `Duplicate_id of [`Id of string]
  (* ... *)
]

This allows clients to pattern match on specific classes of errors easily:

match err with
| `Attr (`Duplicate_id (`Id id)) -> handle_duplicate id
| `Img `Missing_alt -> suggest_alt_text ()
| `Table `Cell_overlap -> report_overlap ()
| _ -> default_handler err

This also means we can add new error categories without changing existing code, and the compiler tells us if we miss any cases. This is a pretty classic use case for OCaml that both Yaron Minsky and I have talked about (in Caml Trading and XenServer), but sometimes it's good to remember why I love OCaml so much!
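Here is a small self-contained sketch of how such a hierarchy can be rendered to text with the compiler checking coverage. The variants are a subset of the ones shown above; the top-level error type, the to_message function and the message wording are my own assumptions for illustration, not the contents of error_code.mli.

```ocaml
(* A subset of the table and attribute errors shown above. *)
type table_error = [
  | `Cell_overlap
  | `Row_no_cells of [`Row of int]
]

type attr_error = [
  | `Duplicate_id of [`Id of string]
  | `Missing of [`Elem of string] * [`Attr of string]
]

(* Hypothetical top-level wrapper grouping errors by category. *)
type error = [
  | `Table of table_error
  | `Attr of attr_error
]

(* No catch-all branch: if a new variant is added to any of the types
   above, the compiler warns that this match is no longer exhaustive. *)
let to_message : error -> string = function
  | `Table `Cell_overlap -> "Table cells overlap."
  | `Table (`Row_no_cells (`Row n)) ->
    Printf.sprintf "Row %d has no cells." n
  | `Attr (`Duplicate_id (`Id id)) ->
    Printf.sprintf "Duplicate ID '%s'." id
  | `Attr (`Missing (`Elem e, `Attr a)) ->
    Printf.sprintf "Element '%s' is missing attribute '%s'." e a

let example = to_message (`Attr (`Duplicate_id (`Id "nav")))
```

The trade-off is worth keeping in mind: a match that ends in a catch-all `_` branch, like the client example above, stays convenient but gives up the exhaustiveness warning for every case the wildcard absorbs.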

Reflections

While the generated OCaml code is by no means sparkling clean, it is already useful and operational. The typed error hierarchy is perhaps the biggest win, as it lifts the abstraction level to be more idiomatic OCaml, and may eventually make it easier to jump over to Haskell or Lean for even more purity and formal specification work. This is by far the biggest agentic coding translation I've attempted so far, to the point where it used up all my Claude 20x Max credits in a matter of days. I have two accounts now!

What also surprised me was how little the agent struggled with the architectural transformation across programming languages. Given examples of OCaml first-class modules (from Real World OCaml and the Jane Street OCaml code), it produced well-structured code.

Claude can introspect its session context to update a skill to match what it's learnt

I still have no idea how I'm going to maintain this code in the long term, but do let me know if the HTML5 checker is useful to you. One little Claude trick that remains handy is that after a prompting session, I prompt the agent to fix its own skill based on the feedback it's received during the session. That helps to generalise the skills as more projects start using them.

# 21st Dec 2025 / agents, ai, aoah, llms, ocaml, web
