AoAH Day 24: Tuatara, an evolving Atom aggregator that mutates / Dec 2025

My original purpose for starting this AoAH series was to build a feed aggregator for my group website, so I had to finish up with something to show!

I'm not sure if taking the longest way around was wise here, but I ended up building tuatara, an aggregator to pull together all my colleagues' writing into one place. They're a quirky bunch with many diverse homegrown feeds in various states of brokenness, so it's difficult to build a one-size-fits-all tool.

So given it's the end of the year and I'm sozzled on mulled wine on Christmas Eve, I decided to make Tuatara mutate its own code by linking with my Claudeio library to force it to evolve and modify itself as it runs across feed errors. Every deployment of Tuatara is meant to be slightly different.

Evolving code like it's 2026

The initial generation of the code was pretty straightforward, using SQLite to store a database with all the posts and importing metadata from my previously created Sortal contacts manager.

> tuatara import-sortal
Sortal Import Results:

  Total contacts scanned: 420
  Contacts with feeds: 15
  Feeds imported: 16
  Feeds skipped (already exist): 0

Run 'tuatara fetch' to download posts from the imported feeds.

But when I actually fetched the feeds, I rapidly realised that lots of parsing quirks were needed:

> tuatara fetch
Fetching Anil Madhavapeddy...
  340 posts (0 new)
Fetching David Allsopp...
  Not modified
Fetching Jessica Man...
  Not modified
Fetching Jon Ludlam...
  28 posts (0 new)
Fetching Jon Sterling...
  Not modified
Fetching Mark Elvers...
  Not modified
Fetching Martin Kleppmann...
  Error: Feed parse error: document MUST contains exactly one <feed> element at l.0 c.0
  URL: http://feeds.feedburner.com/martinkl
Fetching Onkar Gulati...
  Error: Not_found
  URL: https://onkargulati.com/feed.xml
Fetching Patrick Ferris...
  Error: Feed parse error: <entry> elements MUST contains at least an <author> element or <feed> element MUST contains one or more <author> elements at l.1460 c.7
  URL: http://patrick.sirref.org/weeklies/atom.xml
Fetching Richard Mortier...
  79 posts (79 new)
Fetching Ryan Gibb...
  38 posts (38 new)
Fetching Sadiq Jaffer...
  10 posts (10 new)

Total: 127 new posts (3 errors)

Either we skip content or talk to the people involved to fix their feeds, but it's Christmas Eve so that's unlikely. And anyway, we want to be liberal in what we accept, so why can't I fix my own software first?!

Like the amazing Tuatara, why don't we build evolution directly into our software? Time to take off the semantics seatbelt...

Medice, cura te ipsum

The non-obvious and probably-terrible answer here is to use our fancy coding models to force the Tuatara source code to heal itself. I added an --evolve flag to allow tuatara to invoke Claude/OCaml upon errors and patch its own code. Turns out self-help works, as Martin's blog feed was just fine!

Fetching Martin Kleppmann...
  Error: Feed parse error: document MUST contains exactly one <feed> element at l.0 c.0
  URL: http://feeds.feedburner.com/martinkl

Invoking Claude Code to fix parse error...

The feed from `http://feeds.feedburner.com/martinkl` is an **RSS 2.0 feed** (it
starts with `<rss version="2.0">`), but tuatara was incorrectly detecting it as
an **Atom feed**. This caused the Syndic Atom parser to fail with "document
MUST contains exactly one \<feed\> element".

The root cause was that the `detect_feed_type` function prioritized the HTTP
`Content-Type` header over the actual content. FeedBurner (and other feed
aggregators) often serve RSS feeds with an incorrect `application/atom+xml`
content-type header.

This is a generic fix that will work for any feed aggregator or CDN that
mis-labels RSS feeds as Atom (or vice versa), so no domain-specific quirk was
needed.
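
The patched function isn't shown here, but a content-first check might look something like the minimal OCaml sketch below. Only the name `detect_feed_type` comes from Claude's explanation above; the `feed_type` variant, the `contains` helper, the 512-byte sniffing window and the optional `content_type` argument are all assumptions made for illustration, not Tuatara's actual code.

```ocaml
(* Hypothetical sketch: sniff the feed type from the body first, and only
   fall back to the Content-Type header if the body is inconclusive. *)
type feed_type = Atom | Rss2 | Unknown

(* True if [sub] occurs somewhere in [s]. *)
let contains s sub =
  let n = String.length s and m = String.length sub in
  let rec go i = i + m <= n && (String.sub s i m = sub || go (i + 1)) in
  m = 0 || go 0

let detect_feed_type ?content_type body =
  (* Only inspect the start of the document, where the root element lives.
     The 512-byte window is an arbitrary assumption. *)
  let head =
    String.lowercase_ascii (String.sub body 0 (min 512 (String.length body)))
  in
  if contains head "<rss" then Rss2
  else if contains head "<feed" then Atom
  else
    (* The body told us nothing, so trust the header as a last resort. *)
    match Option.map String.lowercase_ascii content_type with
    | Some ct when contains ct "atom" -> Atom
    | Some ct when contains ct "rss" -> Rss2
    | _ -> Unknown
```

With this ordering, a body starting with `<rss version="2.0">` is classified as RSS 2.0 even when the server claims `application/atom+xml`, which is the FeedBurner situation described above.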

But the parsing drama continued, as Michael Dales uses the wrong date format in his feed (tsk tsk I'd send the RFC police out if it wasn't Christmas), but Tuatara evolves a quirk that gets past it:

The quirk module converts ISO 8601 dates (2025-10-22T12:24:00-00:00) to RFC 822 format (Wed, 22 Oct 2025 12:24:00 GMT) which is what Syndic's RSS2 parser expects.
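
To make that conversion concrete, here is a rough stdlib-only OCaml sketch of the same idea. It is not the evolved quirk module itself: the function name `rfc822_of_iso8601` is made up, and it assumes timestamps of the form YYYY-MM-DDThh:mm:ss followed by `Z` or a `+hh:mm`/`-hh:mm` offset, returning `None` for anything else.

```ocaml
(* Hypothetical helper: rewrite an ISO 8601 timestamp as an RFC 822 date. *)
let days = [| "Sun"; "Mon"; "Tue"; "Wed"; "Thu"; "Fri"; "Sat" |]

let months =
  [| "Jan"; "Feb"; "Mar"; "Apr"; "May"; "Jun";
     "Jul"; "Aug"; "Sep"; "Oct"; "Nov"; "Dec" |]

(* Sakamoto's day-of-week method: 0 = Sunday. *)
let weekday y m d =
  let t = [| 0; 3; 2; 5; 0; 3; 5; 1; 4; 6; 2; 4 |] in
  let y = if m < 3 then y - 1 else y in
  (y + (y / 4) - (y / 100) + (y / 400) + t.(m - 1) + d) mod 7

let rfc822_of_iso8601 s =
  try
    Scanf.sscanf s "%4d-%2d-%2dT%2d:%2d:%2d%s" (fun y mo d h mi sec tz ->
        if mo < 1 || mo > 12 || d < 1 || d > 31 then None
        else
          (* Zero offsets become GMT; "+hh:mm" becomes "+hhmm". *)
          let zone =
            match tz with
            | "Z" | "+00:00" | "-00:00" -> Some "GMT"
            | _
              when String.length tz = 6
                   && tz.[3] = ':'
                   && (tz.[0] = '+' || tz.[0] = '-') ->
                Some (String.sub tz 0 3 ^ String.sub tz 4 2)
            | _ -> None
          in
          Option.map
            (fun zone ->
              Printf.sprintf "%s, %02d %s %04d %02d:%02d:%02d %s"
                days.(weekday y mo d) d months.(mo - 1) y h mi sec zone)
            zone)
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None
```

Calling `rfc822_of_iso8601 "2025-10-22T12:24:00-00:00"` gives `Some "Wed, 22 Oct 2025 12:24:00 GMT"`, matching the example in the quirk description. The weekday has to be computed because ISO 8601 timestamps don't carry one.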

And Onkar Gulati and Patrick Ferris both have an empty author field which would ordinarily give us a dreaded Not_found exception:

Fetching Patrick Ferris...
  Error: Feed parse error: <entry> elements MUST contains at least an <author> element or <feed> element MUST contains one or more <author> elements at l.1460 c.7
  URL: http://patrick.sirref.org/weeklies/atom.xml

But never fear, the inexorable --evolve flag figures it out and patches its own code!
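
I haven't reproduced the generated patch, but one plausible shape for this kind of fix is a fallback chain for the author name. The OCaml sketch below is purely illustrative: the `entry` record, `non_empty` and `resolve_author` are hypothetical names, and falling back to the contact's name (on the assumption that it is available from the Sortal import) is my guess rather than what Tuatara actually evolved.

```ocaml
(* Hypothetical types: the real Tuatara and Syndic types will differ. *)
type entry = { title : string; author : string option }

(* Treat empty or whitespace-only author names as missing. *)
let non_empty = function
  | Some s when String.trim s <> "" -> Some (String.trim s)
  | _ -> None

(* Resolve an author without raising: try the entry, then the feed-level
   author, then fall back to the name of the contact who owns the feed. *)
let resolve_author ~contact_name ~feed_author entry =
  match (non_empty entry.author, non_empty feed_author) with
  | Some a, _ -> a
  | None, Some a -> a
  | None, None -> contact_name
```

Returning a plain string from a total function means the caller never has to handle a Not_found for a missing author.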

There were some non-trivial quirks as well; Andres Zuñiga-Gonzalez uses Quarto for his website, which puts the entire HTML blob into the summary field, but the evolution managed to use html5rw to parse its way out of this. This sort of fix is very hard to generalise, so it's actually quite useful for the tool to fix itself on demand for our small group.
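
For a feel of what recovering text from an HTML summary involves, here is a deliberately naive stdlib-only OCaml sketch. It is a stand-in, not the evolved fix: the real quirk uses html5rw, whose API I'm not reproducing here, and `strip_tags` is a hypothetical name.

```ocaml
(* Naive stand-in for a real HTML parser such as html5rw: drop anything
   between '<' and '>' and collapse runs of whitespace. It does not decode
   entities, skip <script> bodies, or cope with '>' inside attributes. *)
let strip_tags html =
  let buf = Buffer.create (String.length html) in
  let in_tag = ref false in
  String.iter
    (fun c ->
      match c with
      | '<' -> in_tag := true
      | '>' when !in_tag ->
          in_tag := false;
          (* A tag boundary becomes a space so adjacent blocks don't fuse;
             the cost is that inline tags inside a word split it. *)
          Buffer.add_char buf ' '
      | '\n' | '\t' | '\r' -> if not !in_tag then Buffer.add_char buf ' '
      | _ -> if not !in_tag then Buffer.add_char buf c)
    html;
  (* Collapse the runs of spaces left behind by tags and newlines. *)
  Buffer.contents buf
  |> String.split_on_char ' '
  |> List.filter (fun w -> w <> "")
  |> String.concat " "
```

The limitations listed in the comment are exactly why a proper HTML5 parser is the better tool for the real thing.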

Using the Claude frontend design

Then I needed a quick way to produce clean frontend output so I could visualise the JSON Feed. Claude has a built-in /plugin frontend-design skill, and prompting it to give me a few designs let me integrate a --html output.

And because it's Christmas, I added some snowflakes as well. Yay!

Ho ho ho, merry xmas everyone from the EEG feed that isn't live yet but will be after the new year

Reflections

The paper I enjoyed writing the most this year was Steps towards an Ecology for the Internet for Aarhus 2025. In the back of my head since has been a desire to start figuring out what self-evolving software actually might be. It's a strange, and probably impractical idea, but I'm delighted that I took a tiny step towards it with this project.

Back in March, I had the honour of being invited to a Bellairs meeting to discuss a heady combination of semantics and computational science. Jon Sterling demonstrated his wonderfully organised Forester website. And I... showed how my mishmash of semi-structured writings can kind of be connected together in a vaguely coherent way to build my website. Next year will have me thinking much harder about the implications of self-evolving code, of how radically transformative to global biodiversity semi-structured agentic processing might be, and other heavy matters. But to close this year, I'm disproportionately pleased to have gotten my tiny website under control a little!

Sitting indoors in Barbados with a gigantic beach outside: a classic sign of semanticists in the wild

As I noted in my letter to the ACM, it's important that we can use AI for things that boost the human condition; I really enjoy reading my colleagues' long form thoughts much more than doomscrolling on the web, and so making it easier to gather their thoughts digestibly and easily is a nice end to my agentic humps effort. Tomorrow on Christmas I'll publish all the skills I used so others can try them out.

# 24th Dec 2025 | ai, aoah, llms, networks, ocaml, oxcaml
