AoAH Day 24: Tuatara, an evolving Atom aggregator that mutates

#ai #ocaml #oxcaml #llms #aoah #networks

24 Dec 2025

Tuatara is a feed aggregator that integrates Claude to evolve and patch its own code when encountering parsing errors, embodying the concept of self-healing software.

My original purpose for starting this AoAH series was to build a feed aggregator for my group website, so I had to finish up with something to show!

I'm not sure if taking the longest way around was wise here, but I ended up building Tuatara, an aggregator that pulls all my colleagues' writing into one place. They're a quirky bunch with many diverse homegrown feeds in various states of brokenness, so it's difficult to build a one-size-fits-all tool.

So given it's the end of the year and I'm sozzled on mulled wine on Christmas Eve, I decided to make Tuatara mutate its own code by linking it with my Claudeio library, forcing it to evolve and modify itself as it runs across feed errors. Every deployment of Tuatara is meant to be slightly different.

1 Evolving code like it's 2026

The initial generation of the code was pretty straightforward: it uses SQLite to store a database of all the posts and imports metadata from my previously created Sortal contacts manager.

> tuatara import-sortal
Sortal Import Results:

  Total contacts scanned: 420
  Contacts with feeds: 15
  Feeds imported: 16
  Feeds skipped (already exist): 0

Run 'tuatara fetch' to download posts from the imported feeds.

But when I actually fetched the feeds, I rapidly realised that lots of parsing quirks are needed:

> tuatara fetch
Fetching Anil Madhavapeddy...
  340 posts (0 new)
Fetching David Allsopp...
  Not modified
Fetching Jessica Man...
  Not modified
Fetching Jon Ludlam...
  28 posts (0 new)
Fetching Jon Sterling...
  Not modified
Fetching Mark Elvers...
  Not modified
Fetching Martin Kleppmann...
  Error: Feed parse error: document MUST contains exactly one <feed> element at l.0 c.0
  URL: http://feeds.feedburner.com/martinkl
Fetching Onkar Gulati...
  Error: Not_found
  URL: https://onkargulati.com/feed.xml
Fetching Patrick Ferris...
  Error: Feed parse error: <entry> elements MUST contains at least an <author> element or <feed> element MUST contains one or more <author> elements at l.1460 c.7
  URL: http://patrick.sirref.org/weeklies/atom.xml
Fetching Richard Mortier...
  79 posts (79 new)
Fetching Ryan Gibb...
  38 posts (38 new)
Fetching Sadiq Jaffer...
  10 posts (10 new)

Total: 127 new posts (3 errors)

We could either skip the broken feeds or ask the people involved to fix them, but it's Christmas Eve so that's unlikely. And anyway, we want to be liberal in what we accept, so why can't I fix my own software first?!

Like the amazing Tuatara, why don't we build evolution directly into our software? Time to take off the semantics seatbelt...

2 Medice, cura te ipsum

The non-obvious and probably-terrible answer here is to use our fancy coding models to force the Tuatara source code to heal itself. I added an --evolve flag that lets tuatara invoke Claude/OCaml upon errors and patch its own code. Turns out self-help works, as Martin's blog feed was just fine!

Fetching Martin Kleppmann...
  Error: Feed parse error: document MUST contains exactly one <feed> element at l.0 c.0
  URL: http://feeds.feedburner.com/martinkl

Invoking Claude Code to fix parse error...

The feed from `http://feeds.feedburner.com/martinkl` is an **RSS 2.0 feed** (it
starts with `<rss version="2.0">`), but tuatara was incorrectly detecting it as
an **Atom feed**. This caused the Syndic Atom parser to fail with "document
MUST contains exactly one <feed> element".

The root cause was that the `detect_feed_type` function prioritized the HTTP
`Content-Type` header over the actual content. FeedBurner (and other feed
aggregators) often serve RSS feeds with an incorrect `application/atom+xml`
content-type header.

This is a generic fix that will work for any feed aggregator or CDN that
mis-labels RSS feeds as Atom (or vice versa), so no domain-specific quirk was
needed.
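The fix Claude describes, trusting the document's root element before the HTTP header, can be sketched roughly as follows. This is a minimal illustration using only the OCaml standard library, not Tuatara's actual code: the type `feed_type`, the helpers `contains` and `root_element`, and the signature given here to `detect_feed_type` are all assumptions made for the example.

```ocaml
(* Illustrative sketch only: these names and types are assumptions,
   not Tuatara's real code. *)
type feed_type = Atom | Rss2 | Unknown

(* True if [sub] occurs anywhere in [s]. *)
let contains s sub =
  let n = String.length s and m = String.length sub in
  let rec go i = i + m <= n && (String.sub s i m = sub || go (i + 1)) in
  go 0

(* Lowercased local name of the first element in [body], skipping "<?...?>"
   and "<!...>" markup. Simplification: a '<' inside a leading comment
   would confuse it. *)
let root_element body =
  let n = String.length body in
  let is_name_char = function
    | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '-' | '.' | ':' -> true
    | _ -> false
  in
  let rec scan i =
    match String.index_from_opt body i '<' with
    | None -> None
    | Some j when j + 1 >= n -> None
    | Some j when body.[j + 1] = '?' || body.[j + 1] = '!' -> scan (j + 1)
    | Some j ->
        let k = ref (j + 1) in
        while !k < n && is_name_char body.[!k] do incr k done;
        let name = String.sub body (j + 1) (!k - j - 1) in
        (* Drop any namespace prefix such as "atom:". *)
        let local =
          match String.rindex_opt name ':' with
          | Some c -> String.sub name (c + 1) (String.length name - c - 1)
          | None -> name
        in
        Some (String.lowercase_ascii local)
  in
  scan 0

(* Content first; the Content-Type header is only a fallback. *)
let detect_feed_type ?content_type body =
  match root_element body with
  | Some "feed" -> Atom
  | Some "rss" -> Rss2
  | _ -> (
      match Option.map String.lowercase_ascii content_type with
      | Some ct when contains ct "atom" -> Atom
      | Some ct when contains ct "rss" -> Rss2
      | _ -> Unknown)
```

With this ordering, a body starting with `<rss version="2.0">` is classified as RSS 2.0 even when it is served as `application/atom+xml`, which is the FeedBurner situation described above.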

But the parsing drama continued, as Michael Dales uses the wrong date format in his feed (tsk tsk I'd send the RFC police out if it wasn't Christmas), but Tuatara evolves a quirk that gets past it:

The quirk module converts ISO 8601 dates (2025-10-22T12:24:00-00:00) to RFC 822 format (Wed, 22 Oct 2025 12:24:00 GMT) which is what Syndic's RSS2 parser expects.
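The date quirk quoted above boils down to a small string conversion. Here is a minimal sketch of what such a conversion might look like, using only the OCaml standard library. The function names are made up for illustration, fractional seconds are not handled, and treating a missing zone as GMT is an assumption rather than anything the post states. Non-zero offsets are emitted in numeric form such as `+0100`.

```ocaml
(* Illustrative sketch only: not the quirk module Tuatara generated. *)
let day_names = [| "Sun"; "Mon"; "Tue"; "Wed"; "Thu"; "Fri"; "Sat" |]

let month_names =
  [| "Jan"; "Feb"; "Mar"; "Apr"; "May"; "Jun";
     "Jul"; "Aug"; "Sep"; "Oct"; "Nov"; "Dec" |]

(* Sakamoto's algorithm: 0 = Sunday ... 6 = Saturday (Gregorian calendar). *)
let day_of_week y m d =
  let t = [| 0; 3; 2; 5; 0; 3; 5; 1; 4; 6; 2; 4 |] in
  let y = if m < 3 then y - 1 else y in
  (y + (y / 4) - (y / 100) + (y / 400) + t.(m - 1) + d) mod 7

(* Parse [len] decimal digits of [s] starting at [pos]. *)
let digits s pos len =
  let sub = String.sub s pos len in
  if String.for_all (fun c -> c >= '0' && c <= '9') sub then
    int_of_string_opt sub
  else None

(* Map an ISO 8601 zone suffix to an RFC 822 zone.
   Assumption: a missing zone is treated as GMT. *)
let zone_of_suffix z =
  match z with
  | "" | "Z" | "z" -> Some "GMT"
  | _ when String.length z = 6 && (z.[0] = '+' || z.[0] = '-') && z.[3] = ':'
    -> (
      match (digits z 1 2, digits z 4 2) with
      | Some 0, Some 0 -> Some "GMT"
      | Some hh, Some mm -> Some (Printf.sprintf "%c%02d%02d" z.[0] hh mm)
      | _ -> None)
  | _ -> None

(* "2025-10-22T12:24:00-00:00" -> Some "Wed, 22 Oct 2025 12:24:00 GMT" *)
let iso8601_to_rfc822 s =
  let n = String.length s in
  let well_formed =
    n >= 19 && s.[4] = '-' && s.[7] = '-' && s.[10] = 'T'
    && s.[13] = ':' && s.[16] = ':'
  in
  if not well_formed then None
  else
    match
      ( digits s 0 4, digits s 5 2, digits s 8 2, digits s 11 2,
        digits s 14 2, digits s 17 2,
        zone_of_suffix (String.sub s 19 (n - 19)) )
    with
    | Some y, Some mo, Some d, Some h, Some mi, Some sec, Some zone
      when mo >= 1 && mo <= 12 && d >= 1 && d <= 31 ->
        Some
          (Printf.sprintf "%s, %02d %s %04d %02d:%02d:%02d %s"
             day_names.(day_of_week y mo d) d month_names.(mo - 1)
             y h mi sec zone)
    | _ -> None
```

Returning an option means a date the sketch cannot read is left for the caller to handle, rather than being silently rewritten into something wrong.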

And Onkar Gulati and Patrick Ferris both have an empty author field which would ordinarily give us a dreaded Not_found exception:

Fetching Patrick Ferris...
  Error: Feed parse error: <entry> elements MUST contains at least an <author> element or <feed> element MUST contains one or more <author> elements at l.1460 c.7
  URL: http://patrick.sirref.org/weeklies/atom.xml

But never fear, the inexorable --evolve flag figures it out and patches its own code!
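The post doesn't show what that patch looked like. One plausible shape for it, sketched here over a deliberately simplified feed model, is to fall back from the entry's author to the feed's author and finally to the contact name that the feed was imported under. The record types, `non_empty`, and `resolve_authors` are all hypothetical and are not Syndic's or Tuatara's types.

```ocaml
(* Hypothetical, simplified feed model for illustration only. *)
type entry = { title : string; author : string option }
type feed = { feed_author : string option; entries : entry list }

(* Treat a missing, empty, or whitespace-only value as absent. *)
let non_empty = function
  | Some s when String.trim s <> "" -> Some (String.trim s)
  | _ -> None

(* Give every entry an author: its own, else the feed's, else the name of
   the contact the feed belongs to. *)
let resolve_authors ~contact_name feed =
  let default =
    Option.value (non_empty feed.feed_author) ~default:contact_name
  in
  let fix e =
    match non_empty e.author with
    | Some a -> { e with author = Some a }
    | None -> { e with author = Some default }
  in
  { feed with entries = List.map fix feed.entries }
```

Because each feed in an aggregator like this belongs to a known person, the contact name is a reasonable last resort that avoids both the strict parser error and a `Not_found` further down the line.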

There were some non-trivial quirks as well; Andres Zuñiga-Gonzalez uses Quarto for his website, which puts the entire HTML blob into the summary field, but the evolution managed to use html5rw to parse its way out of this. This sort of fix is very hard to generalise, so it's actually quite useful for the tool to fix itself on demand for our small group.
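To make the idea of a per-feed quirk concrete, here is a minimal sketch of a quirk registry keyed by host, where each quirk rewrites the raw feed body before it reaches the parser. Everything in it is hypothetical: the `quirk` record, `host_of_url`, `apply_quirks`, and the `example.org` entry are invented for illustration and say nothing about how Tuatara organises its evolved code.

```ocaml
(* Hypothetical sketch: a quirk rewrites the raw feed body before parsing. *)
type quirk = {
  host : string;              (* feed host the quirk applies to *)
  rewrite : string -> string; (* transformation of the raw body *)
}

(* Extract the host part of a URL such as "https://example.org/feed.xml".
   A port, if present, is kept as part of the host. *)
let host_of_url url =
  let n = String.length url in
  let start =
    match String.index_opt url ':' with
    | Some i when i + 2 < n && url.[i + 1] = '/' && url.[i + 2] = '/' -> i + 3
    | _ -> 0
  in
  let stop =
    match String.index_from_opt url start '/' with
    | Some j -> j
    | None -> n
  in
  String.sub url start (stop - start)

(* Apply, in order, every quirk registered for the feed's host. *)
let apply_quirks quirks ~url body =
  let host = host_of_url url in
  List.fold_left
    (fun body q -> if q.host = host then q.rewrite body else body)
    body quirks

(* Example registry: a made-up feed that emits whitespace before its XML. *)
let quirks = [ { host = "example.org"; rewrite = String.trim } ]
```

Keeping quirks scoped to a host means a fix evolved for one colleague's feed cannot change how anyone else's feed is parsed.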

3 Using the Claude frontend design

Then I needed a quick way to produce a clean frontend so I could visualise the JSON Feed. Claude has a built-in /plugin frontend-design skill, and prompting it for a few designs let me integrate a --html output.

And because it's Christmas, I added some snowflakes as well. Yay!

Ho ho ho, merry Xmas everyone from the EEG feed that isn't live yet but will be after the new year

4 Reflections

The paper I enjoyed writing the most this year was Steps towards an Ecology for the Internet for Aarhus 2025. In the back of my head since has been a desire to start figuring out what self-evolving software actually might be. It's a strange, and probably impractical idea, but I'm delighted that I took a tiny step towards it with this project.

Back in March, I had the honour of being invited to a Bellairs meeting to discuss a heady combination of semantics and computational science. Jon Sterling demonstrated his wonderfully organised Forester website. And I... showed how my mishmash of semi-structured writings can kind of be connected together in a vaguely coherent way to build my website. Next year will have me thinking much harder about the implications of self-evolving code, of how radically transformative to global biodiversity semi-structured agentic processing might be, and other heavy matters. But to close this year, I'm disproportionately pleased to have gotten my tiny website under control a little!

Sitting indoors in Barbados with a gigantic beach outside: a classic sign of semanticists in the wild

As I noted in my letter to the ACM, it's important that we can use AI for things that boost the human condition; I really enjoy reading my colleagues' long form thoughts much more than doomscrolling on the web, and so making it easier to gather their thoughts digestibly and easily is a nice end to my agentic humps effort. Tomorrow on Christmas I'll publish all the skills I used so others can try them out.

References

[1] Madhavapeddy et al. (2025). Steps towards an Ecology for the Internet. Association for Computing Machinery. 10.1145/3744169.3744180

[2] Sutherland et al. (2026). Nine changes needed to deliver a radical transformation in biodiversity measurement. 10.1073/pnas.2519345123

[3] Madhavapeddy (2025). Dear ACM, you're doing AI wrong but you can still get it right. 10.59350/c84g4-5zt58

[4] Madhavapeddy (2025). Presenting our Ecology of the Internet ideas at Aarhus 2025. 10.59350/p45b8-kvt85
