# A Proposal for Voluntary AI Disclosure in OCaml Code

*2026-04-03 — note*


After my [December of agentic coding](https://anil.recoil.org/notes/aoah-2025) sprint, I was left quite
[frazzled](https://marvinh.dev/blog/ddosing-the-human-brain/) but also with a
practical problem. I've got two kinds of libraries: the ones I care about (and
handcraft), and the wild experiments that look perfectly formed but are in fact just
(well typed) slop. After [a year](https://anil.recoil.org/notes/claude-copilot-sandbox) of doing this, it's obvious that the _quality_ of generated code also varies dramatically as
models steadily improve and agentic harnesses improve context management.

This post is about an **[ocaml-ai-disclosure proposal](https://github.com/avsm/ocaml-ai-disclosure)** I put together to help track this in OCaml using metadata and [extension attributes](https://ocaml.org/manual/5.3/attributes.html) in source code.

## The EU is mandating what this summer?!

Toby Jaffey pointed
me to the [W3C AI Content Disclosure](https://www.w3.org/community/ai-content-disclosure/)
[last week](https://anil.recoil.org/notes/2026w13). The bit that
properly surprised me was a legal snippet buried in their README:

> The EU AI Act Article 50 (effective August 2026) requires that AI-generated text content be "marked in a machine-readable format and detectable as artificially generated or manipulated."
> <cite>-- [ai-content-disclosure](https://github.com/dweekly/ai-content-disclosure?tab=readme-ov-file), David E. Weekly, 2026</cite>

This summer!!! Whether source code falls under "text content" is an [open
question](https://eur-lex.europa.eu/eli/reg/2024/1689/oj) that hasn't been
addressed in existing legal commentary as far as I can tell (nor can I read the
raw 300+ pages to figure it out for myself).  However, regardless of how lawyers eventually
parse this, voluntary disclosure for code seems like a sensible thing to do anyway.

I've therefore put together an **[ocaml-ai-disclosure](https://github.com/avsm/ocaml-ai-disclosure)** repository containing a draft specification and OCaml reference tooling for voluntary, machine-readable AI content disclosure in OCaml code. I'm interested in thoughts both from the OCaml community and from other language ecosystems. Weirdly, after some searching, I can't find a single other programming language ecosystem that's proposed anything similar for source code.

<a href="https://eur-lex.europa.eu/eli/reg/2024/1689/oj"> <figure class="image-center"><img src="/images/eu-ai-act-1.webp" alt="Not even reading the AI Act in my mothertongue shed light on the matter. (Ok ok, it's about laying down harmonised rules on AI and amending existing Regulations)" title="Not even reading the AI Act in my mothertongue shed light on the matter. (Ok ok, it's about laying down harmonised rules on AI and amending existing Regulations)" loading="lazy" srcset="/images/eu-ai-act-1.768.webp 768w, /images/eu-ai-act-1.640.webp 640w, /images/eu-ai-act-1.480.webp 480w, /images/eu-ai-act-1.320.webp 320w, /images/eu-ai-act-1.1600.webp 1600w, /images/eu-ai-act-1.1440.webp 1440w, /images/eu-ai-act-1.1280.webp 1280w, /images/eu-ai-act-1.1024.webp 1024w"><figcaption>Not even reading the AI Act in my mothertongue shed light on the matter. (Ok ok, it's about laying down harmonised rules on AI and amending existing Regulations)</figcaption></figure> </a>

## AI Disclosure for OCaml is pretty easy

The OCaml ecosystem is accumulating code with varying degrees of AI involvement, but there's currently no machine-readable way to signal it. We obviously need to be very careful about how we mix this code into the [commons](https://github.com/ocaml/opam-repository), because the usual social signals we use to review packages are basically useless now.

However a binary AI "yes/no" flag doesn't capture the reality of how people actually work with these tools. The code I wrote during [AoAH](https://anil.recoil.org/notes/aoah-2025) ranged from a one-shot *"CC generated the whole module from a one-line prompt"* to *"I wrote the core logic by hand and Claude sorted the pretty-printer boilerplate"* or even *"[I got CC to test with Gemini](https://toao.com/blog/check-with-gemini)"*.

My proposal is extremely simple; here's how it works...

### Package Disclosures

An opam package can declare its disclosure using extension fields:

```
x-ai-disclosure: "ai-assisted"
x-ai-model: "claude-opus-4-6"
x-ai-provider: "Anthropic"
```

Note: This may just become a list of values in the final proposal, but you get the idea.

### OCaml Module level

OCaml supports extension attributes, which we use via a floating attribute that applies to the entire compilation unit:

```ocaml
[@@@ai_disclosure "ai-generated"]
[@@@ai_model "claude-opus-4-6"]
[@@@ai_provider "Anthropic"]

let foo = ...
let bar = ...
```

These can also be scoped more finely via declaration attributes that apply to a single binding:

```ocaml
[@@@ai_disclosure "ai-assisted"]

let human_written x = ...

let ai_helper y =
  ...
[@@ai_disclosure "ai-generated"]
```

Disclosure follows a nearest-ancestor inheritance model like the W3C HTML proposal, whereby an explicit annotation overrides the inherited value.
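
To make the resolution order concrete, here's a sketch (module and value names invented for illustration) of how annotations would nest:

```ocaml
(* Floating attribute: the default for this whole compilation unit. *)
[@@@ai_disclosure "ai-assisted"]

(* Inherits "ai-assisted" from the unit-level annotation. *)
let greeting = "hello"

module Generated = struct
  (* Nearest ancestor wins: items in this module default to
     "ai-generated" unless they override it themselves. *)
  [@@@ai_disclosure "ai-generated"]

  let table = [ (1, "one"); (2, "two") ]

  (* An explicit annotation overrides the inherited value. *)
  let lookup k = List.assoc_opt k table
  [@@ai_disclosure "none"]
end

let () = assert (Generated.lookup 1 = Some "one")
```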

One detail I'm quite pleased with is that `.mli` and `.ml` files are annotated independently, which means that one workflow I use quite a bit (writing the interface files first) can be tracked separately from the implementations themselves.
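
For example, a hand-written interface paired with a generated implementation might look like this (file names and contents are hypothetical):

```ocaml
(* widget.mli -- interface written by hand first *)
[@@@ai_disclosure "none"]

val render : string -> string

(* widget.ml -- implementation filled in by the agent *)
[@@@ai_disclosure "ai-generated"]

let render name = "<p>" ^ name ^ "</p>"
```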

### The disclosure vocabulary

I use the same four levels as the W3C vocabulary, which works well enough for HTML:

|Value|Meaning|
|-------|---------|
|`none`|No AI involvement|
|`ai-assisted`|Human-authored, AI edited or refined|
|`ai-generated`|AI-generated with human prompting and review|
|`autonomous`|AI-generated without human oversight|

I treat the absence of annotation as "unknown", not "none". The `none` value exists for authors who *want* to positively assert human authorship, perhaps because their project's policy requires it or because they want reviewers to know this particular module was deliberately hand-written. Tools may also choose to spelunk back through pre-2022 code and add `none` automatically where it's obvious.
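
A hypothetical sketch of how a checking tool might map attribute payloads onto the vocabulary (this is not the reference implementation's API, just the distinction that matters). Note the final case: both a missing annotation and an unrecognised payload collapse to `Unknown`, never `No_ai`:

```ocaml
(* The four W3C vocabulary values, plus the implicit fifth state
   for code that carries no annotation at all. *)
type disclosure = No_ai | Ai_assisted | Ai_generated | Autonomous | Unknown

let of_attribute = function
  | Some "none" -> No_ai
  | Some "ai-assisted" -> Ai_assisted
  | Some "ai-generated" -> Ai_generated
  | Some "autonomous" -> Autonomous
  | Some _ | None -> Unknown  (* absence is "unknown", not "none" *)
```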

If a module contains both human-written and AI-generated bits, you can annotate
at the package level and add overrides directly in code.  OCaml's module system
and attributes give us a natural hierarchy for this.

### Model provenance

Each annotation can also optionally carry provenance metadata:
- `ai_model` (the API model identifier, like `claude-opus-4-6` or `gpt-4o`)
- `ai_provider` (like `Anthropic` or `OpenAI`).

[Michael Dales](https://mynameismwd.org) pointed out it's quite common to use multiple models (e.g. to cross
test), so these attributes can be repeated when multiple models contributed.
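
So a module drafted with one model and cross-checked with another might carry something like this (the second model/provider pair here is purely illustrative):

```ocaml
[@@@ai_disclosure "ai-generated"]
[@@@ai_model "claude-opus-4-6"]
[@@@ai_provider "Anthropic"]
(* Repeated because a second model reviewed the output. *)
[@@@ai_model "gemini-2.5-pro"]
[@@@ai_provider "Google"]
```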

## The programmer burden is minimal

The nice thing about this proposal is that there's _no_ overhead to a programmer who chooses not to use AI assistance.

For those that do, I've got a [Claude Skill ocaml-dev:ai-disclosure](https://github.com/avsm/ocaml-claude-marketplace/blob/main/plugins/ocaml-dev/skills/ai-disclosure/SKILL.md)
that instructs the agent to add the right annotations in.  So when Claude
generates OCaml code in my sessions, it now inserts the attributes and also
maintains the `.opam.template` files.

During code review, I read the AI-generated code and edit away to (hopefully) improve it, and downgrade `ai-generated` to `ai-assisted` on the way.  If I've substantially rewritten the code then I just remove the annotation and fully claim it.

The key principle is that disclosure reflects the *current state of the code* to make it easier for a human to claim responsibility. A human who has thoroughly reviewed, understood, and rewritten a piece of code may reasonably call it their own. This is not my legal opinion, just a moral, informal and pragmatic one!

## What this isn't

A few things worth being explicit about after discussions around [my group](/projects/oxcaml) on the matter:

- It's not a judgement on whether AI code is good or bad. The goal is a transparent, machine-readable signal so that consumers of the code (be they humans, puppies, licence checkers, package managers, CI systems, whatever) can apply their own policies.
  
- We don't use git for this. A human may commit AI-generated code, or an AI agent may commit code that was human-reviewed and hacked and slashed enough to be considered rewritten before the commit. Rebases and squashes also destroy commit-based attribution. Source-level attributes survive all these operations.
  
- It's not mandatory. The whole point is voluntary adoption. I have noticed a vague reluctance to declare among the people I've talked to, as they feel they're being judged. If the OCaml community decides this is useful, adoption will happen naturally. If not, then it'll just be me using it and I'm fine with that!

## What's next

I'm starting by integrating this into my own [libraries](https://anil.recoil.org/notes/aoah-2025) as a test bed. The Claude Code [marketplace skill](https://github.com/avsm/ocaml-claude-marketplace) is already available if you want to try the automated annotation in your own sessions.

On the tooling side, there are several integration points I'd like to see if this idea has legs:

- odoc could render disclosure metadata alongside module documentation, perhaps using [the odoc plugin](https://jon.recoil.org/blog/2026/03/weeknotes-2026-13.html) system that [Jon Ludlam](https://jon.recoil.org) has been designing.
- merlin or ocaml-lsp could surface disclosure attributes in hover information in the IDE, giving you a quick 'trust signal' while reading other people's code.
- dune could gain native support for the `(ai_disclosure)` stanza to make the opam file generation easier.
- opam could eventually use disclosure fields during version solving. I think it'd be useful to have a solver constraint that prefers packages with human-reviewed code where available, and only fall back to AI if nothing else works.

The full draft specification, FAQ, and reference implementation are at **[github.com/avsm/ocaml-ai-disclosure](https://github.com/avsm/ocaml-ai-disclosure)**.
I'd love feedback on the spec. File issues on the repo or in the [OCaml Discussion thread](https://discuss.ocaml.org/t/a-proposal-for-voluntary-ai-disclosure-in-ocaml-code/17950).
Synopsis: Proposing a voluntary, machine-readable AI content disclosure scheme for OCaml spanning opam packages, dune, and per-module attributes, aligned with the W3C AI Content Disclosure vocabulary.
Words: 1325
DOI: 10.59350/cxypn-ysv27

Discussion:
- Bluesky: <https://bsky.app/profile/anil.recoil.org/post/3mimam2jmbk2a>
- LinkedIn: <https://www.linkedin.com/posts/anilmadhavapeddy_heres-my-proposal-for-voluntary-disclosure-share-7445885378802180096-TDmd>
- Mastodon: <https://amok.recoil.org/@avsm/116341951440291661>
- Twitter: <https://x.com/avsm/status/2040119085460066377>

## Related

- [.plan-26-13: Oxidised, standardised, and syndicated](https://anil.recoil.org/notes/2026w13) (note, 2026-03-29)
- [2025 Advent of Agentic Humps: Building a useful O(x)Caml library every day](https://anil.recoil.org/notes/aoah-2025) (note, 2025-12-26)
- [Oh my Claude, we need agentic copilot sandboxing right now](https://anil.recoil.org/notes/claude-copilot-sandbox) (note, 2025-03-02)

---
Canonical: https://anil.recoil.org/notes/opam-ai-disclosure
Type: note
Tags: ai, ocaml, oxcaml, standards, policy, llms
