Modern operating systems are increasingly moving away from the syscall-at-a-time POSIX model towards shared-memory submission/completion queues like Linux's io_uring, Windows IoRing and macOS' various async interfaces. In my VMIL 2025 keynote at ICFP/SPLASH, I argued that it's time for language runtimes to treat these shared-memory rings as first-class citizens, with POSIX relegated to a compatibility backend.
Lean 4 is an interesting target for this kind of work because it is simultaneously a proof assistant and a practical programming language with its own runtime. Its IO story today is comparatively thin though as it leans (pun intended) on its C runtime for blocking file and socket operations. There is room to design a principled, verified async IO layer from scratch rather than retrofitting one.
This project would design and implement an io_uring-backed IO subsystem for Lean. Most of the work will be specifying the submission/completion queue pair: a bounded, fixed-capacity, lock-free ring shared between a single producer (the Lean program submitting work) and a single consumer (the kernel completing it), with memory-ordering semantics that must be respected exactly for correctness. The student would:
- Model the ring structure. Define the submission queue (SQ) and completion queue
(CQ) as Lean data structures, with explicit head/tail indices, entry
layouts, and the subset of io_uring opcodes needed for a useful demo
(
read,write,accept,recv,send,close). - Specify concurrent semantics. Lean's existing concurrency story is
mostly sequential
IOwith a few task primitives. The project will almost certainly require defining novel abstractions for shared-memory concurrency like monotonic indices, release/acquire fences, and a story for reasoning about interleavings between kernel and user space. - Wire it into the Lean runtime. Use Lean's FFI to
mmapthe ring buffers, register file descriptors and driveio_uring_enter, exposing a clean high-level API (ideally something close to Eio'sFlow.copy) that hides the ring machinery behind typed resource handles. - Build a demo. A small HTTP echo server or parallel file copy that
demonstrates measurable throughput wins over a naive blocking-IO Lean
baseline, like the ocaml-uring parallel
cpexample.
The ambitious stretch goal is to prove the ring's key invariants inside Lean itself. There is growing expertise in Ilya Sergey's group at NUS (more here) on mechanised metatheory for concurrent data structures to ask for help!
A successful project would produce a working io_uring backend usable from
real Lean programs, a clean specification of the ring's concurrent semantics,
benchmarks against the existing Lean IO path, and as a stretch
a machine-checked proof that the ring never loses or duplicates
submissions under the single-producer/single-consumer discipline.
1 Related Reading
- VMIL 2025 keynote that motivates this work
- ocaml-uring are my io_uring OCaml bindings and test cases
- Eio is a higher-level OCaml interface to imitate
- Lean 4 FFI documentation
- Efficient IO with io_uring, Jens Axboe
- The FPL is where the inspiration for this project came from after talking to Ilya Sergey
