# A Hybrid Graph-Vector Database in O(x)Caml

*2026-01-01 — idea*

This project [proposes](https://github.com/olifog/gvecdb-ocaml/blob/main/proposal.pdf) building a hybrid graph-vector database in OCaml.

> Retrieval-Augmented Generation (RAG) grounds LLM responses in external data. The naive approach to implementing RAG is to embed relevant documents with a pre-trained embedding model, store them in a vector database, and then return the top k closest documents to an LLM's embedded query during a generation step.
> 
> This approach is often insufficient for high-quality LLM responses, as dense retrieval (via embedding closeness) has been shown to underperform more traditional keyword-based sparse retrieval (e.g. BM25) on several BEIR datasets. In practice, merging sparse and dense retrieval results improves recall and downstream RAG accuracy over using either in isolation.
> 
> Recent hybrid sparse/dense retrieval systems such as GraphRAG have further demonstrated the value of graph structure in the sparse retrieval component. By carefully setting up a knowledge graph to expose semantically meaningful edges between entities, models can be augmented with much more powerful search and retrieval capabilities over embedded external data. However, most such systems are text-only, built as offline batch processes, and slow to update. Their indices are typically reconstructed in large jobs rather than incrementally maintained, and they rarely target lightweight, embeddable deployments. Likewise, general-purpose graph databases with vector add-ons tend to have runtime overhead and operational complexity that is undesirable for a small, embedded engine.
> 
> Over the course of the project, I will produce a single-machine graph-vector database in OCaml. The graph store will behave as a conventional graph database, allowing CRUD operations on nodes, edges, and their associated types and metadata. The vector addition will allow the user to link a number of labelled vectors to a node or edge, and perform semantic search queries over the nodes and edges using the vectors. The graph store will be built on top of LMDB.
> <cite> --[olifog, Part II project proposal, Nov 2025](https://github.com/olifog/gvecdb-ocaml/blob/main/proposal.pdf)</cite>
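To make the data model described in the proposal a little more concrete, here is a minimal in-memory sketch in plain OCaml. It is not gvecdb's API: every type and function name (`entity`, `add_node`, `add_edge`, `attach_vector`, `search`) is hypothetical, storage is ordinary `Hashtbl`s rather than LMDB, and semantic search is an exhaustive cosine-similarity scan rather than whatever index the real project uses. It also assumes all vectors sharing a label have the same dimension.

```ocaml
(* Hypothetical in-memory model of the proposal's data model: typed nodes and
   edges, each of which can carry several labelled vectors. Not gvecdb's API;
   the real project stores data in LMDB, whereas this sketch uses Hashtbls. *)

type id = int

(* A vector can hang off either a node or an edge. *)
type entity = Node of id | Edge of id

type node = { node_type : string; props : (string * string) list }
type edge = { edge_type : string; src : id; dst : id }

type db = {
  nodes : (id, node) Hashtbl.t;
  edges : (id, edge) Hashtbl.t;
  (* Keyed by (entity, label), so one entity may hold several labelled vectors. *)
  vectors : (entity * string, float array) Hashtbl.t;
  mutable next_id : id;
}

let create () =
  { nodes = Hashtbl.create 16; edges = Hashtbl.create 16;
    vectors = Hashtbl.create 16; next_id = 0 }

(* Nodes and edges draw ids from one shared counter. *)
let fresh db =
  let i = db.next_id in
  db.next_id <- i + 1;
  i

let add_node db node_type props =
  let i = fresh db in
  Hashtbl.replace db.nodes i { node_type; props };
  i

let add_edge db edge_type src dst =
  if not (Hashtbl.mem db.nodes src && Hashtbl.mem db.nodes dst) then
    invalid_arg "add_edge: unknown endpoint";
  let i = fresh db in
  Hashtbl.replace db.edges i { edge_type; src; dst };
  i

(* Attaching a vector under an existing label overwrites the old one. *)
let attach_vector db entity label v =
  Hashtbl.replace db.vectors (entity, label) v

let cosine a b =
  if Array.length a <> Array.length b then invalid_arg "cosine: dimension mismatch";
  let dot = ref 0.0 and na = ref 0.0 and nb = ref 0.0 in
  Array.iteri
    (fun i x ->
      dot := !dot +. (x *. b.(i));
      na := !na +. (x *. x);
      nb := !nb +. (b.(i) *. b.(i)))
    a;
  if !na = 0.0 || !nb = 0.0 then 0.0 else !dot /. (sqrt !na *. sqrt !nb)

(* Exhaustive scan: score every vector with the given label, keep the best k. *)
let search db ~label ~k query =
  Hashtbl.fold
    (fun (e, l) v acc -> if l = label then (e, cosine query v) :: acc else acc)
    db.vectors []
  |> List.sort (fun (_, s1) (_, s2) -> Float.compare s2 s1)
  |> List.filteri (fun i _ -> i < k)

let () =
  let db = create () in
  let a = add_node db "paper" [ ("title", "Paper A") ] in
  let b = add_node db "paper" [ ("title", "Paper B") ] in
  let e = add_edge db "cites" a b in
  attach_vector db (Node a) "abstract" [| 1.0; 0.0 |];
  attach_vector db (Node b) "abstract" [| 0.0; 1.0 |];
  attach_vector db (Edge e) "context" [| 1.0; 1.0 |];
  (* The query is closest to Paper A's abstract vector. *)
  match search db ~label:"abstract" ~k:1 [| 0.9; 0.1 |] with
  | [ (Node n, score) ] -> Printf.printf "node %d (cosine %.3f)\n" n score
  | _ -> print_endline "no node hit"
```

Keying vectors by `(entity, label)` is one simple way to let a node or edge carry several embeddings (say, one for a title and one for an abstract) while restricting each query to a single embedding space; a production store would replace the linear scan with an approximate nearest-neighbour index.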

Oliver completed this project using [OxCaml](https://anil.recoil.org/projects/oxcaml), with the full source code available at <https://github.com/olifog/gvecdb-ocaml>, and even got most of arXiv embedded and visualised using it!
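The quoted proposal argues that merging sparse and dense retrieval results beats using either alone, but does not say how the merge is done. One widely used option is reciprocal rank fusion (RRF), sketched below in plain OCaml. Whether gvecdb uses RRF or a different fusion rule is an assumption; the function name `rrf` is hypothetical, and the constant `k = 60` is simply a common default.

```ocaml
(* Reciprocal rank fusion (RRF): merge several ranked result lists, e.g. one
   from a sparse retriever such as BM25 and one from dense cosine search.
   A document scores sum over lists of 1 / (k + rank), with 1-based ranks;
   documents absent from a list simply get nothing from it. This is one
   common fusion rule, not necessarily the one gvecdb uses. *)
let rrf ?(k = 60.0) (rankings : string list list) : (string * float) list =
  let scores = Hashtbl.create 16 in
  List.iter
    (fun ranking ->
      List.iteri
        (fun i doc ->
          let contribution = 1.0 /. (k +. float_of_int (i + 1)) in
          let prev = Option.value (Hashtbl.find_opt scores doc) ~default:0.0 in
          Hashtbl.replace scores doc (prev +. contribution))
        ranking)
    rankings;
  (* Highest fused score first; ties broken by document id for determinism. *)
  Hashtbl.fold (fun doc s acc -> (doc, s) :: acc) scores []
  |> List.sort (fun (d1, s1) (d2, s2) ->
         match Float.compare s2 s1 with 0 -> String.compare d1 d2 | c -> c)

let () =
  let sparse = [ "doc3"; "doc1"; "doc7" ] (* e.g. a BM25 ranking *) in
  let dense = [ "doc1"; "doc4"; "doc3" ] (* e.g. a cosine top-k ranking *) in
  (* doc1 and doc3 appear in both lists, so they rise to the top. *)
  List.iter (fun (d, s) -> Printf.printf "%s %.4f\n" d s) (rrf [ sparse; dense ])
```

RRF only looks at ranks, not raw scores, which sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.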

<figure class="image-center"><img src="/images/oli-project-1.webp" alt="" title="" loading="lazy" srcset="/images/oli-project-1.768.webp 768w, /images/oli-project-1.640.webp 640w, /images/oli-project-1.480.webp 480w, /images/oli-project-1.3840.webp 3840w, /images/oli-project-1.320.webp 320w, /images/oli-project-1.2560.webp 2560w, /images/oli-project-1.1920.webp 1920w, /images/oli-project-1.1600.webp 1600w, /images/oli-project-1.1440.webp 1440w, /images/oli-project-1.1280.webp 1280w, /images/oli-project-1.1024.webp 1024w"><figcaption></figcaption></figure>

<figure class="image-center"><img src="/images/oli-project-2.webp" alt="" title="" loading="lazy" srcset="/images/oli-project-2.768.webp 768w, /images/oli-project-2.640.webp 640w, /images/oli-project-2.480.webp 480w, /images/oli-project-2.3840.webp 3840w, /images/oli-project-2.320.webp 320w, /images/oli-project-2.2560.webp 2560w, /images/oli-project-2.1920.webp 1920w, /images/oli-project-2.1600.webp 1600w, /images/oli-project-2.1440.webp 1440w, /images/oli-project-2.1280.webp 1280w, /images/oli-project-2.1024.webp 1024w"><figcaption></figcaption></figure>

Status: Completed
Level: PartII
Year: 2026
Project: OxCaml Labs
Supervisors: Ryan Gibb, Jon Crowcroft, Anil Madhavapeddy
Students: Oliver Fogelin

## Related

- [.plan-26-21: Pint of Science, OxCaml dissertations, and TESSERA 1.1 stirring](https://anil.recoil.org/notes/2026w21) (note, 2026-05-24)
- [OxCaml Labs](https://anil.recoil.org/projects/oxcaml) (project, 2025-01-01)

---
Canonical: https://anil.recoil.org/ideas/oxcaml-vector-db
Type: idea
Tags: ocaml, oxcaml, ai, systems
