/ Ideas / Generating chunk-free embeddings for LLMs

This is an idea proposed in 2024 as a Cambridge Computer Science Part III or MPhil project, and is currently being worked on by Mark Jacobsen. It is supervised by Sadiq Jaffer and Anil Madhavapeddy as part of the Conservation Evidence Copilots project.

Summary

This project explores a chunk-free approach to generating embeddings for Retrieval-Augmented Generation (RAG). Traditional RAG workflows typically require documents to be split into manually or heuristically defined chunks before they are embedded; we seek to bypass that requirement.
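For context, the baseline being avoided looks something like the sketch below: fixed-size chunking followed by per-chunk embedding with an off-the-shelf model. The chunk size, overlap, and model name are illustrative assumptions, not choices made by the project.

```python
# Illustrative baseline only: the fixed-size chunking step a typical RAG
# pipeline applies before embedding, which this project seeks to eliminate.
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows of roughly chunk_size words."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")  # any off-the-shelf embedder
chunks = chunk_text(open("document.txt").read())
chunk_embeddings = model.encode(chunks)  # one vector per manually defined chunk
```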

Instead, our approach generates multiple embeddings for unchunked text using a synthetic dataset created by (for example) a 7B-parameter LLM. This dataset would contain structured, point-by-point summaries of each paragraph. An off-the-shelf embedding model could then be modified by removing its mean pooling layer and adding cross-attention layers. These layers, inspired by T5's encoder-decoder architecture, would allow a frozen set of token embeddings to attend to the summary-based embeddings, producing a more nuanced chunk-free representation.
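One possible reading of that architecture is sketched below in PyTorch: a frozen off-the-shelf encoder whose token-level outputs (mean pooling removed) are refined by a trainable cross-attention layer attending over embeddings of the LLM-generated summary points. The base model, dimensions, and layer choices are assumptions for illustration, not decisions made by the project.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ChunkFreeEmbedder(nn.Module):
    def __init__(self, base_model: str = "sentence-transformers/all-MiniLM-L6-v2",
                 dim: int = 384):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base_model)
        for p in self.encoder.parameters():  # keep the base embeddings frozen
            p.requires_grad = False
        # Trainable cross-attention in the spirit of T5's encoder-decoder blocks:
        # document token states are queries, summary embeddings are keys/values.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, doc_inputs, summary_embeddings):
        # Token-level states of the unchunked document (no mean pooling applied).
        doc_states = self.encoder(**doc_inputs).last_hidden_state
        attended, _ = self.cross_attn(doc_states, summary_embeddings, summary_embeddings)
        return self.norm(doc_states + attended)  # one embedding per token, chunk-free

tok = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
doc = tok("A long, unchunked document ...", return_tensors="pt", truncation=True)
summaries = torch.randn(1, 5, 384)  # stand-in for embeddings of 5 summary points
out = ChunkFreeEmbedder()(doc, summaries)  # shape: (1, num_doc_tokens, 384)
```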

Additionally, the research aims to explore adaptive chunking driven by a trained model, allowing context-aware embeddings to be generated end-to-end. This would offer a more integrated and efficient pipeline, eliminating the need for separate summarisation and embedding stages.
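As a hedged illustration of that adaptive-chunking direction, one simple formulation is a small trainable head that scores each token as a potential chunk boundary, so segmentation could be learned jointly with the embedder rather than fixed up front. The head design and threshold below are assumptions, not part of the proposal.

```python
import torch
import torch.nn as nn

class BoundaryHead(nn.Module):
    def __init__(self, dim: int = 384):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # Probability that each token ends a context-aware "chunk".
        return torch.sigmoid(self.scorer(token_states)).squeeze(-1)

token_states = torch.randn(1, 512, 384)    # e.g. output of the frozen encoder above
boundaries = BoundaryHead()(token_states)  # (1, 512) boundary probabilities
splits = (boundaries > 0.5).nonzero()      # candidate split points; threshold is arbitrary
```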

Related Ideas