Benjamin Clavié on Making ColBERT Actually Usable

Benjamin Clavié builds tools that make cutting-edge retrieval research actually work in production. Based in Tokyo, he does R&D at Mixedbread while maintaining a collection of open-source projects that have quietly become essential infrastructure for anyone serious about RAG systems.

The Problem With Dense Embeddings

Most RAG tutorials tell you to use OpenAI embeddings and call it a day. Clavié has spent years explaining why that’s often not good enough.

Dense retrieval—converting each text into a single vector—works as a baseline. But research keeps showing that late-interaction models like ColBERT generalize better to new domains, need less training data, and handle complex queries that dense embeddings struggle with.

The catch: ColBERT was a research project with a research-grade codebase. You needed to understand the literature and wrestle with dependencies to use it.

RAGatouille: Three Lines to ColBERT

RAGatouille changed that. The library wraps ColBERT’s complexity in an API so simple it’s almost embarrassing:

from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
RAG.index(collection=documents, index_name="my_index")  # documents: any list of strings
results = RAG.search(query)  # query: a plain-text question

Before RAGatouille, ColBERTv2 had around 50k monthly downloads on HuggingFace. After: 3 million. That’s not just a library—it’s a bridge between research papers and working systems.

The Full Retrieval Stack

Clavié didn’t stop at indexing. His toolkit covers the whole retrieval pipeline:

rerankers unifies the mess of reranking approaches—cross-encoders, ColBERT, API-based models—into one consistent interface. Swap reranking strategies by changing one line instead of rewriting your pipeline.
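A minimal sketch of that interface, following the library's documented usage (the default checkpoint each shorthand name resolves to may change between versions):

from rerankers import Reranker

# Each shorthand loads a sensible default model for that method;
# swap "cross-encoder" for "colbert" or an API provider to change strategy.
ranker = Reranker("cross-encoder")
results = ranker.rank(
    query="What is late interaction?",
    docs=["ColBERT scores query and document tokens individually.",
          "BM25 is a purely lexical method."],
    doc_ids=[0, 1],
)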

byaldi does the same for multimodal retrieval. It’s a thin wrapper around ColPali that finally lets you “chat with your PDFs” by actually retrieving the visual documents, not just extracted text.
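The interface mirrors RAGatouille almost line for line. A hedged sketch, assuming a ColPali checkpoint such as vidore/colpali-v1.2 and a local file named report.pdf (both placeholders):

from byaldi import RAGMultiModalModel

model = RAGMultiModalModel.from_pretrained("vidore/colpali-v1.2")
model.index(input_path="report.pdf", index_name="my_pdfs")  # indexes page images, not extracted text
results = model.search("Which chart shows Q3 revenue?", k=3)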

fastkmeans solves a specific pain point: K-means clustering without faiss’s installation nightmare. Just PyTorch and NumPy, runs 5x faster than faiss on GPU.
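It presents a faiss-style interface. A rough sketch assuming that drop-in API; check the project README for the exact constructor arguments:

import numpy as np
from fastkmeans import FastKMeans

data = np.random.rand(10_000, 128).astype(np.float32)  # 10k vectors, 128 dimensions
kmeans = FastKMeans(d=128, k=256)  # d: vector dimension, k: number of clusters
kmeans.train(data)
centroids = kmeans.centroids  # (256, 128) array of cluster centers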

Training Better Models

The tools matter, but so do the models they run on.

Clavié’s JaColBERT improved Japanese retrieval benchmarks by over 20 percentage points when released. JaColBERTv2.5 then demonstrated how to train effective retrievers with minimal resources—insights that fed into answerai-colbert-small-v1, a 33-million-parameter model that competes with models 15x its size.

The JaColBERTv2.5 training recipe has become a reference for building multi-vector retrievers, showing that you don’t need massive compute to get strong results.

ModernBERT: Encoders Aren’t Dead

In late 2024, Clavié co-led the ModernBERT project—a collaboration between Answer.AI and LightOn to finally give BERT a proper successor.

The thesis: encoder models still power most real-world retrieval, classification, and entity extraction. They’re just running on 2018 architecture while decoder research has sprinted ahead.

ModernBERT brings modern techniques to encoders: Flash Attention, rotary position embeddings, 8192-token context, code pretraining. The result is faster, more accurate, and the first encoder to include serious code understanding.
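The models ship as standard Hugging Face checkpoints, so trying one takes a few lines (this assumes a transformers version recent enough to include ModernBERT support, roughly 4.48 or later):

from transformers import pipeline

# Masked-language-modeling demo with the base checkpoint.
fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
print(fill("Paris is the [MASK] of France.")[0]["token_str"])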

Philosophy: Research as Tools

Clavié writes about the gap between “ML-as-a-commodity-for-ML-practitioners” and “ML-as-a-commodity-for-everyone.” His work consistently aims to close that gap.

From his blog:

“The main motivation of RAGatouille is simple: bridging the gap between state-of-the-art research and alchemical RAG pipeline practices.”

That means strong defaults that work out of the box, but every parameter exposed when you need control. It means tutorials that teach concepts, not just code snippets. It means writing libraries that fail gracefully instead of silently producing wrong results.

Practical Lessons

Late interaction often beats dense embeddings. ColBERT’s MaxSim scoring lets tokens match individually, which captures nuance that single-vector similarity misses. Worth trying before accepting your embedding model’s results as “good enough.”
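The scoring rule itself is a few lines of tensor math: each query token keeps only its best-matching document token, and those maxima are summed. A toy sketch in PyTorch, assuming L2-normalized embeddings so dot products are cosine similarities:

import torch
import torch.nn.functional as F

def maxsim_score(query_emb, doc_emb):
    # query_emb: (q_tokens, dim), doc_emb: (d_tokens, dim), both unit-normalized
    sim = query_emb @ doc_emb.T          # (q_tokens, d_tokens) similarity matrix
    return sim.max(dim=1).values.sum()   # best doc token per query token, summed

q = F.normalize(torch.randn(8, 128), dim=-1)    # 8 query token embeddings
d = F.normalize(torch.randn(200, 128), dim=-1)  # 200 document token embeddings
print(maxsim_score(q, d))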

Token pooling compresses without quality loss. Clavié’s research on clustering-based pooling reduces ColBERT’s storage requirements by 50-66% with no retrieval degradation. A free lunch for production systems.
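The core idea is to cluster each document's token vectors and store one mean vector per cluster. A simplified sketch using hierarchical clustering; the pool factor and ward linkage here are illustrative, not the exact published recipe:

import torch
from scipy.cluster.hierarchy import fcluster, linkage

def pool_tokens(doc_emb, pool_factor=2):
    # Reduce (n_tokens, dim) to roughly n_tokens / pool_factor vectors
    # by mean-pooling tokens that cluster together.
    n_clusters = max(1, doc_emb.shape[0] // pool_factor)
    tree = linkage(doc_emb.numpy(), method="ward")
    labels = torch.from_numpy(fcluster(tree, t=n_clusters, criterion="maxclust"))
    return torch.stack([doc_emb[labels == c].mean(dim=0) for c in labels.unique()])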

Reranking is underrated. Using a weak first-stage retriever with a strong reranker often beats trying to make your retriever perfect. The rerankers library makes this easy to test.

Small models can compete. answerai-colbert-small-v1 proves that thoughtful training recipes matter more than parameter count. Test smaller models before assuming you need the big ones.
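Since it is a standard ColBERT checkpoint (answerdotai/answerai-colbert-small-v1 on Hugging Face), testing it in the RAGatouille snippet from earlier is a one-line change:

from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("answerdotai/answerai-colbert-small-v1")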

Why It Matters

Clavié’s work matters because it turns papers into pip-installable tools. The research community publishes improvements; he makes them usable. For anyone building RAG systems, that translation layer is essential.