Hybrid Search: Combining Keyword and Semantic Retrieval
Vector search finds “database migration” when you search “moving data stores.” Keyword search finds “PostgreSQL” when you search “PostgreSQL.” Neither does both well. Hybrid search runs both methods and merges the results.
The Problem with Single Methods
Each approach fails in predictable ways:
| Query | Keyword Search | Vector Search |
|---|---|---|
| “Q3-2024-roadmap.md” | Exact match found | Might rank it lower |
| “thoughts about APIs” | Misses “REST design notes” | Finds semantic matches |
| “error code ERR_429” | Precise hit | Gets lost in embeddings |
| “how to handle rate limiting” | Misses if no exact phrase | Understands intent |
Error codes, filenames, and technical identifiers need exact matching. Conceptual queries need semantic understanding. Your personal knowledge base has both types of content.
How Hybrid Search Works
Run keyword search (BM25) and vector search in parallel, then combine results:
Query: "PostgreSQL migration strategy"
Keyword Search (BM25) Vector Search (Embeddings)
───────────────────── ──────────────────────────
1. postgres-migration.md 1. database-move-plan.md
2. migration-checklist.md 2. postgres-migration.md
3. sql-scripts/migrate.sql 3. switching-databases.md
↓ Fusion ↓
Combined Results (Reciprocal Rank Fusion)
─────────────────────────────────────────
1. postgres-migration.md (appeared in both)
2. database-move-plan.md (strong semantic match)
3. migration-checklist.md (keyword precision)
Documents appearing in both result sets get boosted. Documents with strong signals in either method still surface.
Reciprocal Rank Fusion
RRF merges ranked lists without needing comparable scores. The formula:
RRF_score(d) = Σ 1 / (k + rank_i(d))
Where the sum runs over each result list i that contains document d, k is a constant (typically 60), and rank_i(d) is d's position in list i.
```python
def reciprocal_rank_fusion(results_lists: list[list], k: int = 60) -> list:
    """Combine multiple ranked result lists using RRF."""
    scores = {}
    for results in results_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank)
    # Sort by combined score
    return sorted(scores.keys(), key=lambda x: scores[x], reverse=True)


# Example
keyword_results = ["doc_a", "doc_b", "doc_c"]
vector_results = ["doc_b", "doc_d", "doc_a"]
merged = reciprocal_rank_fusion([keyword_results, vector_results])
# doc_b ranks highest (position 2 in keyword, position 1 in vector)
```
RRF works because it:
- Ignores raw scores (no normalization needed between different search methods)
- Rewards documents found by multiple methods
- Handles missing documents naturally (only contributes to score when present)
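Plugging the earlier doc_a/doc_b example into the formula with k = 60 shows how the boost plays out:

```
doc_b: 1/(60+2) + 1/(60+1) = 0.0161 + 0.0164 = 0.0325   (in both lists)
doc_a: 1/(60+1) + 1/(60+3) = 0.0164 + 0.0159 = 0.0323   (in both lists)
doc_d: 1/(60+2)            = 0.0161                     (vector only)
doc_c: 1/(60+3)            = 0.0159                     (keyword only)
```

doc_b edges out doc_a because its second-place keyword rank beats doc_a's third-place vector rank, and the single-list documents still land in the merged ranking.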
Implementation with pgvector
PostgreSQL with pgvector supports both full-text search and vector similarity:
```sql
-- Create table with both text search and vector columns
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding vector(384),
    content_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

CREATE INDEX ON documents USING gin(content_tsv);
CREATE INDEX ON documents USING ivfflat(embedding vector_cosine_ops);

-- Hybrid search query: rank within each method, then fuse with RRF
WITH keyword_search AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY ts_rank(content_tsv, query) DESC) AS rank
    FROM documents, plainto_tsquery('english', 'database migration') query
    WHERE content_tsv @@ query
    ORDER BY ts_rank(content_tsv, query) DESC
    LIMIT 20
),
vector_search AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS rank
    FROM documents
    ORDER BY embedding <=> $1
    LIMIT 20
),
rrf AS (
    SELECT
        COALESCE(k.id, v.id) AS id,
        COALESCE(1.0 / (60 + k.rank), 0) +
        COALESCE(1.0 / (60 + v.rank), 0) AS score
    FROM keyword_search k
    FULL OUTER JOIN vector_search v ON k.id = v.id
)
SELECT d.*, rrf.score
FROM rrf
JOIN documents d ON d.id = rrf.id
ORDER BY rrf.score DESC
LIMIT 10;
```
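To run that query from application code, you pass the query embedding as the vector parameter. A minimal sketch, assuming psycopg 3 with the pgvector Python adapter and a 384-dimension sentence-transformers model; the file name hybrid_search.sql and the connection string are placeholders, and both $1 placeholders in the SQL are rewritten as %(q)s:

```python
# Sketch: calling the hybrid query from Python (psycopg 3 + pgvector adapter).
import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384 dimensions, matches vector(384)

with psycopg.connect("dbname=notes") as conn:
    register_vector(conn)  # lets numpy arrays bind to vector columns
    embedding = model.encode("PostgreSQL migration strategy")
    sql = open("hybrid_search.sql").read()  # the query above, with $1 -> %(q)s
    for row in conn.execute(sql, {"q": embedding}).fetchall():
        print(row)
```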
Khoj’s Two-Stage Approach
Khoj uses bi-encoder embeddings for initial retrieval, then cross-encoder reranking for precision:
Query → Bi-Encoder → Top 100 candidates → Cross-Encoder → Top 10 results
The bi-encoder computes query and document embeddings separately (fast, scalable). The cross-encoder processes query-document pairs together (slow, accurate). For a personal knowledge base, you might have 10,000 documents. The bi-encoder narrows that to 100 candidates in milliseconds. The cross-encoder then spends its time on just those 100.
From the Khoj documentation:
“The search engine uses a two-stage retrieval approach: initial candidate retrieval via bi-encoder embeddings, followed by precise reranking using cross-encoder models.”
Configure the bi-encoder confidence threshold to balance recall and precision. Lower thresholds return more documents for the cross-encoder to rerank.
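A minimal sketch of the same two-stage pattern with sentence-transformers; the model names are common defaults, not necessarily what Khoj ships with, and in practice you would precompute the document embeddings once:

```python
# Two-stage retrieval sketch: bi-encoder for recall, cross-encoder for precision.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def two_stage_search(query: str, docs: list[str], candidates: int = 100, top_k: int = 10):
    # Stage 1: embed query and documents separately, keep the closest candidates (fast)
    doc_embs = bi_encoder.encode(docs, convert_to_tensor=True)
    query_emb = bi_encoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, doc_embs, top_k=candidates)[0]

    # Stage 2: score query-document pairs jointly with the cross-encoder (slow, accurate)
    pairs = [(query, docs[hit["corpus_id"]]) for hit in hits]
    scores = cross_encoder.predict(pairs)
    reranked = sorted(zip(hits, scores), key=lambda pair: pair[1], reverse=True)
    return [(docs[hit["corpus_id"]], float(score)) for hit, score in reranked[:top_k]]
```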
RAGFlow’s Hybrid Implementation
RAGFlow uses Elasticsearch for hybrid search, combining keyword and vector retrieval with configurable fusion:
| Setting | Effect |
|---|---|
| Keyword weight 0.7, Vector weight 0.3 | Favor exact matches |
| Keyword weight 0.3, Vector weight 0.7 | Favor semantic understanding |
| 60% keyword hit prerequisite | Require some lexical overlap |
The prerequisite filter requires some word overlap before vector similarity kicks in. Without it, a query about “PostgreSQL” might return documents about “database philosophy” that never mention Postgres at all.
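RAGFlow exposes these as settings rather than code, but the underlying idea is plain weighted score fusion. A rough sketch of that idea (not RAGFlow's implementation), assuming each method returns a raw score per document:

```python
def weighted_fusion(keyword_scores: dict, vector_scores: dict,
                    keyword_weight: float = 0.5) -> list:
    """Blend min-max normalized keyword and vector scores with a configurable weight."""
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    kw, vec = normalize(keyword_scores), normalize(vector_scores)
    combined = {
        doc: keyword_weight * kw.get(doc, 0.0) + (1 - keyword_weight) * vec.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```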
When to Use Each Approach
| Scenario | Best Approach |
|---|---|
| Technical documentation with codes | Keyword-heavy hybrid |
| Journal entries, notes | Semantic-heavy hybrid |
| Mixed content (typical personal KB) | Balanced hybrid |
| Structured data (dates, IDs) | Keyword with metadata filters |
For most personal search use cases, start with equal weights and adjust based on result quality.
Python Implementation
Complete hybrid search with sentence-transformers and SQLite FTS:
```python
import sqlite3
from sentence_transformers import SentenceTransformer
import numpy as np


class HybridSearch:
    def __init__(self, db_path="hybrid.db"):
        self.db = sqlite3.connect(db_path)
        self.model = SentenceTransformer('all-MiniLM-L6-v2')
        self._init_db()

    def _init_db(self):
        self.db.executescript("""
            CREATE TABLE IF NOT EXISTS documents (
                id INTEGER PRIMARY KEY,
                title TEXT,
                content TEXT,
                embedding BLOB
            );
            CREATE VIRTUAL TABLE IF NOT EXISTS documents_fts
            USING fts5(title, content, content='documents', content_rowid='id');
        """)

    def index(self, title: str, content: str):
        embedding = self.model.encode(content)
        cursor = self.db.execute(
            "INSERT INTO documents (title, content, embedding) VALUES (?, ?, ?)",
            (title, content, embedding.tobytes())
        )
        doc_id = cursor.lastrowid
        self.db.execute(
            "INSERT INTO documents_fts (rowid, title, content) VALUES (?, ?, ?)",
            (doc_id, title, content)
        )
        self.db.commit()
        return doc_id

    def search(self, query: str, limit: int = 10, k: int = 60) -> list:
        # Keyword search with BM25 (FTS5's bm25() is lower-is-better, so sort ascending)
        keyword_results = self.db.execute("""
            SELECT rowid, bm25(documents_fts) AS score
            FROM documents_fts
            WHERE documents_fts MATCH ?
            ORDER BY score
            LIMIT ?
        """, (query, limit * 2)).fetchall()

        # Vector search: brute-force cosine similarity over all stored embeddings
        query_vec = self.model.encode(query)
        all_docs = self.db.execute(
            "SELECT id, title, embedding FROM documents"
        ).fetchall()

        vector_scores = []
        for doc_id, title, emb_bytes in all_docs:
            doc_vec = np.frombuffer(emb_bytes, dtype=np.float32)
            score = np.dot(query_vec, doc_vec) / (
                np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)
            )
            vector_scores.append((doc_id, score))

        vector_scores.sort(key=lambda x: x[1], reverse=True)
        vector_results = vector_scores[:limit * 2]

        # Reciprocal Rank Fusion
        rrf_scores = {}
        for rank, (doc_id, _) in enumerate(keyword_results, start=1):
            rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank)
        for rank, (doc_id, _) in enumerate(vector_results, start=1):
            rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank)

        # Get top results with document data
        sorted_ids = sorted(rrf_scores.keys(), key=lambda x: rrf_scores[x], reverse=True)
        results = []
        for doc_id in sorted_ids[:limit]:
            doc = self.db.execute(
                "SELECT title, content FROM documents WHERE id = ?", (doc_id,)
            ).fetchone()
            results.append({
                "id": doc_id,
                "title": doc[0],
                "content": doc[1][:200],
                "score": rrf_scores[doc_id]
            })
        return results


# Usage
search = HybridSearch()

# Index documents
search.index("PostgreSQL Migration", "Steps to migrate from MySQL to PostgreSQL...")
search.index("Database Strategy", "When moving data stores, consider...")
search.index("Error ERR_429", "Rate limit exceeded. Wait 60 seconds...")

# Hybrid search
results = search.search("database migration")
for r in results:
    print(f"{r['score']:.4f} | {r['title']}")
```
Tuning Hybrid Search
Three parameters matter:
| Parameter | Effect | Typical Value |
|---|---|---|
| RRF k | Higher k reduces top-rank dominance | 60 |
| Candidate count | More candidates = better recall | 20-100 per method |
| Weight ratio | Keyword vs semantic balance | Start 50/50 |
Test with queries that failed under single methods. If “ERR_429” doesn’t find the error doc, increase keyword weight. If “rate limiting strategies” misses conceptual matches, increase semantic weight.
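One way to make that loop concrete is a small regression set of previously failing queries, rerun after each parameter change. The sketch below reuses the search instance and documents from the earlier example:

```python
# Queries that single-method search handled badly, paired with the note each should surface.
test_cases = [
    ("ERR_429", "Error ERR_429"),                       # exact identifier
    ("how to handle rate limiting", "Error ERR_429"),   # conceptual phrasing
    ("moving data stores", "Database Strategy"),        # semantic match
]

def hit_rate(search, cases, limit=10):
    hits = 0
    for query, expected_title in cases:
        titles = [r["title"] for r in search.search(query, limit=limit)]
        hits += expected_title in titles
    return hits / len(cases)

print(f"hit@10: {hit_rate(search, test_cases):.0%}")
```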
Integration with RAG
Hybrid search improves retrieval for LLM-powered queries:
```python
def ask(question: str):
    # Hybrid retrieval gets better context
    results = hybrid_search.search(question, limit=5)
    context = "\n\n".join([f"# {r['title']}\n{r['content']}" for r in results])

    prompt = f"""Based on my notes:

{context}

Question: {question}

Answer using only the context above."""

    return llm.complete(prompt)
```
The LLM can only work with what you give it. If retrieval misses the relevant error code doc, the answer will be wrong. Hybrid search catches both the exact code snippets and the conceptual explanations.