Hybrid Search: Combining Keyword and Semantic Retrieval
Vector search finds “database migration” when you search “moving data stores.” Keyword search finds “PostgreSQL” when you search “PostgreSQL.” Neither does both well. Hybrid search runs both methods and merges the results.
The Problem with Single Methods
Each approach fails in predictable ways:
| Query | Keyword Search | Vector Search |
|---|---|---|
| “Q3-2024-roadmap.md” | Exact match found | Might rank it lower |
| “thoughts about APIs” | Misses “REST design notes” | Finds semantic matches |
| “error code ERR_429” | Precise hit | Gets lost in embeddings |
| “how to handle rate limiting” | Misses if no exact phrase | Understands intent |
Error codes, filenames, and technical identifiers need exact matching. Conceptual queries need semantic understanding. Your personal knowledge base has both types of content.
How Hybrid Search Works
Run keyword search (BM25) and vector search in parallel, then combine results:
Query: "PostgreSQL migration strategy"
Keyword Search (BM25) Vector Search (Embeddings)
───────────────────── ──────────────────────────
1. postgres-migration.md 1. database-move-plan.md
2. migration-checklist.md 2. postgres-migration.md
3. sql-scripts/migrate.sql 3. switching-databases.md
↓ Fusion ↓
Combined Results (Reciprocal Rank Fusion)
─────────────────────────────────────────
1. postgres-migration.md (appeared in both)
2. database-move-plan.md (strong semantic match)
3. migration-checklist.md (keyword precision)
Documents appearing in both result sets get boosted. Documents with strong signals in either method still surface.
Reciprocal Rank Fusion
RRF merges ranked lists without needing comparable scores. The formula:
RRF_score(d) = Σ 1 / (k + rank_i(d))
Where the sum runs over each result list i that contains document d, k is a constant (typically 60), and rank_i(d) is d's position in list i.
```python
def reciprocal_rank_fusion(results_lists: list[list], k: int = 60) -> list:
    """Combine multiple ranked result lists using RRF."""
    scores = {}
    for results in results_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank)
    # Sort by combined score
    return sorted(scores.keys(), key=lambda x: scores[x], reverse=True)


# Example
keyword_results = ["doc_a", "doc_b", "doc_c"]
vector_results = ["doc_b", "doc_d", "doc_a"]
merged = reciprocal_rank_fusion([keyword_results, vector_results])
# doc_b ranks highest (position 2 in keyword, position 1 in vector)
```
RRF works because it:
- Ignores raw scores (no normalization needed between different search methods)
- Rewards documents found by multiple methods
- Handles missing documents naturally (only contributes to score when present)
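Plugging the earlier doc_a/doc_b example into the formula with k = 60 shows how the boost plays out:

```
doc_b: 1/(60+2) + 1/(60+1) = 0.0161 + 0.0164 = 0.0325   (in both lists)
doc_a: 1/(60+1) + 1/(60+3) = 0.0164 + 0.0159 = 0.0323   (in both lists)
doc_d: 1/(60+2)            = 0.0161                     (vector only)
doc_c: 1/(60+3)            = 0.0159                     (keyword only)
```

doc_b edges out doc_a because its second-place keyword rank beats doc_a's third-place vector rank, and the single-list documents still land in the merged ranking.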
Implementation with pgvector
PostgreSQL with pgvector supports both full-text search and vector similarity:
```sql
-- Create table with both text search and vector columns
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding vector(384),
    content_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

CREATE INDEX ON documents USING gin(content_tsv);
CREATE INDEX ON documents USING ivfflat(embedding vector_cosine_ops);

-- Hybrid search query: rank within each method, then fuse with RRF
WITH keyword_search AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY ts_rank(content_tsv, query) DESC) AS rank
    FROM documents, plainto_tsquery('english', 'database migration') query
    WHERE content_tsv @@ query
    ORDER BY ts_rank(content_tsv, query) DESC
    LIMIT 20
),
vector_search AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS rank
    FROM documents
    ORDER BY embedding <=> $1
    LIMIT 20
),
rrf AS (
    SELECT
        COALESCE(k.id, v.id) AS id,
        COALESCE(1.0 / (60 + k.rank), 0) +
        COALESCE(1.0 / (60 + v.rank), 0) AS score
    FROM keyword_search k
    FULL OUTER JOIN vector_search v ON k.id = v.id
)
SELECT d.*, rrf.score
FROM rrf
JOIN documents d ON d.id = rrf.id
ORDER BY rrf.score DESC
LIMIT 10;
```
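To run that query from application code, you pass the query embedding as the vector parameter. A minimal sketch, assuming psycopg 3 with the pgvector Python adapter and a 384-dimension sentence-transformers model; the file name hybrid_search.sql and the connection string are placeholders, and both $1 placeholders in the SQL are rewritten as %(q)s:

```python
# Sketch: calling the hybrid query from Python (psycopg 3 + pgvector adapter).
import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384 dimensions, matches vector(384)

with psycopg.connect("dbname=notes") as conn:
    register_vector(conn)  # lets numpy arrays bind to vector columns
    embedding = model.encode("PostgreSQL migration strategy")
    sql = open("hybrid_search.sql").read()  # the query above, with $1 -> %(q)s
    for row in conn.execute(sql, {"q": embedding}).fetchall():
        print(row)
```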
Khoj’s Two-Stage Approach
Khoj uses bi-encoder embeddings for initial retrieval, then cross-encoder reranking for precision:
Query → Bi-Encoder → Top 100 candidates → Cross-Encoder → Top 10 results
The bi-encoder computes query and document embeddings separately (fast, scalable). The cross-encoder processes query-document pairs together (slow, accurate). For a personal knowledge base, you might have 10,000 documents. The bi-encoder narrows that to 100 candidates in milliseconds. The cross-encoder then spends its time on just those 100.
From the Khoj documentation:
“The search engine uses a two-stage retrieval approach: initial candidate retrieval via bi-encoder embeddings, followed by precise reranking using cross-encoder models.”
Configure the bi-encoder confidence threshold to balance recall and precision. Lower thresholds return more documents for the cross-encoder to rerank.
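A minimal sketch of the same two-stage pattern with sentence-transformers; the model names are common defaults, not necessarily what Khoj ships with, and in practice you would precompute the document embeddings once:

```python
# Two-stage retrieval sketch: bi-encoder for recall, cross-encoder for precision.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def two_stage_search(query: str, docs: list[str], candidates: int = 100, top_k: int = 10):
    # Stage 1: embed query and documents separately, keep the closest candidates (fast)
    doc_embs = bi_encoder.encode(docs, convert_to_tensor=True)
    query_emb = bi_encoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, doc_embs, top_k=candidates)[0]

    # Stage 2: score query-document pairs jointly with the cross-encoder (slow, accurate)
    pairs = [(query, docs[hit["corpus_id"]]) for hit in hits]
    scores = cross_encoder.predict(pairs)
    reranked = sorted(zip(hits, scores), key=lambda pair: pair[1], reverse=True)
    return [(docs[hit["corpus_id"]], float(score)) for hit, score in reranked[:top_k]]
```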
RAGFlow’s Hybrid Implementation
RAGFlow uses Elasticsearch for hybrid search, combining keyword and vector retrieval with configurable fusion:
| Setting | Effect |
|---|---|
| Keyword weight 0.7, Vector weight 0.3 | Favor exact matches |
| Keyword weight 0.3, Vector weight 0.7 | Favor semantic understanding |
| 60% keyword hit prerequisite | Require some lexical overlap |
The prerequisite filter requires some word overlap before vector similarity kicks in. Without it, a query about “PostgreSQL” might return documents about “database philosophy” that never mention Postgres at all.
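RAGFlow exposes these as settings rather than code, but the underlying idea is plain weighted score fusion. A rough sketch of that idea (not RAGFlow's implementation), assuming each method returns a raw score per document:

```python
def weighted_fusion(keyword_scores: dict, vector_scores: dict,
                    keyword_weight: float = 0.5) -> list:
    """Blend min-max normalized keyword and vector scores with a configurable weight."""
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    kw, vec = normalize(keyword_scores), normalize(vector_scores)
    combined = {
        doc: keyword_weight * kw.get(doc, 0.0) + (1 - keyword_weight) * vec.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```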
When to Use Each Approach
| Scenario | Best Approach |
|---|---|
| Technical documentation with codes | Keyword-heavy hybrid |
| Journal entries, notes | Semantic-heavy hybrid |
| Mixed content (typical personal KB) | Balanced hybrid |
| Structured data (dates, IDs) | Keyword with metadata filters |
For most personal search use cases, start with equal weights and adjust based on result quality.
Python Implementation
Complete hybrid search with sentence-transformers and SQLite FTS:
```python
import sqlite3
from sentence_transformers import SentenceTransformer
import numpy as np


class HybridSearch:
    def __init__(self, db_path="hybrid.db"):
        self.db = sqlite3.connect(db_path)
        self.model = SentenceTransformer('all-MiniLM-L6-v2')
        self._init_db()

    def _init_db(self):
        self.db.executescript("""
            CREATE TABLE IF NOT EXISTS documents (
                id INTEGER PRIMARY KEY,
                title TEXT,
                content TEXT,
                embedding BLOB
            );
            CREATE VIRTUAL TABLE IF NOT EXISTS documents_fts
            USING fts5(title, content, content='documents', content_rowid='id');
        """)

    def index(self, title: str, content: str):
        embedding = self.model.encode(content)
        cursor = self.db.execute(
            "INSERT INTO documents (title, content, embedding) VALUES (?, ?, ?)",
            (title, content, embedding.tobytes())
        )
        doc_id = cursor.lastrowid
        self.db.execute(
            "INSERT INTO documents_fts (rowid, title, content) VALUES (?, ?, ?)",
            (doc_id, title, content)
        )
        self.db.commit()
        return doc_id

    def search(self, query: str, limit: int = 10, k: int = 60) -> list:
        # Keyword search with BM25 (FTS5's bm25() is lower-is-better, so sort ascending)
        keyword_results = self.db.execute("""
            SELECT rowid, bm25(documents_fts) AS score
            FROM documents_fts
            WHERE documents_fts MATCH ?
            ORDER BY score
            LIMIT ?
        """, (query, limit * 2)).fetchall()

        # Vector search: brute-force cosine similarity over all stored embeddings
        query_vec = self.model.encode(query)
        all_docs = self.db.execute(
            "SELECT id, title, embedding FROM documents"
        ).fetchall()

        vector_scores = []
        for doc_id, title, emb_bytes in all_docs:
            doc_vec = np.frombuffer(emb_bytes, dtype=np.float32)
            score = np.dot(query_vec, doc_vec) / (
                np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)
            )
            vector_scores.append((doc_id, score))

        vector_scores.sort(key=lambda x: x[1], reverse=True)
        vector_results = vector_scores[:limit * 2]

        # Reciprocal Rank Fusion
        rrf_scores = {}
        for rank, (doc_id, _) in enumerate(keyword_results, start=1):
            rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank)
        for rank, (doc_id, _) in enumerate(vector_results, start=1):
            rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank)

        # Get top results with document data
        sorted_ids = sorted(rrf_scores.keys(), key=lambda x: rrf_scores[x], reverse=True)
        results = []
        for doc_id in sorted_ids[:limit]:
            doc = self.db.execute(
                "SELECT title, content FROM documents WHERE id = ?", (doc_id,)
            ).fetchone()
            results.append({
                "id": doc_id,
                "title": doc[0],
                "content": doc[1][:200],
                "score": rrf_scores[doc_id]
            })
        return results


# Usage
search = HybridSearch()

# Index documents
search.index("PostgreSQL Migration", "Steps to migrate from MySQL to PostgreSQL...")
search.index("Database Strategy", "When moving data stores, consider...")
search.index("Error ERR_429", "Rate limit exceeded. Wait 60 seconds...")

# Hybrid search
results = search.search("database migration")
for r in results:
    print(f"{r['score']:.4f} | {r['title']}")
```
Tuning Hybrid Search
Three parameters matter:
| Parameter | Effect | Typical Value |
|---|---|---|
| RRF k | Higher k reduces top-rank dominance | 60 |
| Candidate count | More candidates = better recall | 20-100 per method |
| Weight ratio | Keyword vs semantic balance | Start 50/50 |
Test with queries that failed under single methods. If “ERR_429” doesn’t find the error doc, increase keyword weight. If “rate limiting strategies” misses conceptual matches, increase semantic weight.
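One way to make that loop concrete is a small regression set of previously failing queries, rerun after each parameter change. The sketch below reuses the search instance and documents from the earlier example:

```python
# Queries that single-method search handled badly, paired with the note each should surface.
test_cases = [
    ("ERR_429", "Error ERR_429"),                       # exact identifier
    ("how to handle rate limiting", "Error ERR_429"),   # conceptual phrasing
    ("moving data stores", "Database Strategy"),        # semantic match
]

def hit_rate(search, cases, limit=10):
    hits = 0
    for query, expected_title in cases:
        titles = [r["title"] for r in search.search(query, limit=limit)]
        hits += expected_title in titles
    return hits / len(cases)

print(f"hit@10: {hit_rate(search, test_cases):.0%}")
```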
Integration with RAG
Hybrid search improves retrieval for LLM-powered queries:
```python
def ask(question: str):
    # Hybrid retrieval gets better context
    results = hybrid_search.search(question, limit=5)
    context = "\n\n".join([f"# {r['title']}\n{r['content']}" for r in results])

    prompt = f"""Based on my notes:

{context}

Question: {question}

Answer using only the context above."""

    return llm.complete(prompt)
```
The LLM can only work with what you give it. If retrieval misses the relevant error code doc, the answer will be wrong. Hybrid search catches both the exact code snippets and the conceptual explanations.