Vector Databases for Personal RAG
Your notes live in folders. Your brain connects them by meaning.
Vector databases bridge that gap. They store embeddings—numeric representations of text—and find similar content fast. For personal RAG (retrieval-augmented generation), you need one. But which?
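"Similar content" here means vectors pointing in nearly the same direction. A minimal pure-Python sketch with toy 3-dimensional vectors (real embeddings have hundreds of dimensions, but the math is identical):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings; a real model would produce these from your note text
note_garden = [0.9, 0.1, 0.0]
note_plants = [0.8, 0.2, 0.1]
note_taxes  = [0.0, 0.1, 0.9]

print(cosine_similarity(note_garden, note_plants))  # high: related topics
print(cosine_similarity(note_garden, note_taxes))   # low: unrelated
```

Every database below is, at heart, a data structure for answering "which stored vectors maximize this score?" quickly.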
I tested four options against real PKM constraints: cost sensitivity, local-first preference, and the ~10K-100K document scale most personal systems hit.
The Comparison
| Database | Type | Self-host | Cloud | Free tier | Cost at 50K docs | Best for |
|---|---|---|---|---|---|---|
| Chroma | Embedded | ✅ | ❌ | N/A | $0 | Prototyping, small PKM |
| Qdrant | Standalone | ✅ | ✅ | 1GB free | ~$9/mo | Production local-first |
| pgvector | Postgres ext | ✅ | ✅ | Varies | ~$15/mo | Existing Postgres users |
| Pinecone | Managed | ❌ | ✅ | 100K vectors | ~$70/mo | Zero-ops cloud |
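The cost gap in the table makes more sense once you estimate raw storage. A back-of-envelope calculation, assuming one 384-dimensional float32 embedding per document (the output of all-MiniLM-L6-v2, used throughout this post) and ignoring index overhead:

```python
docs = 50_000
dims = 384           # all-MiniLM-L6-v2 output size
bytes_per_float = 4  # float32

raw_mb = docs * dims * bytes_per_float / 1024 / 1024
print(f"{raw_mb:.0f} MB of raw vectors")  # ~73 MB: fits in RAM on any laptop
```

At ~73 MB for 50K documents, you're paying managed-cloud prices for operations, not storage.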
Performance Reality Check
Benchmarks from ANN Benchmarks and Vectorview:
- Queries per second: Qdrant ~6,300; pgvector ~141; Chroma unreported (its in-process Python API adds overhead that makes throughput hard to compare)
- Latency at 99% recall: pgvector 8ms, Pinecone 1ms (batched)
- Memory footprint: Chroma runs in-process, Qdrant needs ~500MB baseline
For personal scale (under 100K vectors), all four work fine. The bottleneck is your embedding model, not the database.
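That claim is easy to sanity-check: at personal scale, even exhaustive brute-force search with no index is workable. A sketch with random vectors (pure Python is the slow path here; NumPy or any ANN index is orders of magnitude faster):

```python
import random
import time

random.seed(42)
dims, n = 384, 10_000  # typical personal-PKM scale
db = [[random.random() for _ in range(dims)] for _ in range(n)]
query = [random.random() for _ in range(dims)]

start = time.perf_counter()
# Exhaustive scan: score the query against every stored vector
scores = [(sum(q * x for q, x in zip(query, vec)), i) for i, vec in enumerate(db)]
top5 = sorted(scores, reverse=True)[:5]
elapsed = time.perf_counter() - start

print(f"Scanned {n} vectors in {elapsed * 1000:.0f} ms")
```

If the naive approach finishes in well under a second, a purpose-built index is never your bottleneck at this scale; embedding the query text is.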
Setup Examples
Chroma (Simplest Start)
```python
# pip install chromadb
import chromadb

client = chromadb.PersistentClient(path="./chroma_data")
collection = client.create_collection("notes")

# Add documents
collection.add(
    documents=["Meeting notes from Monday", "Ideas for the garden project"],
    ids=["note_1", "note_2"]
)

# Query
results = collection.query(
    query_texts=["project planning"],
    n_results=5
)
```
Chroma handles embeddings automatically using sentence-transformers. Your data stays in `./chroma_data`. Done.
Qdrant (Production-Ready Local)
```bash
# Docker one-liner (Docker requires an absolute host path for the volume)
docker run -p 6333:6333 -v "$(pwd)/qdrant_data:/qdrant/storage" qdrant/qdrant
```
```python
# pip install qdrant-client sentence-transformers
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from sentence_transformers import SentenceTransformer

client = QdrantClient("localhost", port=6333)
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Create collection
client.create_collection(
    collection_name="notes",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

# Add documents
docs = ["Meeting notes from Monday", "Ideas for the garden project"]
vectors = encoder.encode(docs)
client.upsert(
    collection_name="notes",
    points=[
        PointStruct(id=i, vector=v.tolist(), payload={"text": doc})
        for i, (v, doc) in enumerate(zip(vectors, docs))
    ]
)

# Query
query_vector = encoder.encode("project planning")
hits = client.search(
    collection_name="notes",
    query_vector=query_vector.tolist(),
    limit=5
)
```
More code, but you get filtering, snapshots, and a web UI at `localhost:6333/dashboard`.
pgvector (If You Already Use Postgres)
```sql
-- Enable extension
CREATE EXTENSION vector;

-- Create table
CREATE TABLE notes (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(384)
);

-- Create index (HNSW for speed)
CREATE INDEX ON notes USING hnsw (embedding vector_cosine_ops);
```
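The index above uses pgvector's HNSW defaults. If recall or build time ever matters, the same index can be declared with its tuning knobs spelled out; the values shown are pgvector's documented defaults, not recommendations:

```sql
-- m: graph connectivity; ef_construction: build-time candidate list size
CREATE INDEX ON notes USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- ef_search: query-time candidate list size; raise it for better recall
SET hnsw.ef_search = 40;
```

At personal scale the defaults are fine; these only start to matter in the hundreds of thousands of vectors.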
```python
# pip install psycopg2-binary sentence-transformers
import psycopg2
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
conn = psycopg2.connect("postgresql://localhost/mydb")

# Insert (the ::vector cast converts the Python list, which psycopg2
# sends as a Postgres array, into a pgvector value)
doc = "Meeting notes from Monday"
embedding = encoder.encode(doc).tolist()
with conn.cursor() as cur:
    cur.execute(
        "INSERT INTO notes (content, embedding) VALUES (%s, %s::vector)",
        (doc, embedding)
    )
conn.commit()

# Query
query_vec = encoder.encode("project planning").tolist()
with conn.cursor() as cur:
    cur.execute("""
        SELECT content, 1 - (embedding <=> %s::vector) AS similarity
        FROM notes
        ORDER BY embedding <=> %s::vector
        LIMIT 5
    """, (query_vec, query_vec))
    results = cur.fetchall()
```
The `<=>` operator is cosine distance. Use `<->` for L2 (Euclidean).
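If your embeddings are normalized to unit length (sentence-transformers can do this with `normalize_embeddings=True`), the two operators produce identical rankings, because squared L2 distance is then a monotone function of cosine distance. A quick check of the identity `||a - b||² = 2 · (1 - cos(a, b))` for unit vectors:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a = normalize([0.3, 0.4, 0.5])
b = normalize([0.1, 0.9, 0.2])

cos = sum(x * y for x, y in zip(a, b))
l2_sq = sum((x - y) ** 2 for x, y in zip(a, b))

# For unit vectors: L2^2 == 2 * (1 - cosine similarity)
print(abs(l2_sq - 2 * (1 - cos)) < 1e-12)
```

So the choice of operator matters mainly when you skip normalization.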
Pinecone (Managed Cloud)
```python
# pip install pinecone-client sentence-transformers
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key="YOUR_API_KEY")
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Create index (one-time)
pc.create_index(
    name="notes",
    dimension=384,
    metric="cosine",
    spec={"serverless": {"cloud": "aws", "region": "us-east-1"}}
)
index = pc.Index("notes")

# Upsert
docs = ["Meeting notes from Monday", "Ideas for the garden project"]
vectors = encoder.encode(docs)
index.upsert(vectors=[
    {"id": f"note_{i}", "values": v.tolist(), "metadata": {"text": doc}}
    for i, (v, doc) in enumerate(zip(vectors, docs))
])

# Query
query_vec = encoder.encode("project planning").tolist()
results = index.query(vector=query_vec, top_k=5, include_metadata=True)
```
Free tier gives you 100K vectors. Beyond that, expect $70+/month.
When to Use Which
Start with Chroma if:
- You’re prototyping or learning
- Your PKM is under 10K documents
- You want zero infrastructure
Choose Qdrant if:
- You want local-first with room to grow
- You need metadata filtering (e.g., search only in #work notes)
- You might migrate to their cloud later
Stick with pgvector if:
- You already run Postgres for other data
- You want transactional consistency (notes + vectors in one commit)
- You’re comfortable with SQL
Pay for Pinecone if:
- You hate ops work
- You need multi-region availability
- Budget isn’t the constraint
For most personal PKM projects, Qdrant or Chroma hits the sweet spot. Both are open source, both run locally, and both scale past what you’ll need.
Hybrid Search Matters
Pure vector search misses exact matches. “PostgreSQL configuration” might return results about “database setup” but miss documents that literally say “PostgreSQL configuration.”
Combine vector similarity with keyword search. Qdrant and Pinecone support this natively. For pgvector, add full-text search:
```sql
-- Add tsvector column
ALTER TABLE notes ADD COLUMN tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
CREATE INDEX ON notes USING gin(tsv);

-- Hybrid query (%s::vector is the query embedding, bound as a parameter)
SELECT content,
       ts_rank(tsv, query) * 0.3 + (1 - (embedding <=> %s::vector)) * 0.7 AS score
FROM notes, to_tsquery('english', 'postgresql & configuration') query
WHERE tsv @@ query
ORDER BY score DESC
LIMIT 5;
```
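The same weighted fusion works outside SQL too. A minimal pure-Python sketch of the 0.3/0.7 blend, using toy scores (in practice the keyword score would come from `ts_rank` or BM25 and the vector score from cosine similarity, both scaled to [0, 1]):

```python
def hybrid_score(keyword_score, vector_score, kw_weight=0.3, vec_weight=0.7):
    """Blend a keyword-match score and a vector-similarity score."""
    return keyword_score * kw_weight + vector_score * vec_weight

# Toy candidates: (doc, keyword_score, vector_score)
candidates = [
    ("PostgreSQL configuration guide", 1.0, 0.6),  # exact keyword hit
    ("Database setup walkthrough",     0.0, 0.9),  # semantic-only match
    ("Gardening notes",                0.0, 0.1),
]

ranked = sorted(candidates, key=lambda c: hybrid_score(c[1], c[2]), reverse=True)
for doc, kw, vec in ranked:
    print(f"{hybrid_score(kw, vec):.2f}  {doc}")
```

Note how the exact keyword hit outranks the stronger semantic match: that is exactly the failure mode the blend is designed to fix.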
See Hybrid Search for the full implementation.
What You Can Steal
- Chroma for weekend projects: `pip install chromadb` and you're searching in 5 minutes.
- Qdrant Docker one-liner: persistent storage, web dashboard, production-grade filtering.
- pgvector HNSW index: don't use IVFFlat for small datasets; HNSW is faster.
- Hybrid scoring formula: `keyword_score * 0.3 + vector_score * 0.7` works well for most queries.
- Cost ceiling awareness: at personal scale, you should spend $0-15/month, not $70+.
Related Reading
- Personal Search Architecture—how vector databases fit into your PKM stack
- Hybrid Search—combining keywords and vectors
Next: Embedding Models for PKM—which model creates the best vectors for your notes.