Vector Databases for Personal RAG
Your notes live in folders. Your brain connects them by meaning.
Vector databases bridge that gap. They store embeddings—numeric representations of text—and find similar content fast. For personal RAG (retrieval-augmented generation), you need one. But which?
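"Similar content" here means vectors pointing in nearly the same direction. A minimal pure-Python sketch with toy 3-dimensional vectors (real embeddings have hundreds of dimensions, but the math is identical):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings; a real model would produce these from your note text
note_garden = [0.9, 0.1, 0.0]
note_plants = [0.8, 0.2, 0.1]
note_taxes  = [0.0, 0.1, 0.9]

print(cosine_similarity(note_garden, note_plants))  # high: related topics
print(cosine_similarity(note_garden, note_taxes))   # low: unrelated
```

Every database below is, at heart, a data structure for answering "which stored vectors maximize this score?" quickly.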
I tested four options against real PKM constraints: cost sensitivity, local-first preference, and the ~10K-100K document scale most personal systems hit.
The Comparison
| Database | Type | Self-host | Cloud | Free tier | Cost at 50K docs | Best for |
|---|---|---|---|---|---|---|
| Chroma | Embedded | ✅ | ❌ | N/A | $0 | Prototyping, small PKM |
| Qdrant | Standalone | ✅ | ✅ | 1GB free | ~$9/mo | Production local-first |
| pgvector | Postgres ext | ✅ | ✅ | Varies | ~$15/mo | Existing Postgres users |
| Pinecone | Managed | ❌ | ✅ | 100K vectors | ~$70/mo | Zero-ops cloud |
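The cost gap in the table makes more sense once you estimate raw storage. A back-of-envelope calculation, assuming one 384-dimensional float32 embedding per document (the output of all-MiniLM-L6-v2, used throughout this post) and ignoring index overhead:

```python
docs = 50_000
dims = 384           # all-MiniLM-L6-v2 output size
bytes_per_float = 4  # float32

raw_mb = docs * dims * bytes_per_float / 1024 / 1024
print(f"{raw_mb:.0f} MB of raw vectors")  # ~73 MB: fits in RAM on any laptop
```

At ~73 MB for 50K documents, you're paying managed-cloud prices for operations, not storage.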
Performance Reality Check
Benchmarks from ANN Benchmarks and Vectorview:
- Queries per second: Qdrant ~6,300; pgvector ~141; Chroma unreported (its in-process Python API adds overhead that makes throughput hard to compare)
- Latency at 99% recall: pgvector 8ms, Pinecone 1ms (batched)
- Memory footprint: Chroma runs in-process, Qdrant needs ~500MB baseline
For personal scale (under 100K vectors), all four work fine. The bottleneck is your embedding model, not the database.
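That claim is easy to sanity-check: at personal scale, even exhaustive brute-force search with no index is workable. A sketch with random vectors (pure Python is the slow path here; NumPy or any ANN index is orders of magnitude faster):

```python
import random
import time

random.seed(42)
dims, n = 384, 10_000  # typical personal-PKM scale
db = [[random.random() for _ in range(dims)] for _ in range(n)]
query = [random.random() for _ in range(dims)]

start = time.perf_counter()
# Exhaustive scan: score the query against every stored vector
scores = [(sum(q * x for q, x in zip(query, vec)), i) for i, vec in enumerate(db)]
top5 = sorted(scores, reverse=True)[:5]
elapsed = time.perf_counter() - start

print(f"Scanned {n} vectors in {elapsed * 1000:.0f} ms")
```

If the naive approach finishes in well under a second, a purpose-built index is never your bottleneck at this scale; embedding the query text is.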
Setup Examples
Chroma (Simplest Start)
```python
# pip install chromadb
import chromadb

client = chromadb.PersistentClient(path="./chroma_data")
collection = client.create_collection("notes")

# Add documents
collection.add(
    documents=["Meeting notes from Monday", "Ideas for the garden project"],
    ids=["note_1", "note_2"]
)

# Query
results = collection.query(
    query_texts=["project planning"],
    n_results=5
)
```
Chroma handles embeddings automatically using sentence-transformers. Your data stays in `./chroma_data`. Done.
Qdrant (Production-Ready Local)
```bash
# Docker one-liner (Docker requires an absolute host path for the volume)
docker run -p 6333:6333 -v "$(pwd)/qdrant_data:/qdrant/storage" qdrant/qdrant
```
```python
# pip install qdrant-client sentence-transformers
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from sentence_transformers import SentenceTransformer

client = QdrantClient("localhost", port=6333)
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Create collection
client.create_collection(
    collection_name="notes",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

# Add documents
docs = ["Meeting notes from Monday", "Ideas for the garden project"]
vectors = encoder.encode(docs)
client.upsert(
    collection_name="notes",
    points=[
        PointStruct(id=i, vector=v.tolist(), payload={"text": doc})
        for i, (v, doc) in enumerate(zip(vectors, docs))
    ]
)

# Query
query_vector = encoder.encode("project planning")
hits = client.search(
    collection_name="notes",
    query_vector=query_vector.tolist(),
    limit=5
)
```
More code, but you get filtering, snapshots, and a web UI at `localhost:6333/dashboard`.
pgvector (If You Already Use Postgres)
```sql
-- Enable extension
CREATE EXTENSION vector;

-- Create table
CREATE TABLE notes (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(384)
);

-- Create index (HNSW for speed)
CREATE INDEX ON notes USING hnsw (embedding vector_cosine_ops);
```
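The index above uses pgvector's HNSW defaults. If recall or build time ever matters, the same index can be declared with its tuning knobs spelled out; the values shown are pgvector's documented defaults, not recommendations:

```sql
-- m: graph connectivity; ef_construction: build-time candidate list size
CREATE INDEX ON notes USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- ef_search: query-time candidate list size; raise it for better recall
SET hnsw.ef_search = 40;
```

At personal scale the defaults are fine; these only start to matter in the hundreds of thousands of vectors.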
```python
# pip install psycopg2-binary sentence-transformers
import psycopg2
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
conn = psycopg2.connect("postgresql://localhost/mydb")

# Insert (the ::vector cast converts the Python list, which psycopg2
# sends as a Postgres array, into a pgvector value)
doc = "Meeting notes from Monday"
embedding = encoder.encode(doc).tolist()
with conn.cursor() as cur:
    cur.execute(
        "INSERT INTO notes (content, embedding) VALUES (%s, %s::vector)",
        (doc, embedding)
    )
conn.commit()

# Query
query_vec = encoder.encode("project planning").tolist()
with conn.cursor() as cur:
    cur.execute("""
        SELECT content, 1 - (embedding <=> %s::vector) AS similarity
        FROM notes
        ORDER BY embedding <=> %s::vector
        LIMIT 5
    """, (query_vec, query_vec))
    results = cur.fetchall()
```
The `<=>` operator is cosine distance. Use `<->` for L2 (Euclidean).
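If your embeddings are normalized to unit length (sentence-transformers can do this with `normalize_embeddings=True`), the two operators produce identical rankings, because squared L2 distance is then a monotone function of cosine distance. A quick check of the identity `||a - b||² = 2 · (1 - cos(a, b))` for unit vectors:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a = normalize([0.3, 0.4, 0.5])
b = normalize([0.1, 0.9, 0.2])

cos = sum(x * y for x, y in zip(a, b))
l2_sq = sum((x - y) ** 2 for x, y in zip(a, b))

# For unit vectors: L2^2 == 2 * (1 - cosine similarity)
print(abs(l2_sq - 2 * (1 - cos)) < 1e-12)
```

So the choice of operator matters mainly when you skip normalization.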
Pinecone (Managed Cloud)
```python
# pip install pinecone-client sentence-transformers
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key="YOUR_API_KEY")
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Create index (one-time)
pc.create_index(
    name="notes",
    dimension=384,
    metric="cosine",
    spec={"serverless": {"cloud": "aws", "region": "us-east-1"}}
)
index = pc.Index("notes")

# Upsert
docs = ["Meeting notes from Monday", "Ideas for the garden project"]
vectors = encoder.encode(docs)
index.upsert(vectors=[
    {"id": f"note_{i}", "values": v.tolist(), "metadata": {"text": doc}}
    for i, (v, doc) in enumerate(zip(vectors, docs))
])

# Query
query_vec = encoder.encode("project planning").tolist()
results = index.query(vector=query_vec, top_k=5, include_metadata=True)
```
Free tier gives you 100K vectors. Beyond that, expect $70+/month.
When to Use Which
Start with Chroma if:
- You’re prototyping or learning
- Your PKM is under 10K documents
- You want zero infrastructure
Choose Qdrant if:
- You want local-first with room to grow
- You need metadata filtering (e.g., search only in #work notes)
- You might migrate to their cloud later
Stick with pgvector if:
- You already run Postgres for other data
- You want transactional consistency (notes + vectors in one commit)
- You’re comfortable with SQL
Pay for Pinecone if:
- You hate ops work
- You need multi-region availability
- Budget isn’t the constraint
For most personal PKM projects, Qdrant or Chroma hits the sweet spot. Both are open source, both run locally, and both scale past what you’ll need.
Hybrid Search Matters
Pure vector search misses exact matches. “PostgreSQL configuration” might return results about “database setup” but miss documents that literally say “PostgreSQL configuration.”
Combine vector similarity with keyword search. Qdrant and Pinecone support this natively. For pgvector, add full-text search:
```sql
-- Add tsvector column
ALTER TABLE notes ADD COLUMN tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
CREATE INDEX ON notes USING gin(tsv);

-- Hybrid query (%s::vector is the query embedding, bound as a parameter)
SELECT content,
       ts_rank(tsv, query) * 0.3 + (1 - (embedding <=> %s::vector)) * 0.7 AS score
FROM notes, to_tsquery('english', 'postgresql & configuration') query
WHERE tsv @@ query
ORDER BY score DESC
LIMIT 5;
```
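The same weighted fusion works outside SQL too. A minimal pure-Python sketch of the 0.3/0.7 blend, using toy scores (in practice the keyword score would come from `ts_rank` or BM25 and the vector score from cosine similarity, both scaled to [0, 1]):

```python
def hybrid_score(keyword_score, vector_score, kw_weight=0.3, vec_weight=0.7):
    """Blend a keyword-match score and a vector-similarity score."""
    return keyword_score * kw_weight + vector_score * vec_weight

# Toy candidates: (doc, keyword_score, vector_score)
candidates = [
    ("PostgreSQL configuration guide", 1.0, 0.6),  # exact keyword hit
    ("Database setup walkthrough",     0.0, 0.9),  # semantic-only match
    ("Gardening notes",                0.0, 0.1),
]

ranked = sorted(candidates, key=lambda c: hybrid_score(c[1], c[2]), reverse=True)
for doc, kw, vec in ranked:
    print(f"{hybrid_score(kw, vec):.2f}  {doc}")
```

Note how the exact keyword hit outranks the stronger semantic match: that is exactly the failure mode the blend is designed to fix.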
See Hybrid Search for the full implementation.
What You Can Steal
- Chroma for weekend projects: `pip install chromadb` and you're searching in 5 minutes.
- Qdrant Docker one-liner: persistent storage, web dashboard, production-grade filtering.
- pgvector HNSW index: don't use IVFFlat for small datasets; HNSW is faster.
- Hybrid scoring formula: `keyword_score * 0.3 + vector_score * 0.7` works well for most queries.
- Cost ceiling awareness: at personal scale, you should spend $0-15/month, not $70+.
Related Reading
- Personal Search Architecture—how vector databases fit into your PKM stack
- Hybrid Search—combining keywords and vectors
Next: Embedding Models for PKM—which model creates the best vectors for your notes.