Personal Search: Searching Your Own Data
Google searches everyone else’s data. A personal search engine searches yours: journals, notes, tweets, bookmarks, emails, contacts. Everything you’ve written or saved.
Why search your own data
Web search fails for personal queries:
| Question | Web search | Personal search |
|---|---|---|
| “That conversation about compilers with Alex” | Useless | Journal entry, March 2024 |
| “Why did we choose Postgres?” | Stack Overflow | Your decision notes |
| “Ideas I had about the API” | Nothing | Your scratch notes |
| “What did Sarah say about timelines?” | Can’t help | Meeting notes, emails |
Your past self solved problems, made decisions, recorded insights. That knowledge is trapped in scattered files. Personal search makes it accessible.
What to index
Everything you produce or curate:
| Source | Why |
|---|---|
| Notes | Your processed thoughts |
| Journals | Context, emotions, decisions |
| Tweets/posts | Public thinking, reactions |
| Bookmarks | Things you found valuable |
| Contacts | People context |
| Emails (sent) | Commitments, explanations |
| Code comments | Technical decisions |
| Voice memos | Fleeting ideas |
Start with notes and journals. Add sources as you find gaps.
Architecture
Three components:
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Indexer   │───▶│   Search    │───▶│  Interface  │
└─────────────┘     └─────────────┘     └─────────────┘
  Parse files       Query matching      Web UI or CLI
  Extract text      Rank results        Display results
  Store vectors     Return top N        Navigate sources
```
Linus Lee’s Monocle implements this pattern. See Linus Lee’s Custom AI Tools for the full breakdown. Debanjum Singh’s Khoj takes this further with RAG-based chat over your documents.
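One way to pin down the three components is as structural interfaces; a sketch in Python, where the names and signatures are assumptions for illustration, not Monocle's or Khoj's actual APIs:

```python
from typing import Protocol, runtime_checkable


# Illustrative interfaces for the three components; names and signatures
# are assumptions for this sketch, not any particular tool's API.
@runtime_checkable
class Indexer(Protocol):
    def index_file(self, path: str) -> None: ...  # parse, extract, embed, store


@runtime_checkable
class Searcher(Protocol):
    def search(self, query: str, limit: int) -> list: ...  # match, rank, top N


@runtime_checkable
class Interface(Protocol):
    def render(self, results: list) -> None: ...  # web UI or CLI output
```

Keeping the boundaries this clean means you can swap the interface (CLI today, browser extension later) without touching the indexer.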
Semantic vs keyword search
| Type | How it works | Good for | Bad for |
|---|---|---|---|
| Keyword | Exact string matching | “Q3 roadmap”, names, dates | Concepts, fuzzy recall |
| Semantic | Vector similarity | “articles about remote work” | Exact phrases |
Use both. Keyword for precision, semantic for exploration.
```python
# Keyword: exact match
results = search("PostgreSQL migration")

# Semantic: meaning match
results = search("database move", mode="semantic")
# Returns notes about "DB migration", "switching datastores", etc.
```
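The keyword side needs no extra dependencies: most Python builds ship SQLite with the FTS5 full-text extension. A minimal sketch with made-up table and data, assuming your SQLite build includes FTS5:

```python
import sqlite3

# Keyword index via SQLite FTS5: phrase queries, ranked results,
# and it can live in the same database file as your embeddings.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE notes USING fts5(title, content)")
db.execute("INSERT INTO notes VALUES (?, ?)",
           ("migration", "PostgreSQL migration plan for Q3"))
db.execute("INSERT INTO notes VALUES (?, ?)",
           ("standup", "Discussed API versioning with Sarah"))

# Quoted string = exact phrase match; rank orders by relevance
rows = db.execute(
    "SELECT title FROM notes WHERE notes MATCH ? ORDER BY rank",
    ('"PostgreSQL migration"',),
).fetchall()
print(rows)  # [('migration',)]
```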
Building a simple version
SQLite + sentence embeddings. About 50 lines of Python.
```python
import sqlite3
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer


class PersonalSearch:
    def __init__(self, db_path="search.db"):
        self.db = sqlite3.connect(db_path)
        self.model = SentenceTransformer("all-MiniLM-L6-v2")
        self._init_db()

    def _init_db(self):
        self.db.execute("""
            CREATE TABLE IF NOT EXISTS documents (
                id INTEGER PRIMARY KEY,
                title TEXT,
                content TEXT,
                source TEXT,
                embedding BLOB
            )
        """)

    def index_document(self, title: str, content: str, source: str):
        # Embed any text, wherever it came from (file, tweet, bookmark)
        embedding = self.model.encode(content)
        self.db.execute(
            "INSERT INTO documents (title, content, source, embedding) VALUES (?, ?, ?, ?)",
            (title, content, source, embedding.tobytes()),
        )
        self.db.commit()

    def index_file(self, path: Path):
        self.index_document(path.name, path.read_text(), str(path))

    def search(self, query: str, limit: int = 10):
        query_vec = self.model.encode(query)
        # Fetch all and compute cosine similarity (fine for small datasets)
        rows = self.db.execute(
            "SELECT title, content, source, embedding FROM documents"
        ).fetchall()
        results = []
        for title, content, source, emb_bytes in rows:
            doc_vec = np.frombuffer(emb_bytes, dtype=np.float32)
            score = np.dot(query_vec, doc_vec) / (
                np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)
            )
            results.append((score, title, source, content[:200]))
        results.sort(key=lambda r: r[0], reverse=True)
        return results[:limit]


# Usage
search = PersonalSearch()

# Index your notes
for note in Path("~/notes").expanduser().glob("**/*.md"):
    search.index_file(note)

# Search
for score, title, source, snippet in search.search("API design decisions"):
    print(f"{score:.2f} | {title}\n{snippet}\n")
```
For production, use a vector database (ChromaDB, pgvector, sqlite-vss) instead of the brute-force scan above.
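A middle step before a full vector database: keep all document embeddings in one NumPy matrix and score a query with a single matrix product instead of a Python loop. A sketch with toy 2-D vectors standing in for real embeddings:

```python
import numpy as np


def top_k(query_vec, doc_matrix, k=10):
    # Normalize both sides so the dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = docs @ q
    # Indices of the k highest-scoring documents, best first
    idx = np.argsort(scores)[::-1][:k]
    return idx, scores[idx]


docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
idx, scores = top_k(np.array([1.0, 0.1]), docs, k=2)
print(idx.tolist())  # [0, 2]
```

This stays fast into the tens of thousands of documents; beyond that, approximate nearest-neighbor indexes (what vector databases provide) start to pay off.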
AI as thought calculator
Connect search results to an LLM for synthesis:
```python
def ask(question: str):
    results = search.search(question, limit=5)
    context = "\n\n".join(
        f"# {title}\n{content}" for _, title, _, content in results
    )
    prompt = f"""Based on my notes:

{context}

Question: {question}

Answer based only on the notes above."""
    return llm.complete(prompt)  # llm: whatever LLM client you use


# Usage
ask("What were my concerns about the database migration?")
ask("Summarize my thoughts on remote work")
ask("What did I decide about the API versioning?")
```
The LLM becomes a calculator for your thoughts. It doesn’t know anything. It manipulates your knowledge.
Privacy benefits
Personal search runs client-side:
| Cloud search | Personal search |
|---|---|
| Your queries sent to servers | Queries stay local |
| Your data indexed by others | Your data, your index |
| Results shaped by ads | Results shaped by relevance |
| Privacy policy changes | You control everything |
Monocle compiles the entire index at build time. Search runs in-browser. Nothing leaves your machine.
This enables searching sensitive content: journals, therapy notes, financial plans, private conversations. Content you’d never upload to a cloud service.
Getting started
Week 1: Minimal version
```shell
# Install dependencies
pip install sentence-transformers

# Create search.py with the code above

# Index your notes
python -c "
from search import PersonalSearch
from pathlib import Path

s = PersonalSearch()
for f in Path('~/notes').expanduser().glob('**/*.md'):
    s.index_file(f)
print('Indexed')
"
```
Week 2: Add sources
Add Twitter archive, bookmarks, journal entries. Each source needs a parser:
```python
import json


def index_tweets(archive_path: str):
    with open(archive_path) as f:
        tweets = json.load(f)
    for tweet in tweets:
        search.index_document(
            title=f"Tweet {tweet['id']}",
            content=tweet["full_text"],
            source=f"twitter:{tweet['id']}",
        )
```
Week 3: Build interface
Options:
- CLI script with `fzf` for fuzzy selection
- Simple Flask/FastAPI web UI
- Alfred/Raycast plugin
- Browser extension (like Monocle)
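The CLI option can start as a thin argparse wrapper; a sketch assuming the PersonalSearch class above lives in search.py:

```python
import argparse


def build_parser():
    # Hypothetical CLI around the PersonalSearch class defined earlier
    p = argparse.ArgumentParser(prog="psearch",
                                description="Search your personal index")
    p.add_argument("query", help="search query")
    p.add_argument("-n", "--limit", type=int, default=10, help="max results")
    return p


if __name__ == "__main__":
    from search import PersonalSearch  # assumed module from the steps above

    args = build_parser().parse_args()
    for score, title, source, snippet in PersonalSearch().search(args.query, args.limit):
        print(f"{score:.2f} | {title} ({source})")
```

Pipe the output through `fzf` and you get interactive fuzzy selection for free.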
Week 4: Connect to AI
Add LLM synthesis for multi-document queries. Use Claude, GPT, or local models (Ollama).
It gets better over time
Personal search improves as your index grows:
- More content means more answers
- Search patterns reveal gaps in your knowledge
- Retrieved context shapes what you capture next
- Old ideas show up when you actually need them
Your past self becomes useful. Those notes you forgot about? Now they resurface.
Next: Linus Lee’s Custom AI Tools