Eugene Yan's Personal AI Workflow

Eugene Yan is a Principal Applied Scientist at Amazon building recommendation systems and AI-powered products at scale. But outside work, he builds personal AI tools — and that’s where it gets interesting. He’s shipped Obsidian-Copilot, news-agents, an AI Coach, and writes extensively about practical LLM patterns.

What makes Eugene’s approach notable: he treats personal tools as learning vehicles. Each prototype explores a different AI pattern — RAG, multi-agent orchestration, evaluation. The byproduct is useful software. The real output is knowledge that applies to his day job.

Obsidian-Copilot: RAG for your notes

The question Eugene wanted to answer: “What would a copilot for writing and thinking look like?”

Obsidian-Copilot uses retrieval-augmented generation on your personal notes. Give it a section header, it drafts paragraphs pulling from relevant notes. Write a daily journal, it helps reflect on the week.

The interesting technical choice: chunking by top-level bullets instead of token length.

def chunk_by_bullets(lines, min_chunk_lines=3):
    """Chunk a note into top-level bullets, each prefixed with its section header."""
    chunks, chunk_idx = {}, 0
    current_chunk, current_header = [], None

    for line in lines:
        if line.startswith('##'):  # Section header
            current_header = line

        if line.startswith('- '):  # Top-level bullet starts a new chunk
            if len(current_chunk) >= min_chunk_lines:
                chunks[chunk_idx] = current_chunk
                chunk_idx += 1
            current_chunk = []
            if current_header:
                current_chunk.append(current_header)
            current_chunk.append(line)
        elif current_chunk:
            current_chunk.append(line)  # Sub-bullets stay with their parent bullet

    if len(current_chunk) >= min_chunk_lines:
        chunks[chunk_idx] = current_chunk  # Flush the final chunk
    return chunks
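
To make the chunking concrete, here is a tiny made-up note run through the function above; the content is purely illustrative:

note = [
    "## Retrieval",
    "- BM25 catches exact keyword matches",
    "  - cheap, fast, no model needed",
    "- Embeddings catch paraphrases",
]
print(chunk_by_bullets(note, min_chunk_lines=2))
# {0: ['## Retrieval', '- BM25 catches exact keyword matches', '  - cheap, fast, no model needed'],
#  1: ['## Retrieval', '- Embeddings catch paraphrases']}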

Why? Eugene found token-based chunking “didn’t work very well.” Bullet-based chunks preserve logical units of thought. Each chunk is a top-level bullet plus its sub-bullets — roughly paragraph-length, but semantically complete.

For retrieval, he combines BM25 + semantic search. Another lesson from experimentation: embedding-based retrieval alone isn’t enough. Classical keyword search catches things embeddings miss.

The architecture: OpenSearch for BM25, e5-small-v2 for embeddings (384 dimensions, small and fast), FastAPI server, Obsidian plugin for the UI.
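Here is a minimal sketch of that hybrid setup, assuming chunk embeddings are precomputed with e5-small-v2 and kept alongside an OpenSearch keyword index; the endpoint, index name, field name, and rank merging are illustrative assumptions, not the plugin's exact code:

import numpy as np
from opensearchpy import OpenSearch  # BM25 over the keyword index
from sentence_transformers import SentenceTransformer  # e5-small-v2, 384-dim embeddings

client = OpenSearch(hosts=["http://localhost:9200"])  # assumed local endpoint
model = SentenceTransformer("intfloat/e5-small-v2")

def hybrid_search(query, doc_ids, doc_embeddings, k=10):
    """Merge BM25 hits with embedding hits; either one alone misses things."""
    # Keyword side: a standard BM25 match query (index and field names are assumptions)
    bm25_hits = client.search(
        index="obsidian-notes",
        body={"query": {"match": {"chunk": query}}, "size": k},
    )["hits"]["hits"]
    bm25_ids = [hit["_id"] for hit in bm25_hits]

    # Semantic side: cosine similarity against precomputed, normalized chunk embeddings
    # (e5 expects "query: " / "passage: " prefixes at encode time)
    q = model.encode(f"query: {query}", normalize_embeddings=True)
    sims = doc_embeddings @ q
    semantic_ids = [doc_ids[i] for i in np.argsort(-sims)[:k]]

    # Naive merge: interleave the two ranked lists and de-duplicate
    merged = []
    for pair in zip(bm25_ids, semantic_ids):
        for doc_id in pair:
            if doc_id not in merged:
                merged.append(doc_id)
    return merged[:k]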

News-agents: Multi-agent orchestration

News-agents generates daily news recaps. It’s built on Amazon Q CLI and MCP (Model Context Protocol), but the real lesson is the multi-agent pattern:

Main Agent (coordinator)
├── Read feeds.txt
├── Split feeds into 3 chunks
├── Spawn 3 Sub-Agents (separate tmux panes)
│   ├── Sub-Agent #1 → processes HackerNews, WSJ Tech
│   ├── Sub-Agent #2 → processes TechCrunch, AI News
│   └── Sub-Agent #3 → processes Wired, WSJ Markets
└── Combine into main-summary.md

Each news feed has its own RSS parser. The MCP server exposes them as tools:

from mcp.server.fastmcp import FastMCP  # Python MCP SDK; fetch/parse/format helpers live elsewhere in the repo

mcp = FastMCP("news-agents")  # server name is illustrative

@mcp.tool()
async def get_hackernews_stories(
    feed_url: str = DEFAULT_HN_RSS_URL, count: int = 30
) -> str:
    """Get top stories from Hacker News."""
    rss_content = await fetch_hn_rss(feed_url)
    stories = parse_hn_rss(rss_content)
    return "\n---\n".join([format_hn_story(s) for s in stories[:count]])

The sub-agents run in visible tmux panes so you can watch them work. When all finish, the main agent reads individual summaries and synthesizes a cross-source analysis: trending topics, category distribution, overlapping themes.
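Those visible sub-agents are ordinary tmux panes. A rough sketch of the spawning step, where the q CLI invocation and prompts are illustrative assumptions rather than the repo's exact commands:

import subprocess

feeds = open("feeds.txt").read().splitlines()
batch_size = -(-len(feeds) // 3)  # ceil division: split feeds into 3 contiguous chunks
batches = [feeds[i:i + batch_size] for i in range(0, len(feeds), batch_size)]

# One tmux pane per sub-agent so you can watch each one work
subprocess.run(["tmux", "new-session", "-d", "-s", "news"], check=True)
for i, batch in enumerate(batches):
    prompt = f"Summarize these feeds and write sub-summary-{i}.md: {', '.join(batch)}"
    subprocess.run(
        ["tmux", "split-window", "-t", "news", f'q chat "{prompt}"'],  # assumed q CLI usage
        check=True,
    )
subprocess.run(["tmux", "select-layout", "-t", "news", "tiled"], check=True)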

Practical output: a daily digest categorized by AI/ML, Business, Technology, with statistics like “Total Items: 124, Sources: 6, AI/ML mentions: 31 (25%)”.

The prototyping philosophy

Eugene maintains a prototyping page with 13+ projects. The common thread: each explores a specific pattern with real utility.

This is deliberate practice for AI engineering. You learn more building something real than reading papers about it.

Patterns for LLM Systems

Eugene’s most influential piece is Patterns for Building LLM-based Systems & Products — 50k+ reads. The seven patterns:

  1. Evals — measure before you optimize
  2. RAG — add external knowledge
  3. Fine-tuning — specialize for tasks
  4. Caching — reduce latency/cost
  5. Guardrails — ensure output quality
  6. Defensive UX — anticipate errors gracefully
  7. User feedback — build data flywheel

The key insight: these patterns aren’t mutually exclusive. You need evals before you know if fine-tuning helps. You need caching after you’ve validated with RAG. They stack.
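As one concrete illustration of a pattern in that stack, here is a minimal exact-match cache in front of an OpenAI-style chat client; the client, keying scheme, and in-memory store are assumptions made for the sketch:

import hashlib
import json

_cache = {}  # in practice this would be Redis, SQLite, or similar

def cached_completion(client, model, messages):
    """Exact-match cache: identical prompts skip the LLM call entirely."""
    key = hashlib.sha256(
        json.dumps([model, messages], sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        resp = client.chat.completions.create(model=model, messages=messages)
        _cache[key] = resp.choices[0].message.content
    return _cache[key]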

He expanded this into Applied LLMs, co-authored with practitioners from Google, Anthropic, Microsoft, and more.

Weekly reflection workflow

Obsidian-Copilot includes a reflection feature. If you journal daily, it can summarize your week:

“Based on your entries, you spent significant time on Project X but expressed frustration about blocked dependencies. Consider: what can you unblock yourself? Your energy seemed highest on Tuesday when you did deep work in the morning.”

This is personal AI at its best — not generic advice, but insights grounded in your actual life.
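
A sketch of the underlying pattern, assuming daily notes named by date and an OpenAI-style client; the file layout and prompt are illustrative, not the plugin's actual implementation:

from datetime import date, timedelta
from pathlib import Path

def weekly_reflection(client, vault="~/vault/daily", model="gpt-4o-mini"):
    """Concatenate the last 7 daily notes and ask for a grounded reflection."""
    entries = []
    for i in range(7):
        day = date.today() - timedelta(days=i)
        path = Path(vault).expanduser() / f"{day.isoformat()}.md"  # assumed naming scheme
        if path.exists():
            entries.append(f"## {day.isoformat()}\n{path.read_text()}")
    prompt = (
        "Here are my journal entries for the week:\n\n" + "\n\n".join(entries) +
        "\n\nSummarize what I worked on, where I was blocked, and when my energy "
        "was highest. Ground every observation in a specific entry."
    )
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content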

Topics: personal-ai rag obsidian prototyping agents