# Episodic Memory for LLM Agents
Your agent knows facts. It follows rules. But ask it “what happened last Tuesday?” and you get nothing. That’s the episodic memory gap.
## Three Memory Types
The CoALA framework (Sumers et al. 2023) defines three memory types for language agents:
| Type | What it stores | Example | Implementation |
|---|---|---|---|
| Semantic | General facts | “User prefers dark mode” | RAG, knowledge bases |
| Procedural | How to do things | “Run tests before commit” | CLAUDE.md rules |
| Episodic | Specific events | “Tuesday we debugged auth” | Session logs, diaries |
Most agent systems implement semantic and procedural memory well. Episodic memory remains underbuilt.
## Why Episodic Matters
A February 2025 position paper, “Episodic Memory is the Missing Piece for Long-Term LLM Agents” (Pink et al.), argues episodic memory does things the other types cannot:
Single-shot learning. You tell an agent once that a particular API returns pagination tokens. It should remember that specific interaction, not generalize it into a rule first.
Contextual retrieval. “What did we try when the build broke?” pulls relevant episodes even without exact keyword matches. You retrieve by context, not just content.
Temporal grounding. “Before the refactor” vs “after we added caching” changes what’s relevant. Episodic memory knows when things happened.
## Five Properties
The position paper identifies five properties that episodic memory must have:
| Property | Description |
|---|---|
| Long-term storage | Persist across sessions and context windows |
| Explicit reasoning | Reflect on and query memories directly |
| Single-shot learning | Capture experiences from single exposures |
| Instance-specific | Store particular events, not generalizations |
| Contextualized | Bind when, where, why to each memory |
Working memory (the context window) has the last four but lacks long-term storage. Semantic memory has long-term storage and explicit reasoning but lacks instance specificity and context binding.
## Implementation Approaches
### Session Logging
Store raw conversation logs. Query them later with vector search.
```text
# Directory structure
~/.claude/projects/myproject/sessions/
├── 2026-01-15-auth-debug.jsonl
├── 2026-01-18-perf-optimization.jsonl
└── 2026-01-21-feature-deploy.jsonl
```
**Pros:** Simple, complete record. **Cons:** Noisy, expensive to search at scale.
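As a sketch of the raw-log approach, each JSONL line can hold one conversation turn with a timestamp. The field names (`ts`, `role`, `content`) are illustrative, not the format Claude actually writes:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def log_turn(session_file: Path, role: str, content: str) -> dict:
    """Append one conversation turn to a JSONL session log."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "content": content,
    }
    with session_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Append-only JSONL keeps writes cheap and crash-safe; the cost shows up later, at query time.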
### Structured Diaries
Lance Martin’s Claude Diary takes a different approach. Instead of logging everything, it captures structured summaries:
```markdown
# Session: 2026-01-21
## Accomplished
- Fixed race condition in auth flow
- Added retry logic to API client
## Decisions
- Chose exponential backoff over fixed delay
- Kept timeout at 30s despite suggestion to increase
## Challenges
- Mock server didn't match production behavior
- Test flakiness from shared state
```
The `/diary` command captures these at session end. The `/reflect` command later analyzes patterns across entries.
### Episodic Memory MCP
The episodic-memory plugin provides searchable storage:
```bash
# Install
claude mcp add episodic-memory

# Searches past conversations
claude "What approach did we use for rate limiting?"
```
The plugin indexes session logs into a SQLite database with embeddings. Queries return relevant conversation snippets with timestamps and project context.
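To make the idea concrete, here is a minimal sketch of an episode store in SQLite. The schema and function names are assumptions for illustration, not the plugin's actual design, and a plain substring search stands in for the embedding-based ranking:

```python
import sqlite3

def init_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the episode table. Schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS episodes (
               id      INTEGER PRIMARY KEY,
               ts      TEXT NOT NULL,   -- ISO-8601 timestamp
               project TEXT NOT NULL,   -- project context
               snippet TEXT NOT NULL    -- conversation excerpt
           )"""
    )
    return conn

def add_episode(conn, ts: str, project: str, snippet: str) -> None:
    conn.execute(
        "INSERT INTO episodes (ts, project, snippet) VALUES (?, ?, ?)",
        (ts, project, snippet),
    )

def search_episodes(conn, term: str) -> list:
    """Substring match; the real plugin ranks with embeddings instead."""
    return conn.execute(
        "SELECT ts, project, snippet FROM episodes "
        "WHERE snippet LIKE ? ORDER BY ts DESC",
        (f"%{term}%",),
    ).fetchall()
```

Storing timestamp and project alongside each snippet is what lets results come back with the temporal and project context the plugin promises.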
## Retrieval Patterns
### Vector Search
Embed queries and memories. Return semantically similar episodes.
```python
# Pseudocode
query_embedding = embed("debugging authentication")
results = vector_db.search(query_embedding, top_k=5)
```
Good for “find similar situations.” Misses exact matches and temporal queries.
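A runnable toy version of the pseudocode above, with a bag-of-words counter standing in for a real embedding model and brute-force cosine similarity standing in for a vector database:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real system calls an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(query: str, memories: list, top_k: int = 5) -> list:
    """Rank stored episodes by similarity to the query."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:top_k]
```

The toy version shares the real weakness noted above: an episode phrased as "login kept returning 401" scores zero against "debugging authentication" because no words overlap, which is exactly what dense embeddings fix.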
### Hybrid Search
Combine vector similarity with keyword matching:
| Method | Good For |
|---|---|
| Vector only | Conceptual similarity |
| Keyword only | Exact terms, names |
| Hybrid | Most real queries |
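One common way to blend the two signals is a weighted sum. Below, exact-keyword overlap provides the precise half, and character-trigram Jaccard similarity is a deliberately crude stand-in for the semantic half (production systems typically combine BM25 with embedding scores):

```python
def hybrid_score(query: str, doc: str, alpha: float = 0.5) -> float:
    """Blend exact-keyword overlap with a fuzzy similarity signal."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    # Precise half: fraction of query terms appearing verbatim in the doc.
    keyword = len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

    def trigrams(s: str) -> set:
        return {s[i:i + 3] for i in range(len(s) - 2)}

    # Fuzzy half: character-trigram Jaccard, a toy proxy for embeddings.
    qt, dt = trigrams(query.lower()), trigrams(doc.lower())
    fuzzy = len(qt & dt) / len(qt | dt) if qt | dt else 0.0
    return alpha * keyword + (1 - alpha) * fuzzy
```

The `alpha` knob is the whole trade-off in one number: push it toward 1 for exact terms and names, toward 0 for conceptual similarity.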
### Temporal Filters
Add date ranges to narrow results:
```python
# Find episodes from before the refactor
results = search(
    query="performance issues",
    before="2026-01-15",
)
```
## What to Store
Not everything deserves episodic storage. Focus on:
| Store | Skip |
|---|---|
| Decisions and their reasoning | Routine file reads |
| Debugging sessions | Standard completions |
| User corrections | Successful outputs |
| Failed approaches | Intermediate steps |
| Context that influenced choices | Boilerplate generation |
Store episodes that inform future decisions, not a complete transcript.
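The store/skip table can be enforced with a small gate at logging time. The event shape here (`{"type": ..., "user_intervened": ...}`) is hypothetical; adapt it to whatever your logger emits:

```python
STORE = {"decision", "debugging", "user_correction", "failed_approach"}
SKIP = {"file_read", "completion", "boilerplate", "intermediate_step"}

def worth_storing(event: dict) -> bool:
    """Keep episodes that inform future decisions; drop routine noise."""
    kind = event.get("type")
    if kind in STORE:
        return True
    if kind in SKIP:
        return False
    # Unknown event types: keep them only if the user stepped in,
    # since corrections usually carry decision-relevant context.
    return bool(event.get("user_intervened"))
```

Filtering at write time is what keeps the later retrieval problem tractable: a smaller, higher-signal store beats a complete transcript for almost every query.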
## From Episodes to Rules
Episodic memory feeds procedural memory. The pattern:

1. Store specific episodes (episodic)
2. Identify recurring patterns across episodes
3. Synthesize the patterns into rules (procedural)
4. Archive the source episodes

Claude Diary's `/reflect` command automates steps 2 and 3. It finds patterns like "user always requests atomic commits" and proposes a CLAUDE.md rule.
```text
Episodes (raw experiences)
    ↓ reflection
Patterns (identified themes)
    ↓ synthesis
Rules (procedural memory)
```
This matches how humans learn: specific experiences first, abstractions later.
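The reflection step above can be sketched as a frequency threshold over tagged themes. This assumes episodes carry pre-tagged theme lists, which is a simplification; the real `/reflect` command analyzes free-text diary entries:

```python
from collections import Counter

def reflect(episodes: list, min_count: int = 3) -> list:
    """Promote themes that recur across episodes into rule proposals."""
    counts = Counter(t for ep in episodes for t in ep.get("themes", []))
    return [
        f"Propose CLAUDE.md rule: {theme}"
        for theme, n in counts.most_common()
        if n >= min_count
    ]
```

The threshold is the point of the exercise: one occurrence is an episode, three occurrences is a candidate rule.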
## Getting Started
Start simple:
- Enable session logging. Let episodes accumulate for a week.
- Install episodic-memory MCP. Query past sessions when you’re stuck.
- Add the diary pattern. Summarize sessions that matter.
- Run reflection monthly. Look for patterns to promote to rules.
Remove stale episodes. Strengthen useful rules. Repeat.
## Trade-offs
| Approach | Storage | Query Speed | Completeness |
|---|---|---|---|
| Raw logs | High | Slow | Total |
| Structured diaries | Medium | Fast | Curated |
| Embeddings only | Low | Fast | Lossy |
| Hybrid | Medium | Medium | Balanced |
For most personal systems: structured diaries plus vector search over summaries. Keep raw logs if you need them for compliance or debugging.
## Links
- Position Paper: Episodic Memory for LLM Agents (Pink et al. 2025)
- CoALA: Cognitive Architectures for Language Agents (Sumers et al. 2023)
- episodic-memory plugin
- Claude Diary