Episodic Memory for LLM Agents


Your agent knows facts. It follows rules. But ask it “what happened last Tuesday?” and you get nothing. That’s the episodic memory gap.

Three Memory Types

The CoALA framework (Sumers et al., 2023) distinguishes three types of long-term memory for language agents, alongside short-term working memory:

Type	What it stores	Example	Implementation
Semantic	General facts	“User prefers dark mode”	RAG, knowledge bases
Procedural	How to do things	“Run tests before commit”	CLAUDE.md rules
Episodic	Specific events	“Tuesday we debugged auth”	Session logs, diaries

Most agent systems implement semantic and procedural memory well. Episodic memory remains underbuilt.

Why Episodic Matters

A February 2025 position paper, “Episodic Memory is the Missing Piece for Long-Term LLM Agents” (Pink et al.), argues episodic memory does things the other types cannot:

Single-shot learning. You tell an agent once that a particular API returns pagination tokens. It should remember that specific interaction, not generalize it into a rule first.

Contextual retrieval. “What did we try when the build broke?” pulls relevant episodes even without exact keyword matches. You retrieve by context, not just content.

Temporal grounding. “Before the refactor” vs “after we added caching” changes what’s relevant. Episodic memory knows when things happened.

Five Properties

The position paper identifies five properties that episodic memory must have:

Property	Description
Long-term storage	Persist across sessions and context windows
Explicit reasoning	Reflect on and query memories directly
Single-shot learning	Capture experiences from single exposures
Instance-specific	Store particular events, not generalizations
Contextualized	Bind when, where, why to each memory

Working memory (the context window) has the last four but lacks long-term storage. Semantic memory has long-term storage and explicit reasoning but lacks instance specificity and context binding.
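
To make the five properties concrete, here is a minimal sketch of a record type that binds when, where, and why to one specific event. The `Episode` class and its field names are hypothetical illustrations, not something the position paper defines.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class Episode:
    """One specific event stored with its context (hypothetical schema)."""
    what: str                   # instance-specific: the particular event
    when: datetime              # contextualized: when it happened
    where: str                  # contextualized: project, module, or session
    why: str                    # contextualized: the goal or trigger
    tags: tuple[str, ...] = ()  # optional labels to help later retrieval

    def to_json(self) -> str:
        """Serialize for long-term storage outside the context window."""
        record = asdict(self)
        record["when"] = self.when.isoformat()
        return json.dumps(record)


# A single exposure is enough to create the record (single-shot learning).
episode = Episode(
    what="Fixed race condition in auth flow",
    when=datetime(2026, 1, 21, 14, 30, tzinfo=timezone.utc),
    where="myproject/auth",
    why="Login tests were flaky",
    tags=("auth", "debugging"),
)
line = episode.to_json()
```

Keeping the record immutable (`frozen=True`) reflects the idea that an episode describes something that already happened; later reflection should add new records rather than rewrite old ones.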

Implementation Approaches

Session Logging

Store raw conversation logs. Query them later with vector search.

# Directory structure
~/.claude/projects/myproject/sessions/
├── 2026-01-15-auth-debug.jsonl
├── 2026-01-18-perf-optimization.jsonl
└── 2026-01-21-feature-deploy.jsonl

Pros: Simple, complete record. Cons: Noisy, expensive to search at scale.
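
A minimal sketch of session logging, assuming one JSON object per line (the `.jsonl` extension above suggests this layout). The helper names `append_turn` and `read_turns` and the record fields are assumptions for illustration, not the format of any particular tool.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_turn(log_path: Path, role: str, content: str) -> None:
    """Append one conversation turn as a single JSON line."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "content": content,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_turns(log_path: Path) -> list[dict]:
    """Load every turn from a session log, oldest first."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo in a temporary directory so nothing is written to a real project.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "sessions" / "2026-01-15-auth-debug.jsonl"
    append_turn(log, "user", "Login fails after token refresh")
    append_turn(log, "assistant", "The refresh handler races with logout")
    turns = read_turns(log)
```

Append-only writes keep the log cheap to produce; the cost shows up later, when every query has to sift through all of it.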

Structured Diaries

Lance Martin’s Claude Diary takes a different approach. Instead of logging everything, it captures structured summaries:

# Session: 2026-01-21

## Accomplished
- Fixed race condition in auth flow
- Added retry logic to API client

## Decisions
- Chose exponential backoff over fixed delay
- Kept timeout at 30s despite suggestion to increase

## Challenges
- Mock server didn't match production behavior
- Test flakiness from shared state

The /diary command captures these at session end. The /reflect command later analyzes patterns across entries.
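
Because the entries follow a fixed layout, they are easy to read back into structured data for later analysis. The parser below is a sketch written against the example entry above; `parse_diary` is a hypothetical helper, not part of Claude Diary.

```python
def parse_diary(text: str) -> dict[str, list[str]]:
    """Split a diary entry into {section heading: [bullet, ...]}."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # A second-level heading opens a new section.
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            # Bullets belong to the most recent section.
            sections[current].append(line[2:].strip())
    return sections


entry = """# Session: 2026-01-21

## Accomplished
- Fixed race condition in auth flow
- Added retry logic to API client

## Decisions
- Chose exponential backoff over fixed delay
- Kept timeout at 30s despite suggestion to increase

## Challenges
- Mock server didn't match production behavior
- Test flakiness from shared state
"""
parsed = parse_diary(entry)
```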

Episodic Memory MCP

The episodic-memory plugin provides searchable storage:

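# Install (illustrative: `claude mcp add` also expects the server's launch
# command or URL, so check the plugin's README for the exact invocation)
claude mcp add episodic-memory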

# Searches past conversations
claude "What approach did we use for rate limiting?"

The plugin indexes session logs into a SQLite database with embeddings. Queries return relevant conversation snippets with timestamps and project context.
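
To illustrate the general idea of a SQLite index with embeddings, here is a small sketch using Python's standard `sqlite3` module. The table layout, column names, and helper functions are assumptions made for this example; the plugin's actual schema may look quite different.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for real persistence
conn.execute(
    """CREATE TABLE snippets (
           id        INTEGER PRIMARY KEY,
           project   TEXT NOT NULL,
           ts        TEXT NOT NULL,   -- ISO 8601 timestamp, sorts as text
           text      TEXT NOT NULL,
           embedding TEXT NOT NULL    -- JSON-encoded list of floats
       )"""
)


def index_snippet(project, ts, text, embedding):
    """Store one conversation snippet with its embedding vector."""
    conn.execute(
        "INSERT INTO snippets (project, ts, text, embedding) VALUES (?, ?, ?, ?)",
        (project, ts, text, json.dumps(embedding)),
    )


def load_project(project):
    """Return (ts, text, embedding) rows for a project, oldest first."""
    rows = conn.execute(
        "SELECT ts, text, embedding FROM snippets WHERE project = ? ORDER BY ts",
        (project,),
    )
    return [(ts, text, json.loads(emb)) for ts, text, emb in rows]


# The two-element vectors are placeholders, not real embeddings.
index_snippet("myproject", "2026-01-18T10:00:00", "Tuned rate limiting", [0.1, 0.9])
index_snippet("myproject", "2026-01-15T09:00:00", "Debugged auth race condition", [0.8, 0.2])
rows = load_project("myproject")
```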

Retrieval Patterns

Vector Search

Embed queries and memories. Return semantically similar episodes.

# Pseudocode
query_embedding = embed("debugging authentication")
results = vector_db.search(query_embedding, top_k=5)

Good for “find similar situations.” Can miss exact terms and names, and has no notion of time unless you add a date filter.
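
The pseudocode can be turned into something runnable with a toy embedding. In the sketch below, `embed` is a stand-in that just counts words, so it only matches shared vocabulary; a real system would call an embedding model instead. The ranking step (cosine similarity, then top-k) is the same either way.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, episodes: list[str], top_k: int = 5):
    """Return the top_k (score, episode) pairs, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(e)), e) for e in episodes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


episodes = [
    "debugging authentication token refresh",
    "optimizing database query performance",
    "deploying the search feature",
]
results = search("debugging authentication", episodes, top_k=2)
```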

Hybrid Search

Combine vector similarity with keyword matching:

Method	Good For
Vector only	Conceptual similarity
Keyword only	Exact terms, names
Hybrid	Most real queries
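
One simple way to combine the two signals is a weighted sum. The sketch below assumes each candidate already has a vector similarity in [0, 1] from some embedding model; the scores shown are made-up illustrative values, and `alpha` is a hypothetical tuning weight. Production systems often use other fusion schemes, so treat this as one option rather than the standard method.

```python
def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear verbatim in the text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_rank(query, candidates, alpha=0.5):
    """Rank (text, vector_score) pairs by a weighted blend of both signals.

    alpha=1.0 is vector only, alpha=0.0 is keyword only.
    """
    scored = [
        (alpha * vec + (1 - alpha) * keyword_score(query, text), text)
        for text, vec in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Illustrative vector scores, not output from a real model.
candidates = [
    ("login kept failing after token refresh", 0.82),   # conceptually close
    ("fixed AuthMiddleware timeout in staging", 0.40),  # exact name match
    ("updated README badges", 0.05),
]
ranked = hybrid_rank("AuthMiddleware timeout", candidates, alpha=0.5)
```

With vector scores alone, the first candidate would win; the keyword component lifts the episode that actually mentions `AuthMiddleware` to the top.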

Temporal Filters

Add date ranges to narrow results:

# Find episodes from before the refactor
results = search(
    query="performance issues",
    before="2026-01-15"
)
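
A runnable version of that call might look like the sketch below. The in-memory `EPISODES` list, the substring matching, and the exclusive date bounds are all assumptions chosen to keep the example short; a real store would push the date filter into its database query.

```python
from datetime import date

# Hypothetical in-memory store; a real system would query a database.
EPISODES = [
    {"date": date(2026, 1, 8), "text": "Profiled performance issues in the report endpoint"},
    {"date": date(2026, 1, 12), "text": "Renamed config keys"},
    {"date": date(2026, 1, 15), "text": "Refactor landed; performance issues moved to the cache layer"},
    {"date": date(2026, 1, 20), "text": "New performance issues after adding caching"},
]


def search(query, before=None, after=None):
    """Return episodes containing every query term, dated strictly
    after `after` and strictly before `before` (ISO date strings)."""
    terms = query.lower().split()
    before_d = date.fromisoformat(before) if before else None
    after_d = date.fromisoformat(after) if after else None
    hits = []
    for ep in EPISODES:
        if before_d is not None and ep["date"] >= before_d:
            continue  # the cutoff day itself is excluded
        if after_d is not None and ep["date"] <= after_d:
            continue
        text = ep["text"].lower()
        if all(term in text for term in terms):
            hits.append(ep)
    return sorted(hits, key=lambda ep: ep["date"])


# Find episodes from before the refactor
results = search(query="performance issues", before="2026-01-15")
```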

What to Store

Not everything deserves episodic storage. Focus on:

Store	Skip
Decisions and their reasoning	Routine file reads
Debugging sessions	Standard completions
User corrections	Successful outputs
Failed approaches	Intermediate steps
Context that influenced choices	Boilerplate generation

Store episodes that inform future decisions, not a complete transcript.
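
The table can be applied as a simple allowlist at write time. This sketch assumes each event already carries a `kind` label; the label names are hypothetical, and in practice assigning them is the hard part.

```python
# Hypothetical event kinds, mirroring the "Store" column above.
STORE_KINDS = {
    "decision",
    "debugging",
    "user_correction",
    "failed_approach",
    "context",
}


def should_store(event: dict) -> bool:
    """Keep an event only if its kind is on the allowlist."""
    return event.get("kind") in STORE_KINDS


events = [
    {"kind": "file_read", "text": "Read src/auth.py"},
    {"kind": "decision", "text": "Chose exponential backoff over fixed delay"},
    {"kind": "user_correction", "text": "Keep timeout at 30s"},
    {"kind": "boilerplate", "text": "Generated test scaffold"},
]
kept = [e["text"] for e in events if should_store(e)]
```

An allowlist fails closed: an event with an unknown or missing kind is skipped rather than stored.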

From Episodes to Rules

Episodic memory feeds procedural memory. The pattern:

1. Store specific episodes (episodic)
2. Identify recurring patterns across episodes
3. Synthesize into rules (procedural)
4. Archive source episodes

Claude Diary’s /reflect command automates steps 2 and 3. It finds patterns like “user always requests atomic commits” and proposes a CLAUDE.md rule.

Episodes (raw experiences)
    ↓ reflection
Patterns (identified themes)
    ↓ synthesis
Rules (procedural memory)

This matches how humans learn: specific experiences first, abstractions later.
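
The promotion step can be sketched as a frequency count. This is not how Claude Diary implements reflection; here exact string matching stands in for the pattern-finding that a model would normally do, and the `min_sessions` threshold is an assumed parameter.

```python
from collections import defaultdict


def propose_rules(episodes, min_sessions=3):
    """Return observations seen in at least `min_sessions` distinct sessions."""
    sessions_by_observation = defaultdict(set)
    for ep in episodes:
        for observation in ep["observations"]:
            # Count sessions, not mentions, so one noisy session cannot
            # promote a rule on its own.
            sessions_by_observation[observation].add(ep["session"])
    return sorted(
        obs
        for obs, sessions in sessions_by_observation.items()
        if len(sessions) >= min_sessions
    )


episodes = [
    {"session": "2026-01-15", "observations": ["user requests atomic commits"]},
    {"session": "2026-01-18", "observations": ["user requests atomic commits", "tests were flaky"]},
    {"session": "2026-01-21", "observations": ["user requests atomic commits"]},
]
candidate_rules = propose_rules(episodes)
```

Each candidate is still only a proposal; a person (or a second review pass) decides whether it becomes a CLAUDE.md rule.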

Getting Started

Start simple:

Enable session logging. Let episodes accumulate for a week.
Install episodic-memory MCP. Query past sessions when you’re stuck.
Add the diary pattern. Summarize sessions that matter.
Run reflection monthly. Look for patterns to promote to rules.

Remove stale episodes. Strengthen useful rules. Repeat.

Trade-offs

Approach	Storage	Query Speed	Completeness
Raw logs	High	Slow	Total
Structured diaries	Medium	Fast	Curated
Embeddings only	Low	Fast	Lossy
Hybrid	Medium	Medium	Balanced

For most personal systems, structured diaries plus vector search over the summaries is enough. Keep raw logs only if you need them for compliance or debugging.