Simon Willison's Workflow

Simon Willison co-created the Django web framework and has been blogging about web development since 2002. After co-founding Lanyrd (acquired by Eventbrite) and working as an engineering director, he now builds open source tools for working with data and LLMs. His blog is one of the most useful resources for practical AI development.

Simon Willison does something that sounds paranoid until you try it: he logs every LLM interaction to a SQLite database. Every prompt, every response, timestamped and searchable.

At first I thought this was overkill. Now I think it might be the most underrated AI workflow pattern out there.

The tool collection

Willison built a bunch of small CLI tools that work together, each doing one thing well: llm for talking to models from the terminal, files-to-prompt for bundling code into a single prompt, ttok for counting tokens, and shot-scraper for taking screenshots of web pages.

They all connect through SQLite. That’s the trick — one query and you can find every time you asked about a specific error message, what the AI said, and whether it actually helped.

Why CLI over ChatGPT

The web interface works fine for most people. Willison uses the terminal instead, and the reason isn’t speed or convenience — it’s the logging.

pip install llm
llm install llm-anthropic   # plugin that adds the Claude models
llm keys set anthropic

# Ask a question
llm "explain this error" -m claude-3.5-sonnet

# Or pipe something in
cat error.log | llm "what's causing this?"

Every interaction is logged to a SQLite database (run llm logs path to see where it lives). You can query it:

sqlite3 "$(llm logs path)" "SELECT prompt, response
  FROM responses
  WHERE prompt LIKE '%django%'
  ORDER BY datetime_utc DESC"

Six months later, when you hit the same error again, you can find what you asked before and whether it helped. I’ve started doing this and the historical context is genuinely useful. See LLM Logging Guide for setup instructions.

Throwaway projects in Claude Artifacts

Here’s something counterintuitive: Willison builds a lot of small tools he never ships. Throwaway projects in Claude Artifacts, just to learn how something works.

“Create an interactive SVG timeline from git log. Use D3.js.”

Two minutes later: working timeline. You learn D3 scales, SVG positioning, date parsing. Then you delete it, because it was never meant to be a product. The learning was the point.

This is different from vibe coding. You’re not trying to ship. You’re using AI to accelerate understanding of a library or technique you’ll use properly later.

TIL as a personal knowledge base

Willison has hundreds of “Today I Learned” entries. Short markdown files, no polish required:

til/
├── python/
│   └── sqlite-json-extract.md
├── javascript/
│   └── optional-chaining.md
└── llm/
    └── anthropic-prompt-caching.md

The workflow is dead simple: learn something, write it down, commit. No editing, no perfectionism. The point is capturing while it’s fresh.

echo "# SQLite JSON functions
You can extract JSON without Python:
\`\`\`sql
SELECT json_extract(data, '$.name') FROM records;
\`\`\`" > til/sqlite/json-extract.md

git add til/ && git commit -m "TIL: SQLite JSON" && git push

Six months later, you search your own TIL directory instead of Stack Overflow. The answer is in your own words, for your own context. See TIL System Guide for how to set this up.
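
Searching it is just grep. A minimal sketch, assuming the directory layout shown above:

grep -ril "json_extract" til/
# -> til/python/sqlite-json-extract.md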

Git worktrees for risky experiments

Same pattern as Boris Cherny, but Willison explicitly uses it for experiments he might throw away:

git worktree add ../datasette-feature feature/new-export
cd ../datasette-feature
# Let AI work here, main branch is untouched

If it works, merge. If not, git worktree remove and it’s gone. No messy revert commits.
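
Cleanup is two commands; the path and branch name are the ones from the example above:

git worktree remove ../datasette-feature
git branch -D feature/new-export   # only if you're abandoning the experiment entirely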

Bundling context with files-to-prompt

One of his most useful tools. Bundle your entire codebase into a single prompt:

files-to-prompt src/*.py | llm "find security issues in this codebase"

Check token count first so you don’t blow your budget:

files-to-prompt src/ | ttok
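
If you want that check to gate the call automatically, a small wrapper does it. A sketch, assuming a rough 100,000-token budget (ttok prints a single number):

count=$(files-to-prompt src/ | ttok)
if [ "$count" -lt 100000 ]; then
  files-to-prompt src/ | llm "find security issues in this codebase"
else
  echo "Prompt too large: $count tokens"
fi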

The underlying philosophy

Willison is explicit about treating LLMs as autocomplete, not oracle. They hallucinate confidently. Test everything.

Include actual docs when working with APIs:

cat anthropic-api-docs.md | llm "write code using these exact patterns"

Be specific in prompts. “Add type hints, docstring with examples, handle empty input” — not “improve this function.”
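
From the CLI that looks something like this (utils.py is a stand-in for whatever file you're working on):

cat utils.py | llm "Add type hints, a docstring with examples, and handle empty input. Return only the updated code."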

Examples that stuck with me

Automated weeknotes:

git log --since="1 week ago" --format="%s" > commits.txt
ls -lt til/**/*.md | head -n 5 > recent-tils.txt   # ** needs bash's globstar (shopt -s globstar) to match nested dirs
cat commits.txt recent-tils.txt | llm "write my weeknotes"

Your git history and TILs become a first draft of what you accomplished.

Screenshot diffs with shot-scraper:

shot-scraper https://myapp.com/feature -o before.png
# Make changes...
shot-scraper https://myapp.com/feature -o after.png
llm "compare these screenshots" -a before.png -a after.png

Visual regression testing without the infrastructure.

What I adopted

The logging is the big one. I resisted it for months — felt like overkill. Then I hit the same obscure error I’d solved before and couldn’t remember how. Now everything goes to SQLite.

The TIL habit took longer to stick. The trick was lowering the bar: no polish, no editing, just capture. Most entries are three lines. That’s fine.

Start with pip install llm, one API key, and the commitment to use it for a week instead of the web interface. After a week, check your log count:

sqlite3 "$(llm logs path)" "SELECT COUNT(*) FROM responses"

That’s how many times AI helped you. It’s also how many answers you can search later.


Next: Getting Started

Topics: workflow open-source ai-coding prompting