Matt Rickard's Approach to Structured LLM Output

Matt Rickard spent years at Google building the infrastructure tools developers actually use—minikube, skaffold, Kubeflow, distroless containers. The kind of projects that get millions of downloads but nobody knows who made them.

Now he’s obsessed with a different problem: getting reliable structure out of language models.

The Core Insight

Here’s his argument: LLMs are probabilistic. They generate tokens one at a time, each choice influenced by everything before it. But most applications need structure. JSON. Dates. Yes/no answers. Specific formats.

His solution? Constrain the generation as it happens.

ReLLM does this with regex. You give it a pattern like [0-9]{2}/[0-9]{2}/[0-9]{4} for a date, and it masks every token that would break that pattern. The model literally cannot generate invalid output.

Here’s what that looks like in practice:

Prompt: Return the first three letters of the alphabet in a JSON array:
Pattern: ["[a-z]", "[a-z]", "[a-z]"]

With ReLLM: ["a", "b", "c"]
Without: { "index": 0, "id":"1", "description":"", "text": "[{ "id": 0...

The unconstrained version goes completely off the rails. The constrained version has no choice but to give you exactly what you asked for.
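
Under the hood the trick is token-level masking. Here’s a minimal sketch of the idea, not ReLLM’s actual code, assuming Hugging Face transformers and the third-party regex package (which supports partial matches):

```python
# Minimal sketch of regex-constrained greedy decoding, in the spirit of ReLLM.
# Assumptions: the `transformers` and `regex` packages; GPT-2 as the model.
import regex
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def constrained_generate(prompt: str, pattern: str, max_new_tokens: int = 20) -> str:
    compiled = regex.compile(pattern)
    generated = ""  # text produced after the prompt
    for _ in range(max_new_tokens):
        ids = tokenizer(prompt + generated, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[0, -1]  # scores for the next token
        # Mask every token that would make the output stop matching the pattern.
        mask = torch.full_like(logits, float("-inf"))
        for token_id in range(logits.shape[0]):
            candidate = generated + tokenizer.decode([token_id])
            # partial=True accepts prefixes that could still grow into a full match
            if compiled.fullmatch(candidate, partial=True):
                mask[token_id] = 0.0
        next_id = int(torch.argmax(logits + mask))
        generated += tokenizer.decode([next_id])
        if compiled.fullmatch(generated):  # stop once the pattern is satisfied
            break
    return generated

print(constrained_generate("Today's date is ", r"[0-9]{2}/[0-9]{2}/[0-9]{4}"))
```

The brute-force scan over the whole vocabulary at every step is slow, and a production implementation would want to cache those checks, but the guarantee is the same: invalid tokens simply never get sampled.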

Why This Matters

Traditional approaches to structured output, like prompt instructions, few-shot examples, and parse-and-retry loops, all waste tokens and still fail. Matt’s approach eliminates the problem at the source.

He extended this thinking with ParserLLM, which uses context-free grammars instead of regex. Now you can constrain output to any valid JSON, any programming language syntax, any formal structure.
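
To make grammar-based constraints concrete, here’s an illustrative context-free grammar for JSON written for the Lark parsing library (an assumption for this sketch; it isn’t necessarily the parser ParserLLM uses). The snippet only validates finished text, while a ParserLLM-style decoder would consult the grammar at every step and mask any token that can’t extend a valid partial parse:

```python
# Illustrative only: a context-free grammar for JSON, the kind of spec a
# grammar-constrained decoder can enforce. Uses the Lark library (an assumption).
from lark import Lark

json_grammar = r"""
    ?value: dict
          | list
          | string
          | SIGNED_NUMBER      -> number
          | "true"             -> true
          | "false"            -> false
          | "null"             -> null

    list : "[" [value ("," value)*] "]"
    dict : "{" [pair ("," pair)*] "}"
    pair : string ":" value
    string : ESCAPED_STRING

    %import common.ESCAPED_STRING
    %import common.SIGNED_NUMBER
    %import common.WS
    %ignore WS
"""

parser = Lark(json_grammar, start="value")

# A finished string either parses or it doesn't. A grammar-constrained decoder
# runs the same check incrementally, so malformed JSON can never be emitted.
print(parser.parse('["a", "b", "c"]').pretty())
```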

The Daily Blogging Practice

Matt’s been publishing every single day since May 2021. Over 800 posts. Short, dense observations about AI, engineering, and startups.

The posts read like someone thinking in public. No fluff, no SEO padding. Just observations.

He treats writing like compounding. Most individual posts don’t matter much. But the cumulative effect of publicly working through ideas builds something.

Personal AI Tools

Matt builds the tools he actually uses, collected in his Standard Input suite.

These aren’t trying to be everything. They’re small, focused utilities that solve specific problems.

He shared his daily LLM workflows in a 2023 post.

Notice what’s missing from those workflows: he’s not trying to make AI write his blog posts. He’s using it for mechanical tasks that have clear structure but would be tedious to do manually.

Browser-Based Inference

Another thread in his work: running models locally, in the browser, without servers.

@react-llm provides React hooks for WebGPU-based inference. Load a model once, run it anywhere. No API keys, no network round trips, no per-request costs.

LLaMaTab takes this further—a Chrome extension that runs a full language model in your browser.

The vision here is independence. Not everything needs to hit an API. Local models give you privacy, speed, and control. Even small models become useful when you constrain their output properly.

Lessons From His Writing

A few principles that show up repeatedly:

Constrain, don’t hope. Whether it’s LLM output or system design, explicit constraints beat implicit assumptions.

Think step by step. Chain-of-thought works because putting reasoning on paper (or in tokens) forces clarity. He applies this to his own thinking.

Style is tangible but hard to describe. He fine-tuned GPT-3 on his own blog posts. It didn’t work well—but the exercise revealed how much style matters in ways that are hard to specify.

The next choice is sometimes obvious, sometimes not. LLMs show you token probabilities. Some sequences have one clear next step. Others are wide open. Good judgment means knowing which situation you’re in.

Where To Start

  1. Read through his archive—pick topics that interest you; they’re all short
  2. Play with ReLLM if you’re building structured output pipelines
  3. Try @react-llm for browser-based inference experiments

His work is less about flashy AI products and more about solid engineering that makes AI reliable. The kind of infrastructure thinking you’d expect from someone who built minikube and skaffold.