Max Woolf's Selective AI Approach

Max Woolf builds tools that help millions of people generate AI text. He created gpt-2-simple, aitextgen, and simpleaichat. As a Senior Data Scientist at BuzzFeed, he shipped AI-generated quizzes and content tools to massive audiences.

Here’s the twist: he barely uses generative LLMs for his own work.

Background

GitHub | Blog | LinkedIn | Bluesky

The Selective Use Philosophy

Most AI evangelists use LLMs constantly. Woolf takes the opposite stance: use them only where they provide real value, and be honest about the limitations.

His criteria for when to use LLMs:

Use Case                        Why It Works
----------------------------    --------------------------------------------------
Classification at scale         80% accuracy is fine when humans review edge cases
Semantic clustering             Groups similar items without predefined categories
Style guide compliance          Checks against rules with cited reasoning
Critical feedback simulation    Stress-tests ideas before publishing

What he explicitly avoids:

Avoided Use           Why
------------------    -----------------------------------------
Writing blog posts    Ethical authorship concerns, recency bias
Coding assistants     Context switching destroys focus
Vibe coding           Unprofessional for production systems
Companionship/chat    “No fix for the lying”

API-First, Not Chat-First

Woolf accesses LLMs through backend APIs, not ChatGPT or Claude.ai. The reasoning: more control over parameters, better reproducibility, and cleaner integration.

His default settings:

# Woolf's preferred configuration
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=16,     # a category name needs only a few tokens
    temperature=0.0,   # deterministic outputs
    system="You are a classifier. Return only the category name.",
    messages=[{"role": "user", "content": text}],
)

Temperature at 0 forces greedy decoding. The model always picks the highest-probability token. Less creative, but more predictable for classification tasks.
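
A toy illustration of that collapse, with made-up logits rather than a real model call: temperature divides the logits before the softmax, so as it approaches zero the probability mass concentrates on the single top token.

import numpy as np

logits = np.array([2.0, 1.0, 0.5])  # toy scores for three candidate tokens

def token_probs(logits, temperature):
    scaled = logits / max(temperature, 1e-9)  # guard keeps temperature=0 finite
    exps = np.exp(scaled - scaled.max())      # numerically stable softmax
    return exps / exps.sum()

print(token_probs(logits, 1.0))   # ~[0.63, 0.23, 0.14]: sampling can pick any token
print(token_probs(logits, 0.01))  # ~[1.00, 0.00, 0.00]: effectively greedy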

System prompts over user prompts. Constraints belong in the system message where they’re treated as authoritative, not suggestions.

simpleaichat: Minimal Wrapper

simpleaichat is Woolf’s Python package for LLM interactions. The design goal: minimal code complexity, maximum control.

from simpleaichat import AIChat

# The system prompt is the only required configuration; the API key is read from the environment.
ai = AIChat(system="You are a helpful assistant.")
response = ai("What is 2+2?")  # instances are callable and return the reply text

Features that matter:

What it deliberately lacks:

If you need those features, add them yourself. The library stays small.

Production Patterns at BuzzFeed

Woolf documented several real-world applications from his BuzzFeed work:

Taxonomy classification:

# Classify articles into predefined categories
def classify_article(title, content):
    prompt = f"""Classify this article into one category:
- Entertainment
- News
- Shopping
- Food

Title: {title}
Content excerpt: {content[:500]}

Return only the category name."""

    return ai(prompt, params={"temperature": 0.0})

Gets 80% of the way to a working solution. Human reviewers handle the edge cases.
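
A minimal sketch of that review loop (VALID_CATEGORIES and review_queue are illustrative assumptions, not documented BuzzFeed code): accept a label only if it is in the taxonomy, and route anything else to a person.

VALID_CATEGORIES = {"Entertainment", "News", "Shopping", "Food"}

def classify_with_review(title, content, review_queue):
    label = classify_article(title, content).strip()
    if label in VALID_CATEGORIES:
        return label
    # Off-taxonomy output is exactly the edge case a human should see.
    review_queue.append({"title": title, "raw_label": label})
    return None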

Style guide checking:

def check_style(text, guidelines):
    prompt = f"""Check this text against these guidelines:
{guidelines}

Text: {text}

For each violation, cite the specific guideline."""

    return ai(prompt)

Returns violations with reasoning, not just pass/fail.
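
To make those violations machine-readable, one option (a sketch; the JSON shape is an assumption, not Woolf's documented format) is to request structured output and parse it:

import json

def check_style_json(text, guidelines):
    prompt = f"""Check this text against these guidelines:
{guidelines}

Text: {text}

Return a JSON array of objects with keys "guideline" and "violation".
Return [] if there are no violations."""

    # json.loads raises on malformed output; retry or fall back to the prose version.
    return json.loads(ai(prompt, params={"temperature": 0.0}))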

The Skeptic’s Checklist

Woolf maintains a list of LLM limitations he considers unsolved:

  1. Hallucination remains unfixed. LLMs confidently state false information. Critical for any factual use case.

  2. Recency bias in training data. Models don’t know about recent library changes or API updates.

  3. Library version confusion. LLMs suggest functions from library versions other than the one installed, causing import errors or silently changed behavior.

  4. Focus destruction from inline suggestions. Copilot and similar tools interrupt the coding flow.

  5. Agents are incremental. MCP and tool use are useful but not the revolution some claim.

His verification pattern:

# Before trusting any LLM code suggestion
# 1. Check the function actually exists
python -c "from library import suggested_function"

# 2. Check the signature matches what the suggestion assumes
python -c "import inspect, library; print(inspect.signature(library.suggested_function))"

# 3. Run against known test cases
pytest test_specific_function.py
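
Step 3 presumes a test file with known input/output pairs; a minimal sketch (test_specific_function.py and the expected values are hypothetical):

# test_specific_function.py
from library import suggested_function

def test_known_cases():
    # Pin the behavior the LLM claimed before letting the suggestion ship.
    assert suggested_function("known input") == "known output"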

Text Embeddings Over Generation

Woolf’s recent work focuses on embeddings rather than generation. His argument: embeddings are more useful and less prone to hallucination.

From his blog post on embeddings with Parquet and Polars:

import polars as pl

# Store embeddings portably (texts and embeddings computed elsewhere)
df = pl.DataFrame({
    "text": texts,
    "embedding": embeddings  # one list of floats per row
})
df.write_parquet("embeddings.parquet")

# Load and search
df = pl.read_parquet("embeddings.parquet")
# Compute cosine similarity (sketch below)

No vector database required for casual projects. Parquet files are portable, fast, and don’t need a running service.
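
Filling in that final comment with a sketch (the embed() helper and query string are assumptions; it also assumes the vectors can be unit-normalized): once rows are unit vectors, cosine similarity is a single matrix-vector product.

import numpy as np
import polars as pl

df = pl.read_parquet("embeddings.parquet")

# Stack the list-of-floats column into an (n_rows, dim) matrix and normalize rows.
matrix = np.asarray(df["embedding"].to_list(), dtype=np.float32)
matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)

query_vec = np.asarray(embed("search text"), dtype=np.float32)  # hypothetical embed()
query_vec /= np.linalg.norm(query_vec)

scores = matrix @ query_vec  # cosine similarity of unit vectors = dot product
top5 = df.with_columns(pl.Series("score", scores)).sort("score", descending=True).head(5)

At casual-project scale this brute-force pass is fast enough that a dedicated vector service buys little.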

Key Takeaways

Principle                     Implementation
--------------------------    ------------------------------------------------
Use LLMs selectively          Classification and clustering, not generation
API over chat interface       Temperature=0, system prompts, structured output
Verify everything             Functions exist, signatures match, tests pass
Embeddings over generation    More useful, less hallucination risk
Stay skeptical                No fix for lying, agents overhyped

Next: Ariya Hidayat’s Anti-Framework Approach

Topics: ai-coding open-source automation workflow