Omar Khattab's DSPy Framework

Omar Khattab is an Assistant Professor at MIT EECS and CSAIL who created DSPy, a framework that treats language models as programmable components rather than text generators requiring manual prompts. His research at Stanford produced both DSPy and ColBERT, two open-source projects downloaded millions of times monthly and adopted by Google, Amazon, IBM, and Databricks.
Khattab’s core insight: prompts are the assembly language of AI. Writing them by hand doesn’t scale. DSPy lets you write normal Python code with declarative specifications, then compiles optimized prompts automatically.
Background
- PhD in Computer Science from Stanford (advised by Matei Zaharia and Christopher Potts)
- Apple Scholars in AI/ML PhD Fellow
- Research Scientist at Databricks (post-PhD)
- Creator of ColBERT retrieval model, which shaped modern neural information retrieval
- DSPy paper: ICLR 2024 Spotlight
The DSPy Programming Model
DSPy introduces three abstractions that replace prompt engineering:
| Abstraction | Purpose | Replaces |
|---|---|---|
| Signatures | Declare input/output behavior | Prompt templates |
| Modules | Composable LLM operations | Chain-of-thought scripts |
| Optimizers | Automatic prompt/weight tuning | Manual iteration |
Signatures
Signatures are declarative specs, not prompts. You define what you want, not how to ask for it:
import dspy

# Simple: one input, one typed output
classify = dspy.Predict('sentence -> sentiment: bool')
# Multiple inputs and outputs, with types
qa = dspy.Predict('context, question -> reasoning: str, answer: str')
# Field names carry meaning: "question" differs from "query"
The runtime materializes these into prompts or fine-tuned weights. Your code stays clean.
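The same contract can also be written as a class-based signature when you want a docstring and per-field descriptions. A minimal sketch (the GenerateAnswer name, docstring, and desc strings are illustrative):
class GenerateAnswer(dspy.Signature):
    """Answer the question using the supplied context."""
    context = dspy.InputField(desc="passages that may contain the answer")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="a short, factual answer")

qa = dspy.Predict(GenerateAnswer)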
Modules
Modules wrap prompting techniques. Swap implementations without changing logic:
# Basic prediction
predict = dspy.Predict('question -> answer')
# Same signature, adds reasoning step
cot = dspy.ChainOfThought('question -> answer')
# Same signature, uses tools
agent = dspy.ReAct('question -> answer', tools=[search, calculator])
Built-in modules:
| Module | Behavior |
|---|---|
| dspy.Predict | Direct LLM call |
| dspy.ChainOfThought | Adds reasoning before output |
| dspy.ProgramOfThought | Generates and executes code |
| dspy.ReAct | Agent loop with tools |
| dspy.MultiChainComparison | Runs multiple chains, picks best |
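Calling a module requires a configured language model. A minimal usage sketch, assuming a LiteLLM-style model identifier your provider supports (the model name is a placeholder):
import dspy

dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))  # placeholder model id

cot = dspy.ChainOfThought('question -> answer')
result = cot(question="What is the capital of France?")
print(result.reasoning)  # intermediate reasoning added by ChainOfThought
print(result.answer)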
Optimizers
DSPy compiles your program against a metric. No manual prompt tuning:
# Define what "good" means (DSPy metrics take an optional trace argument)
def accuracy(example, prediction, trace=None):
    return example.answer == prediction.answer

# Compile with optimizer
optimizer = dspy.MIPROv2(metric=accuracy)
optimized_program = optimizer.compile(
    my_program,
    trainset=training_examples,
)
Available optimizers:
| Optimizer | Method |
|---|---|
| BootstrapFewShot | Auto-selects demonstrations |
| MIPROv2 | Optimizes instructions + examples |
| BootstrapFinetune | Fine-tunes model weights |
| GRPO | Online reinforcement learning |
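Whichever optimizer you choose, score the compiled program on a held-out set. A short sketch with dspy.Evaluate, assuming dev_examples is a list of dspy.Example objects and accuracy is the metric defined above:
evaluate = dspy.Evaluate(devset=dev_examples, metric=accuracy, num_threads=8)
evaluate(optimized_program)  # reports the aggregate metric over the dev set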
Building a RAG Pipeline
Traditional approach: 500 lines of prompt templates, retriever glue code, and manual few-shot examples.
DSPy approach:
class RAG(dspy.Module):
    def __init__(self, num_docs=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_docs)
        self.generate = dspy.ChainOfThought('context, question -> answer')

    def forward(self, question):
        docs = self.retrieve(question).passages
        return self.generate(context=docs, question=question)

# That's it. Compile against your eval set.
rag = RAG()
optimized_rag = dspy.MIPROv2(metric=answer_accuracy).compile(
    rag,
    trainset=qa_examples,
)
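The compiled program is an ordinary Python object: call it directly and persist the optimized prompts and demonstrations for later runs (a sketch; the question and file name are placeholders):
print(optimized_rag(question="Who wrote the ColBERT paper?").answer)

# Save optimized prompts/demos; reload into a fresh instance later.
optimized_rag.save("optimized_rag.json")
# fresh = RAG(); fresh.load("optimized_rag.json")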
Results from the DSPy paper: this simple pipeline outperforms hand-crafted prompts by 25-65% depending on the model.
Key Principles
Khattab summarizes DSPy’s philosophy:
- “Prompts are what signatures wanted to be when they grow up”
- “This is just software. Happens to be AI software.”
- “DSPy asks you to not write prompts”
His view on “multi-agent systems”: after 6 years building AI architectures, he considers the term a distraction. It’s just programs calling programs.
Applying DSPy to Personal AI
For building personal AI systems, DSPy offers:
| Use Case | DSPy Approach |
|---|---|
| Note summarization | Signature: notes -> summary, key_points: list |
| Task extraction | Module: dspy.ChainOfThought('text -> tasks: list') |
| Research assistant | RAG pipeline with your documents |
| Writing helper | Custom module composing outline + draft + edit |
The compile step matters most. Define your quality metric (does the summary capture main points? does the task list match human extraction?) and let DSPy optimize.
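As a concrete sketch for the note-summarization row above, one way to wire this up uses an LLM-as-judge metric; the judge signature, summary_metric, and notes_examples are illustrative assumptions, not DSPy built-ins:
summarize = dspy.ChainOfThought('notes -> summary, key_points: list[str]')
judge = dspy.Predict('notes, summary -> covers_main_points: bool')

def summary_metric(example, prediction, trace=None):
    # Ask the judge whether the generated summary covers the main points.
    verdict = judge(notes=example.notes, summary=prediction.summary)
    return verdict.covers_main_points

optimized_summarizer = dspy.MIPROv2(metric=summary_metric).compile(
    summarize,
    trainset=notes_examples,
)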
Key Takeaways
| Principle | Implementation |
|---|---|
| Declare, don’t prompt | Write signatures specifying behavior |
| Compose modules | Chain operations like normal functions |
| Compile against metrics | Let optimizers find best prompts/weights |
| Trust the abstraction | Runtime handles prompt engineering |
| Version your programs | Code is cleaner than prompt files |