Omar Khattab's DSPy Framework

Omar Khattab is an Assistant Professor at MIT EECS and CSAIL who created DSPy, a framework that treats language models as programmable components rather than text generators requiring manual prompts. His research at Stanford produced both DSPy and ColBERT, two open-source projects downloaded millions of times monthly and adopted by Google, Amazon, IBM, and Databricks.
Khattab’s core insight: prompts are the assembly language of AI. Writing them by hand doesn’t scale. DSPy lets you write normal Python code with declarative specifications, then compiles optimized prompts automatically.
Background
- PhD in Computer Science from Stanford (advised by Matei Zaharia and Christopher Potts)
- Apple Scholars in AI/ML PhD Fellow
- Research Scientist at Databricks (post-PhD)
- Creator of ColBERT retrieval model, which shaped modern neural information retrieval
- DSPy paper: ICLR 2024 Spotlight
The DSPy Programming Model
DSPy introduces three abstractions that replace prompt engineering:
| Abstraction | Purpose | Replaces |
|---|---|---|
| Signatures | Declare input/output behavior | Prompt templates |
| Modules | Composable LLM operations | Chain-of-thought scripts |
| Optimizers | Automatic prompt/weight tuning | Manual iteration |
Signatures
Signatures are declarative specs, not prompts. You define what you want, not how to ask for it:
import dspy

# Simple: one input, one typed output
classify = dspy.Predict('sentence -> sentiment: bool')
# Multiple inputs and outputs, with types
qa = dspy.Predict('context, question -> reasoning: str, answer: str')
# Field names carry meaning: "question" differs from "query"
The runtime materializes these into prompts or fine-tuned weights. Your code stays clean.
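The same contract can also be written as a class-based signature when you want a docstring and per-field descriptions. A minimal sketch (the GenerateAnswer name, docstring, and desc strings are illustrative):
class GenerateAnswer(dspy.Signature):
    """Answer the question using the supplied context."""
    context = dspy.InputField(desc="passages that may contain the answer")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="a short, factual answer")

qa = dspy.Predict(GenerateAnswer)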
Modules
Modules wrap prompting techniques. Swap implementations without changing logic:
# Basic prediction
predict = dspy.Predict('question -> answer')
# Same signature, adds reasoning step
cot = dspy.ChainOfThought('question -> answer')
# Same signature, uses tools
agent = dspy.ReAct('question -> answer', tools=[search, calculator])
Built-in modules:
| Module | Behavior |
|---|---|
| dspy.Predict | Direct LLM call |
| dspy.ChainOfThought | Adds reasoning before output |
| dspy.ProgramOfThought | Generates and executes code |
| dspy.ReAct | Agent loop with tools |
| dspy.MultiChainComparison | Runs multiple chains, picks best |
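Calling a module requires a configured language model. A minimal usage sketch, assuming a LiteLLM-style model identifier your provider supports (the model name is a placeholder):
import dspy

dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))  # placeholder model id

cot = dspy.ChainOfThought('question -> answer')
result = cot(question="What is the capital of France?")
print(result.reasoning)  # intermediate reasoning added by ChainOfThought
print(result.answer)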
Optimizers
DSPy compiles your program against a metric. No manual prompt tuning:
# Define what "good" means (DSPy metrics take an optional trace argument)
def accuracy(example, prediction, trace=None):
    return example.answer == prediction.answer

# Compile with optimizer
optimizer = dspy.MIPROv2(metric=accuracy)
optimized_program = optimizer.compile(
    my_program,
    trainset=training_examples,
)
Available optimizers:
| Optimizer | Method |
|---|---|
| BootstrapFewShot | Auto-selects demonstrations |
| MIPROv2 | Optimizes instructions + examples |
| BootstrapFinetune | Fine-tunes model weights |
| GRPO | Online reinforcement learning |
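Whichever optimizer you choose, score the compiled program on a held-out set. A short sketch with dspy.Evaluate, assuming dev_examples is a list of dspy.Example objects and accuracy is the metric defined above:
evaluate = dspy.Evaluate(devset=dev_examples, metric=accuracy, num_threads=8)
evaluate(optimized_program)  # reports the aggregate metric over the dev set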
Building a RAG Pipeline
Traditional approach: 500 lines of prompt templates, retriever glue code, and manual few-shot examples.
DSPy approach:
class RAG(dspy.Module):
    def __init__(self, num_docs=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_docs)
        self.generate = dspy.ChainOfThought('context, question -> answer')

    def forward(self, question):
        docs = self.retrieve(question).passages
        return self.generate(context=docs, question=question)

# That's it. Compile against your eval set.
rag = RAG()
optimized_rag = dspy.MIPROv2(metric=answer_accuracy).compile(
    rag,
    trainset=qa_examples,
)
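The compiled program is an ordinary Python object: call it directly and persist the optimized prompts and demonstrations for later runs (a sketch; the question and file name are placeholders):
print(optimized_rag(question="Who wrote the ColBERT paper?").answer)

# Save optimized prompts/demos; reload into a fresh instance later.
optimized_rag.save("optimized_rag.json")
# fresh = RAG(); fresh.load("optimized_rag.json")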
Results from the DSPy paper: this simple pipeline outperforms hand-crafted prompts by 25-65% depending on the model.
Key Principles
Khattab summarizes DSPy’s philosophy:
- “Prompts are what signatures wanted to be when they grow up”
- “This is just software. Happens to be AI software.”
- “DSPy asks you to not write prompts”
His view on “multi-agent systems”: after 6 years building AI architectures, he considers the term a distraction. It’s just programs calling programs.
Applying DSPy to Personal AI
For building personal AI systems, DSPy offers:
| Use Case | DSPy Approach |
|---|---|
| Note summarization | Signature: notes -> summary, key_points: list |
| Task extraction | Module: dspy.ChainOfThought('text -> tasks: list') |
| Research assistant | RAG pipeline with your documents |
| Writing helper | Custom module composing outline + draft + edit |
The compile step matters most. Define your quality metric (does the summary capture main points? does the task list match human extraction?) and let DSPy optimize.
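As a concrete sketch for the note-summarization row above, one way to wire this up uses an LLM-as-judge metric; the judge signature, summary_metric, and notes_examples are illustrative assumptions, not DSPy built-ins:
summarize = dspy.ChainOfThought('notes -> summary, key_points: list[str]')
judge = dspy.Predict('notes, summary -> covers_main_points: bool')

def summary_metric(example, prediction, trace=None):
    # Ask the judge whether the generated summary covers the main points.
    verdict = judge(notes=example.notes, summary=prediction.summary)
    return verdict.covers_main_points

optimized_summarizer = dspy.MIPROv2(metric=summary_metric).compile(
    summarize,
    trainset=notes_examples,
)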
Key Takeaways
| Principle | Implementation |
|---|---|
| Declare, don’t prompt | Write signatures specifying behavior |
| Compose modules | Chain operations like normal functions |
| Compile against metrics | Let optimizers find best prompts/weights |
| Trust the abstraction | Runtime handles prompt engineering |
| Version your programs | Code is cleaner than prompt files |