Omar Khattab's DSPy Framework

Omar Khattab is an Assistant Professor at MIT EECS and CSAIL who created DSPy, a framework that treats language models as programmable components rather than text generators requiring manual prompts. His research at Stanford produced both DSPy and ColBERT, two open-source projects downloaded millions of times monthly and adopted by Google, Amazon, IBM, and Databricks.

Khattab’s core insight: prompts are the assembly language of AI. Writing them by hand doesn’t scale. DSPy lets you write normal Python code with declarative specifications, then compiles optimized prompts automatically.

Background

GitHub | Twitter | Google Scholar

The DSPy Programming Model

DSPy introduces three abstractions that replace prompt engineering:

| Abstraction | Purpose | Replaces |
|---|---|---|
| Signatures | Declare input/output behavior | Prompt templates |
| Modules | Composable LLM operations | Chain-of-thought scripts |
| Optimizers | Automatic prompt/weight tuning | Manual iteration |

Signatures

Signatures are declarative specs, not prompts. You define what you want, not how to ask for it:

# Simple: classify a sentence's sentiment as a boolean
classify = dspy.Predict('sentence -> sentiment: bool')

# With types: multiple inputs and outputs
qa = dspy.Predict('context, question -> reasoning: str, answer: str')

# Field names carry meaning: "question" differs from "query"

The runtime materializes these into prompts or fine-tuned weights. Your code stays clean.
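
For more control, a signature can also be written as a class, with a docstring for the task instruction and descriptions on individual fields. A minimal sketch of the class-based style; the class name, docstring, and field descriptions here are illustrative choices, not a fixed schema:

import dspy

class GenerateAnswer(dspy.Signature):
    """Answer the question using the provided context."""
    context: str = dspy.InputField(desc="passages that may contain the answer")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="a short factoid answer")

qa = dspy.Predict(GenerateAnswer)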

Modules

Modules wrap prompting techniques. Swap implementations without changing logic:

# Basic prediction
predict = dspy.Predict('question -> answer')

# Same signature, adds reasoning step
cot = dspy.ChainOfThought('question -> answer')

# Same signature, uses tools
agent = dspy.ReAct('question -> answer', tools=[search, calculator])

Built-in modules:

| Module | Behavior |
|---|---|
| dspy.Predict | Direct LLM call |
| dspy.ChainOfThought | Adds reasoning before output |
| dspy.ProgramOfThought | Generates and executes code |
| dspy.ReAct | Agent loop with tools |
| dspy.MultiChainComparison | Runs multiple chains, picks best |
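
Whichever module you choose, usage is the same: configure a language model once, call the module with keyword arguments named after the signature's input fields, and read the output fields off the returned prediction. A small sketch; the model identifier is an arbitrary example, not a recommendation:

import dspy

# Point DSPy at a model (any LiteLLM-style identifier works here)
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

cot = dspy.ChainOfThought('question -> answer')
result = cot(question="What is the tallest mountain on Earth?")

print(result.reasoning)  # the intermediate reasoning ChainOfThought adds
print(result.answer)     # the declared output field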

Optimizers

DSPy compiles your program against a metric. No manual prompt tuning:

# Define what "good" means
def accuracy(example, prediction, trace=None):
    return example.answer == prediction.answer

# Compile with optimizer
optimizer = dspy.MIPROv2(metric=accuracy)
optimized_program = optimizer.compile(
    my_program,
    trainset=training_examples
)
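
The trainset is a list of dspy.Example objects whose fields mirror the signature, with with_inputs marking which fields are inputs. A sketch with placeholder data:

import dspy

training_examples = [
    dspy.Example(question="Who wrote 'Dune'?", answer="Frank Herbert").with_inputs("question"),
    dspy.Example(question="In what year did Apollo 11 land on the Moon?", answer="1969").with_inputs("question"),
]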

Available optimizers:

| Optimizer | Method |
|---|---|
| BootstrapFewShot | Auto-selects demonstrations |
| MIPROv2 | Optimizes instructions + examples |
| BootstrapFinetune | Fine-tunes model weights |
| GRPO | Online reinforcement learning |
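
To see what compilation bought you, score the program on a held-out set before and after with dspy.Evaluate. A rough sketch, assuming dev_examples is built the same way as the trainset:

import dspy

evaluate = dspy.Evaluate(devset=dev_examples, metric=accuracy, num_threads=8, display_progress=True)

print("baseline:", evaluate(my_program))
print("compiled:", evaluate(optimized_program))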

Building a RAG Pipeline

Traditional approach: 500 lines of prompt templates, retriever glue code, and manual few-shot examples.

DSPy approach:

class RAG(dspy.Module):
    def __init__(self, num_docs=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_docs)
        self.generate = dspy.ChainOfThought('context, question -> answer')

    def forward(self, question):
        docs = self.retrieve(question).passages
        return self.generate(context=docs, question=question)

# That's it. Compile against your metric and training examples.
rag = RAG()
optimized_rag = dspy.MIPROv2(metric=answer_accuracy).compile(
    rag,
    trainset=qa_examples
)

Results from the DSPy paper: this simple pipeline outperforms hand-crafted prompts by 25-65% depending on the model.
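
Once compiled, the program is called like any other module, and the tuned prompts and demonstrations can be saved and reloaded so you only pay for optimization once. A sketch with placeholder names:

# Ask a question through the optimized pipeline
prediction = optimized_rag(question="Who created the DSPy framework?")
print(prediction.answer)

# Persist the optimized prompts/demos, then load them into a fresh instance later
optimized_rag.save("optimized_rag.json")

fresh_rag = RAG()
fresh_rag.load("optimized_rag.json")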

Key Principles

Khattab summarizes DSPy's philosophy as programming, not prompting: declare what you want in code and let the compiler figure out how to ask the model for it.

His view on “multi-agent systems”: after 6 years building AI architectures, he considers the term a distraction. It’s just programs calling programs.

Applying DSPy to Personal AI

For building personal AI systems, DSPy offers:

| Use Case | DSPy Approach |
|---|---|
| Note summarization | Signature: notes -> summary, key_points: list |
| Task extraction | Module: dspy.ChainOfThought('text -> tasks: list') |
| Research assistant | RAG pipeline with your documents |
| Writing helper | Custom module composing outline + draft + edit |

The compile step matters most. Define your quality metric (does the summary capture main points? does the task list match human extraction?) and let DSPy optimize.
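
As a concrete sketch for the note-summarization row above: the metric is an ordinary Python function over a labeled example and a prediction, and compilation works exactly as before. The field names, the keyword-coverage check, and the note_examples trainset are simplified stand-ins, not a prescribed setup:

import dspy

summarize = dspy.ChainOfThought('notes -> summary, key_points: list[str]')

def covers_key_points(example, prediction, trace=None):
    # Crude proxy for quality: does the summary mention every expected keyword?
    return all(kw.lower() in prediction.summary.lower() for kw in example.keywords)

optimizer = dspy.BootstrapFewShot(metric=covers_key_points)
tuned_summarize = optimizer.compile(summarize, trainset=note_examples)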

Key Takeaways

| Principle | Implementation |
|---|---|
| Declare, don't prompt | Write signatures specifying behavior |
| Compose modules | Chain operations like normal functions |
| Compile against metrics | Let optimizers find best prompts/weights |
| Trust the abstraction | Runtime handles prompt engineering |
| Version your programs | Code is cleaner than prompt files |

Next: Jesse Vincent’s Superpowers Framework

Topics: ai-coding framework open-source prompting