Stan Girard's Open Source RAG Framework

Stan Girard is the founder of Quivr, an open-source RAG framework with 38K+ GitHub stars. Based in Paris, he runs GenAI at Theodo while building tools that let developers integrate document search into their applications. His side project went viral in May 2023 after he built the first version in a single afternoon.
Background
- Engineering degree from EPITA (French computer science school)
- Head of GenAI at Theodo
- Site Reliability Engineer background with AWS, Azure, Kubernetes
- Y Combinator W24 batch with co-founder Antoine Dewez
- Built the initial Quivr prototype in one afternoon, tweeted it, went viral
The Quivr Approach
Quivr started as a personal project: dump all your documents into a vector store and query them with GPT-4. The original pitch was “your second brain.”
The framework has since evolved into an opinionated RAG toolkit for developers:
```python
from quivr_core import Brain

brain = Brain.from_files(
    name="my-brain",
    file_paths=["./docs/report.pdf", "./docs/notes.md"],
)

answer = brain.ask("What were the key findings?")
print(answer.answer)
```
A few lines to go from files to conversational search.
Architecture
Quivr uses a node-based workflow system. A basic RAG pipeline looks like this:
```yaml
workflow_config:
  name: "standard-rag"
  nodes:
    - name: "filter_history"
      edges: ["rewrite"]
    - name: "rewrite"
      edges: ["retrieve"]
    - name: "retrieve"
      edges: ["generate_rag"]
    - name: "generate_rag"
      edges: []
```
| Node | Purpose |
|---|---|
| filter_history | Trim conversation context to fit token limits |
| rewrite | Transform user query for better retrieval |
| retrieve | Vector search against document embeddings |
| generate_rag | LLM generates answer from retrieved chunks |
You can swap nodes, add reranking, or insert custom processing steps.
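For example, a reranking step can slot in between retrieval and generation. A sketch in the same config format (the `rerank` node name here is hypothetical, not a stock Quivr node):

```yaml
workflow_config:
  name: "rag-with-rerank"
  nodes:
    - name: "filter_history"
      edges: ["rewrite"]
    - name: "rewrite"
      edges: ["retrieve"]
    - name: "retrieve"
      edges: ["rerank"]
    - name: "rerank"           # hypothetical custom node
      edges: ["generate_rag"]
    - name: "generate_rag"
      edges: []
```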
MegaParse: The Document Problem
RAG systems live or die by parsing quality. Girard built MegaParse (7K stars) to handle the messiest part of the pipeline.
The core insight: different documents need different strategies.
OCR vs direct extraction (sketched below):
- If a PDF page is more than 50% images, use OCR
- Otherwise, use pdfminer for fast text extraction
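That page-level routing rule can be sketched like this, using PyMuPDF to measure image coverage (illustrative only; MegaParse's internal implementation may differ):

```python
# Illustrative sketch of the 50%-images routing rule, using PyMuPDF.
import fitz  # PyMuPDF

def needs_ocr(page, threshold=0.5):
    """Route a page to OCR when images cover more than half of it."""
    page_area = page.rect.width * page.rect.height
    image_area = 0.0
    for info in page.get_image_info():
        x0, y0, x1, y1 = info["bbox"]
        image_area += max(0.0, x1 - x0) * max(0.0, y1 - y0)
    return image_area / page_area > threshold

doc = fitz.open("quarterly-report.pdf")
for page in doc:
    strategy = "ocr" if needs_ocr(page) else "pdfminer"
    print(f"page {page.number}: {strategy}")
```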
Table handling:
- Use LLMs to reconstruct tables from draft extractions
- Use vision models for complex table layouts
```python
from megaparse import MegaParse

parser = MegaParse()
result = parser.parse("quarterly-report.pdf")
# Handles tables, images, and mixed layouts
```
This modular approach means you can tune parsing for your specific document types.
Model Flexibility
Quivr supports any LLM provider:
```python
from quivr_core import LLMEndpoint
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_community.chat_models import ChatOllama

# OpenAI
openai_llm = LLMEndpoint(llm=ChatOpenAI(model="gpt-4o"))

# Anthropic
claude_llm = LLMEndpoint(llm=ChatAnthropic(model="claude-sonnet-4-20250514"))

# Local with Ollama (Llama 3.2 ships in 1b/3b text variants)
local_llm = LLMEndpoint(llm=ChatOllama(model="llama3.2:3b"))
```
Run fully local with Ollama, or use cloud APIs. The abstraction stays the same.
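Wiring one of these endpoints into a brain could look like the sketch below; the `llm` keyword on `Brain.from_files` is an assumption about quivr-core's API rather than a documented guarantee.

```python
# Sketch: build a brain on top of a local model. The `llm` parameter name
# is an assumption, not verified against a specific quivr-core release.
from quivr_core import Brain

brain = Brain.from_files(
    name="local-brain",
    file_paths=["./docs/report.pdf"],
    llm=local_llm,
)
```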
From Second Brain to Enterprise
The project has shifted focus since Y Combinator. Quivr now targets customer support automation, using the same RAG infrastructure for a different use case.
From the Quivr newsletter:
“We tried to support every LLM provider simultaneously. It made the backend complicated and the UX worse. So we built Genoss as a separate abstraction layer. Now Quivr is a simple application with streaming that’s actually nice to use.”
The lesson: solve complexity by splitting it out, not by adding more options.
Open Source Strategy
Girard’s take on building in public:
- Open source without marketing stays a hobby project
- The code is table stakes; distribution matters more
- Sponsors and backers (Theodo, Aleios, Padok, Sicara) provide runway
- 65th most-starred AI project on GitHub within 6 months
| Metric | Value |
|---|---|
| GitHub stars | 38.9K |
| Contributors | 123 |
| Releases | 354 |
| License | Apache 2.0 |
Practical RAG Patterns
From Quivr’s documentation, patterns that work:
Chunk size matters:
- Too small: lose context
- Too large: dilute relevance
- Default: 500 tokens with 100 token overlap (see the sketch below)
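Those defaults can be expressed with LangChain's token-aware splitter (quivr-core builds on LangChain, though the exact splitter it uses internally is an assumption here):

```python
# Sketch of 500-token chunks with 100-token overlap via tiktoken counting.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500,     # tokens per chunk
    chunk_overlap=100,  # tokens shared between neighboring chunks
)
chunks = splitter.split_text(open("./docs/notes.md").read())
```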
Reranking improves quality:
```yaml
reranker_config:
  supplier: "cohere"
  model: "rerank-multilingual-v3.0"
  top_n: 5
```
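For reference, that config corresponds roughly to this direct Cohere call (illustrative; Quivr's internal wiring may differ):

```python
# Illustrative only: the equivalent direct Cohere rerank call.
import cohere

co = cohere.Client()  # assumes an API key in the environment

retrieved_chunks = ["chunk one ...", "chunk two ...", "chunk three ..."]
results = co.rerank(
    model="rerank-multilingual-v3.0",
    query="What were the key findings?",
    documents=retrieved_chunks,
    top_n=5,
)
```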
Conversation history window:
- Default: 10 turns
- Trim aggressively to stay within token limits (sketched below)
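In the spirit of the filter_history node, a trimming pass could look like this sketch (Quivr's actual implementation is not shown here):

```python
# Sketch of a history-trimming pass: keep at most 10 turns, then drop the
# oldest turns until the window fits a token budget. count_tokens defaults
# to character length as a stand-in for a real tokenizer.
def filter_history(turns, max_turns=10, max_tokens=2000, count_tokens=len):
    window = turns[-max_turns:]
    while window and sum(count_tokens(t) for t in window) > max_tokens:
        window = window[1:]  # oldest turn goes first
    return window
```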
Key Takeaways
| Principle | Implementation |
|---|---|
| Start with a working demo | Built v1 in one afternoon, iterated from there |
| Parsing quality determines RAG quality | MegaParse handles documents, tables, images |
| Abstraction over configuration | YAML workflows, any LLM provider |
| Solve complexity by splitting | Genoss handles LLM abstraction separately |
| Open source needs distribution | Marketing matters as much as code |