markitdown
Python tool for converting files and office documents to Markdown. PDF, DOCX, PPTX, HTML, images → clean Markdown. Microsoft maintains it as an open-source project.
the workflow
your agent needs to read documents. but:
- PDFs are binary blobs
- DOCX is XML hell
- every format has its own quirks
markitdown normalizes them all into one format: Markdown.
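a minimal sketch of that normalization step, assuming the `markitdown` package is installed and exposes `MarkItDown().convert(path).text_content` as shown in its README. the helper names (`convert_tree`, `markitdown_convert`), the suffix list, and the `<name>.<ext>.md` output naming are illustrative choices, not part of the library.

```python
from pathlib import Path

# Hypothetical default: the formats this note mentions. Adjust as needed.
DEFAULT_SUFFIXES = (".pdf", ".docx", ".pptx", ".html")


def convert_tree(src_dir, out_dir, convert, suffixes=DEFAULT_SUFFIXES):
    """Mirror src_dir into out_dir, writing one Markdown file per document.

    `convert` is any callable that takes a file path (str) and returns
    Markdown text, so the walk can be exercised without markitdown installed.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    written = []
    for path in sorted(src_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        rel = path.relative_to(src_dir)
        # Keep the original extension in the name (report.pdf -> report.pdf.md)
        # so report.pdf and report.docx do not overwrite each other.
        target = out_dir / rel.parent / (rel.name + ".md")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(convert(str(path)), encoding="utf-8")
        written.append(target)
    return written


def markitdown_convert(path):
    """Convert one file with markitdown (imported lazily, assumed installed)."""
    from markitdown import MarkItDown

    return MarkItDown().convert(path).text_content


if __name__ == "__main__":
    # "docs" and "docs_md" are placeholder directory names.
    for out_path in convert_tree("docs", "docs_md", markitdown_convert):
        print(out_path)
```

passing the converter in as a callable keeps the directory walk separate from the library call, so the same loop works with a stub or with a different converter. the resulting `.md` files are plain text and can be committed next to the originals.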
why it matters
if your life is a repo, your documents need to be git-diffable. markitdown is the bridge from “proprietary formats” to “agent-readable text.”
Markdown becomes the universal substrate for knowledge work.