Ben Firshman's Mission to Make AI Usable
Ben Firshman builds tools that hide complexity. He co-created Docker Compose, which turned multi-container orchestration into a simple YAML file. He co-authored the Command Line Interface Guidelines, a document about making CLIs human-first. And with Replicate, he’s applying the same thinking to machine learning.
The Problem with ML Tooling
Firshman’s thesis is blunt: machine learning isn’t inherently hard. We just don’t have good tools yet.
“You shouldn’t have to understand GPUs to use machine learning, in the same way you don’t have to understand TCP/IP to build a website.”
He draws a comparison to web development twenty years ago. Back then, building a website meant configuring your own servers, stitching together HTML and SQL by hand, and uploading files over FTP. Then Rails came along. And Django. And Heroku. The complexity didn't disappear; it got packaged behind better abstractions.
ML today is where web dev was in 2004. Messy Python scripts. Broken Colab notebooks. Perplexing CUDA errors. Weights scattered across Google Drive links. It’s a mess of tribal knowledge that gates who can actually build things.
Cog: Docker for Models
The core artifact from Replicate is Cog, an open-source tool that defines a standard container format for ML models. Think of it as Docker specifically for machine learning.
With Cog, you define your model's interface in Python. The `setup` method loads weights once when the container starts; `predict` handles each request (`load_model` below is a placeholder for your own loading code):

```python
from cog import BasePredictor, Input

class Predictor(BasePredictor):
    def setup(self):
        # Runs once at container start, so weights load only once
        self.model = load_model()  # placeholder: load your weights here

    def predict(self, prompt: str = Input(description="Text prompt")) -> str:
        return self.model.generate(prompt)
```
Cog packages everything—code, weights, dependencies, CUDA versions—into a reproducible container with a standard API. The model becomes portable. You can share it, deploy it, run it anywhere.
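That standard API is HTTP: every Cog container serves predictions over the same endpoint. A minimal sketch of calling one from Python, assuming a container built from a predictor like the one above is already running locally (the port and prompt are illustrative):

```python
# Sketch: calling a running Cog container's standard HTTP API.
# Cog containers accept POST /predictions with inputs wrapped under
# an "input" key; the port and prompt here are illustrative.
import json
import urllib.request

def build_request(prompt: str) -> dict:
    """Wrap inputs the way Cog's predictions endpoint expects."""
    return {"input": {"prompt": prompt}}

def predict(prompt: str, url: str = "http://localhost:5000/predictions") -> dict:
    """POST a prediction to a locally running Cog container."""
    data = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # response JSON includes an "output" field

# predict("a friendly robot")  # requires the container to be running first
```

Because the interface is the same for every model, tooling built against one Cog container works against all of them.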
This approach mirrors what Firshman learned building Docker Compose: find the smallest useful abstraction, make it a standard, then build everything else on top.
The Numbers Argument
Firshman often cites a specific ratio: there are roughly 30 million software engineers worldwide, versus about 500,000 ML engineers. That's a sixty-fold gap, nearly two orders of magnitude.
His argument: if we build good enough tools, those 30 million developers can use ML the same way they use any other library. You should be able to import an image generator the same way you import an npm package.
Replicate’s API embodies this:
```python
import replicate

replicate.run(
    "stability-ai/stable-diffusion",
    input={"prompt": "an astronaut riding a horse"},
)
```
No GPU setup. No dependency wrangling. No CUDA debugging. Just call a function.
The Cloudflare Acquisition
In November 2025, Replicate joined Cloudflare. Firshman’s reasoning reveals his systems-level thinking.
He describes Replicate’s abstractions as “low-level primitives of an operating system.” But these primitives run in the cloud—they need specialized GPUs and clusters. It’s a distributed operating system for AI.
Cloudflare, in his view, has built complementary pieces: Workers for running agents and glue code, Durable Objects for state management, R2 for file storage, WebRTC for streaming. The combination creates infrastructure for building AI applications at the network edge.
CLI Guidelines: Human-First Design
Beyond ML infrastructure, the Command Line Interface Guidelines, which Firshman co-authored, reflect his broader philosophy about developer tools.
The document argues that CLIs have historically been “machine-first”—designed as REPLs for scripting platforms. Today’s command line should be “human-first”: a text-based UI designed for humans who happen to also need automation.
Key principles from the guide:
- Human-first design: If a command will primarily be used by humans, design for humans first
- Simple parts that work together: Small tools that compose, following Unix philosophy
- Ease of discovery: Make features findable without reading documentation
- Conversation as the norm: Error messages should guide users toward solutions
The guidelines inform how Replicate’s CLI works, and they represent Firshman’s general stance on tooling: hide complexity, but not capability.
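One concrete pattern in this spirit is checking whether output is headed to a person or a pipe, and formatting accordingly: readable text for humans, structured lines for scripts. A minimal sketch (the command's fields and messages are invented for illustration, not from any real CLI):

```python
# Sketch of human-first output that stays machine-friendly:
# pretty text when a person is at the terminal, JSON lines when
# output is piped to another program.
import json
import sys

def report(results: list, stream=sys.stdout) -> str:
    if stream.isatty():
        # A human is watching: format for reading.
        lines = [f"✓ {r['name']}: {r['status']}" for r in results]
    else:
        # Output is piped: emit one JSON object per line for scripts.
        lines = [json.dumps(r) for r in results]
    out = "\n".join(lines)
    print(out, file=stream)
    return out
```

The same command serves both audiences without a flag, which is exactly the "human-first, automation as a bonus" stance the guidelines argue for.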
Pattern: Infrastructure from Practice
Firshman’s background shows a consistent pattern. He contributed to Django, including class-based views and feeds. He built The Guardian’s iPad app. He worked on GOV.UK. He even wrote JSNES, one of the first JavaScript emulators, back when running an NES in a browser was absurd.
In each case, he was building practical software that pushed against existing limitations. Docker Compose came from needing to orchestrate containers at scale. Cog came from watching ML engineers at Spotify struggle to deploy models to production.
The tools come from the problems, not the other way around.
Core Ideas
Abstraction over specialization: Most people don’t need to understand the full stack. Good tools let you operate at the right level of abstraction.
Standard formats enable ecosystems: Docker images created an ecosystem of containers. Cog aims to create an ecosystem of ML models.
The network is the computer: AI primitives need to run in the cloud, so the infrastructure must treat the network as the fundamental platform.
Human-first design: Tools should be designed for people, with automation as a bonus, not the other way around.
Links
- Replicate - Run ML models in the cloud
- Cog - Containers for machine learning
- Command Line Interface Guidelines - Human-first CLI design
- Machine learning needs better tools - Firshman’s manifesto on ML tooling
- GitHub: @bfirsh
- Twitter/X: @bfirsh