Gregor Zunic's Browser Use Framework

Gregor Zunic is a software developer and startup founder based in San Francisco. He co-founded Browser Use with Magnus Muller, creating the most popular open-source library for AI browser automation. The project hit 75,000 GitHub stars within months of launch and raised $17 million from Felicis, Paul Graham, and Y Combinator (W25 batch).

Browser Use converts website interfaces into structured text that AI models can process reliably. Instead of relying on brittle screenshot-based approaches, it parses HTML and visual elements together, letting agents click buttons, fill forms, and navigate sites with human-like precision.

Background

Studied at ETH Zurich with co-founder Magnus Muller
Previously built Spexia, an SEO tool for startup founders (2023-2024)
Co-founded Real Fake Photos, an AI headshot generator that grew to 100k+ users
Built Browser Use in 5 weeks as an experiment, watched it go viral
GitHub: @gregpr07

The Pivot Story

In July 2024, Zunic quit his SEO startup despite growing revenue. From his LinkedIn post:

“I woke up every day not ready to grind.”

He was building something that worked financially but felt pointless. The SEO tool was sales-heavy and technically straightforward. He wanted something ambitious.

After quitting, he and Muller experimented with web scraping. They noticed AI agents struggled with browser interactions. Existing tools were fragile and required constant maintenance. They built a prototype in five weeks. It worked better than expected.

Browser Use launched on GitHub in late 2024. Within three months, it became the most-starred open-source web agent project.

How Browser Use Works

The library combines visual analysis with HTML parsing:

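import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def main():
    # Describe the task in natural language; the agent plans the browser steps.
    agent = Agent(
        task="Find the cheapest flight from SF to NYC next Friday",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    result = await agent.run()
    print(result)

# agent.run() is a coroutine, so it needs an event loop outside a notebook.
asyncio.run(main())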

The agent observes the page, reasons about what to do, then acts (click, type, scroll). This loop repeats until the task completes or fails.
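The observe, reason, act loop described above can be sketched in plain Python. This is a minimal illustration and not Browser Use's actual implementation: `FakeBrowser`, `Action`, and `run_agent` are hypothetical names, and a scripted policy stands in for the LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str          # "click", "type", "scroll", or "done"
    target: str = ""   # hypothetical element label or text to type


@dataclass
class FakeBrowser:
    """Stand-in for a real browser; it only records the actions it receives."""
    log: list = field(default_factory=list)

    def observe(self) -> str:
        # A real agent would return page state here (HTML plus visual cues).
        return f"page after {len(self.log)} actions"

    def act(self, action: Action) -> None:
        self.log.append((action.kind, action.target))


def run_agent(browser: FakeBrowser, decide: Callable[[str], Action], max_steps: int = 10) -> bool:
    """Repeat observe -> reason -> act until the policy says done or steps run out."""
    for _ in range(max_steps):
        observation = browser.observe()   # observe
        action = decide(observation)      # reason (an LLM call in practice)
        if action.kind == "done":
            return True
        browser.act(action)               # act
    return False  # step budget exhausted: treat the task as failed


# Scripted policy standing in for the LLM (an assumption for this sketch).
script = iter([Action("click", "search box"), Action("type", "SF to NYC"), Action("done")])
browser = FakeBrowser()
finished = run_agent(browser, lambda obs: next(script))
print(finished, browser.log)
```

The `max_steps` cap is one simple way to express the "or fails" branch: a loop that never reaches `done` ends with an explicit failure instead of running forever.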

Key technical decisions:

| Choice | Rationale |
| --- | --- |
| HTML + vision | Vision alone is unreliable; HTML alone misses dynamic content |
| Self-healing selectors | Adapts when page layouts change |
| LLM-agnostic | Works with GPT-4, Claude, Gemini, or local models |
| Stealth mode | Bypasses bot detection for legitimate automation |
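To make the HTML half of the "HTML + vision" choice concrete, here is a rough sketch of turning a page into indexed, structured text that a model could refer to by number. It uses only Python's standard `html.parser`. The `ElementIndexer` class, the `page_to_text` function, and the output format are assumptions made for illustration; they are not Browser Use's real serializer, and the vision side is not shown.

```python
from html.parser import HTMLParser

INTERACTIVE = {"a", "button", "input", "select", "textarea"}
VOID = {"input"}  # tags that never have a closing tag


class ElementIndexer(HTMLParser):
    """Collects interactive elements with a short text label for each."""

    def __init__(self):
        super().__init__()
        self.lines = []    # one [tag, label] pair per interactive element
        self._open = None  # index of the element currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE:
            return
        attrs = dict(attrs)
        label = attrs.get("placeholder") or attrs.get("name") or attrs.get("value") or ""
        self.lines.append([tag, label])
        if tag not in VOID:
            self._open = len(self.lines) - 1

    def handle_data(self, data):
        # Attach visible text to the interactive element that is still open.
        if self._open is not None and data.strip():
            entry = self.lines[self._open]
            entry[1] = (entry[1] + " " + data.strip()).strip()

    def handle_endtag(self, tag):
        if tag in INTERACTIVE:
            self._open = None

    def render(self):
        return "\n".join(
            f"[{i}] <{tag}> {label}".rstrip() for i, (tag, label) in enumerate(self.lines)
        )


def page_to_text(html):
    parser = ElementIndexer()
    parser.feed(html)
    parser.close()
    return parser.render()


html = """
<form>
  <input name="from" placeholder="From">
  <input name="to" placeholder="To">
  <button type="submit">Search flights</button>
</form>
<a href="/deals">Deals</a>
"""
print(page_to_text(html))
```

With a listing like this, a model can answer "click element 2" rather than guessing pixel coordinates, which is the reliability argument the table makes for not depending on vision alone.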

Open Source Strategy

Browser Use is MIT-licensed and fully open source. The company also offers a cloud version at $30/month for users who don’t want to manage infrastructure.

This mirrors the pattern from other successful AI infrastructure projects: give away the core library, charge for convenience. The open-source version has 75k+ stars and millions of downloads. The cloud version handles authentication, proxies, and scaling.

The library powers Manus, one of the first viral AI agent demos that showed agents completing complex multi-step tasks autonomously.

Technical Philosophy

Zunic’s approach to browser automation:

Combine modalities. Pure vision models hallucinate button locations. Pure HTML parsing misses JavaScript-rendered content. Use both together.

Recover from errors. Agents will click wrong buttons and navigate to dead ends. Build retry logic and state recovery into the core library.

Stay model-agnostic. Today’s best model won’t be tomorrow’s. Abstract the LLM layer so users can swap providers.

Optimize for developers. A basic agent takes only a few lines of Python. Advanced configuration is available but not required.
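Two of these principles, recovering from errors and staying model-agnostic, lend themselves to a short sketch. The code below is illustrative only: `ChatModel`, `EchoModel`, `ActionFailed`, and `act_with_recovery` are hypothetical names, not part of the Browser Use API. It shows a minimal provider interface plus a retry loop that feeds each failure back into the next prompt.

```python
from typing import Callable, Protocol


class ChatModel(Protocol):
    """Hypothetical minimal interface; any provider wrapper with this method fits."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Toy provider used only for this example."""

    def complete(self, prompt: str) -> str:
        return f"click:{prompt}"


class ActionFailed(Exception):
    """Raised when a browser action cannot be carried out."""


def act_with_recovery(
    model: ChatModel,
    perform: Callable[[str], None],
    goal: str,
    max_attempts: int = 3,
) -> int:
    """Ask the model for an action, retrying with error feedback; returns attempts used."""
    prompt = goal
    for attempt in range(1, max_attempts + 1):
        action = model.complete(prompt)
        try:
            perform(action)
            return attempt
        except ActionFailed as err:
            # Feed the failure back so the next suggestion can differ.
            prompt = f"{goal} (previous attempt failed: {err})"
    raise ActionFailed(f"gave up after {max_attempts} attempts")


calls = []


def flaky_perform(action: str) -> None:
    """Simulated browser step that fails on its first call."""
    calls.append(action)
    if len(calls) < 2:
        raise ActionFailed("element not found")


attempts = act_with_recovery(EchoModel(), flaky_perform, "submit form")
print(attempts, calls)
```

Because `act_with_recovery` depends only on the `complete` method, swapping providers means writing another small wrapper class rather than touching the retry logic.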

Key Takeaways

| Principle | Implementation |
| --- | --- |
| Quit when you’re not excited | Left a growing startup to find better work |
| Build fast, validate faster | 5-week prototype became a company |
| Open source builds trust | MIT license, public development |
| Solve infrastructure problems | Let others build agents; you handle the browser |
