Build Your First Browser Agent with browser-use
Browser agents let AI control your browser: clicking buttons, filling forms, and extracting data. browser-use is the leading open-source library for this, with 75k+ GitHub stars and backing from Y Combinator. This guide gets you running in 10 minutes.
For the theory behind browser agents, see Browser Agents concepts.
How browser-use Works
The library runs an observe-reason-act loop, sketched in pseudocode after this list:
- Observe - Captures page HTML and screenshots
- Reason - LLM decides next action
- Act - Playwright executes click, type, or scroll
- Repeat - Checks result, continues until task completes
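Conceptually, the loop looks like the sketch below. The names (get_state, decide, execute, is_done) are illustrative only, not browser-use internals:

# Illustrative pseudocode of the agent loop -- not the library's actual API
async def agent_loop(task, llm, browser, max_steps=50):
    for step in range(max_steps):
        state = await browser.get_state()        # Observe: current DOM + screenshot
        action = await llm.decide(task, state)   # Reason: pick the next action
        result = await browser.execute(action)   # Act: click, type, scroll, navigate
        if result.is_done:                       # Repeat until the task is complete
            return result.final_answer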
| Approach | How it works | Trade-off |
|---|---|---|
| Vision-only | Screenshot analysis | Hallucinates button positions |
| HTML-only | DOM parsing | Misses JavaScript content |
| Hybrid (browser-use) | Both together | Best accuracy, higher token cost |
Installation
Requirements: Python 3.11+, an API key from OpenAI/Anthropic/Google.
pip install browser-use
playwright install chromium
Set your API key:
export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-ant-..."
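If you prefer a .env file over exported variables (the sensitive-data example later uses one), a common pattern is python-dotenv:

pip install python-dotenv

from dotenv import load_dotenv
load_dotenv()  # reads OPENAI_API_KEY / ANTHROPIC_API_KEY from a local .env file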
Your First Agent
import asyncio
from browser_use import Agent
from browser_use.llm import ChatOpenAI

async def main():
    agent = Agent(
        task="Go to Hacker News and return the top 3 post titles",
        llm=ChatOpenAI(model="gpt-4o-mini"),
    )
    result = await agent.run()
    print(result.final_result())

asyncio.run(main())
Run it:
python my_agent.py
The browser opens, navigates to Hacker News, extracts titles, and returns them.
Structured Output
Extract data into typed objects:
import asyncio

from pydantic import BaseModel
from browser_use import Agent, Controller
from browser_use.llm import ChatOpenAI

class Post(BaseModel):
    title: str
    url: str
    points: int

class Posts(BaseModel):
    posts: list[Post]

controller = Controller(output_model=Posts)

agent = Agent(
    task="Get the top 5 posts from Hacker News with title, URL, and points",
    llm=ChatOpenAI(model="gpt-4o"),
    controller=controller,
)

async def main():
    result = await agent.run()
    posts = Posts.model_validate_json(result.final_result())
    for post in posts.posts:
        print(f"{post.points} - {post.title}")

asyncio.run(main())
Browser Configuration
Control headless mode, viewport, and profiles:
from browser_use import Agent, BrowserSession, BrowserProfile

profile = BrowserProfile(
    headless=False,                 # Watch the browser work
    viewport={"width": 1280, "height": 1024},
    user_data_dir="./my_profile",   # Persist cookies/sessions
)

session = BrowserSession(browser_profile=profile)

agent = Agent(
    task="Log into my account and check notifications",
    llm=llm,  # any of the LLMs from the next section
    browser_session=session,
)
Using Different LLMs
browser-use works with any model:
# OpenAI
from browser_use.llm import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")
# Anthropic
from browser_use.llm import ChatAnthropic
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
# Google
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro")
# Local via Ollama
from langchain_ollama import ChatOllama
llm = ChatOllama(model="qwen2.5:14b")
Common Mistakes
| Mistake | Why it fails | Fix |
|---|---|---|
| Vague tasks | LLM doesn’t know when to stop | Be specific: “Get the first 5 results” not “Get results” |
| No error handling | Agent crashes on popups | Use max_failures parameter |
| Wrong model size | Small models miss complex pages | Use GPT-4o or Claude for multi-step tasks |
| Running headless everywhere | Some sites block headless browsers | Set headless=False or use stealth mode |
| No wait time | Actions fail on slow pages | Configure wait_for_network_idle_page_load_time |
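The fixes in the last two rows are constructor arguments. A sketch, assuming max_failures is set on the Agent and the page-load wait on the BrowserProfile; parameter locations can move between releases, so check the signatures of your installed version:

from browser_use import Agent, BrowserSession, BrowserProfile

profile = BrowserProfile(
    wait_for_network_idle_page_load_time=3.0,  # let network activity settle before reading the page
)

agent = Agent(
    task="...",
    llm=llm,
    browser_session=BrowserSession(browser_profile=profile),
    max_failures=5,  # tolerate up to 5 consecutive failed steps before giving up
)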
Handling Sensitive Data
Never hardcode credentials:
import os

from dotenv import load_dotenv
load_dotenv()

sensitive_data = {
    "login_user": os.getenv("MY_USERNAME"),
    "login_pass": os.getenv("MY_PASSWORD"),
}

agent = Agent(
    task="Log in using 'login_user' and 'login_pass', then check my balance",
    llm=llm,
    sensitive_data=sensitive_data,
)
The LLM sees only the placeholder names ('login_user', 'login_pass') in prompts and logs; the real values are substituted locally when the agent types them into the page.
Debugging
Enable verbose logging:
import logging
logging.basicConfig(level=logging.INFO)

agent = Agent(
    task="...",
    llm=llm,
    generate_gif=True,  # Save an animated replay of the session
)
Watch the logs to see each step:
INFO [agent] Step 1: Navigate to https://example.com
INFO [agent] Step 2: Click element "Login button"
INFO [agent] Step 3: Input text into "Username field"
Real-World Example: Job Search
Scrape job listings with structured output:
import asyncio

from pydantic import BaseModel
from browser_use import Agent, Controller
from browser_use.llm import ChatOpenAI

class Job(BaseModel):
    title: str
    company: str
    location: str
    salary: str

class Jobs(BaseModel):
    jobs: list[Job]

controller = Controller(output_model=Jobs)

agent = Agent(
    task="""
    Go to indeed.com, search for "Python developer" in "San Francisco".
    Extract the first 5 job listings with title, company, location, and salary.
    """,
    llm=ChatOpenAI(model="gpt-4o"),
    controller=controller,
)

async def main():
    result = await agent.run()
    jobs = Jobs.model_validate_json(result.final_result())

asyncio.run(main())
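To display the results, add a loop at the end of main(), mirroring the Posts example:

    for job in jobs.jobs:
        print(f"{job.title} at {job.company} ({job.location}) - {job.salary}")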
Key Takeaways
| Principle | Implementation |
|---|---|
| Start simple | Single-task agents before complex workflows |
| Be specific | Precise instructions reduce failures |
| Watch first | Run with headless=False until confident |
| Handle failures | Set max_failures and add retry logic |
| Protect secrets | Use sensitive_data parameter, never hardcode |
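In practice, "Handle failures" can be as simple as a retry wrapper around agent.run(). A minimal sketch; run_with_retries is an illustrative helper, not part of browser-use:

import asyncio
from browser_use import Agent

async def run_with_retries(agent: Agent, attempts: int = 3):
    # Re-run the whole task a few times if the agent gives up or raises.
    for attempt in range(1, attempts + 1):
        try:
            result = await agent.run()
            if result.final_result():       # got a usable answer
                return result
        except Exception as exc:
            print(f"Attempt {attempt} failed: {exc}")
        await asyncio.sleep(5)              # brief pause before the next attempt
    raise RuntimeError("Agent did not complete the task after retries")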
What’s Next
- Add browser automation to existing Claude Code workflows
- Learn about Gregor Zunic, who built browser-use
- Understand the theory behind browser agents
- Explore parallel sessions for running multiple agents