Build Your First Browser Agent with browser-use
Browser agents let AI control your browser: clicking buttons, filling forms, and extracting data. browser-use is the leading open-source library for this, with 75k+ GitHub stars and backing from Y Combinator. This guide gets you running in 10 minutes.
For the theory behind browser agents, see Browser Agents concepts.
How browser-use Works
The library runs an observe-reason-act loop, sketched in pseudocode after this list:
- Observe - Captures page HTML and screenshots
- Reason - LLM decides next action
- Act - Playwright executes click, type, or scroll
- Repeat - Checks result, continues until task completes
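Conceptually, the loop looks like the sketch below. The names (get_state, decide, execute, is_done) are illustrative only, not browser-use internals:

# Illustrative pseudocode of the agent loop -- not the library's actual API
async def agent_loop(task, llm, browser, max_steps=50):
    for step in range(max_steps):
        state = await browser.get_state()        # Observe: current DOM + screenshot
        action = await llm.decide(task, state)   # Reason: pick the next action
        result = await browser.execute(action)   # Act: click, type, scroll, navigate
        if result.is_done:                       # Repeat until the task is complete
            return result.final_answer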
| Approach | How it works | Trade-off |
|---|---|---|
| Vision-only | Screenshot analysis | Hallucinates button positions |
| HTML-only | DOM parsing | Misses JavaScript content |
| Hybrid (browser-use) | Both together | Best accuracy, higher token cost |
Installation
Requirements: Python 3.11+, an API key from OpenAI/Anthropic/Google.
pip install browser-use
playwright install chromium
Set your API key:
export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-ant-..."
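If you prefer a .env file over exported variables (the sensitive-data example later uses one), a common pattern is python-dotenv:

pip install python-dotenv

from dotenv import load_dotenv
load_dotenv()  # reads OPENAI_API_KEY / ANTHROPIC_API_KEY from a local .env file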
Your First Agent
import asyncio
from browser_use import Agent
from browser_use.llm import ChatOpenAI

async def main():
    agent = Agent(
        task="Go to Hacker News and return the top 3 post titles",
        llm=ChatOpenAI(model="gpt-4o-mini"),
    )
    result = await agent.run()
    print(result.final_result())

asyncio.run(main())
Run it:
python my_agent.py
The browser opens, navigates to Hacker News, extracts titles, and returns them.
Structured Output
Extract data into typed objects:
import asyncio

from pydantic import BaseModel
from browser_use import Agent, Controller
from browser_use.llm import ChatOpenAI

class Post(BaseModel):
    title: str
    url: str
    points: int

class Posts(BaseModel):
    posts: list[Post]

controller = Controller(output_model=Posts)

agent = Agent(
    task="Get the top 5 posts from Hacker News with title, URL, and points",
    llm=ChatOpenAI(model="gpt-4o"),
    controller=controller,
)

async def main():
    result = await agent.run()
    posts = Posts.model_validate_json(result.final_result())
    for post in posts.posts:
        print(f"{post.points} - {post.title}")

asyncio.run(main())
Browser Configuration
Control headless mode, viewport, and profiles:
from browser_use import Agent, BrowserSession, BrowserProfile

profile = BrowserProfile(
    headless=False,                 # Watch the browser work
    viewport={"width": 1280, "height": 1024},
    user_data_dir="./my_profile",   # Persist cookies/sessions
)

session = BrowserSession(browser_profile=profile)

agent = Agent(
    task="Log into my account and check notifications",
    llm=llm,  # any of the LLMs from the next section
    browser_session=session,
)
Using Different LLMs
browser-use works with any model:
# OpenAI
from browser_use.llm import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")
# Anthropic
from browser_use.llm import ChatAnthropic
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
# Google
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro")
# Local via Ollama
from langchain_ollama import ChatOllama
llm = ChatOllama(model="qwen2.5:14b")
Common Mistakes
| Mistake | Why it fails | Fix |
|---|---|---|
| Vague tasks | LLM doesn’t know when to stop | Be specific: “Get the first 5 results” not “Get results” |
| No error handling | Agent crashes on popups | Use max_failures parameter |
| Wrong model size | Small models miss complex pages | Use GPT-4o or Claude for multi-step tasks |
| Running headless everywhere | Some sites block headless browsers | Set headless=False or use stealth mode |
| No wait time | Actions fail on slow pages | Configure wait_for_network_idle_page_load_time |
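The fixes in the last two rows are constructor arguments. A sketch, assuming max_failures is set on the Agent and the page-load wait on the BrowserProfile; parameter locations can move between releases, so check the signatures of your installed version:

from browser_use import Agent, BrowserSession, BrowserProfile

profile = BrowserProfile(
    wait_for_network_idle_page_load_time=3.0,  # let network activity settle before reading the page
)

agent = Agent(
    task="...",
    llm=llm,
    browser_session=BrowserSession(browser_profile=profile),
    max_failures=5,  # tolerate up to 5 consecutive failed steps before giving up
)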
Handling Sensitive Data
Never hardcode credentials:
import os

from dotenv import load_dotenv
load_dotenv()

sensitive_data = {
    "login_user": os.getenv("MY_USERNAME"),
    "login_pass": os.getenv("MY_PASSWORD"),
}

agent = Agent(
    task="Log in using 'login_user' and 'login_pass', then check my balance",
    llm=llm,
    sensitive_data=sensitive_data,
)
The LLM sees only the placeholder names ('login_user', 'login_pass') in prompts and logs; the real values are substituted locally when the agent types them into the page.
Debugging
Enable verbose logging:
import logging
logging.basicConfig(level=logging.INFO)

agent = Agent(
    task="...",
    llm=llm,
    generate_gif=True,  # Save an animated replay of the session
)
Watch the logs to see each step:
INFO [agent] Step 1: Navigate to https://example.com
INFO [agent] Step 2: Click element "Login button"
INFO [agent] Step 3: Input text into "Username field"
Real-World Example: Job Search
Scrape job listings with structured output:
import asyncio

from pydantic import BaseModel
from browser_use import Agent, Controller
from browser_use.llm import ChatOpenAI

class Job(BaseModel):
    title: str
    company: str
    location: str
    salary: str

class Jobs(BaseModel):
    jobs: list[Job]

controller = Controller(output_model=Jobs)

agent = Agent(
    task="""
    Go to indeed.com, search for "Python developer" in "San Francisco".
    Extract the first 5 job listings with title, company, location, and salary.
    """,
    llm=ChatOpenAI(model="gpt-4o"),
    controller=controller,
)

async def main():
    result = await agent.run()
    jobs = Jobs.model_validate_json(result.final_result())

asyncio.run(main())
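To display the results, add a loop at the end of main(), mirroring the Posts example:

    for job in jobs.jobs:
        print(f"{job.title} at {job.company} ({job.location}) - {job.salary}")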
Key Takeaways
| Principle | Implementation |
|---|---|
| Start simple | Single-task agents before complex workflows |
| Be specific | Precise instructions reduce failures |
| Watch first | Run with headless=False until confident |
| Handle failures | Set max_failures and add retry logic |
| Protect secrets | Use sensitive_data parameter, never hardcode |
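In practice, "Handle failures" can be as simple as a retry wrapper around agent.run(). A minimal sketch; run_with_retries is an illustrative helper, not part of browser-use:

import asyncio
from browser_use import Agent

async def run_with_retries(agent: Agent, attempts: int = 3):
    # Re-run the whole task a few times if the agent gives up or raises.
    for attempt in range(1, attempts + 1):
        try:
            result = await agent.run()
            if result.final_result():       # got a usable answer
                return result
        except Exception as exc:
            print(f"Attempt {attempt} failed: {exc}")
        await asyncio.sleep(5)              # brief pause before the next attempt
    raise RuntimeError("Agent did not complete the task after retries")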
What’s Next
- Add browser automation to existing Claude Code workflows
- Learn about Gregor Zunic, who built browser-use
- Understand the theory behind browser agents
- Explore parallel sessions for running multiple agents