Suchintan Singh's Vision-Based Browser Automation

Suchintan Singh is the co-founder and CEO of Skyvern, an open-source browser automation platform that uses vision models and LLMs instead of DOM selectors. The Y Combinator S23 company raised $2.7M to help enterprises automate manual browser work. With 20,000+ GitHub stars and state-of-the-art WebBench scores, Skyvern has become a leading tool for RPA-style automation that doesn’t break when websites change their layouts.

Background


The Three Pivots

Skyvern is Singh’s third attempt at a startup. From the APIs You Won’t Hate podcast:

First pivot: An engineer onboarding tool. It failed because the team built it without user input.

Second pivot: A search and discovery platform. The addressable market turned out to be limited because of organizational silos.

Third pivot: Browser automation with AI. Happened at the end of 2023 after deciding to focus on a hard technical problem.

Singh and co-founder Shuchang Zheng decided to start a company together while on a surfing trip. They now work alongside CTO Kerem Yilmaz.

Vision-Based Automation

Traditional browser automation uses XPath and CSS selectors that break whenever a website updates its frontend code. Skyvern takes a different approach: it looks at the page like a human would.

from skyvern import Skyvern

client = Skyvern(api_key="your_key")

task = client.run(
    url="https://geico.com",
    goal="Get an auto insurance quote",
    data={
        "first_name": "John",
        "last_name": "Doe",
        "zip_code": "94103"
    }
)

The system processes each screen through three layers:

| Layer | Purpose |
| --- | --- |
| Computer Vision | Identifies buttons, forms, and interactive elements by appearance |
| Vision LLM | Reasons about page structure and determines actions |
| Playwright | Executes clicks, typing, and navigation |

This means Skyvern can recognize a submit button whether it’s styled as a green rectangle, a blue rounded pill, or a custom graphic element.
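The three layers can be pictured as a small pipeline. The sketch below is a toy illustration, not Skyvern's implementation: `Element`, `detect_elements`, `choose_action`, and `execute` are hypothetical names, the "screenshot" is a list of dicts, and a simple label match stands in for the vision LLM. It only shows why an action chosen from what is visible survives a restyle or reorder of the page.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One interactive element, as a detector might report it (hypothetical shape)."""
    element_id: int
    kind: str   # e.g. "button" or "text_input"
    label: str  # visible text on the element
    style: str  # cosmetic only; never consulted when choosing an action


def detect_elements(screen):
    """Layer 1 stand-in: a real system would run computer vision on a screenshot."""
    return [
        Element(i, item["kind"], item["label"], item.get("style", ""))
        for i, item in enumerate(screen)
    ]


def choose_action(target_label, elements):
    """Layer 2 stand-in: a label match plays the role of the vision LLM."""
    for el in elements:
        if el.kind == "button" and el.label.strip().lower() == target_label.lower():
            return {"type": "click", "element_id": el.element_id}
    return {"type": "none"}


def execute(action, log):
    """Layer 3 stand-in: record the action instead of driving a real browser."""
    log.append(action)
    return action


# Two renderings of the same form: only the styling and element order differ.
page_v1 = [
    {"kind": "text_input", "label": "ZIP code"},
    {"kind": "button", "label": "Submit", "style": "green rectangle"},
]
page_v2 = [
    {"kind": "button", "label": "SUBMIT", "style": "blue rounded pill"},
    {"kind": "text_input", "label": "ZIP code"},
]

log = []
for page in (page_v1, page_v2):
    execute(choose_action("submit", detect_elements(page)), log)
print(log)
```

Both pages yield a click on the submit button even though its position and styling changed, which is the property a fixed XPath or CSS selector does not give you.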

Planner-Actor-Validator Architecture

Skyvern separates concerns across specialized agents:

Planner: Maintains high-level goals and breaks tasks into steps.

Actor: Executes browser interactions based on the plan.

Validator: Confirms actions succeeded before proceeding.

From the funding announcement:

“We just raised a $2.7M seed round to fix one of the most boring but expensive problems in business: manual browser work.”

The validator is key. Most automation failures happen silently, with the script clicking the wrong button and continuing. Skyvern’s validator catches these mistakes and triggers retries.
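A minimal sketch of how a planner, actor, and validator might fit together, assuming a fixed plan and a retry budget per step. Every name here (`plan`, `flaky_actor`, `validator`, `run_with_validation`, `max_retries`) is hypothetical and chosen for illustration; the real agents are LLM-driven and are not shown in the source.

```python
def plan(goal):
    """Planner stand-in: break a goal into ordered steps (fixed here for illustration)."""
    return ["open quote form", "fill zip code", "submit form"]


attempts_seen = {}


def flaky_actor(step):
    """Actor stand-in: silently does the wrong thing on the first try of one step."""
    attempts_seen[step] = attempts_seen.get(step, 0) + 1
    if step == "submit form" and attempts_seen[step] == 1:
        return "clicked: cancel"  # wrong button, but no exception is raised
    return f"done: {step}"


def validator(step, outcome):
    """Validator stand-in: check the observed outcome against the intended step."""
    return outcome == f"done: {step}"


def run_with_validation(steps, act, validate, max_retries=2):
    """Run each step, retrying until the validator accepts it or retries run out."""
    results = []
    for step in steps:
        for attempt in range(1, max_retries + 2):
            outcome = act(step)
            if validate(step, outcome):
                results.append((step, attempt))
                break
        else:
            # Only reached when no attempt passed validation.
            raise RuntimeError(f"step failed after {max_retries + 1} attempts: {step}")
    return results


results = run_with_validation(plan("Get an auto insurance quote"), flaky_actor, validator)
print(results)
```

Without the validation check, the wrong click on the last step would have gone unnoticed and the run would have been reported as complete.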

WebBench Performance

Skyvern leads on WRITE-task performance in the WebBench benchmark:

| Task Type | Skyvern’s Strength |
| --- | --- |
| Form filling | 64.4% accuracy (state of the art) |
| Logins | Handles 2FA and magic links |
| File downloads | Automates portal extractions |
| Government portals | Navigates complex multi-step forms |

The benchmark tests real RPA workflows: insurance quotes, vendor portal logins, compliance form submissions.

Low-Stakes Philosophy

Singh deliberately avoids high-consequence automation:

“The cost of getting an insurance quote wrong is low.”

Skyvern focuses on research and data gathering, not transactions. Getting a wrong insurance quote means you run the task again. Submitting a wrong payment means real money disappears.

The company leaves human review in place for sensitive operations, using automation to prepare data rather than execute final actions.
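One way to picture "prepare data, leave the final action to a person" is a gate that executes low-stakes actions and queues sensitive ones for review. This is a sketch under assumptions: the action names, the `SENSITIVE_ACTIONS` set, and the `ReviewQueue` type are all hypothetical and do not come from Skyvern.

```python
from dataclasses import dataclass, field

# Hypothetical set of actions that should never run without a human sign-off.
SENSITIVE_ACTIONS = {"submit_payment", "sign_contract"}


@dataclass
class ReviewQueue:
    """Holds prepared actions until a person approves them."""
    pending: list = field(default_factory=list)


def dispatch(action, payload, queue):
    """Run low-stakes actions immediately; park sensitive ones for human review."""
    if action in SENSITIVE_ACTIONS:
        queue.pending.append((action, payload))
        return "queued_for_review"
    # A real system would perform the browser task here.
    return "executed"


queue = ReviewQueue()
quote_status = dispatch("fetch_insurance_quote", {"zip_code": "94103"}, queue)
payment_status = dispatch("submit_payment", {"amount_usd": 120}, queue)
print(quote_status, payment_status, len(queue.pending))
```

The automation still fills in the payment details, but the irreversible step waits in the queue until someone approves it.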

Open Source Strategy

Skyvern is open source (AGPL-3.0 license) with a hosted cloud option:

# Local deployment
docker-compose up -d
skyvern quickstart

The CLAUDE.md file in the repo shows the system’s architecture: Python 3.11+, FastAPI, PostgreSQL, and Playwright for browser control.

Singh explains the open-source approach:

“Developers can try it out for this idea that you had over the weekend and potentially fix issues, allowing us to harness the power of the open source community.”

Integration Ecosystem

Skyvern connects to workflow platforms:

| Integration | Use Case |
| --- | --- |
| n8n | Trigger browser tasks from 400+ workflow nodes |
| Zapier | Connect Skyvern to 5,000+ apps |
| LangChain | Use as a tool in agent chains |
| LlamaIndex | Integrate with RAG pipelines |
| Ollama | Run entirely local with on-device models |

The company achieved SOC-2 compliance in August 2025, making it suitable for enterprise environments.

Key Takeaways

| Principle | Implementation |
| --- | --- |
| Vision over selectors | Act on what the screen looks like, not on HTML structure |
| Validate every action | Confirm before proceeding to catch silent failures |
| Focus on low-stakes | Research and data prep, not high-risk transactions |
| Keep teams small | Three technical co-founders enable ruthless prioritization |
| Open source builds reach | AGPL-3.0 license, cloud for convenience |
