Suchintan Singh's Vision-Based Browser Automation

Suchintan Singh is the co-founder and CEO of Skyvern, an open-source browser automation platform that uses vision models and LLMs instead of DOM selectors. The Y Combinator S23 company raised $2.7M to help enterprises automate manual browser work. With 20,000+ GitHub stars and state-of-the-art WebBench scores, Skyvern has become a leading tool for RPA-style automation that doesn’t break when websites change their layouts.
Background
- 12+ years as a software engineer and ML professional
- University of Waterloo graduate (Systems Design Engineering)
- Founded the ML Infrastructure team at Faire, built an online Feature Store
- Engineering Manager at Gopuff, established the Search Engineering team
- Started programming in grade seven when his father taught him C++
- Got real coding experience writing bots for RuneScape in high school
The Three Pivots
Skyvern is Singh’s third attempt at a startup. As he recounted on the APIs You Won’t Hate podcast:
First pivot: An engineer onboarding tool. It failed because the team built it without user input.
Second pivot: A search and discovery platform. The team found that organizational silos limited the addressable market.
Third pivot: Browser automation with AI. The switch came at the end of 2023, after the founders decided to focus on a hard technical problem.
Singh and co-founder Shuchang Zheng decided to start a company together during an impromptu surfing trip. They now work alongside CTO Kerem Yilmaz.
Vision-Based Automation
Traditional browser automation uses XPath and CSS selectors that break whenever a website updates its frontend code. Skyvern takes a different approach: it looks at the page like a human would.
from skyvern import Skyvern

client = Skyvern(api_key="your_key")
task = client.run(
    url="https://geico.com",
    goal="Get an auto insurance quote",
    data={
        "first_name": "John",
        "last_name": "Doe",
        "zip_code": "94103",
    },
)
The system processes each screen through three layers:
| Layer | Purpose |
|---|---|
| Computer Vision | Identifies buttons, forms, and interactive elements by appearance |
| Vision LLM | Reasons about page structure and determines actions |
| Playwright | Executes clicks, typing, and navigation |
This means Skyvern can recognize a submit button whether it’s styled as a green rectangle, a blue rounded pill, or a custom graphic element.
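The three layers above can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not Skyvern's implementation: `DetectedElement`, `Action`, `detect_elements`, `plan_actions`, and `execute` are hypothetical names, the detector and reasoner are simple rules standing in for real vision models, and the executor records actions in a list rather than driving Playwright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedElement:
    """An interactive element described by appearance, not by a selector."""
    kind: str   # e.g. "button" or "text_input"
    label: str  # visible text or nearby caption
    x: int      # centre of the bounding box, in pixels
    y: int


@dataclass(frozen=True)
class Action:
    verb: str   # "type" or "click"
    x: int
    y: int
    text: str = ""


def detect_elements(regions: list[dict]) -> list[DetectedElement]:
    """Layer 1 stand-in: a real system would run a vision model on pixels.
    Here the 'screenshot' is already a list of annotated regions."""
    return [DetectedElement(r["kind"], r["label"], r["x"], r["y"]) for r in regions]


def plan_actions(elements: list[DetectedElement], data: dict) -> list[Action]:
    """Layer 2 stand-in: match each data field to an input by its visible label,
    then click the first button. A vision LLM would reason over the whole page."""
    actions = []
    for field_name, value in data.items():
        wanted = field_name.replace("_", " ").lower()
        for el in elements:
            if el.kind == "text_input" and wanted in el.label.lower():
                actions.append(Action("type", el.x, el.y, value))
                break
    for el in elements:
        if el.kind == "button":
            actions.append(Action("click", el.x, el.y))
            break
    return actions


def execute(action: Action, log: list[str]) -> None:
    """Layer 3 stand-in: record the action instead of calling a browser driver."""
    if action.verb == "type":
        log.append(f"type {action.text!r} at ({action.x}, {action.y})")
    else:
        log.append(f"click at ({action.x}, {action.y})")


page = [
    {"kind": "text_input", "label": "ZIP code", "x": 120, "y": 200},
    {"kind": "button", "label": "Get a quote", "x": 120, "y": 260},
]
log: list[str] = []
for action in plan_actions(detect_elements(page), {"zip_code": "94103"}):
    execute(action, log)
print(log)  # ["type '94103' at (120, 200)", 'click at (120, 260)']
```

Nothing in the sketch refers to a CSS class or XPath, so a redesign that moves or restyles the field only changes the detected coordinates and labels, not the logic that chooses what to do.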
Planner-Actor-Validator Architecture
Skyvern separates concerns across specialized agents:
Planner: Maintains high-level goals and breaks tasks into steps.
Actor: Executes browser interactions based on the plan.
Validator: Confirms actions succeeded before proceeding.
From the funding announcement:
“We just raised a $2.7M seed round to fix one of the most boring but expensive problems in business: manual browser work.”
The validator is key. Many automation failures are silent: a script clicks the wrong button and carries on as if nothing happened. Skyvern’s validator catches these mistakes and triggers retries.
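The act-then-validate loop described above might look roughly like the following. This is a minimal sketch, not Skyvern's code: `plan`, `run_with_validation`, `flaky_act`, and `validate` are hypothetical names, the planner returns a fixed list of steps, and the actor is rigged to fail silently once so the retry path is exercised.

```python
from typing import Callable


def plan(goal: str) -> list[str]:
    """Planner stand-in: a fixed decomposition of the goal into steps."""
    return ["open_form", "fill_zip", "submit"]


def run_with_validation(
    steps: list[str],
    act: Callable[[str, dict], None],
    validate: Callable[[str, dict], bool],
    max_retries: int = 2,
) -> dict:
    """Run each step, re-running it whenever the validator rejects the result."""
    state: dict = {}
    for step in steps:
        for _ in range(1 + max_retries):
            act(step, state)
            if validate(step, state):
                break  # step confirmed, move on to the next one
        else:
            # Every attempt was rejected: stop instead of continuing blindly
            raise RuntimeError(f"step {step!r} failed after {1 + max_retries} attempts")
    return state


attempts = {"fill_zip": 0}


def flaky_act(step: str, state: dict) -> None:
    """Actor stand-in: the first 'fill_zip' attempt silently leaves the field empty."""
    if step == "fill_zip":
        attempts["fill_zip"] += 1
        state["zip"] = "94103" if attempts["fill_zip"] > 1 else ""
    else:
        state[step] = True


def validate(step: str, state: dict) -> bool:
    """Validator stand-in: check the observable result of each step."""
    if step == "fill_zip":
        return state.get("zip") == "94103"
    return state.get(step) is True


final = run_with_validation(plan("Get an auto insurance quote"), flaky_act, validate)
print(final)     # {'open_form': True, 'zip': '94103', 'submit': True}
print(attempts)  # {'fill_zip': 2}
```

Without the validator, the empty ZIP field from the first attempt would have gone straight into the submit step, which is the silent failure mode the section describes.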
WebBench Performance
Skyvern leads on WRITE-task performance in the WebBench benchmark:
| Task Type | Skyvern’s Strength |
|---|---|
| Form filling | 64.4% accuracy (state of the art) |
| Logins | Handles 2FA and magic links |
| File downloads | Automates portal extractions |
| Government portals | Navigates complex multi-step forms |
The benchmark tests real RPA workflows: insurance quotes, vendor portal logins, compliance form submissions.
Low-Stakes Philosophy
Singh deliberately avoids high-consequence automation:
“The cost of getting an insurance quote wrong is low.”
Skyvern focuses on research and data gathering, not transactions. Getting a wrong insurance quote means you run the task again. Submitting a wrong payment means real money disappears.
The company leaves human review in place for sensitive operations, using automation to prepare data rather than execute final actions.
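One way to picture "prepare, don't execute" is a gate that lets low-stakes actions run and parks sensitive ones until a person approves them. The sketch below is an assumption about how such a gate could be written, not a description of Skyvern's product: `SENSITIVE_ACTIONS`, `PreparedAction`, and `dispatch` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical list of actions that must never run without human sign-off
SENSITIVE_ACTIONS = {"submit_payment", "sign_contract"}


@dataclass
class PreparedAction:
    name: str
    payload: dict
    approved: bool = False  # set to True by a human reviewer


def dispatch(action: PreparedAction, executed: list[str], review_queue: list[PreparedAction]) -> str:
    """Run low-stakes actions immediately; queue sensitive ones until approved."""
    if action.name in SENSITIVE_ACTIONS and not action.approved:
        review_queue.append(action)
        return "queued_for_review"
    executed.append(action.name)
    return "executed"


executed: list[str] = []
review_queue: list[PreparedAction] = []

quote = PreparedAction("fetch_quote", {"zip_code": "94103"})
payment = PreparedAction("submit_payment", {"amount_usd": 120})

print(dispatch(quote, executed, review_queue))    # executed
print(dispatch(payment, executed, review_queue))  # queued_for_review

# A reviewer inspects the prepared payload and approves it
payment.approved = True
print(dispatch(payment, executed, review_queue))  # executed
```

The automation still does the tedious part, filling in the payload, while the irreversible step waits for a person.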
Open Source Strategy
Skyvern is open source (AGPL-3.0 license) with a hosted cloud option:
# Local deployment
docker-compose up -d
skyvern quickstart
The CLAUDE.md file in the repo shows the system’s architecture: Python 3.11+, FastAPI, PostgreSQL, and Playwright for browser control.
Singh explains the open-source approach:
“Developers can try it out for this idea that you had over the weekend and potentially fix issues, allowing us to harness the power of the open source community.”
Integration Ecosystem
Skyvern connects to workflow platforms:
| Integration | Use Case |
|---|---|
| n8n | Trigger browser tasks from 400+ workflow nodes |
| Zapier | Connect Skyvern to 5,000+ apps |
| LangChain | Use as a tool in agent chains |
| LlamaIndex | Integrate with RAG pipelines |
| Ollama | Run entirely local with on-device models |
The company achieved SOC 2 compliance in August 2025, a common requirement in enterprise procurement.
Key Takeaways
| Principle | Implementation |
|---|---|
| Vision over selectors | Use what the screen looks like, not HTML structure |
| Validate every action | Confirm before proceeding to catch silent failures |
| Focus on low-stakes | Research and data prep, not high-risk transactions |
| Keep teams small | Three technical co-founders enable ruthless prioritization |
| Open source builds reach | Open-source core, hosted cloud for convenience |
Links
- Skyvern
- GitHub: Skyvern-AI/skyvern
- Y Combinator Profile
- Documentation
- APIs You Won’t Hate Podcast
- Skyvern Blog