name: browser
description: Control a headless Chromium browser. Navigate pages, read content via accessibility snapshots, interact with elements, fill forms, and take screenshots.
Browser Control
Headless Chromium browser via CLI for web automation.
Setup
Check if the server is running, start if needed:
browser status
browser start # auto-starts Chrome on first use
If the server process isn't running at all:
# Derive the engine dir from the browser tool path in the Tool Paths table above
# e.g., if browser = "tsx /foo/bar/lib/browser-cli.ts", then:
tsx /foo/bar/lib/browser/server.ts &
CLI Reference
browser is shorthand; the actual path is in the Tool Paths section of the assistant config.
Navigation
browser open <url> # New tab
browser navigate <url> # Current tab
browser tabs # List tabs
browser close [targetId] # Close tab
Reading Pages
browser snapshot --efficient # Accessibility tree (~12k chars)
browser snapshot --selector "main" # Scope to element
browser evaluate "document.title" # Run JS, return result
browser screenshot # PNG screenshot (returns path)
browser console # View JS console messages
Choose the right tool:
- snapshot --efficient: default for most pages. Text-based and cheap (~3-5K tokens).
- snapshot --selector "main": large pages where nav/sidebar bloats the snapshot.
- evaluate: structured data extraction. Returns JSON. Best for scraping lists, tables, feeds.
- screenshot: visual layouts, images, CAPTCHAs. Requires vision. Use as a fallback.
Interacting with Elements
browser click e5 # Click by ref
browser type e5 Hello world # Type into element
browser press Enter # Keyboard key
browser fill e3=John e4=john@x.com # Fill form fields
browser select e7 option1 # Select dropdown
Snapshots
The snapshot command returns an accessibility tree with element refs:
[url: https://news.ycombinator.com]
- heading "Hacker News" [level=1]
- link "Show HN: Something cool" [ref=e5]
- text "142 points by user 3 hours ago"
- link "89 comments" [ref=e6]
Refs like e5 are used in click/type/fill. They change between snapshots — always re-snapshot after acting.
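To pick a ref programmatically rather than by eye, the snapshot text can be indexed by ref. The following is a minimal sketch that assumes the line format of the sample above (`- role "name" [ref=eN]`); the regex is derived from that sample, not from a documented grammar, and `parseSnapshotRefs` and `findRef` are hypothetical helper names.

```javascript
// Parse a snapshot in the line format shown in the sample and index elements by ref.
// Assumption: each interactive element sits on one line as: - role "name" [ref=eN]
function parseSnapshotRefs(snapshotText) {
  const refs = new Map();
  const linePattern = /^\s*-\s+(\w+)\s+"((?:[^"\\]|\\.)*)".*\[ref=(\w+)\]/;
  for (const line of snapshotText.split("\n")) {
    const match = linePattern.exec(line);
    if (match) {
      const [, role, name, ref] = match;
      refs.set(ref, { role, name });
    }
  }
  return refs;
}

// Return the first ref whose role and accessible name match, or null.
function findRef(refs, role, namePattern) {
  for (const [ref, el] of refs) {
    if (el.role === role && namePattern.test(el.name)) return ref;
  }
  return null;
}

const sample = [
  "[url: https://news.ycombinator.com]",
  '- heading "Hacker News" [level=1]',
  '- link "Show HN: Something cool" [ref=e5]',
  '- text "142 points by user 3 hours ago"',
  '- link "89 comments" [ref=e6]',
].join("\n");

const refs = parseSnapshotRefs(sample);
console.log(findRef(refs, "link", /comments$/)); // e6
```

Because refs change between snapshots, a map like this is only valid until the next action; rebuild it from a fresh snapshot each time.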
Workflow
- Snapshot → see page state
- Act (click/type/fill) using refs
- Snapshot again → verify result
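The three steps above can be sketched as one function. This is an illustration only: `runBrowser` is a hypothetical callback that takes an argument array such as `["click", "e5"]` and returns the command's stdout, so the flow can be dry-run without Chrome.

```javascript
// Snapshot -> act -> snapshot, with the CLI call injected as `runBrowser`.
// `chooseAction` picks the command from the fresh snapshot, so stale refs are never reused.
function actAndVerify(runBrowser, chooseAction, isDone) {
  const before = runBrowser(["snapshot", "--efficient"]);
  const action = chooseAction(before); // e.g. ["click", "e5"]
  runBrowser(action);
  const after = runBrowser(["snapshot", "--efficient"]);
  return { action, after, verified: isDone(after) };
}

// Dry run with a fake runner that records calls instead of driving a browser.
const calls = [];
const pages = ['- link "Next" [ref=e5]', '- heading "Page 2" [level=1]'];
function fakeRunner(args) {
  calls.push(args.join(" "));
  return args[0] === "snapshot" ? pages.shift() : "";
}

const outcome = actAndVerify(
  fakeRunner,
  (snapshot) => ["click", /\[ref=(\w+)\]/.exec(snapshot)[1]],
  (snapshot) => snapshot.includes("Page 2")
);
console.log(calls.join(" | "), outcome.verified);
```

In real use, `runBrowser` could wrap Node's `child_process.execFileSync` with the actual command from the Tool Paths configuration; that wiring is left out because the path is installation-specific.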
Structured Data Extraction
For scraping lists, tables, or feeds, use evaluate with JS instead of parsing snapshots:
browser evaluate "JSON.stringify(Array.from(document.querySelectorAll('.item')).map(el => ({title: el.querySelector('a')?.textContent, url: el.querySelector('a')?.href})))"
This is faster, cheaper, and more reliable than parsing accessibility trees for bulk data.
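One-line expressions get hard to read as fields are added. A possible approach, sketched below, is to write the extraction as a named function and serialize it into an expression string. The `.item` selector comes from the example above; whether `browser evaluate` accepts a multi-line expression is an assumption to check before relying on this.

```javascript
// The extraction logic as a named function. It only uses querySelectorAll and
// querySelector, so a stub object can stand in for `document` in tests.
function extractItems(root) {
  return Array.from(root.querySelectorAll(".item")).map((el) => {
    const link = el.querySelector("a");
    return { title: link?.textContent, url: link?.href };
  });
}

// Serialize the function into a single expression that calls it on `document`.
// Assumption: the evaluate command accepts this string as-is.
const expression = `JSON.stringify((${extractItems.toString()})(document))`;
console.log(expression);
```

The string contains double quotes and newlines, so passing it as a separate process argument (rather than through a shell-quoted command line) avoids escaping problems.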
Reddit (old.reddit.com)
browser open https://old.reddit.com/r/<subreddit>/
browser evaluate "JSON.stringify(Array.from(document.querySelectorAll('#siteTable .thing.link')).map(t => ({title: t.querySelector('a.title')?.textContent, score: t.querySelector('.score.unvoted')?.textContent, comments: t.querySelector('.comments')?.textContent, time: t.querySelector('time')?.getAttribute('title'), flair: t.querySelector('.linkflairlabel')?.textContent, url: t.querySelector('a.title')?.href})))"
Always use old.reddit.com — simpler HTML, no infinite scroll, works with evaluate. Rate limit: 30s between subreddits.
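The extraction above returns every field as raw text. The sketch below converts scores and comment counts to numbers after the fact. The input rows are made-up illustrations, and the text forms handled here (a plain integer, a "k" suffix such as "12.3k", and non-numeric placeholders) are assumptions about how the page renders, not something this document specifies.

```javascript
// Parse a count such as "142", "12.3k" or "89 comments" into a number.
// Returns null when the text holds no digits (for example a hidden score).
function parseCount(text) {
  const match = /(\d+(?:\.\d+)?)(k)?\b/i.exec(text ?? "");
  if (!match) return null;
  const value = Number(match[1]);
  return Math.round(match[2] ? value * 1000 : value);
}

// Convert the JSON string returned by the evaluate call into typed rows.
function normalizePosts(json) {
  return JSON.parse(json).map((post) => ({
    ...post,
    title: post.title?.trim() ?? null,
    score: parseCount(post.score),
    comments: parseCount(post.comments),
  }));
}

// Illustrative input only; these rows are not real data.
const raw = JSON.stringify([
  { title: " Example post ", score: "142", comments: "89 comments" },
  { title: "Big thread", score: "12.3k", comments: "1 comment" },
  { title: "Fresh post", score: "•", comments: "comment" },
]);
const posts = normalizePosts(raw);
console.log(posts);
```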
Login-Required Sites
The browser profile persists (cookies survive restarts). To log in:
- Navigate to login page
- Snapshot → find form refs
- Fill credentials (from secrets, never hardcode)
- Submit → snapshot to verify
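As a sketch of the "from secrets" step, the `fill` arguments can be built from environment variables so that credentials never appear in the script. `SITE_USERNAME` and `SITE_PASSWORD` are hypothetical variable names, `buildFillArgs` is a hypothetical helper, and the refs would come from a fresh snapshot of the login form.

```javascript
// Build `fill` arguments in the ref=value form from named secrets.
// `fieldRefs` maps a snapshot ref to the name of the secret that fills it.
function buildFillArgs(fieldRefs, env) {
  const args = ["fill"];
  for (const [ref, envName] of Object.entries(fieldRefs)) {
    const value = env[envName];
    if (!value) throw new Error(`Missing secret: ${envName}`);
    args.push(`${ref}=${value}`);
  }
  return args;
}

const fillArgs = buildFillArgs(
  { e3: "SITE_USERNAME", e4: "SITE_PASSWORD" },
  { SITE_USERNAME: "demo-user", SITE_PASSWORD: "demo-pass" } // stand-in for process.env
);
console.log(fillArgs.length); // log the count, never the values
```

How `fill` treats values containing spaces or `=` is not covered here, so test with the real credentials' character set before relying on this.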
Troubleshooting
- Connection refused: Server isn't running. Start the browser server (see "Setup" above).
- Timeout on start: A stale Chrome process is holding the user-data lock. Kill it with pkill -f 'chrome.*browser-control', then retry.
- Snapshot too large: Use --efficient or --selector "main" to scope down.
- Reddit blocked: IP rate-limited. Wait 30s+ between requests. Log in for higher limits.