name: browser-tools description: >- Use when interacting with web pages, automating browser actions, performing E2E web application testing, navigating visually, scraping dynamic content, or starting a local dev server to test against. Triggers include requests to open a website, fill a form, click a button, take a screenshot, scrape dynamic content, test a web app, login to a site, or automate any browser-based task. Do NOT use for fetching static API JSON or simple HTTP requests — use web-search or web- crawler for those. license: MIT tags: [browser, playwright, testing, web, automation, e2e, agent-browser] metadata: author: Genius version: '0.47.0'
Browser Tools
Overview
Two approaches for browser automation and web application testing:
agent-browserCLI — Preferred for interactive navigation tasks (snapshot-based, works with any live site)- Playwright scripts — Preferred for automated E2E test suites against local servers
Approach 1: agent-browser CLI (Interactive Navigation)
Core Workflow
Every browser automation follows: Navigate → Snapshot → Interact → Re-snapshot
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Verify result
Essential Commands
# Navigation
agent-browser open <url> # Navigate to URL
agent-browser close # Close browser
# Snapshot (get element refs @e1, @e2, ...)
agent-browser snapshot -i # Interactive elements only (recommended)
agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick)
agent-browser snapshot -s "#selector" # Scope snapshot to CSS selector
# Interaction (always use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser fill @e2 "text" # Clear and type text
agent-browser type @e2 "text" # Type without clearing
agent-browser select @e1 "option" # Select dropdown value
agent-browser check @e1 # Toggle checkbox
agent-browser press Enter # Press keyboard key
agent-browser scroll down 500 # Scroll page
# Information
agent-browser get text @e1 # Get element text content
agent-browser get url # Get current URL
agent-browser get title # Get page title
# Waiting
agent-browser wait @e1 # Wait for element to appear
agent-browser wait --load networkidle # Wait for network to settle
agent-browser wait --url "**/path" # Wait for URL pattern
agent-browser wait 2000 # Wait fixed milliseconds
# Capture
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --full # Full-page screenshot
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf # Save page as PDF
Important: Ref Lifecycle
Refs (@e1, @e2, ...) are invalidated whenever the page changes. Always re-snapshot after:
- Clicking links or buttons that navigate
- Form submissions
- Dynamic content loading (modals, dropdowns)
agent-browser click @e5 # Navigates to new page
agent-browser snapshot -i # MUST re-snapshot before using any refs
agent-browser click @e1 # Use new refs
Common Patterns
Form Submission:
agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser click @e4
agent-browser wait --load networkidle
Data Extraction:
agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5
agent-browser get text body > /tmp/page.txt
Annotated Screenshots (when element labels unclear):
agent-browser screenshot --annotate
# Output: image path + legend: [1] @e1 button "Submit", [2] @e2 link "Home"
agent-browser click @e1 # Use ref from annotated screenshot
Semantic Locators (when @refs unavailable):
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
Approach 2: Playwright Scripts (E2E Testing Against Local Servers)
Decision Tree
Task → Is it a static HTML file?
├─ Yes → Read HTML directly to identify selectors
│ ├─ Success → Write Playwright script with found selectors
│ └─ Fails → Treat as dynamic (below)
│
└─ No (dynamic app) → Is the server already running?
├─ No → Run with_server.py (see below) + write Playwright script
└─ Yes → Reconnaissance-then-action:
1. Navigate and wait for networkidle
2. Screenshot or inspect DOM
3. Identify selectors from rendered state
4. Execute actions with discovered selectors
Using scripts/with_server.py
Always run --help first to see options. Use for managing server lifecycle:
# Single server + run automation
python scripts/with_server.py --server "npm run dev" --port 5173 -- python your_automation.py
# Multiple servers (backend + frontend)
python scripts/with_server.py \
--server "cd backend && python server.py" --port 3000 \
--server "cd frontend && npm run dev" --port 5173 \
-- python your_automation.py
Playwright Script Pattern
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
browser = p.chromium.launch(headless=True) # Always headless
page = browser.new_page()
page.goto('http://localhost:5173')
page.wait_for_load_state('networkidle') # CRITICAL: wait for JS to execute
# Reconnaissance: take screenshot or inspect DOM first
page.screenshot(path='/tmp/inspect.png', full_page=True)
# Then interact using discovered selectors
page.fill('input[type="email"]', 'user@example.com')
page.click('button[type="submit"]')
browser.close()
Best Practices
- Always
wait_for_load_state('networkidle')before inspecting DOM on dynamic apps - Use semantic selectors:
text=,role=, CSS, or IDs — avoid brittle XPath - Add explicit waits with
page.wait_for_selector()for slow elements - Use
scripts/with_server.pywith--helpfirst — do not read its source unless necessary - For
agent-browser, always close the session when done:agent-browser close