---
name: browser-testing
description: "Full browser automation via Agent Browser Protocol (ABP). Navigate, click, type, scroll, drag, screenshot, extract text, handle dialogs/downloads/file pickers, manage tabs, control JS execution. Single CLI tool."
---
# Browser Automation — ABP
Single tool: `{baseDir}/browser.js <command> [args] [--flags]`

ABP is a Chromium fork with a REST API built into the engine. Every action is deterministic: JavaScript is frozen between steps, so there are no race conditions and no manual waits.
## Setup

```bash
{baseDir}/browser.js start # Launch ABP on :8222
```
## Core Commands

```bash
B={baseDir}/browser.js

# Navigate
$B nav https://example.com # Navigate active tab
$B nav https://other.com --new # New tab
$B back # History back
$B forward # History forward
$B reload # Reload
```
```bash
# Mouse
$B click 450 320 # Left click
$B click 450 320 --right # Right click
$B click 450 320 --double # Double click
$B click 450 320 --mod CTRL # Ctrl+click
$B hover 300 200 # Move the mouse (triggers tooltips/menus)
$B scroll 640 400 --dy 500 # Scroll down 500px
$B scroll 640 400 --dy -300 # Scroll up 300px
$B scroll 640 400 --dx 200 # Scroll right 200px
$B drag 100 200 500 200 # Drag from (100,200) to (500,200)
$B drag 100 200 500 200 --steps 20 # Smooth drag
```
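
Reading click coordinates off a screenshot is the slow path. If you know a CSS selector for the target, you can ask the page for the element's bounding box with `$B eval` and then click its center. This is a minimal sketch, and it makes two assumptions. First, `click` takes viewport CSS-pixel coordinates, which is the space `getBoundingClientRect()` reports in. Second, `eval` prints the returned object as JSON. `rect_expr` and `rect_center` are hypothetical helpers, not part of `browser.js`.

```python
import json


def rect_expr(selector: str) -> str:
    """JS expression for `$B eval`: the first matching element's bounding box, or null."""
    # json.dumps() yields a valid JS string literal, so quotes in the selector are safe.
    sel = json.dumps(selector)
    return f"(el => el ? el.getBoundingClientRect().toJSON() : null)(document.querySelector({sel}))"


def rect_center(rect_json: str) -> tuple[int, int]:
    """Center (x, y) of a DOMRect serialized as JSON (fields x, y, width, height)."""
    # Assumption: `$B eval` prints the returned object as JSON text.
    r = json.loads(rect_json)
    if r is None:
        raise ValueError("selector matched no element")
    return round(r["x"] + r["width"] / 2), round(r["y"] + r["height"] / 2)


if __name__ == "__main__":
    print("$B eval '" + rect_expr("#login") + "'")
    # Pretend this JSON came back from the eval call above:
    x, y = rect_center('{"x": 100, "y": 40, "width": 80, "height": 30}')
    print(f"$B click {x} {y}")  # prints: $B click 140 55
```

The two coordinate spaces may not line up if the page is zoomed or the screenshot is scaled. One verification screenshot is a cheap check.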
```bash
# Keyboard
$B type hello world # Type text
$B key ENTER # Press key
$B key TAB # Tab
$B key ESCAPE # Escape
$B key a --mod CTRL # Ctrl+A (select all)
$B key c --mod CTRL # Ctrl+C (copy)
$B key ARROWDOWN # Arrow keys
$B key BACKSPACE
$B key a --mod CTRL --action down # Key down only (hold)
$B key a --action up # Key up (release)
```
```bash
# Input helpers
$B slider 400 300 75 # Set range input to 75
$B clear 400 300 # Clear text field (click + select all + delete)
$B pick "Select the login button" # Interactive: the user clicks the element in the browser
```
```bash
# Screenshot
$B screenshot # Viewport with interactive markup
$B screenshot --markup clickable # Only clickable elements
$B screenshot --markup typeable # Only input fields
$B screenshot --markup clickable,typeable,scrollable,grid # Combine several markup types
$B screenshot --markup none # Clean, no overlays
$B screenshot --format png # PNG instead of WebP
```
```bash
# Extract content
$B text # All visible text (fast, API-native)
$B text "h1.title" # Text within CSS selector
$B eval 'document.title' # Execute JavaScript
$B eval '({links: document.querySelectorAll("a").length})' # Return structured data
$B content # Current page as Markdown (Readability)
$B content https://example.com # Navigate + extract as Markdown
$B cookies # Non-HttpOnly cookies
```
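
For a list of results, a single `eval` that maps over `querySelectorAll` returns everything in one round-trip. The sketch below builds that expression in Python. It runs the expression through the CLI as an argv list, so no shell quoting is involved. `list_expr` and `run_eval` are hypothetical helpers. The selectors in the usage line are placeholders for whatever the page actually uses.

```python
import json
import subprocess


def list_expr(selector: str, fields: dict[str, str]) -> str:
    """JS expression returning one object per element matching `selector`.

    `fields` maps an output key to a sub-selector inside each element; the value is
    that sub-element's trimmed textContent, or null when the sub-element is missing.
    """
    pairs = ", ".join(
        f"{json.dumps(key)}: el.querySelector({json.dumps(sub)})?.textContent.trim() ?? null"
        for key, sub in fields.items()
    )
    return f"Array.from(document.querySelectorAll({json.dumps(selector)}), el => ({{{pairs}}}))"


def run_eval(browser_js: str, expr: str) -> str:
    """Run `browser.js eval <expr>` and return its raw stdout (no output format assumed)."""
    # An argv list bypasses the shell, so quotes inside `expr` need no escaping.
    result = subprocess.run([browser_js, "eval", expr], capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `list_expr("li.result", {"title": "h3", "price": ".price"})` yields one `{title, price}` object per result. A result that lacks a sub-element gets `null` for that field rather than throwing.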
## Tabs

```bash
$B tabs # List all tabs
$B tabs new https://google.com # New tab with URL
$B tabs activate <id> # Switch to tab
$B tabs close <id> # Close tab
$B tabs info <id> # Tab details
$B tabs stop <id> # Stop loading
```
## Browser Events

ABP surfaces events that would otherwise require polling: dialogs, file pickers, downloads, native select dropdowns, and permission prompts. Each event is reported in the output of the action during which it occurred.
```bash
# Dialogs (alert, confirm, prompt)
$B dialog # Check for pending dialog
$B dialog accept # Accept
$B dialog accept "response text" # Accept prompt with text
$B dialog dismiss # Dismiss/cancel

# Downloads
$B download # List all
$B download status <id> # Check progress
$B download cancel <id> # Cancel
$B download get <id> # Get content (base64)

# File chooser (triggered by file input click)
$B file <chooser_id> /path/to/file.pdf # Upload file
$B file <chooser_id> file1.jpg file2.jpg # Multiple files
$B file <chooser_id> --cancel # Cancel picker
$B file <chooser_id> --save /path/out.pdf # Save dialog

# Native <select> dropdown
$B select <select_id> 2 # Choose option at index

# Permissions (geolocation, camera, etc.)
$B permission # List pending
$B permission grant <id> # Grant
$B permission grant <id> --lat 42.36 --lng -71.06 # Grant geo with coords
$B permission deny <id> # Deny
```
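
`download get <id>` returns the file content base64-encoded. If a script captures that output, the standard library is enough to turn it back into a file. This sketch assumes the command prints the bare base64 payload on stdout. `decode_download` and `save_download` are hypothetical names.

```python
import base64
from pathlib import Path


def decode_download(b64_text: str) -> bytes:
    """Decode the payload printed by `download get <id>` (assumed to be bare base64)."""
    # validate=True raises on non-base64 characters instead of silently skipping them.
    return base64.b64decode(b64_text.strip(), validate=True)


def save_download(b64_text: str, dest: str) -> int:
    """Write the decoded download to `dest` and return the number of bytes written."""
    return Path(dest).write_bytes(decode_download(b64_text))
```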
## Execution Control

ABP freezes JS between actions by default. You can control this:

```bash
$B execution # Current state
$B execution pause # Freeze JS & virtual time
$B execution resume # Unfreeze
```
## Advanced

```bash
# Batch: multiple actions, one screenshot
$B batch '[{"type":"mouse_click","x":350,"y":200},{"type":"keyboard_type","text":"hello"},{"type":"keyboard_press","key":"ENTER"}]'

# Session history (SQLite-backed, for training data)
$B history # List sessions
$B history current # Current session
$B history actions # Action log
$B history clear # Delete all

# Lifecycle
$B status # Browser readiness
$B shutdown # Graceful shutdown
```
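
Hand-writing the `batch` JSON gets error-prone once it grows past a few steps. This sketch builds the payload from Python. It uses only the three action types that appear in the batch example (`mouse_click`, `keyboard_type`, `keyboard_press`), with the same field names. Other action types may exist, but they are not documented here. The helper names are hypothetical.

```python
import json


def click(x: int, y: int) -> dict:
    return {"type": "mouse_click", "x": x, "y": y}


def type_text(text: str) -> dict:
    return {"type": "keyboard_type", "text": text}


def press(key: str) -> dict:
    return {"type": "keyboard_press", "key": key}


def batch_arg(*actions: dict) -> str:
    """Serialize actions into the single JSON-array argument `$B batch` takes."""
    # Compact separators reproduce the style of the hand-written example.
    return json.dumps(list(actions), separators=(",", ":"))


if __name__ == "__main__":
    # Same three steps as the example: click the field, type, press Enter.
    print(batch_arg(click(350, 200), type_text("hello"), press("ENTER")))
```

Pass the result as one argv element, for example `subprocess.run([browser_js, "batch", payload])`. Interpolating it into a shell string breaks as soon as the typed text contains a quote.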
## Global Flags

| Flag | Description |
|---|---|
| `--tab <id>` | Target a specific tab (default: active) |
| `--shot` | Save a screenshot after the action (prints the path) |
| `--markup <types>` | Screenshot markup: `interactive`, a comma-separated list of `clickable`, `typeable`, `scrollable`, `grid`, `selected`, or `none` |
| `--format <fmt>` | Screenshot format: `webp` (default), `png`, `jpeg` |
| `--json` | Output the raw API response as JSON |
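
When scripting the CLI, building argv in one place keeps the global flags consistent across commands. This is a minimal sketch. `cli_args` is a hypothetical helper, and its validation sets simply mirror the table above. It also assumes that `interactive` and `none` are never combined with other markup types.

```python
from __future__ import annotations

MARKUP_TYPES = {"clickable", "typeable", "scrollable", "grid", "selected"}
STANDALONE_MARKUP = {"interactive", "none"}  # assumption: not combined with others
FORMATS = {"webp", "png", "jpeg"}


def cli_args(
    browser_js: str,
    command: str,
    *args: object,
    tab: str | None = None,
    shot: bool = False,
    markup: str | None = None,
    fmt: str | None = None,
    json_out: bool = False,
) -> list[str]:
    """argv for `browser.js <command> [args] [--flags]`, with flags after the args."""
    argv = [browser_js, command, *(str(a) for a in args)]
    if tab is not None:
        argv += ["--tab", tab]
    if shot:
        argv.append("--shot")
    if markup is not None:
        parts = set(markup.split(","))
        if markup not in STANDALONE_MARKUP and not parts <= MARKUP_TYPES:
            raise ValueError(f"unknown markup value: {markup!r}")
        argv += ["--markup", markup]
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"unsupported screenshot format: {fmt!r}")
        argv += ["--format", fmt]
    if json_out:
        argv.append("--json")
    return argv


if __name__ == "__main__":
    # Equivalent of: $B click 450 320 --shot --markup clickable,typeable
    print(cli_args("./browser.js", "click", 450, 320, shot=True, markup="clickable,typeable"))
```

Here `./browser.js` stands in for `{baseDir}/browser.js`.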
## Event Indicators

When events occur during any action, they're printed automatically:

```text
→ https://new-page.com # Navigation happened
⚠ dialog (confirm): Delete item? # Dialog appeared
📁 file chooser id=fc_1 # File picker opened
⬇ download: report.pdf # Download started
▾ select id=s_1 (5 options) # Native select opened
🔐 permission id=p_1 geolocation # Permission requested
↗ popup: https://popup.com # Popup window
```
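
Because events are printed inline, a script can scan an action's output for them and choose a follow-up command. This sketch parses the indicator formats that need a response (dialog, file chooser, select, permission). It maps each one to a conservative command from Browser Events: dismiss, cancel, choose option 0, or deny. The regexes are written against the sample lines above, so they are an assumption about the exact output format. `parse_events` and `follow_up` are hypothetical names.

```python
import re

# Patterns written against the sample indicator lines; real output may differ.
PATTERNS = {
    "file_chooser": re.compile(r"file chooser id=(?P<id>\S+)"),
    "select": re.compile(r"select id=(?P<id>\S+) \((?P<count>\d+) options\)"),
    "permission": re.compile(r"permission id=(?P<id>\S+) (?P<kind>\S+)"),
    "dialog": re.compile(r"dialog \((?P<kind>\w+)\): (?P<message>.*)"),
}


def parse_events(output: str) -> list[tuple[str, dict]]:
    """Return (event_name, fields) for every recognized indicator line in `output`."""
    events = []
    for line in output.splitlines():
        for name, pattern in PATTERNS.items():
            m = pattern.search(line)
            if m:
                events.append((name, m.groupdict()))
                break  # at most one event per line
    return events


def follow_up(name: str, fields: dict) -> list[str]:
    """Conservative browser.js arguments that resolve an event."""
    if name == "file_chooser":
        return ["file", fields["id"], "--cancel"]
    if name == "select":
        return ["select", fields["id"], "0"]
    if name == "permission":
        return ["permission", "deny", fields["id"]]
    if name == "dialog":
        return ["dialog", "dismiss"]
    raise ValueError(f"no follow-up for event {name!r}")
```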
## Speed Rules

The fast pattern: navigate → `eval` to extract. Skip screenshots unless you're lost.

- Start ABP first: `browser.js start`
- Don't screenshot every step: skip `--shot` during form-filling. Only screenshot when you need to see the layout.
- Observe the URL after a search: many SPAs encode filters in URL params. Copy the URL, modify it, and `nav` directly next time to skip the form entirely.
- Extract data via `eval`, not vision: one JS query extracts 10 results faster than scrolling and screenshotting.
- Batch related inputs: click + type + Enter is one `batch` call instead of three.
- Use `text` for simple data: `text` is faster than `eval` for plain-text extraction.
- Use `pick` for ambiguity: when coordinates are unclear, let the user click.
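
The URL tip is easy to script. Parse the results URL you observed, change one query parameter, and pass the new URL to `$B nav`. Below is a sketch that uses only the standard library. The example URL and its `q`/`page`/`sort` parameters are made up for illustration, and `with_params` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(url: str, **changes: object) -> str:
    """Return `url` with the given query parameters set.

    Unchanged parameters keep their original order; changed or new ones are
    appended at the end. Scheme, host, path and fragment are left untouched.
    """
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k not in changes]
    kept += [(k, str(v)) for k, v in changes.items()]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    seen = "https://shop.example.com/search?q=usb+hub&page=1&sort=price"
    print(with_params(seen, page=2))
    # https://shop.example.com/search?q=usb+hub&sort=price&page=2
```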

Anti-pattern: click → screenshot → read image → decide → click → screenshot → ... (each step costs roughly 3 s for the screenshot plus the LLM vision round-trip)

Fast pattern: nav → click, click, click (no screenshots) → `eval` to extract all data → one screenshot to verify