name: vellum-browser-use
description: Browse the web using assistant browser CLI commands
compatibility: "Designed for Vellum personal assistants"
metadata:
emoji: "๐"
vellum:
category: "browsing"
display-name: "Browser"
activation-hints:
- "Load first if you need to browse the web (navigating, clicking, extracting web content) via assistant browser commands"
Use this skill to browse the web. All browser operations are executed through the assistant browser CLI, invoked via bash or host_bash. Each operation is a subcommand:
| Command | Description |
|---|---|
assistant browser navigate |
Navigate to a URL |
assistant browser snapshot |
List interactive elements on the current page |
assistant browser screenshot |
Take a visual screenshot |
assistant browser click |
Click an element |
assistant browser type |
Type text into an input |
assistant browser press-key |
Press a keyboard key |
assistant browser scroll |
Scroll the page or a specific element |
assistant browser select-option |
Select an option from a native <select> element |
assistant browser hover |
Hover over an element to reveal menus/tooltips |
assistant browser wait-for |
Wait for a condition |
assistant browser extract |
Extract page text content |
assistant browser wait-for-download |
Wait for a file download to complete |
assistant browser fill-credential |
Fill a stored credential into a form field |
assistant browser attach |
Attach the Chrome debugger to the active tab |
assistant browser detach |
Detach the Chrome debugger from the active tab |
assistant browser close |
Close the browser page |
assistant browser status |
Diagnose browser backend readiness and setup steps |
Getting Started โ Check Browser Readiness
Before using any browser commands, run assistant browser --json status first to check which browser backends are available. The status command returns JSON with readiness information for each backend mode:
assistant browser --json status
The response includes:
recommendedModeโ the best available backend (use this)modes[]โ per-mode status withavailable,summary, anduserActions(remediation steps)
Browser Modes
Use --browser-mode <mode> on the assistant browser parent command to pin the browser backend:
| Value | Backend | Description |
|---|---|---|
auto |
Automatic | Default. Picks the best available backend based on context. |
extension |
Chrome extension | Routes through the user's Chrome browser via the extension debugger. |
cdp-inspect |
CDP inspect | Connects to an already-running Chrome instance via DevTools Protocol. |
local |
Playwright | Drives a dedicated Playwright-managed Chromium instance. |
assistant browser --browser-mode extension navigate --url http://www.example.com
Prefer the Chrome Extension
The Chrome extension (extension mode) is the preferred browser backend. It is:
- More secure than Chrome's native remote debugging
- Uses the user's real browser profile (cookies, sessions, saved logins)
- Best experience for interacting with the user's actual browsing context
If the status check shows the extension is not available, encourage the user to install and pair it:
- Install the Vellum Assistant Chrome Extension from the Chrome Web Store: https://chromewebstore.google.com/detail/vellum-assistant-browser/hphbdmpffeigpcdjkckleobjmhhokpne
- Open the extension in Chrome and pair it with the assistant.
The status response's userActions array for the extension mode provides these same steps when the extension is not connected.
Fallback Modes
If the user declines to install the extension:
cdp-inspectโ Connects to an already-running Chrome instance via DevTools Protocol (Chrome 146+). Requires enabling remote debugging in Chrome settings.localโ Drives a dedicated Playwright-managed Chromium instance. Last resort โ does not use the user's browser profile.
Only fall back to these if the user explicitly indicates they do not want to install the extension. Prefer cdp-inspect over local.
Targeting a Specific Client
When multiple clients support host_browser (e.g. two Chrome profiles, a macOS client and a Chrome extension), use --target-client-id <id> on the assistant browser parent command to pin all operations in the invocation to one specific client:
assistant browser --target-client-id <client-id> navigate --url https://example.com
Obtain client IDs from:
assistant clients list --capability host_browser
Omit --target-client-id when only one client is connected โ the default interface-preference order (chrome-extension first, then macos) picks the best available client automatically.
Session Management
Use --session <id> on the assistant browser parent command to group sequential operations so they share browser state (same page, cookies, etc.). Different session IDs create independent browser contexts.
assistant browser --session myflow navigate --url https://example.com
assistant browser --session myflow snapshot
assistant browser --session myflow click --element-id e3
Omitting --session uses the default session.
Machine-Readable Output
Use --json on the assistant browser parent command to get structured JSON output suitable for parsing in scripts:
assistant browser --json navigate --url https://example.com
# {"ok":true,"content":"Page title: Example Domain"}
assistant browser --json snapshot
# {"ok":true,"content":"...element list..."}
assistant browser --json screenshot
# {"ok":true,"content":"...","screenshots":[{"mediaType":"image/jpeg","data":"<base64>"}]}
Error responses use {"ok":false,"error":"..."}.
Screenshots
To save a screenshot to disk, use --output <path>:
assistant browser screenshot --output page.jpg
assistant browser screenshot --full-page --output full.jpg
To receive base64 screenshot data in JSON output:
assistant browser --json screenshot
The response includes a screenshots array with mediaType and data (base64) fields.
Typical Workflow
assistant browser --json statusto check backend readiness โ if the extension is not available, help the user install it- (Optional)
assistant browser attachto establish the session assistant browser navigate --url <url>to load a pageassistant browser snapshotto discover interactive elements- Use
click,type,press-key,scroll,select-option, orhoverto interact assistant browser extractorassistant browser screenshot --output <path>to capture results- Always
assistant browser detachwhen you are done โ this releases the debugger so the user can browse freely
Interaction Strategies
Date pickers / calendars: Click the date input to open the picker, re-snapshot to see calendar controls, click month navigation arrows to reach the target month, then click the target date. For <input type="date">, use type with YYYY-MM-DD format.
Native <select> elements: Use select-option with --value, --label, or --index. Do not try to click individual <option> elements.
ARIA / custom dropdowns: Click to open, take a new snapshot, then click the desired option by --element-id.
Autocomplete inputs: Type the search text, wait 500-1000ms (wait-for --duration), re-snapshot for suggestions, then click the suggestion or use press-key --key ArrowDown + press-key --key Enter.
Multi-step forms: Complete each step, wait for the next section to load, re-snapshot to discover new elements, then proceed.
Dynamic content: After interactions that change the page, use wait-for (with --selector or --text) or re-snapshot to see updated elements before continuing.
Scrolling: Use scroll --direction down to reveal below-the-fold content before snapshotting. Long pages may require multiple scrolls.
Hover menus / tooltips: Use hover to reveal hidden menus or tooltips, then re-snapshot to see newly revealed elements.
Verification
After critical actions (form submission, booking confirmation, checkout), take a screenshot and then read the saved image to visually verify results before reporting success to the user:
assistant browser screenshot --output /tmp/verify.jpg
Then read the saved image to inspect it before reporting success. Use file_read if the screenshot was taken via bash, or host_file_read if it was taken via host_bash.