name: web-browser description: "Browse the web, take screenshots, fill forms, and interact with pages via the chrome-devtools MCP server. Auto-triggers when user mentions visual bugs, wants to check a page, needs to fill a form, or discusses anything that benefits from seeing a live website."
Web browser skill
Browser automation via the chrome-devtools MCP server. Uses Helium (Chromium).
When to use this skill
Activate when the user:
- Says "check this page", "look at", "open", "browse to", "go to [url]"
- Reports a visual bug, layout issue, or styling problem
- Asks "what does X look like", "can you see", "screenshot"
- Needs to fill a form, log in, or interact with a web UI
- Wants to test responsive/mobile layout
- Mentions a URL and seems to want you to inspect it
- Says "pick", "show me", "which element", "point at"
Do NOT ask for permission to use the browser — just use it. Navigate, snapshot, screenshot as needed to answer the user's question.
Quick reference
See a page
navigate_page → go to URL
take_screenshot → capture what it looks like
take_snapshot → get accessibility tree with element uids
Interact with elements
click {uid: "..."} → click element from snapshot
fill {uid, value} → fill input field
fill_form {fields} → fill multiple fields
type_text {text} → type keystrokes
press_key {key} → Enter, Tab, Escape, etc.
Let the user pick
Read scripts/pick.js and pass its contents as the function parameter to
evaluate_script. The picker shows a blue highlight overlay — user clicks to
select, Cmd+Click for multi-select, Enter to finish, Escape to cancel.
# Agent workflow:
# 1. Read the picker script
read scripts/pick.js
# 2. Pass the function text to evaluate_script
mcp chrome-devtools evaluate_script {"function": "<contents of pick.js>"}
# 3. User clicks in the browser → result returned as JSON
Other tools
| Tool | Purpose |
|---|---|
new_page |
Open new tab |
list_pages |
List open tabs |
select_page |
Switch tab |
close_page |
Close tab |
hover |
Hover over element |
drag |
Drag element |
upload_file |
Upload to file input |
handle_dialog |
Accept/dismiss alerts |
wait_for |
Wait for selector, URL, or time |
emulate |
Mobile device emulation |
resize_page |
Change viewport size |
list_network_requests |
See network activity |
list_console_messages |
See console output |
lighthouse_audit |
Run Lighthouse |
evaluate_script |
Run arbitrary JS in page |
Browsing modes
Isolated (default)
Every chrome-devtools tool call uses a fresh temp profile. No cookies, no
logins. Works for public pages, testing, screenshots.
Copied profile (logged-in sessions)
When the user needs their logged-in session (e.g., "check my dashboard", "look at my account page"):
- Run the copy script:
./scripts/copy-profile.sh # copy daily Helium profile ./scripts/copy-profile.sh brave # copy daily Brave Nightly profile - Tell the user: "I've copied your profile. To use it, enable
chrome-devtools-profilein~/.pi/agent/mcp.json(setdisabled: falseand setchrome-devtoolstodisabled: true), then reload the session."
Attach to running Helium
When the user's daily Helium is running with a debug port:
helium --remote-debugging-port=9223 # user launches from fish
Use the chrome-devtools-attach MCP server (enable in mcp.json, disable
the default chrome-devtools). This shares the live session — all cookies,
tabs, and extensions.
Tips
- Prefer
take_snapshotovertake_screenshotfor understanding page structure — it's text, cheaper, and gives you uids to click. - Use
take_screenshotwhen the user asks about visual appearance or layout. - Combine both: snapshot to find the uid, screenshot to verify visually.
- The
chrome-cdpskill is separate — it connects to existing browser tabs viachrome://inspect. Use it for inspecting the user's actual open tabs.