name: browser
description: Use the built-in background browser for interactive web tasks, including opening pages, inspecting the DOM, filling forms, clicking through multi-step flows, and verifying page state. Use this when real browser interaction is required instead of only fetching static text.
Browser
Use this skill for real-time browser interaction in a hidden background session.
Tool Set (Current Agent Support)
Use only these tools:
| Tool | Purpose |
|---|---|
| `BrowserOpen(url)` | Open/reuse hidden background browser session and navigate to URL |
| `BrowserClick(selector)` | Click element by CSS selector |
| `BrowserType(selector, text, pressEnter?)` | Type into input; optional Enter submit |
| `BrowserScreenshot()` | Capture current page screenshot |
| `BrowserRead()` | Extract current page as markdown-like readable text |
| `BrowserReadDom(limit?)` | List interactive elements with selector hints |
| `BrowserBack()` | Navigate backward |
| `BrowserForward()` | Navigate forward |
| `BrowserReload()` | Reload current page |
| `BrowserEval(code)` | Execute JavaScript in page context |
| `BrowserClose()` | Close background browser session |
Do not reference WebFetch or WebSearch in this skill. They are not part of this browser toolset.
Default Workflow
- Open page with `BrowserOpen`.
- Capture initial state with `BrowserScreenshot`.
- Discover operable elements with `BrowserReadDom` (set `limit` when pages are large).
- Interact with `BrowserClick` / `BrowserType`.
- Re-capture with `BrowserScreenshot` after each key action/navigation.
- Use `BrowserRead` when textual content is needed.
- Use `BrowserBack` / `BrowserForward` / `BrowserReload` for navigation control.
- Close with `BrowserClose` when task is complete.
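
The workflow above can be sketched as an ordered plan of tool calls. This is only an illustration: the helper name `planWorkflow`, its `domLimit` option, and the `{ tool, args }` step shape are hypothetical, while the tool and parameter names come from the tool table.

```javascript
// Hypothetical helper: builds an ordered list of tool calls as plain data.
// The { tool, args } shape is an illustration, not a documented format.
function planWorkflow(url, actions, { domLimit } = {}) {
  const steps = [
    { tool: "BrowserOpen", args: { url } },
    { tool: "BrowserScreenshot", args: {} },
    // Pass limit only when the caller asks for one (large pages).
    { tool: "BrowserReadDom", args: domLimit === undefined ? {} : { limit: domLimit } },
  ];
  for (const action of actions) {
    steps.push(action);
    // Checkpoint after every interaction so each step can be verified.
    steps.push({ tool: "BrowserScreenshot", args: {} });
  }
  steps.push({ tool: "BrowserRead", args: {} });
  steps.push({ tool: "BrowserClose", args: {} });
  return steps;
}

const plan = planWorkflow("https://example.com", [
  {
    tool: "BrowserType",
    args: { selector: "input[name='q']", text: "search query", pressEnter: true },
  },
]);
console.log(plan.map((step) => step.tool).join(" -> "));
```

Treating the plan as data keeps the screenshot checkpoints explicit: every interaction is followed by exactly one `BrowserScreenshot` step.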
Element-Finding Strategy
- Prefer `BrowserReadDom` first to get selector candidates.
- Validate visually with `BrowserScreenshot`.
- Use stable selectors: `id`, `name`, `type`, `data-*`, or constrained attribute selectors.
- If selectors are unstable or hidden, use `BrowserEval` for custom logic.
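
As a rough sketch of the stable-selector preference, the helper below builds a CSS attribute selector from an element description. The input shape (`tag`, `id`, `name`, `type`, `data`) is an assumption made for illustration; the actual output format of `BrowserReadDom` is not specified here. The priority order (`id`, then `name`, then `data-*`, then `type`) is also just one reasonable choice.

```javascript
// Escape a value for use inside a double-quoted CSS attribute selector.
function quoteAttr(value) {
  return '"' + String(value).replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// Hypothetical helper: el is an assumed description of an element, e.g.
// { tag: "input", name: "q" }. Keys in el.data are attribute suffixes as
// written in the markup ("test-id" for data-test-id), not camelCase names.
function pickSelector(el) {
  const tag = el.tag || "";
  // Attribute form for id stays valid even when the id has unusual characters.
  if (el.id) return `${tag}[id=${quoteAttr(el.id)}]`;
  if (el.name) return `${tag}[name=${quoteAttr(el.name)}]`;
  const data = Object.entries(el.data || {});
  if (data.length > 0) {
    const [key, value] = data[0];
    return `${tag}[data-${key}=${quoteAttr(value)}]`;
  }
  // type alone often matches several elements, so it is the last resort.
  if (el.type) return `${tag}[type=${quoteAttr(el.type)}]`;
  // No stable attribute available: fall back to custom BrowserEval logic.
  return null;
}

console.log(pickSelector({ tag: "input", name: "q" }));        // input[name="q"]
console.log(pickSelector({ tag: "button", type: "submit" }));  // button[type="submit"]
```

A `null` result signals the case where no stable attribute exists and custom `BrowserEval` logic is the fallback.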
Execution Rules
- Take a screenshot before first interaction and after each major step.
- If a click triggers navigation or async rendering, wait for the page to settle, then take another screenshot.
- For object/array returns in `BrowserEval`, wrap the expression with `JSON.stringify(...)` for stable output.
- Keep each interaction atomic and verifiable; avoid chaining many risky actions without checkpoints.
- Always close the browser at the end unless the user explicitly asks to keep it open.
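
The `JSON.stringify` rule is easiest to see with plain JavaScript string coercion. How `BrowserEval` serializes non-string results is not specified here, so the coercion below shows only one lossy possibility; `wrapForEval` is a hypothetical helper name.

```javascript
const links = [{ text: "Home", href: "https://example.com/" }];

// Plain string coercion loses the structure of objects inside the array.
console.log(String(links)); // [object Object]

// JSON.stringify keeps every field in a stable, parseable form.
console.log(JSON.stringify(links)); // [{"text":"Home","href":"https://example.com/"}]

// Hypothetical helper: wraps an expression so its value comes back as JSON.
// The inner parentheses keep a top-level comma in the expression from being
// read as a second argument to JSON.stringify.
function wrapForEval(expression) {
  return `JSON.stringify((${expression}))`;
}

console.log(wrapForEval("document.title"));
```

The wrapped string can then be passed as the `code` argument of `BrowserEval`, and the result parsed with `JSON.parse` on the receiving side.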
Example: Search Flow
1. BrowserOpen(url: "https://example.com")
2. BrowserScreenshot()
3. BrowserType(selector: "input[name='q']", text: "search query", pressEnter: true)
4. BrowserScreenshot()
5. BrowserRead()
6. BrowserClose()
Example: Multi-Step Form
1. BrowserOpen(url: "https://example.com/register")
2. BrowserScreenshot()
3. BrowserType(selector: "#name", text: "John Doe")
4. BrowserType(selector: "#email", text: "john@example.com")
5. BrowserClick(selector: "button[type='submit']")
6. BrowserScreenshot()
7. BrowserClose()
Example: Advanced Eval
BrowserEval(code: "window.scrollTo(0, document.body.scrollHeight)")
BrowserEval(code: "document.querySelector('select#country').value = 'US'; document.querySelector('select#country').dispatchEvent(new Event('change', { bubbles: true }))")
BrowserEval(code: "JSON.stringify(Array.from(document.querySelectorAll('a')).map(a => ({ text: a.textContent.trim(), href: a.href })))")
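
The select example above throws if `select#country` is missing from the page. A guarded variant might look like the sketch below, which generates the code string and reports the outcome as JSON. The helper name `buildSelectCode` and the `{ ok, reason, value }` result shape are hypothetical, and the sketch assumes `BrowserEval` returns the value of the evaluated expression.

```javascript
// Hypothetical helper: returns a JavaScript string for BrowserEval(code: ...).
function buildSelectCode(selector, value) {
  // JSON.stringify on a string yields a valid JavaScript string literal,
  // so quotes inside the selector or value cannot break the generated code.
  const sel = JSON.stringify(selector);
  const val = JSON.stringify(value);
  return [
    "(() => {",
    `const el = document.querySelector(${sel});`,
    // Report a missing element instead of throwing on null.
    'if (!el) return JSON.stringify({ ok: false, reason: "not found" });',
    `el.value = ${val};`,
    // Fire a bubbling change event so page listeners see the new value.
    'el.dispatchEvent(new Event("change", { bubbles: true }));',
    "return JSON.stringify({ ok: true, value: el.value });",
    "})()",
  ].join(" ");
}

console.log(buildSelectCode("select#country", "US"));
```

Returning a small status object makes each eval step verifiable without relying on a screenshot alone.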