name: browser-automation
description: Use for complex Agent Zero browser automation, including multi-tab browsing, screenshots, forms, uploads, raw pointer/keyboard actions, host-vs-container browser mode, and visual verification workflows.
Browser Tool
Use the browser tool for rendered pages, forms, logins, downloads, JavaScript-heavy sites, screenshots, and visual inspection. Prefer search_engine or document_query for plain text research.
Core Workflow
- open creates a browser tab and returns a browser_id.
- content returns readable markdown plus typed refs like [link 3], [button 6], [input text 8].
- Interact with refs using click, type, submit, scroll, etc.
- Use navigate on an existing browser_id for serial browsing.
- Keep only a small working tab set; close pages when finished.
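The workflow above can be sketched as a sequence of tool-call payloads. This is a minimal illustration, not the tool's documented schema: the helper browser_call is hypothetical, the argument names url, ref, and text are assumptions, the close action name is inferred from "close pages when finished", and browser_id 1 stands in for whatever open actually returns. Only the tool_name/tool_args/action envelope is taken from the multi example in this document.

```python
import json


def browser_call(action, **args):
    """Build a browser tool call using the tool_name/tool_args envelope."""
    return {"tool_name": "browser", "tool_args": {"action": action, **args}}


# Assumed argument names: "url", "ref", "text". The "close" action name is
# also an assumption. Refs 8 and 6 mirror [input text 8] and [button 6].
steps = [
    browser_call("open", url="https://example.com/login"),
    browser_call("content", browser_id=1),  # read markdown plus typed refs
    browser_call("type", browser_id=1, ref=8, text="alice"),
    browser_call("click", browser_id=1, ref=6),
    browser_call("navigate", browser_id=1, url="https://example.com/account"),
    browser_call("close", browser_id=1),  # keep the working tab set small
]

for step in steps:
    print(json.dumps(step))
```

Each payload would be issued as its own tool call, reading content again after any step that changes the page so that refs stay current.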
Modes
The same tool may run in Docker container mode or A0 CLI host-browser mode, depending on project/plugin settings.
- Container mode: browser and upload paths resolve inside the Agent Zero container.
- Host mode: browser and upload paths resolve on the connected A0 CLI host machine.
In host mode, page content and screenshots may be blocked by host-content policy when remote models are active.
Screenshots And Vision
Screenshots are explicit only; the browser does not automatically load images into model context.
- Call browser with action: "screenshot".
- Call vision_load with the returned vision_load.tool_args.paths value.
- Reason from the latest loaded screenshot.
Screenshot args include quality, full_page, and optional path. Without path, the screenshot is saved as a chat-scoped artifact and returned through vision_load.tool_args.paths; with path, PNG is used when path ends with .png, otherwise JPEG is used.
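A rough sketch of the screenshot handoff and the format rule described above. The helper names are hypothetical, and the nested shape of the screenshot result (a vision_load object holding tool_args.paths) is an assumption read off the wording of this section. Whether the .png check is case-sensitive is also not stated here; the sketch assumes it is.

```python
def screenshot_call(browser_id, quality=None, full_page=False, path=None):
    """Build a screenshot call; quality and path are only sent when given."""
    args = {"action": "screenshot", "browser_id": browser_id, "full_page": full_page}
    if quality is not None:
        args["quality"] = quality
    if path is not None:
        args["path"] = path
    return {"tool_name": "browser", "tool_args": args}


def screenshot_format(path):
    """PNG when the explicit path ends with .png, otherwise JPEG.

    Assumption: the suffix match is case-sensitive.
    """
    return "png" if path.endswith(".png") else "jpeg"


def vision_load_call(screenshot_result):
    """Forward the returned paths to vision_load unchanged.

    Assumption: the result nests them under vision_load.tool_args.paths.
    """
    paths = screenshot_result["vision_load"]["tool_args"]["paths"]
    return {"tool_name": "vision_load", "tool_args": {"paths": paths}}
```

Passing the returned paths through untouched avoids guessing where a chat-scoped artifact was saved when no path was supplied.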
Forms And Files
- select_option works for native selects and detectable ARIA listbox/combobox controls.
- set_checked works for checkbox, radio, switch, and toggle-like refs.
- upload_file works for file input refs or associated labels; verify the file exists in the active browser environment.
- For fragile forms, load skill browser-form-workflows.
Pointer And Keyboard
- hover, double_click, right_click, and drag accept refs or viewport coordinates.
- Coordinates are Chromium viewport CSS pixels and match screenshots.
- key_chord presses keys in order and releases in reverse.
- clipboard actions are copy, cut, or paste.
- set_viewport resizes the page viewport.
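The key_chord ordering (press in order, release in reverse) can be illustrated with a small model of the event sequence. The function name and the "down"/"up" labels are illustrative only and do not reflect the tool's internals.

```python
def key_chord_events(keys):
    """Return (event, key) pairs: presses in order, releases in reverse."""
    presses = [("down", key) for key in keys]
    releases = [("up", key) for key in reversed(keys)]
    return presses + releases


for event in key_chord_events(["Control", "Shift", "T"]):
    print(event)
# ('down', 'Control')
# ('down', 'Shift')
# ('down', 'T')
# ('up', 'T')
# ('up', 'Shift')
# ('up', 'Control')
```

Listing modifiers first therefore keeps them held while the final key is pressed.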
Tabs And Popups
- Popups and target-blank tabs are auto-registered.
- list shows open tabs; pass include_content: true sparingly.
- set_active deliberately changes focus.
- Operations on a non-active tab do not steal focus unless browser rules require it.
Browser Action Multi
multi is only a browser action, never a top-level tool. Use:
{
  "tool_name": "browser",
  "tool_args": {
    "action": "multi",
    "calls": [
      {"action": "content", "browser_id": 1},
      {"action": "screenshot", "browser_id": 2}
    ]
  }
}
Use browser action multi for parallel reads across tabs. Avoid mutating the same tab twice in one batch unless serial order is intended.
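One way to apply the batching advice is to check a batch before sending it. The sketch below is a hypothetical pre-flight helper, not part of the browser tool. It assumes that content, screenshot, and list only read state and treats every other action as mutating; that classification is an assumption, not something this document specifies.

```python
from collections import Counter

# Assumption: these actions only read state; all others are treated as mutating.
READ_ONLY_ACTIONS = {"content", "screenshot", "list"}


def repeated_mutations(calls):
    """Return the browser_ids that one batch would mutate more than once."""
    counts = Counter(
        call["browser_id"]
        for call in calls
        if call["action"] not in READ_ONLY_ACTIONS and "browser_id" in call
    )
    return sorted(bid for bid, n in counts.items() if n > 1)


def multi_call(calls):
    """Wrap calls in the multi envelope shown in this section."""
    return {"tool_name": "browser", "tool_args": {"action": "multi", "calls": calls}}


reads = [
    {"action": "content", "browser_id": 1},
    {"action": "screenshot", "browser_id": 2},
]
print(repeated_mutations(reads))  # []
```

A non-empty result is a prompt to confirm that serial order is intended, or to split the batch into separate calls.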