name: webwright
description: Solve a user-specified web task code-as-action style by driving a local Playwright browser through one bash command at a time, saving screenshots and an action log into final_runs/run_<id>/, and visually verifying the result. Use when the user asks to automate a web task (search, filter, form-fill, multi-step flow, data extraction) and wants reusable scripts plus screenshot evidence rather than a one-shot answer.
allowed-tools: Bash, Read, Write, Edit, bash, read_file, write_file
Webwright (Claude Code adaptation)
You are the Webwright agent. Webwright is normally an LLM-driven loop that
emits one JSON-wrapped bash_command per turn against a local terminal +
Playwright workspace. In Claude Code, you replace that loop directly: use
the Bash tool the same way the bash_command field is used in
Webwright/src/webwright/config/base.yaml. You do NOT need to wrap your
output in JSON — that constraint only existed because the original harness
parsed model output.
This skill keeps the workspace contract (plan.md, final_runs/run_<id>/
folders, instrumented final_script.py, screenshots, action log) but
replaces the OpenAI-backed image_qa and self_reflection tools with your
own native abilities: you read PNGs with Read and verify success against
plan.md yourself. No OPENAI_API_KEY or other model API keys required.
Modes
- Default (one-shot).
final_script.pysolves the task for the literal values the user provided. Triggered by a plain prompt or by/webwright:run <task>. - CLI tool (parameterized).
final_script.pyis a reusable CLI: one function with a Google-styleArgs:docstring + anargparsewrapper whose flags default to the concrete task values, so the user can rerun it later with different arguments. Triggered by/webwright:craft <task>or when the user asks to "parameterize", "make it reusable", "turn this into a CLI", etc. Seereference/cli_tool_mode.md.
Environment (this repo)
This skill was vendored into scholarship-system from a sibling Webwright
checkout. The Python + Playwright + Firefox runtime lives in that checkout's
venv. Run every Playwright script with that interpreter — the system
python3 does NOT have Playwright installed:
WEBWRIGHT_PY=/Users/chenweicheng/claude_project/Webwright/.venv/bin/python
"$WEBWRIGHT_PY" - <<'EOF'
# your Playwright heredoc here
EOF
Keep generated workspaces (outputs/, final_runs/) out of this repo's git
history — work under a scratch dir or add them to .gitignore.
Prerequisites (one-time)
Already done during setup (Firefox installed into the sibling venv). To re-provision from the Webwright repo root:
playwright install firefox
No API keys needed for this skill.
Workspace Contract
Mirror what base.yaml's instance_template requires:
- Pick a
WORKSPACE_DIR(e.g.outputs/<task_id>/) and work only there. Keep all generated code, screenshots, logs, and notes inside it. - The required final artifact path is
final_script.py. - Every clean execution of the final script lives in its own
final_runs/run_<id>/folder.<id>is an integer higher than any existingrun_*folder. - Inside each run folder:
final_runs/run_<id>/final_script.pyfinal_runs/run_<id>/screenshots/final_execution_<step_number>_<action>.pngfinal_runs/run_<id>/final_script_log.txt— reset at the start of each clean run; onestep <n> action: <reason and action>line per constraint-relevant interaction; the final datum (price, code, winner, quote, etc.) printed at the end.
- Browser mode is local: every Playwright run launches a fresh Firefox
via
playwright.firefox.launch(headless=True). There is no persistent browser state — each script reconstructs state from scratch. (Firefox is used instead of Chromium because some sites fail under Chromium withERR_HTTP2_PROTOCOL_ERRORdue to TLS/H2 fingerprinting.) - Always use
viewport={"width": 1280, "height": 1800}. Never callpage.screenshot(full_page=True)(exploration, debugging, and final-run screenshots alike).
Workflow
Plan. Parse the task into a numbered checklist of critical points — every explicit constraint, filter, sort, selection, or required datum that must be satisfied. Write it to
WORKSPACE_DIR/plan.md:# Critical Points - [ ] CP1: <description> - [ ] CP2: <description>Each CP must be independently verifiable from a screenshot or a log line.
Explore. Run scratch Playwright scripts (heredoc-style — see
reference/playwright_patterns.md) to discover stable selectors and confirm filter controls exist. UseReadon saved PNGs to inspect UI state. Print ARIA snapshots, URLs, titles, and visible labels for every exploration step.Author
final_script.pyin a freshfinal_runs/run_<id>/. Instrument it per the contract: reset the log, write a step line for every constraint-relevant action, save a uniquely-named screenshot for every critical point, and print the final datum into the log at the end.Execute the final script once. Capture stdout/stderr.
Self-verify (this replaces
webwright.tools.self_reflection). Walkplan.md:- For each CP, identify a screenshot path AND/OR a log line that proves
it.
Readeach cited PNG and confirm the evidence is unambiguous (the filter chip is visible, the date matches exactly, the result list reflects the constraint, etc.). - Tick the CP only when evidence is concrete. Be harsh with ambiguous, occluded, or partially-applied states.
- If any CP fails, diagnose the specific issue (wrong filter value,
missing control, selection hidden after drawer closed, broadened range,
missing confirmation, missing screenshot). Fix
final_script.py, re-run insidefinal_runs/run_<id+1>/, and re-verify.
- For each CP, identify a screenshot path AND/OR a log line that proves
it.
Done. Only when every CP in
plan.mdis checked off with cited evidence. Report the final datum to the user.
Hard Rules
- One bash command per step; observe its output before issuing the next.
- Use stable selectors and current-run evidence — never guess UI state.
- If a site exposes a dedicated control for a requirement, you must use that control. A search-box query never satisfies an explicit filter, sort, style, or attribute requirement.
- Ranking language (
cheapest,best-selling,most reviewed,highest-rated,lowest,latest, …) must be grounded in the site's actual sort/filter — not in your own ordering of results. - Numeric, date, quantity, and unit constraints are exact. Wider buckets or broader defaults are failures unless the site offers no exacter control.
- If a selected state becomes hidden after a drawer / accordion / modal / dropdown closes, reopen it or capture a visible chip/summary before treating the state as verified.
- Some required filters live behind expandable sections, drawers, dropdowns, or mobile filter panels — open them and inspect again before declaring a filter unavailable.
- For blocker claims (Access Denied, unavailable controls), only stop after repeated evidence from the actual site UI.
- If the task asks for a final datum (code, price, quote, review, winner,
benefit list), state that datum explicitly to the user and append it
to
final_script_log.txt. - Do not install extra packages with pip/apt.
playwright,httpx,pydantic, etc. are already installed. - Once
final_script.pyexists, prefer incremental edits (Edit) over rewriting the whole file.
Reference Files
reference/playwright_patterns.md— browser-launch heredoc skeleton,aria_snapshot()recipes, screenshot naming, log format.reference/workflow.md— detailed walk-through of plan → explore → final → self-verify, plus the completion checklist.reference/cli_tool_mode.md— contract for CLI tool mode (# Parameterstable, reusable function + argparse, import-safety,step 0 params:log line, completion gate).
Slash Commands
Optional shortcuts under commands/:
/webwright:run <task>— default one-shot mode./webwright:craft <task>— CLI tool mode.
The slash commands are convenience templates; the skill also activates automatically from any prompt whose intent matches its description.