name: plan description: Decompose a task into steps with completion checkpoints. when_to_use: For multi-step workflows or broad testing objectives.
Plan Skill
Structure your approach before acting. Decompose broad objectives into sequenced steps with completion criteria at key milestones.
When to Use
- The task describes a multi-step workflow (login, search, checkout, etc.)
- The objective is broad ("test fundamental functionalities")
- You are unfamiliar with the page structure
- The task involves state that carries across steps (form data, cart items)
Skip this skill for single-action tasks ("click the login button").
Planning Phases
Phase 1: Understand the Objective
- What is the end goal? What does success look like?
- Which pages/features are involved? Ignore unrelated ones.
- What data flows through the workflow? (e.g., search query -> results -> detail page)
Phase 2: Sequence the Steps
Order steps as a continuous workflow, each following from the previous outcome.
- Aim for 4-8 steps for substantial tasks, fewer for simple ones. Each step is one reasoning unit — typically 1-3 tool calls that achieve a single sub-goal.
- Use semantic intent, not element selectors:
- Good:
"Click the 'Submit' button below the form" - Bad:
"Click element #btn_47"or"Click at coordinates (400, 300)"
- Good:
- For each step, state what and why:
- Good:
"Enter a search query to test the search flow" - Bad:
"Type 'hello world' in the search box"
- Good:
- Mark dependencies between steps: if step 4 depends on data from
step 2's outcome, say so explicitly.
- Example:
"Step 4 uses the product name found in step 2's results"
- Example:
Phase 3: Place Completion Checkpoints
Insert a checkpoint every 3-5 steps at key milestones.
For each checkpoint, define what to verify (the completion condition), not how to verify it:
- Good:
"Verify: search results page loaded with matching items" - Bad:
"take_snapshot and check for a results container"
The completion condition answers: what state proves this milestone was reached? The execution cycle (system prompt Step 4) handles the how.
Merge multiple conditions into one checkpoint to reduce round-trips:
- Good: one checkpoint verifying title + items + no error state
- Bad: separate checkpoints for each
Observation Batching
The engine runs read-only tools concurrently. Batch independent observations in a single turn for efficiency:
Turn N (one LLM response, all run in parallel):
- take_snapshot -> DOM structure
- take_screenshot -> visual state
- list_console_messages -> JS errors
- list_network_requests -> failed API calls
Use this pattern at completion checkpoints for a comprehensive view without extra round-trips.
Multi-Tab Workflows
You have full tab management. Use it when beneficial:
- Preserve state: Open a link in a new tab (
new_page) to inspect it without losing the current page's form state. - Compare pages: Keep the original open, navigate the new tab,
then
select_pageback to compare. - Clean up:
close_pagewhen a tab is no longer needed.
Re-planning
The plan is a living document, not a fixed script. When execution reveals new information that invalidates the plan:
- The original page structure differs from expectations.
- A step's outcome opens a different path than planned.
- A step is blocked and must be skipped.
Revise the remaining steps based on what you now know. Do not force the original plan when reality has diverged.
Completion
When the objective is fully achieved, stop executing even if you planned more steps. Remaining steps that add no new information are waste. State clearly what was accomplished and why remaining steps are unnecessary.
Do not loop after completion. Do not retry successful actions.
Error Handling
When a tool returns an error or an action produces unexpected results,
load the recovery skill for structured guidance:
load_skill(skill_name="recovery")
The recovery skill provides error classification, diagnosis with progress assessment, and concrete recovery strategies with escalation.