capture

name: capture version: 0.5.0 description: > Captures HTTP traffic from a web app using playwright-cli — site fingerprinting (framework, protections, auth, API discovery) plus full traffic recording into raw-traffic.json. Use as Phase 1 of CLI generation whenever a target URL needs its API surface recorded or assessed. when_to_use: > Trigger phrases: "record traffic from", "capture API calls from", "start Phase 1", "analyze traffic from URL", "assess site", "site fingerprint", "open browser for", or any URL given as the first step of CLI generation. Not for Phase 2 implementation, test writing, or quality validation.

Traffic Capture (Phase 1)

Assess the site, then capture comprehensive HTTP traffic. This skill combines site assessment with full traffic recording in a single browser session.

Copy this checklist and check off items as you complete them:

Phase 1 Progress:
- [ ] Prerequisites: playwright-cli available, checkpoint restored if one exists
- [ ] Step 1: Browser open on target URL
- [ ] Step 2: Fingerprint run + auth handled + profile classified + assessment.md written
- [ ] Step 3: Full capture — exploration targets met for the profile
- [ ] Step 4: Trace parsed, validate-capture.py PASSED (the gate)
- [ ] Step 5: Browser closed, checkpoint + phase-state marked complete

CRITICAL EXECUTION RULES

NEVER use run_in_background: true for ANY playwright-cli command. All playwright-cli commands must run in the foreground with appropriate timeouts. Background execution causes task ID tracking failures — the command completes before you can read the output. See references/playwright-cli-commands.md for the timeout table.

NEVER use eval for complex expressions. eval fails silently on ternaries, comma operators, and multi-branch logic with "not well-serializable" errors. Use run-code instead. See references/framework-detection.md for details.

ESM context — no require(). run-code uses ESM. Use await import('fs') instead of require('fs'). See references/playwright-cli-commands.md.

Prerequisites (Hard Gate)

Do NOT start unless:

playwright-cli is available (npx @playwright/cli@latest --version)
Target URL is known

Default capture method: playwright-cli tracing (standard workflow below).

Optional `--mitmproxy` mode

When the --mitmproxy flag was passed to /cli-anything-web (or the capture needs untruncated bodies / enhanced metadata), follow this workflow with the step substitutions in references/mitmproxy-mode.md — Steps 1, 3, and 4 differ; everything else is identical. When playwright-cli itself is unavailable (e.g., no Node), fall back to chrome-devtools-mcp via launch-chrome-debug.sh — see HARNESS.md Tool Hierarchy.

Public API Shortcut

If the target site has a documented public REST/JSON API (e.g., Hacker News Firebase API, Dev.to API, Reddit API, Wikipedia API), browser capture is optional:

Probe the API endpoints directly with httpx or curl
Save responses as <app>/traffic-capture/raw-traffic.json
Skip to Phase 2 (methodology)

This applies when:

API docs exist (OpenAPI/Swagger, developer docs page, /api/ prefix)
The API is publicly accessible without browser-specific auth
Endpoints return JSON (not HTML)

If unsure whether a public API exists, proceed with browser capture as normal.

Resume from Checkpoint

Before starting, check if a previous capture session exists:

python ${CLAUDE_PLUGIN_ROOT}/scripts/capture-checkpoint.py restore <app>

If a checkpoint exists, read the guidance field and resume from the last completed step instead of starting over. This prevents duplicate work when sessions are interrupted.

Step 1: Setup

# Create output directory
mkdir -p <app>/traffic-capture

# Clear any stale sessions
npx @playwright/cli@latest kill-all 2>/dev/null || true

npx @playwright/cli@latest -s=<app> open <url> --headed --persistent
# Note: heavy SPAs (Next.js, React) may show "TimeoutError: page._snapshotForAI" on open.
# This is non-fatal — verify with: npx @playwright/cli@latest list
#
# IMPORTANT — "Browser opened with pid..." in command output means the daemon
# RE-ATTACHED to the existing browser, NOT that a new session was created.
# Do NOT re-navigate or restart when you see this. The session is still open.

--mitmproxy mode: use the Step 1 substitution in references/mitmproxy-mode.md (start the proxy, then open through it).

Do NOT ask the user to log in yet — Step 2 will determine if auth is needed.

Step 2: Site Fingerprint (Single Command)

Run the all-in-one site fingerprint command instead of individual eval calls. This is faster, more reliable, and detects framework + protection + iframes + auth requirements in one shot.

Use the script file — multi-line JS with arrow functions and optional chaining fails in playwright-cli's single-line command parser. The script file approach has been tested and works reliably:

npx @playwright/cli@latest -s=<app> run-code "$(grep -v '^\s*//' ${CLAUDE_PLUGIN_ROOT}/scripts/site-fingerprint.js | tr '\n' ' ')"

IMPORTANT: The site-fingerprint.js script must be loaded via the command above. Do NOT copy-paste the JS inline — it will fail with SyntaxError. The grep -v strips comments and tr joins lines for single-line execution.

Interpret fingerprint results

The fingerprint returns four groups: framework, protection, auth, iframes. Map each true flag to the next action:

Group	Action
framework	See `references/framework-detection.md` for the full protocol table (`googleBatch` / `nextPages` / `nextApp` / `nuxt` / `spaRoot`).
protection	See `references/protection-detection.md` — always start at the escalation ladder at the top (plain httpx → curl_cffi → curl_cffi + cookies → camoufox → hybrid).
auth	Table below (Auth detection section).
iframes	If `iframeCount > 0`, see `references/playwright-cli-advanced.md` for the in-iframe re-run snippet.

Claude-facing shortcuts:

googleBatch: true → generate rpc/ subpackage (batchexecute protocol).
cloudflareManagedChallenge: true → tier 4 (camoufox) is required; curl_cffi alone will fail.
awsWaf: true → capture aws-waf-token cookie; use curl_cffi for GraphQL, cookie-only for SSR.
akamai: true or datadome: true → 1–2 s delays between requests are mandatory.
serviceWorker: true → note in assessment.md; generated CLI uses service_workers="block".
iframeCount > 0 → re-run the fingerprint inside the iframe. Google Labs apps (Stitch / MusicFX / ImageFX) follow this pattern — parent has WIZ_global_data, iframe has the real app.

Note: snapshot and click <ref> auto-resolve iframes. Only drop down to run-code for iframe interaction when built-in commands fail.

Auth detection (BEFORE exploration)

Check the fingerprint auth fields:

Condition	Meaning	Action
`hasLoginButton && !hasUserMenu`	Login required, not logged in	Ask user to log in NOW
`hasUserMenu`	Already logged in	Proceed to capture
`!hasLoginButton && !hasUserMenu`	No auth needed (public site)	Skip auth, proceed

If auth is needed:

Tell the user: "This site requires login. Please log in in the browser window."
Wait for user confirmation
Save auth state and tighten permissions (CLAUDE.md mandates chmod 600):

npx @playwright/cli@latest -s=<app> state-save <app>/traffic-capture/<app>-auth.json
chmod 600 <app>/traffic-capture/<app>-auth.json

If NO auth is needed: Skip directly to Step 2b.

2b. Classify Site Profile

Based on fingerprint results AND what you see in the UI, classify the site:

Profile	Auth?	Operations	Exploration Focus
Auth + CRUD	Yes	Create, Read, Update, Delete	Full CRUD per resource
Auth + Generation	Yes	Generate, Poll, Download	Generation lifecycle + projects
Auth + Read-only	Yes	Read, Search, Export	Read operations + auth flow
No-auth + CRUD	No/Optional	Full CRUD	Skip auth, full CRUD
No-auth + Read-only	No	Read, Search	Minimal capture

2c. Quick API Probe (Force SPA Navigation Trick)

Start a SHORT trace, click 3-4 internal links, stop. This reveals hidden API endpoints that SSR hides on initial page load.

npx @playwright/cli@latest -s=<app> tracing-start
npx @playwright/cli@latest -s=<app> click <internal-link-1>
npx @playwright/cli@latest -s=<app> click <internal-link-2>
npx @playwright/cli@latest -s=<app> click <internal-link-3>
npx @playwright/cli@latest -s=<app> tracing-stop

# Quick parse to see what endpoints appeared (saved alongside the full capture
# so it survives the session — don't output to /tmp).
python ${CLAUDE_PLUGIN_ROOT}/scripts/parse-trace.py .playwright-cli/traces/ --latest \
  --output <app>/traffic-capture/probe-traffic.json

This probe trace is separate from the full capture in Step 3 — Step 3 will start a fresh trace that overwrites the .network file in .playwright-cli/traces/. The parsed probe-traffic.json is kept in traffic-capture/ so it stays available for cross-referencing during Step 4.

Check the probe results — what API patterns did you find? See references/api-discovery.md for the priority chain and decision tree.

2d. Write Assessment Summary

Create <app>/traffic-capture/assessment.md to consolidate all findings:

# Site Assessment: <app>

- **URL**: <url>
- **Framework**: <detected framework or "none/custom">
- **Protocol**: <REST / GraphQL / batchexecute / HTML scraping / hybrid>
- **Protection**: <none / cloudflare / captcha / aws-waf / etc.>
- **Auth required**: <yes (type: Google SSO / cookie / JWT / API key) / no>
- **Iframes**: <yes (N frames, app in frame N at <url>) / no>
- **Site profile**: <Auth+CRUD / Auth+Generation / Auth+Read-only / No-auth+CRUD / No-auth+Read-only>
- **Capture strategy**: <API-first / SSR+API hybrid / batchexecute / HTML scraping / protected-manual>
- **Key observations**: <any quirks, localized UI, rate limits, special patterns>

Step 3: Full Traffic Capture

Now do the comprehensive capture based on what Step 2 revealed.

# Optional: Start HAR recording alongside trace for standard-format capture
# HAR files enable mitmproxy2swagger (auto OpenAPI spec) and third-party analysis tools
npx @playwright/cli@latest -s=<app> run-code "async page => {
  await page.context().routeFromHAR('<app>/traffic-capture/capture.har', {
    update: true,
    updateContent: 'embed',
    updateMode: 'full'
  });
  return 'HAR recording started';
}"

# Start fresh trace for full capture (note the trace ID from output!)
npx @playwright/cli@latest -s=<app> tracing-start
# Output: "trace-<ID>" — record this ID

--mitmproxy mode: skip tracing-start and HAR recording — the proxy is already capturing (see references/mitmproxy-mode.md).

HAR recording is optional but recommended. It produces a standard HAR file alongside the trace. This enables mitmproxy2swagger to auto-generate an OpenAPI spec: pip install mitmproxy2swagger && mitmproxy2swagger -i capture.har -o api-spec.yaml -p <base-url> The HAR file is saved when the browser context is closed (Step 5).

Exploration by site profile

Use the concrete targets in references/exploration-checklists.md for the profile identified in Step 2b. Each profile has an explicit entry count, distinct-path count, and WRITE-op target that validate-capture.py (Step 4) will enforce. Minimum bar across all profiles:

≥ 15 entries, ≥ 3 distinct URL paths, protocol ≠ unknown
≥ 1 WRITE op (unless the site is genuinely read-only — pass --read-only to the validator)
< 50% error rate (dominant 4xx/5xx means auth or rate-limit failure)

Pacing for protected sites

If any of cloudflare, cloudflareManagedChallenge, akamai, datadome, awsWaf, or rateLimit fired in the fingerprint, leave 1–2 s between clicks / form submits. Faster exploration triggers per-IP challenges within ~30 requests and corrupts the trace.

General interaction rules

Click by ref (from snapshot) is most reliable: snapshot → note ref → click <ref>
Refs go stale — always take a fresh snapshot before clicking
For localized UIs (Hebrew, Arabic, etc.) — use refs or data-testid, not text
For iframe-embedded apps — snapshot + click <ref> auto-resolves iframes
Wait after generation — if the app generates content async, wait for ≥ 15 s before the next action, otherwise the polling loop won't appear in the trace:
```
npx @playwright/cli@latest -s=<app> run-code "async page => {
  await page.waitForTimeout(15000);
  return 'waited';
}"
```
Debounced inputs — after typing a search query, pause 1–2 s before the next action; submitting immediately misses the auto-complete endpoint.

Step 4: Stop, Save, Parse

npx @playwright/cli@latest -s=<app> tracing-stop

If tracing-stop fails: retry once with a 15s timeout; if it fails again the trace is lost — restart the trace at Step 3. Never retry more than twice. Full decision tree: skills/shared/RECOVERY.md §tracing-stop Failure (error signatures: references/playwright-cli-tracing.md).

python ${CLAUDE_PLUGIN_ROOT}/scripts/parse-trace.py \
  .playwright-cli/traces/ --latest \
  --output <app>/traffic-capture/raw-traffic.json

# parse-trace.py now auto-runs analyze-traffic.py and produces:
#   - <app>/traffic-capture/raw-traffic.json (raw request/response data)
#   - <app>/traffic-capture/traffic-analysis.json (auto-detected protocol, auth, endpoints)

# Gate — validate the capture before declaring Phase 1 complete.
# This check enforces: ≥15 entries, ≥3 distinct paths, protocol ≠ unknown,
# ≥1 WRITE op (add --read-only if the site is genuinely read-only), <50% error rate.
python ${CLAUDE_PLUGIN_ROOT}/scripts/validate-capture.py <app>
# OR for genuinely read-only sites:
# python ${CLAUDE_PLUGIN_ROOT}/scripts/validate-capture.py <app> --read-only

If parse-trace.py fails or produces an empty/static-only raw-traffic.json, follow skills/shared/RECOVERY.md §parse-trace Failure.

If validate-capture.py returns a non-zero exit code, do not proceed to Step 5. Map each failed gate to its targeted remediation in skills/shared/RECOVERY.md §validate-capture Non-Zero Exit (e.g., <15 entries → capture more pages; <3 distinct paths → exercise more features; no WRITE op → perform a create/update/delete in the UI). Re-open the browser (Step 1), fill the gaps, then re-run Step 4. Only mark the capture complete after the validator passes (or warns, with your explicit sign-off on each warning).

For deeper inspection:

python ${CLAUDE_PLUGIN_ROOT}/scripts/analyze-traffic.py \
  <app>/traffic-capture/raw-traffic.json --summary

--mitmproxy mode: replace the parse/analyze block with the Step 4 substitution in references/mitmproxy-mode.md (stop-proxy already includes analysis; the same validate-capture gate applies).

Step 5: Close

npx @playwright/cli@latest -s=<app> close

# Mark the capture checkpoint AND the pipeline phase complete
python ${CLAUDE_PLUGIN_ROOT}/scripts/capture-checkpoint.py update <app> --step complete
python ${CLAUDE_PLUGIN_ROOT}/scripts/phase-state.py complete <app> --phase capture \
  --output traffic-capture/raw-traffic.json

If an endpoint is missing — USE THE FEATURE

Don't grep JS bundles. Start a new trace → screenshot → click the button → fill → submit → stop → parse. The browser IS the API documentation.

Fallback

Fallback: If playwright-cli is not available, see HARNESS.md Tool Hierarchy for chrome-devtools-mcp fallback instructions.

Next Step

When capture is complete (raw-traffic.json has WRITE operations, or the site is read-only with only GET requests), invoke methodology to analyze the traffic and build the CLI.

References

Gate failures (tracing-stop, parse-trace, validate-capture, phase-state): skills/shared/RECOVERY.md. Implementation rules: skills/shared/CONVENTIONS.md.

See references/ for:

playwright-cli-commands.md — command syntax, timeouts, ESM rules
playwright-cli-tracing.md — trace file format, recovery protocol
playwright-cli-sessions.md — named sessions, auth persistence
playwright-cli-advanced.md — waits, iframes, localized UIs, downloads
framework-detection.md — framework → protocol table
protection-detection.md — anti-bot escalation ladder (curl_cffi → camoufox → hybrid)
api-discovery.md — protocol priority chain, decision tree
exploration-checklists.md — per-profile capture targets with concrete numbers
mitmproxy-mode.md — step substitutions for --mitmproxy capture