name: methodology
version: 0.3.0
description: >
  Analyzes captured HTTP traffic, designs the CLI architecture, and implements
  the Python CLI package (Phase 2): parse raw-traffic.json, identify the
  protocol, write api-spec.json, scaffold from templates, and implement
  endpoint methods and Click command groups. Use after a capture completes and
  raw-traffic.json exists.
when_to_use: >
  Trigger phrases: "analyze traffic", "design CLI", "implement CLI",
  "build CLI from network traffic", "generate API wrapper",
  "reverse engineer web API", "start Phase 2", or after the capture skill
  finishes. Not for traffic recording (capture), test writing (testing), or
  quality checks (standards).
CLI-Anything-Web Methodology (Phase 2)
Analyze captured traffic, design the CLI command structure, and implement the complete Python CLI package. This skill owns the core transformation from raw HTTP traffic to a production-ready CLI.
Copy this checklist and check off items as you complete them:
Phase 2 Progress:
- [ ] Prerequisites: raw-traffic.json exists (+ auth state if the site needs auth)
- [ ] Step A: traffic analyzed, protocol identified, <APP>.md written
- [ ] Step A: api-spec.json written (every endpoint cites raw-traffic.json evidence) and passes `cli-web-devkit spec validate`
- [ ] Step B.0: scaffolded via scaffold-cli.py (.manifest.json present)
- [ ] Step B: client endpoint methods implemented from the spec
- [ ] Step B: command modules implemented + registered, REPL help in sync
- [ ] Smoke check passed (no protocol leaks), phase-state marked complete
Prerequisites (Hard Gate)
Do NOT start unless:
- `raw-traffic.json` exists (with WRITE operations, or read-only GET-only traffic)
- Auth state was captured during Phase 1 (if the site requires auth)
If raw-traffic.json is missing, or has no WRITE operations and the site is not
genuinely read-only (see the exception below), invoke the capture skill first.
If Phase 1 state shows failed, follow
skills/shared/RECOVERY.md §phase-state Check Failures before re-running.
Exception for read-only sites: If the site is genuinely read-only (search engine,
dashboard, analytics viewer with no create/update/delete), the trace may contain only
GET requests. In this case, note "read-only site — no write operations" in <APP>.md
and proceed. The generated CLI will have read-only commands (list, get, search) but
no create/update/delete commands. This is valid.
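The write-operation gate can be checked mechanically before any analysis starts. This is a minimal sketch, assuming `raw-traffic.json` is a top-level JSON array whose entries carry a `method` field; the real schema may differ, and `classify_capture` and `load_entries` are hypothetical helper names.

```python
import json
from pathlib import Path

# Assumption: HTTP methods that count as WRITE operations.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def classify_capture(entries):
    """Return 'read-write' if any entry uses a WRITE method, else 'read-only'.

    Raises ValueError for an empty capture, since the gate needs traffic.
    """
    if not entries:
        raise ValueError("capture is empty; run the capture skill first")
    methods = {str(entry.get("method", "")).upper() for entry in entries}
    return "read-write" if methods & WRITE_METHODS else "read-only"


def load_entries(path):
    """Load entries from disk (assumed layout: a top-level JSON array)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# In-memory sample so the sketch runs without a capture on disk.
sample = [
    {"method": "GET", "url": "https://example.com/api/v1/boards/"},
    {"method": "post", "url": "https://example.com/api/v1/boards/"},
]
print(classify_capture(sample))  # prints "read-write"
```

A `read-only` result is only a prompt to confirm that the site really has no write operations. GraphQL and RPC reads also travel over POST, so the method alone is not conclusive in either direction.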
No-auth sites: If the target site requires no authentication (public API,
no login needed), the "Auth state captured" prerequisite does not apply. Note
"no-auth site" in <APP>.md and proceed.
Step A: Analyze (API Discovery)
Goal: Map raw traffic to a structured API model.
Process:
Read `traffic-analysis.json` first (if it exists alongside `raw-traffic.json`). This file is auto-generated by `parse-trace.py` or `mitmproxy-capture.py` → `analyze-traffic.py` and contains the pre-detected protocol type, auth pattern, endpoint grouping, GraphQL operations, batchexecute RPC IDs, and suggested CLI commands. Use it as a starting point: verify its findings and fill in anything marked "unknown" by reading `raw-traffic.json` manually.
Enhanced analysis (present only when captured via mitmproxy): `request_sequence` (timeline-ordered requests with auth-flow detection), `session_lifecycle` (cookie inventory, auth-cookie identification, session pattern), and `endpoint_sizes` (response-size classification). If these are missing (`has_timestamps: false`), the capture came from the default trace path; rely on manual analysis for sequence/session detail.
If `traffic-analysis.json` doesn't exist, run the analyzer:

    python ${CLAUDE_PLUGIN_ROOT}/scripts/analyze-traffic.py \
        <app>/traffic-capture/raw-traffic.json --summary

Parse `raw-traffic.json` (for details the analyzer couldn't extract)
Group requests by base path (e.g., `/api/v1/boards/`, `/api/v1/items/`)
For each endpoint group, identify:
- HTTP method (GET/POST/PUT/DELETE/PATCH)
- URL pattern (extract path parameters like `:id`)
- Query parameters and their types
- Request body schema (JSON fields, types, required/optional)
- Response body schema
- Authentication method (Bearer token, cookie, API key)
- Rate limiting signals (429 responses, retry-after headers)
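The grouping and URL-pattern steps above can be sketched as follows. This assumes each traffic entry is a dict with `method` and `url` keys, and that numeric or UUID-shaped path segments are path parameters. Both are assumptions about the capture, not rules from the analyzer, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: segments that look like numeric IDs or UUIDs are parameters.
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def url_pattern(url):
    """Replace ID-like path segments with ':id' to get a URL pattern."""
    segments = urlsplit(url).path.strip("/").split("/")
    return "/" + "/".join(":id" if _ID_SEGMENT.match(s) else s for s in segments)


def group_endpoints(entries, depth=3):
    """Group (method, pattern) pairs by the first `depth` path segments."""
    groups = defaultdict(set)
    for entry in entries:
        pattern = url_pattern(entry["url"])
        # The split yields a leading "", so depth + 1 items cover `depth` segments.
        base = "/".join(pattern.split("/")[: depth + 1])
        groups[base].add((entry["method"].upper(), pattern))
    return {base: sorted(pairs) for base, pairs in groups.items()}


sample = [
    {"method": "GET", "url": "https://example.com/api/v1/boards/"},
    {"method": "GET", "url": "https://example.com/api/v1/boards/42?expand=items"},
    {"method": "post", "url": "https://example.com/api/v1/items/"},
]
groups = group_endpoints(sample)
for base, pairs in groups.items():
    print(base, pairs)
```

Slug-style IDs will not match the ID heuristic, so patterns produced this way still need a manual pass against the raw traffic.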
Identify RPC protocol type -- classify the API transport:
| Protocol | Detection Signal | Client Pattern |
|---|---|---|
| REST | Resource URLs (`/api/v1/boards/:id`), standard HTTP methods | `client.py` with method-per-endpoint |
| GraphQL | Single `/graphql` endpoint, `query`/`mutation` in body | `client.py` with query templates |
| gRPC-Web | `application/grpc-web` content type, binary payloads | Proto-based client |
| Google batchexecute | `batchexecute` in URL, `f.req=` body, `)]}'\n` prefix | `rpc/` subpackage (see `references/google-batchexecute.md`) |
| Custom RPC | Single endpoint, method name in body, proprietary encoding | Custom codec module |
| Public REST API | Documented `/api/` endpoints, OpenAPI spec, JSON responses | Standard `client.py` with httpx |
| Plain HTML (no framework) | No SPA root, no framework globals, data in `<table>`/`<div>` | `client.py` with httpx + BeautifulSoup4 |
This determines client architecture in Step B -- REST uses simple `client.py`, non-REST protocols need a dedicated `rpc/` subpackage with encoder/decoder/types.
Detect data model:
- Entity types (boards, items, users, projects...)
- Relationships (board has many items, item belongs to board)
- ID formats (UUID, numeric, slug)
Detect auth pattern:
- Cookie-based sessions
- Bearer/JWT tokens
- OAuth refresh flow
- API key headers
- Browser-delegated auth: tokens embedded in page JavaScript (e.g., `WIZ_global_data`), not in HTTP headers. Requires CDP for initial cookies, HTTP for token extraction. See `references/auth-strategies.md` "Browser-Delegated Auth" section.
- No auth / public access: fully public API, no login required. CLI may optionally support API key auth for write operations (e.g., dev.to).
Write `<APP>.md` -- software-specific SOP document
Write `agent-harness/api-spec.json` -- the machine-readable API spec. Every endpoint MUST carry an `evidence` field citing its captured traffic entry (`raw-traffic.json#<index>`); never invent endpoints (this is the structural enforcement of the RPC-ID verification rule). Schema and validator:

    cli-web-devkit spec validate <app>/agent-harness/api-spec.json

Downstream consumers: client method implementation (Step B), the gap-analyzer (`cli-web-devkit gaps`), and the traffic-fidelity review in Phase 4 (spec-vs-traffic becomes a deterministic diff).
Output: <APP>.md (human SOP) + api-spec.json (machine spec, validated).
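The evidence rule for `api-spec.json` lends itself to a quick pre-check before running the real validator. This is a sketch only. It assumes a top-level `endpoints` list whose items have `name` and `evidence` keys and that evidence indexes are zero-based; the actual schema is whatever `cli-web-devkit spec validate` enforces, and `missing_evidence` is a hypothetical helper.

```python
import re

# Evidence format taken from the text: raw-traffic.json#<index>
_EVIDENCE = re.compile(r"^raw-traffic\.json#(\d+)$")


def missing_evidence(spec, traffic_len):
    """Return names of endpoints whose evidence is absent or out of range.

    Assumption: spec["endpoints"] is a list of dicts with "name" and
    "evidence" keys, and indexes are zero-based.
    """
    bad = []
    for endpoint in spec.get("endpoints", []):
        match = _EVIDENCE.match(str(endpoint.get("evidence", "")))
        if not match or int(match.group(1)) >= traffic_len:
            bad.append(endpoint.get("name", "<unnamed>"))
    return bad


spec = {
    "endpoints": [
        {"name": "list_boards", "evidence": "raw-traffic.json#0"},
        {"name": "get_board", "evidence": "raw-traffic.json#9"},
        {"name": "invented"},
    ]
}
print(missing_evidence(spec, traffic_len=5))  # prints ['get_board', 'invented']
```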
References: traffic-patterns.md, google-batchexecute.md, ssr-patterns.md
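A few of the detection signals from the protocol table in Step A can be turned into a first-pass classifier. The sketch below covers only the signals visible on a single request and falls back to `rest` for everything else. Custom RPC and plain-HTML sites need the manual analysis described in this step, and `detect_protocol` is a hypothetical name.

```python
from urllib.parse import urlsplit


def detect_protocol(url, content_type="", body=""):
    """Classify one request by the signals listed in the protocol table."""
    path = urlsplit(url).path.rstrip("/")
    if "batchexecute" in path and "f.req=" in body:
        return "batchexecute"
    if "application/grpc-web" in content_type.lower():
        return "grpc-web"
    if path.endswith("/graphql") and ("query" in body or "mutation" in body):
        return "graphql"
    return "rest"  # fallback only; not proof that the API is REST


print(detect_protocol(
    "https://example.com/graphql",
    "application/json",
    '{"query": "{ viewer { id } }"}',
))  # prints "graphql"
```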
Step B: Implement (Code Generation)
Study Existing CLIs First (Critical for Accuracy)
Before implementing, read an existing CLI that uses the same protocol as your target. These are battle-tested implementations that solved the same problems you'll face.
| Protocol | Reference CLI | Key files to read |
|---|---|---|
| Google batchexecute | `notebooklm/agent-harness/cli_web/notebooklm/` | `core/rpc/encoder.py`, `core/rpc/decoder.py`, `core/client.py`, `core/auth.py` |
| GraphQL + WAF | `booking/agent-harness/cli_web/booking/` | `core/client.py` (curl_cffi + GraphQL), `core/auth.py` (WAF tokens) |
| HTML scraping | `futbin/agent-harness/cli_web/futbin/` | `core/client.py` (httpx + BS4), `commands/players.py` |
| Next.js RSC | `producthunt/agent-harness/cli_web/producthunt/` | `core/client.py` (curl_cffi + `__next_f` flight parsing) |
| REST API | `unsplash/agent-harness/cli_web/unsplash/` | `core/client.py`, `commands/photos.py` |
| Simple HTML | `gh-trending/agent-harness/cli_web/gh_trending/` | Minimal structure example |
How to use reference CLIs:
- Read the reference CLI's `core/client.py` -- understand the request/response pattern
- Read `core/auth.py` -- copy the `login_browser()` pattern exactly for Google apps
- Read `core/rpc/` (for batchexecute) -- understand encoder/decoder, DO NOT reinvent
- Read `commands/` -- see how Click commands are structured, how `--json` works
- Read `utils/helpers.py` -- see `handle_errors()`, `_resolve_cli()`, repl patterns
For batchexecute apps specifically, the notebooklm CLI is your bible:
- Copy the encoder/decoder architecture (don't reinvent the batchexecute wire format)
- Copy the auth token extraction pattern (CSRF, session ID, build label)
- Copy the cookie domain priority logic (critical for Israeli/international users)
- Adapt the RPC method IDs and param structures to your target app
The agent implementing the CLI MUST read these files before writing code. Use the
Agent tool to dispatch a research agent that reads
the reference implementation while you design the command structure.
Design Before You Code
Before writing any code, note the command structure in <APP>.md (10 minutes max):
- Map each API endpoint group to a Click command group:
  - `/api/v1/boards/*` → `boards` command group
  - `/api/v1/items/*` → `items` command group
- Map CRUD operations to subcommands (GET list → `list`, GET single → `get`, POST → `create`, PUT/PATCH → `update`, DELETE → `delete`)
- Note auth design: `auth login`, `auth status`, `auth refresh`; credentials at `~/.config/cli-web-<app>/auth.json`
- Note REPL design: bare command enters REPL, branded banner via `repl_skin.py`
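The endpoint-to-command mapping above is mechanical enough to draft in code before writing it into `<APP>.md`. This is a minimal sketch that assumes URL patterns already use `:id` for path parameters; `subcommand_for` and `command_group` are hypothetical helpers, not part of the scaffold.

```python
# CRUD verbs for non-GET methods, as listed in the design notes.
_WRITE_VERBS = {"POST": "create", "PUT": "update", "PATCH": "update", "DELETE": "delete"}


def subcommand_for(method, pattern):
    """Map an HTTP method and URL pattern to a CRUD subcommand name."""
    method = method.upper()
    if method == "GET":
        # A trailing :id means a single resource; otherwise it is a collection.
        return "get" if pattern.rstrip("/").endswith(":id") else "list"
    if method not in _WRITE_VERBS:
        raise ValueError(f"no CRUD mapping for {method}")
    return _WRITE_VERBS[method]


def command_group(pattern):
    """Use the last non-parameter path segment as the Click group name."""
    segments = [s for s in pattern.strip("/").split("/") if s and not s.startswith(":")]
    if not segments:
        raise ValueError(f"cannot derive a group name from {pattern!r}")
    return segments[-1]


captured = [
    ("GET", "/api/v1/boards"),
    ("GET", "/api/v1/boards/:id"),
    ("POST", "/api/v1/boards"),
    ("PATCH", "/api/v1/items/:id"),
    ("DELETE", "/api/v1/items/:id"),
]
plan = [f"{command_group(p)} {subcommand_for(m, p)}" for m, p in captured]
print(plan)
```

Endpoints that do not fit CRUD (search, export, generation) need hand-chosen subcommand names, so treat the output as a draft.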
Goal: Generate the complete Python CLI package.
Package Structure
See HARNESS.md "Generated CLI Structure" for the complete package template.
Key points: cli_web/ namespace (NO __init__.py), <app>/ sub-package (HAS __init__.py),
core/, commands/, utils/, tests/ directories.
Step B.0: Scaffold Core Modules
Run the scaffold generator script (v2 — Jinja2 templates, requires
pip install jinja2) to create all boilerplate files:
python ${CLAUDE_PLUGIN_ROOT}/scripts/scaffold-cli.py <app>/agent-harness \
--app-name <app> \
--protocol <rest|graphql|html-scraping|batchexecute> \
--http-client <httpx|curl_cffi> \
--auth-type <none|cookie|api-key|google-sso> \
--resource <name> [--resource <name> ...] \
[--has-polling] [--has-context] [--has-partial-ids]
This renders exceptions.py, client.py skeleton, the unified auth.py (google-sso
handled via a template conditional), helpers.py, config.py, output.py, the CLI
entry point with REPL, one commands/<resource>.py per --resource flag,
setup.py, conftest.py, test_e2e.py skeleton, README/SKILL skeletons,
repl_skin.py, and (for batchexecute) the rpc/ subpackage. It also writes
.manifest.json (template version + profile) at the harness root — keep it;
fleet tooling depends on it. See skills/boilerplate/SKILL.md for the
template → output map and per-profile flag recipes.
Fallback: If the script is unavailable, follow `skills/shared/RECOVERY.md` §scaffold-cli.py Unavailable -- adapt from the newest generated CLI (e.g., `capitoltrades/agent-harness/`), do NOT reconstruct boilerplate from memory.
After scaffolding, review the generated files and customize client.py with actual
endpoint methods from <APP>.md.
Implementation Rules
All rules below are DEFINED in skills/shared/CONVENTIONS.md — this section
tells you when to apply them during implementation.
`exceptions.py` -- implement first. Required hierarchy and error-code mapping: CONVENTIONS.md §Exception Hierarchy. Complete code: `references/exception-hierarchy-example.py`.
`client.py` -- HTTP client with exception mapping and auth retry:
- HTTP library choice:
  - `httpx` (default) -- for most sites (REST, GraphQL, batchexecute)
  - `curl_cffi` -- for Cloudflare-protected sites. Uses Chrome TLS fingerprint impersonation to bypass bot detection without cookies or auth:

        from curl_cffi import requests as curl_requests
        resp = curl_requests.get(url, impersonate="chrome")

    Use `curl_cffi` when Phase 1 detects Cloudflare (`cf-ray` header, challenge page). Add `curl_cffi, beautifulsoup4` to `setup.py` instead of `httpx`.
- Centralized auth header/cookie injection
- Automatic JSON parsing with response body verification
- Status code → exception mapping: 401/403 → `AuthError`, 404 → `NotFoundError`, 429 → `RateLimitError`, 5xx → `ServerError` (CONVENTIONS.md §Exception Hierarchy)
- Auth retry (3-attempt auto-refresh): current cookies → reload `auth.json` → headless refresh, never more. The full table is CONVENTIONS.md §Auth Rules; the templates generate it by default.
- Exponential backoff for rate limits (CONVENTIONS.md §Exponential Backoff & Polling; code in `references/polling-backoff-example.py`)
- For apps with 3+ resource types: split into namespaced sub-clients (`client.notebooks.list()`, `client.sources.add()`)
- See `references/client-architecture-example.py` for the full pattern
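The status-code mapping described for `client.py` can be sketched without any HTTP library. The four exception names come from the mapping above. The `CLIError` base class, the `retry_after` attribute, and the `error_for_status` helper are assumptions for illustration; the required hierarchy is the one in CONVENTIONS.md §Exception Hierarchy.

```python
class CLIError(Exception):
    """Base error (hypothetical name; CONVENTIONS.md defines the real one)."""


class AuthError(CLIError):
    pass


class NotFoundError(CLIError):
    pass


class RateLimitError(CLIError):
    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after  # seconds, when the server provides it


class ServerError(CLIError):
    pass


def _retry_after_seconds(headers):
    """Parse a numeric Retry-After header; ignore HTTP-date values."""
    lowered = {k.lower(): v for k, v in (headers or {}).items()}
    try:
        return float(lowered["retry-after"])
    except (KeyError, TypeError, ValueError):
        return None


def error_for_status(status, headers=None):
    """Return the mapped exception instance, or None if no mapping applies."""
    if status in (401, 403):
        return AuthError(f"HTTP {status}: authentication failed")
    if status == 404:
        return NotFoundError("HTTP 404: resource not found")
    if status == 429:
        return RateLimitError("HTTP 429: rate limited", _retry_after_seconds(headers))
    if 500 <= status <= 599:
        return ServerError(f"HTTP {status}: server error")
    return None


err = error_for_status(429, {"Retry-After": "2"})
print(type(err).__name__, err.retry_after)  # prints "RateLimitError 2.0"
```

Returning the exception instead of raising it keeps the mapping testable; a client method would call `error_for_status(...)` on each response and raise the result when it is not `None`. Other 4xx codes are left unmapped here on purpose.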
`auth.py` -- handles token storage, refresh, expiry. Implementation depends on auth type:
For no-auth sites: DO NOT create `auth.py`, `session.py`, or auth command groups. These files are dead code for public APIs and confuse users. The CLI should have NO auth-related files or commands. The only exception is if the site has optional auth (e.g., API key for write operations); in that case, implement a minimal auth module.
For browser-delegated auth (Google, Microsoft, etc.): Python `sync_playwright()` login flow with cookie domain priority for international users (CONVENTIONS.md §Auth Rules).
Storage, env var, cookie priority, and dual-format handling are defined in CONVENTIONS.md §Auth Rules; implementation code for each pattern is in `references/auth-strategies.md` (read section-addressed).
Anti-bot resilient client construction (when detected in Phase 2):
- Extract session tokens via CDP first (cookies), then HTTP GET + HTML parsing (CSRF, session IDs)
- Never hardcode build labels (`bl`), session IDs (`f.sid`), or CSRF tokens -- extract dynamically at runtime
- Replicate same-origin headers captured during Phase 1 traffic (e.g., `x-same-domain: 1` for Google apps)
- Implement auto-retry on 401/403: re-fetch homepage -> re-extract tokens -> retry once
- See `references/google-batchexecute.md` for the complete Google pattern
RPC codec subpackage (for non-REST protocols like batchexecute): When the API uses a non-REST protocol, add `core/rpc/` with:
- `types.py` -- method ID enum, URL constants
- `encoder.py` -- request encoding (protocol-specific format)
- `decoder.py` -- response decoding (strip prefix, parse chunks, extract results)
The `client.py` still exists but delegates encoding/decoding to `rpc/`.
Progress feedback -- Use `rich>=13.0` spinners for operations >2s (suppress in `--json` mode). See `references/rich-output-example.py`.
JSON error output -- `--json` mode errors are JSON too, not plain text (CONVENTIONS.md §JSON Envelope). Implement via `utils/output.py` `json_error()`.
All commands use `handle_errors(json_mode)` context manager -- centralizes error handling, exit codes (1=user, 2=system, 130=interrupt), and JSON errors. See `references/helpers-module-example.py`.
Generation commands support `--wait`, `--retry N`, `--output path` -- CONVENTIONS.md §Exponential Backoff & Polling; code in `references/polling-backoff-example.py`.
Windows UTF-8 fix -- at the top of `<app>_cli.py`, reconfigure BOTH stdout AND stderr to UTF-8 before any import that prints (CONVENTIONS.md §Windows UTF-8 Fix has the exact snippet).
HTML table parsers MUST extract ALL visible columns -- not just name/price, because missing fields in `--json` output make the CLI useless for filtering and analysis. If the site shows version, club, nation, stats, skills, weak foot, parse all of them. Empty fields in `--json` output = incomplete parser.
Entry point: `cli-web-<app>` via setup.py console_scripts (CONVENTIONS.md §Naming Conventions)
Namespace: `cli_web.*`
`utils/repl_skin.py`, `utils/doctor.py`, and `utils/mcp_server.py` are all vendored by scaffold-cli.py (canonical source: `cli-web-core/cli_web_core/`, synced via `cli-web-devkit resync`) -- never hand-edit the per-CLI copies. The entry point registers the fleet-standard `doctor` and `mcp-serve` commands from the vendored adapters (`register_doctor_command(cli, ...)`, `register_mcp_command(cli, ...)`); both derive from the Click tree, so no per-command wiring is needed.
`utils/helpers.py` -- shared CLI helpers (generate for every CLI):
- `resolve_partial_id(partial, items)` -- prefix-match UUIDs for get/rename/delete
- `handle_errors(json_mode)` -- context manager replacing try/except in all commands
- `require_notebook(notebook_arg)` -- gets notebook ID from arg or persistent context
- `sanitize_filename(name)` -- safe filenames from artifact titles
- `poll_until_complete(check_fn)` -- exponential backoff polling
- `get_context_value(key)` / `set_context_value(key, value)` -- persistent context.json
See `references/helpers-module-example.py` for the complete module.
Not all helpers apply to every CLI. Include only what the CLI uses: `handle_errors` and `print_json` are always needed. `resolve_partial_id` only for UUID-based apps. `require_notebook`/context helpers only for apps with persistent context. `poll_until_complete` only for generation/async operations.
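Two of the helpers named above are small enough to sketch. The signatures extend the ones in the list with keyword parameters (`key`, `initial_delay`, `max_delay`, `timeout`, `sleep`) that are assumptions for illustration, and the built-in `LookupError` and `TimeoutError` stand in for whatever the project's exception hierarchy prescribes. `references/helpers-module-example.py` holds the complete module.

```python
import time


def resolve_partial_id(partial, items, key="id"):
    """Return the single item whose ID starts with `partial`.

    Raises LookupError when nothing matches or the prefix is ambiguous.
    """
    matches = [item for item in items if str(item[key]).startswith(partial)]
    if len(matches) == 1:
        return matches[0]
    reason = "no match" if not matches else f"ambiguous ({len(matches)} matches)"
    raise LookupError(f"{partial!r}: {reason}")


def poll_until_complete(check_fn, initial_delay=1.0, max_delay=30.0,
                        timeout=300.0, sleep=time.sleep):
    """Call check_fn until it returns a non-None result, doubling the delay."""
    delay, waited = initial_delay, 0.0
    while True:
        result = check_fn()
        if result is not None:
            return result
        if waited >= timeout:
            raise TimeoutError(f"gave up after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # exponential backoff with a cap


items = [
    {"id": "a1b2c3d4-0000-4000-8000-000000000001", "title": "First"},
    {"id": "ffee0011-0000-4000-8000-000000000002", "title": "Second"},
]
print(resolve_partial_id("a1b2", items)["title"])  # prints "First"

# Inject a fake sleep so the example finishes instantly.
delays = []
answers = iter([None, None, "done"])
print(poll_until_complete(lambda: next(answers), sleep=delays.append), delays)
```

Passing `sleep` as a parameter is a design choice for testability: unit tests can record the delays instead of waiting for them.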
REPL Implementation Rules (Critical)
The four REPL rules are defined with code examples in
CONVENTIONS.md §REPL Rules — apply them as you wire up <app>_cli.py:
- Parse REPL lines with `shlex.split(line)`, never `line.split()`.
- Propagate `--json` by PREPENDING it to the args list passed to `cli.main(args=..., standalone_mode=False)` -- never `**ctx.params`.
- Help-sync: every commit that adds a command/option updates `_print_repl_help()` in the same commit.
- Required single values are `@click.argument` positionals, not `@click.option(..., required=True)`.
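The first two rules can be illustrated with the standard library alone. This sketch assumes `--json` is an option on the root Click group, which is why it is prepended; `build_repl_args` is a hypothetical helper, and the authoritative examples are in CONVENTIONS.md §REPL Rules.

```python
import shlex


def build_repl_args(line, json_mode=False):
    """Turn one REPL input line into the args list for cli.main()."""
    args = shlex.split(line)  # keeps quoted values together
    if json_mode:
        args = ["--json"] + args  # prepend so the root group sees the flag
    return args


line = 'boards create "My Board"'
print(line.split())                          # naive split breaks the title
print(build_repl_args(line, json_mode=True))
```

The naive split yields `['boards', 'create', '"My', 'Board"']`, while `shlex.split` keeps `My Board` as one argument. The resulting list is what gets passed as `cli.main(args=..., standalone_mode=False)`.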
These bugs appear in almost every generated REPL — read the §REPL Rules section before writing the entry point, not after the REPL breaks.
Parallel Implementation (dispatch independent modules as subagents)
When the CLI has 3+ command groups (e.g., notebooks, sources, chat, artifacts), dispatch parallel subagents -- one per command module. Each agent gets:
- The `<APP>.md` API spec for its resource
- The `client.py` and `auth.py` interfaces it depends on
- Clear scope: "Implement `commands/notebooks.py` with list, get, create, delete"
Parallelization opportunities:
| Independent from each other | Dispatch in parallel |
|---|---|
| `commands/notebooks.py`, `commands/sources.py`, `commands/chat.py` | Yes -- each command file only depends on `client.py` |
| `rpc/encoder.py` and `rpc/decoder.py` | Yes -- encoder doesn't depend on decoder |
| `auth.py` and `models.py` | Yes -- no shared logic |
| `client.py` and `commands/*` | No -- commands depend on client |
| `<app>_cli.py` (entry point) | Last -- imports all commands, write after they're done |
Implementation order (with maximum parallelism):
Phase A (sequential): Write core foundation
exceptions.py → client.py → auth.py (if needed) → models.py
Phase B (parallel): Dispatch ALL independent work simultaneously
┌─ Agent 1: commands/notebooks.py
├─ Agent 2: commands/sources.py
├─ Agent 3: commands/chat.py
├─ Agent 4: commands/artifacts.py
├─ Agent 5: rpc/encoder.py + rpc/decoder.py (if non-REST)
└─ Agent 6 (background): test_core.py (unit tests for core modules)
All run concurrently — each only depends on Phase A modules
Phase C (sequential): Wire everything together
utils/helpers.py → <app>_cli.py → __main__.py → setup.py
(repl_skin.py, doctor.py, mcp_server.py were already vendored by
scaffold-cli.py in Step B.0; the entry point registers doctor + mcp-serve)
Key parallelism rules:
- Dispatch independent command modules as parallel subagents (one per `commands/*.py` file)
- Start unit test writing as a background agent during command implementation
- Entry point (`<app>_cli.py`, `setup.py`) must come last (depends on all commands)
Mandatory Smoke Check (Before Testing Phase)
Before invoking testing, install (pip install -e .) and verify:
- `cli-web-<app> --help` loads
- `cli-web-<app> auth status --json` shows valid (if auth-required)
- `cli-web-<app> <resource> list --json` returns real data
- One WRITE command works (if applicable)
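Part of the "returns real data" check can be automated by scanning a command's `--json` output for the protocol-leak markers and empty results that this section treats as red flags. This is a sketch under assumptions: it inspects only the top-level JSON value, so a CLI that wraps results in an envelope would need the check applied to the wrapped data, and `smoke_flags` is a hypothetical helper. The full red-flag table is in CONVENTIONS.md §Protocol-Leak Smoke Check.

```python
import json

# Leak markers named in this section's red-flag list.
LEAK_MARKERS = ("wrb.fr", "af.httprm")


def smoke_flags(stdout):
    """Return a list of red flags found in one command's --json output."""
    flags = [f"protocol leak: {marker}" for marker in LEAK_MARKERS if marker in stdout]
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError:
        return flags + ["output is not valid JSON"]
    if data is None or data == []:
        flags.append("empty result")
    return flags


print(smoke_flags('[{"id": "1", "title": "Board"}]'))  # prints []
print(smoke_flags('[["wrb.fr", "x"]]'))                # prints ['protocol leak: wrb.fr']
print(smoke_flags("null"))                             # prints ['empty result']
```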
Red flags — fix before testing: the full table is CONVENTIONS.md
§Protocol-Leak Smoke Check (wrb.fr/af.httprm leaks, empty []/null,
parser index mismatches). One methodology-specific case: a null WRITE response
may mean the operation is client-side — see references/google-batchexecute.md
"Client-Side Operations".
Update phase state:
python ${CLAUDE_PLUGIN_ROOT}/scripts/phase-state.py complete <app> \
--phase methodology --output <app>/agent-harness/
Next Step
When implementation is complete and the smoke check passes, invoke the testing
skill to plan and write tests.
Do NOT skip testing -- every CLI must have comprehensive tests before publishing.
Companion Skills
| Skill | When it activates |
|---|---|
| `capture` | Phase 1 -- traffic recording (prerequisite for this skill) |
| `testing` | Phase 3 -- test writing, documentation |
| `standards` | Phase 4 -- publish, verify, smoke test |
Integration
| Relationship | Skill |
|---|---|
| Preceded by | capture (Phase 1) |
| Followed by | testing (Phase 3) |
| References | skills/shared/CONVENTIONS.md (all rules), skills/shared/RECOVERY.md (gate failures), traffic-patterns.md, auth-strategies.md, google-batchexecute.md, ssr-patterns.md, exception-hierarchy-example.py, client-architecture-example.py, polling-backoff-example.py, rich-output-example.py |
Reference Files
- `references/traffic-patterns.md` -- Common API patterns (REST, GraphQL, RPC)
- `references/auth-strategies.md` -- Auth implementation strategies
- `references/google-batchexecute.md` -- Google batchexecute RPC protocol spec
- `references/ssr-patterns.md` -- SSR framework patterns and data extraction strategies
- `references/exception-hierarchy-example.py` -- Complete exception hierarchy with HTTP status mapping
- `references/client-architecture-example.py` -- Namespaced sub-client pattern with auth retry
- `references/polling-backoff-example.py` -- Exponential backoff polling and rate-limit retry
- `references/rich-output-example.py` -- Rich progress bars, JSON error responses, table formatting