onboarding - SKILL.md Agent Skill

name: onboarding
description: >
  End-to-end methodology for adding a brand-new e2e suite to a project that has none. Defines the eight-phase workflow (scaffold, groundwork, happy-path, journey-mapping, coverage-expansion, bug-discovery, secrets-sweep, report) and the gate criteria between phases. Use this skill to run the workflow interactively in Claude Code, or invoke it from an external automated CLI driver for a hands-off run.

Activation banner: The first user-facing reply after this skill loads MUST begin with the line: Protocol Achilles activated. Once per session — skip if already declared in this conversation. Subagents (which return structured data, not user-facing text) are exempt.

Onboarding — eight-phase e2e bootstrap

This is the umbrella methodology for taking a project from zero e2e tests to a maintained suite. The same workflow runs two ways:

Mode	When	How
Interactive	You want fine-grained control or you're learning the system	Read this skill and follow the phase playbook below
Automated	You want a hands-off run	Invoke this skill from an external automated CLI driver that dispatches role-scoped subagents per phase

The two modes execute the same phases against the same gate criteria. The automated driver dispatches role-scoped subagents per phase; in interactive mode you load the relevant role-scoped skill yourself and work the phase through Claude Code's normal tool surface.

Phase map

#	Phase	What it produces	Skill
1	Scaffold	`playwright.config.ts`, `tests/e2e/{fixtures,docs}/`, `.gitignore` additions	`element-interactions` (Stage 1)
2	Groundwork	`app-context.md`, `page-repository.json`, runtime self-credentialing fixture	`element-interactions` (Stage 2)
3	Happy-path	One `tests/e2e/<journey>.spec.ts` per primary user flow that exercises sign-in + the critical action	`element-interactions` (Stages 3–4), `test-composer`
4	Journey mapping	`tests/e2e/docs/journey-map.md`, `tests/e2e/docs/journey-map-coverage.md`	`journey-mapping`
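5	Coverage expansion	One `tests/e2e/<journey>.spec.ts` per mapped journey not already covered by a Phase-3 spec (all tiers, P0 first); Passes 2–5 may group journeys under `runMode: standard`, with a cleanup dedup at the end of each pass	`coverage-expansion`, `test-composer`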
6	Bug discovery	Adversarial findings + regression specs that lock the failure modes	`bug-discovery`
7	Secrets sweep	Credentials/keys/PII/URLs extracted to `.env`; `.env.example` committed	`secrets-sweep`
8	Report	`qa-summary-deck.html` + `qa-summary-deck.pdf` at the project root	`work-summary-deck`

A phase only advances once its exit criteria (below) are satisfied. A human or an automated phase-validator checks the criteria; ambiguity blocks the phase, not the run.

Status ledger + workflow reviewer (state-machine enforcement)

The pipeline runs on top of a structured status ledger at tests/e2e/docs/onboarding-status.json (gitignored — same pattern as tests/e2e/docs/.phase4-cycle-state.json and tests/e2e/docs/coverage-expansion-state.json). The orchestrator (interactive) or an external automated CLI driver MUST update this ledger after every phase / pass / cycle completion. Every transition (phase N → phase N+1, pass N → pass N+1 inside Phase 5, cycle N → cycle N+1 inside Phase 4) is gated by a workflow-reviewer-* subagent. The reviewer reads the ledger row + the closing subagent's handover envelope + the canonical methodology section, returns verdict: approve | reject | escalate, and the orchestrator only advances when the verdict is approve. Every workflow-reviewer-* dispatching brief MUST cite the reviewer's return-schema path (schemas/subagent-returns/workflow-reviewer.schema.json) — the subagent-schema-preread-gate.sh hook denies briefs that omit the citation.

The contract is harness-enforced:

onboarding-ledger-gate.sh (PreToolUse:Agent, DENY) — denies any non-reviewer Agent dispatch at a transition point until the matching workflow-reviewer-* has approved; also denies out-of-order phase / pass / cycle dispatches (e.g. phase4-* while currentPhase=2).
onboarding-ledger-write-gate.sh (PreToolUse:Write|Edit, DENY) — validates every ledger write against schemas/onboarding-status.schema.json and denies phase-skip transitions that lack a status: skipped row + an approvedDeviations[] entry carrying a verbatim authorizer.

Skip / early-stop authorisation. A phase can be skipped (or the pipeline stopped early) only when the workflow-reviewer for the prior phase approves the deviation, with the authorizer field on the reviewer return carrying either a verbatim user quote OR a documented structural exception. Self-imposed reasons (session-length, budget-cap, auto-mode, inferred-pref) are not authorisation — the harness rejects ledger writes that lack a proper authorizer.

3-cycle reject cap. A workflow-reviewer that rejects three consecutive times returns verdict: escalate, and the orchestrator surfaces all three returns to the user for manual triage. The ledger row's status becomes blocked. This mirrors the existing 3-cycle process-validator pattern.
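The approve / reject / escalate loop described above can be sketched as a small decision function. This is a minimal illustration, not the harness's implementation: only the three verdict values come from this document, while `nextAction`, the `Action` labels, and the shape of `history` are hypothetical names chosen for the example.

```typescript
// Hypothetical names throughout; only the three verdict values come from the skill text.
type Verdict = "approve" | "reject" | "escalate";
type Action = "await-review" | "advance" | "rework" | "blocked";

const MAX_CONSECUTIVE_REJECTS = 3;

// `history` holds the reviewer verdicts for one transition, oldest first.
function nextAction(history: Verdict[]): Action {
  if (history.length === 0) return "await-review"; // no reviewer return yet
  const latest = history[history.length - 1];
  if (latest === "approve") return "advance";
  if (latest === "escalate") return "blocked"; // surface all returns to the user

  // Latest verdict is a reject: count the unbroken run of rejects at the tail.
  let trailingRejects = 0;
  for (let i = history.length - 1; i >= 0 && history[i] === "reject"; i--) {
    trailingRejects++;
  }
  // Defensive assumption: treat a third straight reject as blocked even if
  // the reviewer did not convert it to `escalate` itself.
  return trailingRejects >= MAX_CONSECUTIVE_REJECTS ? "blocked" : "rework";
}
```

An orchestrator following this sketch would only dispatch the next phase, pass, or cycle when the function returns `advance`, and would set the ledger row to blocked otherwise once the cap is hit.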

Canonical references:

schemas/onboarding-status.schema.json — ledger shape (v1)
schemas/subagent-returns/workflow-reviewer.schema.json — reviewer return shape
skills/workflow-reviewer/SKILL.md — reviewer methodology
skills/element-interactions/references/harness-hooks.md — both new hooks indexed

The ledger + reviewer layer was added because empirical observation (a 21-journey benchmark, Run 5) demonstrated that markdown-text contract enforcement alone permits silent scope compression: orchestrators could skip phases entirely, stop early, or accept subagent "complete" returns whose deliverables were missing. The state-machine layer makes those failure modes harness-denied rather than instruction-only.

Front-load gate (before Phase 1)

Before any scaffolding, two things happen in order: (0) run-mode selection, then (1–3) the three preconditions.

Step 0 — Mode selection (ask the user first)

Before the precondition checks, present the run-mode choice to the user verbatim:

"Before starting, choose the run mode:

standard (default, recommended) — first-pass / first-cycle is strict parallel; subsequent passes / cycles may use grouping or single-agent dispatches for efficiency. Best for everyday onboarding runs.

depth — strict parallel per-journey on every compositional pass and strict parallel per-section on every discovery cycle. Up to ~20× more subagent dispatches and token spend than standard. Best for high-stakes audits, package-quality benchmarks, and first-time onboarding of business-critical apps where you want exhaustive per-unit fidelity."

"Which mode?"

Capture the user's answer as runMode ∈ {standard, depth} (default standard if the user passes through without picking). The value propagates through the rest of the onboarding pipeline as follows:

Phase / dispatch	`runMode: standard`	`runMode: depth`
Phase 4 — `journey-mapping`	`args: "phases: full"` (default cycle-1 strict, cycle-2+ relaxed — the existing rule already coded into `journey-mapping/SKILL.md` §"First-cycle strict / later-cycle relaxed")	`args: "phases: full, cycle-strictness: depth"` — strict per-section parallel on every cycle (including edge-probe and any additional discovery cycles); single-subagent walkthroughs forbidden in every cycle
Phase 5 — `coverage-expansion`	`args: "mode: standard"` (Pass 1 strict, Passes 2-5 may group; adversarial grouping permitted; `strict-adversarial: true` is opt-in)	`args: "mode: depth"` — strict per-journey parallel on every pass (no `[group]`, no `[P3-batch]` on any of Passes 1-5); adversarial Passes 4-5 are strict-per-journey by default (the `strict-adversarial: true` opt-in is implicit under depth)
State files	The workflow ledger `tests/e2e/docs/onboarding-status.json` is written with `runMode: "standard"` at the front-load gate (the primary source the harness reads). Phase-5 `coverage-expansion-state.json` mirrors `runMode: "standard"` on its first write (fallback for bare invocations); Phase-4 `.phase4-cycle-state.json` is written with `cycleStrictness: "standard"`.	The workflow ledger is written with `runMode: "depth"`; Phase-5 `coverage-expansion-state.json` mirrors `runMode: "depth"` on first write; Phase-4 `.phase4-cycle-state.json` is written with `cycleStrictness: "depth"`. The `standard-mode-first-pass-guard.sh` hook reads `runMode` + `currentPhase` + `currentSubStage` from the workflow ledger first (with `coverage-expansion-state.json` as fallback) and enforces the depth-mode strict-everywhere semantics. Reading the workflow ledger means the depth contract still holds on Phase-6 grouped probes after `coverage-expansion-state.json` is deleted at Pass-5 cleanup.

The orchestrator emits one declaration line at the start of each phase that consumes the mode: [onboarding] runMode: depth — Phase 5 strict-per-journey on every pass or [onboarding] runMode: standard — Phase 5 first-pass strict, later relaxed.

Under runMode: depth, coverage-expansion runs in its strict-parallel-everywhere mode. Cost: up to ~20× more subagent dispatches and token spend than mode: standard. Confirm with the user before defaulting to depth on any run that is not explicitly a high-stakes audit or benchmark.
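As a rough sketch of how the captured answer could propagate, the mapping in the table above can be written as a lookup. The `args` strings and the `cycleStrictness` values are taken from this document; the function names, the `ModeDispatch` shape, and the choice to throw on an unrecognised answer are assumptions made for the example.

```typescript
type RunMode = "standard" | "depth";

// Hypothetical container for the per-phase values listed in the table.
interface ModeDispatch {
  journeyMappingArgs: string;     // Phase 4 skill args
  coverageExpansionArgs: string;  // Phase 5 skill args
  cycleStrictness: RunMode;       // value written to .phase4-cycle-state.json
}

// An empty answer falls back to standard, as the gate specifies.
// Assumption: any other unrecognised answer is an error so the caller can re-ask.
function parseRunMode(answer: string | undefined): RunMode {
  const normalised = (answer ?? "").trim().toLowerCase();
  if (normalised === "" || normalised === "standard") return "standard";
  if (normalised === "depth") return "depth";
  throw new Error(`Unrecognised run mode: ${answer}`);
}

function dispatchFor(runMode: RunMode): ModeDispatch {
  if (runMode === "depth") {
    return {
      journeyMappingArgs: "phases: full, cycle-strictness: depth",
      coverageExpansionArgs: "mode: depth",
      cycleStrictness: "depth",
    };
  }
  return {
    journeyMappingArgs: "phases: full",
    coverageExpansionArgs: "mode: standard",
    cycleStrictness: "standard",
  };
}
```

Keeping the mapping in one place means the Phase 4 and Phase 5 dispatches cannot drift apart from the mode recorded in the ledger.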

The pre-Pass-4 P3 adversarial opt-out proposal (defined in coverage-expansion's depth-mode pipeline step, skills/coverage-expansion/references/depth-mode-pipeline.md) is folded into this gate so no mid-run prompt is added: any adversarialSkippedJourneys opt-out is proposed and answered here, alongside mode selection.

Steps 1–3 — Preconditions

Once the run mode is captured, confirm three preconditions:

Dev server runs locally. You can launch the app and reach its landing page in a browser. Phase 2's groundwork depends on this.
@civitas-cerebrum/element-interactions installed. package.json lists the dep; node_modules/@civitas-cerebrum/element-interactions/ exists. (The package's postinstall installs the surviving hooks + skills into ~/.claude/.)
No prior e2e suite in conflict. If tests/e2e/ already exists with committed specs, this is a resume (not onboarding). Switch to running the relevant phase skill directly.

If the project already runs Playwright end-to-end with substantial coverage, do not run onboarding: it is designed for zero-to-suite, not augmentation.
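Preconditions 2 and 3 are file-system checks, so they can be sketched as a script. This is an illustration under stated assumptions: `checkPreconditions` and `PreconditionReport` are hypothetical names, "committed specs" is approximated by `*.spec.ts` files directly under `tests/e2e/` (git history is not consulted), and the dev-server check is left manual.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const PACKAGE_NAME = "@civitas-cerebrum/element-interactions";

interface PreconditionReport {
  ok: boolean;
  problems: string[];
}

// Checks preconditions 2 and 3 against the files under `projectRoot`.
function checkPreconditions(projectRoot: string): PreconditionReport {
  const problems: string[] = [];

  // Precondition 2a: package.json lists the dependency.
  const pkgPath = path.join(projectRoot, "package.json");
  let listed = false;
  if (fs.existsSync(pkgPath)) {
    const pkg = JSON.parse(fs.readFileSync(pkgPath, "utf8")) as {
      dependencies?: Record<string, string>;
      devDependencies?: Record<string, string>;
    };
    listed = PACKAGE_NAME in { ...pkg.dependencies, ...pkg.devDependencies };
  }
  if (!listed) problems.push(`package.json does not list ${PACKAGE_NAME}`);

  // Precondition 2b: the package is installed under node_modules.
  const installedDir = path.join(projectRoot, "node_modules", ...PACKAGE_NAME.split("/"));
  if (!fs.existsSync(installedDir)) problems.push(`${PACKAGE_NAME} is not installed`);

  // Precondition 3 (approximation): no spec files already sit in tests/e2e/.
  const e2eDir = path.join(projectRoot, "tests", "e2e");
  const specs = fs.existsSync(e2eDir)
    ? fs.readdirSync(e2eDir).filter((name) => name.endsWith(".spec.ts"))
    : [];
  if (specs.length > 0) {
    problems.push(`tests/e2e/ already holds ${specs.length} spec file(s); treat this as a resume`);
  }

  return { ok: problems.length === 0, problems };
}
```

Calling `checkPreconditions(process.cwd())` from the project root returns the list of unmet preconditions, which can be shown to the user before Phase 1 starts.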

Phase 1 — Scaffold

Goal. Land the Playwright config and the shared file tree.

Steps.

Create playwright.config.ts with the project's dev-server URL, the standard reporters (html + json), and a webServer block if the suite should launch the dev server itself.
Create tests/e2e/fixtures/, tests/e2e/docs/, and tests/e2e/playwright.setup.ts. Spec files themselves live at tests/e2e/<journey>.spec.ts (root of tests/e2e/, no specs/ subdirectory).
Add tests/e2e/.gitignore entries for playwright-report/, test-results/, .last-run.json.
Commit as chore: scaffold e2e suite.
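For orientation, a Phase 1 config along the lines of the steps above might look like the following. It is a sketch, not the file shape that `element-interactions` (Stage 1) prescribes: the `BASE_URL` variable name, the `http://localhost:3000` fallback, the `npm run dev` command, and the reporter output paths are all assumptions to adapt to the project.

```typescript
import { defineConfig } from "@playwright/test";

// Assumption: the dev server listens on localhost:3000 unless BASE_URL overrides it.
const baseURL = process.env.BASE_URL ?? "http://localhost:3000";

export default defineConfig({
  // Specs live at the root of tests/e2e/ (no specs/ subdirectory).
  testDir: "./tests/e2e",
  // The standard reporters named in the steps: html + json.
  reporter: [
    ["html", { outputFolder: "playwright-report" }],
    ["json", { outputFile: "test-results/results.json" }],
  ],
  use: { baseURL },
  // Optional: let the suite launch the dev server itself.
  // Assumption: `npm run dev` is the project's start command.
  webServer: {
    command: "npm run dev",
    url: baseURL,
    reuseExistingServer: !process.env.CI,
  },
});
```

Reading the URL from the environment from the start keeps Phase 7 from having to extract it later.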

Exit criteria.

npx playwright test --list lists zero specs without error.
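The four scaffold artefacts (playwright.config.ts, tests/e2e/fixtures/, tests/e2e/docs/, and tests/e2e/playwright.setup.ts) exist on disk.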

Load element-interactions (Stage 1) for the exact file shapes.

Phase 2 — Groundwork

Goal. Capture project context so later phases don't re-discover it.

Steps.

App-context document. Author tests/e2e/docs/app-context.md. Cover what the app is, primary user roles, authentication model, key subsystems, and the rough URL surface.
Page repository. Walk the running app and populate tests/e2e/page-repository.json with one entry per discoverable page (path, purpose, primary selectors). Use Playwright's snapshot tool interactively if helpful.
Runtime self-credentialing fixture. Add tests/e2e/fixtures/auth.ts that mints test users at runtime (signup → confirm → login) instead of relying on seeded credentials. Phase 7's secrets sweep depends on no credentials being hard-coded.
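The core idea of the self-credentialing fixture, minting a fresh user at runtime rather than reading seeded credentials, can be sketched independently of any framework. This is not the pattern defined by `element-interactions` (Stage 2); `AuthApi`, `newTestUser`, and `mintSession` are hypothetical names, and the three API operations stand in for whatever signup, confirmation, and login calls the application exposes.

```typescript
import { randomUUID } from "node:crypto";

interface TestUser {
  email: string;
  password: string;
}

// Hypothetical app-specific operations; each project supplies its own implementation.
interface AuthApi {
  signup(user: TestUser): Promise<void>;
  confirm(email: string): Promise<void>;
  login(user: TestUser): Promise<string>; // resolves to a session token
}

// Generates unique credentials on every call, so nothing is hard-coded in the suite.
function newTestUser(): TestUser {
  return {
    email: `e2e-${randomUUID()}@example.test`,
    password: `Pw-${randomUUID()}`,
  };
}

// Runs the signup -> confirm -> login sequence for a freshly minted user.
async function mintSession(api: AuthApi): Promise<{ user: TestUser; token: string }> {
  const user = newTestUser();
  await api.signup(user);
  await api.confirm(user.email);
  const token = await api.login(user);
  return { user, token };
}
```

In a real suite, `tests/e2e/fixtures/auth.ts` would wrap something like `mintSession` in a test fixture so each spec receives its own user.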

Exit criteria.

The three artefacts exist and npx playwright test --list still works.

Load element-interactions (Stage 2) for the page-repository schema and the self-credentialing pattern.

Phase 3 — Happy path

Goal. One green spec per primary user flow.

Steps.

Identify the primary journeys from app-context.md (typically 2–5).
For each, load the test-composer skill with a brief that names the journey, its prerequisites, and the critical assertion. Composer writes the spec at tests/e2e/<journey>.spec.ts, lands tests, and self-verifies with npx playwright test.
The composer skill internally runs an in-loop reviewer pass that catches craft issues, missing scenarios, and stale assertions before declaring the cycle done — its return shape is the reviewer-inloop schema (see schemas/subagent-returns/). You don't load this reviewer as a separate skill; it is part of the composer's cycle.
Commit each spec individually: test(j-<journey>): happy path.

Exit criteria.

One spec per primary journey, all passing locally.
tests/e2e/docs/.discovery-draft.json has been written by the Stage-3 happy-path pass (used as input by Phase 4).

Load test-composer for the dispatch contract; consult schemas/subagent-returns/composer.schema.json and reviewer-inloop.schema.json for return shapes.

Phase 4 — Journey mapping

Goal. Produce a structured map of every user journey worth testing, prioritised P0 / P1 / P2 / P3 (per journey-mapping's priority framework).

Phase 4 cannot be done in-orchestrator. "Load journey-mapping" means invoke the Skill tool with skill name journey-mapping and follow whatever the skill body returns. It does NOT mean: summarise what the skill would produce, then hand-roll journey-map.md directly from in-context inference (the discovery-draft + an in-session pass over the SPA bundle). That shortcut is harness-blocked AND methodologically forbidden:

The map MUST carry the journey-map sentinel as line 1; the only legitimate author of that sentinel is the skill's phase4-prioritise-author: subagent. The journey-map-sentinel-gate.sh hook denies any other write.

tests/e2e/docs/.phase4-cycle-state.json must exist with cycle 1 (discovery, strict per-section parallel) AND cycle 2 (edge-probe) recorded before the ledger will permit Phase 4 → completed. The write-gate denies the transition otherwise.

The dispatching session itself must have loaded the skill body (Skill('journey-mapping') or Read of skills/journey-mapping/SKILL.md). The journey-mapping-skill-preread-gate.sh hook fires on both orchestrator-side writes/dispatches AND on subagent-side spill writes to tests/e2e/docs/.subagent-returns/phase4-*, denying any session that doesn't show the preread.

If you're tempted to inline-author because "the discovery-draft is already in hand, this saves 6 subagent dispatches" — that is the failure mode the gates exist to block. Dispatch the skill.

Steps.

Load journey-mapping with args: "phases: full" under runMode: standard (cycle 1 strict per-section, cycle 2+ relaxed) or args: "phases: full, cycle-strictness: depth" under runMode: depth (every cycle strict per-section, single-subagent walkthroughs forbidden in every cycle). The skill enforces an iterative cycle protocol: at least one discovery cycle (cycle 1 strict per-section parallel — one subagent per section) plus exactly one edge-probe cycle (cycle 2 — re-walks the same sections under an adversarial lens). Shallow single-pass exploration is not accepted.
Produce tests/e2e/docs/journey-map.md (priority-grouped, with the line-1 sentinel) and tests/e2e/docs/journey-map-coverage.md (mapping each journey to the spec that covers it, or <missing>). Both files are written by the skill's author subagent; the orchestrator does not hand-roll them.
Reviewer cross-check: the structural-smell prevention rule rejects maps that collapse distinct flows or split one flow across journeys.

Exit criteria.

Journey map exists with the line-1 sentinel, priority groupings, and <missing> markers.
tests/e2e/docs/.phase4-cycle-state.json records cycles 1 + 2.
The edge-probe cycle's findings are reflected in the map (not just discarded).

Load journey-mapping for the cycle gate, the edge-probe contract, and the priority-tier rubric.

Phase 5 — Coverage expansion

Goal. Land one spec per journey in the map not already covered by a Phase-3 spec (all tiers, P0 first per journey-mapping's dispatch order), plus per-pass dedup.

Phase 5 cannot be done in-orchestrator. "Load coverage-expansion" means invoke the Skill tool with skill name coverage-expansion and dispatch its per-journey subagents through to the full five-pass pipeline. It does NOT mean: write Phase-5 spec files directly from orchestrator context, OR run Pass 1 only and mark the phase completed.

The five-pass pipeline (3 compositional + 2 adversarial + cleanup) is non-negotiable for mode: standard and mode: depth. Pass 1 alone is one-fifth of the phase, not the whole phase. Stopping after Pass 1 is exit #2 (commit + state-file + "resume needed"), not Phase-5 completion.

tests/e2e/docs/coverage-expansion-state.json must exist with at minimum a pass-1 record before the ledger will permit Phase 5 → completed. The write-gate denies the transition otherwise.

Pass 1 is strict per-journey under both modes. [group] and [P3-batch] markers on Pass 1 are harness-blocked by standard-mode-first-pass-guard.sh.

If you're tempted to inline-author specs because dispatching N composers feels expensive — that is the failure mode the contract exists to block. Dispatch the skill.

Steps.

Load coverage-expansion with args: "mode: standard" under runMode: standard (Pass 1 strict per-journey, Passes 2-5 may group; adversarial grouping is default and strict-adversarial: true is opt-in) or args: "mode: depth" under runMode: depth (strict per-journey on every pass — [group] and [P3-batch] forbidden across all 5 passes; adversarial Passes 4-5 are strict-per-journey by default). The skill defines three compositional passes (1-3), two adversarial passes (4-5), plus a cleanup/dedup pass. The orchestrator writes runMode into tests/e2e/docs/coverage-expansion-state.json on the first state-file write so the standard-mode-first-pass-guard.sh hook can enforce the depth-mode strict-everywhere semantics.
Relevance grouping. When a priority tier holds more than five journeys, group them by feature area and cap each group at seven. Project-agnostic clustering vocabulary: browse / transact / account / mutate / errors / auth. Avoid project-specific tokens.
First pass is opus-tier. Reserve the most capable model for the first compositional pass — the breadth scaffolding done here drives every later pass.
Per-pass dedup. Run one cleanup subagent at the end of every pass to consolidate duplicate scenarios within the pass.
Adversarial passes. Pass 4 (first adversarial) and pass 5 (second adversarial) emit findings. If Pass 5 emits any critical/high finding after dedup, flag the affected journeys for a focused Phase-6 bug-discovery probe (pass the finding-IDs in the Phase-6 brief) — there is no Pass 6.
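The relevance-grouping rule in the steps above (group once a tier holds more than five journeys, cap each group at seven) can be sketched as follows. The `Journey` shape and `groupTier` are hypothetical; the real `[group]` marker syntax and dispatch rules live in `coverage-expansion`, and grouping only applies where the run mode permits it (never on Pass 1, and never under depth mode).

```typescript
// Hypothetical journey record; `area` is one of the generic clustering tokens
// (browse / transact / account / mutate / errors / auth).
interface Journey {
  id: string;
  area: string;
}

const GROUPING_THRESHOLD = 5; // group only when a tier holds more than this many journeys
const MAX_GROUP_SIZE = 7;     // cap per group

// Returns the dispatch groups for one priority tier.
function groupTier(journeys: Journey[]): Journey[][] {
  // Small tiers are not grouped: one journey per dispatch.
  if (journeys.length <= GROUPING_THRESHOLD) {
    return journeys.map((journey) => [journey]);
  }

  // Bucket by feature area, preserving input order within each bucket.
  const byArea = new Map<string, Journey[]>();
  for (const journey of journeys) {
    const bucket = byArea.get(journey.area) ?? [];
    bucket.push(journey);
    byArea.set(journey.area, bucket);
  }

  // Split any bucket larger than the cap into chunks of at most MAX_GROUP_SIZE.
  const groups: Journey[][] = [];
  byArea.forEach((bucket) => {
    for (let i = 0; i < bucket.length; i += MAX_GROUP_SIZE) {
      groups.push(bucket.slice(i, i + MAX_GROUP_SIZE));
    }
  });
  return groups;
}
```

For example, a tier with nine `browse` journeys would yield two groups (seven and two), while a tier of three journeys would yield three single-journey dispatches.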

Exit criteria.

Every journey in the map not already covered by a Phase-3 spec (all tiers, P0 first per journey-mapping's dispatch order) has a spec or a documented skip with explicit authorisation; P0/P1 journeys may not be skipped without naming the authorising user message; journey-map-coverage.md <missing> rows for P0/P1 close before P2/P3.
The dedup pass at the end of each pass landed without leaving duplicate-scenario findings open.

Load coverage-expansion for the full pass protocol and the [group] dispatch marker syntax.

Phase 6 — Bug discovery

Goal. Surface adversarial findings — flows that should break the application — and lock the failure modes with regression specs.

Phase 6 cannot be done in-orchestrator. "Load bug-discovery" means invoke the Skill tool with skill name bug-discovery and dispatch its per-journey probe subagents. It does NOT mean: write adversarial-findings.md directly from orchestrator context.

tests/e2e/docs/adversarial-findings.md must exist before the ledger will permit Phase 6 → completed. The write-gate denies the transition otherwise.

Every probe must complete with a terminal status (clean, findings-emitted, or blocked). A probe whose findings have no regression spec must carry an explicit app-bug flag; the orchestrator cannot silently discard findings.

Dispatch the skill.

Steps.

Load bug-discovery. The skill dispatches probe subagents per journey (or per relevance group when there are many journeys).
Each probe runs against the live app, emits findings, and authors regression specs that reproduce each finding.
Findings without reproductions are flagged but not committed as specs; they go into tests/e2e/docs/adversarial-findings.md.

Exit criteria.

Every probe has reached a terminal status (clean | findings-emitted | blocked); blocked probes require either a ledger deferral entry with an authorizer or a re-dispatch.
All findings-emitted returns have a corresponding regression spec or an explicit app-bug flag for human triage.
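The two exit criteria above amount to a check over the probe returns, sketched below. The field names on `ProbeReturn` are hypothetical and do not reflect the actual probe return schema (see `bug-discovery` for that); only the three status values and the rules themselves come from this document.

```typescript
type ProbeStatus = "clean" | "findings-emitted" | "blocked";

// Hypothetical summary of one probe's return.
interface ProbeReturn {
  journey: string;
  status?: ProbeStatus;        // undefined means no terminal status was reached
  regressionSpecs: string[];   // spec paths authored by the probe
  appBugFlag: boolean;         // explicit flag for human triage
  deferralAuthorizer?: string; // set when a blocked probe was deferred in the ledger
  redispatched?: boolean;      // set when a blocked probe was dispatched again
}

// Returns one message per exit-criteria violation; an empty list means Phase 6 may close.
function phase6Violations(probes: ProbeReturn[]): string[] {
  const violations: string[] = [];
  for (const probe of probes) {
    if (probe.status === undefined) {
      violations.push(`${probe.journey}: probe has no terminal status`);
    } else if (
      probe.status === "findings-emitted" &&
      probe.regressionSpecs.length === 0 &&
      !probe.appBugFlag
    ) {
      violations.push(`${probe.journey}: findings lack a regression spec or app-bug flag`);
    } else if (probe.status === "blocked" && !probe.deferralAuthorizer && !probe.redispatched) {
      violations.push(`${probe.journey}: blocked probe needs a deferral authorizer or a re-dispatch`);
    }
  }
  return violations;
}
```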

Load bug-discovery for the relevance-grouping rules and the probe return shape.

Phase 7 — Secrets sweep

Goal. Move every credential, API key, PII-shape literal, and hard-coded URL out of the test code into .env. The released suite should be portable across local / CI / staging targets.

Steps.

Load secrets-sweep. The skill defines the four literal classes (credentials, API keys, PII, URLs) and the extraction playbook. Phase 7 dispatches secrets-sweep with the composer-secrets-sweep: description prefix.
Scan tests/**/*.{ts,json} and root playwright*.config.ts per the secrets-sweep skill's scope. Do not touch application source under src/ or app/. Evidence bundles (tests/e2e/evidence/) are NOT swept by Phase 7 — they are redacted by companion-mode's Phase-5 redaction step.
Replace literals with process.env.<NAME>; write .env (real values, gitignored) and .env.example (placeholders, committed); ensure .gitignore covers .env.
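When literals are replaced with `process.env.<NAME>`, a small guard helps a missing variable fail loudly instead of surfacing as `undefined` mid-test. The helper below is an illustrative sketch; `requireEnv` is a hypothetical name, and loading `.env` into `process.env` (for example with a dotenv-style loader) is assumed to happen elsewhere, since this document does not specify how.

```typescript
// Reads a required variable and throws a descriptive error when it is absent or empty.
// The `env` parameter defaults to process.env and exists mainly to make the helper testable.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing environment variable ${name}; see .env.example for the expected keys`);
  }
  return value;
}
```

A spec would then call, say, `requireEnv("BASE_URL")` where it previously held a hard-coded URL; the variable name here is only an example.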

Exit criteria.

A re-scan of tests/** (plus root playwright*.config.ts) surfaces no literal credentials, API keys, PII-shaped literals, or hard-coded URLs.
.env, .env.example, and the .gitignore entry are all in place.
npx playwright test still passes against the now-env-driven suite.

Load secrets-sweep for the full playbook and the strict edit-scope rules.

Phase 8 — Report

Goal. Author the work summary so a stakeholder can understand what the suite covers without reading every spec.

Steps.

Load work-summary-deck. The skill writes qa-summary-deck.html at the project root and automatically renders qa-summary-deck.pdf next to it.
The deck includes: total specs, journeys covered (priority-tiered), adversarial findings landed as regressions, open app-bug flags, and the suite's runtime envelope.

Exit criteria.

qa-summary-deck.html and qa-summary-deck.pdf exist at the project root.
The deck reflects the actual state of the suite (no stale numbers).

Cross-cutting rules

These rules apply to every phase. Violating them is a phase failure even when the exit criteria are technically met.

Self-credentialing first. No spec hard-codes a username, password, or token. Auth flows mint test users at runtime.
One commit per landed deliverable. Phases commit per-spec, not per-phase, so a partial run can be safely resumed.
No project-specific vocabulary in shared docs. Journey names use generic web-UI clustering vocabulary (browse / transact / account / mutate / errors / auth), not domain-specific tokens.
Return-shape conformance. Every subagent dispatch you run must return a schema-conformant envelope (see schemas/subagent-returns/).

Resuming a partial run

If onboarding was interrupted, find the latest approved phase from the tests/e2e/docs/onboarding-status.json ledger when it is still on disk (it is gitignored), cross-check it against tests/e2e/docs/journey-map-coverage.md and the commit history, then restart from the next phase. Each phase is independently runnable as long as its predecessor's deliverables exist on disk.

For automated runs, the external CLI driver typically exposes a --resume flag that consumes the same tests/e2e/docs/onboarding-status.json ledger this skill maintains.