name: onboarding
description: >
  End-to-end methodology for adding a brand-new e2e suite to a project that has none. Defines the eight-phase workflow (scaffold, groundwork, happy-path, journey-mapping, coverage-expansion, bug-discovery, secrets-sweep, report) and the gate criteria between phases. Use this skill to run the workflow interactively in Claude Code, or invoke it from an external automated CLI driver for a hands-off run.
Activation banner: The first user-facing reply after this skill loads MUST begin with the line: Protocol Achilles activated. Once per session — skip if already declared in this conversation. Subagents (which return structured data, not user-facing text) are exempt.
Onboarding — eight-phase e2e bootstrap
This is the umbrella methodology for taking a project from zero e2e tests to a maintained suite. The same workflow runs two ways:
| Mode | When | How |
|---|---|---|
| Interactive | You want fine-grained control or you're learning the system | Read this skill and follow the phase playbook below |
| Automated | You want a hands-off run | Invoke this skill from an external automated CLI driver that dispatches role-scoped subagents per phase |
The two modes execute the same phases against the same gate criteria. The automated driver dispatches role-scoped subagents per phase; in interactive mode you load the relevant role-scoped skill yourself and work the phase through Claude Code's normal tool surface.
Phase map
| # | Phase | What it produces | Skill |
|---|---|---|---|
| 1 | Scaffold | playwright.config.ts, tests/e2e/{fixtures,docs}/, .gitignore additions | element-interactions (Stage 1) |
| 2 | Groundwork | app-context.md, page-repository.json, runtime self-credentialing fixture | element-interactions (Stage 2) |
| 3 | Happy-path | One tests/e2e/<journey>.spec.ts per primary user flow that exercises sign-in + the critical action | element-interactions (Stages 3–4), test-composer |
| 4 | Journey mapping | tests/e2e/docs/journey-map.md, tests/e2e/docs/journey-map-coverage.md | journey-mapping |
| 5 | Coverage expansion | One tests/e2e/<journey>.spec.ts per priority-2/3 journey, grouped passes 2–5 with cleanup dedup | coverage-expansion, test-composer |
| 6 | Bug discovery | Adversarial findings + regression specs that lock the failure modes | bug-discovery |
| 7 | Secrets sweep | Credentials/keys/PII/URLs extracted to .env; .env.example committed | secrets-sweep |
| 8 | Report | qa-summary-deck.html + qa-summary-deck.pdf at the project root | work-summary-deck |
A phase only advances once its exit criteria (below) are satisfied. A human or an automated phase-validator checks the criteria; ambiguity blocks the phase, not the run.
Status ledger + workflow reviewer (state-machine enforcement)
The pipeline runs on top of a structured status ledger at
tests/e2e/docs/onboarding-status.json (gitignored — same pattern as
tests/e2e/docs/.phase4-cycle-state.json and
tests/e2e/docs/coverage-expansion-state.json). The orchestrator
(interactive) or an external automated CLI driver MUST update
this ledger after every phase / pass / cycle completion. Every
transition (phase N → phase N+1, pass N → pass N+1 inside Phase 5,
cycle N → cycle N+1 inside Phase 4) is gated by a
workflow-reviewer-* subagent. The reviewer reads the ledger row +
the closing subagent's handover envelope + the canonical methodology
section, returns verdict: approve | reject | escalate, and the
orchestrator only advances when the verdict is approve. Every
workflow-reviewer-* dispatching brief MUST cite the reviewer's
return-schema path
(schemas/subagent-returns/workflow-reviewer.schema.json) — the
subagent-schema-preread-gate.sh hook denies briefs that omit the
citation.
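As a rough illustration of the gating rule above, the sketch below shows how a reviewer verdict might drive the ledger. The `Ledger` shape and the `applyVerdict` helper are hypothetical; the real ledger shape is defined by schemas/onboarding-status.schema.json, which is not reproduced here.

```typescript
// Hypothetical, simplified shapes. The real ones live in the schema files.
type Verdict = "approve" | "reject" | "escalate";

interface Ledger {
  currentPhase: number; // 1..8
  status: "in-progress" | "blocked";
}

// Only an "approve" verdict moves the pipeline forward.
function applyVerdict(ledger: Ledger, verdict: Verdict): Ledger {
  if (verdict === "approve") {
    return { currentPhase: ledger.currentPhase + 1, status: "in-progress" };
  }
  if (verdict === "escalate") {
    return { currentPhase: ledger.currentPhase, status: "blocked" };
  }
  return ledger; // reject: stay put so the closing subagent can be re-dispatched
}
```

The same function would apply to pass and cycle transitions if the ledger tracked those counters as well; they are left out to keep the sketch short.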
The contract is harness-enforced:
- onboarding-ledger-gate.sh (PreToolUse:Agent, DENY): denies any non-reviewer Agent dispatch at a transition point until the matching workflow-reviewer-* has approved; also denies out-of-order phase / pass / cycle dispatches (e.g. phase4-* while currentPhase=2).
- onboarding-ledger-write-gate.sh (PreToolUse:Write|Edit, DENY): validates every ledger write against schemas/onboarding-status.schema.json and denies phase-skip transitions that lack a status: skipped row + an approvedDeviations[] entry carrying a verbatim authorizer.
Skip / early-stop authorisation. A phase can be skipped (or the
pipeline stopped early) only when the workflow-reviewer for the
prior phase approves the deviation, with the authorizer field on the
reviewer return carrying either a verbatim user quote OR a documented
structural exception. Self-imposed reasons (session-length,
budget-cap, auto-mode, inferred-pref) are not authorisation —
the harness rejects ledger writes that lack a proper authorizer.
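The authorisation rule can be sketched as a small predicate. The `Authorizer` union below is an assumption made for illustration; the actual field format is defined by the reviewer return schema, and the harness may detect self-imposed reasons differently.

```typescript
// Assumed shape for the authorizer field (hypothetical).
type Authorizer =
  | { kind: "user-quote"; quote: string }
  | { kind: "structural-exception"; reference: string }
  | {
      kind: "self-imposed";
      reason: "session-length" | "budget-cap" | "auto-mode" | "inferred-pref";
    };

// A skip or early stop is authorised only by a verbatim user quote
// or a documented structural exception.
function isAuthorised(a: Authorizer | undefined): boolean {
  if (a === undefined) return false;
  if (a.kind === "user-quote") return a.quote.trim().length > 0;
  if (a.kind === "structural-exception") return a.reference.trim().length > 0;
  return false; // self-imposed reasons never authorise a deviation
}
```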
3-cycle reject cap. A workflow-reviewer that rejects three
consecutive times returns verdict: escalate, and the orchestrator
surfaces all three returns to the user for manual triage. The ledger
row's status becomes blocked. This mirrors the existing
3-cycle process-validator pattern.
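A minimal sketch of the reject cap, assuming the orchestrator keeps the reviewer's verdict history for one transition (oldest first). The `nextAction` helper and its return labels are hypothetical names used only for this example.

```typescript
type ReviewVerdict = "approve" | "reject";

const REJECT_CAP = 3;

// Decide what the orchestrator does next for a single transition.
function nextAction(history: ReviewVerdict[]): "advance" | "retry" | "escalate" {
  // Count consecutive rejects at the end of the history.
  let trailingRejects = 0;
  for (let i = history.length - 1; i >= 0 && history[i] === "reject"; i--) {
    trailingRejects++;
  }
  if (trailingRejects >= REJECT_CAP) return "escalate"; // ledger row becomes blocked
  if (history.length > 0 && history[history.length - 1] === "approve") return "advance";
  return "retry"; // no approval yet: fix and re-dispatch
}
```

Counting only trailing rejects means an approval in between resets the counter, which matches the "three consecutive times" wording.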
Canonical references:
- schemas/onboarding-status.schema.json: ledger shape (v1)
- schemas/subagent-returns/workflow-reviewer.schema.json: reviewer return shape
- skills/workflow-reviewer/SKILL.md: reviewer methodology
- skills/element-interactions/references/harness-hooks.md: both new hooks indexed
The ledger + reviewer layer was added because empirical observation (a 21-journey benchmark, Run 5) demonstrated that markdown-text contract enforcement alone permits silent scope compression: orchestrators could skip phases entirely, stop early, or accept subagent "complete" returns whose deliverables were missing. The state-machine layer makes those failure modes harness-denied rather than instruction-only.
Front-load gate (before Phase 1)
Before any scaffolding, two things happen in order: (0) run-mode selection, then (1–3) the three preconditions.
Step 0 — Mode selection (ask the user first)
Before the precondition checks, present the run-mode choice to the user verbatim:
"Before starting, choose the run mode:
- standard (default, recommended) — first-pass / first-cycle is strict parallel; subsequent passes / cycles may use grouping or single-agent dispatches for efficiency. Best for everyday onboarding runs.
- depth — strict parallel per-journey on every compositional pass and strict parallel per-section on every discovery cycle. Up to ~20× more subagent dispatches and token spend than standard. Best for high-stakes audits, package-quality benchmarks, and first-time onboarding of business-critical apps where you want exhaustive per-unit fidelity."
"Which mode?"
Capture the user's answer as runMode ∈ {standard, depth} (default
standard if the user passes through without picking). The value
propagates through the rest of the onboarding pipeline as follows:
| Phase / dispatch | runMode: standard | runMode: depth |
|---|---|---|
| Phase 4: journey-mapping | args: "phases: full" (default cycle-1 strict, cycle-2+ relaxed; the existing rule already coded into journey-mapping/SKILL.md §"First-cycle strict / later-cycle relaxed") | args: "phases: full, cycle-strictness: depth" (strict per-section parallel on every cycle, including edge-probe and any additional discovery cycles; single-subagent walkthroughs forbidden in every cycle) |
| Phase 5: coverage-expansion | args: "mode: standard" (Pass 1 strict, Passes 2-5 may group; adversarial grouping permitted; strict-adversarial: true is opt-in) | args: "mode: depth" (strict per-journey parallel on every pass: no [group], no [P3-batch] on any of Passes 1-5; adversarial Passes 4-5 are strict-per-journey by default, so the strict-adversarial: true opt-in is implicit under depth) |
| State files | The workflow ledger tests/e2e/docs/onboarding-status.json is written with runMode: "standard" at the front-load gate (the primary source the harness reads). Phase-5 coverage-expansion-state.json mirrors runMode: "standard" on its first write (fallback for bare invocations); Phase-4 .phase4-cycle-state.json is written with cycleStrictness: "standard". | The workflow ledger is written with runMode: "depth"; Phase-5 coverage-expansion-state.json mirrors runMode: "depth" on first write; Phase-4 .phase4-cycle-state.json is written with cycleStrictness: "depth". The standard-mode-first-pass-guard.sh hook reads runMode + currentPhase + currentSubStage from the workflow ledger first (with coverage-expansion-state.json as fallback) and enforces the depth-mode strict-everywhere semantics. Reading the workflow ledger means the depth contract still holds on Phase-6 grouped probes after coverage-expansion-state.json is deleted at Pass-5 cleanup. |
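The propagation in the table above can be expressed as a lookup. This is only a sketch: `skillArgsFor` and `parseRunMode` are hypothetical helper names, while the argument strings are the ones listed in the table.

```typescript
type RunMode = "standard" | "depth";

interface SkillArgs {
  journeyMapping: string;    // args passed to journey-mapping (Phase 4)
  coverageExpansion: string; // args passed to coverage-expansion (Phase 5)
  cycleStrictness: RunMode;  // value mirrored into .phase4-cycle-state.json
}

function skillArgsFor(runMode: RunMode): SkillArgs {
  if (runMode === "depth") {
    return {
      journeyMapping: "phases: full, cycle-strictness: depth",
      coverageExpansion: "mode: depth",
      cycleStrictness: "depth",
    };
  }
  return {
    journeyMapping: "phases: full",
    coverageExpansion: "mode: standard",
    cycleStrictness: "standard",
  };
}

// An unanswered or unrecognised reply falls back to the default, standard.
function parseRunMode(answer: string | undefined): RunMode {
  return (answer ?? "").trim().toLowerCase() === "depth" ? "depth" : "standard";
}
```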
The orchestrator emits one declaration line at the start of each phase
that consumes the mode:
[onboarding] runMode: depth — Phase 5 strict-per-journey on every pass
or [onboarding] runMode: standard — Phase 5 first-pass strict, later relaxed.
Under runMode: depth, coverage-expansion runs in its
strict-parallel-everywhere mode. Cost: up to ~20× more subagent
dispatches and token spend than mode: standard. Confirm with the user
before defaulting to depth on any run that is not explicitly a
high-stakes audit or benchmark.
The pre-Pass-4 P3 adversarial opt-out proposal (defined in
coverage-expansion's depth-mode pipeline step,
skills/coverage-expansion/references/depth-mode-pipeline.md) is folded
into this gate so no mid-run prompt is added: any
adversarialSkippedJourneys opt-out is proposed and answered here,
alongside mode selection.
Steps 1–3 — Preconditions
Once the run mode is captured, confirm three preconditions:
- Dev server runs locally. You can launch the app and reach its landing page in a browser. Phase 2's groundwork depends on this.
- @civitas-cerebrum/element-interactions installed. package.json lists the dep; node_modules/@civitas-cerebrum/element-interactions/ exists. (The package's postinstall installs the surviving hooks + skills into ~/.claude/.)
- No prior e2e suite in conflict. If tests/e2e/ already exists with committed specs, this is a resume (not onboarding). Switch to running the relevant phase skill directly.
If the project already runs Playwright end-to-end with substantial
coverage, do not run onboarding: it is designed for zero-to-suite, not
augmentation.
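The three preconditions could be checked in order along these lines. `ProjectFacts` and `frontLoadGate` are hypothetical names; how each fact is gathered (probing the dev server, reading package.json, inspecting git history) is left out of the sketch.

```typescript
interface ProjectFacts {
  devServerReachable: boolean; // landing page loads in a browser
  packageInstalled: boolean;   // @civitas-cerebrum/element-interactions present
  hasCommittedSpecs: boolean;  // tests/e2e/ already holds committed specs
}

type GateResult = { ok: true } | { ok: false; reason: string };

function frontLoadGate(facts: ProjectFacts): GateResult {
  if (!facts.devServerReachable) {
    return { ok: false, reason: "dev server is not reachable" };
  }
  if (!facts.packageInstalled) {
    return { ok: false, reason: "element-interactions is not installed" };
  }
  if (facts.hasCommittedSpecs) {
    return { ok: false, reason: "existing suite found: resume instead of onboarding" };
  }
  return { ok: true };
}
```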
Phase 1 — Scaffold
Goal. Land the Playwright config and the shared file tree.
Steps.
- Create playwright.config.ts with the project's dev-server URL, the standard reporters (html + json), and a webServer block if the suite should launch the dev server itself.
- Create tests/e2e/fixtures/, tests/e2e/docs/, and tests/e2e/playwright.setup.ts. Spec files themselves live at tests/e2e/<journey>.spec.ts (root of tests/e2e/, no specs/ subdirectory).
- Add tests/e2e/.gitignore entries for playwright-report/, test-results/, .last-run.json.
- Commit as chore: scaffold e2e suite.
Exit criteria.
- npx playwright test --list lists zero specs without error.
- The four scaffold files exist on disk.
Load element-interactions (Stage 1) for the exact file shapes.
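For orientation only, a config along the lines of the first scaffold step might look like the object below. The port, the `npm run dev` command, and the JSON output path are placeholder assumptions, and the exact file shape prescribed by element-interactions (Stage 1) takes precedence over this sketch.

```typescript
// Assumed values: replace the URL and command with the project's own.
const baseURL = "http://localhost:3000";

const config = {
  testDir: "./tests/e2e",
  // The two standard reporters named in the scaffold step.
  reporter: [
    ["html"],
    ["json", { outputFile: "test-results/results.json" }],
  ],
  use: { baseURL },
  // Only needed when the suite should launch the dev server itself.
  webServer: {
    command: "npm run dev",
    url: baseURL,
    reuseExistingServer: true,
  },
};

// In playwright.config.ts this object would be the default export:
// export default config;
```

Playwright accepts a plain object as the default export, so the sketch avoids importing anything; a real config would normally wrap the object in `defineConfig` from `@playwright/test` for type checking.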
Phase 2 — Groundwork
Goal. Capture project context so later phases don't re-discover it.
Steps.
- App-context document. Author tests/e2e/docs/app-context.md. Cover what the app is, primary user roles, authentication model, key subsystems, and the rough URL surface.
- Page repository. Walk the running app and populate tests/e2e/page-repository.json with one entry per discoverable page (path, purpose, primary selectors). Use Playwright's snapshot tool interactively if helpful.
- Runtime self-credentialing fixture. Add tests/e2e/fixtures/auth.ts that mints test users at runtime (signup → confirm → login) instead of relying on seeded credentials. Phase 7's secrets sweep depends on no credentials being hard-coded.
Exit criteria.
- The three artefacts exist and npx playwright test --list still works.
Load element-interactions (Stage 2) for the page-repository schema and
the self-credentialing pattern.
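The self-credentialing idea (signup, confirm, login, all at runtime) can be sketched independently of any test runner. The `AuthClient` interface is a hypothetical stand-in for whatever signup and login calls the application exposes, and the email domain is a placeholder; the pattern documented in element-interactions (Stage 2) is the authoritative one.

```typescript
// Hypothetical client: stands in for the app's own signup/confirm/login calls.
interface AuthClient {
  signup(email: string, password: string): Promise<string>; // confirmation token
  confirm(token: string): Promise<void>;
  login(email: string, password: string): Promise<string>;  // session token
}

interface TestUser {
  email: string;
  password: string;
  session: string;
}

// Random enough to avoid collisions between test runs; not a security primitive.
function randomSuffix(): string {
  return Math.random().toString(36).slice(2, 10) + Date.now().toString(36);
}

// Mints a fresh user so no spec needs a hard-coded credential.
async function mintTestUser(client: AuthClient): Promise<TestUser> {
  const email = `e2e-${randomSuffix()}@example.test`;
  const password = `pw-${randomSuffix()}`;
  const confirmation = await client.signup(email, password);
  await client.confirm(confirmation);
  const session = await client.login(email, password);
  return { email, password, session };
}
```

In a real fixture the client would drive the UI or the app's API, and the minted user would be handed to each test through the fixture mechanism.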
Phase 3 — Happy path
Goal. One green spec per primary user flow.
Steps.
- Identify the primary journeys from app-context.md (typically 2–5).
- For each, load the test-composer skill with a brief that names the journey, its prerequisites, and the critical assertion. Composer writes the spec at tests/e2e/<journey>.spec.ts, lands tests, and self-verifies with npx playwright test.
- The composer skill internally runs an in-loop reviewer pass that catches craft issues, missing scenarios, and stale assertions before declaring the cycle done; its return shape is the reviewer-inloop schema (see schemas/subagent-returns/). You don't load this reviewer as a separate skill; it is part of the composer's cycle.
- Commit each spec individually: test(j-<journey>): happy path.
Exit criteria.
- One spec per primary journey, all passing locally.
- tests/e2e/docs/.discovery-draft.json has been written by the Stage-3 happy-path pass (used as input by Phase 4).
Load test-composer for the dispatch contract; consult
schemas/subagent-returns/composer.schema.json and
reviewer-inloop.schema.json for return shapes.
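To make the naming conventions in the steps above concrete, the sketch below derives the spec path and commit message from a journey slug and renders a brief with the three required parts. The `ComposerBrief` shape and the rendered text layout are assumptions; the real dispatch contract lives in test-composer.

```typescript
// Hypothetical brief shape: journey, prerequisites, critical assertion.
interface ComposerBrief {
  journey: string; // slug, e.g. "auth-sign-in" (illustrative)
  prerequisites: string[];
  criticalAssertion: string;
}

// Specs live at the root of tests/e2e/, with no specs/ subdirectory.
function specPath(journey: string): string {
  return `tests/e2e/${journey}.spec.ts`;
}

// One commit per spec.
function commitMessage(journey: string): string {
  return `test(j-${journey}): happy path`;
}

function renderBrief(brief: ComposerBrief): string {
  return [
    `Journey: ${brief.journey}`,
    `Spec path: ${specPath(brief.journey)}`,
    `Prerequisites: ${brief.prerequisites.join("; ") || "none"}`,
    `Critical assertion: ${brief.criticalAssertion}`,
  ].join("\n");
}
```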
Phase 4 — Journey mapping
Goal. Produce a structured map of every user journey worth testing,
prioritised P0 / P1 / P2 / P3 (per journey-mapping's priority framework).
Phase 4 cannot be done in-orchestrator. "Load journey-mapping" means invoke the Skill tool with skill name journey-mapping and follow whatever the skill body returns. It does NOT mean: summarise what the skill would produce, then hand-roll journey-map.md directly from in-context inference (the discovery-draft + an in-session pass over the SPA bundle). That shortcut is harness-blocked AND methodologically forbidden:
- The map MUST carry <!-- journey-mapping:generated --> as line 1; the only legitimate author of that sentinel is the skill's phase4-prioritise-author: subagent. The journey-map-sentinel-gate.sh hook denies any other write.
- tests/e2e/docs/.phase4-cycle-state.json must exist with cycle 1 (discovery, strict per-section parallel) AND cycle 2 (edge-probe) recorded before the ledger will permit Phase 4 → completed. The write-gate denies the transition otherwise.
- The dispatching session itself must have loaded the skill body (Skill('journey-mapping') or Read of skills/journey-mapping/SKILL.md). The journey-mapping-skill-preread-gate.sh hook fires on both orchestrator-side writes/dispatches AND on subagent-side spill writes to tests/e2e/docs/.subagent-returns/phase4-*, denying any session that doesn't show the preread.
If you're tempted to inline-author because "the discovery-draft is already in hand, this saves 6 subagent dispatches", that is the failure mode the gates exist to block. Dispatch the skill.
Steps.
- Load journey-mapping with args: "phases: full" under runMode: standard (cycle 1 strict per-section, cycle 2+ relaxed) or args: "phases: full, cycle-strictness: depth" under runMode: depth (every cycle strict per-section, single-subagent walkthroughs forbidden in every cycle). The skill enforces an iterative cycle protocol: at least one discovery cycle (cycle 1, strict per-section parallel, one subagent per section) plus exactly one edge-probe cycle (cycle 2, which re-walks the same sections under an adversarial lens). Shallow single-pass exploration is not accepted.
- Produce tests/e2e/docs/journey-map.md (priority-grouped, with the line-1 sentinel <!-- journey-mapping:generated -->) and tests/e2e/docs/journey-map-coverage.md (mapping each journey to the spec that covers it, or <missing>). Both files are written by the skill's author subagent; the orchestrator does not hand-roll them.
- Reviewer cross-check: the structural-smell prevention rule rejects maps that collapse distinct flows or split one flow across journeys.
Exit criteria.
- Journey map exists with the line-1 sentinel, priority groupings, and <missing> markers.
- tests/e2e/docs/.phase4-cycle-state.json records cycles 1 + 2.
- The edge-probe cycle's findings are reflected in the map (not just discarded).
Load journey-mapping for the cycle gate, the edge-probe contract, and
the priority-tier rubric.
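Two of the Phase 4 exit checks are mechanical and can be sketched as a pure function. The `CycleState` shape is an assumption (the text only says cycles 1 and 2 must be recorded), and `phase4Problems` is a hypothetical helper, not one of the harness hooks.

```typescript
const SENTINEL = "<!-- journey-mapping:generated -->";

type CycleKind = "discovery" | "edge-probe";

// Assumed shape for .phase4-cycle-state.json (hypothetical).
interface CycleState {
  cycles: { n: number; kind: CycleKind }[];
}

// Returns a list of unmet criteria; an empty list means both checks pass.
function phase4Problems(mapText: string, state: CycleState | undefined): string[] {
  const problems: string[] = [];
  const firstLine = mapText.split("\n")[0] ?? "";
  if (firstLine.trim() !== SENTINEL) {
    problems.push("journey-map.md line 1 is not the sentinel");
  }
  const cycles = state ? state.cycles : [];
  const recorded = (n: number, kind: CycleKind): boolean =>
    cycles.some((c) => c.n === n && c.kind === kind);
  if (!recorded(1, "discovery")) problems.push("cycle 1 (discovery) not recorded");
  if (!recorded(2, "edge-probe")) problems.push("cycle 2 (edge-probe) not recorded");
  return problems;
}
```

The third criterion, that edge-probe findings are actually reflected in the map, needs a reviewer's judgment and is not covered here.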
Phase 5 — Coverage expansion
Goal. Land one spec per journey in the map not already covered by a
Phase-3 spec (all tiers, P0 first per journey-mapping's dispatch
order), plus per-pass dedup.
Phase 5 cannot be done in-orchestrator. "Load coverage-expansion" means invoke the Skill tool with skill name coverage-expansion and dispatch its per-journey subagents through to the full five-pass pipeline. It does NOT mean: write Phase-5 spec files directly from orchestrator context, OR run Pass 1 only and mark the phase completed.
- The five-pass pipeline (3 compositional + 2 adversarial + cleanup) is non-negotiable for mode: standard and mode: depth. Pass 1 alone is one-fifth of the phase, not the whole phase. Stopping after Pass 1 is exit #2 (commit + state-file + "resume needed"), not Phase-5 completion.
- tests/e2e/docs/coverage-expansion-state.json must exist with at minimum a pass-1 record before the ledger will permit Phase 5 → completed. The write-gate denies the transition otherwise.
- Pass 1 is strict per-journey under both modes. [group] and [P3-batch] markers on Pass 1 are harness-blocked by standard-mode-first-pass-guard.sh.
If you're tempted to inline-author specs because dispatching N composers feels expensive, that is the failure mode the contract exists to block. Dispatch the skill.
Steps.
- Load coverage-expansion with args: "mode: standard" under runMode: standard (Pass 1 strict per-journey, Passes 2-5 may group; adversarial grouping is default and strict-adversarial: true is opt-in) or args: "mode: depth" under runMode: depth (strict per-journey on every pass; [group] and [P3-batch] forbidden across all 5 passes; adversarial Passes 4-5 are strict-per-journey by default). The skill defines three compositional passes (1-3), two adversarial passes (4-5), plus a cleanup/dedup pass. The orchestrator writes runMode into tests/e2e/docs/coverage-expansion-state.json on the first state-file write so the standard-mode-first-pass-guard.sh hook can enforce the depth-mode strict-everywhere semantics.
- Relevance grouping. When a priority tier holds more than five journeys, group them by feature area and cap each group at seven. Project-agnostic clustering vocabulary: browse / transact / account / mutate / errors / auth. Avoid project-specific tokens.
- First pass is opus-tier. Reserve the most capable model for the first compositional pass: the breadth scaffolding done here drives every later pass.
- Per-pass dedup. Run one cleanup subagent at the end of every pass to consolidate duplicate scenarios within the pass.
- Adversarial passes. Pass 4 (first adversarial) and pass 5 (second adversarial) emit findings. If Pass 5 emits any critical/high finding after dedup, flag the affected journeys for a focused Phase-6 bug-discovery probe (pass the finding-IDs in the Phase-6 brief); there is no Pass 6.
Exit criteria.
- Every journey in the map not already covered by a Phase-3 spec (all tiers, P0 first per journey-mapping's dispatch order) has a spec or a documented skip with explicit authorisation; P0/P1 journeys may not be skipped without naming the authorising user message; journey-map-coverage.md <missing> rows for P0/P1 close before P2/P3.
- The dedup pass at the end of each pass landed without leaving duplicate-scenario findings open.
Load coverage-expansion for the full pass protocol and the
[group] dispatch marker syntax.
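The relevance-grouping step (more than five journeys in a tier, grouped by feature area, at most seven per group) can be sketched as below. The `Journey` shape and `groupTier` helper are hypothetical; the sketch applies only where grouping is permitted at all, which excludes Pass 1 and every pass under depth mode.

```typescript
type Area = "browse" | "transact" | "account" | "mutate" | "errors" | "auth";

interface Journey {
  id: string;
  area: Area;
}

const GROUP_THRESHOLD = 5; // tiers at or below this size dispatch per journey
const GROUP_CAP = 7;       // maximum journeys per group

function groupTier(journeys: Journey[]): Journey[][] {
  if (journeys.length <= GROUP_THRESHOLD) {
    return journeys.map((j) => [j]);
  }
  // Bucket by feature area, preserving first-seen order of areas.
  const byArea: Record<string, Journey[]> = {};
  for (const j of journeys) {
    const bucket = byArea[j.area] ?? [];
    bucket.push(j);
    byArea[j.area] = bucket;
  }
  // Split any bucket larger than the cap into consecutive chunks.
  const groups: Journey[][] = [];
  for (const area of Object.keys(byArea)) {
    const bucket = byArea[area] ?? [];
    for (let i = 0; i < bucket.length; i += GROUP_CAP) {
      groups.push(bucket.slice(i, i + GROUP_CAP));
    }
  }
  return groups;
}
```

Chunking an oversized area in order is the simplest way to honour the cap; the skill itself may balance groups differently.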
Phase 6 — Bug discovery
Goal. Surface adversarial findings — flows that should break the application — and lock the failure modes with regression specs.
Phase 6 cannot be done in-orchestrator. "Load bug-discovery" means invoke the Skill tool with skill name bug-discovery and dispatch its per-journey probe subagents. It does NOT mean: write adversarial-findings.md directly from orchestrator context.
- tests/e2e/docs/adversarial-findings.md must exist before the ledger will permit Phase 6 → completed. The write-gate denies the transition otherwise.
- Every probe must complete with a terminal status (clean or findings-emitted). A probe whose findings have no regression spec must carry an explicit app-bug flag; the orchestrator cannot silently discard findings.
Dispatch the skill.
Steps.
- Load bug-discovery. The skill dispatches probe subagents per journey (or per relevance group when there are many journeys).
- Each probe runs against the live app, emits findings, and authors regression specs that reproduce each finding.
- Findings without reproductions are flagged but not committed as specs; they go into tests/e2e/docs/adversarial-findings.md.
Exit criteria.
- Every probe terminal (clean | findings-emitted | blocked); blocked probes require a ledger deferral entry with an authorizer or a re-dispatch.
- All findings-emitted returns have a corresponding regression spec or an explicit app-bug flag for human triage.
Load bug-discovery for the relevance-grouping rules and the probe
return shape.
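The Phase 6 exit criteria lend themselves to a simple check over probe returns. The shapes below are assumptions for illustration (the real probe return shape is defined by bug-discovery), and the non-terminal "running" status is added only so the sketch has something to reject.

```typescript
type ProbeStatus = "clean" | "findings-emitted" | "blocked" | "running";

// Hypothetical shapes, not the bug-discovery return schema.
interface Finding {
  id: string;
  regressionSpec?: string; // path of the spec that reproduces the finding
  appBug?: boolean;        // explicit flag for human triage
}

interface ProbeReturn {
  journey: string;
  status: ProbeStatus;
  findings: Finding[];
  deferralAuthorizer?: string; // present when a blocked probe is deferred
}

// Returns the reasons Phase 6 cannot close; empty means the criteria hold.
function phase6Blockers(probes: ProbeReturn[]): string[] {
  const blockers: string[] = [];
  for (const p of probes) {
    if (p.status === "running") {
      blockers.push(`${p.journey}: probe has not reached a terminal status`);
    } else if (p.status === "blocked" && !p.deferralAuthorizer) {
      blockers.push(`${p.journey}: blocked probe needs a deferral entry or a re-dispatch`);
    } else if (p.status === "findings-emitted") {
      for (const f of p.findings) {
        if (!f.regressionSpec && !f.appBug) {
          blockers.push(`${p.journey}: finding ${f.id} has no regression spec or app-bug flag`);
        }
      }
    }
  }
  return blockers;
}
```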
Phase 7 — Secrets sweep
Goal. Move every credential, API key, PII-shape literal, and
hard-coded URL out of the test code into .env. The released suite
should be portable across local / CI / staging targets.
Steps.
- Load secrets-sweep. The skill defines the four literal classes (credentials, API keys, PII, URLs) and the extraction playbook. Phase 7 dispatches secrets-sweep with the composer-secrets-sweep: description prefix.
- Scan tests/**/*.{ts,json} and root playwright*.config.ts per the secrets-sweep skill's scope. Do not touch application source under src/ or app/. Evidence bundles (tests/e2e/evidence/) are NOT swept by Phase 7; they are redacted by companion-mode's Phase-5 redaction step.
- Replace literals with process.env.<NAME>; write .env (real values, gitignored) and .env.example (placeholders, committed); ensure .gitignore covers .env.
Exit criteria.
- A re-scan of tests/** (plus root playwright*.config.ts) surfaces no literal credentials.
- .env, .env.example, and the .gitignore entry are all in place.
- npx playwright test still passes against the now-env-driven suite.
Load secrets-sweep for the full playbook and the strict edit-scope
rules.
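One small piece of the sweep, deriving .env.example from .env, can be sketched as a text transform. `toEnvExample` is a hypothetical helper and the placeholder format is an arbitrary choice; it handles plain NAME=value lines only, not `export` prefixes or multi-line values.

```typescript
// Turns the text of a .env file into .env.example text by replacing
// every value with a placeholder. Comments and blank lines are kept.
function toEnvExample(envText: string): string {
  return envText
    .split("\n")
    .map((line) => {
      const trimmed = line.trim();
      if (trimmed === "" || trimmed.charAt(0) === "#") return line;
      const eq = line.indexOf("=");
      if (eq === -1) return line; // not a NAME=value line: leave untouched
      const name = line.slice(0, eq).trim();
      return `${name}=<${name.toLowerCase()}>`;
    })
    .join("\n");
}
```

For example, a line `E2E_API_KEY=abc123` (an illustrative variable name) becomes `E2E_API_KEY=<e2e_api_key>`, so the committed file documents which variables exist without carrying their values.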
Phase 8 — Report
Goal. Author the work summary so a stakeholder can understand what the suite covers without reading every spec.
Steps.
- Load work-summary-deck. The skill writes qa-summary-deck.html at the project root and automatically renders qa-summary-deck.pdf next to it.
- The deck includes: total specs, journeys covered (priority-tiered), adversarial findings landed as regressions, open app-bug flags, and the suite's runtime envelope.
Exit criteria.
- qa-summary-deck.html and qa-summary-deck.pdf exist at the project root.
- The deck reflects the actual state of the suite (no stale numbers).
Cross-cutting rules
These rules apply to every phase. Violating them is a phase failure even when the exit criteria are technically met.
- Self-credentialing first. No spec hard-codes a username, password, or token. Auth flows mint test users at runtime.
- One commit per landed deliverable. Phases commit per-spec, not per-phase, so a partial run can be safely resumed.
- No project-specific vocabulary in shared docs. Journey names use generic web-UI clustering vocabulary (browse / transact / account / mutate / errors / auth), not domain-specific tokens.
- Return-shape conformance. Every subagent dispatch you run must
return a schema-conformant envelope (see
schemas/subagent-returns/).
Resuming a partial run
If onboarding was interrupted, find the latest greenlit phase from
tests/e2e/docs/journey-map-coverage.md and the commit history, then
restart from the next phase. Each phase is independently runnable as
long as its predecessor's deliverables exist on disk.
For automated runs, the external CLI driver typically exposes a
--resume flag that consumes the same tests/e2e/docs/onboarding-status.json
ledger this skill maintains.
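Where the ledger is available, the resume point could be derived from it along these lines. The `PhaseRow` shape is an assumption (the schema file defines the real rows), the "pending" and "in-progress" statuses are illustrative additions, and `resumePoint` is a hypothetical helper rather than the driver's actual resume logic.

```typescript
type PhaseStatus = "pending" | "in-progress" | "completed" | "skipped" | "blocked";

// Hypothetical ledger row: one per phase that has been touched.
interface PhaseRow {
  phase: number; // 1..8
  status: PhaseStatus;
}

const PHASE_COUNT = 8;

// Returns the first phase that is neither completed nor skipped,
// or undefined when all eight phases are closed.
function resumePoint(rows: PhaseRow[]): number | undefined {
  for (let phase = 1; phase <= PHASE_COUNT; phase++) {
    const row = rows.filter((r) => r.phase === phase).pop(); // latest row wins
    const status: PhaseStatus = row ? row.status : "pending";
    if (status !== "completed" && status !== "skipped") return phase;
  }
  return undefined;
}
```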