e2e-guide - SKILL.md Agent Skill

name: e2e-guide
description: Build and refactor Playwright end-to-end tests for SETI using only real user flows and real backend behavior. Use when creating or fixing E2E cases for auth, lobby, room, and game interaction, especially when replacing shortcut tests (token injection, debug endpoints, direct WS driving) with production-like browser paths.

E2E Guide (No Mock, No Bypass)

Implement E2E as a real user journey. Do not simulate server behavior, do not inject auth state, and do not use debug-only control surfaces.

Hard Rules

Never mock network or websocket behavior.
Never inject auth state into localStorage/sessionStorage.
Never call debug endpoints (/debug/*) from E2E specs.
Never drive gameplay by sending raw websocket actions from tests when a real UI action exists.
If a step fails, keep the failure visible. Do not replace the step with a shortcut.

Coverage Discipline

Passing related unit tests or broad E2E smoke tests is not proof that a reported behavior is covered.
For every bug fix or regression report, first write or identify an assertion that fails on the old behavior and passes after the fix.
Do not count login/lobby/start-game/panel-visible smoke tests as coverage for specific card semantics, rendering modes, or conditional UI sections.
For UI mode regressions, enumerate the relevant matrix before asserting:
- viewer/actor perspective, such as human player vs synthetic rival
- rendering mode, such as text mode vs image mode
- action/card kind, such as scan, probe, tech, or mission
- expected visible outcome
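One way to make the matrix explicit is to generate it rather than list cases by hand. The sketch below is only an illustration: the literal values (`human`, `rival`, `text`, `image`, and the four card kinds) are taken from the examples above, while `enumerateMatrix` and `MatrixCase` are hypothetical names, not part of the SETI codebase. The expected visible outcome is deliberately left to the spec, since it depends on the feature under test.

```typescript
// Hypothetical dimensions, copied from the examples in this guide.
const PERSPECTIVES = ["human", "rival"] as const;
const MODES = ["text", "image"] as const;
const KINDS = ["scan", "probe", "tech", "mission"] as const;

type Perspective = (typeof PERSPECTIVES)[number];
type RenderMode = (typeof MODES)[number];
type CardKind = (typeof KINDS)[number];

interface MatrixCase {
  perspective: Perspective;
  mode: RenderMode;
  kind: CardKind;
  // Human-readable title, suitable as a test name.
  title: string;
}

// Build the full cartesian product so no combination is silently skipped.
function enumerateMatrix(): MatrixCase[] {
  const cases: MatrixCase[] = [];
  for (const perspective of PERSPECTIVES) {
    for (const mode of MODES) {
      for (const kind of KINDS) {
        cases.push({
          perspective,
          mode,
          kind,
          title: `${kind} in ${mode} mode as ${perspective}`,
        });
      }
    }
  }
  return cases;
}
```

In a spec, each generated case could register one test whose body looks up the expected visible outcome for that combination, so a missing expectation shows up as a failing case instead of an untested gap.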
When server/common data drives card, objective, reward, rule, or action rendering, verify both text mode and image mode unless the feature is explicitly single-mode.
In text mode, assert that visible labels/effects are derived from the same server/common ID and definition being displayed. In image mode, assert that the rendered asset path, alt text, or test id is derived from that same server-projected ID.
If a smoke test cannot pin a specific ID without becoming brittle, assert consistency between the visible ID and the rendered text/image output. Put deterministic ID assertions in a focused deterministic spec.
For conditional UI hiding, assert both sides of the condition in the same test or suite: the section that should remain visible and the section that should be hidden.
For rule rewards, assert user-observable semantics, not only that the engine step completed. For example, an any-card reward should expose the allowed sources, and the selected result should be visible in hand or in the relevant row.
Before marking verification complete, ask: would this test fail if the reported bug still existed? If the answer is no, add a more specific assertion.

Allowed Setup

Use Playwright webServer to start real client/server.
Use unique test users and room names per test run.
Use helper functions for repeated UI actions (register/login/create/join/start), but helpers must still perform real UI interactions.
Use API-only tests in dedicated API spec blocks when the test target is explicitly API behavior.
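Unique users and room names can come from a small helper. This is a minimal sketch: `uniqueName` and the `e2e-host` / `e2e-room` prefixes are hypothetical, and the naming scheme assumes the server accepts lowercase letters, digits, and hyphens in usernames and room names.

```typescript
// Counter guarantees uniqueness inside one process.
let nameCounter = 0;

// Hypothetical helper: returns e.g. "e2e-host-lx3k9a2b-0-4f7q1z".
function uniqueName(prefix: string): string {
  // Timestamp separates runs from each other.
  const time = Date.now().toString(36);
  // Random part separates parallel workers, which run as separate processes.
  const rand = Math.random().toString(36).slice(2, 8).padEnd(6, "0");
  const name = `${prefix}-${time}-${nameCounter}-${rand}`;
  nameCounter += 1;
  return name;
}
```

A test would call `uniqueName("e2e-host")` and `uniqueName("e2e-room")` once at the start and then type those values into the real register and create-room forms, so reruns never collide with data left behind by an earlier run.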

Required Test Structure

Readiness:
- Wait for the server's readiness endpoint to respond before running workflow tests.
Auth:
- Register through the UI (/auth -> Register tab -> submit).
- Log in through the UI (/auth -> login form -> submit).
Lobby/Room:
- Create room from lobby UI dialog.
- Join room using a second browser context via room UI.
- Start game from room UI as host.
Game:
- Enter game via room UI or redirect after start.
- Interact through visible UI controls (action-menu-*, tabs, inputs).
- Verify from a second browser context that the state change is visible there.
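The readiness step at the top of this structure could be sketched as a small polling helper. Everything named here is an assumption for illustration: `waitForReady` is hypothetical, and the health URL in the usage comment is a placeholder, since this guide does not name the actual readiness endpoint. Playwright's `webServer` `url` option also waits for a URL to respond before tests start, so a helper like this is only needed when a spec has to re-check readiness itself.

```typescript
// A probe resolves to true once the real server reports ready.
type ReadinessProbe = () => Promise<boolean>;

// Poll the probe until it succeeds or the deadline passes.
async function waitForReady(
  probe: ReadinessProbe,
  timeoutMs = 30000,
  intervalMs = 250,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  let lastError: unknown;
  for (;;) {
    try {
      if (await probe()) return;
    } catch (error) {
      // Connection errors are expected while the server is still booting.
      lastError = error;
    }
    if (Date.now() >= deadline) {
      const reason = String(lastError ?? "probe returned false");
      throw new Error(`Server not ready after ${timeoutMs} ms: ${reason}`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage sketch (placeholder URL, not the real endpoint):
//   await waitForReady(async () => (await fetch("http://localhost:3000/health")).ok);
```

The probe talks to the real server, so this stays within the no-mock rule; it only delays the journey until the backend is actually reachable.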

Assertion Guidance

Prefer observable user outcomes:
- URL changes (/auth, /lobby, /room/:id, /game/:id)
- Visible UI sections (bottom-dashboard, bottom-actions, event-log)
- Action availability/disabled states.
Avoid asserting engine internals in smoke tests:
- No hardcoded card IDs, probe coordinates, or deterministic internals.
For deterministic behavior tests:
- Keep them in specs separate from smoke tests. Debug endpoints stay off-limits there too; use them only in an explicitly designated debug suite.
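For the URL assertions listed above, route patterns can be kept in one place. The following is a sketch under assumptions: `ROUTES` and `routeId` are hypothetical names, and the patterns assume an ID is a single path segment, which this guide does not specify. Playwright's `page.waitForURL` and `expect(page).toHaveURL` both accept a regular expression, so patterns like these could be passed to them directly when the URL has no query string.

```typescript
// Hypothetical patterns for the routes named in this guide.
const ROUTES = {
  auth: /\/auth\/?$/,
  lobby: /\/lobby\/?$/,
  room: /\/room\/([^/?#]+)\/?$/,
  game: /\/game\/([^/?#]+)\/?$/,
} as const;

// Extract the :id segment from a room or game URL, or null if it does not match.
function routeId(url: string, route: "room" | "game"): string | null {
  // Match against the pathname only, so query strings and hashes are ignored.
  const path = new URL(url).pathname;
  const match = ROUTES[route].exec(path);
  return match?.[1] ?? null;
}
```

Comparing `routeId(hostPage.url(), "room")` with `routeId(guestPage.url(), "room")` asserts that both contexts landed in the same room without hardcoding any ID.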

Anti-Flake Guidance

Prefer event-driven waits:
- waitForResponse, waitForURL, expect(locator).toBeVisible().
Avoid fixed sleeps:
- Do not use waitForTimeout unless there is no reliable observable signal.
Keep one test focused on one journey or behavior boundary.

Refactor Checklist (Shortcut -> Real Flow)

Remove imports/usages of:
- injectAuth
- createDebugSession, debugGetState, debugMainAction, debugFreeAction, debugInput, debugGetPendingInput
- direct WsTestClient action-driving for UI-covered behavior
Replace with:
- UI register/login helpers
- Multi-context host/guest room/game interactions
Keep failure explicit:
- If the UI cannot complete a step, assert and fail at that exact step.
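The removal list above can be checked mechanically before a manual review. This is a rough sketch, not an existing tool: `findShortcuts` and `ShortcutFinding` are hypothetical, and the scan is a plain text match, so it also flags `WsTestClient` uses that may be legitimate (for example in a dedicated API spec) and needs a human to decide.

```typescript
// Identifiers named in the checklist above.
const FORBIDDEN_IDENTIFIERS = [
  "injectAuth",
  "createDebugSession",
  "debugGetState",
  "debugMainAction",
  "debugFreeAction",
  "debugInput",
  "debugGetPendingInput",
  "WsTestClient",
] as const;

interface ShortcutFinding {
  identifier: string;
  // 1-based line number within the scanned source.
  line: number;
}

// Report every line of a spec file that mentions a shortcut helper.
function findShortcuts(source: string): ShortcutFinding[] {
  const findings: ShortcutFinding[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const identifier of FORBIDDEN_IDENTIFIERS) {
      // Word boundaries avoid matching longer, unrelated identifiers.
      if (new RegExp(`\\b${identifier}\\b`).test(text)) {
        findings.push({ identifier, line: index + 1 });
      }
    }
  });
  return findings;
}
```

Running this over each spec file's contents gives a worklist of lines to replace with UI register/login helpers or multi-context interactions.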