name: om-integration-tests
description: Run and create QA integration tests (Playwright TypeScript), including executing the full suite, converting optional markdown scenarios, and generating new tests from specs or feature descriptions. Use when the user says "run integration tests", "test this feature", "create test for", "convert test case", "run QA tests", or "integration test".
Integration Tests Skill
This skill generates executable Playwright tests in module-local __integration__ directories (for example packages/core/src/modules/sales/__integration__/TC-SALES-*.spec.ts) by exploring the running application. It also covers running existing integration tests after feature/bug implementation and reporting failures with artifact-based diagnosis. It optionally produces a markdown scenario (.ai/qa/scenarios/TC-*.md) for documentation — the scenario is not required.
Quick Reference
| Action | Command |
|---|---|
| Run all tests | yarn test:integration |
| Run single test | npx playwright test --config .ai/qa/tests/playwright.config.ts <path> |
| Run in ephemeral containers | yarn test:integration:ephemeral |
| Run interactive ephemeral mode | yarn test:integration:ephemeral:interactive |
| Start ephemeral app only (for MCP exploration, tests development, and debugging) | yarn test:integration:ephemeral:start |
| Run standalone create-app integration parity from monorepo | yarn test:create-app:integration |
| View report | yarn test:integration:report |
| Test files location | <module>/__integration__/TC-XXX.spec.ts |
| Scenario sources (optional) | .ai/qa/scenarios/TC-XXX-*.md |
| Reusable env state file | .ai/qa/ephemeral-env.json |
Runtime Policy
Default QA runtime policy:
- Keep global settings in `.ai/qa/tests/playwright.config.ts`:
  - `timeout: 10_000`
  - `expect.timeout: 10_000`
  - `retries: 1`
- Do not add per-test timeout or retry overrides in `.spec.ts` files (`test.setTimeout`, `test.describe.configure({ retries })`, `test.retry`).
Debug/development policy (fail fast while authoring/fixing tests):
- Override retries at command level with `--retries=0`.
- Do not edit global config just to debug a single test.
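The no-override rule can be checked mechanically before a spec is committed. The sketch below is a hypothetical helper (`findRuntimeOverrides` is not part of the repo or of Playwright) that scans spec source text for the override calls named in the policy. The regular expressions are an approximation and not a parser, so treat a hit as a prompt for review.

```typescript
// Hypothetical helper: findRuntimeOverrides is an illustration, not a repo or Playwright API.
// It flags the per-test timeout/retry overrides that the runtime policy forbids in .spec.ts files.
const FORBIDDEN_OVERRIDES: { label: string; pattern: RegExp }[] = [
  { label: 'test.setTimeout', pattern: /\btest\.setTimeout\s*\(/ },
  {
    label: 'test.describe.configure({ retries })',
    pattern: /\btest\.describe\.configure\s*\(\s*\{[^}]*\bretries\b/,
  },
  { label: 'test.retry', pattern: /\btest\.retry\b/ },
]

function findRuntimeOverrides(specSource: string): string[] {
  const found: string[] = []
  for (const rule of FORBIDDEN_OVERRIDES) {
    // Patterns have no `g` flag, so `test` keeps no state between calls.
    if (rule.pattern.test(specSource)) found.push(rule.label)
  }
  return found
}
```

An empty result means none of the listed overrides were detected. `test.describe.configure({ mode: 'serial' })` is not flagged because it does not set `retries`.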
Rendering and Performance Gates
When a feature touches Next.js routes, generated frontend, Client Islands, shared providers, loading/error boundaries, or heavy widgets, plan tests beyond CRUD correctness:
- verify the server-rendered shell loads before client-only interaction is required,
- exercise each changed Client Island interaction (table/form/dialog/editor/calendar/graph),
- cover loading and error boundaries for changed routes,
- include accessibility assertions for labels, roles, focus, keyboard submit/cancel, and icon-only buttons,
- add a regression E2E for critical flows,
- record a smoke performance signal when feasible (cold load timing, Web Vitals/Lighthouse, or process/RSS note from the agreed profiling script).
If performance evidence is not feasible in the environment, state the blocker and the exact command/check that should be run before merge.
Workflow
Phase 1 — Identify What to Test
Determine the feature scope from one of these sources (in priority order):
- Spec file: If a spec is referenced or was just implemented, read it from `.ai/specs/*.md` or `.ai/specs/enterprise/*.md`. Prefer the new `{YYYY-MM-DD}-{slug}.md` filenames, but tolerate legacy numbered names while the repo is being normalized. Extract testable scenarios from the API Contracts, UI/UX, and Data Models sections.
- User description: If the user describes a feature ("test the company creation flow"), map it to the relevant module and pages.
- Recent changes: If triggered after implementation, use `git diff` or recent commits to identify changed endpoints, pages, and components.
For each feature, identify:
- Which category it belongs to (AUTH, CAT, CRM, SALES, ADMIN, INT, API-*)
- Whether it's a UI test or API test
- The priority (High for CRUD operations, Medium for settings/config, Low for edge cases)
- The prerequisite role (superadmin, admin, or employee)
Phase 2 — Find the Next TC Number
List existing test cases in the target category to determine the next sequential number:
ls .ai/qa/scenarios/TC-{CATEGORY}-*.md 2>/dev/null | sort | tail -1
find apps packages -type f -path "*/__integration__/*" -name "TC-{CATEGORY}-*.spec.ts" 2>/dev/null | sort | tail -1
Use the highest number found across both listings, then increment it by one. For example, if the last scenario is TC-CRM-011 but the last test is TC-CRM-013, use TC-CRM-014.
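The numbering rule can be sketched as a small pure function. `nextTestCaseId` is a hypothetical helper, not a repo utility. It assumes the file names from both listings are passed in as one array and that numbers are zero-padded to three digits, as in the examples above.

```typescript
// Hypothetical helper: nextTestCaseId is an illustration, not a repo utility.
// `fileNames` is the combined output of both listings (scenario files and spec files).
function nextTestCaseId(category: string, fileNames: string[]): string {
  const pattern = new RegExp('^TC-' + category + '-(\\d+)')
  let highest = 0
  for (const fileName of fileNames) {
    // Match on the base name so full paths from `find` work too.
    const baseName = fileName.slice(fileName.lastIndexOf('/') + 1)
    const match = pattern.exec(baseName)
    if (match) highest = Math.max(highest, parseInt(match[1] || '0', 10))
  }
  let digits = String(highest + 1)
  // Assumption: IDs use the three-digit TC-XXX form seen in the examples.
  while (digits.length < 3) digits = '0' + digits
  return 'TC-' + category + '-' + digits
}
```

With a scenario named `TC-CRM-011-*.md` and a spec named `TC-CRM-013.spec.ts` in the input, the function returns `TC-CRM-014`, matching the example in the text. An empty input yields `TC-CRM-001`.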
Phase 3 — Reuse Existing Ephemeral Environment First
Before starting any new ephemeral app, read .ai/qa/ephemeral-env.json.
- If it exists and contains `status: running`, use `base_url` from that file.
- If it does not exist (or cannot be reused), start:
yarn test:integration:ephemeral:start
Default ephemeral app port is 5001 when available; fallback port is recorded in .ai/qa/ephemeral-env.json.
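The reuse decision can be sketched as follows. Only the `status` and `base_url` fields are taken from the description above; the rest of the state file's shape is unknown, and `reusableBaseUrl` is a hypothetical name. The caller is assumed to read the file and pass its contents, or `null` when the file is absent.

```typescript
// Only `status` and `base_url` are named in this document; other fields are not modeled here.
interface EphemeralEnvState {
  status?: string
  base_url?: string
}

// Hypothetical helper: returns the reusable base URL, or null when a new
// ephemeral app has to be started with `yarn test:integration:ephemeral:start`.
function reusableBaseUrl(envFileContents: string | null): string | null {
  if (envFileContents === null) return null // state file does not exist
  let state: EphemeralEnvState
  try {
    state = JSON.parse(envFileContents) as EphemeralEnvState
  } catch {
    return null // unreadable file: treat it as not reusable
  }
  if (state === null || typeof state !== 'object') return null
  if (state.status !== 'running') return null
  if (typeof state.base_url !== 'string' || state.base_url === '') return null
  return state.base_url
}
```

A `null` result corresponds to the second bullet above: start a fresh ephemeral app and read the state file again afterwards.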
Phase 4 — Explore the Feature via Playwright MCP
Use the active base URL from .ai/qa/ephemeral-env.json for MCP navigation, then discover the actual UI:
- Login with the appropriate role
- Navigate to the relevant page
- Take snapshots to identify exact element labels, button text, form fields
- Walk through the happy path to discover the actual flow
- Note any validation messages, success states, redirects
For API tests, use cURL to discover:
- The exact endpoint path and method
- Required request headers and body shape
- The actual response structure
- Error responses for invalid inputs
Phase 5 — Write the Playwright Test
Create the test in the module where the behavior lives:
- Core/shared module: `packages/<package>/src/modules/<module>/__integration__/TC-{CATEGORY}-{XXX}.spec.ts`
- App-specific module: `apps/mercato/src/modules/<module>/__integration__/TC-{CATEGORY}-{XXX}.spec.ts`
- Create-app template module: `packages/create-app/template/src/modules/<module>/__integration__/TC-{CATEGORY}-{XXX}.spec.ts`
- Enterprise overlay test: `packages/enterprise/modules/<module>/__integration__/TC-{CATEGORY}-{XXX}.spec.ts`
  - Only create enterprise overlay tests as additions to modules that already have base module tests.
  - Do not add dependencies from base code to the enterprise package.
- Subfolders inside `__integration__` are supported.
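The four locations above can be condensed into one lookup. This is a sketch only: `integrationSpecPath` and the `TestLocation` type are hypothetical names, and the paths are copied from the list above.

```typescript
// Hypothetical helper: integrationSpecPath is an illustration, not a repo utility.
type TestLocation =
  | { kind: 'core'; packageName: string; moduleName: string }
  | { kind: 'app'; moduleName: string }
  | { kind: 'create-app-template'; moduleName: string }
  | { kind: 'enterprise'; moduleName: string }

function integrationSpecPath(location: TestLocation, testId: string): string {
  const tail = location.moduleName + '/__integration__/' + testId + '.spec.ts'
  switch (location.kind) {
    case 'core':
      return 'packages/' + location.packageName + '/src/modules/' + tail
    case 'app':
      return 'apps/mercato/src/modules/' + tail
    case 'create-app-template':
      return 'packages/create-app/template/src/modules/' + tail
    case 'enterprise':
      // Overlay only: the base module is expected to have its own tests already.
      return 'packages/enterprise/modules/' + tail
  }
}
```

For example, `integrationSpecPath({ kind: 'core', packageName: 'core', moduleName: 'sales' }, 'TC-SALES-001')` yields `packages/core/src/modules/sales/__integration__/TC-SALES-001.spec.ts`.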
Use the locators discovered in Phase 4 (not guessed). If a scenario was written, reference it in a comment. Do not hardcode entity IDs in routes, payloads, or assertions. Resolve entities dynamically at runtime by creating fixtures through API/UI steps or by selecting existing rows via stable UI text/role locators.
Metadata for conditional test enablement:
- Helpers:
  - Put shared helpers in `packages/core/src/helpers/integration/` (importable as `@open-mercato/core/helpers/integration/*`).
  - Module-local `__integration__/helpers/` files should re-export central helpers where possible.
  - Standalone app developers: import helpers from `@open-mercato/core/helpers/integration/*` (included in the npm package).
- Folder-level metadata:
  - Add `meta.ts` or `index.ts` anywhere under `__integration__/`.
  - Supported module keys: `dependsOnModules`, `requiredModules`, `requiresModules`.
  - Supported env keys: `requiredEnvVars`, `requiresEnvVars`, `requiredAnyEnvVars`, `requiresAnyEnvVars`.
  - Example:
export const integrationMeta = {
description: 'Billing integration coverage',
dependsOnModules: ['sales', 'currencies'],
}
- Per-test metadata:
  - Add metadata directly inside the `.spec.ts` file using the same keys, or create a sibling file `TC-XXX.meta.ts`.
  - Example sibling file:
export const integrationMeta = {
dependsOnModules: ['catalog'],
}
- Evaluation model:
  - Dependencies inherit from the `__integration__/` root through nested subfolders, and then per-test metadata is applied.
  - If any required module is not enabled in the app, matching tests are skipped automatically (excluded from discovery/run).
  - If any `requiredEnvVars` entry is missing or blank, matching tests are skipped automatically (excluded from discovery/run).
  - If `requiredAnyEnvVars` is set and none of the listed env vars is configured, matching tests are skipped automatically.
  - Only env-gate tests that truly require external services. If an AI/LLM flow can be stubbed or can skip only the live model-backed subcase, keep the test runnable without secrets.
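The skip rules can be read as the function below. It is a sketch of the rules as described in this section, not the runner's actual code. It assumes the three module keys are interchangeable aliases, likewise the paired env keys, and that "configured" means set and non-blank. `skipReason` and `isConfigured` are hypothetical names.

```typescript
// Keys are the ones listed in this section; the runner's real types are not shown here.
interface IntegrationMeta {
  description?: string
  dependsOnModules?: string[]
  requiredModules?: string[]
  requiresModules?: string[]
  requiredEnvVars?: string[]
  requiresEnvVars?: string[]
  requiredAnyEnvVars?: string[]
  requiresAnyEnvVars?: string[]
}

type EnvVars = { [name: string]: string | undefined }

// Assumption: a variable counts as configured when it is set and not blank.
function isConfigured(env: EnvVars, name: string): boolean {
  const value = env[name]
  return typeof value === 'string' && value.trim() !== ''
}

// Hypothetical helper: returns a skip reason, or null when the tests should run.
function skipReason(meta: IntegrationMeta, enabledModules: string[], env: EnvVars): string | null {
  // Assumption: the three module keys are aliases and are evaluated together.
  const modules = (meta.dependsOnModules || []).concat(meta.requiredModules || [], meta.requiresModules || [])
  for (const moduleName of modules) {
    if (enabledModules.indexOf(moduleName) === -1) return 'module not enabled: ' + moduleName
  }
  // Every entry of the "required" env keys must be present and non-blank.
  const allOf = (meta.requiredEnvVars || []).concat(meta.requiresEnvVars || [])
  for (const name of allOf) {
    if (!isConfigured(env, name)) return 'env var missing or blank: ' + name
  }
  // At least one entry of the "any" env keys must be configured.
  const anyOf = (meta.requiredAnyEnvVars || []).concat(meta.requiresAnyEnvVars || [])
  if (anyOf.length > 0 && !anyOf.some((name) => isConfigured(env, name))) {
    return 'none of the env vars is configured: ' + anyOf.join(', ')
  }
  return null
}
```

Inheritance from parent folders is left out. Under the model above, folder-level metadata would be merged down to each test before a check like this runs.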
Phase 6 — Optionally Write the Markdown Scenario
If documentation is desired, create .ai/qa/scenarios/TC-{CATEGORY}-{XXX}-{slug}.md using the template:
# Test Scenario [NUMBER]: [TITLE]
## Test ID
TC-{CATEGORY}-{XXX}
## Category
{Category Name}
## Priority
{High/Medium/Low}
## Type
{UI Test / API Test}
## Description
{What this test validates — derived from spec or feature description}
## Prerequisites
- User is logged in as {role}
- {Other prerequisites from spec}
## Test Steps
| Step | Action | Expected Result |
|------|--------|-----------------|
| 1 | {Discovered action} | {Observed result} |
| 2 | {Discovered action} | {Observed result} |
## Expected Results
- {Derived from spec's API Contracts or UI/UX section}
## Edge Cases / Error Scenarios
- {Derived from spec's Risks section or discovered during exploration}
Fill steps with actual actions and results observed during Phase 4, not hypothetical ones.
This step is optional — skip it if the user only wants the executable test.
Phase 7 — Verify
Run the new test to confirm it passes:
npx playwright test --config .ai/qa/tests/playwright.config.ts <path-to-test-file>
When developing/debugging the test, run fail-fast with no retries:
npx playwright test --config .ai/qa/tests/playwright.config.ts <path-to-test-file> --retries=0
If it fails, fix it. Do not leave broken tests.
Create-App / Standalone Parity
When the change affects packages/create-app, standalone scaffolding, or CLI behavior consumed by scaffolded apps, prefer the monorepo parity command:
yarn test:create-app:integration
What it does:
- builds the local monorepo package artifacts
- scaffolds a fresh temporary standalone app with the local `create-mercato-app`
- installs local packed `@open-mercato/*` tarballs into that app
- runs the standalone app's own ephemeral integration command via the local CLI
Use this instead of plain yarn test:integration when the risk is specifically "works in monorepo, breaks in scaffolded standalone app".
Shared — Failure Analysis and User Reporting (Mandatory on Failures)
After any failed test run (single test or suite), analyze failure artifacts before responding. This shared section applies both when:
- writing/updating tests
- only running existing tests after implementing features or bug fixes
1. Parse terminal output to capture the failing test names and first error stack/assertion.
2. Inspect Playwright artifacts for each failed test from `test-results/` and the HTML report:
   - `error-context.md`
   - screenshots (expected/actual/diff where available)
   - trace/video attachments if present
3. Classify each failure into one primary reason:
   - Product regression / real app bug
   - Test issue (stale locator, brittle assertion, bad fixture/cleanup)
   - Environment / data issue (service unavailable, auth/session drift, shared-state collision)
4. Decide ownership per failing test:
   - `User/Product team` when behavior looks like a real regression or requirement mismatch
   - `Agent/QA` when failure is test-code quality, selector drift, or fixture instability
   - `Shared` when both product behavior and test assumptions need adjustment
5. Respond with a table (required format) before any optional narrative:
| Failing test | Evidence used | Reasoning (why it failed) | Suggested owner | Next action |
|---|---|---|---|---|
| `<path>::<test name>` | `stdout` + screenshot + `error-context` | Concise technical diagnosis | `User/Product team` / `Agent/QA` / `Shared` | Concrete fix recommendation |
Do not provide a generic "tests failed" summary without per-test reasoning.
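The required table can be produced from structured findings. The sketch below assumes a hypothetical `FailureReport` shape and `renderFailureTable` helper. Only the column headings and the three owner values come from this section.

```typescript
// Owner values are the three named in this section.
type FailureOwner = 'User/Product team' | 'Agent/QA' | 'Shared'

// Hypothetical record shape for one failing test.
interface FailureReport {
  testPath: string
  testName: string
  evidence: string[]
  reasoning: string
  owner: FailureOwner
  nextAction: string
}

// Hypothetical helper: renders the required per-test table as markdown.
function renderFailureTable(failures: FailureReport[]): string {
  // Escape pipes and flatten newlines so a cell cannot break the table layout.
  const escapeCell = (text: string): string => text.replace(/\|/g, '\\|').replace(/\r?\n/g, ' ')
  const header = [
    '| Failing test | Evidence used | Reasoning (why it failed) | Suggested owner | Next action |',
    '|---|---|---|---|---|',
  ]
  const rows = failures.map((failure) => {
    const cells = [
      failure.testPath + '::' + failure.testName,
      failure.evidence.join(' + '),
      failure.reasoning,
      failure.owner,
      failure.nextAction,
    ]
    return '| ' + cells.map((cell) => escapeCell(cell)).join(' | ') + ' |'
  })
  return header.concat(rows).join('\n')
}
```

Each failing test becomes one row, which keeps the report in line with the per-test reasoning requirement.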
Running-Only Mode (No New Test Authoring)
If the user asks only to run integration tests (full suite/category/single file), skip authoring phases and execute the requested run directly.
If the run fails, apply the shared failure-analysis section above.
Rules
- MUST explore the running app before writing — never guess selectors or flows
- MUST check `.ai/qa/ephemeral-env.json` first and reuse existing environment when available
- MUST use the active URL from `.ai/qa/ephemeral-env.json` (never assume `localhost:3000`)
- MUST NOT hardcode record IDs (UUIDs/PKs) in generated tests
- MUST discover or create test entities at runtime, then navigate using discovered links/URLs
- MUST NOT rely on seeded/demo data for prerequisites
- MUST create required fixtures per test (prefer API fixture setup for stability)
- MUST clean up any data created by the test in `finally`/teardown
- MUST keep tests deterministic and isolated from run order or retries
- MUST NOT add per-test timeout/retry overrides in `.spec.ts`; rely on global Playwright config (`timeout: 10s`, `expect.timeout: 10s`, `retries: 1`)
- MUST create the `.spec.ts`; the markdown scenario is optional
- MUST use actual locators from Playwright MCP snapshots (`getByRole`, `getByLabel`, `getByText`)
- MUST verify the test passes before finishing
- MUST analyze failed test artifacts (`stdout`, `error-context.md`, screenshots/report) before reporting failures
- MUST report failures in a per-test table that includes reason, evidence, and suggested owner
- MUST apply the same failure-analysis and table-reporting rules when only running existing tests after implementation work
- MUST place executable tests in module-local `__integration__` directories; never add `.spec.ts` files under `.ai/qa/tests/`
- MUST keep module-specific helper utilities next to tests under `<module>/__integration__/helpers/`; for shared/cross-module helpers, import from `@open-mercato/core/helpers/integration/*`
- MUST treat `packages/enterprise/modules/<module>/__integration__/` as an optional overlay and keep base code independent from enterprise
- MUST use `meta.ts` or `index.ts` dependency metadata for module-gated folders and per-test `.meta.ts` (or in-file metadata) for individual gating
- When deriving from a spec, focus on the happy path first, then add edge cases as separate test cases if they warrant it
- Each test file covers one scenario — create multiple files for multiple scenarios
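The fixture rules (create per test, no hardcoded IDs, clean up in `finally`) can be captured in one wrapper. This is a minimal sketch: `withFixture` is a hypothetical helper, not a repo or Playwright API, and `create` and `remove` stand in for whatever API calls set up and delete the entity.

```typescript
// Hypothetical helper: withFixture is an illustration, not a repo or Playwright API.
// `create` and `remove` stand in for API fixture calls made by the test.
async function withFixture<T>(
  create: () => Promise<T>,
  remove: (fixture: T) => Promise<void>,
  run: (fixture: T) => Promise<void>,
): Promise<void> {
  // The entity is created at runtime, so its ID is never hardcoded in the test.
  const fixture = await create()
  try {
    await run(fixture)
  } finally {
    // Cleanup runs even when an assertion inside `run` throws.
    await remove(fixture)
  }
}
```

Inside a test body, `run` would hold the navigation and assertions and receive the created entity. A failing assertion still propagates after cleanup, so the test is reported as failed without leaving data behind for retries.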
Deriving Scenarios from a Spec
When reading a spec, extract test scenarios from these sections:
| Spec Section | Generates |
|---|---|
| API Contracts — each endpoint | One API test per endpoint (CRUD) |
| UI/UX — each user flow | One UI test per flow |
| Edge Cases / Error Scenarios | One test per significant error path |
| Risks & Impact Review | Regression tests for documented failure modes |
A typical spec produces 3-8 test cases. Prioritize:
- High: CRUD happy paths, authentication, authorization
- Medium: Validation errors, edge cases with business impact
- Low: Cosmetic, minor UX edge cases
Example
Given SPEC-017 (Version History Panel), the skill would produce:
- `packages/core/src/modules/admin/__integration__/TC-ADMIN-011.spec.ts` (UI: open history panel on an entity)
- `packages/core/src/modules/admin/__integration__/TC-API-AUD-007.spec.ts` (API: fetch audit logs for entity)
- `packages/core/src/modules/admin/__integration__/TC-ADMIN-012.spec.ts` (UI: restore a previous version)
- Optionally: matching `.ai/qa/scenarios/TC-ADMIN-011-*.md` files for documentation
Running Existing Tests
# Run all integration tests headlessly (zero token cost)
yarn test:integration
# Run tests matching a module/category path fragment
npx playwright test --config .ai/qa/tests/playwright.config.ts sales
# Run a single test
npx playwright test --config .ai/qa/tests/playwright.config.ts packages/core/src/modules/auth/__integration__/TC-AUTH-001.spec.ts
# Run fail-fast in local debugging
npx playwright test --config .ai/qa/tests/playwright.config.ts packages/core/src/modules/auth/__integration__/TC-AUTH-001.spec.ts --retries=0
# Run in ephemeral containers (Docker required)
yarn test:integration:ephemeral
# Preferred for short local loops (reused ephemeral app + DB)
yarn test:integration:ephemeral:interactive
Batch Conversion
When converting multiple scenarios at once:
- List unconverted scenarios by comparing `.ai/qa/scenarios/` vs discovered `**/__integration__/**/*.spec.ts`
- Convert one category at a time
- Run the full suite after each category to catch cross-test issues
- Report summary: total converted, passed, failed
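The comparison in the first step can be sketched as a set difference on TC IDs. `testCaseId` and `unconvertedScenarios` are hypothetical helpers. They assume IDs follow the `TC-<CATEGORY>-<number>` pattern used throughout this document and that both file lists have already been collected.

```typescript
// Hypothetical helper: extracts an ID such as TC-CRM-011 or TC-API-AUD-007 from a file path.
function testCaseId(filePath: string): string | null {
  const baseName = filePath.slice(filePath.lastIndexOf('/') + 1)
  const match = /^(TC-[A-Z]+(?:-[A-Z]+)*-\d+)/.exec(baseName)
  return match ? match[1] || null : null
}

// Hypothetical helper: returns scenario files whose TC ID has no matching spec file.
function unconvertedScenarios(scenarioFiles: string[], specFiles: string[]): string[] {
  const converted: { [id: string]: boolean } = {}
  for (const specFile of specFiles) {
    const id = testCaseId(specFile)
    if (id) converted[id] = true
  }
  return scenarioFiles.filter((scenarioFile) => {
    const id = testCaseId(scenarioFile)
    // Files without a recognizable TC ID are ignored and not reported.
    return id !== null && converted[id] !== true
  })
}
```

The result is the work list for the remaining steps and can be grouped by category before conversion.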