name: testing-strategy
description: "Use when planning tests for a bug fix, feature, or refactor: deciding what deserves a test, at which level, with what evidence. Covers test-scope decisions, test-level selection (unit / integration / contract / e2e), effort-to-risk matching, regression targeting, evidence quality, and failure-case coverage. Do NOT use for chasing a known failure (that is debugging), for pure doc writing (that is documentation), or for conceptual architecture discussion with no verification target (no dedicated skill; treat as strategy, not testing). Do NOT use for 'my existing test is failing — why?', 'write a testing-patterns guide for the contributor docs', or 'clean up this duplicated test setup across three files'."
license: MIT
compatibility: "Markdown, Git, any codebase"
allowed-tools: Read Grep Bash
metadata:
  relations: "{\"related\":[\"microcopy\",\"middleware-patterns\",\"debugging\",\"refactor\",\"test-coverage-strategy\",\"integration-test-design\",\"contract-testing\",\"e2e-test-design\"],\"suppresses\":[\"integration-test-design\",\"performance-engineering\"],\"verify_with\":[\"debugging\",\"contract-testing\",\"test-coverage-strategy\",\"integration-test-design\",\"e2e-test-design\"],\"depends_on\":[\"refactor\"]}"
  subject: quality-assurance
  scope: "Portable test-scope and test-level decision-making for software changes: choosing what behavior or regression target deserves verification, selecting unit, integration, contract, or end-to-end evidence by risk and coupling, and avoiding tests whose maintenance cost exceeds their failure signal. Excludes root-cause debugging of active failures, implementation mechanics for a chosen test type, documentation writing, and architecture discussion without a concrete verification target."
  public: "true"
  taxonomy_domain: quality/testing-strategy
  stability: experimental
  keywords: "[\"testing strategy\",\"what to test\",\"test level selection\",\"test scope\",\"effort vs risk\",\"regression target\",\"failure case coverage\",\"unit integration contract e2e\",\"test plan\",\"verification evidence\"]"
  triggers: "[\"testing-skill\"]"
  examples: "[\"do I need a unit test for this pure formatter or is integration enough?\",\"what's the right test level for a webhook handler that talks to Stripe?\",\"the feature passes manual QA — does it need an automated test?\",\"pin this regression so the same bug can't slip through again\"]"
  anti_examples: "[\"my existing test is failing — why?\",\"write a testing-patterns guide for the contributor docs\",\"clean up this duplicated test setup across three files\"]"
  grounding: "{\"subject_matter\":\"Test-scope and test-level selection for software changes\",\"grounding_mode\":\"universal\",\"truth_sources\":[\"https://martinfowler.com/articles/practical-test-pyramid.html\",\"https://testing.googleblog.com/2010/12/test-sizes.html\",\"https://playwright.dev/docs/best-practices\",\"https://docs.pact.io/getting_started/how_pact_works\"],\"failure_modes\":[\"test_level_selected_by_file_type_not_risk\",\"mocked_unit_test_hides_cross_boundary_regression\",\"end_to_end_suite_expands_until_slow_and_flaky\",\"coverage_percentage_replaces_behavior_target\",\"production_regression_pinned_at_wrong_level\"],\"evidence_priority\":\"equal\"}"
  mental_model: "Testing strategy is the choice of verification target, test level, and evidence quality for a specific change. Its primitives are the behavior or regression target, the risk of failure, the coupling boundary that could hide the bug, the cheapest test level that can observe the behavior honestly, the expected failure signal, and the maintenance cost the suite will carry afterward."
  purpose: "Test suites become expensive when they chase coverage percentage, duplicate low-risk paths, or mock away the boundary that can actually fail. This skill exists to spend verification effort where it can prevent real regressions, while avoiding tests that will never produce useful signal."
  concept_boundary: "This skill owns the decision about what behavior to test and at which level for a planned change or regression target. It does not own root-cause debugging of a failing test, detailed implementation patterns for one chosen test type, documentation writing, performance-load verification, or architecture strategy without a concrete verification target."
  analogy: "A test strategy is placing sensors in a system: each sensor should sit where a meaningful failure can be observed with the least noise and maintenance cost."
  misconception: "The common mistake is choosing tests by edited file type or coverage percentage. The better choice starts from the behavior that could regress, then selects the cheapest level that can observe that failure without mocking it away."
  skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
  skill_graph_project: Skill Graph
  skill_graph_canonical_skill: skills/quality-assurance/testing-strategy/SKILL.md
  skill_graph_export_description_projection: anti_examples
Concept of the skill
What it is: Testing strategy is the discipline of choosing which behavior needs verification, which test level can observe it honestly, and what evidence proves the change is protected.
Mental model: A strategy decision has six parts: the behavior or regression target, the risk of failure, the boundary where the failure would appear, the cheapest test level that still observes that boundary, the expected failure signal, and the future maintenance cost of keeping the test.
Why it exists: Test suites fail when they optimize for visible effort instead of useful signal. They add tests that never catch regressions, mock the only boundary that could fail, or expand end-to-end coverage until feedback becomes slow and flaky. This skill keeps verification effort tied to production risk.
What it is NOT: It is not debugging an active failing test, implementing integration-test fixtures, designing contract-test provider states, writing contributor documentation, performance-load testing, or architecture planning without a concrete verification target.
Adjacent concepts: Test coverage strategy, integration test design, contract testing, end-to-end test design, debugging, refactor verification, performance engineering, and code review.
One-line analogy: A test strategy is placing sensors in a system: each sensor should sit where a meaningful failure can be observed with the least noise and maintenance cost.
Common misconception: Coverage percentage or the edited file type does not decide the right test. The behavior that could regress decides the target, and the cheapest honest observation point decides the level.
Testing Strategy
Coverage
- Test scope: deciding what behavior actually needs a test, and what does not earn the maintenance cost
- Test level selection: choosing between unit, integration, contract, and end-to-end tests based on risk and coupling
- Effort-to-risk matching: investing verification effort where regressions are most likely and most damaging
- Regression targeting: writing tests that pin the specific behavior a change risks breaking, not generic coverage
- Evidence quality: preferring concrete, reproducible verification over assumed or manual checks
- Failure-case coverage: ensuring boundary conditions and error paths are tested, not only the happy path
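A minimal sketch of regression targeting and failure-case coverage, using Python's standard `unittest` module. Everything named here is hypothetical: `parse_discount_percent` and the "100% was rejected" incident are invented for illustration. The first test pins the exact behavior that regressed; the others cover the neighbouring boundaries and the error paths instead of piling on more happy-path cases.

```python
import unittest


def parse_discount_percent(raw: str) -> int:
    """Hypothetical helper: parse a discount such as "15%" into 15.

    Accepts whole numbers 0..100 inclusive; anything else raises ValueError.
    """
    text = raw.strip()
    if text.endswith("%"):
        text = text[:-1]
    if not text.isdigit():
        raise ValueError(f"not a whole-number percent: {raw!r}")
    value = int(text)
    if value > 100:
        raise ValueError(f"percent out of range: {raw!r}")
    return value


class ParseDiscountPercentTest(unittest.TestCase):
    # Regression target (hypothetical incident): "100%" was rejected by an
    # off-by-one upper bound. This test pins exactly that boundary.
    def test_upper_boundary_is_accepted(self):
        self.assertEqual(parse_discount_percent("100%"), 100)

    # Neighbouring boundaries, so a fix that overshoots is also caught.
    def test_just_past_upper_boundary_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_discount_percent("101%")

    def test_lower_boundary_is_accepted(self):
        self.assertEqual(parse_discount_percent("0%"), 0)

    # Failure paths, not only the happy path.
    def test_negative_and_malformed_input_is_rejected(self):
        for raw in ["-5%", "", "%", "ten%", "12.5%"]:
            with self.subTest(raw=raw):
                with self.assertRaises(ValueError):
                    parse_discount_percent(raw)
```

Run it with `python -m unittest <module>`. Because the regression test is named after the behavior it guards, a future failure points straight at what broke rather than at a generic coverage gap.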
Philosophy of the skill
Most test suites fail the effort-to-risk test: they exercise code that will never break and skip code that breaks in production. The correct target is the behavior that ships to users, not the code you happen to have written last. Coverage percentage is a proxy, and every proxy eventually gets gamed — the real signal is regressions caught before release. A test that never fails is noise; a test that fails without isolating the cause is worse than no test at all because it wastes the next engineer's time.
Test-Level Selection
Pick the test level by the risk of the change and the coupling of the behavior, not by the file you happen to be editing. Unit tests are cheap to write and easy to keep green, but they only observe what sits inside their boundary; integration and contract tests catch the wiring and cross-boundary bugs that unit tests cannot see.
| Situation | Test level | Why |
|---|---|---|
| Pure function, single-owner, no I/O | Unit | Fast, deterministic, zero setup. If you cannot unit-test it, the function is doing too much |
| Logic that composes multiple units inside one service | Integration (in-process) | Unit tests of each piece will miss composition bugs; integration test catches real wiring |
| Behavior that crosses a service / process / network boundary | Contract | Both sides need a shared verifiable agreement; a unit test on either side misses the real failure mode |
| User-visible flow end-to-end | E2E (one or two per critical path) | Proves the full path works at least once; too expensive to run for every code path |
| Bug fix for a bug that reached production | Regression at the level where the bug slipped through | If it slipped past unit tests, a unit test won't catch it next time; write the test at the level the bug exposed |
| Behavior that is "obviously correct," unchanged for a year, no external pressure | No new test | The test would never fail; it would only add maintenance cost. Every test is a liability until it catches a bug |
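The table can be read as a small decision procedure. The sketch below encodes it in Python; the `Change` fields, the `Level` names, and especially the precedence order (production regressions first, then "no new test", then the boundary-driven levels) are assumptions made for illustration, not rules stated by the table. Treating in-process I/O as an integration concern is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    NONE = "no new test"
    UNIT = "unit"
    INTEGRATION = "integration (in-process)"
    CONTRACT = "contract"
    E2E = "e2e"


@dataclass
class Change:
    """Hypothetical summary of the behavior a change puts at risk."""
    has_io: bool = False                    # touches files, a database, the clock, ...
    composes_units: bool = False            # wires several units inside one service
    crosses_process_boundary: bool = False  # another service or process consumes it
    critical_user_journey: bool = False     # part of a path users must complete
    stable_and_unpressured: bool = False    # "obviously correct", unchanged, no pressure
    # For a production bug fix: the lowest level at which the escaped bug is observable.
    observable_at: Optional[Level] = None


def choose_levels(change: Change) -> list[Level]:
    """Encode the selection table; the precedence order is an assumption."""
    # 1. A production regression is pinned where it can actually be observed.
    if change.observable_at is not None:
        return [change.observable_at]
    # 2. A test that would never fail only adds maintenance cost.
    if change.stable_and_unpressured:
        return [Level.NONE]
    levels: list[Level] = []
    # 3. Cross-boundary behavior needs an agreement both sides can verify.
    if change.crosses_process_boundary:
        levels.append(Level.CONTRACT)
    # 4. One E2E per critical journey, not one per code path.
    if change.critical_user_journey:
        levels.append(Level.E2E)
    # 5. Composition or I/O: mocked unit tests would miss the wiring.
    if change.composes_units or change.has_io:
        levels.append(Level.INTEGRATION)
    # 6. Pure, single-owner logic: a fast, deterministic unit test.
    return levels or [Level.UNIT]
```

For the webhook-handler example, `choose_levels(Change(has_io=True, crosses_process_boundary=True))` returns `[Level.CONTRACT, Level.INTEGRATION]`: a contract for the payload the provider sends, and an in-process integration test for the handler's own wiring. Returning a list rather than a single level reflects that one change can carry more than one honest observation point.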
Level-selection anti-patterns
- Unit testing what should be an integration test — mocking the only thing that could actually break. Fix: test the real integration, or admit the unit test proves nothing.
- Integration testing what should be a unit test — slow setup for a function that has no dependencies. Fix: extract the pure logic and unit-test it.
- E2E-testing every code path — fragile, slow, flaky. Fix: one E2E per critical user journey, unit/integration for the rest.
- Adding a test because coverage dropped — test has no regression target and never fails meaningfully. Fix: either find a real regression to pin, or delete the uncovered code if it has no value.
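The first anti-pattern is easiest to see side by side. This sketch uses only the Python standard library; `active_user_emails`, the `users` table, and the typo'd variant are hypothetical. It runs the same two checks against a correct query helper and against one with a typo in the status value. The mocked check never executes the SQL, so it passes for both; the in-memory SQLite check runs the real query and catches the typo.

```python
import sqlite3
from unittest import mock

SCHEMA = "CREATE TABLE users (email TEXT NOT NULL, status TEXT NOT NULL)"


def active_user_emails(conn) -> list[str]:
    """Hypothetical query helper: emails of users whose status is 'active'."""
    rows = conn.execute(
        "SELECT email FROM users WHERE status = ? ORDER BY email", ("active",)
    ).fetchall()
    return [email for (email,) in rows]


def broken_active_user_emails(conn) -> list[str]:
    """Same helper with a typo in the status value ('activ')."""
    rows = conn.execute(
        "SELECT email FROM users WHERE status = ? ORDER BY email", ("activ",)
    ).fetchall()
    return [email for (email,) in rows]


def mocked_unit_check(query_fn) -> bool:
    """Anti-pattern: the connection is mocked, so the SQL is never executed."""
    conn = mock.Mock()
    conn.execute.return_value.fetchall.return_value = [("a@example.com",)]
    return query_fn(conn) == ["a@example.com"]


def real_integration_check(query_fn) -> bool:
    """Runs the real SQL against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany(
            "INSERT INTO users (email, status) VALUES (?, ?)",
            [
                ("b@example.com", "active"),
                ("a@example.com", "active"),
                ("c@example.com", "disabled"),
            ],
        )
        return query_fn(conn) == ["a@example.com", "b@example.com"]
    finally:
        conn.close()


print(mocked_unit_check(broken_active_user_emails))       # True: the mock hides the bug
print(real_integration_check(broken_active_user_emails))  # False: the real query exposes it
```

In-memory SQLite stands in for the production database here. If production runs a different engine, the integration test should target that engine, since dialect differences are exactly the kind of boundary a stand-in can hide.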
Evals
This skill ships local eval artifacts at skills/skills/quality-assurance/testing-strategy/evals/comprehension.json and skills/skills/quality-assurance/testing-strategy/evals/application.json. The Verification checklist below is the authoring gate for a completed test plan; the eval files are audit-loop evidence for whether the concept can be understood and applied. Do not conflate them — the checklist is for the test author, the evals are for the grader.
Verification
- The test level matches the change risk and the boundary where the failure would appear
- A behavior or regression target is explicit
- Verification evidence is concrete, not assumed
- Failure cases and boundaries are covered, not only the happy path
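One way to make this checklist mechanical is to write the test plan down as data and check it before the work starts. This is a hypothetical sketch: `VerificationPlan`, `PlannedTest`, the `VAGUE_EVIDENCE` phrases, and the example command are invented here, and a real team would tune what counts as assumed rather than concrete evidence.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedTest:
    name: str
    level: str          # "unit" | "integration" | "contract" | "e2e"
    target: str         # the behavior or regression this test pins
    failure_case: bool  # True if it exercises an error path or boundary


@dataclass
class VerificationPlan:
    change: str
    expected_level: str  # level chosen from the risk and coupling analysis
    evidence: list[str] = field(default_factory=list)  # commands or CI runs
    tests: list[PlannedTest] = field(default_factory=list)


# Phrases that signal assumed rather than concrete evidence (illustrative).
VAGUE_EVIDENCE = {"", "manual qa", "looks fine", "should work", "tested locally"}


def checklist_gaps(plan: VerificationPlan) -> list[str]:
    """Return the checklist items this plan does not yet satisfy."""
    gaps = []
    if not any(t.level == plan.expected_level for t in plan.tests):
        gaps.append("no test at the level matching the change risk")
    if not plan.tests or any(not t.target.strip() for t in plan.tests):
        gaps.append("a behavior or regression target is missing")
    if not plan.evidence or any(
        e.strip().lower() in VAGUE_EVIDENCE for e in plan.evidence
    ):
        gaps.append("verification evidence is missing or assumed")
    if not any(t.failure_case for t in plan.tests):
        gaps.append("only the happy path is covered")
    return gaps


plan = VerificationPlan(
    change="Fix: 100% discount rejected at checkout",
    expected_level="unit",
    evidence=["python -m unittest test_discount"],
    tests=[
        PlannedTest("upper boundary accepted", "unit", "100% is accepted", False),
        PlannedTest("101% rejected", "unit", "values above 100 raise", True),
    ],
)
print(checklist_gaps(plan))  # []
```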
Do NOT Use When
| Use instead | When |
|---|---|
| documentation | The task structures explanation for a reader, not verification for a change |
| debugging | The task is chasing a known failure; strategy is planned before the failure, not after |
| refactor | The task is restructuring code; any test work is to preserve existing behavior, which belongs to the refactor skill's verification step |
Skill Graph context
Classification
- Subject: quality-assurance
- Public: true
- Domain: quality/testing-strategy
- Scope: Portable test-scope and test-level decision-making for software changes: choosing what behavior or regression target deserves verification, selecting unit, integration, contract, or end-to-end evidence by risk and coupling, and avoiding tests whose maintenance cost exceeds their failure signal. Excludes root-cause debugging of active failures, implementation mechanics for a chosen test type, documentation writing, and architecture discussion without a concrete verification target.
When to use
- do I need a unit test for this pure formatter or is integration enough?
- what's the right test level for a webhook handler that talks to Stripe?
- the feature passes manual QA — does it need an automated test?
- pin this regression so the same bug can't slip through again
- Triggers: testing-skill
Not for
- my existing test is failing — why?
- write a testing-patterns guide for the contributor docs
- clean up this duplicated test setup across three files
Related skills
- Depends on: refactor
- Verify with: debugging, contract-testing, test-coverage-strategy, integration-test-design, e2e-test-design
- Related: microcopy, middleware-patterns, debugging, refactor, test-coverage-strategy, integration-test-design, contract-testing, e2e-test-design
Concept
- Mental model: Testing strategy is the choice of verification target, test level, and evidence quality for a specific change. Its primitives are the behavior or regression target, the risk of failure, the coupling boundary that could hide the bug, the cheapest test level that can observe the behavior honestly, the expected failure signal, and the maintenance cost the suite will carry afterward.
- Purpose: Test suites become expensive when they chase coverage percentage, duplicate low-risk paths, or mock away the boundary that can actually fail. This skill exists to spend verification effort where it can prevent real regressions, while avoiding tests that will never produce useful signal.
- Boundary: This skill owns the decision about what behavior to test and at which level for a planned change or regression target. It does not own root-cause debugging of a failing test, detailed implementation patterns for one chosen test type, documentation writing, performance-load verification, or architecture strategy without a concrete verification target.
- Analogy: A test strategy is placing sensors in a system: each sensor should sit where a meaningful failure can be observed with the least noise and maintenance cost.
- Common misconception: The common mistake is choosing tests by edited file type or coverage percentage. The better choice starts from the behavior that could regress, then selects the cheapest level that can observe that failure without mocking it away.
Grounding
- Mode: universal
- Truth sources: https://martinfowler.com/articles/practical-test-pyramid.html, https://testing.googleblog.com/2010/12/test-sizes.html, https://playwright.dev/docs/best-practices, https://docs.pact.io/getting_started/how_pact_works
Keywords
testing strategy, what to test, test level selection, test scope, effort vs risk, regression target, failure case coverage, unit integration contract e2e, test plan, verification evidence