name: refactoring-discipline
description:
Run a post-green refactor pass that lowers CRAP and removes duplication
without changing behaviour. Use after tests pass and gates are green, when
a function's CRAP score is high but well-covered, or when the same logic is
duplicated across files. Pairs with the refactorer persona and the
opt-in delivery.refactorStage checkpoint; complements
core/code-simplification (clarity) by focusing on the measured
CRAP/duplication axes.
Refactoring Discipline
Policy Capsule
- Post-green only. Run this pass when the suite is already green and the quality gates already pass. You never refactor from red, and you never make a failing test pass under the banner of "refactoring" — that is a behaviour change.
- Preserve behaviour exactly. Inputs, outputs, side effects, error semantics, and ordering MUST be identical before and after. Existing tests MUST keep passing without modification; if a test had to change, the refactor changed behaviour — revert it.
- Lower CRAP by lowering complexity, not by adding coverage. CRAP =
complexity² × (1 − coverage)³ + complexity. The refactorer's lever is the complexity factor: extract functions, flatten nesting with guard clauses, replace nested ternaries and boolean-flag params, and split multi-responsibility functions. Adding tests to a tangled function is the test author's job, not this pass. - Remove duplication at the root. Extract one well-named helper for repeated logic instead of leaving near-copies that drift apart. Dedup the behaviour, not just the text — collapse copies only when they are genuinely the same responsibility.
- Target by measurement, not by smell. Work from the baselines (
check-baselines.jsCRAP/maintainability) and pick the highest-CRAP, well-covered functions and the largest verbatim duplications first. Prove each number moved the right way after the change. - Do no harm to the gates. After every change, CRAP must not rise and maintainability must not fall for any touched file. A refactor that regresses a baseline is not done — back it out. Never retune a gate, floor, or coverage threshold to make the pass "succeed".
- Comprehend before you touch (Chesterton's Fence). Read the call sites and the tests first; never refactor code you do not fully understand. An "unnecessary" abstraction may exist for testability or extensibility.
- Incremental and isolated. One behaviour-preserving change at a time, re-running the affected tests after each. Keep refactors in their own commits, separate from feature or fix work, so diffs stay reviewable and revertible.
- Scope discipline. Refactor only the targeted functions/duplications. No drive-by rewrites of untargeted modules — unscoped churn adds regression risk and noise.
Overview
CRAP (Change Risk Anti-Patterns) flags functions that are both complex and
under-tested; duplication multiplies the cost of every future change. This
skill is the disciplined, behaviour-preserving pass that drives those two
metrics down after the code is green — the post-green stage owned by the
refactorer persona and wired in as the
opt-in delivery.refactorStage checkpoint. It complements
core/code-simplification: simplification
optimises for human comprehension; this skill optimises for the measured
CRAP and duplication axes the baselines ratchet.
When to Use
- A baseline report flags a function with high CRAP that is already well-covered (so the lever is complexity, not coverage).
- The same logic is duplicated across two or more files and the copies are starting to drift.
- An opt-in
delivery.refactorStagecheckpoint runs after gates pass and before a story closes.
When NOT to use: before tests are green (write/fix tests first); when a
high-CRAP function is high because it is uncovered (that is a missing-test
problem — use core/test-driven-development or
core/mutation-survivor-remediation); or for pure clarity tweaks with no
metric impact (use core/code-simplification).
The Refactor Loop
Confirm green + baseline → pick highest-CRAP / largest dup
▲ │
│ ▼
re-run gates, confirm ◄── refactor ◄── comprehend the target
CRAP↓ / maint↑, tests pass (one change) (call sites + tests)
- Baseline. Confirm the suite is green and capture current CRAP and
maintainability (
node .agents/scripts/check-baselines.js). This is your do-no-harm reference. - Target. Pick the highest-CRAP well-covered function, or the largest verbatim duplication. Work worst-first.
- Comprehend. Read the call sites and the tests that pin the behaviour before touching anything.
- Refactor one step. Extract a function, flatten nesting into guard clauses, replace a nested ternary / boolean-flag param, or extract a shared helper for the duplication.
- Verify. Re-run the affected tests (unmodified) and the baselines. CRAP must not rise, maintainability must not fall, tests must pass. If a test needed editing, revert — behaviour changed.
CRAP & Duplication Targeting
- CRAP lever is complexity. Reduce branches and nesting per covered line. Extracting a focused helper moves complexity out of the hot function and lowers its CRAP without inflating the suite.
- Duplication lever is a single source. Replace N near-copies with one helper, then delete the now-dead copies within the touched files.
- Equivalent-by-coincidence is not duplication. Two blocks that look alike but encode different responsibilities should stay separate; merging them couples unrelated change reasons.
Red Flags
- Editing a test to keep it passing after a "refactor" (behaviour changed).
- CRAP or maintainability regressing on a touched file and shipping anyway.
- Retuning a gate, floor, or coverage threshold to make the pass succeed.
- Refactoring an uncovered function to lower CRAP — that needs tests, not a restructure.
- Merging look-alike blocks that encode different responsibilities.
- Drive-by rewrites of modules the pass did not target.
Verification
- The suite was green before the pass and is green after, with no test modifications.
- CRAP did not rise and maintainability did not fall on any touched file (baselines re-run and compared).
- No gate, floor, or coverage threshold was loosened.
- Duplication was removed at the root (one helper), not just locally patched.
- Each refactor is an isolated, reviewable commit, separate from feature or fix work.
- No dead code left behind after extractions (unused imports, orphaned helpers).