ai-generated-code-hardening - SKILL.md Agent Skill

mustflow_doc: skill.ai-generated-code-hardening locale: en canonical: true revision: 2 lifecycle: mustflow-owned authority: procedure name: ai-generated-code-hardening description: Apply this skill when AI-generated, assistant-authored, vibe-coded, or broad code changes need evidence-backed hardening against symptom-only fixes, pinpoint hardcoding, duplicate helpers or shapes, hidden coupling, weak tests, swallowed errors, excessive complexity, and boundary drift. metadata: mustflow_schema: "1" mustflow_kind: procedure pack_id: mustflow.core skill_id: mustflow.core.ai-generated-code-hardening command_intents: - changes_status - changes_diff_summary - test_related - test_audit - lint - build - docs_validate_fast - test_release - mustflow_check

AI-Generated Code Hardening

Purpose

Use this skill to turn agent-written code from "looks plausible" into maintainable repository code. The goal is not style cleanup. The goal is to catch the failure modes that appear when an agent forgets the repository between sessions: symptom-only patches, pinpoint hardcoding, duplicate helpers, duplicate shapes, hidden coupling, weak tests, swallowed errors, accidental public surfaces, and oversized functions or files.

Use When

The current diff was substantially generated by an AI agent, copied from an outside suggestion, or produced through broad assistant edits.
A bug fix appears to handle only the reported example, failing fixture, exact string, exact path, exact ID, exact platform, or current test input instead of the same class of defect.
A new helper, function, type, schema, DTO, fixture, adapter, or shape appears and the repository may already have a single source of truth.
The change may introduce hidden coupling through import direction, initialization order, shared globals, side effects, circular dependencies, barrel exports, or re-export drift.
Tests were added or changed and may only check strings, function names, snapshots, mocks, or implementation details instead of observable behavior.
Error handling, fallback paths, defensive guards, or default values were added in multiple callers instead of one boundary.
New or changed functions/files are becoming hard to review because of excessive nesting, god function growth, god file growth, fan-in concentration, or fan-out sprawl.

Do Not Use When

The task is a tiny mechanical edit with no new behavior, helpers, shapes, tests, imports, or error handling.
The task is purely a code review with no edits; use code-review as the main skill and add this skill only when AI-generated-code failure modes are concrete.
The task is deciding the first architecture boundary for new behavior; use structure-first-engineering first, then this skill as a hardening pass if code is generated or broad.
The task is adopting an outside tool, plugin, command, or CI rule; use external-skill-intake, command-intent-mapping-gate, or command-contract-authoring as appropriate before adding commands or rules.

Required Inputs

User goal and changed files or expected edit scope.
Current diff, or the files likely to be changed.
Existing helpers, public types, schemas, DTOs, fixtures, adapters, utilities, and domain shapes near the change.
Import and export paths for touched modules, including barrels and public package entrypoints when present.
Existing error-handling conventions and the boundary where failures are meant to become user-facing, logged, retried, or returned.
The reported symptom, failing fixture, user example, or scanner finding that caused the fix, plus nearby inputs that should follow the same rule.
Existing test style, edge-case coverage, and configured verification intents.
Existing static analysis or lint rules, if any. Missing rules may be reported or proposed, but this skill does not authorize adding new dependencies, CI gates, or command contracts by itself.

Preconditions

The task matches the Use When conditions and does not match the Do Not Use When exclusions.
Higher-priority repository instructions, selected main-route skills, and .mustflow/config/commands.toml have been checked for the current scope.
The hardening concern is backed by current repository evidence or the current diff, not by generic best-practice advice alone.
The work can stay bounded to the touched code, directly synchronized surfaces, focused tests, or report-only findings.

Allowed Edits

Replace duplicate new helpers, shapes, fixtures, or validators with existing sources of truth.
Replace symptom-only conditions, literal-value patches, one-off string replacements, and caller-by-caller guards with the smallest rule, parser, boundary, or data owner that covers the defect class.
Extract or move code only around a clear local responsibility boundary.
Flatten excessive nesting when behavior is unchanged and tests cover the path.
Remove accidental re-exports or update synchronized public-surface contracts.
Consolidate error handling into the owning boundary.
Add focused behavior tests and edge-case tests tied to the current change.
Update directly synchronized docs, templates, or route metadata when the hardening change affects installed or public surfaces.

Procedure

Baseline the diff.
- Identify new functions, helpers, classes, types, schemas, DTOs, fixtures, adapters, exported names, and test helpers.
- Identify any literal values, branch conditions, special-case IDs, one-off regexes, fixture names, paths, or platform checks that were added because they match the reported symptom.
- Separate intentional new concepts from possible duplicates.
- Note which files are production code, test code, generated code, public contract files, or installed templates.
Check for symptom-only patches.
- Name the reported symptom and the broader defect class before accepting the fix. Examples: path traversal, stale cache authority, duplicate side effect, invalid state transition, malformed command fence, newline handling, missing cleanup, wrong ownership check, or non-atomic write.
- Search for sibling inputs, adjacent parsers, other callers, parallel fixtures, and same-shaped branches that would fail for the same reason.
- Reject fixes that only say "if this exact value, return the expected answer" unless the value is a documented product rule with an owner and test.
- Prefer changing the parser, validator, state transition, normalization rule, ownership boundary, or shared helper that owns the whole defect class.
- If a narrow patch is the only safe current change, document why the broader rule is deferred and add a regression guard that names the remaining class risk.
Check repository memory discontinuity.
- Search for existing helpers, types, schemas, shapes, fixtures, and validators before accepting a new one.
- Prefer the existing single source of truth when it already covers the behavior.
- If the existing source is incomplete, extend it narrowly instead of creating a parallel shape.
- If both old and new concepts must remain, name the boundary and explain why they are not duplicates.
Check coupling and dependency direction.
- Look for hidden coupling through import direction, initialization order, shared mutable state, environment reads, caches, file-system state, timers, module-level side effects, and ambient globals.
- Look for circular dependencies or a new dependency from lower-level code into higher-level orchestration, UI, CLI, or framework code.
- Inspect fan-in and fan-out. A new utility imported everywhere, or a high-level module imported from many low-level places, needs a boundary explanation.
Check public surface and re-export discipline.
- Treat a barrel export, wildcard export, package entrypoint, generated client, CLI JSON field, or template-installed file as a public surface until proven otherwise.
- Remove accidental re-exports, or update the matching contract, tests, and docs when the export is intentional.
- Do not hide a breaking public-surface change behind "internal cleanup" wording.
Check error handling and fallbacks.
- Reject empty catch, generic fallback values, silent null returns, and duplicated defensive guards unless they are the repository's explicit boundary pattern.
- Move failure interpretation to the boundary that owns it: parser, adapter, handler, command, job, or UI state.
- Preserve useful error context without leaking secrets or personal data.
- Prefer explicit result shapes or typed errors when they already exist locally.
Check complexity before it calcifies.
- Challenge nesting deeper than three levels, especially inside loops, async flows, parsers, command handlers, and test setup.
- Flatten unnecessary nesting with guard clauses or early returns when that keeps the invariant clearer.
- Split a god function only around a real responsibility boundary. Do not create tiny helpers that hide control flow or duplicate a local convention.
- Split a god file only when the new module has a stable owner, direction, and verification path.
Harden tests against fake confidence.
- Verify behavior and side effects, not only function names, strings, snapshots, or the presence of files.
- For bug fixes, add at least one nearby negative or sibling case when feasible so the guard proves the generalized rule, not only the reported literal input.
- Add or preserve edge cases for empty input, missing input, null or undefined where the language permits it, boundary values, duplicate requests, failure paths, and concurrency or ordering when relevant.
- Mock only external boundaries: network, clock, filesystem, process, database, queue, browser, or third-party service. Avoid mocking internal collaborators just to make the implementation easier to assert.
- If production code gained defensive fallback only to satisfy a brittle test, fix the test or move the fallback to the owning boundary.
Use enforcement evidence without inventing enforcement.
- Run existing configured lint, build, and related test intents when they cover the risk.
- If the repository lacks dependency-boundary, complexity, max-depth, no-floating-promise, no-explicit-any, or circular-dependency checks, report the missing guard as future hardening unless the user explicitly asked to add it.
- Never paste outside plugin commands, package-manager commands, or CI recipes into mustflow command contracts without command-contract review.
Decide fix now, defer, or report.

Fix now when the issue is in the touched code, small, directly tied to the current behavior, and verifiable with configured commands.
Defer when the issue needs broad architecture work, new dependencies, new CI gates, package-wide lint policy, or a migration across unrelated modules.
Report only when evidence is real but fixing it would exceed the current task boundary.

Postconditions

New helpers, shapes, fixtures, validators, and exports have been checked against existing sources of truth.
Symptom-only patches, pinpoint hardcoding, literal-value branches, and one-off string replacements have been replaced by a defect-class rule or explicitly reported with the remaining class risk.
Hidden coupling, dependency direction, circular dependencies, fan-in, fan-out, re-export drift, and public-surface changes have been fixed, deferred, or explicitly reported.
Error handling is either centralized at the owning boundary or the remaining duplicated guard/fallback risk is documented.
Tests exercise behavior and meaningful side effects, with edge and failure paths covered where relevant to the change.

Verification

Use changes_status and changes_diff_summary to keep the hardening scope visible.
Run test_related for changed behavior, tests, or installed template surfaces.
Run test_audit when test quality, mock boundaries, or missing edge cases are the primary concern and the intent is configured.
Run lint or build only when the repository already configures those intents and they cover the touched language or package.
Run docs_validate_fast, test_release, or mustflow_check when the hardening change touches workflow docs, templates, version metadata, route metadata, or release-sensitive files.

Failure Handling

If a configured command fails, switch to failure-triage before broadening the fix.
If repository search cannot prove whether a helper or shape is duplicate, report the evidence gap instead of deleting or merging concepts speculatively.
If the same defect class may exist in uninspected callers, parsers, fixtures, or sibling inputs, report the uninspected surface instead of claiming the root cause is fixed.
If the hardening issue requires new dependencies, new CI gates, package-wide lint policy, or a cross-module migration, defer it with a concrete recommendation.
If the current diff is too broad to inspect in one pass, use heuristic-candidate-selection to choose the highest-risk batch first.
If external advice suggests commands or plugin behavior, use external-skill-intake and command-intent-mapping-gate before preserving it in mustflow.

Output Format

Include:

Existing sources of truth reused or extended.
Symptom-only patches, pinpoint hardcoding, and same-defect-class sibling surfaces fixed, ruled out, or reported.
Duplicate helpers, duplicate shapes, or repeated guards removed.
Coupling, circular-dependency, fan-in, fan-out, public-surface, and re-export risks found or ruled out.
Error-handling and fallback decisions.
Test hardening: behavior assertions, side effects, edge cases, and mock boundary.
Configured verification intents run and their result.
Deferred enforcement, lint, or CI guard suggestions, clearly marked as not added.