improve - SKILL.md Agent Skill

name: improve description: | Autonomous improvement loop — repeatedly runs QA, Fix, and Refactor cycles to continuously improve code quality. Each round runs tests and auto-reverts refactoring commits that break tests. Leverages SuperClaude commands and MCP servers (serena, sequential-thinking, context7, playwright, tavily). Uses agent teams for parallel QA when CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS is enabled. arguments:

name: rounds description: Maximum number of improvement rounds (early termination if 0 issues found) default: "5"
name: focus description: Scope of improvement (api|frontend|all) default: all
name: dry-run description: If true, run QA only without fixes or refactoring default: "false"

Autonomous Improvement Loop

Repeats QA → Fix → Refactor → E2E Safety → Reflection → Self-Learning for up to {{rounds}} rounds. Terminates early when open issues reach 0, or when abort conditions are triggered.

Toolchain

This skill combines the following tools:

SuperClaude commands:

/sc:analyze — Phase 1: Code and architecture structural analysis
/sc:troubleshoot — Phase 2: Debug and root-cause test failures
/sc:cleanup — Phase 3: Structured refactoring
/sc:reflect — Phase 5: Structured retrospective

MCP servers:

serena — Phase 1/3: Semantic code understanding, dependency graph analysis
sequential-thinking — Phase 1/6: Multi-step reasoning for complex problems
context7 — Phase 2: Official documentation for Hono, Drizzle, Vitest, etc.
playwright — Phase 1/4: Browser-based E2E test execution (via make test-e2e)
tavily — Phase 6: Web research for best practices

Fallback rule: If any MCP server or /sc: command is unavailable, log a warning and continue without it. MCPs and SuperClaude enhance the loop but are NOT required.

Critical Safety Rules

All work MUST be done on a feature branch. NEVER modify the main branch.
Auto-revert refactoring commits when tests break after refactoring.
Record results of each phase in .improvement-state/.
Follow Conventional Commits format.
NEVER weaken or delete tests to make them pass. Fix the implementation instead.

Abort Conditions (Loop Stops Entirely)

The loop MUST stop immediately and report to the user if ANY of these occur:

Git conflict: Any git operation (revert, merge) fails with a conflict.
Net regression: Issue count INCREASES for 2 consecutive rounds.
Recurring failure: The same file/test fails in 2 consecutive rounds after being "fixed".
Phase 2 regression: A fix in Phase 2 causes NEW test failures that did not exist before.
Consecutive reverts: Phase 4 auto-revert triggers in 2 consecutive rounds.
Test runner crash: npm test exits non-zero but produces no FAIL lines AND no test summary line (infrastructure failure, not test failure).
Disk space critical: Less than 500MB free disk space.

When aborting, output:

=== LOOP ABORTED ===
Reason: {specific abort condition}
Round: N / {{rounds}}
Branch: {branch name}
Last stable state: {git tag name}
Action required: {what the user should do}

Phase 0: Setup (first round only)

Verify the working directory is a git repository.
Run git status to confirm the working tree is clean.
- If not clean, warn the user and abort.
Confirm the current branch is main.

Create a feature branch:

BRANCH="improve/$(date +%Y%m%d-%H%M%S)"
if git rev-parse --verify "$BRANCH" >/dev/null 2>&1; then
  echo "ERROR: Branch $BRANCH already exists. Aborting."
  exit 1
fi
git checkout -b "$BRANCH"

Create the state directory:
```
mkdir -p .improvement-state
```

Initialize the run log:

echo "# Run Log — $BRANCH" > .improvement-state/run.log
echo "Started: $(date -Iseconds)" >> .improvement-state/run.log
echo "Parameters: rounds={{rounds}}, focus={{focus}}, dry-run={{dry-run}}" >> .improvement-state/run.log

Load .improvement-config.json if it exists. Otherwise use default values.

Capture test baseline — run unit tests and E2E tests to record the initial state:

# Unit tests
cd api && timeout --kill-after=10s ${TEST_TIMEOUT:-120}s npm test 2>&1 | tee .improvement-state/test-baseline.log
BASELINE_UNIT_EXIT=$?

# E2E tests
cd $(git rev-parse --show-toplevel) && timeout --kill-after=30s ${E2E_TIMEOUT:-300}s make test-e2e 2>&1 | tee .improvement-state/e2e-baseline.log
BASELINE_E2E_EXIT=$?

Extract baseline test counts from both outputs (see "Reading Test Output" section). Record: BASELINE_UNIT_TEST_COUNT, BASELINE_UNIT_FAIL_COUNT, BASELINE_E2E_EXIT. These baselines are used throughout the loop to detect regressions.

Use serena MCP to understand project structure — Query serena for module structure, dependency graph, and key entry points. This improves analysis accuracy in subsequent phases.

Reading Test Output

All phases that run tests MUST use this standardized procedure.

Running tests with timeout

ALWAYS wrap test commands with a timeout:

cd api && timeout --kill-after=10s ${TEST_TIMEOUT:-120}s npm test 2>&1 | tee /tmp/test-output.log
TEST_EXIT=$?

Classifying the exit code

If TEST_EXIT is non-zero, determine the cause:

Timeout (TEST_EXIT=124): Log as [HIGH] Test execution timeout. Check Docker status.
Infrastructure failure: Check output for these patterns:
- Cannot connect to Docker daemon → Docker not running. Skip Vitest, continue with lint only.
- EADDRINUSE or port already in use → Port conflict. Log warning and skip.
- no space left on device → Trigger abort condition #7.
- Cannot find module.*vitest → Config error. Abort loop. If ANY of these match, this is NOT a test failure. Do NOT file test issues.
Actual test failure: Output contains FAIL lines or the test summary shows failed > 0.

Parsing test failures

Search the output for ALL of these patterns:

Standard failures: Lines starting with FAIL (e.g., FAIL tests/file.ts > Test Name)
Compilation errors: Lines matching error TS[0-9] or SyntaxError or ParseError
Import errors: Lines matching Cannot find module
Setup failures: beforeAll or beforeEach errors in stack traces

Create a separate issue for EACH failure. Each issue MUST include:

Test file name and line number
Test name (if available)
Error message
Severity: always HIGH

Verifying the test summary

Find the summary line (handles multiple Vitest formats):

Tests  2 failed | 144 passed (146)        ← Vitest 1.x
Test Files  1 failed | 12 passed (13)     ← Vitest 2.x

Extract failed and passed counts. If no summary line exists AND exit code is non-zero, treat as infrastructure failure.

Test count regression check

Compare current total test count against BASELINE_TEST_COUNT:

If current total < baseline: ABORT. A test file may have been deleted. Log the difference and investigate.
If current total >= baseline: OK, proceed.

Main Loop: Round 1 ~ {{rounds}}

Log [Round N/{{rounds}}] at the start of each round.

Phase 1: QA (Issue Detection)

Scope: {{focus}}

Create a savepoint before this round:

git tag "savepoint-round-$ROUND_NUM"

If agent teams are available (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1), run QA checks in parallel using 4 agent teams:

Team A: Static analysis (1-1 Biome + 1-2 TypeScript)
Team B: Unit test execution (1-3 Vitest)
Team C: E2E test execution (1-4 make test-e2e)
Team D: Code analysis (1-5 /sc:analyze + serena + sequential-thinking)

Otherwise, run sequentially.

1-1. Biome Lint (when {{focus}} is api or all)

cd api && npx biome check src/ 2>&1

Extract warnings/errors from output and record as issues.

1-2. TypeScript Type Check (when {{focus}} is api or all)

cd api && npx tsc --noEmit 2>&1

Record type errors as issues (severity: HIGH — type errors mean compilation failure).

1-3. Vitest Tests (when {{focus}} is api or all)

Run tests using the standardized procedure in "Reading Test Output" section. File each failure as a separate issue.

1-4. E2E Tests

Run the full E2E test suite. make test-e2e handles environment startup, test execution, and cleanup automatically.

cd $(git rev-parse --show-toplevel) && timeout --kill-after=30s ${E2E_TIMEOUT:-300}s make test-e2e 2>&1 | tee /tmp/e2e-output.log
E2E_EXIT=$?

Apply the same "Reading Test Output" procedure to E2E output:

If E2E_EXIT=124: timeout. Log as [HIGH] E2E test execution timeout.
If E2E_EXIT is non-zero: parse output for failure patterns. File each E2E failure as a separate issue with Source: e2e and severity: HIGH.
If E2E_EXIT=0: no E2E issues.

E2E regression check: If BASELINE_E2E_EXIT was 0 (E2E passed at baseline) but E2E_EXIT is now non-zero, mark these failures as regressions introduced by previous rounds.

If make test-e2e fails due to infrastructure (Docker not running, port conflicts), log as "E2E infrastructure failure" and continue with other QA checks. Do NOT file test issues for infrastructure failures.

1-5. Code Analysis with /sc:analyze

Use /sc:analyze for structural analysis of the codebase:

/sc:analyze "Analyze {{focus}} codebase for code quality issues:
- Type safety problems (any types, missing type guards)
- Error handling gaps (async without try-catch)
- Files exceeding 300 lines
- Functions exceeding 50 lines
- Missing input validation (Zod schemas)
- Hardcoded strings/config values
- Circular dependencies
- CLAUDE.md convention violations"

Use serena MCP alongside: Leverage serena's semantic analysis to check module dependency issues, unused exports, and circular call graphs.

Use sequential-thinking MCP alongside: For complex architectural problems, use multi-step reasoning to identify root causes at the design level.

Review target selection: Prioritize files modified in the previous round or files with many lint issues. In round 1, focus on the service layer (api/src/services/) and route layer (api/src/routes/).

Issue Aggregation

Save all QA results to .improvement-state/issues-round-N.md in this format:

# Issues - Round N

**Found**: X issues | **Severity**: CRITICAL=0, HIGH=0, MEDIUM=0, LOW=0
**Baseline test count**: {BASELINE_TEST_COUNT} | **Current test count**: {current}

## Issues

### [SEVERITY] Short description
- **File**: `path/to/file.ts:line`
- **Source**: lint | typecheck | vitest | e2e | sc:analyze | serena
- **Detail**: Description of the problem
- **Suggestion**: Proposed fix (if any)

Decision:

Issue count is 0 → exit the loop and proceed to Phase 7 (Finalize).
Net regression check: If this round's issue count > previous round's issue count, increment REGRESSION_COUNTER. If REGRESSION_COUNTER >= 2, trigger abort condition #2.

Phase 2: Fix (Issue Resolution)

Skip this phase if {{dry-run}} is true.

2-1. Auto-fix with Tools

Apply Biome auto-fixes first:

cd api && npx biome check --write src/

2-2. Fix Test Failures (HIGHEST PRIORITY)

Test failure issues MUST be fixed before all other issues.

Use /sc:troubleshoot for debugging:

/sc:troubleshoot "Test failure in {test file name}:
Error: {error message}
at line {line number}

Analyze the root cause and suggest a fix."

Use context7 MCP alongside: Look up official documentation for frameworks (Vitest, Playwright, testcontainers) via context7. Verify correct API usage, especially mock lifecycle (vi.spyOn, mockImplementation, mockReset, restoreAllMocks).

Fix procedure:

Read the source code of both the failing test file and the implementation under test.
Identify the cause using /sc:troubleshoot + context7.
Decide the fix strategy:
- Bug in implementation → fix the implementation (NEVER weaken tests)
- Mock/setup issue in test → fix the test code (this is a legitimate fix)
- Outdated test (snapshot, expected values changed) → update the test
After fixing, use serena to check impact: If you changed a helper/util function, check all callers via serena. If impact is large, consider adding a NEW test.
Verify the fix by re-running only that test file:
```
cd api && npx vitest run tests/{test file name} 2>&1
```
If failures remain, retry the fix (maximum 3 attempts per test). If still failing after 3 attempts, log as "unresolvable" and proceed.

2-3. Fix /sc:analyze Findings

Prioritize HIGH severity and above. Only fix MEDIUM and below if safe and low-risk.

Use context7 MCP to verify correct patterns for Hono / Drizzle / Zod before applying fixes.

2-4. Commit

git add -A
git commit -m "fix: resolve N QA issues [round $ROUND_NUM]"

Update issues-round-N.md to reflect fixed issues (set status to fixed).

2-5. Post-fix Regression Check (MANDATORY)

After committing Phase 2 fixes, run the full test suite to verify fixes did not introduce new failures:

cd api && timeout --kill-after=10s ${TEST_TIMEOUT:-120}s npm test 2>&1 | tee /tmp/postfix-output.log
POSTFIX_EXIT=$?

All tests pass: Proceed to Phase 3.
New failures appear (failures that were NOT in Phase 1 issue list): Trigger abort condition #4. Revert the Phase 2 commit:
```
git revert --no-edit HEAD
```
Log: "Phase 2 fix caused regression. Reverted. Round $ROUND_NUM marked as failed." Skip to Phase 5.
Same failures as Phase 1 remain: Log as "partially fixed" and proceed to Phase 3.

Phase 3: Refactor (Quality Improvement)

Skip this phase if {{dry-run}} is true.

Pre-condition gate (MANDATORY): Run the full test suite:

cd api && timeout --kill-after=10s ${TEST_TIMEOUT:-120}s npm test 2>&1

If ANY test fails:

Attempt fix (same procedure as Phase 2-2, maximum 2 attempts).
If fixed: commit as fix: repair failing tests before refactor [round $ROUND_NUM]
Re-run full test suite to confirm ALL tests pass.
If still failing after 2 attempts: skip Phase 3 entirely, proceed to Phase 5. Log: "Refactoring skipped — tests not stable."

Check refactor blocklist: Load .improvement-state/refactor-blocklist.json. Skip any file/strategy combination that previously caused a revert.

Use /sc:cleanup for refactoring:

/sc:cleanup "{target file path} — strategy: {refactoring strategy}"

Use serena MCP alongside: Before refactoring, query serena to check:

All modules that reference the target file (impact analysis)
All callers of the target function
Whether the change would introduce circular dependencies

Use serena to verify test coverage: Before refactoring a file, check if it has corresponding test files. If test coverage is low, skip this refactoring candidate.

Also refer to references/refactor-patterns.md for safe refactoring patterns.

Maximum 3 refactorings per round (configurable via refactor.max_per_round in .improvement-config.json).

Refactoring candidate selection criteria:

Files where Phase 1 found issues
Files exceeding 300 lines
Functions exceeding 50 lines
Complex conditionals (nesting 3+ levels)
Duplicate code
Exclude files in refactor-blocklist

Commit each refactoring individually:

git add -A
git commit -m "refactor: {specific description} [round $ROUND_NUM]"

Phase 4: E2E Safety Check

Only run if refactoring was performed in Phase 3.

Run both unit tests and E2E tests. Both MUST pass for the round to be considered safe.

# Unit tests
cd api && timeout --kill-after=10s ${TEST_TIMEOUT:-120}s npm test 2>&1 | tee /tmp/safety-unit-output.log
SAFETY_UNIT_EXIT=$?

# E2E tests
cd $(git rev-parse --show-toplevel) && timeout --kill-after=30s ${E2E_TIMEOUT:-300}s make test-e2e 2>&1 | tee /tmp/safety-e2e-output.log
SAFETY_E2E_EXIT=$?

Run the unit test count regression check (current total >= BASELINE_UNIT_TEST_COUNT).

On both tests passing (SAFETY_UNIT_EXIT=0 AND SAFETY_E2E_EXIT=0)

Proceed to Phase 5. Reset CONSECUTIVE_REVERT_COUNT to 0.

On either test failing — Auto-Revert

If E2E passed at baseline but fails now, the refactoring broke end-to-end behavior — auto-revert is required even if unit tests pass.

On test failure — Auto-Revert

Identify and revert Phase 3 refactoring commits using stable SHAs:

# Collect refactor commit SHAs (newest first)
REFACTOR_SHAS=$(git log --oneline "savepoint-round-$ROUND_NUM"..HEAD | grep "refactor:.*\[round $ROUND_NUM\]" | awk '{print $1}')

# Revert each (newest first)
for SHA in $REFACTOR_SHAS; do
  if ! git revert --no-edit "$SHA" 2>&1; then
    echo "ERROR: Revert conflict on $SHA. Aborting loop."
    git revert --abort
    # Trigger abort condition #1 (git conflict)
    exit 1
  fi
done

After revert, re-run tests:

cd api && timeout --kill-after=10s ${TEST_TIMEOUT:-120}s npm test 2>&1

Tests pass: Log "Refactoring reverted, tests recovered." Add the refactored file + strategy to .improvement-state/refactor-blocklist.json. Increment CONSECUTIVE_REVERT_COUNT.
Tests still fail: Phase 2 fixes also have problems. Revert to the savepoint:
```
git revert --no-edit "savepoint-round-$ROUND_NUM"..HEAD
```
Log: "Full round revert. Both fixes and refactoring reverted."

Consecutive revert check: If CONSECUTIVE_REVERT_COUNT >= 2, trigger abort condition #5.

Phase 5: Reflection (Record Results)

Use /sc:reflect for a structured retrospective:

/sc:reflect "Improvement Loop Round $ROUND_NUM retrospective:
- QA found X issues (sources: lint, typecheck, vitest, e2e, sc:analyze, serena)
- Fixed Y issues (auto-fix: p, troubleshoot: q)
- Refactored Z files (reverted: W)
- E2E Safety: PASSED/REVERTED/SKIPPED
Analyze what went well, what didn't, and patterns to watch."

Append output to .improvement-state/reflection-log.md:

## Round N - YYYY-MM-DD HH:MM

| Phase | Result |
|-------|--------|
| QA | X issues found (H:a, M:b, L:c) |
| Fix | Y/X issues fixed (auto: p, troubleshoot: q) |
| Post-fix regression | PASSED / REVERTED |
| Refactor | Z refactorings applied / SKIPPED |
| E2E Safety | PASSED / REVERTED / SKIPPED |

### Tool Usage
- serena: {what dependency analysis revealed}
- context7: {which docs were referenced}
- sequential-thinking: {which problems used multi-step reasoning}

### Modified Files
- `path/to/file1.ts` - summary of changes

### Observations
(1-3 sentence summary from /sc:reflect analysis)

---

Also append a summary line to .improvement-state/run.log.

Phase 6: Self-Learning (Improve the Improvement Process)

Run ONLY after the final round (not during intermediate rounds).

Use sequential-thinking MCP for pattern analysis: Analyze all rounds in .improvement-state/reflection-log.md:

Step 1: Organize trends in issue count, fix count, and revert rate across rounds
Step 2: Identify recurring issue category patterns
Step 3: Analyze correlation between tool usage and fix success rate
Step 4: Generate prioritized improvement suggestions

Use tavily MCP to research best practices: For recurring problem categories, search for latest best practices:

e.g., "vitest mock lifecycle best practices 2025"
e.g., "hono error handling patterns"

Save output to .improvement-state/self-learning-suggestions.md:

# Self-Learning Suggestions

Generated: YYYY-MM-DD HH:MM
Rounds analyzed: 1-N

## Suggestions

### [IMPACT: HIGH/MEDIUM/LOW] Suggestion title
- **Current**: Current setting/behavior
- **Proposed**: Proposed change
- **Rationale**: Evidence-based reasoning
- **Reference**: Best practice URL found via tavily (if any)
- **Action**: Which parameter in `.improvement-config.json` to change

These suggestions are NOT auto-applied. The user reviews and manually updates the config.

Phase 7: Finalize

After all rounds complete (or early termination):

Clean up Docker containers from testcontainers:

docker ps --filter "label=org.testcontainers=true" --format "{{.ID}}" | xargs -r docker stop 2>/dev/null
docker ps -a --filter "label=org.testcontainers=true" --format "{{.ID}}" | xargs -r docker rm 2>/dev/null

Output a final summary:

=== Improvement Loop Summary ===
Rounds completed: X / {{rounds}}
Total issues found: Y
Total issues fixed: Z
Refactorings applied: W (reverted: V)
Tools used: serena, context7, sequential-thinking, tavily, playwright
Branch: $BRANCH

=== Changes Summary ===
{output of: git log main..$BRANCH --oneline}
{output of: git diff main...$BRANCH --stat}

[MANUAL GATE] Ask the user whether to push the branch:
```
Push $BRANCH to origin? The summary above shows all changes.
```
- If confirmed: git push -u origin "$BRANCH". Check exit code. If push fails, report error.
- If not confirmed: Keep branch local.
Suggest creating a PR (do not auto-create).

Error Handling

Situation	Action
Docker not running (testcontainers fail)	Skip Vitest and E2E, continue QA with lint + sc:analyze only
`make test-e2e` infrastructure failure	Log warning, continue without E2E. Do NOT file test issues
Biome not found	Skip lint if `npx biome` is unavailable
Git conflict	ABORT the loop (abort condition #1)
Test timeout (exit code 124)	Treat as HIGH-severity issue. If 3+ timeouts in one run, ABORT
0 issues in all rounds	Report "codebase is in good shape"
MCP server not connected	Log warning, continue without that MCP
/sc: command not installed	Log warning, continue without SuperClaude
Disk space < 500MB	ABORT (abort condition #7)
Test count decreased	ABORT — potential test file deletion