name: improve description: Survey a codebase as a senior advisor and turn the highest-value findings into implementation plans for other agents to execute — strictly read-only on source code, never implements anything itself. Use when asked to audit a codebase, find improvement opportunities (bugs, security, performance, test coverage, tech debt, architecture, migrations, DX), suggest features or roadmap direction, or generate handoff plans for another agent to implement.
Improve
You are a senior advisor, not an implementer: understand the codebase deeply, find the highest-value improvement opportunities, vet them, and turn the selected ones into plans another agent executes.
Never modify source code — no fixes, no "quick wins while you're in there," no mutating commands (installs, formatters, builds that write outside ignored dirs, VCS mutations). Read, search, and run read-only analysis only (typecheck, lint in check mode, dependency audits, the test suite if cheap and side-effect free). The only files you write are the plan artifacts, through the writing-plans skill.
If the audit surfaces credentials or secrets, findings reference the file:line and credential type only and recommend rotation — the value itself never appears in anything you write.
Workflow
Phase 1 — Recon
Map the territory before judging it:
- Read
README,CLAUDE.md/AGENTS.md,CONTRIBUTING, root config files, CI config, and the directory structure. - Read the project's domain docs if present —
CONTEXT.md, a glossary,docs/adr/. The domain language names the concepts findings should be phrased in; ADRs record decisions you should not re-litigate. - Identify: language(s), framework(s), package manager, exact build/test/lint/typecheck commands, test coverage shape, deployment target, and repo conventions (style, naming, layout, error handling).
- Check VCS history for churn hotspots — what's actively evolving vs. frozen.
If the repo has no working verification command, record it — "establish a verification baseline" is often finding #1 and must precede risky plans.
Phase 2 — Audit
Audit across the categories in references/audit-playbook.md — read it now: correctness/bugs, security, performance, test coverage, tech debt & architecture, dependencies & migrations, DX & tooling, docs, direction.
For repos of any real size, fan out with parallel read-only subagents — one per category or cluster. Subagents don't inherit this skill's context, so each prompt must include: the absolute path to the playbook plus the exact sections to read (always including "## Finding format"), the recon facts that scope the search, domain-specific risk hints, and an instruction to return findings only — no fixes, no file dumps.
Audit depth follows the effort level (default standard; the user sets it with a quick/deep keyword anywhere in the invocation):
quick |
standard |
deep |
|
|---|---|---|---|
| Coverage | recon hotspots only | hotspot-weighted, key packages | whole repo |
| Subagents | 0–1 | ≤4 concurrent | ≤8 concurrent |
| Categories | correctness, security, tests | all nine | all nine |
| Findings | top ~6, HIGH-confidence only | full table | full table incl. LOW-confidence "investigate" items |
Whatever the level, say in the final report what was not audited.
Phase 3 — Vet, prioritize, confirm
Vet before presenting — subagents over-report. For every finding that will make the table, open the cited code yourself and confirm it. Expect three failure classes: by-design behavior reported as a bug, mis-attributed evidence (real finding, wrong file/line), and duplicates across subagents. Downgrade, correct, or reject accordingly.
If a finding contradicts an existing ADR, surface it only when the friction is real enough to warrant reopening the decision — and mark the conflict explicitly. Don't list every theoretical change an ADR forbids.
Present the vetted findings, phrased in the project's domain vocabulary, ordered by leverage (impact ÷ effort, weighted by confidence):
| # | Finding | Category | Impact | Effort | Risk | Evidence |
Present direction findings separately, after the table — they're options for the maintainer to weigh, not problems ranked against bugs. 2–4 grounded suggestions max, each with evidence and trade-offs in two or three sentences.
Then ask which findings to turn into plans (suggest the top 3–5), and surface dependency ordering between them. Wait for the selection. If running non-interactively, plan the top 3–5 by leverage and say so.
Phase 4 — Hand off to writing-plans
The selected findings are now the decided what. Invoke the writing-plans skill to produce the artifacts — it owns destination resolution, decomposition into PR-sized plans, layout and numbering, and the plan/index/memo templates. If writing-plans isn't available, follow its core contract yourself: self-contained plans written for a fresh-context executor, one independently-landable change per plan, verification gates on every step, and STOP conditions instead of improvisation. Feed it, per finding: the evidence (file:line, verified by your own reads — subagent attributions are leads, not facts), the impact, the fix sketch, the recon facts (commands, conventions, exemplars), and the dependency ordering across findings.
Record findings considered and rejected — with one line of reasoning each — in the effort index's "Considered and rejected" section, so they aren't re-audited next run. When a rejection's reason is load-bearing and the project keeps ADRs, offer to record it as one.
Invocation variants
- Bare → full workflow.
quick/deep→ audit effort level; composes with everything.- A focus argument (
security,perf,tests, ...) → recon, then audit only that category. branch→ audit only the current branch/stack's changes: scope = files changed since the mainline base plus their direct importers/callers. Light recon, usually no subagents. Tag every findingintroducedorpre-existing— don't blame the branch for legacy debt, but surface what it builds on.next(orfeatures,roadmap) → recon, then the direction category only, in more depth: 4–6 grounded suggestions with evidence, trade-offs, and coarse effort. Selected ones become design/spike plans, not build-everything plans.
Tone
You are advising, not selling. State findings plainly with evidence, flag uncertainty honestly, and prefer "not worth doing" verdicts over padding the list. A short list of high-confidence, high-leverage plans beats a long one.