name: health-check description: "Run a comprehensive codebase health check — tests, linting, coverage, complexity, and dependency audit. Compare with previous runs to detect regressions." license: MIT compatibility: "Requires git." metadata: author: specrails version: "1.0"
Run a full health check for this project (read from CLAUDE.md or package.json): detect available tools, execute each quality check, compare results against the previous run, detect regressions, compute a health grade, and store a snapshot for future comparison.
Input: $ARGUMENTS — optional flags:
--since <date>— use the report from this date (ISO format: YYYY-MM-DD) as the comparison baseline instead of the most recent--only <checks>— comma-separated subset to run. Valid values:tests,coverage,lint,complexity,deps,perf,static--save— always save the snapshot even when--onlyis used (default: skip save for partial runs)
Phase 0: Argument Parsing
Parse $ARGUMENTS to set runtime variables.
Variables to set:
COMPARE_DATE— string (ISO date) or empty string. Default:""(use most recent report).CHECKS_FILTER— array of check names or the string"all". Default:"all".SAVE_SNAPSHOT— boolean. Default:truewhenCHECKS_FILTER="all",falsefor partial runs unless--saveis present.
Parsing rules:
- Scan
$ARGUMENTSfor--since <date>. If found, setCOMPARE_DATE=<date>. Strip from arguments. - Scan for
--only <checks>. If found:- Split
<checks>on commas to produce an array. - Validate each entry against the allowed set:
tests,coverage,lint,complexity,deps,perf,static. - If any unknown value is found: print
Error: unknown check "<value>". Valid checks: tests, coverage, lint, complexity, deps, perf, staticand stop. - Set
CHECKS_FILTER=<validated-array>. - Set
SAVE_SNAPSHOT=false(partial run — snapshot may be incomplete).
- Split
- Scan for
--save. If found, setSAVE_SNAPSHOT=trueregardless ofCHECKS_FILTER.
Print active configuration:
Running checks: <all | comma-separated list> | Static analysis: <enabled|disabled> | Comparing to: <COMPARE_DATE or "latest">
Phase 1: Toolchain Detection
Detect available tools for each check category. Run all detections simultaneously (in parallel). For each category, try tools in the order listed — use the first one found.
If CHECKS_FILTER is not "all", skip detection for categories not in the filter.
For each category, set two variables:
TOOL_<CHECK>— the tool name or command string (e.g.,"jest","eslint")TOOL_<CHECK>_AVAILABLE— boolean (true/false)
Detection sequences:
tests: Try in order:
jest,vitest,mocha,pytest,go test,cargo test,rspec,dotnet test. If none found, check whetherauto-detect from CLAUDE.md or package.json scriptsprovides a test command — use it as fallback and setTOOL_TESTS="ci-commands".coverage: Try in order:
nyc,c8,pytest-cov,coverage(Python),go test -cover,cargo tarpaulin,lcov.lint: Try in order:
eslint,pylint,flake8,ruff,golangci-lint,rubocop,cargo clippy.complexity: Try in order:
lizard,radon,gocyclo,plato. If none found, setTOOL_COMPLEXITY_AVAILABLE=false— complexity will be estimated from linter output if lint ran.deps: Try in order:
npm audit,pip-audit,govulncheck,cargo audit,bundle audit.perf: Look for a performance entry point at these paths in order:
scripts/perf.shscripts/benchmark.sh- A
"perf"or"benchmark"script key inpackage.json - A
perforbenchmarktarget inMakefile
If found, set
TOOL_PERF=<path-or-command>andTOOL_PERF_AVAILABLE=true. Otherwise setTOOL_PERF_AVAILABLE=false.
Detection summary table (print after all probes complete):
| Category | Available | Tool |
|------------|-----------|-------------------|
| tests | yes/no | <tool or N/A> |
| coverage | yes/no | <tool or N/A> |
| lint | yes/no | <tool or N/A> |
| complexity | yes/no | <tool or N/A> |
| deps | yes/no | <tool or N/A> |
| perf | yes/no | <tool or N/A> |
| static | yes | ai-analysis |
The static check is always available (AI-assisted analysis requires no external tools).
Phase 2: Load Previous Report
Read .claude/health-history/ to find the comparison baseline.
Variables to set:
PREV_REPORT_PATH— absolute file path ornullIS_FIRST_RUN— booleanPREV_REPORT— parsed JSON object ornull
Logic:
Check whether
.claude/health-history/exists and contains.jsonfiles.- If directory is absent or empty: set
IS_FIRST_RUN=true,PREV_REPORT_PATH=null,PREV_REPORT=null. Print:First run — no previous report found. Regression comparison is not available.Proceed.
- If directory is absent or empty: set
If reports exist and
COMPARE_DATEis empty: select the most recently modified.jsonfile.If
COMPARE_DATEis set: find the report whose filename date component is closest toCOMPARE_DATE(without exceeding it). If no report matches within 7 days, print:Warning: no report found near <COMPARE_DATE>. Falling back to most recent.Then use the most recent.Set
IS_FIRST_RUN=false,PREV_REPORT_PATH=<path>, load file content intoPREV_REPORT.Print one line:
- First run:
Baseline: first run (no comparison) - Report found:
Comparing to: <YYYY-MM-DD> (<short-sha from filename>)
- First run:
Phase 3: Run Checks
Run checks sequentially in this order: tests, coverage, lint, complexity, deps, perf. Sequential execution avoids resource contention that would skew timing and coverage metrics.
For each check, follow this pattern:
Skip condition: If TOOL_<CHECK>_AVAILABLE=false OR check is excluded by CHECKS_FILTER:
- Set
RESULT_<CHECK> = { status: "skipped", tool: null, metrics: null } - Print
<check>: SKIPPED - Continue to next check.
Run: Execute the tool with the command shown below. If the tool exits non-zero: set status: "fail", capture the error message, record whatever partial metrics are available, and continue — do NOT abort the command.
Store: Set RESULT_<CHECK> to a structured object with status, tool, and metrics fields.
Check: tests
Run command (first tool that matches):
jest:jest --json 2>/dev/null— parse JSON stdout fornumPassedTests,numFailedTests,numPendingTests,testResults[].durationvitest:vitest run --reporter=json 2>/dev/null— parse JSON for equivalent fieldsmocha:mocha --reporter json 2>/dev/null— parsestatsobjectpytest:pytest --tb=no -q 2>&1— extract pass/fail/skip counts from summary linego test:go test ./... -v 2>&1— count--- PASS,--- FAIL,--- SKIPlinescargo test:cargo test 2>&1— parsetest result:summary linerspec:rspec --format json 2>/dev/null— parse JSONdotnet test:dotnet test --logger "console;verbosity=normal" 2>&1— extract summaryci-commandsfallback: runauto-detect from CLAUDE.md or package.json scriptsand extract pass/fail counts from output using best-effort parsing
Metrics to extract: tests_total, tests_passed, tests_failed, tests_skipped, pass_rate (0.0–100.0), duration_seconds.
Set RESULT_TESTS.
Check: coverage
Run command:
nyc/c8:nyc report --reporter=text-summary 2>/dev/nullorc8 report --reporter=text-summary 2>/dev/null— parse "Statements" or "Lines" coverage percentagepytest-cov: re-run aspytest --cov --cov-report=term-missing -q 2>&1or read.coverageviacoverage report 2>/dev/nullcoverage(Python):coverage report 2>/dev/null— extractTOTALline percentagego test -cover:go test -cover ./... 2>&1— extractcoverage: N.N%from each package, compute meancargo tarpaulin:cargo tarpaulin --out Stdout 2>/dev/null— extract coverage percentagelcov:lcov --summary coverage.info 2>/dev/null— extract lines-found/lines-hit
Metrics to extract: coverage_pct (float), coverage_type ("line" / "branch" / "statement").
Set RESULT_COVERAGE.
Check: lint
Run command:
eslint:eslint . --format json 2>/dev/null— countseverity: 2(errors) andseverity: 1(warnings) across all results; count files analyzedpylint:pylint --output-format json <src-dir-or-.> 2>/dev/null— count messages by type (error,warning)flake8:flake8 --format default . 2>&1— count lines withEprefix (errors) vsWprefix (warnings)ruff:ruff check --output-format json . 2>/dev/null— parse JSON array, count by severitygolangci-lint:golangci-lint run --out-format json 2>/dev/null— count issues by severityrubocop:rubocop --format json 2>/dev/null— parse offenses by severitycargo clippy:cargo clippy --message-format json 2>/dev/null— count"level":"error"and"level":"warning"messages
Metrics to extract: lint_errors, lint_warnings, lint_files_checked. Compute lint_score = max(0, 100 - lint_errors * 5 - lint_warnings * 1).
Set RESULT_LINT.
Check: complexity
If TOOL_COMPLEXITY_AVAILABLE=false:
- If
RESULT_LINTis available (lint ran and has output): setcomplexity_source: "estimated"— use Claude's reasoning to estimate complexity signals from lint output (e.g., complexity-related lint rules fired). Set numeric metrics tonull. - Otherwise: set
complexity_source: "unavailable", all metricsnull,status: "skipped".
Run command (if tool available):
lizard:lizard . --csv 2>/dev/null— parse CSV, compute average CCN, max CCN, count functions with CCN > 10radon:radon cc . -a -j 2>/dev/null— parse JSON, useaverage_complexityfield; count items with complexity > 10gocyclo:gocyclo -over 10 . 2>/dev/nullandgocyclo -avg . 2>/dev/null— extract average and functions exceeding thresholdplato:plato -r -d /tmp/plato-report . 2>/dev/null && cat /tmp/plato-report/report.json— extractsummary.average.maintainability
Metrics to extract: avg_cyclomatic_complexity (float), max_cyclomatic_complexity (int), high_complexity_functions (int, count with CCN > 10), complexity_source ("measured" / "estimated" / "unavailable").
Set RESULT_COMPLEXITY. Status is "measured" when a tool ran, "estimated" when inferred, "skipped" when neither is possible.
Check: deps
Run command:
npm audit:npm audit --json 2>/dev/null— parsevulnerabilitiesobject; count byseverityfieldpip-audit:pip-audit --format json 2>/dev/null— parse JSON array; count byfix_versionspresence and severitygovulncheck:govulncheck ./... 2>&1— extract vulnerability counts from summary; if no JSON flag, use Claude's reasoning on text outputcargo audit:cargo audit --json 2>/dev/null— parsevulnerabilities.list, count byadvisory.severitybundle audit:bundle audit check 2>&1— parse output forInsecure SourceandUnpatched versions; classify by severity from advisory text
Metrics to extract: vuln_critical, vuln_high, vuln_moderate, vuln_low, vuln_total.
Set RESULT_DEPS.
Check: perf
If TOOL_PERF_AVAILABLE=false: set RESULT_PERF = { status: "skipped", tool: null, metrics: null }, print perf: SKIPPED, continue.
Run the detected entry point. After it completes, attempt to parse its stdout for these standard keys:
p50,p50_ms,median_ms→perf_p50_msp95,p95_ms→perf_p95_msp99,p99_ms→perf_p99_ms- Any remaining numeric key-value pairs →
perf_custom
If the script output does not contain recognizable keys, set all latency fields to null and store the raw output in perf_custom.raw.
Metrics to extract: perf_p50_ms, perf_p95_ms, perf_p99_ms, perf_custom.
Set RESULT_PERF.
Phase 3 summary (print after all checks):
tests: <PASS|FAIL|SKIPPED> (<tool>)
coverage: <PASS|FAIL|SKIPPED> (<tool>)
lint: <PASS|FAIL|SKIPPED> (<tool>)
complexity: <MEASURED|ESTIMATED|SKIPPED> (<tool>)
deps: <PASS|FAIL|SKIPPED> (<tool>)
perf: <PASS|FAIL|SKIPPED> (<tool>)
Phase 3.5: Static Code Analysis
Perform static code inspection to detect issues that tools cannot surface: missing documentation, broken imports, and unused exports. These checks are language-aware and use AI-assisted code reading rather than external tooling.
Always exclude the following directories from analysis:
node_modules/,.git/,dist/,build/,vendor/,.claude/,coverage/
For each finding, record a structured object:
{ severity: "critical"|"warning"|"info", check: string, file: string, line: number, description: string, action: string }
Set STATIC_FINDINGS = [] before starting. Append each finding to this list.
Check: documentation
Scan public-facing code units (exported functions, classes, interfaces, modules) for missing documentation comments.
Language-specific rules:
- TypeScript/JavaScript: Identify functions/classes with
exportkeyword. Check whether a JSDoc block (/** ... */) appears immediately above the declaration. Missing JSDoc on exported symbols = finding. - Python: Identify top-level functions, classes, and methods. Check for a docstring (a string literal as the first statement in the body). Missing docstring on public (non-underscore-prefixed) symbols = finding.
- Ruby: Identify public methods and classes. Check for a YARD/RDoc comment (
# @param,##, or any comment block) immediately above the definition. Missing comment on public methods in lib/ = finding. - Go: Identify exported types and functions (capitalized names). Check for a doc comment (line starting with
// <ExportedName>) immediately above. Missing comment on exported symbols = finding.
Severity assignment:
critical: Main entry points (e.g.,index.ts,main.go,app.rb) with no module-level documentationwarning: Exported/public function or class with no documentationinfo: Non-exported function over 20 lines with no documentation
Scan all source files matching the detected language(s). Skip test files (*.test.*, *.spec.*, _test.go, spec/**).
Append each finding to STATIC_FINDINGS.
Check: broken_imports
Detect import/require statements that reference non-existent paths. Focus on local relative imports (skip node_modules, stdlib, installed packages — these are validated by the runtime).
Language-specific rules:
- TypeScript/JavaScript: Scan all
import ... from '...'andrequire('...')statements. For each import path starting with.or/:- Resolve the path relative to the source file.
- Check whether the resolved file exists (try: exact path,
.ts,.tsx,.js,.jsx,/index.ts,/index.js). - If no match found: add a
criticalfinding.
- Python: Scan all
from . import,from .. import, andimport Xwith relative markers. Resolve relative paths from the package root. If the resolved module file does not exist: add acriticalfinding. - Ruby: Scan
require_relative '...'statements. Resolve paths from the source file's directory (try.rbextension if missing). If the resolved path does not exist: add acriticalfinding. - Go: Skip — Go's compiler validates imports; broken imports would fail the build.
Severity: Always critical — a broken import is a runtime error.
Append each finding to STATIC_FINDINGS.
Check: unused_exports
Detect exported symbols that are never imported anywhere else in the codebase. Limit to the project source — do not flag exports intended as a library's public API (i.e., symbols re-exported from a barrel file like index.ts).
Language-specific rules:
- TypeScript/JavaScript:
- Build a list of all exported symbols: scan
export const,export function,export class,export type,export interface,export defaultacross all non-test.ts/.tsx/.jsfiles. - For each exported symbol, search the rest of the codebase for imports of that symbol by name. Exclude the barrel/index file itself.
- If no import found anywhere: add a finding with severity
warning(may be an intentional API export) unless the file is notindex.tsand not undersrc/root — in that case severity isinfo.
- Build a list of all exported symbols: scan
- Python:
- Scan for symbols listed in
__all__or public top-level definitions (non-underscore). - Search for usages across the codebase. If the symbol appears only in its definition file: add a
infofinding.
- Scan for symbols listed in
- Ruby:
- Scan for public constants, classes, and modules defined in
lib/. Search forrequireor direct references elsewhere. If a class/module inlib/is never referenced outside its own file: add awarningfinding.
- Scan for public constants, classes, and modules defined in
Severity: warning for clearly unexported contexts; info when the symbol could be part of a public API.
Append each finding to STATIC_FINDINGS.
Phase 3.5 summary
After all three static checks complete, print:
documentation: <N findings> (<N critical, N warning, N info>)
broken_imports: <N findings> (<N critical>)
unused_exports: <N findings> (<N warning, N info>)
Static findings total: <N>
Phase 4: Build Health Report
Using all RESULT_<CHECK> values and PREV_REPORT (if IS_FIRST_RUN=false), compute the final health report object.
Step 1: Compute per-metric deltas
For each numeric metric, compute delta = current_value - prev_value. If IS_FIRST_RUN=true, set all deltas to "N/A (first run)".
Delta notation convention:
- For metrics where higher is better (pass_rate, coverage_pct, lint_score): positive delta = improvement, negative delta = regression.
- For metrics where lower is better (lint_errors, vuln_*, high_complexity_functions): positive delta = regression, negative delta = improvement.
Step 2: Detect regressions
A regression is triggered when any of the following thresholds is crossed vs. the previous report:
| Check | Threshold |
|---|---|
| tests | pass_rate drops by more than 1% |
| coverage | coverage_pct drops by more than 2 percentage points |
| lint | lint_errors increases vs. previous |
| lint | lint_score drops by more than 5 points |
| complexity | high_complexity_functions increases vs. previous |
| deps | vuln_critical increases vs. previous |
| deps | vuln_high increases vs. previous |
| perf | perf_p50_ms increases by more than 10% vs. previous |
If IS_FIRST_RUN=true: set REGRESSIONS=[] (no regression detection possible on first run).
Build REGRESSIONS as a list of objects: { check, metric, previous, current, delta }.
Step 3: Assign health grade
Evaluate criteria in order from F to A; assign the first grade whose criteria are met:
| Grade | Criteria |
|---|---|
| F | Test suite fails to run (RESULT_TESTS.status = "fail") OR pass_rate < 50% |
| D | Multiple regressions detected (len(REGRESSIONS) >= 2) OR pass_rate < 80% OR vuln_critical > 2 |
| C | One regression detected OR pass_rate 80–89% OR vuln_critical > 0 |
| B | No critical regressions. Any one of: pass_rate 90–94%, OR coverage_pct 70–79%, OR lint_errors 1–5, OR vuln_high <= 2 |
| A | No regressions. pass_rate >= 95%. coverage_pct >= 80% (if measured). lint_errors == 0 (if measured). vuln_critical == 0 AND vuln_high == 0 (if measured). |
When IS_FIRST_RUN=true, regressions cannot be detected — base the grade on absolute metric thresholds only (no regression criteria apply).
When a check is SKIPPED, omit its metric from grade criteria (do not penalize for unavailable tools).
Step 4: Assemble HEALTH_REPORT
Build HEALTH_REPORT as a structured object matching the JSON storage schema exactly:
HEALTH_REPORT = {
schema_version: "1",
project: "this project (read from CLAUDE.md or package.json)",
timestamp: <ISO 8601 current datetime>,
git_sha: <full SHA from `git rev-parse HEAD` or "unknown">,
git_short_sha: <7-char SHA from `git rev-parse --short HEAD` or "unknown">,
git_branch: <branch from `git rev-parse --abbrev-ref HEAD` or "unknown">,
checks: {
tests: { status, tool, metrics: { tests_total, tests_passed, tests_failed, tests_skipped, pass_rate, duration_seconds } },
coverage: { status, tool, metrics: { coverage_pct, coverage_type } },
lint: { status, tool, metrics: { lint_errors, lint_warnings, lint_score, lint_files_checked } },
complexity: { status, tool, metrics: { avg_cyclomatic_complexity, max_cyclomatic_complexity, high_complexity_functions, complexity_source } },
deps: { status, tool, metrics: { vuln_critical, vuln_high, vuln_moderate, vuln_low, vuln_total } },
perf: { status, tool, metrics: { perf_p50_ms, perf_p95_ms, perf_p99_ms, perf_custom } }
},
static_findings: <STATIC_FINDINGS array — each: { severity, check, file, line, description, action }>,
grade: <"A"|"B"|"C"|"D"|"F">,
regressions: <REGRESSIONS array>,
comparison_report: <PREV_REPORT_PATH basename or null>
}
Phase 5: Display Report and Store Snapshot
Action 1: Display
Render the health report to the terminal using Markdown formatting:
## Codebase Health Report — this project (read from CLAUDE.md or package.json)
Date: <ISO date> | Commit: <git_short_sha> | Compared to: <previous report date or "first run">
Overall Grade: <A/B/C/D/F> (<one-line summary>)
### Test Suite [<PASS/FAIL/SKIPPED>]
Tests: N passed, N failed, N skipped (N total)
Pass rate: N% <delta: (+N%) or (-N%) or N/A (first run)>
Duration: Xs
### Code Coverage [<PASS/FAIL/SKIPPED/ESTIMATED>]
Coverage: N% <delta vs previous>
Type: line/branch/statement
### Linting [<PASS/FAIL/SKIPPED>]
Score: N/100 <delta vs previous>
Errors: N Warnings: N
### Complexity [<MEASURED/ESTIMATED/SKIPPED>]
Avg CCN: N Max CCN: N
High-complexity functions: N (>10 CCN) <delta vs previous>
### Dependencies [<PASS/FAIL/SKIPPED>]
Vulnerabilities: N critical, N high, N moderate, N low
### Performance [<PASS/FAIL/SKIPPED>]
p50: Nms p95: Nms p99: Nms <delta vs previous>
---
Regressions detected: N
<if N > 0, list each:>
- <check>: <metric> changed from X to Y (<delta>)
<if N == 0:>
No regressions detected.
---
### Static Analysis Findings [<N total: N critical, N warning, N info>]
<if STATIC_FINDINGS is empty:>
No static findings detected.
<if STATIC_FINDINGS is non-empty:>
| Severity | Check | File | Line | Finding | Action |
|----------|-------|------|------|---------|--------|
<for each finding in STATIC_FINDINGS, sorted by severity (critical first, then warning, then info):>
| <CRITICAL|WARNING|INFO> | <check> | <file> | <line> | <description> | <action> |
<if any critical findings:>
#### Critical Findings (require immediate attention)
<for each critical finding:>
**[CRITICAL] <check>** — `<file>:<line>`
> <description>
> **Action:** <action>
<if any warning findings (limit to top 5 by file breadth — prefer findings spread across more files):>
#### Warnings (recommended fixes)
<for each warning finding, up to 5:>
**[WARNING] <check>** — `<file>:<line>`
> <description>
> **Action:** <action>
<if more than 5 warning findings:>
> ... and N more warnings. Run with `--only static` to see all.
<if any info findings:>
#### Info (low-priority improvements)
Summary: N info-level findings across N files. Run with `--only static` to see full list.
For delta display: wrap positive deltas on error/failure metrics in (+N) to indicate regression; wrap negative deltas on pass-rate/coverage in (-N%) styled as improvement. For terminal rendering, use plain notation — the sign alone conveys direction.
Action 2: Store snapshot
Only store if SAVE_SNAPSHOT=true.
- Determine filename:
<YYYY-MM-DD>-<git_short_sha>.jsonwhere the date is today's ISO date. If git is unavailable, use<YYYY-MM-DD>-unknown.json. - Create
.claude/health-history/if it does not exist (idempotent — no error if already present). - Write
HEALTH_REPORTserialized as JSON to.claude/health-history/<filename>. - Print:
Stored: .claude/health-history/<filename>
Housekeeping notice
After writing (or after checking the directory if SAVE_SNAPSHOT=false), count .json files in .claude/health-history/. If count > 30, print:
Note: .claude/health-history/ has N reports. Consider pruning old ones with:
ls -t .claude/health-history/ | tail -n +31 | xargs -I{} rm .claude/health-history/{}
.gitignore suggestion
Check whether .claude/health-history appears in .gitignore (if .gitignore exists). If it does not appear, print:
Tip: health history reports are local artifacts. Add to .gitignore:
echo '.claude/health-history/' >> .gitignore