name: coverage-report
description: Use when the user asks to run, summarize, compare, or interpret SuperContext KG coverage metrics for a repo or fleet snapshot. Produces a deterministic coverage-run.json + coverage-run.md from the existing CLI and surfaces the smallest set of actionable findings (blocking contract flags, weakest cells, narrow next-PR recommendation). NOT for code-coverage tools like pytest-cov.
KG Coverage Report
Produces consistent reports from KG snapshots using the metrics pipeline that landed in Debate-19. Every report runs deterministic CLIs; no hand-edited numbers.
When to use
Triggers: "run coverage", "coverage report", "show metric snapshot", "what's covered for repo X", "how does the fleet score", "compare snapshot A and B", "what changed since last run", "metric drift", "is M_X partial because of Y".
When NOT to use
- The user asks about test-line-coverage (pytest-cov, jacoco) — different tool, different domain.
- The user asks about the Debate-14 metric design (definitions, weights, schema): answer from docs/evaluation/COVERAGE-METRICS-IMPLEMENTATION-PLAN.md directly without running the CLI.
- The user asks to change a metric formula or add a new metric: that's a debate, not a skill.
Pre-flight (fail fast)
Verify the three artifacts exist before running anything:
test -f source/scripts/coverage_metrics.py # metric CLI lives here
test -f source/scripts/coverage_report.py # report renderer
test -f source/kg/metrics/config.yaml # metric config
If any is missing, stop and tell the user: "Debate-19 metric infrastructure not installed; see docs/evaluation/COVERAGE-METRICS-IMPLEMENTATION-PLAN.md."
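The same check can be scripted so that the failure message is produced in one place. This is a minimal sketch, assuming it runs from the repository root; the names REQUIRED_ARTIFACTS, missing_artifacts, and preflight_message are hypothetical and not part of the existing CLI.

```python
from pathlib import Path

# Paths copied from the pre-flight list above.
REQUIRED_ARTIFACTS = (
    "source/scripts/coverage_metrics.py",  # metric CLI
    "source/scripts/coverage_report.py",   # report renderer
    "source/kg/metrics/config.yaml",       # metric config
)


def missing_artifacts(repo_root="."):
    """Return the required artifacts that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_ARTIFACTS if not (root / rel).is_file()]


def preflight_message(repo_root="."):
    """Return the stop message if anything is missing, else None."""
    missing = missing_artifacts(repo_root)
    if not missing:
        return None
    return (
        "Debate-19 metric infrastructure not installed; see "
        "docs/evaluation/COVERAGE-METRICS-IMPLEMENTATION-PLAN.md. "
        "Missing: " + ", ".join(missing)
    )
```

If preflight_message() returns a string, stop and relay it to the user; if it returns None, continue with the workflow.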
Rules
- metrics.jsonl, coverage-run.json, and coverage-run.md are generated artifacts. Never hand-edit a value or a reason string.
- If metrics.jsonl exists and is newer than entities.jsonl / facts.jsonl / coverage.jsonl, skip recomputation and re-run only the report step.
- For fleet runs always pass --expected-repos N when N is known (otherwise M_inventory falls back to "what was ingested," which is tautologically 1.0).
- For incremental fleet runs (repo added to an existing fleet without re-running build_multi_kg), check for the linker_stale contract flag on M_cross_repo_linkage; if set, the right fix is supercontext-relink, not adding more resolvers.
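The freshness rule for metrics.jsonl can be checked mechanically. The sketch below assumes "newer" means a later file modification time and that all four files sit directly in the snapshot directory; both are assumptions, and can_skip_recompute is a hypothetical helper, not an existing function.

```python
from pathlib import Path

# Input files named in the rule above.
INPUTS = ("entities.jsonl", "facts.jsonl", "coverage.jsonl")


def can_skip_recompute(snapshot_dir):
    """True when metrics.jsonl exists and is newer than every input present."""
    snap = Path(snapshot_dir)
    metrics = snap / "metrics.jsonl"
    if not metrics.is_file():
        return False
    metrics_mtime = metrics.stat().st_mtime
    # Assumption: an input file that does not exist is ignored, not treated as stale.
    inputs = [snap / name for name in INPUTS if (snap / name).is_file()]
    return all(metrics_mtime > p.stat().st_mtime for p in inputs)
```

When this returns True, go straight to the report step; otherwise recompute metrics first.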
Standard workflow
Step 1 — Snapshot
Single repo:
python -m source.scripts.build_kg --repo <repo-path> --out <snapshot-dir>
Fleet (one-shot batch build):
python -m source.scripts.build_multi_kg --repo <r1> --repo <r2> ... --out <snapshot-dir>
Fleet (incremental — adding repo N+1 to existing fleet):
python -m source.scripts.build_kg --repo <new-repo> --out <fleet-dir>/<new-repo>
python -m source.scripts.relink --snapshot-dir <fleet-dir> # refresh _fleet/cross_repo_links.jsonl
Step 2 — Compute metrics
python -m source.scripts.coverage_metrics --snapshot <snapshot-dir> --expected-repos <N>
This writes <snapshot-dir>/metrics.jsonl (one record per (repo, dimension)).
For delta against a prior run:
python -m source.scripts.coverage_metrics --compare <snapshot-A> <snapshot-B>
Step 3 — Render report
python -m source.scripts.coverage_report \
--snapshot <snapshot-dir> \
--out docs/evaluation/runs/<run-id> \
--run-id <run-id> \
--tenant <tenant-or-org> \
--expected-repos <N> \
--metric-config source/kg/metrics/config.yaml
Produces docs/evaluation/runs/<run-id>/coverage-run.json + coverage-run.md.
Step 4 — Summarize (the actual judgment step)
Read coverage-run.json directly. Write a chat-message summary that surfaces the smallest actionable set:
- Fleet score (one number)
- Blocking contract flags: any cell with linker_stale=true, or M_evidence_grounding < 1.0 on surfaced facts, or M_silent_gap > 0 on safety-critical predicates
- Coverage gaps from coverage_gaps, especially unsupported_language, no_adapter_for_known_stack, stale, and partial coverage rows
- Lowest-scoring (repo, dimension) cell with the dominant partial/n_a reason quoted verbatim
- Worst metric across the fleet (by value, not by state)
- Recommended next PR — exactly one, derived from the decision tree below
Stay terse. Do not paginate every cell; the JSON file is authoritative.
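The headline items listed above can be pulled out of the parsed JSON before writing the summary. This is a rough sketch that relies only on the fields visible in the coverage-run.json excerpt (summary.fleet_score, cells[].contract_flags, cells[].cell_score, cells[].metrics); the function name summarize and the assumption that an unmeasurable metric carries a null value are hypothetical.

```python
def summarize(run):
    """Collect headline items for the chat summary from a coverage-run dict."""
    cells = run.get("cells", [])
    # Every contract flag must appear in the summary, so gather them per cell.
    flags = sorted(
        {
            (c["repo"], c["dimension"], flag)
            for c in cells
            for flag in c.get("contract_flags", [])
        }
    )
    # Cells whose cell_score is null are unscorable and cannot be "lowest".
    scored = [c for c in cells if c.get("cell_score") is not None]
    lowest = min(scored, key=lambda c: c["cell_score"], default=None)
    # Worst metric is chosen by value, not by state; skip metrics without a value.
    metric_rows = [
        (m["value"], c["repo"], c["dimension"], name)
        for c in cells
        for name, m in c.get("metrics", {}).items()
        if m.get("value") is not None
    ]
    return {
        "fleet_score": run.get("summary", {}).get("fleet_score"),
        "contract_flags": flags,
        "lowest_cell": (
            (lowest["repo"], lowest["dimension"], lowest["cell_score"])
            if lowest
            else None
        ),
        "worst_metric": min(metric_rows, default=None),
    }
```

The returned dict is only scaffolding for the summary; the reason strings still have to be quoted verbatim from the JSON file.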
How to read MetricValue.state
| State | Meaning | Counts toward cell_score? |
|---|---|---|
| usable | Real measurement, value is meaningful | Yes |
| partial | Measured but missing inputs (e.g., one detector implemented out of three) | Yes, but flag reason |
| n_a | Cannot measure this cell at all (e.g., no anchor entities present) | No; sets cell_score = None |
A cell where one metric is n_a produces cell_score: null. That's correct behavior, not a bug — the cell is unscorable until the n_a cause is fixed.
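The n_a rule can be illustrated with a small function. This sketch uses an unweighted mean purely for illustration; the real aggregation and weights come from source/kg/metrics/config.yaml and are not reproduced here, and cell_score_sketch is a hypothetical name.

```python
def cell_score_sketch(metrics):
    """Return None if any metric is n_a; otherwise an unweighted mean of values.

    Assumption: the unweighted mean stands in for the configured weighting.
    """
    if any(m["state"] == "n_a" for m in metrics.values()):
        return None  # one unmeasurable metric makes the whole cell unscorable
    values = [m["value"] for m in metrics.values()]
    return sum(values) / len(values) if values else None


def partial_reasons(metrics):
    """List (metric name, reason) for partial metrics so they can be flagged."""
    return [
        (name, m.get("reason"))
        for name, m in metrics.items()
        if m["state"] == "partial"
    ]
```

Note that partial metrics still contribute to the score, which is why their reasons are collected separately rather than excluded.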
How to read contract flags
| Flag | Meaning | Action |
|---|---|---|
| linker_stale on M_cross_repo_linkage | _fleet/manifest.json predates per-repo snapshots OR repo_commit_sha_set mismatch | Run supercontext-relink --snapshot-dir <fleet>; do NOT propose adding resolvers |
| evidence_grounding_violation on a fact | Surfaced fact lacks bytes_ref | File a fact-level fix; do NOT propose lowering metric threshold |
| silent_gap on a safety-critical predicate | M_silent_gap > 0 on blast_radius / deploy_blockers_for input | Tool refuses if missing scope is relevant; do NOT mark cell unmeasurable |
Decision tree — narrow next-PR recommendation
Apply in order; first match wins. Output one recommendation only.
- Any linker_stale=true → "Run supercontext-relink --snapshot-dir <fleet> against the fleet directory. Re-render."
- M_evidence_grounding < 1.0 on surfaced facts → "Fix per-fact citations in the offending adapter at <adapter-path>; not a metric problem."
- Any coverage_gaps[].reason == "unsupported_language" → "Add language support or an explicit matcher/refusal rule for <language>; start with the repos in coverage_gaps and the reported sample paths."
- Any coverage_gaps[].reason == "no_adapter_for_known_stack" → "Add or wire the extractor adapter for <scope_ref.category>/<scope_ref.import_root> in <language>."
- Any stale or partial coverage_gaps[] row → "Fix the producer named by source_system so this coverage row becomes fresh and fully instrumented, preserving the reported scope_ref."
- M_cross_repo_linkage.state == "partial" with reason mentioning package_resolver → confirm PRs 9–10 (PyPI/npm resolvers) are merged in current branch; if not, point at them.
- M_extractor_opportunity.state == "partial" for a specific predicate × dim → recommend the next extractor PR per docs/graph-building/TYPED-CLIENT-EXTRACTOR-ALLOWLIST.md registry.
- M_useful_edge.state == "partial" for a specific dim → check source/kg/metrics/useful_edges.yaml for the dim's allowlist; recommend adding the missing predicate adapter.
- M_dimension_classification.value < 0.8 → unclassified LOC; recommend adding a framework signature to source/kg/languages/<lang>/dimension_rules.yaml.
- M_identity_health.value < 0.95 → likely an entity kind without per-kind URN support; recommend extending urn_for_kind in source/kg/core/models.py.
- Otherwise → "No blocking findings. Lowest cell is <repo>/<dim> at <score>; root cause is <reason> (non-blocking)."
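The "first match wins" ordering can be expressed as an ordered list of (predicate, message) pairs. The sketch below covers only the first four rules and a shortened fallback; the remaining rules would follow the same pattern. It assumes the M_evidence_grounding value in each cell already reflects surfaced facts only, and all helper names are hypothetical.

```python
def _any_flag(run, flag):
    return any(flag in c.get("contract_flags", []) for c in run.get("cells", []))


def _grounding_below_one(run):
    for c in run.get("cells", []):
        value = c.get("metrics", {}).get("M_evidence_grounding", {}).get("value")
        if value is not None and value < 1.0:
            return True
    return False


def _gap_with_reason(run, reason):
    return any(g.get("reason") == reason for g in run.get("coverage_gaps", []))


# Ordered rules: the first predicate that returns True decides the output.
RULES = [
    (lambda r: _any_flag(r, "linker_stale"),
     "Run supercontext-relink --snapshot-dir <fleet> against the fleet directory. Re-render."),
    (_grounding_below_one,
     "Fix per-fact citations in the offending adapter at <adapter-path>; not a metric problem."),
    (lambda r: _gap_with_reason(r, "unsupported_language"),
     "Add language support or an explicit matcher/refusal rule for <language>."),
    (lambda r: _gap_with_reason(r, "no_adapter_for_known_stack"),
     "Add or wire the extractor adapter."),
]


def recommend(run):
    """Return exactly one recommendation: the message of the first matching rule."""
    for matches, message in RULES:
        if matches(run):
            return message
    return "No blocking findings."  # shortened fallback; see the last rule above
```

Keeping the rules in a list makes the priority order explicit: a run with both linker_stale and an unsupported_language gap yields only the relink recommendation.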
Output shape — coverage-run.json (excerpt)
{
  "run_id": "fleet-2026-05-18",
  "tenant": "mercury-ml",
  "repo_count_expected": 12,
  "repo_count_indexed": 11,
  "summary": {"fleet_score": 0.71, "coverage_gap_count": 1},
  "coverage_gaps": [
    {
      "repo": "team-api",
      "language": "java",
      "predicate": "LANGUAGE_SUPPORT",
      "state": "uninstrumented",
      "reason": "unsupported_language",
      "file_count": 4
    }
  ],
  "cells": [
    {
      "repo": "team-api",
      "dimension": "backend",
      "cell_score": 0.68,
      "contract_flags": ["linker_stale"],
      "commit_sha_set": ["abc123..."],
      "metrics": {
        "M_cross_repo_linkage": {"value": 0.42, "state": "partial", "reason": "package_resolver() returns None for Python"},
        ...
      }
    }
  ]
}
The chat summary should NOT enumerate all per-cell metrics. Quote 2–3 dominant partial/n_a reasons.
Skill is over when
- A coverage-run.md exists at docs/evaluation/runs/<run-id>/, AND
- The chat-message summary names exactly one recommended next action (or "no blocking findings"), AND
- All contract flags from the report appear in the summary.
If the user asked for delta (snapshot-A vs snapshot-B), also include: which metrics moved by ≥0.05 and the direction.
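The ≥0.05 movement filter is simple to sketch. The example below assumes each snapshot has been reduced to a flat mapping of metric name to value, which is a hypothetical shape; the actual output of the --compare mode is not shown in this document. Rounding the delta guards against floating-point noise right at the threshold.

```python
def moved_metrics(before, after, threshold=0.05):
    """Return (name, delta, direction) for metrics that moved by >= threshold."""
    moved = []
    for name in sorted(before.keys() & after.keys()):
        a, b = before[name], after[name]
        if a is None or b is None:
            continue  # no value on one side: no numeric delta to report
        delta = round(b - a, 6)  # e.g. 0.47 - 0.42 is slightly under 0.05 in floats
        if abs(delta) >= threshold:
            moved.append((name, delta, "up" if delta > 0 else "down"))
    return moved
```

Metrics present in only one snapshot are skipped here; whether to report them as added or removed is a separate judgment call.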
Verification (when modifying this skill or its underlying CLI)
python -m compileall -q source
python -m unittest tests.metrics.test_report tests.metrics.test_persistence tests.test_packaging_metadata
python -m unittest discover -s tests
Related
- docs/evaluation/COVERAGE-METRICS-IMPLEMENTATION-PLAN.md: Debate-19 contract; 11 metrics; CellMetrics schema
- docs/evaluation/COVERAGE-METRICS-INCREMENTAL-AND-LINKING-GAPS.md: when linker_stale is the right diagnosis
- source/kg/metrics/config.yaml: metric weights, freshness windows, contract flag thresholds
- source/kg/metrics/useful_edges.yaml: per-dim allowlist driving M_useful_edge
- source/kg/metrics/tool_predicates.yaml: MCP tool → predicate map driving M_meta_coverage