bench-triage

name: bench-triage
metadata:
  internal: true
description: >
  Triage one nu-only fixture from tests/external/snapshots/diff/nu-only.json by reading the spec, then drive its verdict to match-error, match-clean, or nu-over by either fixing markuplint or recording an excluded-ids.json entry. The core operation of the nu-validator coverage benchmark. Use when reducing the nu-only backlog, when checking a coverage-claim ("markuplint misses X" / "over-detects Y") against the bench, or when classifying a specific fixture. Trigger keywords: nu-only, ml-only, coverage gap, bench triage, verdict, match-error, match-clean, nu-over, excluded-ids, declare nu over-detection, claim audit, audit fixture, reduce nu-only, mark-up valid per spec, spec-cited exclusion.

nu-validator Bench Triage Skill

Take one nu-only fixture and drive its verdict to a confirmed state. Repeat to reduce the nu-only backlog.

Prerequisite: the bench must be runnable on this machine. If commands in this skill fail with "no snapshots found" / Docker errors, run the bench-setup skill first.

Verdict definitions

Verdict	Meaning
`match-error`	Both tools detected a violation.
`match-clean`	Neither detected a violation (and no nu errors were excluded).
`ml-only`	Only markuplint detected a violation.
`nu-only`	Only nu-validator detected a violation, and `excluded-ids.json` does not cover every message.
`nu-over`	Only nu-validator detected a violation, but every message is covered by `excluded-ids.json`.

nu-only is what this skill drives. ml-only is informational and not this skill's target; if you need to understand it, see "Note: ml-only readings (informational)" near the end.

Step 1: Pick a fixture

Slice tests/external/snapshots/diff/nu-only.json (entries[] with category and path) by path or category. When auditing a coverage claim instead, slice coverage.json by the claim's pattern and read each entry's verdict.
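A minimal sketch of the slicing step, assuming only what is stated above: the file holds entries[] and each entry has category and path. The interface names, the sliceNuOnly helper, and the sample paths and categories are hypothetical; load the real JSON however the surrounding tooling prefers.

```typescript
// Hypothetical shape: the skill only states that entries[] carry `category` and `path`.
interface NuOnlyEntry {
  category: string;
  path: string;
}

interface NuOnlyDiff {
  entries: NuOnlyEntry[];
}

// Keep entries matching a path prefix and/or an exact category.
// An omitted filter field matches everything.
function sliceNuOnly(
  diff: NuOnlyDiff,
  filter: { pathPrefix?: string; category?: string },
): NuOnlyEntry[] {
  return diff.entries.filter(
    (e) =>
      (filter.pathPrefix === undefined || e.path.startsWith(filter.pathPrefix)) &&
      (filter.category === undefined || e.category === filter.category),
  );
}

// Illustrative data only; real values come from nu-only.json.
const sampleDiff: NuOnlyDiff = {
  entries: [
    { category: "url", path: "html/attributes/url/example-novalid.html" },
    { category: "picture", path: "html/elements/picture/example-novalid.html" },
    { category: "url", path: "html/elements/a/href/example-novalid.html" },
  ],
};

const byPath = sliceNuOnly(sampleDiff, { pathPrefix: "html/elements/picture/" });
const byCategory = sliceNuOnly(sampleDiff, { category: "url" });
```

Pick one entry from the slice and carry its path through the following steps.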

Step 2: Read nu-validator messages for the fixture

Read tests/external/snapshots/nu-validator/<path>.json (nuValidator.messages[]). The raw tree is gitignored — regenerate with yarn bench:update --target nu if missing. Each message has a stable id (nv-<hex12>, optionally -N on collisions); that's the key for excluded-ids.json.
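When copying an id into excluded-ids.json, a quick shape check catches paste errors. The sketch below assumes that "hex12" means exactly 12 lowercase hexadecimal digits and that the collision suffix is a hyphen followed by a decimal counter; neither detail is confirmed here, so adjust the pattern to what the snapshots actually contain.

```typescript
// Assumption: nv-<12 lowercase hex digits>, optionally followed by -<decimal counter>.
const NU_MESSAGE_ID = /^nv-[0-9a-f]{12}(?:-\d+)?$/;

// Returns true when the string has the expected id shape.
function isNuMessageId(id: string): boolean {
  return NU_MESSAGE_ID.test(id);
}

// Illustrative ids only, not taken from a real snapshot.
const plainIdOk = isNuMessageId("nv-0123456789ab");
const collisionIdOk = isNuMessageId("nv-0123456789ab-2");
const tooLongOk = isNuMessageId("nv-0123456789abc");
const missingPrefixOk = isNuMessageId("0123456789ab");
```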

Step 3: Read markuplint output for the fixture

Read tests/external/snapshots/markuplint/<path>.json (markuplint.violations[]). For a nu-only fixture, expect zero violations here. If markuplint already detected something, the verdict computation may be stale — re-run yarn bench:compare.

Step 4: Read the spec

Open the raw HTML at tests/external/validator/tests/<path> and identify the relevant spec paragraph. Authoritative sources:

HTML LS — https://html.spec.whatwg.org/multipage/
DOM LS — https://dom.spec.whatwg.org/
URL LS — https://url.spec.whatwg.org/
WAI-ARIA 1.3 — https://www.w3.org/TR/wai-aria-1.3/
ARIA in HTML — https://w3c.github.io/html-aria/
Microdata (HTML LS §5.7) — https://html.spec.whatwg.org/multipage/microdata.html

MDN is not authoritative: quote WHATWG / W3C when they disagree. Living standards change; a recent normative revision often explains why nu (slow to pick up spec changes) and markuplint (which tracks @markuplint/html-spec) drift apart.

Quote the exact sentence verbatim into the issue / PR / excluded-ids.json#reason — never a paraphrase.

Step 5: Decide and act

For a nu-only fixture, the spec verdict gives a binary action:

Spec on the markup	Conclusion	Action
Forbidden (HTML LS / ARIA / URL LS)	nu correct, markuplint has a coverage gap.	Add or extend a markuplint rule. Open an Issue if the work is non-trivial. After fix, `yarn bench:update:ml` — fixture should flip to `match-error`.
Forbidden, but spec is outside markuplint's reference scope (e.g. WICG draft, vendor extension)	nu is enforcing a spec that markuplint deliberately does not track. Open an Issue for future coverage AND record the messages in `excluded-ids.json` so the bench can focus on actionable HTML LS gaps.	Issue + `excluded-ids.json` pattern. Reason field must explicitly note `deferred-WICG / deferred-<spec>` so future readers can distinguish from regular nu-over. Tracking Issue # MUST be in the reason.
Permitted by HTML LS	nu over-detecting.	Record in `excluded-ids.json` (per-ID or pattern; see below). After edit, `yarn bench:compare` — fixture should flip to `nu-over`.
Ambiguous / under discussion	Spec issue or PR ongoing.	Note the spec-tracker URL in `snapshots/diff/summary.md` follow-up. Do not silently close.

markuplint's reference scope is HTML Living Standard + WAI-ARIA + URL Living Standard. Anything nu enforces from a WICG draft, a vendor extension, or any other spec outside that set is treated as deferred coverage — eligible for excluded-ids.json only if an Issue tracks the future implementation.
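The decision table can be read as a lookup from the spec finding to a plan. The sketch below is one way to encode it; the type names, field names, and the planFor helper are hypothetical, and the verdict listed for the ambiguous row (the fixture stays nu-only) is an assumption, since the table only says not to close it silently.

```typescript
type SpecStatus = "forbidden" | "forbidden-out-of-scope" | "permitted" | "ambiguous";

interface TriagePlan {
  conclusion: string;
  fixMarkuplint: boolean;
  issue: "if-non-trivial" | "required" | "none";
  recordExclusion: boolean;
  verdictAfter: "match-error" | "nu-over" | "nu-only";
}

const PLANS: Record<SpecStatus, TriagePlan> = {
  forbidden: {
    conclusion: "nu correct, markuplint has a coverage gap",
    fixMarkuplint: true,
    issue: "if-non-trivial",
    recordExclusion: false,
    verdictAfter: "match-error",
  },
  "forbidden-out-of-scope": {
    conclusion: "nu enforces a spec markuplint deliberately does not track",
    fixMarkuplint: false,
    issue: "required", // the tracking Issue number must appear in the reason
    recordExclusion: true, // reason notes deferred-<spec>
    verdictAfter: "nu-over",
  },
  permitted: {
    conclusion: "nu over-detecting",
    fixMarkuplint: false,
    issue: "none",
    recordExclusion: true,
    verdictAfter: "nu-over",
  },
  ambiguous: {
    conclusion: "spec issue or PR ongoing; note the spec-tracker URL",
    fixMarkuplint: false,
    issue: "none",
    recordExclusion: false,
    verdictAfter: "nu-only", // assumption: nothing changes until the spec settles
  },
};

function planFor(status: SpecStatus): TriagePlan {
  return PLANS[status];
}

const permittedPlan = planFor("permitted");
const deferredPlan = planFor("forbidden-out-of-scope");
```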

When the spec disagrees with both tools (recent normative revision neither has adopted), open one Issue per tool but pursue only the markuplint side from this repo — nu upstream reports are not part of this project's workflow.

How to record nu over-detection

Follow the existing entry shapes in excluded-ids.json (per-id entries[] keyed by the nu message id; message-substring patterns[] keyed by messageContains). Every entry needs a reason containing the verbatim spec quote, plus addedAt / addedBy.

The verdict flips to nu-over only when every active nu message on the fixture is covered. Partial coverage stays nu-only.

When the same diagnostic hits many fixtures, use patterns[] instead of dozens of per-id entries. specUrl is required on patterns — they are the most load-bearing exclusion. If you cannot cite a paragraph, use a per-id entry.

Patterns trade compactness for stability: per-id entries pin the nu message-ID hash, so a wording shift in nu surfaces as a stale entry on the next bench refresh (the entry stops matching and the fixture reappears in nu-only). Patterns key on message text, so a wording shift silently drops them out of effect. For deferred-spec batches (10+ fixtures driven by an Issue), prefer patterns but record the expected nu-over headcount in the reason field so pre-release bench refreshes can spot drift.
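To make the "every active message must be covered" rule concrete, here is a hedged sketch of the two entry shapes and the coverage check. Field names follow the prose above (id, messageContains, reason, specUrl, addedAt, addedBy); the exact JSON layout, the `message` field on a nu message, and the helper names are assumptions, and this is not the bench's actual comparison code.

```typescript
interface PerIdEntry {
  id: string; // nu message id
  reason: string; // verbatim spec quote
  addedAt: string;
  addedBy: string;
}

interface PatternEntry {
  messageContains: string;
  specUrl: string; // required on patterns
  reason: string;
  addedAt: string;
  addedBy: string;
}

interface ExcludedIds {
  entries: PerIdEntry[];
  patterns: PatternEntry[];
}

// Hypothetical minimal view of one nuValidator.messages[] element.
interface ActiveNuMessage {
  id: string;
  message: string;
}

function isCovered(msg: ActiveNuMessage, excluded: ExcludedIds): boolean {
  return (
    excluded.entries.some((e) => e.id === msg.id) ||
    excluded.patterns.some((p) => msg.message.includes(p.messageContains))
  );
}

// Partial coverage stays nu-only; only full coverage flips to nu-over.
function verdictAfterExclusion(
  messages: ActiveNuMessage[],
  excluded: ExcludedIds,
): "nu-over" | "nu-only" {
  const allCovered = messages.length > 0 && messages.every((m) => isCovered(m, excluded));
  return allCovered ? "nu-over" : "nu-only";
}

// Illustrative ids, dates, and placeholder reasons; not real bench data.
const fixtureMessages: ActiveNuMessage[] = [
  { id: "nv-000000000001", message: "Fragment is not allowed for data: URIs according to RFC 2397." },
  { id: "nv-000000000002", message: "Expected a space character." },
];
const patternOnly: ExcludedIds = {
  entries: [],
  patterns: [
    {
      messageContains: "Fragment is not allowed for data: URIs according to RFC 2397",
      specUrl: "https://url.spec.whatwg.org/",
      reason: "<verbatim spec quote goes here>",
      addedAt: "2025-01-01",
      addedBy: "example",
    },
  ],
};
const withPerId: ExcludedIds = {
  ...patternOnly,
  entries: [
    { id: "nv-000000000002", reason: "<verbatim spec quote goes here>", addedAt: "2025-01-01", addedBy: "example" },
  ],
};

const partialVerdict = verdictAfterExclusion(fixtureMessages, patternOnly); // "nu-only"
const fullVerdict = verdictAfterExclusion(fixtureMessages, withPerId); // "nu-over"
```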

After editing excluded-ids.json:

yarn bench:compare
yarn bench:generate-spec
yarn bench:report

Step 6: Pin against `--concurrency 1` before filing

nu-validator is non-deterministic under parallel load. Before landing a coverage Issue or an excluded-ids.json entry, confirm the verdict survives a deterministic run:

yarn bench:update --target nu --concurrency 1 --filter '<the/fixture>'
yarn bench:compare

If the verdict flipped, the original observation was parallel-run flicker, not a real signal.
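To see which messages flickered, compare the message ids from the earlier parallel run with those from the `--concurrency 1` run. This sketch assumes you kept a copy of the earlier ids before refreshing the snapshot; the helper name and sample ids are hypothetical.

```typescript
interface IdDiff {
  onlyInParallel: string[];
  onlyInSerial: string[];
}

// Set difference in both directions, preserving input order.
function diffMessageIds(parallelIds: string[], serialIds: string[]): IdDiff {
  const parallel = new Set(parallelIds);
  const serial = new Set(serialIds);
  return {
    onlyInParallel: parallelIds.filter((id) => !serial.has(id)),
    onlyInSerial: serialIds.filter((id) => !parallel.has(id)),
  };
}

// Illustrative ids only.
const flicker = diffMessageIds(
  ["nv-00000000000a", "nv-00000000000b"],
  ["nv-00000000000a"],
);
const stable = flicker.onlyInParallel.length === 0 && flicker.onlyInSerial.length === 0;
```

Ids that appear only in the parallel run are candidates for flicker; do not file against them until they reproduce in the serial run.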

Step 7: Fact-check the Issue body before filing

When the verdict points at "open or extend an Issue" and the Issue body cites specific repository assets — file paths, package names, spec data files, helper libraries — every reference MUST be verified to exist in the current tree before the Issue is filed. Implementers read the Issue first; a wrong path sends them to a dead end.

Required pre-filing checks:

File paths: every quoted path resolves (ls <path> or open in editor).
"Add new file" claims: confirm the file is actually missing (find packages/... -name '<pattern>'). If a file with the same role already exists, change the wording to "extend" instead of "add" and list the existing files explicitly.
Recommended npm libraries: package exists and is currently maintained (npm view <pkg> or check the npm/registry page). Do not write "(or similar)" placeholders.
Spec section numbers: dereference the cited URL once before pasting; section numbers shift between drafts.
bench-xref registration: when the Issue is primary (i.e., bench fixtures back its claim), add a mapping in tests/external/bench/issue-xref.config.ts so bench-xref keeps the body in sync on each release-prep cycle.

Skipping any of these is the same failure mode as filing without a spec quote: it pollutes the inventory with stale or false references that other agents and humans will then act on. Treat it as a hard gate, not a polish step.

Audit log of message-substring decisions

Each row is a conclusion reached by reading the cited paragraph directly. Do not add a row without a verbatim spec quote and source URL.

Message substring	Verdict	Source
`Fragment is not allowed for data: URIs according to RFC 2397`	nu over-detection — excluded in `patterns[]`	URL LS §4.3: a `valid URL string` may end in a fragment for any scheme.
`must be less than or equal to` (meter / progress / input min/max)	nu correct — NOT excluded	HTML LS §4.10.14: "minimum ≤ value ≤ maximum; minimum ≤ low ≤ maximum (if low is specified); …" — explicit `must`.
`URL includes credentials`	nu correct — NOT excluded	URL LS §1.1 `invalid-credentials`. HTML LS requires a valid URL string, so a URL validation error is a conformance error.
`Expected a slash` (special-scheme URLs missing `//`)	nu correct — NOT excluded	URL LS `special-scheme-missing-following-solidus`.
`Backslash used as path segment delimiter`	nu correct — NOT excluded	URL LS `invalid-reverse-solidus`.
`Illegal character in …` (path / fragment / domain / port)	nu correct — NOT excluded	URL LS `invalid-URL-unit` covers non-URL code points and malformed percent-encoding.
`Windows drive letter uses …`	nu correct — NOT excluded	URL LS `file-invalid-Windows-drive-letter` / `file-invalid-Windows-drive-letter-host`.
`Expected a space character` / `Expected an unquoted URL` (`<meta http-equiv="refresh">` content)	nu over-detection — excluded per-ID in `entries[]`	HTML LS §4.2.5.3 Refresh grammar: clause 3.2 makes whitespace after `;`/`,` optional; clause 3.3 alt 2 accepts any valid URL. nu's wording overlaps with legitimate refresh errors, so substring-match is unsafe — per-ID.
`<script type=importmap>` scope key that fails a "looks-like-URL" check (e.g. `scope1_not_url`)	nu over-detection — excluded per-ID in `entries[]`	HTML LS § Sorting and normalizing scopes step 2: scopePrefix is URL-parsed with baseURL. Relative strings parse successfully against any base, so step 3's "URL parse failure" warning never fires. nu requires the key to look URL-like (scheme or `/`/`./`/`../`); spec doesn't.
`<script type=module … defer>` or any non-external script with `blocking`	nu correct — markuplint coverage extended in `spec.script.jsonc`	HTML LS §4.12.1 attribute applicability table: `defer` is "Yes" only for external classic; `blocking` is "Yes" only for external classic + external module. Other script kinds (any module + defer, inline scripts + blocking, importmap, speculation rules, data block) are "·" (not applicable). markuplint now flags these via `invalid-attr` instead of relying on `ineffective-attr`'s warning.
`<script>` with `crossorigin`/`referrerpolicy`/`fetchpriority`/`src`/`nomodule` on importmap / speculationrules / data block, or `fetchpriority` on inline scripts	nu correct — markuplint coverage extended in `spec.script.jsonc`	HTML LS §4.12.1: "Which other attributes may be specified on a given script element is determined by the following table" — the table permits `crossorigin`/`referrerpolicy` only for classic + module scripts (external or inline), `fetchpriority`/`integrity`/`blocking` only for external classic + external module, `nomodule` only for classic; `src` "must only be specified for classic scripts and JavaScript module scripts". Classic-script detection enumerates the 16 JavaScript MIME type essence strings (mimesniff) plus omitted/empty `type`, because "data block" (any other `type` value) is not expressible as a finite negative selector list. The old `:not([type='importmap' i])`-style conditions could not catch data blocks. The global-attr override that #3648 reverted is safe now: the per-element merge in `ml-spec` `get-attr-specs-spec.ts` (`{...current, ...attr}`) preserves the enum type when the element entry specifies only `condition` — pinned by tests `invalid-attr-issue-3631-032`/`-033`.
`<source srcset="…w">` inside `<picture>` without a `sizes` attribute (and no lazy fallback)	nu correct — markuplint coverage extended in `srcset-sizes-constraint` Check 5b	HTML LS § source: with width descriptors, `sizes` "may" be present but must be present unless the following sibling `<img>` supports auto-sizes (`loading="lazy"`). Previously the rule's Check 5 only handled `<img>`.
`<img srcset="http: 1x">` and similar URL-LS-invalid candidate URLs	nu correct — markuplint coverage extended in `@markuplint/types` Srcset	URL LS rejects bare special-scheme fragments missing `//` (`special-scheme-missing-following-solidus`). The Srcset checker now parses each candidate's URL via WHATWG URL with a dummy `https://example.com/` base.
`sizes="-1px"` / `sizes="(min-width: 600px) -100px"` and similar negative `<source-size-value>`	nu correct — markuplint coverage extended in `@markuplint/types` SourceSizeList	HTML LS § sizes: `<source-size-value>` must be a non-negative `<length>`. css-tree's `<length>` grammar accepts negatives, so a post-syntax regex catches them at boundaries (start-of-list, after `,`, after the `)` closing a `<media-condition>`).
`A "source" element that has a following sibling "source" element or "img" element with a "srcset" attribute must have a "media" attribute and/or "type" attribute` / `Value of "media" attribute here must not be "all"`	nu correct — markuplint coverage extended in `srcset-sizes-constraint` Check 6	HTML LS § the source element: "When a source element has a following sibling source element or img element with a srcset attribute specified, it must have at least one of the following: A media attribute specified with a value that, after stripping leading and trailing ASCII whitespace, is not the empty string and is not an ASCII case-insensitive match for the string 'all'. A type attribute specified." An always-matching first source shadows the following candidates. Applies even to a srcset-less source. Flipped 7 `picture/always-matching-*-novalid` fixtures nu-only → match-error.

The remaining nu-only bulk (URL parsing) is not for exclusion; it represents real markuplint gaps for future coverage work. Any substring not in the table is unclassified — do not exclude without first adding a row with a spec quote.

Note: ml-only readings (informational)

When you encounter ml-only while triaging, classify by both spec verdict and markuplint rule intent:

Rule intends strict spec-conformance + spec forbids the markup → markuplint correct, nu lax. Informational only (no upstream nu reports from this repo).
Rule intends strict spec-conformance + spec permits the markup → markuplint false positive. Fix the rule.
Rule intends to be stricter than the spec by design (best-practice / anti-pattern, e.g. flagging spec-permitted but discouraged markup) → working as intended. nu just doesn't share the stance. No action.
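The three readings above reduce to a small function of rule intent and spec verdict. In this sketch the type and function names are hypothetical, and treating a stricter-by-design rule as "working as intended" regardless of the spec verdict is an assumption: the list only describes the spec-permitted case for such rules.

```typescript
type RuleIntent = "strict-conformance" | "stricter-by-design";
type SpecVerdict = "forbids" | "permits";
type MlOnlyReading =
  | "markuplint-correct-nu-lax" // informational only
  | "markuplint-false-positive" // fix the rule
  | "working-as-intended"; // no action

function classifyMlOnly(intent: RuleIntent, spec: SpecVerdict): MlOnlyReading {
  if (intent === "stricter-by-design") {
    // Assumption: the rule's stance stands on its own, whatever the spec says.
    return "working-as-intended";
  }
  return spec === "forbids" ? "markuplint-correct-nu-lax" : "markuplint-false-positive";
}

const strictForbidden = classifyMlOnly("strict-conformance", "forbids");
const strictPermitted = classifyMlOnly("strict-conformance", "permits");
const byDesign = classifyMlOnly("stricter-by-design", "permits");
```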

The bench config (bench/config.ts) curates a rule subset that maps onto nu-validator capability. It is not guaranteed to be strict spec-conformance only; some enabled rules legitimately go beyond the spec letter (e.g. link-types defaults to a narrower rel set than HTML LS registers). Always read the rule's documentation / implementation before classifying an ml-only.

Concurrency caveat

Parallel nu runs flicker on aria-owns and similar fixtures (state shared across requests in nu's runtime). File-level verdict counts stay stable across runs; individual messages do not. Use --concurrency 1 whenever you need a single fixture's output to reproduce reliably.