name: prd-quality-rubric description: "Critic scoring rubric for drafted specs. Defines dimensions D1–D4 (both modes) and D5–D8 (update mode only), per-dimension scoring guidance, weighted aggregation, verdict thresholds (pass/revise/block). Replaces literal-diff-based rubric. Triggers on: score the spec, critic rubric, D1 mandatory coverage, D2 gated appropriateness, verdict thresholds."
PRD Quality Rubric
This skill is the source of truth for @prd-critic's scoring. It
defines the dimensions, scoring guidance per dimension, weighted
aggregation, and verdict rules.
When to Use This Skill
Load this skill when:
- Scoring a drafted spec (
@prd-critic). - Authoring or maintaining the eval
spec.yamlrubric list (the rubric IDs in the eval spec mirror the dimension IDs here).
Domain neutrality
Dimensions are domain-neutral. The critic does NOT penalise the absence of an industry-specific section the user did not request, and does NOT reward the presence of one. User-approved Stop A overrides for section names are honoured at face value (D3).
Dimensions
D1 — mandatory-coverage (both modes)
Question: Are all mandatory sections from the
prd-template catalogue present and non-empty (or marked [TBD]
with a corresponding OQ-NN entry in "Open Questions")?
Scoring: count present / total mandatory; partial credit for
[TBD] entries that have a matching OQ entry.
Common deductions: missing "Open Questions" section entirely;
mandatory section present but empty without [TBD].
D2 — gated-appropriateness (both modes)
Question: For each complexity-gated section, is the
include/omit decision justified by the §heuristic given the inputs
AND the user-chosen spec_kind (see prd-template)?
Scoring:
- Penalise bloat: gated section present without a triggering
axis being detectable in the inputs, OR without
spec_kindpermitting it. - Penalise underspecification: axis present in the inputs AND
spec_kindpermits inclusion, but the section is omitted. - Reward correctly omitted sections with a clear rationale in
section-decisions-json(includinggated-omitted-by-spec-kind). - In
spec_kind: product, do NOT penalise omission of implementation-shaped sections (Data Model, API Contract, Capacity & Performance Targets, Threat Model Summary, Versioning & Deprecation Policy, NFR↔FR Traceability) regardless of axis.
D3 — naming-consistency (both modes)
Question: Do section names match the neutral catalogue, OR a Stop-A-approved override?
Scoring: any drift from the catalogue without an override deducts. Industry-specific labels without explicit Stop A approval deduct heavily.
D4 — content-quality (both modes)
Question: Is the prose clear; are FRs in EARS shape; are acceptance criteria testable and nested under their FR; do NFRs (when present) trace to FRs; is there fabrication; is evidence discipline upheld; is format hygiene upheld?
Scoring: penalise:
- Non-EARS FR statements (multiple
shalls, missing<system> shall, intention-shaped triggers, passive-voice responses). Seespec-driven-prd-best-practices§4a. - Non-testable acceptance criteria; ACs not nested under their FR;
ACs without
AC-<FR>.<n>IDs. - Untraceable claims (no footnote and not derivable from the context pack).
- Vague goals without baseline + target + measurement window.
- Missing P0/P1/P2 priority on FRs.
- Evidence-discipline violations. Apply the following severity
schedule (overrides the default-to-minor rule):
- blocker — any footnote whose URL resolves to a
gitignored / session-local / local-scratch path (per
spec-driven-prd-best-practices§8 and.gitignore). Verdict goes toblockregardless of weighted score. The drafter is required to remove the footnote (not relabel it, not move it to a "References" appendix). - blocker — any footnote pointing at a non-authoritative or non-primary source (e.g. a third-party blog post quoting the standard instead of the standard itself). The drafter must replace it with the primary source or drop it.
- major — a footnote that adds no value (decorative or redundant; a competent reader could act on the section without it). The drafter must drop it; verdict revises.
- major —
S1, S2numeric scheme; "Citations" appendix table present. - minor — formatting issues (bare section reference instead of an anchored link, missing URL on a footnote that is otherwise valid).
- blocker — any footnote whose URL resolves to a
gitignored / session-local / local-scratch path (per
- Format-hygiene violations: hard-wrapped body paragraphs;
bold-as-header; structure carried by bolded lines instead of
###/####headers. - Upper-section isolation violations (per
prd-template§"Per-section isolation contract"):- Solution Summary restates problem narrative, or names owners / metrics / FRs / rollout detail → major per offending section.
- Goals restates the problem, or names owners, or describes solution mechanics → major per offending section.
- Problem Statement preempts the solution ("we will add…"), or names owners, or pre-states goals → major per offending section.
- Ownership / stakeholder content surfacing in any non-Ownership upper section → major.
- Repeat offence across three or more upper sections → escalate one finding to blocker.
D9 — scope-discipline (both modes when spec_kind is product or mixed; null otherwise)
Question: In product / mixed mode, is implementation
content kept out of FRs and confined to a "Technical
Considerations" appendix (mixed) or omitted entirely (product)?
Scoring: penalise:
- Any FR that names an internal component, library, datastore,
framework, language, or specific API in
product/mixedmode. - Inline implementation content next to FRs in
mixedmode (it belongs in "Technical Considerations"). - Boilerplate "implementation is out of scope" / "technical design choices are out of scope" entries in Out of Scope.
In spec_kind: technical, D9 is reported as null (not 0) — the
dimension does not apply.
D5 — changelog-completeness (publish-transition turns only)
Question: When the spec transitioned Status: draft → published
this turn (V8 — version-bump-json.kind ∈ {publish-initial, publish-redraft}), does CHANGELOG.md exist, use the
Keep-a-Changelog categories, and account for every change visible
between prior and revised?
Scope: D5 applies only on publish-transition turns. On any
turn where Status: draft at end-of-turn (initial draft OR
re-draft window edit without publish intent), D5 is reported as
null, not 0 — drafts produce no CHANGELOG (V3 / V17 / OQ-5).
Scoring: any diff that does not appear in at least one category deducts. Wrong category placement is a minor deduction.
D6 — id-stability (update mode only)
Question: Do all prior requirement IDs (FR-, NFR-, AC-, R-, OQ-, TS-) still resolve in the revised spec?
Scoring: silent deletion of an ID is a blocker finding. Renumbering is a blocker finding.
Sub-rubric d6.removal-by-status (severity: blocker; defined
by prd-evolution §3 + versioning-discipline §V9):
- In
Status: draft(initial OR re-draft window — applies to any item removed regardless of whether it was newly added in-window or carried over from a prior published version): a removed item that left a[Deprecated]/[Removed]stub heading, or any prose narrating the removal ("formerly FR-07…", "removed in this revision…"), in the draft body is a blocker finding. Drafts delete cleanly perprd-evolution§3a / §3b (interpretation (a)). - In
Status: draft(any flavour): a removed in-window FR/NFR/R/ OQ/TS that did NOT trigger renumbering of successors (leaving a numbering gap) is a blocker finding. Prior-published IDs are NOT renumbered (V9 over the ID is permanent) — gaps in the prior-published number space are expected and not a finding. - In
Status: draft(re-draft window): a removed prior-published ID for whichstate.json:pending_published_id_deletionsdoes NOT contain a matching entry is a blocker finding. The drafter must queue the deletion for publish-time re-materialisation; silent drops break V9 at the next publish. - In
Status: publishedend-of-turn (publish-transition turns only): a deleted prior-published item with no[Deprecated]/[Removed]stub heading in the published artefact is a blocker finding (V9 published-contract violation). At publish, every entry that was inpending_published_id_deletionsMUST have been re-materialised in the published artefact body (V8 step 4a). - In
Status: publishedend-of-turn: a renumbered prior-published item is a blocker finding (V9 numbering immutability).
D7 — versioning-correctness (update mode only)
Question: Does the version bump match the prd-evolution
rule (MAJOR/MINOR/PATCH)? Is the Updates: / Obsoletes: header
present?
Scoring: wrong bump or missing header is a major finding.
Sub-rubric d7.draft-no-change-tracking (severity: blocker;
defined by prd-evolution §5):
Applies in any turn where the spec ends Status: draft (initial
draft OR re-draft window with no publish intent). The drafted
spec body MUST NOT contain ANY of:
- A
## Changes sinceheading (any version suffix or "since vN" variant). - A
## Revision History,## Changelog, or## What's Changedheading. - An inline
[Changed in v...]marker (anywhere — heading, list item, or paragraph). - An HTML / markdown comment narrating what changed
(
<!-- changed: ... -->,<!-- added in v... -->). - Prose narration of revision history ("Added in this revision", "Removed since v1.0", inline parenthetical "(updated)" tags).
- A
CHANGELOG.mdmutation this turn (overlap withd7.publish-changelog— re-stated here so the rubric is self-contained when the critic runs against a draft turn).
Any single occurrence is a blocker finding. The verdict
escalates to block regardless of weighted score. The fix is
to delete the offending construct (not move it, not rename it).
D8 — section-stability (update mode only)
Question: No silent renumbering. No removed gated section without a changelog rationale.
Scoring: silently removed gated sections deduct heavily.
D10 — edit-minimalism (update mode only)
Question: Did the drafter make the smallest set of edits that fully reflects (a) the user's stated feedback, (b) any Stop-A-approved missing context, and (c) the ID-stability mechanics those edits trigger — and NOTHING else?
Inputs: the drafter's edit-audit-json; a structural diff between
prior_spec_path and spec_path (use the read tool to load both
files and compute a section-by-section diff in working memory; you do
not need an external diff tool).
Scoring (start at 1.0; deduct):
- For each modified or reordered span present in the diff but NOT
listed in
edit-audit-json: deduct 0.2 and emit amajorfinding. An edit the drafter cannot account for is by definition unjustified. - For each edit whose recorded
reasondoes not survive scrutiny — e.g.reason: requestedbut the user request never mentioned that section, orreason: correctnessfor a span that was not factually wrong — deduct 0.2 and emit amajorfinding. - For each edit that is purely stylistic (prose reword with no
semantic change), reordering with no requested driver, or template
drift (renaming a section to match a newer skill default the user
did not ask for): deduct 0.3 and emit a
majorfinding. If three or more such edits are present, escalate ONE finding toblocker. - If the drafter omitted
edit-audit-jsonentirely in update mode: emit a singleblockerfinding and score D10 = 0. - Upper-section edit ratchet. Any modified or reordered span
whose location is in Document Information (excluding version
mechanics), Problem Statement, Goals & Success Metrics, Users
& Personas, Stakeholders & Reviewers, or Solution Summary
carries a 2× weight against D10. An upper-section edit whose
justificationdoes not quote / directly paraphrase a user request: deduct 0.4 (not 0.2) and emit amajorfinding. Two or more such edits in one turn: escalate ONE finding toblocker. Rationale: the user explicitly named upper-section stability as a hard discipline; D10's standard schedule under-weights it. - Per-statement scoring. D10 evaluates sentence-level diffs,
not span-level diffs. For each sentence that differs between
prior_spec_pathandspec_path:- If listed as a sentence-level entry in
edit-audit-jsonwith a surviving reason → no deduction. - If not listed, OR listed only at paragraph/section
granularity → deduct 0.2 and emit
major. - If the only difference is whitespace, punctuation,
capitalisation, grammar, or stylistic word-substitution
(i.e. the kind of change §0's per-statement gate routes to
REVERT) → deduct 0.3 and emit
major. Three or more such sentences → escalate ONE finding toblocker.
- If listed as a sentence-level entry in
- Default disposition in findings. D10 findings whose
recommended
fixwould normally be "drop / re-justify the edit" MUST instead read "revert the sentence to the prior wording at<prior_spec_path>:<line>". The fix wording matters: it tells the next drafter pass that REVERT — not re-edit — is the required remediation.
Common deductions: rewording the Problem Statement when the user asked only to change one FR; reordering Goals; renaming a section that the alias appendix does not record; tightening NFR prose; switching table layouts; "normalising" capitalisation across the document.
Out of scope for D10: ADD-only edits that the user requested (e.g. "add an FR for X") — those are the legitimate change. D10 polices edits to the prior content, not new content.
Weighted aggregation
Equal-weight mean of applicable dimensions only. Dimensions not
applicable in the current mode are reported as null, not 0,
in scores-json. D10 follows the same null-not-zero rule as D5–D8:
it is included in the weighted mean only when mode == update.
weighted = mean(dim for dim in [D1..D10] if dim is not null)
Verdict rules
- block — any
blockerfinding (e.g. silent ID deletion, fabricated facts, mandatory section missing AND no OQ entry). - revise —
weighted < 0.7OR anymajorfinding. - pass — otherwise.
Severity guidance
| Severity | When to use |
|---|---|
| blocker | The spec must not ship as-is. Examples: silent ID deletion in update mode; mandatory section missing without OQ; fabricated facts; any footnote citing a gitignored / non-authoritative / non-primary source (per spec-driven-prd-best-practices §8). |
| major | Significant quality issue that should block this draft. Examples: NFR section present without any traceable axis; non-testable AC across most requirements; wrong version bump kind. |
| minor | Polish issue. Examples: a single non-testable AC; missing citation on one claim; naming drift on one heading. |
Threshold tuning
The 0.7 threshold for revise is tunable. Consumers who want to
lower the bar for first drafts can raise it; consumers in
high-trust pipelines can lower it. Adjust here, not in the
.agent.md of @prd-critic.
Quality checklist (for the critic itself)
- Every applicable dimension scored in
[0, 1]. - Inapplicable dimensions reported as
null. -
weightedmatches the formula above. - Verdict matches the rules above.
- Findings sorted by severity (blocker > major > minor).
- Every finding has
dimension,severity,issue,fix.