---
name: certify-skill
version: 1.0.1
description: This skill should be used when the user asks to "certify a skill", "validate this skill", "is this skill ready", "check skill quality", "grade this skill", or "run quality gate". Runs every quality check on a skill directory — structural validation, content audit, systemic coherence, and 10-criterion rubric scoring — then produces a certification report with pass/fail per checkpoint and a final certification level (CERTIFIED / CONDITIONAL / BLOCKED). Use this skill after running surgeon-skill to verify improvements, or standalone to assess any skill's production readiness, even if the user just says "is this good enough to ship". [EXPLICIT]
argument-hint: "path-to-skill-directory"
allowed-tools:
  - Read
  - Glob
  - Grep
  - Bash
model: opus
context: fork
---
# Skill Certify
Final quality gate for Claude Code skills. Runs every check and produces a certification report that says exactly what passed, what failed, and what to fix — with no ambiguity about production readiness. [EXPLICIT]
Part of the Skill Quality Suite: x-ray-skill → surgeon-skill → certify-skill (+ trigger-skill, benchmark-skill, assembly-skill). Each skill is standalone. Use assembly-skill to run the full pipeline in one command. [EXPLICIT]
## Deterministic Assets
Use these local assets before producing a certification report. [EXPLICIT]
| Path | Use |
|---|---|
| `assets/certification-phases.json` | Canonical S/F/B/W/C/M check inventory and rubric dimensions |
| `assets/certification-level-policy.json` | Exact MOAT/CERTIFIED/CONDITIONAL/BLOCKED formulas |
| `assets/report-contract.json` | Required report sections, fields, statuses, and blocked phrases |
| `assets/evidence-policy.json` | Accepted evidence tags and evidence requirements |
| `assets/activation-policy.json` | Activation and false-positive routing rules |
| `scripts/validate_certification_report.py` | Offline JSON certification report validator |
| `scripts/check.sh` | Deterministic positive and negative fixture check |
The validator reads only explicit local JSON files. It does not call the network, current time, model providers, MCP tools, or random sources. [EXPLICIT]
## Difference from x-ray-skill
x-ray-skill produces a diagnostic for exploration ("what's the state of this skill?"). certify-skill produces a verdict for decision-making ("can I ship this?"). The checks overlap, but the output differs:
| Aspect | x-ray-skill | certify-skill |
|---|---|---|
| Output | Scorecard + gap analysis | Certification report + verdict |
| Tone | Descriptive (this IS the state) | Prescriptive (this PASSES or FAILS) |
| Actionability | "Top 5 issues" | "Fix these N blockers to certify" |
| Use case | Before improvement | After improvement (or standalone quality gate) |
## Usage
```
/certify-skill /path/to/skill-directory
/certify-skill ./my-skill
```
Parse the argument as the path to a skill directory containing SKILL.md. [EXPLICIT]
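Here is a minimal Python sketch of that argument handling. It assumes the argument is a filesystem path, so relative forms such as `./my-skill` resolve against the current directory. `resolve_skill_dir` is a hypothetical helper and is not shipped with this skill. It folds in the S1 existence check so that a bad path fails immediately.

```python
from pathlib import Path


def resolve_skill_dir(arg: str) -> Path:
    """Hypothetical helper (not shipped with certify-skill).

    Expands ~, resolves relative paths such as ./my-skill, and raises
    if the directory has no SKILL.md, which is structural check S1.
    """
    skill_dir = Path(arg).expanduser().resolve()
    if not (skill_dir / "SKILL.md").is_file():
        raise FileNotFoundError(f"S1 FAIL: no SKILL.md in {skill_dir}")
    return skill_dir
```

A `FileNotFoundError` here corresponds to the abort condition in Phase 1: report BLOCKED and stop.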
## When to Activate
Activate when the user asks to certify, validate, grade, or quality-gate a skill directory or explicit skill artifact. [EXPLICIT]
Do not activate for certificate documents, employment certification, legal certification, or generic quality review without a skill directory. [EXPLICIT]
## The Certification Process
Read references/certification-checklist.md for the complete checklist with verification methods and the report template. [EXPLICIT]
### Phase 1: Structural Validation (automated)
Verify the skill's file structure mechanically. Structural failures block all further evaluation. [EXPLICIT]
```bash
ls {path}/SKILL.md                     # S1: exists?
wc -l {path}/SKILL.md                  # S2: under 500 lines?
# S3-S5: parse the YAML frontmatter between the --- markers
python3 -c 'import pathlib,re,sys; print("\n".join(re.findall(r"`([^`]+\.(?:md|py|json|html))`", pathlib.Path(sys.argv[1]).read_text())))' {path}/SKILL.md  # S6: referenced files
ls -R {path}/ | grep -v SKILL.md       # S7: list all files, then check each referenced one exists
python3 -m json.tool {path}/evals/evals.json > /dev/null  # S8: valid JSON? (parse errors print to stderr)
grep -rnE 'reference/|tools/' {path}   # S9: no old singular reference/ or tools/ paths (no output = PASS)
```
If no extractor script is available in the target runtime, parse Markdown code spans with a portable Python one-liner or manual file inspection; do not use `grep -P`, because not every runtime supports it. [EXPLICIT]
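The S6 one-liner can be unrolled into a more readable Python sketch. It uses the same code-span pattern and also checks each extracted path against the skill directory. It assumes referenced paths are relative to the skill root. Placeholder spans such as `{path}/SKILL.md` will therefore be reported as missing and need a manual look. `referenced_paths` and `missing_references` are hypothetical names.

```python
import re
from pathlib import Path

# Same pattern as the S6 one-liner: a Markdown code span whose content
# ends in .md, .py, .json, or .html, e.g. `references/patterns.md`.
CODE_SPAN_PATH = re.compile(r"`([^`]+\.(?:md|py|json|html))`")


def referenced_paths(skill_md_text: str) -> list[str]:
    """Hypothetical helper: code-span paths in document order, deduplicated."""
    seen: list[str] = []
    for match in CODE_SPAN_PATH.findall(skill_md_text):
        if match not in seen:
            seen.append(match)
    return seen


def missing_references(skill_dir: Path) -> list[str]:
    """Hypothetical S6 helper: referenced paths absent under skill_dir."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    return [p for p in referenced_paths(text) if not (skill_dir / p).exists()]
```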
Abort condition: If S1 fails (no SKILL.md), report BLOCKED immediately. No further phases.
Record each check as PASS/FAIL with the command output as evidence. 9 structural checks total — see references/certification-checklist.md for full definitions. [EXPLICIT]
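The following Python sketch shows one way to keep each check's command output attached to its verdict. `CheckResult`, `run_check`, and the `pass_codes` parameter are hypothetical. "Must not match" checks such as S9 invert the usual exit-status meaning, because grep exits 1 when nothing matched, so they pass on 1 instead of 0.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CheckResult:
    check_id: str  # e.g. "S8"
    passed: bool
    evidence: str  # combined stdout and stderr, kept for the report


def run_check(check_id: str, argv: list[str],
              pass_codes: tuple[int, ...] = (0,)) -> CheckResult:
    """Hypothetical runner: execute one command-backed check.

    pass_codes lists the exit statuses that count as PASS. Most checks
    pass on 0; a grep-based "must not match" check such as S9 passes on 1.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    evidence = (proc.stdout + proc.stderr).strip()
    return CheckResult(check_id, proc.returncode in pass_codes, evidence)
```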
### Phase 2: Content Validation (judgment required)
Read SKILL.md and evaluate content against 18 checks across 3 categories:
#### 2A: Frontmatter Quality (4 checks: F1-F4)
| Check | What to Look For | Common Failure |
|---|---|---|
| F1: Third person | "This skill should be used when..." | First person ("I analyze...") |
| F2: 3-5 trigger phrases | Quoted phrases in description | Generic description with no triggers |
| F3: Pushy context | "even if they don't explicitly ask..." | Description stops at literal triggers |
| F4: Minimal allowed-tools | Only tools the skill actually uses | All tools listed when only Read+Grep needed |
#### 2B: Body Sections (9 checks: B1-B9)
For each section, check that it is present, substantive (not a placeholder), and meets its minimum criteria. A section header followed only by "TBD" scores as MISSING. [EXPLICIT]
| Check | Minimum Criteria | Why It Matters |
|---|---|---|
| B1: Title + value prop | 1-2 sentences answering "why does this exist?" | Without motivation, skill feels arbitrary |
| B2: Usage/activation | 2+ invocation examples | User doesn't know how to trigger it |
| B3: Progressive disclosure | Each ref mapped to load/skip conditions | Loads everything (waste) or nothing (miss) |
| B4: Core process | Actionable instructions, tables > bullets | The skill's purpose — without this, nothing works |
| B5: Assumptions & Limits | 3+ specific limits with handling | Silent failures on edge inputs |
| B6: Edge Cases | 3+ scenarios with handling instructions | Breaks on real-world variation |
| B7: Good vs Bad example | Side-by-side with reasoning | Model can't calibrate quality without reference points |
| B8: Validation Gate | 5+ testable checkboxes | No self-check → garbage passes through |
| B9: Reference Files | Table: file + content + load-when | Claude cannot use files it was never told about |
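The presence and placeholder part of these checks can be sketched in Python, assuming sections are `## ` headings. The sketch implements only the "TBD scores as MISSING" rule. Each check's minimum criteria (2+ examples, 3+ limits, and so on) still need judgment. `split_sections` and `section_status` are hypothetical helpers.

```python
# Placeholder bodies that count as MISSING (illustrative assumption).
PLACEHOLDERS = {"tbd", "todo", "placeholder"}


def split_sections(markdown: str) -> dict[str, str]:
    """Hypothetical helper: map each '## ' heading to the text under it."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def section_status(markdown: str, heading: str) -> str:
    """MISSING if the section is absent, empty, or only a placeholder."""
    body = split_sections(markdown).get(heading, "").strip()
    if not body or body.strip(" .:-").lower() in PLACEHOLDERS:
        return "MISSING"
    return "PRESENT"
```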
#### 2C: Writing Quality (5 checks: W1-W5)
| Check | Detection | Threshold |
|---|---|---|
| W1: Imperative form | `grep -ciE 'you should\|you can\|you need' SKILL.md` | 0 occurrences |
| W2: No CAPS emphasis | Grep ALL CAPS words (excluding acronyms like API, JSON) | 0 occurrences |
| W3: Tables for structured data | Count tables vs bullet lists for multi-dim data | Tables >= bullets |
| W4: Code blocks for templates | Output formats in code blocks, not prose | All templates in blocks |
| W5: One concern per section | No section mixing process + examples + edge cases | No multi-topic sections |
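W1 and W2 can be counted mechanically. This Python sketch counts occurrences, whereas `grep -c` counts matching lines. The acronym allowlist is an assumption and must be tuned per skill. Without tuning, status words such as PASS or BLOCKED, and names such as SKILL in `SKILL.md`, are counted as violations. Code spans are also not excluded. `writing_checks` is a hypothetical name.

```python
import re

# W1: second-person phrasing that should be rewritten in imperative form.
SECOND_PERSON = re.compile(r"\byou (?:should|can|need)\b", re.IGNORECASE)
# W2: words of two or more capital letters.
CAPS_WORD = re.compile(r"\b[A-Z]{2,}\b")
# Illustrative allowlist (assumption); extend it for each skill's vocabulary.
ACRONYMS = {"API", "JSON", "YAML", "MCP", "TBD", "TODO"}


def writing_checks(text: str) -> dict[str, int]:
    """Hypothetical helper: occurrence counts for W1 and W2."""
    caps = [w for w in CAPS_WORD.findall(text) if w not in ACRONYMS]
    return {"W1": len(SECOND_PERSON.findall(text)), "W2": len(caps)}
```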
### Phase 3: Systemic Coherence (multi-file only)
Skip for single-file skills — report N/A. For multi-file skills, run 5 checks:
| Check | Method | Pass Criteria | Severity |
|---|---|---|---|
| C1: Terminology | Grep 5 key terms across files | Zero variants (same concept = same word) | HIGH |
| C2: No duplication | Spot-check 3 reference paragraphs against SKILL.md | No verbatim matches beyond 1-sentence pointers | MEDIUM |
| C3: Evidence taxonomy | Check if all files use the same claim-tagging system | One system everywhere | MEDIUM |
| C4: Schema alignment | Compare SKILL.md output template fields to workflow/eval JSON fields | Field names match exactly | HIGH |
| C5: Lean integration | Reference file pipeline sections are 3-5 lines, not re-explanations | Pointers, not prose | LOW |
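C2 can be sketched in Python. This version checks every reference paragraph instead of a 3-paragraph sample. It assumes that paragraphs are separated by blank lines and that sentences end in `.`, `!`, or `?`. A single-sentence match counts as an allowed pointer. The helper names are hypothetical.

```python
import re


def paragraphs(text: str) -> list[str]:
    """Split on blank lines and collapse internal whitespace."""
    return [" ".join(p.split()) for p in re.split(r"\n\s*\n", text) if p.strip()]


def duplicated_paragraphs(reference_text: str, skill_text: str) -> list[str]:
    """Hypothetical C2 helper: multi-sentence reference paragraphs that
    appear verbatim (modulo whitespace) in SKILL.md."""
    skill_flat = " ".join(skill_text.split())
    dupes = []
    for para in paragraphs(reference_text):
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", para) if s]
        # One-sentence matches are allowed pointers, not duplication.
        if len(sentences) >= 2 and para in skill_flat:
            dupes.append(para)
    return dupes
```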
### Phase 4: Quality Rubric (10 dimensions)
Score each dimension 1-10 using the detailed rubric in references/certification-checklist.md. If a quality-rubric.md file exists alongside it in the active workspace, it may serve as a supplemental guide, but certification never depends on it. [EXPLICIT]
For each dimension, provide:
- Numeric score (1-10)
- One-sentence justification citing a specific finding (not "good quality" — name the evidence)
- If score < 7: specific fix required to reach 7, with estimated effort
Scoring discipline: A score without evidence is invalid. "Clarity: 8" is not a finding. "Clarity: 8 — all terms defined in Glossary section, zero ambiguous pronouns found" is a finding.
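Part of this discipline can be enforced in code. The hypothetical `RubricEntry` check below rejects out-of-range scores, empty or generic justifications, and sub-7 scores with no fix. It cannot tell whether a justification cites real evidence; that remains a reviewer judgment. The `GENERIC` phrase list is an illustrative assumption.

```python
from dataclasses import dataclass

# Illustrative list of non-evidence justifications (assumption).
GENERIC = {"good", "good quality", "fine", "ok", "looks good"}


@dataclass
class RubricEntry:  # hypothetical report structure
    dimension: str
    score: int
    justification: str
    fix: str = ""  # required when score < 7


def entry_errors(entry: RubricEntry) -> list[str]:
    """Return scoring-discipline violations for one rubric entry."""
    errors = []
    if not 1 <= entry.score <= 10:
        errors.append(f"{entry.dimension}: score must be 1-10")
    if entry.justification.strip().lower().rstrip(".") in GENERIC | {""}:
        errors.append(f"{entry.dimension}: justification must cite evidence")
    if entry.score < 7 and not entry.fix.strip():
        errors.append(f"{entry.dimension}: score < 7 needs a fix and effort")
    return errors
```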
### Phase 5: MOAT Validation (deterministic)
If Phases 1-4 result in CERTIFIED, run 5 additional deterministic checks from references/certification-checklist.md Phase 5:
| Check | Pass Criteria |
|---|---|
| M1: evals/evals.json exists with >= 5 tests | File present, >= 5 distinct entries |
| M2: false-positive + edge-case evals | >= 1 of each type in evals.json |
| M3: references/ files substantive | All >= 20 lines, zero TBD/TODO/placeholder |
| M4: Template A structure | "## Usage" or "## When to Activate" + "## Validation Gate" present; no Template B markers |
| M5: evidence tag coverage | [EXPLICIT]/[INFERRED]/[OPEN] on >= 80% of factual claims (>= 50% for Utility tier) |
Skip Phase 5 if the skill is CONDITIONAL or BLOCKED — MOAT requires CERTIFIED as a prerequisite. [EXPLICIT]
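M3 is mechanical, so a short Python sketch covers it. The sketch assumes reference files live directly under `references/` and that "lines" means newline-separated lines. It scans literally, so a reference file that merely mentions TODO (for example, a checklist describing this very check) is also flagged and needs a manual look. `m3_failures` is a hypothetical name.

```python
import re
from pathlib import Path

PLACEHOLDER = re.compile(r"\b(?:TBD|TODO)\b|placeholder", re.IGNORECASE)


def m3_failures(skill_dir: Path, min_lines: int = 20) -> list[str]:
    """Hypothetical M3 helper: references/ files that are short or unfinished."""
    failures = []
    for path in sorted((skill_dir / "references").glob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        if len(text.splitlines()) < min_lines:
            failures.append(f"{path.name}: fewer than {min_lines} lines")
        if PLACEHOLDER.search(text):
            failures.append(f"{path.name}: contains TBD/TODO/placeholder")
    return failures
```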
### Phase 6: Produce Report
Use the Certification Report Template from references/certification-checklist.md. Apply the certification formula:
| Level | Formula | Recommendation |
|---|---|---|
| MOAT | CERTIFIED + all M1-M5 pass | "Ship it. Production-quality with full quality assurance." |
| CERTIFIED | All dimensions >= 7, average >= 8, all structural pass | "Passes quality. Upgrade to MOAT: add {missing M-checks}." |
| CONDITIONAL | Average >= 8 but 1-2 dims at 6, or 1-2 structural failures | "Fix {N} blockers, re-certify. Effort: {estimate}." |
| BLOCKED | Any dim < 6, or 3+ structural failures, or no SKILL.md | "Run /surgeon-skill {path}. {N} foundational issues." |
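The level table can be sketched in Python. The sketch treats CONDITIONAL as the fallback for anything that is neither BLOCKED nor CERTIFIED. This matches the Edge Cases entry that rates an average of 7.9 as CONDITIONAL; the table itself leaves combinations such as three dimensions at 6 unassigned. The canonical rules are in `assets/certification-level-policy.json`, and `certification_level` is a hypothetical function, not the shipped validator.

```python
def certification_level(has_skill_md: bool, structural_failures: int,
                        scores: list[int], moat_passed: bool) -> str:
    """Hypothetical sketch of the level table; scores holds the 10 rubric scores.

    Canonical rules live in assets/certification-level-policy.json.
    """
    if not has_skill_md or structural_failures >= 3 or any(s < 6 for s in scores):
        return "BLOCKED"
    average = sum(scores) / len(scores)
    if structural_failures == 0 and min(scores) >= 7 and average >= 8:
        return "MOAT" if moat_passed else "CERTIFIED"
    # Assumption: everything between BLOCKED and CERTIFIED (for example,
    # average 7.9 with all dimensions >= 7) falls through to CONDITIONAL.
    return "CONDITIONAL"
```

With no structural failures, scores of `[7, 9] * 5` (average exactly 8.0, no dimension below 7) yield CERTIFIED, while `[7] + [8] * 9` (average 7.9) yields CONDITIONAL.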
Certification is deterministic for structural checks and MOAT M-checks, and judgment-based for the rubric. Two certifications of the same unchanged skill must produce identical structural and MOAT results. Only rubric scores may differ, by at most 1 point on the subjective dimensions (density, simplicity, value). If the verdicts differ, the cause is rubric variance, not the deterministic checks.
When a JSON report is available, run:
```bash
python3 -B skills/certify-skill/scripts/validate_certification_report.py \
  --phases skills/certify-skill/assets/certification-phases.json \
  --level-policy skills/certify-skill/assets/certification-level-policy.json \
  --contract skills/certify-skill/assets/report-contract.json \
  --evidence skills/certify-skill/assets/evidence-policy.json \
  --report <certification-report.json>
```
## Assumptions & Limits
- Read-only. This skill never modifies the skill being certified.
- Structural checks (Phase 1) are deterministic — same skill always produces same results.
- Rubric dimensions 4 (density), 5 (simplicity), and 10 (value) involve subjective judgment. Expected run-to-run variance: up to 1 point. If variance exceeds 2 points, the skill's quality is in a borderline zone.
- Cannot evaluate runtime behavior. A skill can pass certification structurally but produce poor output due to flawed instruction logic. Use the skill-creator's eval loop for behavioral testing.
- Systemic coherence (Phase 3) is N/A for single-file skills. This is correct, not a gap.
- Certification takes 5-15 minutes depending on file count. Phase 3 time grows linearly with the number of files, so skills with 10+ files take the longest.
## Failure Modes
| Failure | Signal | Recovery |
|---|---|---|
| No SKILL.md found | S1 fails | Report BLOCKED immediately. Ask user to verify path. |
| Unparseable frontmatter | YAML error on frontmatter parse | Report as BLOCKER. Skill cannot trigger without valid frontmatter. |
| Borderline scores (multiple 7s, average 7.9) | CONDITIONAL, just short of CERTIFIED | Report honestly. List which dimensions need +1 to reach CERTIFIED. |
| Prior certification exists | User asks to re-certify after changes | Show delta: improved/degraded/unchanged per dimension. Highlight what changed. |
| Skill deliberately breaks conventions | Intentional deviation documented in the skill | Flag but don't auto-fail. Note: "Intentional deviation — user decision." |
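For re-certification, the per-dimension delta can be computed from the two score sets. The following is a minimal Python sketch that assumes both runs are keyed by dimension name. `score_delta`, and the "new" label for a dimension absent from the prior run, are hypothetical.

```python
def score_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, str]:
    """Hypothetical helper: label each dimension improved/degraded/unchanged."""
    delta = {}
    for dimension, new in after.items():
        old = before.get(dimension)
        if old is None:
            delta[dimension] = "new"  # not scored in the prior run
        elif new > old:
            delta[dimension] = f"improved ({old} -> {new})"
        elif new < old:
            delta[dimension] = f"degraded ({old} -> {new})"
        else:
            delta[dimension] = f"unchanged ({new})"
    return delta
```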
## Edge Cases
- Skill with no frontmatter: BLOCKED. Primary fix: "Add YAML frontmatter with name and description between --- markers."
- Skill that deliberately breaks conventions: Note the deviation. If documented and intentional, flag but don't auto-fail. If undocumented, score as a gap.
- Re-certification after surgeon-skill: Show before/after delta. Highlight improvements. If new issues appeared (rare), flag them explicitly.
- Very large skill (10+ files): Increase Phase 3 sample size. Check 5 paragraphs instead of 3. Check all key terms instead of 5.
- Skill that scores exactly on thresholds: average exactly 8.0 with every dimension at 7 or above = CERTIFIED. Average 7.9 = CONDITIONAL (the formula is strict). Document the edge clearly.
- Single-file skill scoring 10/10: Valid. A well-crafted single SKILL.md with no need for references/scripts/agents can score perfectly. Don't penalize simplicity.
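Both the no-frontmatter edge case and the "Unparseable frontmatter" failure mode start by locating the `---` block. Python's standard library has no YAML parser, so this sketch reads only top-level `key:` lines; a real S3-S5 check would use a YAML library. `frontmatter_keys` is a hypothetical helper. A `None` result, or a key set that lacks `name` and `description`, maps to BLOCKED.

```python
from typing import Optional


def frontmatter_keys(text: str) -> Optional[set[str]]:
    """Hypothetical helper: top-level keys of the leading --- block.

    Line-based sketch, not a YAML parser. Returns None when the block is
    absent or has no closing --- marker.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Only unindented "key: value" lines are top-level keys.
        if line and not line[0].isspace() and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return None  # opening marker without a closing one
```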
## Example: Good vs Bad Certification
Bad certification:
```
Certification: CONDITIONAL. Some issues found. Please fix and re-certify. [EXPLICIT]
```
No evidence, no specifics, no fix instructions. Useless. [EXPLICIT]
Good certification:
```
Certification: CONDITIONAL (11/13 gate, avg 7.8/10)
Fails: S6 (references/patterns.md referenced but file doesn't exist),
Checkpoint 6 (no Good vs Bad example). [EXPLICIT]
Rubric: Depth 6/10 (only 2 edge cases; need 3+), others 8+. [EXPLICIT]
Fix: (1) Create references/patterns.md or remove the reference. [EXPLICIT]
(2) Add Good vs Bad section with concrete comparison. [EXPLICIT]
(3) Add 1+ edge case to Edge Cases section. [EXPLICIT]
Estimated effort: 30 minutes. Re-certify after. [EXPLICIT]
```
Specific, evidenced, actionable, with effort estimate. [EXPLICIT]
## Validation Gate
Before delivering the certification report:
- All 9 structural checks have a binary PASS/FAIL with command evidence
- All 18 content checks (F1-4, B1-9, W1-5) have a result
- Systemic checks completed or N/A (with reason) for single-file skills
- All 10 rubric dimensions have a numeric score + one-sentence justification with evidence
- MOAT checks M1-M5 evaluated (or skipped if not CERTIFIED)
- Certification level matches the formula exactly (not assigned by feel)
- Every FAIL or BLOCKED item has a specific fix with estimated effort
- Report follows the template from references/certification-checklist.md
- If re-certification: delta from prior run is shown
## Reference Files
| File | Content | Load When |
|---|---|---|
| `references/certification-checklist.md` | Complete checklist: 9 structural checks with commands, 18 content checks with criteria, 5 systemic checks with methods, 10 rubric scoring summaries, certification formula, report template | Always — this IS the certification engine |
Author: Javier Montano | Last updated: March 27, 2026