name: exam-grader
description: Grade a completed mock exam answer sheet against the question bank. Read-only — never modifies files.
allowed-tools: [Read, Glob]
argument-hint: " answers-string-like-1:A,2:C,3:B,..."
exam-grader skill
Takes a rendered mock-exam file and the user's answer string, produces a per-domain, per-scenario accuracy report.
The skill's allowed-tools list is restricted to [Read, Glob], so it cannot write files or execute code. This intentionally limits its blast radius (it acts as a pure scoring function) and demonstrates Domain 3.2's allowed-tools frontmatter restricting skill execution.
Inputs
- $1: absolute or repo-relative path to the mock-exam markdown file (output of render_mock.py or /mock-exam).
- $2: the user's answers as a comma-separated list of N:LETTER pairs, or one letter per line.
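The two answer formats described above can be normalized into one map. The following is a minimal sketch for illustration only (the skill itself has no code-execution tool and would do this by reasoning over the text). The function name `parse_answers` is hypothetical, and treating a blank line as an unanswered question is an assumption, not something the skill specifies.

```python
import re


def parse_answers(raw: str) -> dict[int, str]:
    """Turn '1:A,2:C' pairs or one-letter-per-line text into {question_number: letter}."""
    answers: dict[int, str] = {}
    if ":" in raw:
        # Comma-separated N:LETTER pairs; tolerate stray whitespace and newlines.
        for pair in re.split(r"[,\n]+", raw):
            pair = pair.strip()
            if not pair:
                continue
            number, letter = pair.split(":", 1)
            answers[int(number)] = letter.strip().upper()
    else:
        # One letter per line; the question number is the 1-based line position.
        # Assumption: a blank line means the question was left unanswered.
        for position, line in enumerate(raw.splitlines(), start=1):
            letter = line.strip().upper()
            if letter:
                answers[position] = letter
    return answers
```

Both input styles produce the same shape of map, so the later scoring steps do not need to know which format the user chose.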
Procedure
Read the mock-exam file. Parse it into an ordered list of question IDs; each question block has a visible id: header or an HTML comment like <!-- id: q-cs-001 -->.
For each question ID, read the corresponding canonical file under practice/questions/by-scenario/**/q-<id>.md. Extract correct, domains, scenario, difficulty.
Parse the user's answers string into a map { question_number: letter }.
For each question:
- Mark correct/incorrect.
- Accumulate per-domain counts (one question can count against multiple domains).
- Accumulate per-scenario counts.
- Accumulate per-difficulty counts.
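The ID extraction and per-question accumulation above can be sketched as follows. This is an illustrative sketch, not the skill's implementation: the names `ID_LINE`, `extract_question_ids`, and `tally` are hypothetical, the `metadata` dict stands in for fields already read from the canonical question files, question numbers are assumed to be 1-based positions in exam order, and counting an unanswered question as incorrect is an assumption.

```python
import re
from collections import defaultdict

# Assumed ID formats: an "id: q-cs-001" header line or an "<!-- id: q-cs-001 -->" comment.
ID_LINE = re.compile(
    r"^[ \t]*(?:<!--[ \t]*)?id:[ \t]*(\S+?)[ \t]*(?:-->)?[ \t]*$", re.MULTILINE
)


def extract_question_ids(exam_text: str) -> list[str]:
    """Return question IDs in the order they appear in the exam file."""
    return ID_LINE.findall(exam_text)


def tally(question_ids, metadata, answers):
    """Return (incorrect_ids, counts) with counts[axis][key] == [correct, seen]."""
    axes = ("domain", "scenario", "difficulty")
    counts = {axis: defaultdict(lambda: [0, 0]) for axis in axes}
    incorrect = []
    for number, qid in enumerate(question_ids, start=1):
        meta = metadata[qid]
        # Assumption: a missing answer is scored as incorrect.
        is_correct = answers.get(number) == meta["correct"]
        if not is_correct:
            incorrect.append(qid)
        # One question can count against several domains.
        keys = [("domain", domain) for domain in meta["domains"]]
        keys.append(("scenario", meta["scenario"]))
        keys.append(("difficulty", meta["difficulty"]))
        for axis, key in keys:
            counts[axis][key][1] += 1
            counts[axis][key][0] += int(is_correct)
    return incorrect, counts
```

Storing `[correct, seen]` pairs per key keeps the multi-domain case simple: a question tagged with two domains increments both entries, so per-domain totals can exceed the number of questions.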
Compute:
- Raw score (correct / total).
- Scaled estimate: correct × 1000 / total (a plain linear rescaling with no per-domain weights). Output it with a caveat that this is not the official scaled conversion.
- Per-domain accuracy (correct / seen, per domain).
- Per-scenario accuracy.
- Per-difficulty accuracy.
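The arithmetic above is small enough to sketch directly. The function names are hypothetical, and returning 0.0 for an empty exam is an assumption made only to avoid division by zero. The scaled value is a linear rescale of the raw score, not the official conversion.

```python
def raw_score(correct: int, total: int) -> float:
    """Fraction of questions answered correctly."""
    return correct / total if total else 0.0


def scaled_estimate(correct: int, total: int) -> float:
    """Linear rescale to a 0-1000 range; NOT the official scaled conversion."""
    return correct * 1000 / total if total else 0.0


def accuracy_by_key(counts: dict) -> dict:
    """Map each key to correct / seen, given counts[key] == [correct, seen]."""
    return {key: right / seen for key, (right, seen) in counts.items() if seen}
```

For example, 36 correct out of 50 gives a raw score of 0.72 and a scaled estimate of 720.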
Return a structured markdown report including:
- Raw + scaled-estimate.
- Pass/fail for the scaled estimate against the 720 threshold.
- Per-domain and per-scenario tables.
- A list of the question IDs answered incorrectly, so the user can re-read each.
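One possible shape for the report is sketched below. The layout, the `render_report` name, and the table columns are illustrative assumptions. Treating a scaled estimate of exactly 720 as a pass is a reading of "minimum", not something the skill states explicitly.

```python
PASS_THRESHOLD = 720  # the threshold named in this skill


def render_report(correct, total, per_domain, per_scenario, incorrect_ids):
    """Build a markdown report; per_* map key -> [correct, seen] with seen >= 1."""
    scaled = correct * 1000 / total if total else 0.0
    # Assumption: meeting the threshold exactly counts as a pass.
    verdict = "PASS" if scaled >= PASS_THRESHOLD else "FAIL"
    lines = [
        f"Raw score: {correct}/{total}",
        f"Scaled estimate: {scaled:.1f} (not the official scaled conversion)",
        f"Result against {PASS_THRESHOLD}: {verdict}",
        "",
    ]
    for title, table in (("Domain", per_domain), ("Scenario", per_scenario)):
        lines.append(f"| {title} | Correct | Seen | Accuracy |")
        lines.append("|---|---|---|---|")
        for key, (right, seen) in sorted(table.items()):
            lines.append(f"| {key} | {right} | {seen} | {right / seen:.0%} |")
        lines.append("")
    lines.append("Incorrect questions:")
    lines.extend(f"- {qid}" for qid in incorrect_ids)
    return "\n".join(lines)
```

Listing the incorrect question IDs last keeps the summary numbers at the top while still giving the user a direct list to revisit.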
Anti-patterns
- Do not recommend a specific pass threshold other than 720 (Anthropic's published minimum).
- Do not try to write to the study log — that's the caller's responsibility.
- Do not execute code or fetch network resources.