exam-grader - SKILL.md Agent Skill

---
name: exam-grader
description: Grade a completed mock exam answer sheet against the question bank. Read-only; never modifies files.
allowed-tools: [Read, Glob]
argument-hint: " answers-string-like-1:A,2:C,3:B,..."
---

exam-grader skill

Takes a rendered mock-exam file and the user's answer string and produces a per-domain, per-scenario accuracy report.

The skill's frontmatter sets allowed-tools: [Read, Glob]. It can read and list files, but it cannot write files or execute code. This intentionally limits its blast radius, because the skill is a pure scoring function. It also demonstrates Domain 3.2: restricting what a skill can do through the allowed-tools frontmatter field.

Inputs

- $1: absolute or repo-relative path to the mock-exam markdown file (the output of render_mock.py or /mock-exam).
- $2: the user's answers, either as a comma-separated list of N:LETTER pairs or as one letter per line.

Procedure

Read the mock-exam file and parse it into an ordered list of question IDs. Each question block carries its id either in a visible id: header or in an HTML comment.
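The skill does this step with its Read tool, not with code, but the parsing logic can be sketched in Python. This minimal sketch assumes the visible header is a line of the form id: <value>. It does not handle ids that appear only inside an HTML comment. The function name extract_question_ids is hypothetical.

```python
import re

# Assumed header shape: a line such as "id: d1-001" on its own line.
ID_HEADER = re.compile(r"^[ \t]*id:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def extract_question_ids(exam_markdown: str) -> list[str]:
    """Return question ids in document order, keeping only the first occurrence of each."""
    seen: set[str] = set()
    ordered: list[str] = []
    for match in ID_HEADER.finditer(exam_markdown):
        qid = match.group(1)
        if qid not in seen:
            seen.add(qid)
            ordered.append(qid)
    return ordered
```

Order matters here because the user's answers are keyed by question number, so the N-th id must line up with answer N.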
For each question id, read the corresponding canonical file under practice/questions/by-scenario/**/q-<id>.md and extract its correct, domains, scenario, and difficulty fields.
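Here is a hedged Python sketch of the lookup. Path.glob plays the role of the skill's Glob tool, and read_text plays the role of Read. The sketch makes several assumptions about the canonical files:

- Each file starts with a front-matter block delimited by --- lines.
- The block holds one key: value pair per line.
- domains is written as a flow list such as [1.2, 3.2].

The helper names find_question_file and read_question_meta are also hypothetical. A question bank with richer YAML would need a proper YAML parser instead.

```python
from pathlib import Path

FIELDS = ("correct", "domains", "scenario", "difficulty")


def find_question_file(repo_root: Path, qid: str) -> Path:
    """Locate practice/questions/by-scenario/**/q-<id>.md, expecting exactly one match."""
    base = repo_root / "practice" / "questions" / "by-scenario"
    matches = sorted(base.glob(f"**/q-{qid}.md"))
    if len(matches) != 1:
        raise FileNotFoundError(f"expected one q-{qid}.md, found {len(matches)}")
    return matches[0]


def read_question_meta(path: Path) -> dict:
    """Read correct/domains/scenario/difficulty from a simple '---' front-matter block.

    Assumed format: one 'key: value' per line, domains as '[a, b]'.
    Lines outside the block and unknown keys are ignored.
    """
    lines = path.read_text(encoding="utf-8").splitlines()
    meta: dict = {}
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or key not in FIELDS:
            continue
        if key == "domains":
            meta[key] = [d.strip() for d in value.strip("[]").split(",") if d.strip()]
        else:
            meta[key] = value
    return meta
```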
Parse the user's answer string, in either input format, into a map { question_number: letter }.
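A minimal Python sketch of this parsing step, covering both $2 formats. It makes three assumptions the skill does not state. Question numbers are 1-based. Letters are normalized to upper case. In the one-letter-per-line format, blank lines are ignored and line position gives the question number. The name parse_answers is hypothetical.

```python
def parse_answers(raw: str) -> dict[int, str]:
    """Parse '1:A,2:C,3:B' or one letter per line into {question_number: letter}."""
    text = raw.strip()
    answers: dict[int, str] = {}
    if ":" in text:
        # Comma-separated N:LETTER pairs, e.g. "1:A,2:C,3:B".
        for pair in text.split(","):
            pair = pair.strip()
            if not pair:
                continue
            num, letter = pair.split(":", 1)
            answers[int(num.strip())] = letter.strip().upper()
    else:
        # One letter per line: the N-th non-blank line answers question N.
        letters = [line.strip() for line in text.splitlines() if line.strip()]
        for number, letter in enumerate(letters, start=1):
            answers[number] = letter.upper()
    return answers
```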
For each question:
- Mark the answer correct or incorrect.
- Accumulate per-domain counts (one question can count against multiple domains).
- Accumulate per-scenario counts.
- Accumulate per-difficulty counts.
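The per-question loop can be sketched in Python as below. This is an illustration of the logic, not code the skill runs. The sketch assumes that question number N in the answers map refers to the N-th question in exam order, and that an unanswered question counts as incorrect. Neither rule is stated above. Tally, GradeState, and grade are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Tally:
    correct: int = 0
    seen: int = 0

    def add(self, is_correct: bool) -> None:
        self.seen += 1
        self.correct += int(is_correct)


@dataclass
class GradeState:
    total: int = 0
    correct: int = 0
    incorrect_ids: list[str] = field(default_factory=list)
    by_domain: defaultdict = field(default_factory=lambda: defaultdict(Tally))
    by_scenario: defaultdict = field(default_factory=lambda: defaultdict(Tally))
    by_difficulty: defaultdict = field(default_factory=lambda: defaultdict(Tally))


def grade(questions: list[tuple[str, dict]], answers: dict[int, str]) -> GradeState:
    """questions: (qid, meta) pairs in exam order; meta has correct/domains/scenario/difficulty."""
    state = GradeState()
    for number, (qid, meta) in enumerate(questions, start=1):
        # A missing answer compares as "" and is therefore marked incorrect.
        is_correct = answers.get(number, "").upper() == meta["correct"].upper()
        state.total += 1
        if is_correct:
            state.correct += 1
        else:
            state.incorrect_ids.append(qid)
        # One question can count against several domains.
        for domain in meta["domains"]:
            state.by_domain[domain].add(is_correct)
        state.by_scenario[meta["scenario"]].add(is_correct)
        state.by_difficulty[meta["difficulty"]].add(is_correct)
    return state
```

Because a multi-domain question increments every domain it belongs to, the per-domain seen counts can sum to more than the total number of questions.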
Compute:
- Raw score (correct / total).
- Scaled-score estimate: correct × 1000 / total. Report it with a caveat that this linear conversion is not the official scaled-score conversion.
- Per-domain accuracy (correct / seen, per domain).
- Per-scenario accuracy.
- Per-difficulty accuracy.
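A small Python sketch of the computations above. It assumes counts arrive as {group: (correct, seen)} pairs, so the same helper serves domains, scenarios, and difficulty levels. The names scaled_estimate and accuracy_by_group are hypothetical, and the score is deliberately left unrounded.

```python
def scaled_estimate(correct: int, total: int) -> float:
    """Linear rescale of the raw score onto 0-1000: correct x 1000 / total.

    This is NOT the official scaled-score conversion; report it with that caveat.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of questions")
    return correct * 1000 / total


def accuracy_by_group(tallies: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Turn {group: (correct, seen)} into {group: correct / seen}, skipping empty groups."""
    return {group: c / seen for group, (c, seen) in tallies.items() if seen > 0}
```

For example, 45 correct out of 60 gives a raw score of 0.75 and a scaled estimate of 750.0.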
Return a structured markdown report including:
- Raw score and scaled estimate.
- Pass/fail of the scaled estimate against the 720 threshold.
- Per-domain and per-scenario tables.
- A list of the question IDs answered incorrectly, so the user can re-read each.
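One possible shape for the report, sketched in Python. This is an illustration, not the skill's required output format. The sketch makes two assumptions: a scaled estimate of exactly 720 counts as a pass, and the estimate is shown to one decimal place so that a value just under 720 is not displayed as 720. The heading text and the name render_report are hypothetical.

```python
PASS_THRESHOLD = 720  # the published minimum named in this skill


def render_report(
    correct: int,
    total: int,
    domains: dict[str, tuple[int, int]],
    scenarios: dict[str, tuple[int, int]],
    incorrect_ids: list[str],
) -> str:
    """Build a markdown report from totals and {group: (correct, seen)} tables."""
    if total <= 0:
        raise ValueError("total must be a positive number of questions")
    scaled = correct * 1000 / total
    verdict = "PASS" if scaled >= PASS_THRESHOLD else "FAIL"
    lines = [
        "## Mock exam results",
        "",
        f"- Raw score: {correct}/{total} ({correct / total:.0%})",
        f"- Scaled estimate: {scaled:.1f} (linear estimate, not the official scaled conversion)",
        f"- Result vs. {PASS_THRESHOLD} threshold: {verdict}",
        "",
    ]
    for title, table in (("Domain", domains), ("Scenario", scenarios)):
        lines += [f"| {title} | Correct | Seen | Accuracy |", "|---|---|---|---|"]
        for key, (c, seen) in sorted(table.items()):
            lines.append(f"| {key} | {c} | {seen} | {c / seen:.0%} |")
        lines.append("")
    lines.append("Incorrect question IDs: " + (", ".join(incorrect_ids) or "none"))
    return "\n".join(lines)
```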

Anti-patterns

Do not recommend a specific pass threshold other than 720 (Anthropic's published minimum).
Do not try to write to the study log — that's the caller's responsibility.
Do not execute code or fetch network resources.