exam-grader


Grade a completed mock exam answer sheet against the question bank. Read-only — never modifies files.

By bdevz. Updated 4/18/2026

name: exam-grader
description: Grade a completed mock exam answer sheet against the question bank. Read-only; never modifies files.
allowed-tools: [Read, Glob]
argument-hint: " answers-string-like-1:A,2:C,3:B,..."

exam-grader skill

Takes a rendered mock-exam file and the user's answer string, and produces a per-domain, per-scenario accuracy report.

The skill's allowed-tools frontmatter lists only [Read, Glob], so it cannot write files or execute code. This intentionally limits its blast radius (it behaves as a pure scoring function) and demonstrates how Domain 3.2's allowed-tools frontmatter restricts skill execution.

Inputs

  • $1 — absolute or repo-relative path to the mock-exam markdown file (output of render_mock.py or /mock-exam).
  • $2 — the user's answers as a comma-separated list of N:LETTER pairs, or one letter per line.
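The two accepted shapes of $2 can be pinned down with a small sketch. The skill itself runs with Read and Glob only and executes no code, so this is purely an illustration of the intended parsing; the function name `parse_answers` and the handling of blank lines and stray whitespace are assumptions, not part of the skill.

```python
def parse_answers(raw: str) -> dict[int, str]:
    """Parse '1:A,2:C' pairs or one letter per line into {question_number: letter}."""
    answers: dict[int, str] = {}
    if ":" in raw:
        # Comma-separated N:LETTER pairs; tolerate whitespace and a trailing comma.
        for pair in raw.split(","):
            pair = pair.strip()
            if not pair:
                continue
            number, letter = pair.split(":", 1)
            answers[int(number)] = letter.strip().upper()
    else:
        # One letter per line; numbers are implied by line order, starting at 1.
        # A blank line is treated as a skipped question (assumption).
        for number, line in enumerate(raw.splitlines(), start=1):
            letter = line.strip().upper()
            if letter:
                answers[number] = letter
    return answers
```

Both `parse_answers("1:A,2:C,3:B")` and `parse_answers("A\nC\nB")` yield the same map under these assumptions, which keeps the later scoring step independent of the input shape.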

Procedure

  1. Read the mock-exam file. Parse it into an ordered list of question IDs — each question block has a visible id: header or an HTML comment like <!-- id: q-cs-001 -->.

  2. For each question id, read the corresponding canonical file under practice/questions/by-scenario/**/q-<id>.md. Extract correct, domains, scenario, difficulty.

  3. Parse the user's answers string into a map { question_number: letter }.

  4. For each question:

    • Mark correct/incorrect.
    • Accumulate per-domain counts (one question can count against multiple domains).
    • Accumulate per-scenario counts.
    • Accumulate per-difficulty counts.
  5. Compute:

    • Raw score (correct / total).
    • Weighted-scaled estimate: correct × 1000 / total, a linear rescaling of the raw score. Report it with the caveat that this is not the official scaled conversion.
    • Per-domain accuracy (correct / seen, per domain).
    • Per-scenario accuracy.
    • Per-difficulty accuracy.
  6. Return a structured markdown report including:

    • Raw score and scaled estimate.
    • Pass/fail against the 720 threshold.
    • Per-domain and per-scenario tables.
    • A list of the question IDs answered incorrectly, so the user can re-read each.
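The procedure above can be sketched in Python to make the bookkeeping concrete. This is an illustration only, since the skill itself executes no code. The names `Question`, `extract_question_ids`, `grade`, and `render_table` are hypothetical, the ID pattern assumes `id:` starts a line or an HTML comment, and treating a scaled estimate of exactly 720 as a pass is an assumption.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

PASS_THRESHOLD = 720  # threshold named in the skill text


@dataclass
class Question:
    # Fields mirror what step 2 extracts from each canonical question file.
    id: str
    correct: str
    domains: list[str]
    scenario: str
    difficulty: str


# Matches "id: q-cs-001" at the start of a line or inside "<!-- id: ... -->".
ID_PATTERN = re.compile(r"^\s*(?:<!--\s*)?id:\s*([\w-]+)", re.MULTILINE)


def extract_question_ids(mock_text: str) -> list[str]:
    # Preserve order; drop repeats in case a block carries both header and comment.
    return list(dict.fromkeys(ID_PATTERN.findall(mock_text)))


def grade(questions: list[Question], answers: dict[int, str]) -> dict:
    # Each tally is [correct, seen]; one table per reporting axis.
    axes = ("domain", "scenario", "difficulty")
    tallies = {axis: defaultdict(lambda: [0, 0]) for axis in axes}
    missed: list[str] = []
    for number, q in enumerate(questions, start=1):
        is_correct = answers.get(number, "").upper() == q.correct.upper()
        if not is_correct:
            missed.append(q.id)
        # A question counts once toward every domain it lists.
        keys = [("domain", d) for d in q.domains]
        keys += [("scenario", q.scenario), ("difficulty", q.difficulty)]
        for axis, key in keys:
            tallies[axis][key][0] += int(is_correct)
            tallies[axis][key][1] += 1
    total = len(questions)
    correct = total - len(missed)
    # Linear estimate only; not the official scaled conversion.
    scaled = round(correct * 1000 / total) if total else 0
    return {
        "correct": correct,
        "total": total,
        "scaled_estimate": scaled,
        "passed": scaled >= PASS_THRESHOLD,
        "tallies": {axis: dict(t) for axis, t in tallies.items()},
        "missed": missed,
    }


def render_table(title: str, tally: dict[str, list[int]]) -> str:
    # One markdown table per axis for the final report.
    rows = [f"| {title} | Correct | Seen | Accuracy |", "|---|---|---|---|"]
    for key, (right, seen) in sorted(tally.items()):
        rows.append(f"| {key} | {right} | {seen} | {right / seen:.0%} |")
    return "\n".join(rows)
```

Because one question can list several domains, the per-domain "seen" counts may sum to more than the total number of questions, while the raw score still counts each question once.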

Anti-patterns

  • Do not recommend a specific pass threshold other than 720 (Anthropic's published minimum).
  • Do not try to write to the study log — that's the caller's responsibility.
  • Do not execute code or fetch network resources.
Install via CLI
npx skills add https://github.com/bdevz/elo --skill exam-grader
Repository Details
Branch: main
Path: SKILL.md