eval-builder


Build or repair task evaluators and evaluator tests with deterministic-first strategy.

By brainqub3, updated 2/9/2026

name: eval-builder
description: Build or repair task evaluators and evaluator tests with deterministic-first strategy.
disable-model-invocation: true
allowed-tools:
  - Read
  - Edit
  - Bash
  - Glob
  - Grep

eval-builder

Use this skill when a task's evaluator is missing, incomplete, or failing its tests.

Goal

Produce:

  1. brainqub3/tasks/<task>/evaluator.py
  2. brainqub3/tasks/<task>/tests/test_evaluator.py
  3. Minimal fixtures/instances for deterministic assertions
  4. Updated task.md output contract
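As an illustration of item 3, the sketch below builds a three-row fixture in JSONL form and loads it back. It assumes each line of an instances file is a single JSON object; the field names `id`, `output`, and `expected` and the helper `load_instances` are hypothetical and not taken from the repository.

```python
import json


def load_instances(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical fixture rows: one clear pass, one clear fail, one malformed output.
# The field names are assumptions made for this example only.
rows = [
    {"id": "pass-1", "output": json.dumps({"answer": 4}), "expected": {"answer": 4}},
    {"id": "fail-1", "output": json.dumps({"answer": 5}), "expected": {"answer": 4}},
    {"id": "malformed-1", "output": "not json", "expected": {"answer": 4}},
]

# Serialize to JSONL (one object per line) and read it back.
FIXTURE_JSONL = "\n".join(json.dumps(row) for row in rows) + "\n"
instances = load_instances(FIXTURE_JSONL)
```

Keeping the fixture this small means every assertion in the test file can name the exact row it depends on.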

Strategy Priority

  1. Deterministic programmatic checks
  2. JSON schema + explicit constraints
  3. Simulator/replay checks
  4. Fuzzy but programmatic heuristics
  5. LLM judge as last resort
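The first two tiers can be sketched with the standard library alone. The example below is a minimal illustration, assuming a hypothetical output contract of a JSON object with an integer `answer` between 0 and 100; the function names `check_exact` and `check_constraints` are made up for this sketch, and a real task would take its contract from task.md.

```python
import json


def check_exact(output_text, expected):
    """Tier 1: deterministic check. True only if the parsed output equals expected."""
    try:
        return json.loads(output_text) == expected
    except json.JSONDecodeError:
        return False


def check_constraints(output_text):
    """Tier 2: schema-style check with explicit constraints.

    Returns a list of human-readable errors; an empty list means the output
    satisfies the (hypothetical) contract.
    """
    try:
        data = json.loads(output_text)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = []
    if "answer" not in data:
        errors.append("missing required key: answer")
    elif isinstance(data["answer"], bool) or not isinstance(data["answer"], int):
        # bool is a subclass of int in Python, so it is rejected explicitly.
        errors.append("answer must be an integer")
    elif not 0 <= data["answer"] <= 100:
        errors.append("answer must be between 0 and 100")
    return errors
```

Both functions return the same result for the same input on every run, which is why they sit above heuristics and LLM judges in the priority list.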

Workflow

  1. Read task.md and instances.jsonl
  2. Identify output contract and success criteria
  3. Implement evaluator with explicit failure taxonomy
  4. Add tests for:
    • clear pass
    • clear fail
    • malformed output
  5. Run pytest brainqub3/tasks/<task>/tests -q
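Steps 3 and 4 can be sketched as follows. This is an illustrative example under stated assumptions, not the repository's evaluator: the `Failure` categories, the `Result` type, the `evaluate` signature, and the `answer` key are all hypothetical. In practice the evaluator and its tests would live in separate files, as listed under Goal.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Failure(Enum):
    """Hypothetical failure taxonomy: each failing result names exactly one cause."""
    MALFORMED_OUTPUT = "malformed_output"      # output could not be parsed
    CONTRACT_VIOLATION = "contract_violation"  # parsed, but wrong shape
    WRONG_ANSWER = "wrong_answer"              # right shape, wrong value


@dataclass(frozen=True)
class Result:
    passed: bool
    failure: Optional[Failure] = None
    detail: str = ""


def evaluate(output_text: str, expected: dict) -> Result:
    """Deterministically compare a model output against the expected answer."""
    try:
        data = json.loads(output_text)
    except json.JSONDecodeError as exc:
        return Result(False, Failure.MALFORMED_OUTPUT, str(exc))
    if not isinstance(data, dict) or "answer" not in data:
        return Result(False, Failure.CONTRACT_VIOLATION,
                      "expected an object with key 'answer'")
    if data["answer"] != expected["answer"]:
        return Result(False, Failure.WRONG_ANSWER, f"got {data['answer']!r}")
    return Result(True)


# pytest-style tests covering the three cases named in step 4.
def test_clear_pass():
    assert evaluate('{"answer": 4}', {"answer": 4}).passed


def test_clear_fail():
    result = evaluate('{"answer": 5}', {"answer": 4})
    assert not result.passed
    assert result.failure is Failure.WRONG_ANSWER


def test_malformed_output():
    result = evaluate("not json", {"answer": 4})
    assert not result.passed
    assert result.failure is Failure.MALFORMED_OUTPUT
```

Returning a named failure category rather than a bare boolean lets the tests assert why an output failed, not only that it failed.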
Install via CLI
npx skills add https://github.com/brainqub3/agent-labs --skill eval-builder
Repository Details
Stars: 16
Forks: 6
Branch: main
Path: SKILL.md