name: eval-builder description: Build or repair task evaluators and evaluator tests with deterministic-first strategy. disable-model-invocation: true allowed-tools: - Read - Edit - Bash - Glob - Grep
eval-builder
Use this skill when evaluator is missing, incomplete, or failing tests.
Goal
Produce:
brainqub3/tasks/<task>/evaluator.pybrainqub3/tasks/<task>/tests/test_evaluator.py- Minimal fixtures/instances for deterministic assertions
- Updated
task.mdoutput contract
Strategy Priority
- Deterministic programmatic checks
- JSON schema + explicit constraints
- Simulator/replay checks
- Fuzzy but programmatic heuristics
- LLM judge as last resort
Workflow
- Read
task.mdandinstances.jsonl - Identify output contract and success criteria
- Implement evaluator with explicit failure taxonomy
- Add tests for:
- clear pass
- clear fail
- malformed output
- Run
pytest brainqub3/tasks/<task>/tests -q