name: task-author
description: Create or repair Brainqub3 task packages that must pass evaluator tests before runs, including both fabricated instances and user-provided data workflows.
disable-model-invocation: true
allowed-tools:
  - Read
  - Edit
  - Bash
  - Glob
  - Grep
task-author
Use this skill when a new task is needed or when an existing task package is incomplete.
Goal
Produce:
- `brainqub3/tasks/<task>/task.md`
- `brainqub3/tasks/<task>/instances.jsonl`
- `brainqub3/tasks/<task>/evaluator.py`
- `brainqub3/tasks/<task>/tests/test_evaluator.py`
- Optional `fixtures/` files required for deterministic evaluation
Workflow
- Initialize the scaffold if the task folder does not exist: `uv run brainqub3 task init <task_name>`.
- Define a deterministic output contract in `task.md` (exact JSON keys, types, and no extra keys unless explicitly allowed).
- Choose a data mode: fabricated task data or user-provided data.
- Build `instances.jsonl` with stable IDs and task inputs.
- Implement the evaluator with an explicit failure taxonomy (`invalid_json`, `not_object`, `schema_mismatch`, `answer_mismatch`, plus task-specific errors).
- Add evaluator tests for pass/fail and malformed output.
- Run `uv run pytest brainqub3/tasks/<task>/tests -q` and fix until green.
- Optionally run a smoke check: `uv run brainqub3 run sas --task <task_name> --instances 1 --allow-mock`.
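The failure taxonomy in the workflow can be sketched as an ordered chain of checks, where each check returns as soon as it fails. This is a minimal illustration, not the Brainqub3 evaluator interface: the `EvalResult` dataclass below is a hypothetical stand-in (only `error_type` and `details` are named in this document; `passed` is assumed), and the `evaluate` signature and the single-key `{"answer": int}` contract are also assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class EvalResult:
    """Hypothetical stand-in; the real Brainqub3 EvalResult may differ."""
    passed: bool  # assumed field name
    error_type: Optional[str] = None
    details: Dict[str, Any] = field(default_factory=dict)


# Assumed contract for illustration: exactly one key, "answer", holding an int.
EXPECTED_SCHEMA = {"answer": int}


def evaluate(raw_output: str, gold: Dict[str, Any]) -> EvalResult:
    """Check raw output against the contract first, then against the gold answer."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return EvalResult(False, "invalid_json", {"message": str(exc)})

    if not isinstance(parsed, dict):
        return EvalResult(False, "not_object", {"got_type": type(parsed).__name__})

    missing = sorted(set(EXPECTED_SCHEMA) - set(parsed))
    extra = sorted(set(parsed) - set(EXPECTED_SCHEMA))
    # bool is a subclass of int in Python, so reject it explicitly.
    wrong_type = sorted(
        key
        for key, expected in EXPECTED_SCHEMA.items()
        if key in parsed
        and (isinstance(parsed[key], bool) or not isinstance(parsed[key], expected))
    )
    if missing or extra or wrong_type:
        return EvalResult(
            False,
            "schema_mismatch",
            {"missing": missing, "extra": extra, "wrong_type": wrong_type},
        )

    if parsed["answer"] != gold["answer"]:
        return EvalResult(
            False,
            "answer_mismatch",
            {"expected": gold["answer"], "got": parsed["answer"]},
        )

    return EvalResult(True)
```

Ordering the checks from parsing to schema to answer means each output maps to exactly one `error_type`, which keeps test expectations unambiguous.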
Data Modes
1) Fabricated Data Mode
- Generate deterministic instances directly in `instances.jsonl`.
- Prefer explicit `gold` answers when truth can be precomputed.
- If truth is derived from local files/rules, store deterministic inputs under `fixtures/` and derive truth in the evaluator.
- Include edge cases that break naive or hard-coded solutions.
- Keep constants and generation logic stable for reproducibility.
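One way to keep fabricated instances reproducible is to generate them from a fixed seed with a local random generator and to list the edge cases by hand. The sketch below assumes a made-up integer-addition task; the seed value, the `add-NNNN` ID format, and the exact row layout (`id`, `input`, `gold`) are illustrative assumptions rather than a documented Brainqub3 schema.

```python
import json
import random
from typing import Any, Dict, List

SEED = 20240101   # assumed constant; any fixed value works
NUM_RANDOM = 5    # number of seeded random instances to add


def build_instances(seed: int = SEED, count: int = NUM_RANDOM) -> List[Dict[str, Any]]:
    """Return rows for a hypothetical addition task, identical on every call."""
    rng = random.Random(seed)  # local generator, no global random state
    # Hand-picked edge cases first: zeros, sign cancellation, large values.
    pairs = [(0, 0), (-1, 1), (10**9, 10**9)]
    pairs += [
        (rng.randint(-1000, 1000), rng.randint(-1000, 1000)) for _ in range(count)
    ]
    return [
        {
            "id": f"add-{index:04d}",         # stable, position-based ID
            "input": {"a": a, "b": b},
            "gold": {"answer": a + b},        # truth precomputed at build time
        }
        for index, (a, b) in enumerate(pairs)
    ]


def to_jsonl(rows: List[Dict[str, Any]]) -> str:
    """Serialize rows one JSON object per line, with sorted keys for stable diffs."""
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)
```

Writing `to_jsonl(build_instances())` to `instances.jsonl` twice should produce byte-identical files, which makes accidental changes to the generation logic visible in review.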
2) User-Provided Data Mode
- Confirm the expected input source (files, folders, schema) before building instances.
- Snapshot minimal required data into task-local `fixtures/` when possible.
- If data cannot be copied, document stable path assumptions and the required layout in `task.md`.
- Normalize user data into deterministic `instances.jsonl` rows with `id` and `input`.
- Add evaluator checks for missing files, missing fields, malformed records, and empty data slices.
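The normalization and data checks above might look like the following sketch. It assumes the user data arrives as JSON Lines with a single required field named `question`. That field, the `DataSourceError` class, and the error names `missing_file`, `malformed_record`, `missing_field`, and `empty_data` are all hypothetical. The checks run at normalization time here, and the same conditions could equally be enforced inside the evaluator.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List

REQUIRED_FIELDS = ("question",)  # hypothetical schema for illustration


class DataSourceError(ValueError):
    """Carries a task-specific error_type alongside the message."""

    def __init__(self, error_type: str, detail: str) -> None:
        super().__init__(f"{error_type}: {detail}")
        self.error_type = error_type


def read_lines(path: Path) -> List[str]:
    """Read a snapshot file, failing loudly if it is absent."""
    if not path.is_file():
        raise DataSourceError("missing_file", str(path))
    return path.read_text(encoding="utf-8").splitlines()


def normalize_records(lines: Iterable[str], source_name: str) -> List[Dict[str, Any]]:
    """Turn raw JSON Lines into rows with only `id` and `input`."""
    rows: List[Dict[str, Any]] = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines but keep line numbers stable
        where = f"{source_name}:{line_no}"
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise DataSourceError("malformed_record", f"{where}: {exc}") from exc
        if not isinstance(record, dict):
            raise DataSourceError("malformed_record", f"{where}: not a JSON object")
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise DataSourceError("missing_field", f"{where}: {missing}")
        rows.append(
            {
                # ID derives from source name and line number, so it is
                # stable for as long as the snapshot itself is unchanged.
                "id": f"{source_name}-{line_no:05d}",
                "input": {name: record[name] for name in REQUIRED_FIELDS},
            }
        )
    if not rows:
        raise DataSourceError("empty_data", source_name)
    return rows
```

Dropping every field outside `REQUIRED_FIELDS` keeps incidental columns in the user's data from leaking into task inputs.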
Quality Gates
- `task.md` specifies deterministic success criteria and a strict output contract.
- Evaluator returns `EvalResult` with an actionable `error_type` and useful `details`.
- Tests cover at least:
  - clear pass
  - clear fail
  - invalid JSON
  - schema mismatch
  - one data-source-specific failure path
- `uv run pytest brainqub3/tasks/<task>/tests -q` passes before any SAS/MAS run.
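The minimum test coverage listed above could be laid out as one plain test function per case, which pytest collects by the `test_` prefix. To stay self-contained, this sketch defines a deliberately tiny stand-in `evaluate` that returns only an error type string (`None` on success). A real `test_evaluator.py` would import the task's own evaluator instead. The `missing_gold` error is a hypothetical example of a data-source-specific failure.

```python
import json
from typing import Any, Dict, Optional


def evaluate(raw_output: str, gold: Dict[str, Any]) -> Optional[str]:
    """Stand-in evaluator for illustration; returns an error type or None."""
    if "answer" not in gold:
        return "missing_gold"  # hypothetical data-source-specific error
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return "invalid_json"
    if not isinstance(parsed, dict):
        return "not_object"
    if set(parsed) != {"answer"}:
        return "schema_mismatch"
    if parsed["answer"] != gold["answer"]:
        return "answer_mismatch"
    return None


GOLD = {"answer": 4}


def test_clear_pass():
    assert evaluate('{"answer": 4}', GOLD) is None


def test_clear_fail():
    assert evaluate('{"answer": 5}', GOLD) == "answer_mismatch"


def test_invalid_json():
    assert evaluate('{"answer": 4', GOLD) == "invalid_json"


def test_schema_mismatch_on_extra_key():
    assert evaluate('{"answer": 4, "note": "hi"}', GOLD) == "schema_mismatch"


def test_data_source_failure():
    # A gold record without an answer signals a broken instance, not a bad output.
    assert evaluate('{"answer": 4}', {}) == "missing_gold"
```

Asserting on the exact error type, rather than only on pass or fail, catches regressions where an output still fails but for the wrong reason.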
Coordination
- Use `eval-builder` when evaluator complexity grows or tests are brittle.
- Report changed files and validation commands run.