task-author

name: task-author description: Create or repair Brainqub3 task packages that must pass evaluator tests before runs, including both fabricated instances and user-provided data workflows. disable-model-invocation: true allowed-tools: - Read - Edit - Bash - Glob - Grep

Use this skill when a new task is needed or when an existing task package is incomplete.

Produce:

Initialize scaffold if task folder does not exist: uv run brainqub3 task init <task_name>.
Define deterministic output contract in task.md (exact JSON keys, types, and no extra keys unless explicitly allowed).
Choose data mode: fabricated task data or user-provided data.
Build instances.jsonl with stable IDs and task inputs.
Implement evaluator with explicit failure taxonomy (invalid_json, not_object, schema_mismatch, answer_mismatch, plus task-specific errors).
Add evaluator tests for pass/fail and malformed output.
Run uv run pytest brainqub3/tasks/<task>/tests -q and fix until green.
Optionally run a smoke check: uv run brainqub3 run sas --task <task_name> --instances 1 --allow-mock.

Generate deterministic instances directly in instances.jsonl.
Prefer explicit gold answers when truth can be precomputed.
If truth is derived from local files/rules, store deterministic inputs under fixtures/ and derive truth in evaluator.
Include edge cases that break naive or hard-coded solutions.
Keep constants and generation logic stable for reproducibility.

Confirm expected input source (files, folders, schema) before building instances.
Snapshot minimal required data into task-local fixtures/ when possible.
If data cannot be copied, document stable path assumptions and required layout in task.md.
Normalize user data into deterministic instances.jsonl rows with id and input.
Add evaluator checks for missing files, missing fields, malformed records, and empty data slices.

task.md specifies deterministic success criteria and strict output contract.
Evaluator returns EvalResult with actionable error_type and useful details.
Tests cover at least:
- clear pass
- clear fail
- invalid JSON
- schema mismatch
- one data-source-specific failure path
uv run pytest brainqub3/tasks/<task>/tests -q passes before any SAS/MAS run.