task-author

Create or repair Brainqub3 task packages that must pass evaluator tests before runs, including both fabricated instances and user-provided data workflows.

By brainqub3. Updated 2/10/2026.

name: task-author
description: Create or repair Brainqub3 task packages that must pass evaluator tests before runs, including both fabricated instances and user-provided data workflows.
disable-model-invocation: true
allowed-tools:
  - Read
  - Edit
  - Bash
  - Glob
  - Grep

Use this skill when a new task is needed or when an existing task package is incomplete.

Goal

Produce:

  1. brainqub3/tasks/<task>/task.md
  2. brainqub3/tasks/<task>/instances.jsonl
  3. brainqub3/tasks/<task>/evaluator.py
  4. brainqub3/tasks/<task>/tests/test_evaluator.py
  5. Optional fixtures/ files required for deterministic evaluation

Workflow

  1. Initialize the scaffold if the task folder does not exist: uv run brainqub3 task init <task_name>.
  2. Define deterministic output contract in task.md (exact JSON keys, types, and no extra keys unless explicitly allowed).
  3. Choose data mode: fabricated task data or user-provided data.
  4. Build instances.jsonl with stable IDs and task inputs.
  5. Implement evaluator with explicit failure taxonomy (invalid_json, not_object, schema_mismatch, answer_mismatch, plus task-specific errors).
  6. Add evaluator tests for pass/fail and malformed output.
  7. Run uv run pytest brainqub3/tasks/<task>/tests -q and fix until green.
  8. Optionally run a smoke check: uv run brainqub3 run sas --task <task_name> --instances 1 --allow-mock.
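
As an illustration of steps 2 and 5, here is a minimal sketch of an evaluator that applies the failure taxonomy in order. The EvalResult dataclass below is a hypothetical stand-in (the real Brainqub3 type and the evaluator entry-point signature may differ), and the {"answer": <int>} contract is assumed purely for the example.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class EvalResult:
    """Hypothetical stand-in; the real Brainqub3 EvalResult may differ."""
    passed: bool
    error_type: Optional[str] = None
    details: Dict[str, Any] = field(default_factory=dict)


# Assumed output contract for this example: exactly {"answer": <int>}.
EXPECTED_KEYS = {"answer": int}


def evaluate(raw_output: str, gold: Dict[str, Any]) -> EvalResult:
    # 1. The output must parse as JSON.
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return EvalResult(False, "invalid_json", {"message": str(exc)})

    # 2. The top-level value must be a JSON object.
    if not isinstance(parsed, dict):
        return EvalResult(False, "not_object", {"got_type": type(parsed).__name__})

    # 3. Keys must match the contract exactly: none missing, none extra.
    if set(parsed) != set(EXPECTED_KEYS):
        return EvalResult(False, "schema_mismatch", {
            "missing": sorted(set(EXPECTED_KEYS) - set(parsed)),
            "extra": sorted(set(parsed) - set(EXPECTED_KEYS)),
        })

    # 4. Values must have the expected types.
    for key, expected_type in EXPECTED_KEYS.items():
        value = parsed[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            return EvalResult(False, "schema_mismatch", {
                "key": key,
                "got_type": type(value).__name__,
            })

    # 5. Only a well-formed output is compared against the gold answer.
    if parsed["answer"] != gold["answer"]:
        return EvalResult(False, "answer_mismatch", {
            "expected": gold["answer"],
            "got": parsed["answer"],
        })
    return EvalResult(True)
```

Checking in this order means each failure maps to exactly one error_type, so a malformed output is never reported as a wrong answer.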

Data Modes

1) Fabricated Data Mode

  • Generate deterministic instances directly in instances.jsonl.
  • Prefer explicit gold answers when truth can be precomputed.
  • If truth is derived from local files or rules, store deterministic inputs under fixtures/ and derive the truth in the evaluator.
  • Include edge cases that break naive or hard-coded solutions.
  • Keep constants and generation logic stable for reproducibility.
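
One way to keep fabricated data reproducible is to generate it from a fixed seed and serialize it with a stable key order. The sketch below assumes a toy addition task; the function names and the "gold" field are hypothetical, and only id and input come from the guidance in this skill.

```python
import json
import random
from typing import Any, Dict, List


def build_instances(n: int, seed: int = 1234) -> List[Dict[str, Any]]:
    """Generate n seeded addition problems plus fixed edge cases (toy example)."""
    # A local Random instance keeps the output independent of global RNG state.
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        a, b = rng.randint(-50, 50), rng.randint(-50, 50)
        rows.append({
            "id": f"add-{i:04d}",          # stable, zero-padded ID
            "input": {"a": a, "b": b},
            "gold": {"answer": a + b},     # precomputed gold answer (assumed field name)
        })
    # Hand-written edge cases that a hard-coded or naive solution could miss.
    rows.append({"id": "add-edge-zero", "input": {"a": 0, "b": 0}, "gold": {"answer": 0}})
    rows.append({"id": "add-edge-cancel", "input": {"a": -7, "b": 7}, "gold": {"answer": 0}})
    return rows


def to_jsonl(rows: List[Dict[str, Any]]) -> str:
    # sort_keys gives the same key order, and so the same text, on every run.
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)
```

Writing the result of to_jsonl(build_instances(20)) to instances.jsonl would then give a file that can be regenerated and diffed against the committed copy.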

2) User-Provided Data Mode

  • Confirm expected input source (files, folders, schema) before building instances.
  • Snapshot minimal required data into task-local fixtures/ when possible.
  • If data cannot be copied, document stable path assumptions and required layout in task.md.
  • Normalize user data into deterministic instances.jsonl rows with id and input.
  • Add evaluator checks for missing files, missing fields, malformed records, and empty data slices.
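
The normalization and validation steps above could look roughly like the following sketch, assuming the user supplies a CSV file with question and answer columns. The column names, the DataSourceError class, its error_type labels, and the "gold" field are all hypothetical choices for illustration.

```python
import csv
from pathlib import Path
from typing import Any, Dict, List

# Assumed schema for the user's CSV; adjust to the confirmed input source.
REQUIRED_FIELDS = ("question", "answer")


class DataSourceError(Exception):
    """Hypothetical error carrying a task-specific error_type label."""

    def __init__(self, error_type: str, message: str) -> None:
        super().__init__(message)
        self.error_type = error_type


def normalize_csv(path: Path) -> List[Dict[str, Any]]:
    """Turn user CSV records into deterministic rows with id and input."""
    if not path.is_file():
        raise DataSourceError("missing_file", f"{path} not found")

    with path.open(newline="", encoding="utf-8") as handle:
        records = list(csv.DictReader(handle))
    if not records:
        raise DataSourceError("empty_data", f"{path} has no data rows")

    rows = []
    for index, record in enumerate(records):
        # DictReader yields None for absent cells, so treat None and "" alike.
        missing = [f for f in REQUIRED_FIELDS if not (record.get(f) or "").strip()]
        if missing:
            raise DataSourceError("missing_field", f"row {index}: missing {missing}")
        rows.append({
            "id": f"user-{index:04d}",  # derived from row order, so stable per snapshot
            "input": {"question": record["question"].strip()},
            "gold": {"answer": record["answer"].strip()},
        })
    return rows
```

Because the IDs depend on row order, this approach only stays stable if the snapshot under fixtures/ is not reordered after the instances are built.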

Quality Gates

  • task.md specifies deterministic success criteria and strict output contract.
  • Evaluator returns EvalResult with actionable error_type and useful details.
  • Tests cover at least:
    • clear pass
    • clear fail
    • invalid JSON
    • schema mismatch
    • one data-source-specific failure path
  • uv run pytest brainqub3/tasks/<task>/tests -q passes before any SAS/MAS run.
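
A test file covering the minimum cases listed above might be shaped like this sketch. To keep it self-contained, it defines a compact hypothetical evaluate function that returns an error_type string (or None on pass); a real tests/test_evaluator.py would import the package's own evaluator and assert on its EvalResult instead. The missing_fixture label is an assumed task-specific error.

```python
import json
from pathlib import Path
from typing import Optional


def evaluate(raw_output: str, fixture_path: Path) -> Optional[str]:
    """Hypothetical stand-in for evaluator.py: error_type, or None on pass."""
    if not fixture_path.is_file():
        return "missing_fixture"  # data-source-specific failure (assumed label)
    gold = json.loads(fixture_path.read_text(encoding="utf-8"))
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return "invalid_json"
    if not isinstance(parsed, dict):
        return "not_object"
    if set(parsed) != {"answer"}:
        return "schema_mismatch"
    return None if parsed["answer"] == gold["answer"] else "answer_mismatch"


def write_gold(directory: Path) -> Path:
    """Create a small gold fixture inside the given directory."""
    path = directory / "gold.json"
    path.write_text(json.dumps({"answer": 4}), encoding="utf-8")
    return path


# tmp_path is pytest's built-in per-test temporary directory fixture.
def test_clear_pass(tmp_path):
    assert evaluate('{"answer": 4}', write_gold(tmp_path)) is None


def test_clear_fail(tmp_path):
    assert evaluate('{"answer": 5}', write_gold(tmp_path)) == "answer_mismatch"


def test_invalid_json(tmp_path):
    assert evaluate("not json", write_gold(tmp_path)) == "invalid_json"


def test_schema_mismatch(tmp_path):
    assert evaluate('{"answer": 4, "extra": 1}', write_gold(tmp_path)) == "schema_mismatch"


def test_missing_fixture(tmp_path):
    assert evaluate('{"answer": 4}', tmp_path / "absent.json") == "missing_fixture"
```

Each test asserts on a specific error_type rather than a generic failure, which keeps the tests aligned with the taxonomy the evaluator is expected to report.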

Coordination

  • Use eval-builder when evaluator complexity grows or tests are brittle.
  • Report changed files and validation commands run.
Install via CLI
npx skills add https://github.com/brainqub3/agent-labs --skill task-author