wandb-runs - SKILL.md Agent Skill

name: wandb-runs
description: Standardize W&B run lifecycle and logging. Use when creating or updating experiment runs with consistent naming, tags, config snapshots, and comparable metrics across iterations.

Create comparable runs with stable naming and schema.

Start each run with a deterministic name pattern (for example run_<n> plus optional slice metadata).
Log immutable context at run start:
- code version (git_sha)
- prompt/tool version
- dataset slice (offset, limit)
- scorer version and dataset version
- model identifier
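
A minimal sketch of the run-start step, assuming the `wandb` Python client. The helper names (`build_run_name`, `build_run_config`, `start_run`), the `_o<offset>_l<limit>` name suffix, and every config key other than `git_sha`, `offset`, and `limit` are illustrative assumptions, not part of this skill.

```python
import subprocess
from typing import Any, Dict, List, Optional


def build_run_name(n: int, offset: Optional[int] = None, limit: Optional[int] = None) -> str:
    """Deterministic name: run_<n>, optionally suffixed with the dataset slice."""
    name = f"run_{n}"
    if offset is not None and limit is not None:
        name += f"_o{offset}_l{limit}"  # suffix format is an assumption
    return name


def current_git_sha() -> str:
    """Return the current commit SHA, or "unknown" outside a git checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def build_run_config(
    *,
    git_sha: str,
    prompt_version: str,
    tool_version: str,
    offset: int,
    limit: int,
    scorer_version: str,
    dataset_version: str,
    model: str,
) -> Dict[str, Any]:
    """Collect the immutable context that is logged once, at run start."""
    return {
        "git_sha": git_sha,
        "prompt_version": prompt_version,
        "tool_version": tool_version,
        "slice": {"offset": offset, "limit": limit},
        "scorer_version": scorer_version,
        "dataset_version": dataset_version,
        "model": model,
    }


def start_run(project: str, n: int, config: Dict[str, Any], tags: List[str]):
    """Start a W&B run with the deterministic name and the config snapshot."""
    import wandb  # imported lazily so the helpers above work without wandb installed

    slice_ = config["slice"]
    return wandb.init(
        project=project,
        name=build_run_name(n, slice_["offset"], slice_["limit"]),
        config=config,
        tags=tags,
    )
```

Passing the snapshot through `config=` at `wandb.init` time keeps it attached to the run from the first step, so later runs can be filtered and compared on the same keys.
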
Log per-question metrics with explicit step indexing.
Log run-level summary metrics at completion (accuracy, correct, total, error rates).
Log prompt budget metrics when available (prompt_chars, prompt_tokens_est, budget status).
Apply canonical tags (for example baseline, fix-batch, agent-vX).
Keep key names stable between runs; avoid renaming metrics mid-series.
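
The logging steps above can be sketched as follows. This assumes each result is a dict with a `correct` flag and optional `error`, `prompt_chars`, and `prompt_tokens_est` fields; that record shape, the `question_correct` and `question_error` key names, and the helper names are assumptions for illustration. `run` is expected to behave like a W&B run object (`run.log(data, step=...)` and a dict-like `run.summary`).

```python
from typing import Any, Dict, List

# Optional prompt budget fields, logged only when a result carries them.
PROMPT_BUDGET_KEYS = ("prompt_chars", "prompt_tokens_est")


def summarize(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Compute run-level summary metrics from per-question results."""
    total = len(results)
    correct = sum(1 for r in results if r["correct"])
    errors = sum(1 for r in results if r.get("error"))
    return {
        "accuracy": correct / total if total else 0.0,
        "correct": correct,
        "total": total,
        "error_rate": errors / total if total else 0.0,
    }


def log_results(run: Any, results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Log one step per question, then write the summary at completion."""
    for step, r in enumerate(results):
        metrics = {
            "question_correct": int(bool(r["correct"])),
            "question_error": int(bool(r.get("error"))),
        }
        for key in PROMPT_BUDGET_KEYS:
            if key in r:
                metrics[key] = r[key]
        # Explicit step index: question i always lands on step i.
        run.log(metrics, step=step)

    summary = summarize(results)
    run.summary.update(summary)
    return summary
```

With a real run this would be used as `run = wandb.init(..., tags=["baseline"])`, then `log_results(run, results)`, then `run.finish()`. Keeping the metric keys in one place, as constants or a single function, is one way to avoid renaming them mid-series.
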

Return run metadata that can be joined to RCA/report pipelines:

{
  "run_id": "<wandb-run-id>",
  "run_name": "run_<n>",
  "git_sha": "<commit>",
  "slice": {"offset": 0, "limit": 100}
}
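
One possible way to build this payload and join it downstream, assuming a run object that exposes `id` and `name` attributes as W&B runs do. The helper names and the shape of the RCA records are hypothetical; the demo uses a stand-in object rather than a live run.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, List


def run_metadata(run: Any, git_sha: str, offset: int, limit: int) -> Dict[str, Any]:
    """Build the metadata payload in the shape shown above."""
    return {
        "run_id": run.id,
        "run_name": run.name,
        "git_sha": git_sha,
        "slice": {"offset": offset, "limit": limit},
    }


def join_on_run_id(
    metadata: Dict[str, Any], records: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Attach run metadata to every downstream record with a matching run_id."""
    return [
        {**rec, **metadata}
        for rec in records
        if rec.get("run_id") == metadata["run_id"]
    ]


# Stand-in for a wandb run; a real one would come from wandb.init(...).
fake_run = SimpleNamespace(id="abc123", name="run_7")
meta = run_metadata(fake_run, git_sha="deadbeef", offset=0, limit=100)
print(json.dumps(meta, indent=2))
```

Joining on `run_id` rather than `run_name` is the safer choice here, since the ID is unique per run while a name pattern such as `run_<n>` can be reused across projects.
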