wandb-reports - SKILL.md Agent Skill

name: wandb-reports
description: Build W&B dashboards and reports that explain run quality and fix impact. Use when presenting accuracy trends, failure buckets, RCA summaries, and cross-run comparisons.

Publish readable run evidence for humans and automation.

Create a run dashboard with canonical panels:
- overall accuracy
- correct vs failed counts
- SQL and tool error rates
- retry and tool-call distributions
Include question text, prediction, and correctness in tabular views.
Add RCA summary panels or linked tables for failed questions.
Use stable run labels (run_1, run_2, run_3, ...).
Keep chart semantics consistent between runs to avoid misleading comparisons.
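
The panel list above can be backed by one summary function, so every run computes its metrics the same way. The following is a minimal sketch, not a definitive implementation. The record fields (`question`, `prediction`, `correct`, `sql_error`, `tool_error`, `retries`, `tool_calls`), the metric key names, and the project name `qa-eval` are all assumptions made for illustration. `log_run` uses only the public `wandb.init`, `run.log`, `wandb.Table`, and `run.finish` calls.

```python
from collections import Counter


def summarize_run(records):
    """Compute the canonical panel metrics from per-question records.

    Each record is a dict. The field names used here are hypothetical.
    """
    total = len(records)
    if total == 0:
        raise ValueError("records must not be empty")
    correct = sum(1 for r in records if r["correct"])
    sql_errors = sum(1 for r in records if r.get("sql_error", False))
    tool_errors = sum(1 for r in records if r.get("tool_error", False))
    return {
        "accuracy": correct / total,
        "correct_count": correct,
        "failed_count": total - correct,
        "sql_error_rate": sql_errors / total,
        "tool_error_rate": tool_errors / total,
        # Map "number of retries / tool calls" to "number of questions".
        "retry_distribution": dict(Counter(r.get("retries", 0) for r in records)),
        "tool_call_distribution": dict(Counter(r.get("tool_calls", 0) for r in records)),
    }


def log_run(records, run_label, project="qa-eval"):
    """Log scalar metrics and tables to W&B under a stable run label.

    The project name is a placeholder. wandb is imported lazily so that
    summarize_run stays usable without it.
    """
    import wandb

    metrics = summarize_run(records)
    run = wandb.init(project=project, name=run_label)

    # Scalar metrics use the same keys in every run, so charts stay comparable.
    run.log({k: v for k, v in metrics.items() if not isinstance(v, dict)})

    # Tabular view: question text, prediction, and correctness.
    question_table = wandb.Table(
        columns=["question", "prediction", "correct"],
        data=[[r["question"], r["prediction"], r["correct"]] for r in records],
    )
    # Distributions are logged as two-column tables.
    retry_table = wandb.Table(
        columns=["retries", "questions"],
        data=[[k, v] for k, v in sorted(metrics["retry_distribution"].items())],
    )
    run.log({"question_table": question_table, "retry_distribution": retry_table})

    run_id, run_url = run.id, run.url
    run.finish()
    return run_id, run_url
```

Keeping the metric computation separate from the logging call means the same keys and definitions are reused for every run, which is one way to keep chart semantics consistent.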

Return a report artifact descriptor:

{
  "run_id": "<wandb-run-id>",
  "run_label": "run_4",
  "dashboard_url": "<url>",
  "panels": ["accuracy", "failure_table", "rca_summary"]
}
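
A small helper can assemble the descriptor and enforce the `run_N` label convention before it is returned. This is a sketch under assumptions: the function names `build_report_descriptor` and `next_run_label` are hypothetical, and the validation rules (a positive integer suffix, at least one panel) are an interpretation of the guidance above rather than a fixed schema.

```python
import json
import re

# Stable labels look like run_1, run_2, ... (assumed: positive integer suffix).
RUN_LABEL_PATTERN = re.compile(r"^run_([1-9]\d*)$")


def next_run_label(existing_labels):
    """Return the next label in the run_N sequence, ignoring unrelated labels."""
    numbers = []
    for label in existing_labels:
        match = RUN_LABEL_PATTERN.match(label)
        if match:
            numbers.append(int(match.group(1)))
    return f"run_{max(numbers, default=0) + 1}"


def build_report_descriptor(run_id, run_label, dashboard_url, panels):
    """Build the report artifact descriptor as a plain dict."""
    if not RUN_LABEL_PATTERN.match(run_label):
        raise ValueError(f"run_label must look like run_<n>, got {run_label!r}")
    if not panels:
        raise ValueError("panels must list at least one panel name")
    return {
        "run_id": run_id,
        "run_label": run_label,
        "dashboard_url": dashboard_url,
        "panels": list(panels),
    }


if __name__ == "__main__":
    descriptor = build_report_descriptor(
        run_id="<wandb-run-id>",
        run_label=next_run_label(["run_1", "run_2", "run_3"]),
        dashboard_url="<url>",
        panels=["accuracy", "failure_table", "rca_summary"],
    )
    print(json.dumps(descriptor, indent=2))
```

Returning a plain dict keeps the descriptor easy to serialize with `json.dumps` for automation while remaining readable for humans.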