name: wandb-reports
description: Build W&B dashboards and reports that explain run quality and fix impact. Use when presenting accuracy trends, failure buckets, RCA summaries, and cross-run comparisons.
W&B Reports
Publish readable run evidence for humans and automation.
Execute
- Create a run dashboard with canonical panels:
  - overall accuracy
  - correct vs failed counts
  - SQL/tool error rates
  - retry/tool-call distributions
- Include question text, prediction, and correctness in tabular views.
- Add RCA summary panels or linked tables for failed questions.
- Use stable run labels (run_1, run_2, run_3, ...).
- Keep chart semantics consistent between runs to avoid misleading comparisons.
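The panel list above can be backed by a small summary step. The following is a minimal sketch, not the repo's actual script: the per-question record shape (`question`, `prediction`, `correct`, `sql_error`, `tool_error`, `retries`), the function names, and the `my-project` project name are all assumptions made for illustration. Only `log_to_wandb` touches W&B, and it imports `wandb` lazily so the summary logic can be checked offline.

```python
from collections import Counter

# Column order for the tabular view (question text, prediction, correctness).
TABLE_COLUMNS = ["question", "prediction", "correct"]


def summarize_run(records):
    """Compute the canonical panel metrics from per-question records.

    Each record is a dict with the hypothetical keys described above.
    Returns (metrics, retry_distribution).
    """
    total = len(records)
    correct = sum(1 for r in records if r["correct"])
    sql_errors = sum(1 for r in records if r.get("sql_error"))
    tool_errors = sum(1 for r in records if r.get("tool_error"))
    metrics = {
        "accuracy": correct / total if total else 0.0,
        "correct_count": correct,
        "failed_count": total - correct,
        "sql_error_rate": sql_errors / total if total else 0.0,
        "tool_error_rate": tool_errors / total if total else 0.0,
    }
    # Maps retry count -> number of questions with that many retries.
    retry_distribution = dict(Counter(r.get("retries", 0) for r in records))
    return metrics, retry_distribution


def table_rows(records):
    """Build rows for the question/prediction/correctness table."""
    return [[r["question"], r["prediction"], bool(r["correct"])] for r in records]


def log_to_wandb(records, run_label, project="my-project"):
    """Log summary metrics and the question table to a W&B run.

    Requires the wandb package and a configured login; the project
    name is a placeholder.
    """
    import wandb  # imported lazily so the helpers above work without it

    metrics, _ = summarize_run(records)
    run = wandb.init(project=project, name=run_label)
    run.log(metrics)
    run.log({"questions": wandb.Table(columns=TABLE_COLUMNS, data=table_rows(records))})
    run.finish()
```

Using the same metric keys for every run is one way to keep chart semantics consistent: panels built on `accuracy` or `sql_error_rate` then compare like with like across run_1, run_2, and so on.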
Fallback Order
- Use W&B report APIs and MCP report creation tools.
- If layout semantics are unclear, use official W&B reporting docs.
- If needed, inspect existing report/dashboard scripts in the repo.
Output Contract
Return a report artifact descriptor:
{
  "run_id": "<wandb-run-id>",
  "run_label": "run_4",
  "dashboard_url": "<url>",
  "panels": ["accuracy", "failure_table", "rca_summary"]
}
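A descriptor of this shape can be assembled and checked before it is returned. This is a hedged sketch: the helper names are hypothetical, and treating `run_<number>` as a strict pattern for `run_label` is an assumption inferred from the example labels, not a stated rule.

```python
import json
import re

# Keys taken from the descriptor example above.
REQUIRED_KEYS = ("run_id", "run_label", "dashboard_url", "panels")

# Assumption: stable run labels always look like run_1, run_2, ...
RUN_LABEL_PATTERN = re.compile(r"run_\d+")


def validate_descriptor(descriptor):
    """Raise ValueError if the descriptor does not match the contract."""
    missing = [key for key in REQUIRED_KEYS if key not in descriptor]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if not RUN_LABEL_PATTERN.fullmatch(descriptor["run_label"]):
        raise ValueError(f"unstable run label: {descriptor['run_label']!r}")
    panels = descriptor["panels"]
    if not isinstance(panels, list) or not all(isinstance(p, str) for p in panels):
        raise ValueError("panels must be a list of strings")


def build_descriptor(run_id, run_label, dashboard_url, panels):
    """Assemble and validate a report artifact descriptor."""
    descriptor = {
        "run_id": run_id,
        "run_label": run_label,
        "dashboard_url": dashboard_url,
        "panels": list(panels),
    }
    validate_descriptor(descriptor)
    return descriptor


if __name__ == "__main__":
    # Placeholder values only; substitute the real run ID and URL.
    example = build_descriptor(
        "<wandb-run-id>", "run_4", "<url>",
        ["accuracy", "failure_table", "rca_summary"],
    )
    print(json.dumps(example, indent=2))
```

Validating before returning means automation that consumes the descriptor can rely on all four keys being present.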