wandb-improve

Analyze Weave traces and eval results, then improve Mistral prompts for the Promus task agent

By stickerdaniel, updated 3/1/2026

name: wandb-improve
description: Analyze Weave traces and eval results, then improve Mistral prompts for the Promus task agent

Self-Improvement Workflow for Promus

You have access to the W&B MCP Server. Follow this loop:

Step 1: Run current evals

Run cd evals && uv run python scripts/run_eval.py to get baseline scores.
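If the baseline output should be kept for later comparison, the command above can be wrapped in a small Python helper. This is a minimal sketch: the `run_eval` helper and the `baseline.log` file name are assumptions for illustration and are not part of the repository.

```python
import subprocess
from pathlib import Path

# Command from the step above; it is run from inside the evals/ directory.
EVAL_COMMAND = ["uv", "run", "python", "scripts/run_eval.py"]


def run_eval(evals_dir="evals", log_path="baseline.log", command=None):
    """Run the eval command in evals_dir and save its combined output.

    log_path is a hypothetical file name chosen for this sketch.
    Returns the process exit code so the caller can detect failures.
    """
    result = subprocess.run(
        command or EVAL_COMMAND,
        cwd=evals_dir,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep stdout and stderr in one log
        text=True,
        check=False,
    )
    Path(log_path).write_text(result.stdout)
    return result.returncode
```

Calling `run_eval()` before editing prompts and `run_eval(log_path="improved.log")` afterwards leaves two logs that can be compared side by side.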

Step 2: Analyze results via W&B MCP

Use the W&B MCP tools to:

Query the latest evaluation results: "Show evaluation scores for promus/task-agent"
Query recent traces: "Show the last 20 task processing traces, sorted by latency"
Identify patterns: Which intents fail? Which tools are misrouted? What's slow?
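Once trace data has been retrieved, the pattern questions above can be answered with a short script. The sketch below assumes each trace has been exported as a dict with `expected_intent`, `predicted_intent`, and `latency_ms` keys. Those field names and the 2000 ms threshold are hypothetical; the actual Weave trace schema may differ.

```python
from collections import Counter


def summarize_traces(traces, slow_ms=2000.0):
    """Count misclassified intents and list slow traces, slowest first.

    The dict keys used here are assumed names, not the real trace schema.
    """
    misclassified = Counter(
        t["expected_intent"]
        for t in traces
        if t["predicted_intent"] != t["expected_intent"]
    )
    slow = sorted(
        (t for t in traces if t["latency_ms"] >= slow_ms),
        key=lambda t: t["latency_ms"],
        reverse=True,
    )
    return misclassified, slow


# Made-up sample data for illustration only.
sample = [
    {"expected_intent": "draft", "predicted_intent": "draft", "latency_ms": 850.0},
    {"expected_intent": "schedule", "predicted_intent": "draft", "latency_ms": 2400.0},
    {"expected_intent": "schedule", "predicted_intent": "search", "latency_ms": 3100.0},
]
failures, slow_traces = summarize_traces(sample)
```

The intents with the highest counts in `failures` are the first candidates for prompt changes in Step 4.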

Step 3: Diagnose issues

Read the current prompts in evals/scripts/run_eval.py:

INTENT_CLASSIFIER_PROMPT — intent classification system prompt
TASK_PLANNER_PROMPT — task planning system prompt

Compare prompt instructions against the failures found in traces.

Step 4: Improve

Edit the prompts to address identified issues. Common improvements:

Add few-shot examples for misclassified intents
Clarify tool selection rules
Add tone guidance for drafting
Lower the sampling temperature for more consistent outputs
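As an illustration of the first improvement, few-shot examples can be appended to a prompt string programmatically. This is a sketch under assumptions: the `add_few_shot_examples` helper, the "Input:/Intent:" layout, and the example intent label are invented here and are not taken from run_eval.py.

```python
def add_few_shot_examples(prompt, examples):
    """Append (message, intent) pairs to a system prompt as labeled examples.

    The "Input:/Intent:" layout is one possible format, chosen for this sketch.
    """
    if not examples:
        return prompt
    lines = [prompt.rstrip(), "", "Examples:"]
    for message, intent in examples:
        lines.append(f'Input: "{message}"')
        lines.append(f"Intent: {intent}")
        lines.append("")  # blank line between examples
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical prompt text and intent label, for illustration only.
improved = add_few_shot_examples(
    "Classify the intent.",
    [("Book a call with Sam", "schedule_meeting")],
)
```

Picking examples directly from the misclassified traces found in Step 2 keeps the added text targeted at observed failures.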

Step 5: Re-evaluate

Run cd evals && uv run python scripts/run_eval.py again to generate new scores.
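To judge whether the edits helped, the new scores can be compared against the baseline metric by metric. The sketch below assumes both runs have been collected into plain dicts of metric name to score. The metric names shown are hypothetical, not the ones the eval script reports.

```python
def score_deltas(before, after):
    """Return after minus before for every metric present in both runs."""
    shared = sorted(before.keys() & after.keys())
    return {name: round(after[name] - before[name], 4) for name in shared}


def regressions(deltas):
    """List metrics whose score dropped after the prompt change."""
    return [name for name, delta in deltas.items() if delta < 0]


# Made-up scores for illustration only.
baseline = {"intent_accuracy": 0.8, "tool_accuracy": 0.7}
improved_scores = {"intent_accuracy": 0.9, "tool_accuracy": 0.65}
deltas = score_deltas(baseline, improved_scores)
```

A non-empty result from `regressions(deltas)` suggests a prompt edit fixed one behavior at the cost of another, which is worth another pass through Steps 3 and 4.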

Step 6: Report

Use the W&B MCP create_wandb_report_tool to create a comparison report showing before/after metrics. Then document what changed and why in the commit message.
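The before/after metrics can be prepared as a small text table before the report is created. This sketch only builds the table text. How that text is passed to create_wandb_report_tool is left open, since the tool's parameters are not described here, and the metric name in the example is made up.

```python
def comparison_table(before, after):
    """Build a Markdown-style table of before/after scores with deltas."""
    rows = [
        "| Metric | Before | After | Delta |",
        "| --- | --- | --- | --- |",
    ]
    for name in sorted(before.keys() & after.keys()):
        delta = after[name] - before[name]
        rows.append(
            f"| {name} | {before[name]:.3f} | {after[name]:.3f} | {delta:+.3f} |"
        )
    return "\n".join(rows)


# Made-up scores for illustration only.
table = comparison_table({"intent_accuracy": 0.8}, {"intent_accuracy": 0.9})
```

The same table can be pasted into the commit message body next to a short note on which prompt was edited and why.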

Install via CLI

npx skills add https://github.com/stickerdaniel/promus --skill wandb-improve

Repository Details

Stars: 4
Forks: 0
Branch: main
Path: SKILL.md
