name: wandb-experiment-analyzer description: When the user asks to analyze a Weights & Biases (wandb) project or experiment results, particularly when they need to identify the best-performing run based on validation metrics and find the optimal training step within that run. This skill handles querying wandb projects via GraphQL API, extracting run summaries and history data, comparing validation scores across experiments, analyzing step-level performance within the best run, and exporting results to structured formats like CSV files. It's triggered by requests involving wandb URLs, experiment comparison, validation performance analysis, or finding optimal training checkpoints.
Instructions
Overview
This skill analyzes wandb projects to identify the best-performing experiment based on validation metrics and finds the optimal training step within that experiment. It outputs results to a CSV file.
Core Workflow
1. Parse User Request
- Extract the wandb project URL from the user's request (format:
https://wandb.ai/<entity>/<project>) - Identify the target validation metric (default:
val/test_score/or similar validation score metrics) - Determine output requirements (CSV file location, format)
2. Query Project Information
- Use the
wandb-query_wandb_toolwith GraphQL to fetch:- Project metadata (entity, project name, run count)
- List of all runs with their summary metrics
- Look for validation score metrics in summary data
3. Identify Best Experiment
- Parse summary metrics from all runs
- Extract validation scores (look for metrics like
val/test_score/,val_score,validation_score) - Compare scores across all runs to identify the best-performing experiment
- Note: Some runs may have null or missing validation scores
4. Fetch Detailed History for Best Run
- Query the history data for the best run using GraphQL
- Request sufficient samples to capture all validation score entries
- Handle large response data that may be truncated or saved to files
5. Analyze Step-Level Performance
- Parse the history data to extract validation scores and corresponding steps
- Filter out null/missing validation scores
- Sort by validation score in descending order
- Identify the step with the highest validation score
6. Generate Output
- Create a CSV file with columns:
best_experiment_name,best_step,best_val_score - Save to the workspace directory as specified by the user
- Provide a summary of findings to the user
Key Considerations
Validation Metric Identification
- The validation metric name may vary across projects
- Common patterns:
val/test_score/,val_score,validation/score,eval/score - Check summary metrics first to identify the correct metric name
Data Handling
- Large history responses may be truncated; use the
local-search_overlong_tooloutputtool to search within saved files - Some steps may have null validation scores (only recorded at evaluation intervals)
- Ensure proper JSON parsing of nested data structures
Error Handling
- Handle missing or inaccessible projects/runs gracefully
- Provide clear error messages if validation metrics cannot be found
- Handle authentication/permission issues with wandb API
Tools Required
wandb-query_wandb_tool: For GraphQL queries to wandb APIlocal-search_overlong_tooloutput: For searching within large saved tool outputsfilesystem-write_file: For creating CSV output filesfilesystem-read_file: For reading saved data filesterminal-run_command: For running analysis scripts (when needed)
Output Format
The skill produces a CSV file with the following structure: