name: run-lightdash-evals
description: Orchestrate evaluation runs and test case management for Lightdash agents.
Run Lightdash Evaluations
Skill for managing and executing evaluations for Lightdash AI agents.
Purpose
Enables the "Eval-Driven Development" workflow by providing tools to create evaluation suites, append test cases (prompts), execute evaluation runs, and analyze the results.
Tools
Wraps the following MCP tools from the lightdash-tools server:
- ldt__list_agent_evaluations
- ldt__get_agent_evaluation
- ldt__create_agent_evaluation
- ldt__update_agent_evaluation
- ldt__append_agent_evaluation_prompts
- ldt__run_agent_evaluation
- ldt__list_agent_evaluation_runs
- ldt__get_agent_evaluation_run_results
- ldt__delete_agent_evaluation
Safety Mode Compliance
- Read Tools: list_agent_evaluations, get_agent_evaluation, list_agent_evaluation_runs, get_agent_evaluation_run_results.
- Write-Safe Tools: create_agent_evaluation, update_agent_evaluation, append_agent_evaluation_prompts, run_agent_evaluation.
- Write-Destructive Tools: delete_agent_evaluation.
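
The three tiers above can be expressed as a lookup table that a caller checks before invoking a tool. The following is a minimal sketch, not the server's actual enforcement logic: the `Tier` enum, the `is_allowed` helper, and the assumption that a higher tier also permits every lower tier are all hypothetical. Only the tool names and their grouping come from the lists above.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical ordering of the safety tiers, least to most permissive."""
    READ = 0
    WRITE_SAFE = 1
    WRITE_DESTRUCTIVE = 2


# Tool names and grouping are taken from this skill; the mapping itself is illustrative.
TOOL_TIERS = {
    "ldt__list_agent_evaluations": Tier.READ,
    "ldt__get_agent_evaluation": Tier.READ,
    "ldt__list_agent_evaluation_runs": Tier.READ,
    "ldt__get_agent_evaluation_run_results": Tier.READ,
    "ldt__create_agent_evaluation": Tier.WRITE_SAFE,
    "ldt__update_agent_evaluation": Tier.WRITE_SAFE,
    "ldt__append_agent_evaluation_prompts": Tier.WRITE_SAFE,
    "ldt__run_agent_evaluation": Tier.WRITE_SAFE,
    "ldt__delete_agent_evaluation": Tier.WRITE_DESTRUCTIVE,
}


def is_allowed(tool: str, max_tier: Tier) -> bool:
    """Return True if `tool` may run when the session permits up to `max_tier`.

    Assumption: tiers are cumulative, so WRITE_SAFE also permits READ tools.
    Unknown tools are rejected rather than guessed at.
    """
    tier = TOOL_TIERS.get(tool)
    if tier is None:
        return False
    return tier <= max_tier
```

Rejecting unknown tool names by default keeps a typo from being treated as a read-only call.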
Behavior
- Test Case Management:
  - Use ldt__append_agent_evaluation_prompts to add 20-50 diverse test cases representing real-world user queries.
  - Organize evaluations by agent or project to maintain clarity.
- Execution:
  - Trigger a run using ldt__run_agent_evaluation.
  - Monitor the progress using ldt__list_agent_evaluation_runs.
- Analysis:
  - Once a run is complete, fetch the detailed results via ldt__get_agent_evaluation_run_results.
  - Identify patterns in failures (e.g., specific dimensions or metrics that the agent struggles with).
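
The append, run, monitor, and analyze steps above can be sketched as one function. This is an illustration under stated assumptions, not the lightdash-tools API: `call_tool` is a hypothetical callable that forwards a tool name and an arguments dict to the MCP server, and every argument and response field used here (`evaluationId`, `prompts`, `runId`, `status`, `passed`, `category`), along with the status values `"completed"` and `"failed"`, is a placeholder to be replaced with the real tool schemas.

```python
import time
from collections import Counter
from typing import Any, Callable

# Hypothetical transport: (tool_name, arguments) -> decoded tool response.
ToolCall = Callable[[str, dict[str, Any]], Any]


def run_and_summarize(
    call_tool: ToolCall,
    evaluation_id: str,
    prompts: list[str],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
) -> Counter:
    """Append prompts, trigger a run, wait for it, and count failures by category."""
    # 1. Test case management: add the new prompts to the suite.
    call_tool(
        "ldt__append_agent_evaluation_prompts",
        {"evaluationId": evaluation_id, "prompts": prompts},
    )

    # 2. Execution: start a run and remember its (assumed) identifier.
    run = call_tool("ldt__run_agent_evaluation", {"evaluationId": evaluation_id})
    run_id = run["runId"]

    # 3. Monitoring: poll the run list until this run reports completion.
    for _ in range(max_polls):
        runs = call_tool(
            "ldt__list_agent_evaluation_runs", {"evaluationId": evaluation_id}
        )
        status = next((r["status"] for r in runs if r["runId"] == run_id), None)
        if status == "completed":
            break
        if status == "failed":
            raise RuntimeError(f"evaluation run {run_id} failed")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"evaluation run {run_id} did not complete in time")

    # 4. Analysis: fetch results and group the failing cases.
    results = call_tool(
        "ldt__get_agent_evaluation_run_results",
        {"evaluationId": evaluation_id, "runId": run_id},
    )
    return Counter(
        r.get("category", "uncategorized") for r in results if not r["passed"]
    )
```

Passing the transport in as `call_tool` keeps the sketch independent of any particular MCP client, so the same function can be exercised with a stub before it touches a live server.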
Rules
- ALWAYS create or update an evaluation suite before deploying major changes to an agent's prompt.
- NEVER delete an evaluation suite without explicit confirmation.
- Use the agent-tuner sub-agent to automatically process evaluation results for improvement.
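
The rule against deleting a suite without explicit confirmation can be enforced with a small guard in front of the destructive tool. This is a sketch only: `call_tool` is a hypothetical callable that forwards a tool name and arguments to the MCP server, and the `evaluationId` argument name and the `confirmed` flag are assumptions, not part of the documented tool schema.

```python
from typing import Any, Callable

# Hypothetical transport: (tool_name, arguments) -> decoded tool response.
ToolCall = Callable[[str, dict[str, Any]], Any]


def delete_evaluation(
    call_tool: ToolCall, evaluation_id: str, confirmed: bool = False
) -> Any:
    """Delete an evaluation suite only after the user has explicitly confirmed."""
    if not confirmed:
        # Refuse before any tool call is made, so nothing is deleted by default.
        raise PermissionError(
            f"refusing to delete evaluation {evaluation_id} without explicit confirmation"
        )
    return call_tool("ldt__delete_agent_evaluation", {"evaluationId": evaluation_id})
```

Defaulting `confirmed` to `False` means a caller has to opt in to deletion on every call.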