evaluating-machine-learning-models - SKILL.md Agent Skill

name: evaluating-machine-learning-models description: | Evaluate trained machine learning models with the right metrics and comparison logic. Use for benchmark review, threshold selection, calibration, validation, and model comparison; not for feature engineering or leakage auditing. allowed-tools: Read, Write, Edit, Grep, Glob, Bash(cmd:*) version: 1.0.0 author: Jeremy Longshore jeremy@intentsolutions.io license: MIT

Use this skill when the model exists and the question is whether it is good enough.

This skill focuses on choosing and interpreting the right evaluation metrics for the problem, then comparing candidate models or thresholds.

Comparing candidate models with consistent metrics
Reviewing precision/recall/F1/AUC, regression error, calibration, or ranking quality
Stress-testing validation strategy before deployment or publication

Building the training pipeline itself: use scikit-learn for classical modeling or ml-pipeline-workflow for end-to-end workflow ownership
Engineering features: use preprocessing-data-with-automated-pipelines
Checking train/test contamination: use ml-data-leakage-guard