data-modeling-baselines - SKILL.md Agent Skill

name: data-modeling-baselines
description: Baseline machine learning workflow for tabular data. Use when the user asks to build first-pass classification or regression models, compare simple algorithms, evaluate metrics, and produce an experiment summary before advanced tuning.

Data Modeling Baselines

Overview

Use this skill to move from cleaned tabular data to a defensible baseline result with clear metrics and reproducible assumptions.

Workflow

Define the target variable and task type (classification or regression).
Check for target leakage and remove forbidden columns.
Split the data into train and test sets with a fixed random seed.
Train baseline models (dummy, linear, tree-based).
Compare metrics against the dummy baseline and record the winner.
Write the experiment summary and next steps.
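
The leakage, split, and dummy-baseline steps above can be sketched without any ML library. This is a minimal illustration, not the skill's actual script: the helper names, the toy rows, and the column names (tenure, refund_issued, churned) are all hypothetical, and refund_issued is assumed to be a field recorded only after the outcome is known.

```python
import random
from collections import Counter


def drop_forbidden(rows, forbidden):
    """Remove columns that must not be used as features (leakage guard)."""
    return [{k: v for k, v in row.items() if k not in forbidden} for row in rows]


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy with a fixed seed so the split is reproducible."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def majority_class(train_rows, target):
    """Dummy baseline: always predict the most frequent training label."""
    return Counter(row[target] for row in train_rows).most_common(1)[0][0]


# Hypothetical toy data: "churned" is the target, "refund_issued" is
# assumed to leak the outcome and is therefore forbidden as a feature.
rows = [
    {"tenure": i, "refund_issued": i % 4 == 0, "churned": int(i % 4 == 0)}
    for i in range(20)
]

clean = drop_forbidden(rows, forbidden={"refund_issued"})
train, test = split_train_test(clean, test_fraction=0.25, seed=42)
baseline_label = majority_class(train, target="churned")

# Accuracy of the dummy predictor on the held-out rows.
dummy_accuracy = sum(r["churned"] == baseline_label for r in test) / len(test)
print(len(train), len(test), baseline_label, dummy_accuracy)
```

Calling split_train_test again with the same seed returns the same partition, which is what makes later model comparisons reproducible. The linear and tree-based baselines would be trained on the same train rows and scored on the same test rows.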

Mandatory Inputs

Before training, confirm:

Target column name
Business metric (for example, recall vs precision tradeoff)
Unit cost of false positives and false negatives (classification)
Unit cost of prediction error (regression)
Any fairness or explainability requirement

If any of these are unknown, proceed anyway, but state each assumption explicitly in the experiment summary.
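
Unit costs turn model errors into a business figure. The sketch below shows one way to do that, under assumptions that are not part of this skill: the function names are hypothetical, the positive class is labeled 1, and regression cost is assumed to grow linearly with absolute error.

```python
def classification_cost(y_true, y_pred, cost_fp, cost_fn, positive=1):
    """Total cost = false positives * cost_fp + false negatives * cost_fn."""
    pairs = list(zip(y_true, y_pred))
    false_pos = sum(1 for t, p in pairs if p == positive and t != positive)
    false_neg = sum(1 for t, p in pairs if p != positive and t == positive)
    return false_pos * cost_fp + false_neg * cost_fn


def regression_cost(y_true, y_pred, cost_per_unit):
    """Assumes cost is linear in absolute prediction error."""
    return cost_per_unit * sum(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical values: a false negative is ten times as costly as a false positive.
clf_cost = classification_cost(
    y_true=[1, 0, 1, 0, 0, 1],
    y_pred=[1, 1, 0, 0, 0, 1],
    cost_fp=5,
    cost_fn=50,
)
reg_cost = regression_cost(
    y_true=[10.0, 12.0, 9.0],
    y_pred=[11.0, 12.0, 7.0],
    cost_per_unit=2.0,
)
print(clf_cost, reg_cost)  # 55 6.0
```

When the real unit costs are unknown, placeholder values like these should be recorded as assumptions rather than presented as measured costs.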

Quick Command

python .agent/skills/data-modeling-baselines/scripts/train_baseline.py data/raw/raw-dataset.xlsx --target target_column --output data/baseline-result.json
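
The contents of train_baseline.py are not shown here. As a hedged illustration only, a command-line interface consistent with the invocation above could be declared with argparse as follows; the function name build_parser, the destination name input_path, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser matching: <input> --target <column> --output <path>."""
    parser = argparse.ArgumentParser(
        description="Train baseline models on a tabular dataset."
    )
    parser.add_argument("input_path", help="Path to the input dataset")
    parser.add_argument("--target", required=True, help="Name of the target column")
    parser.add_argument("--output", required=True, help="Where to write the JSON result")
    return parser


# Parse the same arguments as the quick command, passed as a list for illustration.
args = build_parser().parse_args(
    [
        "data/raw/raw-dataset.xlsx",
        "--target", "target_column",
        "--output", "data/baseline-result.json",
    ]
)
print(args.input_path, args.target, args.output)
```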

Evaluation Rules

Classification default metric: f1_weighted (higher is better).
Regression default metric: rmse (lower is better).
Always compare against a dummy baseline.
Report both a business-facing and a technical interpretation of the results.
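
To make the two default metrics concrete, the sketch below computes them in plain Python and checks a model against a dummy predictor. It assumes f1_weighted means per-class F1 averaged by each class's share of the true labels, which is the usual definition; the function names and toy labels are hypothetical.

```python
import math
from collections import Counter


def f1_weighted(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    pairs = list(zip(y_true, y_pred))
    total = 0.0
    for label, support in Counter(y_true).items():
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = support - tp
        denom = 2 * tp + fp + fn
        total += support * (2 * tp / denom if denom else 0.0)
    return total / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def beats_dummy(model_score, dummy_score, lower_is_better=False):
    """True when the model is strictly better than the dummy baseline."""
    return model_score < dummy_score if lower_is_better else model_score > dummy_score


# Classification: the dummy always predicts the majority class (0).
y_true = [0, 0, 0, 1, 1, 2]
model_f1 = f1_weighted(y_true, [0, 0, 1, 1, 1, 2])
dummy_f1 = f1_weighted(y_true, [0] * len(y_true))

# Regression: the dummy predicts a constant mean. Here it is the mean of the
# toy targets for brevity; in practice it would be the training-set mean.
y_reg = [3.0, 5.0, 7.0]
model_rmse = rmse(y_reg, [2.0, 5.0, 9.0])
dummy_rmse = rmse(y_reg, [5.0] * len(y_reg))

print(round(model_f1, 3), round(dummy_f1, 3), beats_dummy(model_f1, dummy_f1))
print(round(model_rmse, 3), round(dummy_rmse, 3), beats_dummy(model_rmse, dummy_rmse, lower_is_better=True))
```

A model that fails beats_dummy on its default metric has not learned anything useful from the features, and the summary should say so plainly.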

References

Use references/modeling-checklist.md before training.
Use references/experiment-report-template.md to structure output.