---
name: data-modeling-baselines
description: Baseline machine learning workflow for tabular data. Use when the user asks to build first-pass classification or regression models, compare simple algorithms, evaluate metrics, and produce an experiment summary before advanced tuning.
---
Data Modeling Baselines
Overview
Use this skill to move from cleaned tabular data to a defensible baseline result with clear metrics and reproducible assumptions.
Workflow
- Define target variable and task type (classification or regression).
- Validate leakage risk and remove forbidden columns.
- Split train/test with fixed random seed.
- Train baseline models (dummy, linear, tree-based).
- Compare metrics and record the winner.
- Write experiment summary and next steps.
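The workflow steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the logic of `train_baseline.py`. The helper names (`drop_columns`, `seeded_split`, `dummy_classifier`, `dummy_regressor`, `pick_winner`) are hypothetical, and the linear and tree-based models are omitted. In practice, scikit-learn's `DummyClassifier`, `DummyRegressor`, and `train_test_split(..., random_state=...)` cover the same ground.

```python
import random
from collections import Counter
from statistics import mean


def drop_columns(records, forbidden):
    """Remove forbidden (for example, leakage-prone) keys from each record dict."""
    return [{k: v for k, v in row.items() if k not in forbidden} for row in records]


def seeded_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed, then split into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # local RNG: same seed gives the same split
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def dummy_classifier(train_labels):
    """Baseline that always predicts the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: majority


def dummy_regressor(train_targets):
    """Baseline that always predicts the training-set mean."""
    center = mean(train_targets)
    return lambda _features: center


def pick_winner(scores, higher_is_better=True):
    """Return the model name with the best score (max F1 or min RMSE)."""
    choose = max if higher_is_better else min
    return choose(scores, key=scores.get)
```

Using a local `random.Random(seed)` instead of the global generator keeps the split reproducible even if other code consumes random numbers in between.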
Mandatory Inputs
Before training, confirm:
- Target column name
- Business metric (for example, recall vs precision tradeoff)
- Unit cost of false positives and false negatives (classification)
- Unit cost of prediction error (regression)
- Any fairness or explainability requirement
If any of these inputs are unknown, proceed with explicitly stated assumptions.
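When false-positive and false-negative costs are known or assumed, they can be turned into a total cost that ranks thresholds or models more directly than a generic metric. The sketch below assumes boolean labels, scores in [0, 1], and costs that are linear in the error counts (or, for regression, in absolute error). `CostAssumptions`, its placeholder values, and the helper functions are hypothetical stand-ins for the confirmed inputs.

```python
from dataclasses import dataclass


@dataclass
class CostAssumptions:
    """Hypothetical record of the mandatory inputs; values are placeholders."""
    target: str = "target_column"
    cost_fp: float = 1.0              # assumed cost of one false positive
    cost_fn: float = 5.0              # assumed cost of one false negative
    cost_per_unit_error: float = 1.0  # assumed regression cost per unit of |error|


def classification_cost(y_true, y_pred, a):
    """Total cost = FP * cost_fp + FN * cost_fn for boolean labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if p and not t)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return fp * a.cost_fp + fn * a.cost_fn


def regression_cost(y_true, y_pred, a):
    """Total cost assuming every unit of absolute error costs the same amount."""
    return a.cost_per_unit_error * sum(abs(t - p) for t, p in zip(y_true, y_pred))


def best_threshold(y_true, scores, a, candidates):
    """Pick the decision threshold with the lowest total classification cost."""
    costs = {
        th: classification_cost(y_true, [s >= th for s in scores], a)
        for th in candidates
    }
    return min(costs, key=costs.get), costs
```

With these placeholder costs a false negative is five times as expensive as a false positive, so lower thresholds tend to win; changing `cost_fn` or `cost_fp` can change the choice.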
Quick Command
python .agent/skills/data-modeling-baselines/scripts/train_baseline.py data/raw/raw-dataset.xlsx --target target_column --output data/baseline-result.json
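To call the same command from a Python driver (for example, to loop over several targets), the argument list can be built explicitly. This is a sketch: `baseline_command` and `run_baseline` are hypothetical helpers, only the positional dataset path and the `--target` and `--output` flags shown above are used, and it assumes the working directory is the repository root. The structure of the output JSON is not documented here, so it is not parsed.

```python
import subprocess
import sys

# Script path as given in the Quick Command; assumes execution from the repository root.
SCRIPT = ".agent/skills/data-modeling-baselines/scripts/train_baseline.py"


def baseline_command(dataset, target, output):
    """Build the argv list for the documented Quick Command."""
    return [sys.executable, SCRIPT, str(dataset), "--target", target, "--output", str(output)]


def run_baseline(dataset, target, output):
    """Run the script; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(baseline_command(dataset, target, output), check=True)
```

Passing a list rather than a shell string avoids quoting problems when paths or column names contain spaces.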
Evaluation Rules
- Classification default metric: `f1_weighted`.
- Regression default metric: `rmse` (lower is better).
- Always compare against the dummy baseline.
- Report both a business-facing and a technical interpretation of the results.
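The two default metrics and the dummy comparison can be written out in plain Python to make their definitions concrete. This is an illustrative sketch, not the implementation used by `train_baseline.py`. `f1_weighted` averages per-class F1 weighted by each class's support in `y_true` (scikit-learn's `f1_score(..., average="weighted")` should give the same value in typical cases), `rmse` is the square root of the mean squared error, and `beats_dummy` is a hypothetical helper.

```python
from collections import Counter
from math import sqrt


def f1_weighted(y_true, y_pred):
    """Support-weighted mean of per-class F1; undefined precision counts as 0."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / count
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * count
    return total / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error (lower is better)."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def beats_dummy(model_score, dummy_score, higher_is_better):
    """True if the model strictly improves on the dummy baseline."""
    return model_score > dummy_score if higher_is_better else model_score < dummy_score
```

For example, on labels `["a", "a", "a", "b"]` a majority-class dummy scores 9/14 (about 0.643) on `f1_weighted`, so a real model has to exceed that to count as an improvement.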
References
- Use `references/modeling-checklist.md` before training.
- Use `references/experiment-report-template.md` to structure the output.