data-analyst - SKILL.md Agent Skill

name: data-analyst
description: Orchestrate data exploration and profiling. Profiles dataset, suggests schema and metrics, generates EDA report, then hands off to ml_engineer. Entry point for the data/ML pipeline.

Data Analyst

Explore the dataset and produce a clear profile and EDA, then hand off to the ML Engineer.

Role

You are the Data Analyst. Your job is to:

Profile: Run data profiling (stats, distributions, missing values, types)
Schema: Suggest a schema and key metrics from the profile
EDA: Generate an EDA summary and visualisation notes
Hand off: Pass the deliverables to /ml_engineer

Usage

/data_analyst {path-to-dataset}
/data_analyst data/training.csv
/data_analyst {path} --target revenue
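
The invocation forms above take a dataset path and an optional `--target` column. As a rough illustration only, the sketch below splits such a line with Python's `shlex` and `argparse`. The helper name `parse_invocation` is hypothetical, and the skill runtime may handle arguments quite differently.

```python
import argparse
import shlex


def parse_invocation(line: str) -> dict:
    """Split a '/data_analyst <path> [--target <column>]' line into its parts.

    Hypothetical helper for illustration; not part of the skill itself.
    """
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "/data_analyst":
        raise ValueError("expected a line starting with /data_analyst")

    parser = argparse.ArgumentParser(prog="/data_analyst")
    parser.add_argument("path", help="path to the dataset")
    parser.add_argument("--target", default=None, help="optional target variable")
    args = parser.parse_args(tokens[1:])
    return {"path": args.path, "target": args.target}


print(parse_invocation("/data_analyst data/training.csv --target revenue"))
# {'path': 'data/training.csv', 'target': 'revenue'}
```

When `--target` is omitted, the sketch returns `None` for the target, which matches the "optional target variable" wording in Phase 2.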

Workflow

Phase 1: Profile

Run /data_profiler on the dataset to get:

Row/column counts, types
Missing values, unique counts
Basic stats (min, max, mean, std for numerics)
Sample values and distributions where useful

Write to output/{project-slug}/data/profile.json (or another structured format).

Checkpoint: "Profile complete. N rows, M columns. Proceeding to schema suggestion..."
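
To make the Phase 1 deliverable concrete, here is a minimal sketch of what a profile might contain, using only the Python standard library on a CSV file. It is not the implementation of `/data_profiler`. The function names (`profile_csv`, `write_profile`, `checkpoint_message`) and the JSON layout are assumptions made for illustration.

```python
import csv
import json
import statistics
from pathlib import Path


def _to_float(value: str):
    """Return the value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def profile_csv(csv_path: str) -> dict:
    """Build a small profile: counts, types, missing/unique values, basic stats."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = list(reader.fieldnames or [])

    profile = {"rows": len(rows), "columns": len(columns), "fields": {}}
    for name in columns:
        raw = [row[name] for row in rows]
        # Empty strings and short rows (filled with None) count as missing.
        present = [v for v in raw if v not in ("", None)]
        numbers = [_to_float(v) for v in present]
        is_numeric = bool(present) and all(n is not None for n in numbers)

        field = {
            "type": "numeric" if is_numeric else "string",
            "missing": len(raw) - len(present),
            "unique": len(set(present)),
            "sample": present[:3],
        }
        if is_numeric:
            field.update(
                min=min(numbers),
                max=max(numbers),
                mean=statistics.fmean(numbers),
                # Sample standard deviation needs at least two values.
                std=statistics.stdev(numbers) if len(numbers) > 1 else 0.0,
            )
        profile["fields"][name] = field
    return profile


def write_profile(profile: dict, project_slug: str, root: str = "output") -> Path:
    """Write the profile to <root>/<project_slug>/data/profile.json."""
    out_path = Path(root) / project_slug / "data" / "profile.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(profile, indent=2))
    return out_path


def checkpoint_message(profile: dict) -> str:
    """Format the Phase 1 checkpoint line from the profile counts."""
    return (
        f"Profile complete. {profile['rows']} rows, {profile['columns']} columns. "
        "Proceeding to schema suggestion..."
    )
```

A typical call would be `write_profile(profile_csv("data/training.csv"), "my-project")`, where `my-project` is a placeholder slug. The type detection here is deliberately crude (a column is numeric only if every non-missing value parses as a float), so dates and categorical codes would need extra handling.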

Phase 2: Schema and Metrics

Run /schema_suggester with the profile and optional target variable to get:

Suggested schema (types, key columns)
Recommended metrics and KPIs for the goal
Data quality notes

Write to output/{project-slug}/data/schema-suggestion.md.

Checkpoint: "Schema and metrics suggested. Proceeding to EDA report..."
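
The kind of reasoning Phase 2 performs can be sketched as a few heuristics over the profile. The example below is an assumption-laden illustration, not the behaviour of `/schema_suggester`: it assumes a profile dictionary shaped like `{"rows": int, "fields": {name: {"type", "missing", "unique"}}}`, and the function name `suggest_schema` and the 20% missing-value threshold are invented for the example. It does not attempt to recommend metrics or KPIs, which depend on the stated goal.

```python
from typing import Optional


def suggest_schema(profile: dict, target: Optional[str] = None,
                   missing_threshold: float = 0.2) -> dict:
    """Derive column types, candidate keys, and data quality notes from a profile.

    Assumes the hypothetical layout:
    {"rows": int, "fields": {name: {"type", "missing", "unique"}}}.
    """
    rows = profile["rows"]
    schema = {"columns": {}, "key_columns": [], "target": target, "quality_notes": []}

    if target is not None and target not in profile["fields"]:
        schema["quality_notes"].append(
            f"Target '{target}' is not a column in the dataset."
        )

    for name, field in profile["fields"].items():
        schema["columns"][name] = field["type"]

        # Heuristic: no gaps and all-distinct values suggests a candidate key.
        # Continuous measurements can also satisfy this, so review the result.
        if rows > 0 and field["missing"] == 0 and field["unique"] == rows:
            schema["key_columns"].append(name)

        missing_share = field["missing"] / rows if rows else 0.0
        if missing_share > missing_threshold:
            schema["quality_notes"].append(
                f"{name}: {missing_share:.0%} missing values."
            )
        if field["unique"] <= 1:
            schema["quality_notes"].append(f"{name}: constant or empty column.")
    return schema


example_profile = {
    "rows": 4,
    "fields": {
        "id": {"type": "numeric", "missing": 0, "unique": 4},
        "region": {"type": "string", "missing": 2, "unique": 2},
    },
}
print(suggest_schema(example_profile)["key_columns"])  # ['id']
```

The resulting dictionary could then be rendered into schema-suggestion.md; how that rendering looks is left open here.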

Phase 3: EDA Report

Run /eda_reporter with profile and schema to produce:

Executive summary of the data
Notable patterns, outliers, correlations
Visualisation suggestions (what to plot and why)

Write to output/{project-slug}/data/eda-report.md.

Checkpoint: "EDA complete. Confirm goal (e.g. predict X, segment Y) and hand off to ML Engineer?"
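
"Notable patterns, outliers, correlations" can be made concrete with two standard checks: Tukey's IQR rule for outliers and the Pearson coefficient for linear correlation. The sketch below is one possible approach, not what `/eda_reporter` is known to do. The function names, the `k = 1.5` fence, and the `0.8` correlation cutoff are all assumptions, and the columns are assumed to be equal-length lists of numbers with no missing values.

```python
import math
import statistics


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation of two equal-length numeric lists (0.0 if undefined)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # A constant column has no defined correlation.
    return cov / math.sqrt(var_x * var_y)


def iqr_outliers(values: list, k: float = 1.5) -> list:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def eda_notes(columns: dict, strong: float = 0.8) -> list:
    """Return plain-text notes on outliers and strongly correlated column pairs."""
    notes = []
    names = list(columns)
    for name in names:
        found = iqr_outliers(columns[name])
        if found:
            notes.append(
                f"{name}: {len(found)} outlier(s) by the IQR rule; consider a box plot."
            )
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= strong:
                notes.append(f"{a} vs {b}: r = {r:.2f}; consider a scatter plot.")
    return notes


for note in eda_notes({"spend": [1, 2, 3, 4, 5, 100], "revenue": [2, 4, 6, 8, 10, 200]}):
    print(note)
```

Each note pairs a finding with a plot suggestion, mirroring the "what to plot and why" deliverable. A strong correlation driven by a single extreme point, as in this toy data, is exactly the kind of pattern worth flagging for a closer look.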

Phase 4: Handoff to ML Engineer

On confirmation of the ML goal, invoke /ml_engineer with:

Project slug
Paths to profile, schema-suggestion, eda-report
Stated goal (e.g. classification, regression, clustering)

"Data exploration complete. Handing off to ML Engineer.

ML Engineer will produce:
• Feature spec
• Training script
• Experiment config

Invoking: /ml_engineer output/{project-slug}/data"
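
Before handing off, it can help to confirm that all three deliverables exist. The sketch below gathers the Phase 4 inputs into one dictionary and fails early if a file is missing. The helper name `build_handoff` and the payload keys are hypothetical; only the file names and the `/ml_engineer output/{project-slug}/data` invocation come from this document.

```python
from pathlib import Path

# File names taken from the Output Structure of this skill.
DELIVERABLES = ("profile.json", "schema-suggestion.md", "eda-report.md")


def build_handoff(project_slug: str, goal: str, root: str = "output") -> dict:
    """Collect the handoff inputs, raising if any deliverable is missing."""
    data_dir = Path(root) / project_slug / "data"
    missing = [name for name in DELIVERABLES if not (data_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(
            f"Missing deliverables in {data_dir}: {', '.join(missing)}"
        )
    return {
        "project_slug": project_slug,
        "goal": goal,  # e.g. classification, regression, clustering
        "paths": {name: str(data_dir / name) for name in DELIVERABLES},
        "invoke": f"/ml_engineer {data_dir.as_posix()}",
    }
```

Calling `build_handoff("my-project", "regression")` (with `my-project` as a placeholder slug) returns the paths and the invocation string only once all three files are in place, so an incomplete exploration is caught before the ML Engineer starts.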

Output Structure

output/{project-slug}/data/
├── profile.json
├── schema-suggestion.md
└── eda-report.md

Pipeline Position

┌───────────────┐   ┌──────────────┐
│ data_analyst  │ → │ ml_engineer  │ → ...
│ (YOU ARE HERE)│   │ (train)      │
└───────────────┘   └──────────────┘

Sub-Skills

| Skill | Purpose |
|-------|---------|
| `/data_profiler` | Dataset stats, distributions, types |
| `/schema_suggester` | Schema and key metrics from profile |
| `/eda_reporter` | EDA summary and viz notes |

Handoff

| Next | Skill | What you pass |
|------|-------|---------------|
| ML design | `/ml_engineer` | Project slug, data folder path, ML goal |