data-profiler

name: data-profiler
description: Profile a dataset and output stats, distributions, types, and missingness. Use when data_analyst needs a structured profile for schema suggestion and EDA.

Data Profiler

Produce a structured profile of a dataset.

Role

You analyze the dataset and output a machine- and human-readable profile.

Input

Path to the dataset (CSV, Parquet, or a similar format), or to a directory of such files
Optional: target column name; maximum number of rows to sample
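A minimal sketch of reading the input, assuming CSV files only (Parquet would need a third-party library such as pyarrow or pandas, which this sketch avoids). `load_rows` and its `max_rows` parameter are hypothetical names; `max_rows` mirrors the optional row limit and here keeps the first N rows, one of the sampling strategies mentioned under Rules.

```python
import csv
from pathlib import Path


def load_rows(path, max_rows=None):
    """Read a CSV file, or every *.csv file in a directory, as a list of dicts.

    Hypothetical helper. Files are only opened for reading. If max_rows is
    given, reading stops once that many rows have been collected in total
    (first-N sampling).
    """
    p = Path(path)
    files = sorted(p.glob("*.csv")) if p.is_dir() else [p]
    rows = []
    for f in files:
        with f.open(newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                if max_rows is not None and len(rows) >= max_rows:
                    return rows
                rows.append(row)
    return rows
```

For a directory, files are read in name order and are assumed to share one header row.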

Output

Write the profile to the path provided by the caller (e.g. output/{project-slug}/data/profile.json) with the following contents:

Overview: row count, column count, and file size (or sample size, if the data was sampled)
Per column:
- Name, inferred type (numeric, categorical, datetime, text)
- Missing count and percentage
- Unique count
- For numerics: min, max, mean, std, quartiles
- For categoricals: top values and counts
- Sample values (e.g. first 5 distinct)
Quality flags: e.g. high missingness, zero variance, likely ID column
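The per-column fields and quality flags listed above can be computed with the standard library alone. This is a minimal sketch, not a reference implementation: it assumes every cell arrives as a string (as from `csv.DictReader`), and the missing-value tokens, the 20-value cutoff between categorical and text, the 50% high-missing threshold, and the ID heuristic are all illustrative choices that this spec does not prescribe.

```python
import statistics
from collections import Counter
from datetime import datetime

# Assumptions (not from the spec): tokens treated as missing, and the
# thresholds used for type inference and quality flags.
MISSING_TOKENS = {"", "NA", "N/A", "NaN", "nan", "null", "None"}
MAX_CATEGORIES = 20
HIGH_MISSING_PCT = 50.0


def _is_number(v):
    try:
        float(v)
        return True
    except ValueError:
        return False


def _is_datetime(v):
    try:
        datetime.fromisoformat(v)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Heuristic order: numeric, then ISO datetime, then categorical (few distinct), else text."""
    if values and all(_is_number(v) for v in values):
        return "numeric"
    if values and all(_is_datetime(v) for v in values):
        return "datetime"
    if values and len(set(values)) <= MAX_CATEGORIES:
        return "categorical"
    return "text"


def profile_column(name, raw, top_n=5):
    """Profile one column given its raw string cells (None counts as missing)."""
    present = [v.strip() for v in raw if v is not None and v.strip() not in MISSING_TOKENS]
    n = len(raw)
    missing = n - len(present)
    kind = infer_type(present)
    col = {
        "name": name,
        "type": kind,
        "missing_count": missing,
        "missing_pct": round(100 * missing / n, 2) if n else 0.0,
        "unique_count": len(set(present)),
        "sample_values": list(dict.fromkeys(present))[:5],  # first 5 distinct
    }
    if kind == "numeric":
        nums = [float(v) for v in present]
        col.update(min=min(nums), max=max(nums), mean=statistics.fmean(nums),
                   std=statistics.stdev(nums) if len(nums) > 1 else 0.0)
        if len(nums) > 1:
            # "inclusive" = linear interpolation, the same default pandas uses
            col["quartiles"] = statistics.quantiles(nums, n=4, method="inclusive")
    elif kind == "categorical":
        col["top_values"] = Counter(present).most_common(top_n)
    flags = []
    if col["missing_pct"] >= HIGH_MISSING_PCT:
        flags.append("high_missing")
    if col["unique_count"] <= 1:
        flags.append("zero_variance")
    # Every row distinct, and either string-like or integer-valued: probably an ID.
    id_like = kind in ("categorical", "text") or (
        kind == "numeric" and all(float(v).is_integer() for v in present))
    if n and col["unique_count"] == n and id_like:
        flags.append("likely_id")
    col["flags"] = flags
    return col


def profile_dataset(rows):
    """Overview plus per-column profiles for a list of row dicts."""
    columns = list(rows[0]) if rows else []
    return {
        "overview": {"row_count": len(rows), "column_count": len(columns)},
        "columns": [profile_column(c, [r.get(c) for r in rows]) for c in columns],
    }
```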

Use JSON or structured Markdown, as the caller specifies. If the tool cannot read the file directly, output a template instead and instruct the caller to run a profiler (e.g. pandas-profiling, now published as ydata-profiling, or great_expectations) and attach the result.
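One way to honor the caller's format choice and the template fallback, sketched with the standard library. The function names, the Markdown table layout, and the template's keys are hypothetical; the profile dict is assumed to have an `overview` and a `columns` list shaped like the fields described under Output.

```python
import json
from pathlib import Path


def render_markdown(profile):
    """Render a profile dict (assumed shape: overview + columns) as a Markdown summary."""
    ov = profile["overview"]
    lines = [
        "# Data profile",
        "",
        f"Rows: {ov['row_count']}  Columns: {ov['column_count']}",
        "",
        "| Column | Type | Missing % | Unique | Flags |",
        "|---|---|---|---|---|",
    ]
    for c in profile["columns"]:
        flags = ", ".join(c.get("flags", [])) or "-"
        lines.append(
            f"| {c['name']} | {c['type']} | {c['missing_pct']} | {c['unique_count']} | {flags} |"
        )
    return "\n".join(lines) + "\n"


def write_profile(profile, out_path, fmt="json"):
    """Write the profile to the caller-supplied path; fmt is 'json' or 'markdown'."""
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    if fmt == "json":
        out.write_text(json.dumps(profile, indent=2, default=str), encoding="utf-8")
    elif fmt == "markdown":
        out.write_text(render_markdown(profile), encoding="utf-8")
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    return out


def profile_template(columns=()):
    """Profile skeleton with empty stats, for when the data cannot be read directly."""
    return {
        "overview": {"row_count": None, "column_count": len(columns) or None},
        "columns": [
            {"name": c, "type": None, "missing_pct": None, "unique_count": None, "flags": []}
            for c in columns
        ],
        "note": "Template only: run a profiler (e.g. ydata-profiling) and attach its output.",
    }
```

Raising on an unknown format, rather than silently defaulting to JSON, keeps a mismatch with the caller's request visible.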

Rules

Do not modify the original data.
If the dataset is large, document the sampling strategy used (e.g. first N rows, random sample).
The caller provides the output path and format preference.
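For the random-sample option, reservoir sampling (Algorithm R) draws a uniform sample of k rows in a single read-only pass, without knowing the row count in advance. The sketch below assumes a CSV input; the function name and the keys of the returned sampling record are hypothetical, and a fixed seed is used so the documented sample can be reproduced.

```python
import csv
import random


def reservoir_sample(path, k, seed=0):
    """Uniform random sample of k data rows from a CSV, in one read-only pass.

    Algorithm R: the first k rows fill the reservoir; after that, row i
    (0-based) replaces a random slot with probability k / (i + 1).
    Returns (rows, sampling_info).
    """
    rng = random.Random(seed)
    sample = []
    total = 0
    # Opened for reading only, so the source file is never modified.
    with open(path, newline="", encoding="utf-8") as fh:
        for i, row in enumerate(csv.DictReader(fh)):
            total = i + 1
            if i < k:
                sample.append(row)
            else:
                j = rng.randint(0, i)  # inclusive on both ends
                if j < k:
                    sample[j] = row
    info = {
        "strategy": "random (reservoir)",
        "seed": seed,
        "sample_size": len(sample),
        "total_rows": total,
    }
    return sample, info
```

If the caller supplies a maximum row count, it could be passed as k, with the returned record stored next to the overview so the sampling is documented in the profile itself.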