name: data-describe
description: Generate AI-powered Data Dictionary, Description, and Tags for a CSV/TSV/Excel file
user-invocable: true
argument-hint: " [--dictionary|--description|--tags|--all]"
allowed-tools: [mcp__qsv__qsv_sniff, mcp__qsv__qsv_count, mcp__qsv__qsv_headers, mcp__qsv__qsv_index, mcp__qsv__qsv_stats, mcp__qsv__qsv_describegpt, mcp__qsv__qsv_list_files, mcp__qsv__qsv_get_working_dir, mcp__qsv__qsv_set_working_dir]
Data Describe
Generate AI-powered documentation for a tabular data file using describegpt. Produces a Data Dictionary (column labels, descriptions, types), a natural-language Description of the dataset, and semantic Tags — all via the connected LLM (no API key needed in MCP mode).
Cowork note: If relative paths don't resolve, call `mcp__qsv__qsv_get_working_dir` and `mcp__qsv__qsv_set_working_dir` to sync the working directory.
Steps
1. Index: Run `mcp__qsv__qsv_index` on the file for fast random access.
2. Profile: Run `mcp__qsv__qsv_stats` with `cardinality: true, stats_jsonl: true` to generate the stats cache. describegpt reads this cache for column metadata, so it must exist first.
3. Describe: Run `mcp__qsv__qsv_describegpt` with the requested options (recommend `all: true` for comprehensive output). At least one inference option (`dictionary`, `description`, `tags`, or `all`) is required. Output defaults to `<filestem>.describegpt.md`.
4. Present: Display the generated Data Dictionary table, Description, and Tags to the user.
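
The first three steps can be sketched as an ordered list of tool calls. This is a minimal illustration, not the MCP server's actual schema: the `input_file` argument name, the `build_plan` helper, and the `sales.csv` file name are assumptions made for the example. Only the tool names and the `cardinality`, `stats_jsonl`, and inference options come from the steps above.

```python
from typing import Any

# Inference options named in the steps above; at least one must be set.
INFERENCE_OPTIONS = ("dictionary", "description", "tags", "all")


def build_plan(input_file: str, inference: str = "all") -> list[tuple[str, dict[str, Any]]]:
    """Return the ordered (tool, arguments) calls for steps 1-3."""
    if inference not in INFERENCE_OPTIONS:
        raise ValueError(f"inference must be one of {INFERENCE_OPTIONS}")
    # "input_file" is an assumed argument name, not confirmed by the tool schemas.
    return [
        ("mcp__qsv__qsv_index", {"input_file": input_file}),
        (
            "mcp__qsv__qsv_stats",
            {"input_file": input_file, "cardinality": True, "stats_jsonl": True},
        ),
        ("mcp__qsv__qsv_describegpt", {"input_file": input_file, inference: True}),
    ]


# Hypothetical file name, used only to show the call order.
for tool, args in build_plan("sales.csv"):
    print(tool, args)
```

Keeping the plan as plain data makes the ordering constraint explicit: the stats call always precedes the describegpt call, so the stats cache exists before it is read.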
Options
| Option | Effect |
|---|---|
| `--all` (recommended) | Generate Dictionary + Description + Tags in one pass |
| `--dictionary` | Data Dictionary only: column labels, descriptions, types |
| `--description` | Natural-language dataset Description only |
| `--tags` | Semantic Tags only |
| `--format` | Output format: Markdown (default), JSON, TSV, TOON |
| `--language` | Generate output in a non-English language (e.g. Spanish, French) |
| `--addl-cols-list` | Enrich the dictionary with extra columns (e.g. "everything", "moar!") |
| `--tag-vocab` | Constrain tags to a controlled vocabulary (comma-separated) |
| `--num-tags` | Number of tags to generate (default: 5) |
| `--num-examples` | Number of example values per column in the dictionary |
| `--enum-threshold` | Max cardinality to treat a column as an enum in the dictionary |
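
The options above are written as command-line flags, while the steps refer to tool parameters such as `all: true` and `stats_jsonl: true`. The sketch below shows one way to translate between the two, assuming each parameter is simply the snake_case form of its flag. That mapping is inferred from the names in this document and is not confirmed against the tool schema, and `flags_to_params` is a hypothetical helper.

```python
# Flags that request an inference; the steps require at least one of them.
INFERENCE_FLAGS = {"--dictionary", "--description", "--tags", "--all"}


def flags_to_params(flags: dict[str, object]) -> dict[str, object]:
    """Convert CLI-style flags to snake_case parameters (assumed naming rule)."""
    if not INFERENCE_FLAGS.intersection(flags):
        raise ValueError("at least one of --dictionary, --description, --tags, --all is required")
    # "--num-tags" becomes "num_tags"; values pass through unchanged.
    return {name.lstrip("-").replace("-", "_"): value for name, value in flags.items()}


params = flags_to_params({"--all": True, "--num-tags": 8, "--format": "JSON"})
print(params)
```

Checking for an inference flag before building the parameters surfaces the "at least one option" rule early, instead of leaving it to fail inside the describegpt call.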
Notes
- No API key needed in MCP mode — uses the connected LLM automatically via MCP sampling
- The stats cache must exist first for best results (step 2 creates it)
- Output defaults to `<filestem>.describegpt.md`
- For Excel/JSONL files, the MCP server auto-converts to CSV first
- Use `--format JSON` when you need machine-readable output for downstream processing
- Use `--language` to generate documentation in the user's preferred language
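
To locate the generated file for the Present step, the default output name can be derived from the input path. This sketch assumes `<filestem>` means the file name without its final extension. It does not cover where the file is written, or how the name changes for non-Markdown formats, since this document does not specify either. `default_output_name` is a hypothetical helper.

```python
from pathlib import Path


def default_output_name(input_file: str) -> str:
    """Build the default Markdown output name, <filestem>.describegpt.md."""
    # Path.stem drops only the last suffix: "sales.csv" -> "sales".
    return f"{Path(input_file).stem}.describegpt.md"


print(default_output_name("data/sales.csv"))
```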