data-viz - SKILL.md Agent Skill

name: data-viz description: Create publication-quality visualizations from CSV/TSV/Excel data using Python user-invocable: true argument-hint: " [chart type]" allowed-tools: [mcpqsvqsv_sniff, mcpqsvqsv_count, mcpqsvqsv_headers, mcpqsvqsv_index, mcpqsvqsv_stats, mcpqsvqsv_moarstats, mcpqsvqsv_frequency, mcpqsvqsv_search, mcpqsvqsv_select, mcpqsvqsv_slice, mcpqsvqsv_sqlp, mcpqsvqsv_command, mcpqsvqsv_list_files, mcpqsvqsv_search_tools, mcpqsvqsv_get_working_dir, mcpqsvqsv_set_working_dir]

Data Viz

Create publication-quality data visualizations from tabular data files. Uses qsv to profile and prepare data, then generates Python charts with best practices for clarity, accuracy, and design.

Cowork note: If relative paths don't resolve, call mcp__qsv__qsv_get_working_dir and mcp__qsv__qsv_set_working_dir to sync the working directory.

Steps

1. Understand the Request

Determine:

Data source: CSV/TSV/Excel file, or query results from a prior analysis
Chart type: Explicitly requested or needs to be recommended
Purpose: Exploration, presentation, report, dashboard component
Audience: Technical team, executives, external stakeholders

2. Profile the Data with qsv

a. Index and detect: Run mcp__qsv__qsv_index, then mcp__qsv__qsv_sniff to detect format and encoding.

b. Understand structure: Run mcp__qsv__qsv_headers and mcp__qsv__qsv_count to get column names and row count.

c. Profile columns: Run mcp__qsv__qsv_stats with cardinality: true, stats_jsonl: true to understand types, ranges, and distributions. Read .stats.csv to inform chart design:

type → choose appropriate axis type (numeric, categorical, date)
min/max → set axis ranges
cardinality → determine if column is categorical (low) or continuous (high)
nullcount → note missing data that could affect the chart

d. Check distributions: Run mcp__qsv__qsv_frequency with limit: 20 on columns you plan to plot — this reveals the actual values and whether grouping or filtering is needed.

e. Run moarstats for visualization hints: Run mcp__qsv__qsv_moarstats with advanced: true. Read the enriched .stats.csv for chart design decisions:

Stats Column	Visualization Hint
`skewness` / `pearson_skewness`	If \|skewness\| > 1, use log scale or split view; histogram will be lopsided on linear scale
`bimodality_coefficient`	If >= 0.555, data is bimodal — overlay two distributions or use separate panels per group
`kurtosis`	If > 3, heavy tails — add outlier annotations or use box plot alongside histogram
`outliers_percentage`	If > 5%, annotate outliers in scatter plots; if > 10%, consider separate outlier panel
`q1`, `q3`, `iqr`	Set box plot boundaries; whiskers at inner fences (`q1 - 1.5iqr`, `q3 + 1.5iqr`)
`cv`	If CV > 100%, data is highly variable relative to mean — use normalized/percentage scale
`sparsity`	If > 0.5, too many nulls to visualize meaningfully — warn user or show completeness bar
`mode`, `mode_count`	If mode dominates (> 50% of rows), bar chart of top-N values is more informative than histogram

f. Preview data: Run mcp__qsv__qsv_slice with len: 5 to see actual values and formats.

3. Prepare the Data

Use qsv to prepare visualization-ready data:

Filter: mcp__qsv__qsv_search or mcp__qsv__qsv_sqlp to subset rows
Aggregate: mcp__qsv__qsv_sqlp for GROUP BY, window functions, computed columns
Select columns: mcp__qsv__qsv_select to keep only what's needed
Sort: mcp__qsv__qsv_sqlp with ORDER BY for ordered categories or time series

Export the prepared data to a CSV file for Python to read.

4. Select Chart Type

If the user didn't specify, recommend based on data and question:

Data Relationship	Recommended Chart	How qsv Helps Choose
Trend over time	Line chart	`stats` shows Date/DateTime type
Comparison across categories	Bar chart (horizontal if many)	`frequency` shows category counts; `cardinality` < 20
Part-to-whole composition	Stacked bar or area chart	`frequency` shows proportions; avoid pie unless < 6 categories
Distribution of values	Histogram or box plot	`stats` shows min/max/mean/stddev; `moarstats` shows kurtosis
Correlation between two variables	Scatter plot	`stats` shows two numeric columns
Ranking	Horizontal bar chart	`frequency` with `--limit` for top-N
Matrix of relationships	Heatmap	Two categorical columns with low `cardinality`
Two-variable comparison over time	Dual-axis line or grouped bar	Two numeric columns + one Date column

5. Generate the Visualization

Write Python code using matplotlib + seaborn (default) or plotly (if interactive requested):

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Load the prepared CSV
df = pd.read_csv('prepared_data.csv')

# Set professional style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

# Create figure with appropriate size
fig, ax = plt.subplots(figsize=(10, 6))

# [chart-specific code]

# Always include:
ax.set_title('Clear, Descriptive Title', fontsize=14, fontweight='bold')
ax.set_xlabel('X-Axis Label', fontsize=11)
ax.set_ylabel('Y-Axis Label', fontsize=11)

# Format numbers appropriately
# - Percentages: '45.2%' not '0.452'
# - Currency: '$1.2M' not '1200000'
# - Large numbers: '2.3K' or '1.5M' not '2300' or '1500000'

# Remove chart junk
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('chart_name.png', dpi=150, bbox_inches='tight')
plt.show()

6. Apply Design Best Practices

Color:

Use a consistent, colorblind-friendly palette
Highlight the key data point or trend with a contrasting color
Grey out less important reference data

Typography:

Descriptive title that states the insight, not just the metric (e.g., "Revenue grew 23% YoY" not "Revenue by Month")
Readable axis labels (not rotated 90 degrees if avoidable)
Data labels on key points when they add clarity

Layout:

Appropriate whitespace and margins
Legend placement that doesn't obscure data
Sort categories by value (not alphabetically) unless there's a natural order

Accuracy:

Y-axis starts at zero for bar charts
No misleading axis breaks without clear notation
Consistent scales when comparing panels
Appropriate precision (don't show 10 decimal places)

7. Save and Present

Save the chart as PNG with descriptive name
Display the chart to the user
Provide the Python code so they can modify it
Suggest variations (different chart type, different grouping, zoomed time range)

Data Preparation Recipes

Time Series

qsv_sqlp: SELECT date_col, SUM(value) as total
  FROM data GROUP BY date_col ORDER BY date_col

Top-N Categories

qsv_frequency: --select category_col --limit 10

Or for aggregated values:

qsv_sqlp: SELECT category, SUM(amount) as total
  FROM data GROUP BY category ORDER BY total DESC LIMIT 10

Distribution

qsv_stats: Check min, max, mean, stddev, cardinality
qsv_moarstats: --advanced for kurtosis, bimodality
qsv_sqlp: SELECT FLOOR(value/10)*10 as bin, COUNT(*) as cnt
  FROM data GROUP BY bin ORDER BY bin

Correlation

qsv_select: Pick the two numeric columns
qsv_stats: Verify both are numeric types with reasonable ranges

Comparison Across Groups

qsv_sqlp: SELECT group_col, AVG(metric) as avg_metric, COUNT(*) as n
  FROM data GROUP BY group_col ORDER BY avg_metric DESC

Notes

Always profile with qsv first — stats and frequency reveal the right chart type and catch data issues before plotting
For large files, use mcp__qsv__qsv_sqlp to aggregate before passing to Python — don't load millions of rows into pandas
If interactive charts are requested (hover, zoom), use plotly instead of matplotlib
Specify "presentation" for larger fonts and higher contrast
Multiple charts can be created at once (e.g., "create a 2x2 grid")
Charts are saved to the current directory as PNG files
For data quality issues discovered during profiling, recommend /data-clean before visualizing