csv-pipeline

star 484

Process, transform, analyze, and report on CSV and JSON data files. Use when the user needs to filter rows, join datasets, compute aggregates, convert formats, deduplicate, or generate summary reports from tabular data. Works with any CSV, TSV, or JSON Lines file.

AIDotNet By AIDotNet schedule Updated 3/20/2026

name: csv-pipeline description: Process, transform, analyze, and report on CSV and JSON data files. Use when the user needs to filter rows, join datasets, compute aggregates, convert formats, deduplicate, or generate summary reports from tabular data. Works with any CSV, TSV, or JSON Lines file. compatibility: Requires Python 3. No external dependencies beyond Python standard library (csv, json modules).

CSV Data Pipeline

Process tabular data (CSV, TSV, JSON, JSON Lines) using standard command-line tools and Python. No external dependencies required beyond Python 3.

When to use this skill

  • User provides a CSV/TSV/JSON file and asks to analyze, transform, or report on it
  • Joining, filtering, grouping, or aggregating tabular data
  • Converting between formats (CSV to JSON, JSON to CSV, etc.)
  • Deduplicating, sorting, or cleaning messy data
  • Generating summary statistics or reports
  • ETL workflows: extract from one format, transform, load into another

Scripts overview

Script Purpose Dependencies
csv_tool.py All-in-one CSV/JSON processing tool Python 3 (stdlib only)

Steps

1. Ensure Python 3 is available

python --version

CRITICAL — Error Recovery: If any script below fails with an error, check that Python 3 is available and re-run the exact same command.

2. Inspect a data file

python scripts/csv_tool.py inspect "DATA_FILE"

Shows row count, column names, and non-empty value counts per column.

3. Filter rows

python scripts/csv_tool.py filter "DATA_FILE" --column COLUMN_NAME --op OPERATOR --value VALUE --output "OUTPUT_FILE"

Operators: eq, neq, gt, gte, lt, lte, contains, startswith, endswith

4. Sort data

python scripts/csv_tool.py sort "DATA_FILE" --column COLUMN_NAME --order asc --output "OUTPUT_FILE"

Options: --numeric for numeric sorting, --order desc for descending.

5. Deduplicate

python scripts/csv_tool.py dedup "DATA_FILE" --columns "col1,col2" --output "OUTPUT_FILE"

Remove duplicates by specified columns (or all columns if omitted).

6. Aggregate / Group By

python scripts/csv_tool.py aggregate "DATA_FILE" --group-by COLUMN --agg-column VALUE_COL --func sum --output "OUTPUT_FILE"

Functions: sum, avg, count, min, max

7. Join two datasets

python scripts/csv_tool.py join "LEFT_FILE" "RIGHT_FILE" --on KEY_COLUMN --how inner --output "OUTPUT_FILE"

Join types: inner, left

8. Convert formats

python scripts/csv_tool.py convert "DATA_FILE" --to json --output "OUTPUT_FILE"

Supported conversions: csv, json, jsonl (JSON Lines), tsv

9. Generate summary report

python scripts/csv_tool.py report "DATA_FILE" --group-by CATEGORY_COL --value-column VALUE_COL --output "report.md"

Generates a Markdown summary table with count, sum, avg, min, max per group.

10. Clean data

python scripts/csv_tool.py clean "DATA_FILE" --output "CLEAN_FILE"

Strips whitespace, normalizes empty values (N/A, null, None → empty), normalizes booleans.

Decision guide

  1. Quick lookinspect to understand the data
  2. Filter/sort/dedup → use the corresponding subcommand
  3. Summarizeaggregate for raw data, report for Markdown output
  4. Combine filesjoin two datasets on a shared key
  5. Change formatconvert between CSV/JSON/TSV

Edge cases

  • Large files (100MB+): The tool processes data in streaming fashion where possible
  • Encoding issues: Files are read as UTF-8 by default. For BOM files, use UTF-8-SIG
  • Quoted fields: Python's csv module handles RFC 4180 quoting automatically
  • Mixed types: Numeric operations attempt float conversion, falling back to 0

Scripts

  • csv_tool.py — All-in-one CSV/JSON data processing tool
Install via CLI
npx skills add https://github.com/AIDotNet/OpenCowork --skill csv-pipeline
Repository Details
star Stars 484
call_split Forks 98
navigation Branch main
article Path SKILL.md
More from Creator