---
name: vera-data-continuous-reviewing
description: >-
  Runs distribution diagnostics and primary hypothesis tests for continuous
  outcome variables. Produces a Shapiro-Wilk normality check, skewness,
  kurtosis, a Q-Q plot, and one fully interpreted group comparison (Welch's t
  for 2 groups or ANOVA with Tukey HSD for 3+ groups) with effect sizes and
  nonparametric confirmation. Ends with a recommendation block listing
  additional analyses. Outputs .R and .py scripts with 2 publication-quality
  plots. Triggered when the user has a continuous/numeric outcome and says
  "analyze continuous outcome," "my DV is numeric," "compare group means," or
  names a continuous variable such as weight, score, income, time, cost, mpg,
  or blood pressure. Does not handle binary, count, survival, ordinal,
  repeated-measures, or SEM outcomes.
allowed-tools: Read, Bash, Write, Edit
---
# Continuous Outcome: Distribution Diagnostics & Hypothesis Testing
## Table of Contents
- Scope Boundary
- Workflow
- Decision Tree
- Required Inputs
- Code Structure
- Reporting Standards
- Hypothesis Tests
- Example Dataset
- Method Status
- Minimal Smoke Test
- Cross-Skill Interface
Open-source skill.
## Scope Boundary
Use this skill when:
- The outcome is a single continuous variable and the first need is a transparent baseline comparison across groups.
- A primary t-test / ANOVA-style analysis is appropriate before regression or nonlinear exploratory work.
Do not use this skill when:
- The design is repeated / paired, multivariate, or time-indexed.
- The outcome is binary, count, survival, ordinal, or SEM-based rather than continuous.
## Workflow
Read each step file in workflow/ before executing that step.
| Step | Responsibility | Executor | Document | Input | Output |
|---|---|---|---|---|---|
| Collect | Collect Inputs | Main Agent | workflow/step01-collect-inputs.md | User input | Structured input summary |
| Diagnose | Check Distribution | Main Agent | workflow/step02-check-distribution.md | Prior step output | PART 1 code block |
| Test | Run Primary Test | Main Agent | workflow/step03-run-primary-test.md | Prior step output | PART 2-3 code blocks |
## Decision Tree
```text
1. CHECK DISTRIBUTION (on residuals)
   ├── Normal (Shapiro-Wilk p ≥ .05 and |skewness| < 1; if N > 200, judge by Q-Q plot and |skewness| < 1) → parametric primary
   └── Non-normal → nonparametric primary + recommend QR/trees
2. GROUP COMPARISON
   ├── 2 groups  → Welch's t + Cohen's d + Mann-Whitney U
   └── 3+ groups → ANOVA + η² + Tukey HSD + Kruskal-Wallis
                   (if Levene p < .05: Welch ANOVA + Games-Howell instead)
```
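The decision tree, the Levene rule under Reporting Standards, and the Hypothesis Tests table can be combined into a small routing function. This is a hypothetical sketch, not the skill's implementation. `Diagnostics`, `residuals_look_normal`, and `choose_tests` are illustrative names, and the diagnostics are assumed to be computed beforehand. A visual Q-Q check cannot be automated, so for N > 200 only |skewness| gates the normal branch here.

```python
from dataclasses import dataclass


@dataclass
class Diagnostics:
    """Pre-computed diagnostics (hypothetical container, not part of the skill)."""
    n_groups: int      # number of levels of the group variable
    n_total: int       # final analytic N
    shapiro_p: float   # Shapiro-Wilk p on residuals
    skewness: float    # residual skewness
    levene_p: float    # Levene's test p for equal variances
    balanced: bool     # True if group sizes are balanced


def residuals_look_normal(d: Diagnostics) -> bool:
    # N > 200: Shapiro-Wilk is over-sensitive, so it is not used as a gatekeeper.
    if d.n_total > 200:
        return abs(d.skewness) < 1
    return d.shapiro_p >= 0.05 and abs(d.skewness) < 1


def choose_tests(d: Diagnostics) -> list[str]:
    """Return the primary test first, then effect size / post-hoc / confirmation."""
    if d.n_groups < 2:
        raise ValueError("need at least 2 groups")
    normal = residuals_look_normal(d)
    if d.n_groups == 2:
        if not normal:
            return ["Mann-Whitney U", "recommend QR/trees"]
        # Welch's t is the default; Student's t only with equal variances AND balanced n.
        equal_var = d.levene_p >= 0.05 and d.balanced
        t_test = "Student's t" if equal_var else "Welch's t"
        return [t_test, "Cohen's d", "Mann-Whitney U"]
    if not normal:
        return ["Kruskal-Wallis", "Dunn's", "recommend QR/trees"]
    if d.levene_p >= 0.05:
        return ["ANOVA", "eta-squared", "Tukey HSD", "Kruskal-Wallis"]
    # Heterogeneous variances across 3+ groups: Tukey HSD is replaced by Games-Howell.
    return ["Welch ANOVA", "eta-squared", "Games-Howell", "Kruskal-Wallis"]
```

For example, a two-group comparison with Levene p = .01 routes to `["Welch's t", "Cohen's d", "Mann-Whitney U"]`.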
## Required Inputs
| Role | What to collect |
|---|---|
| Outcome (Y) | Variable name, units, what it measures |
| Group variable | What defines groups, how many levels |
| Predictors | For recommendation block (not executed) |
| Covariates | For recommendation block (not executed) |
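One possible shape for the "structured input summary" that step 1 hands to later steps is sketched below. This is an assumption: the real format is defined in `workflow/step01-collect-inputs.md`, and `InputSummary` and its field names are hypothetical. The fields mirror the rows of the table above. Predictors and covariates are only carried through to the recommendation block.

```python
from dataclasses import dataclass, field


@dataclass
class InputSummary:
    """Hypothetical container for the step-1 structured input summary."""
    outcome: str                 # Y: variable name
    units: str                   # Y: units
    measures: str                # Y: what it measures
    group_var: str               # what defines the groups
    group_levels: list[str]      # observed levels of the group variable
    predictors: list[str] = field(default_factory=list)  # recommendation block only
    covariates: list[str] = field(default_factory=list)  # recommendation block only

    def __post_init__(self) -> None:
        # A group comparison needs at least two levels and a distinct grouping variable.
        if len(self.group_levels) < 2:
            raise ValueError("the group variable needs at least 2 levels")
        if self.outcome == self.group_var:
            raise ValueError("outcome and group variable must differ")

    @property
    def comparison(self) -> str:
        """Which branch of the decision tree the design falls into."""
        return "2 groups" if len(self.group_levels) == 2 else "3+ groups"
```

With the example dataset, `InputSummary("mpg", "miles per US gallon", "fuel efficiency", "am", ["0", "1"])` falls into the 2-group branch.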
## Code Structure
```text
PART 0: Setup & Data Loading
PART 1: Distribution Diagnostics → plot_01_distribution.png
PART 2: Primary Hypothesis Test  → plot_02_boxplot_[var].png
PART 3: Recommendation Block     → text listing additional analyses available
```
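The numeric half of PART 1 could look roughly like the sketch below. It assumes NumPy and SciPy, and the function names are hypothetical. Residuals come from a one-way means model (each value minus its group mean), so normality is checked on residuals rather than on Y directly. The plotting for `plot_01_distribution.png` is omitted.

```python
import numpy as np
from scipy import stats


def group_residuals(y, groups):
    """Residuals from a one-way means model: each value minus its group mean."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    resid = np.empty_like(y)
    for g in np.unique(groups):
        mask = groups == g
        resid[mask] = y[mask] - y[mask].mean()
    return resid


def distribution_diagnostics(y, groups):
    """Shapiro-Wilk W/p, skewness, and excess kurtosis computed on residuals."""
    resid = group_residuals(y, groups)
    w, p = stats.shapiro(resid)
    return {
        "n": int(resid.size),
        "shapiro_w": float(w),
        "shapiro_p": float(p),
        "skewness": float(stats.skew(resid)),
        # Fisher definition (SciPy default): a normal distribution gives 0.
        "excess_kurtosis": float(stats.kurtosis(resid)),
    }
```

The returned dictionary supplies the `shapiro_p` and `skewness` values that the decision tree's normality branch needs.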
## Reporting Standards
- p-values: "< .001" not "0.000"; exact to 3 decimals otherwise
- Effect sizes: Cohen's d (t-test), η² (ANOVA) — always alongside p
- 95% CIs: always for mean differences
- Degrees of freedom: always with t and F statistics
- Sample size: final analytic N
- Decimal places: 2 for M/SD, 3 for p and effect sizes
- Non-significance: "not statistically significant at α = .05" — never "no effect"
- Normality check: report Shapiro–Wilk W and p computed on the model residuals (not on Y directly) for both the t-test and ANOVA, along with skewness and kurtosis. Shapiro–Wilk becomes over-sensitive as N grows: with N > 200, trivial deviations from normality reach significance. When N > 200, rely primarily on the Q–Q plot and the magnitudes of skewness and kurtosis, and treat the Shapiro–Wilk p as a supporting diagnostic rather than a gatekeeper.
- Variance homogeneity: Levene's test (F, p) before choosing Student's t vs Welch's t or pooled ANOVA vs Welch ANOVA. Default to Welch's t (robust to unequal variances) unless Levene p ≥ .05 and sample sizes are balanced.
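A minimal sketch of string formatters that follow these standards is below. The helper names are hypothetical. Dropping the leading zero on p (".023") is an assumption in the APA style that the `"< .001"` convention suggests. Degrees of freedom print as integers when whole and with 2 decimals otherwise, because Welch df are fractional.

```python
def _num(x: float) -> str:
    """Whole numbers without decimals, otherwise 2 decimals (e.g. Welch df)."""
    return str(int(x)) if float(x).is_integer() else f"{x:.2f}"


def format_p(p: float) -> str:
    """'p < .001' below .001, otherwise exact to 3 decimals without a leading zero."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def format_t(t: float, df: float, p: float) -> str:
    """t statistic always reported with its degrees of freedom."""
    return f"t({_num(df)}) = {t:.2f}, {format_p(p)}"


def format_f(f: float, df1: float, df2: float, p: float) -> str:
    """F statistic always reported with both degrees of freedom."""
    return f"F({_num(df1)}, {_num(df2)}) = {f:.2f}, {format_p(p)}"


def format_m_sd(m: float, sd: float) -> str:
    """2 decimals for means and standard deviations."""
    return f"M = {m:.2f}, SD = {sd:.2f}"


def format_effect(label: str, value: float) -> str:
    """3 decimals for effect sizes such as Cohen's d or eta-squared."""
    return f"{label} = {value:.3f}"
```

For example, `format_t(-3.767, 18.332, 0.0014)` yields `"t(18.33) = -3.77, p = .001"`.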
## Hypothesis Tests
| Scenario | Normal (equal var) | Normal (unequal var) | Non-Normal |
|---|---|---|---|
| 2 independent groups | Student's t | Welch's t (default) | Mann-Whitney U |
| 3+ independent groups | ANOVA + Tukey HSD | Welch ANOVA + Games-Howell post-hoc | Kruskal-Wallis + Dunn's |
When Levene's test indicates heterogeneous variances across 3+ groups, use Welch ANOVA for the omnibus test and Games-Howell for pairwise comparisons — Tukey HSD assumes equal variances and inflates Type I error when variances differ. Report which post-hoc was used and why.
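The statistics and effect sizes in this table can be computed from their textbook definitions, as in the stdlib-only sketch below. The function names are hypothetical, and the sketch returns only the Welch t statistic and its Welch-Satterthwaite df. A p-value additionally needs the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. Cohen's d here uses the pooled SD with n - 1 weighting. Eta-squared is the one-way SS_between / SS_total.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def eta_squared(groups):
    """One-way eta-squared: between-group sum of squares over total sum of squares."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total
```

With `a = [1, 2, 3, 4, 5]` and `b = [2, 4, 6, 8, 10]`, Cohen's d is -1.2 and eta-squared is 22.5 / 72.5 ≈ 0.310.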
Paired/repeated designs → vera-data-repeated-reviewing.
## Example Dataset
R built-in mtcars: outcome = mpg, 2-group = am, 3+ group = cyl.
Python: sm.datasets.get_rdataset("mtcars").data (with offline fallback to bundled examples/mtcars.csv).
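The online-then-offline loading described above might be sketched as follows. This is an assumption, not the skill's code: `load_mtcars` is hypothetical, and so is the guess that the bundled `examples/mtcars.csv` may store car names in an unnamed first column, as R's `write.csv()` does. statsmodels is imported lazily, so the offline path needs only pandas.

```python
from pathlib import Path

import pandas as pd


def load_mtcars(fallback_csv="examples/mtcars.csv", try_online=True) -> pd.DataFrame:
    """Fetch mtcars via statsmodels; fall back to a bundled CSV when offline."""
    if try_online:
        try:
            import statsmodels.api as sm  # lazy import: offline path needs only pandas
            return sm.datasets.get_rdataset("mtcars").data
        except Exception:
            pass  # no network, blocked host, or statsmodels not installed
    path = Path(fallback_csv)
    if not path.exists():
        raise FileNotFoundError(f"offline fallback not found: {path}")
    df = pd.read_csv(path)
    # Assumption: an R-style CSV keeps car names in an unnamed first column.
    first = df.columns[0]
    if first.startswith("Unnamed"):
        df = df.set_index(first).rename_axis(None)
    return df
```

Catching a broad `Exception` is deliberate here, because network failures surface as several different exception types.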
## Method Status
| Status | Methods |
|---|---|
| Implemented in this skill | Residual normality diagnostics, Levene-informed Welch / ANOVA branching, post-hoc tests, and nonparametric confirmation |
| Implemented downstream in vera-data-continuous-generating | Extended group comparisons, OLS, quantile regression, subgroup analysis, and exploratory tree-based models |
| Out of scope in this open-source baseline | Repeated-measures, mixed-effects, and any continuous-outcome workflow that depends on a different data structure |
## Minimal Smoke Test
- Smoke-test prompt: "Run `vera-data-continuous-reviewing` on `mtcars`, using `mpg` as the outcome and `am` as the primary grouping variable. Produce the standard baseline artifacts."
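A smoke test could check the produced files against the baseline artifact list: one `.R` script, one `.py` script, and the two figures named in Code Structure. The sketch below is hypothetical. `missing_artifacts` is an illustrative name, the figures are assumed to sit in a `figures/` directory, and the `[var]` placeholder in the boxplot filename is assumed to be the outcome variable.

```python
from pathlib import PurePosixPath


def missing_artifacts(paths, outcome):
    """Return the expected baseline artifacts that are absent from `paths`."""
    files = [PurePosixPath(p) for p in paths]
    missing = []
    if not any(f.suffix == ".R" for f in files):
        missing.append("R script (*.R)")
    if not any(f.suffix == ".py" for f in files):
        missing.append("Python script (*.py)")
    # Assumption: [var] in plot_02_boxplot_[var].png is the outcome variable name.
    expected = ["plot_01_distribution.png", f"plot_02_boxplot_{outcome}.png"]
    figure_names = {f.name for f in files if f.parent.name == "figures"}
    missing += [f"figures/{name}" for name in expected if name not in figure_names]
    return missing
```

For the smoke-test prompt, a complete run would return an empty list for `outcome="mpg"`.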
## Cross-Skill Interface
```text
Output:
├── code_r          → .R script
├── code_python     → .py script
├── figures/        → 2 PNGs (distribution + boxplot)
└── recommendations → text block (additional analyses available)
```
Next step: Invoke vera-data-continuous-generating from this skillset to run the full pipeline (additional tests, subgroup analysis, modeling, manuscript generation). See ../../CROSS-SKILL-INTERFACE.md for the shared handoff contract.