stereo-seq-statistical-design - SKILL.md Agent Skill

name: stereo-seq-statistical-design
description: Use when Stereo-seq/STOmics analysis needs replicate-aware statistical design, condition comparisons, treatment/disease/control inference, pseudobulk DEG, donor/sample/batch-aware models, spatial-domain or cell-type proportion comparisons, mixed-model design checks, graph/spatial organization metrics across conditions, or warnings about pseudo-replication before claiming differential biology.

Stereo-seq Statistical Design

Use This For

Designing condition, treatment, disease/control, time-point, donor, batch, or replicate comparisons for Stereo-seq data.
Checking whether a proposed DEG, enrichment, cell-type proportion, domain-size, CCI, or spatial organization comparison has biological replication.
Running pseudobulk aggregation from bin/cellbin/spatial units to sample/replicate/group-level matrices.
Planning replicate-level statistical tests after annotation, domain discovery, spatial program scoring, CCI, or 3D reconstruction.

For ordinary marker discovery without replicate inference, use stereo-seq-spatial-programs. For sample metadata setup, use stereo-seq-project-orchestration.

Default Requirements

Read source_code.md before writing statistical code. The preferred patterns come from spatialLIBD, Stereo-seq article code, and condition-level spatial organization tools.
If no curated source fits, search code_candidates.tsv and stereo-seq-publication-story/references/github_code_registry.tsv before external search.
State the experimental unit before analysis. For biological condition claims, the experimental unit is usually donor/animal/sample/section group, not individual bins, cells, or spots.
Do not present per-bin or per-cell FindMarkers results as replicate-aware condition inference. If the user explicitly accepts pseudo-replication, report the results but label them as pseudo-replicated.
Prefer pseudobulk or replicate-level summaries when there are at least two biological replicates per group.
Include batch/donor/section covariates in the design table when available, and flag any covariate that is fully confounded with the condition of interest.
In the final response, state the reused paper, DOI, repository/source file, and dataset-specific edits.
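
The replicate and confounding requirements above can be sketched as a small check. This is an illustrative sketch only, not the bundled scripts/stereo_design_table_qc.py: the function name, the column names (sample, condition, batch), and the working definition of full confounding (every covariate level occurs in only one group) are assumptions.

```python
from collections import defaultdict


def check_design(rows, unit="sample", group="condition", covariate="batch", min_reps=2):
    """Count replicates per group and flag a covariate confounded with group.

    All column names are hypothetical defaults, not a documented schema.
    """
    units_by_group = defaultdict(set)
    groups_by_level = defaultdict(set)
    for row in rows:
        units_by_group[row[group]].add(row[unit])
        groups_by_level[row[covariate]].add(row[group])

    reps = {g: len(units) for g, units in units_by_group.items()}
    warnings = []
    for g in sorted(reps):
        if reps[g] < min_reps:
            warnings.append(f"{group}={g}: {reps[g]} replicate(s), fewer than {min_reps}")
    # Working definition (assumption): every covariate level occurs in exactly
    # one group, so the covariate cannot be separated from the group effect.
    if len(reps) > 1 and all(len(gs) == 1 for gs in groups_by_level.values()):
        warnings.append(f"{covariate} is fully confounded with {group}")
    return reps, warnings


# Hypothetical design table: one row per experimental unit.
design = [
    {"sample": "S1", "condition": "control", "batch": "B1"},
    {"sample": "S2", "condition": "control", "batch": "B1"},
    {"sample": "S3", "condition": "treated", "batch": "B2"},
]
reps, warnings = check_design(design)
```

In this toy table the treated group has a single sample and each batch holds only one condition, so both warnings fire. A crossed design, where each batch contains both conditions, would pass the confounding check.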

Workflow

Identify the claim:
- gene-level DEG;
- pathway/program score comparison;
- cell-type/domain proportion shift;
- spatial organization/graph metric;
- CCI or ligand-receptor change;
- 3D/section-level comparison.
Validate the design table with scripts/stereo_design_table_qc.py.
Choose the analysis unit:
- pseudobulk by sample/replicate and cell type/domain for DEG;
- replicate-level domain/cell-type proportions for abundance;
- per-sample spatial graph metrics for organization;
- section-level summary for serial sections.
Use scripts/pseudobulk_count_matrix_template.py to aggregate a long count table when the input is not already pseudobulked.
For R/Bioconductor DEG, adapt spatialLIBD or PseudoBulkDEG patterns from source_code.md, preserving design formulas and covariates.
Report effect sizes, model design, replicate counts, confounding checks, and any claims the design cannot support.
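
For the abundance case, the replicate-level analysis unit can be illustrated as follows. This is a minimal sketch under stated assumptions: the input is one record per bin or cell with hypothetical keys sample and cell_type, and the function name is invented for illustration. It yields one proportion vector per sample, which is the unit a downstream test would compare across groups.

```python
from collections import Counter, defaultdict


def replicate_proportions(units, sample_key="sample", label_key="cell_type"):
    """Return {sample: {label: proportion}} from per-unit annotations.

    Key names are hypothetical; labels absent from a sample get 0.0.
    """
    counts = defaultdict(Counter)
    for unit in units:
        counts[unit[sample_key]][unit[label_key]] += 1

    labels = sorted({label for c in counts.values() for label in c})
    table = {}
    for sample, c in counts.items():
        total = sum(c.values())
        # Counter returns 0 for labels never seen in this sample.
        table[sample] = {label: c[label] / total for label in labels}
    return table


# Toy annotations: three units in S1, one in S2.
units = [
    {"sample": "S1", "cell_type": "A"},
    {"sample": "S1", "cell_type": "A"},
    {"sample": "S1", "cell_type": "B"},
    {"sample": "S2", "cell_type": "B"},
]
proportions = replicate_proportions(units)
```

Each sample contributes a single row regardless of how many bins or cells it contains, which keeps the number of observations equal to the number of replicates.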

Reusable Article Code

scripts/stereo_design_table_qc.py: design-table validator inspired by spatialLIBD registration statistics, snRNA_NF pseudobulk DEG, Endo.R/GF-SPF condition workflows, and GraphCompass condition-level spatial metrics.
scripts/pseudobulk_count_matrix_template.py: long-table pseudobulk count aggregation template for sample/replicate/group-aware DEG handoff.

When using any bundled script, report the paper and original source file from source_code.md.
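
Long-table pseudobulk aggregation of the kind described above can be illustrated with a short standard-library sketch. It does not reproduce scripts/pseudobulk_count_matrix_template.py or its interface. The function name and the column names (sample, cell_type, gene, count) are assumptions chosen for the example.

```python
from collections import defaultdict


def pseudobulk(long_rows, sample_key="sample", group_key="cell_type",
               gene_key="gene", count_key="count"):
    """Sum counts per gene within each (sample, group) pair.

    Returns (genes, columns, matrix), where matrix[i][j] is the summed
    count of genes[i] in columns[j]. Key names are hypothetical.
    """
    sums = defaultdict(int)
    for row in long_rows:
        column = (row[sample_key], row[group_key])
        sums[(row[gene_key], column)] += int(row[count_key])

    genes = sorted({gene for gene, _ in sums})
    columns = sorted({column for _, column in sums})
    # Gene/column pairs that never appear in the input are filled with 0.
    matrix = [[sums.get((gene, column), 0) for column in columns] for gene in genes]
    return genes, columns, matrix


# Toy long table: two Gapdh records for S1 neurons are summed into one entry.
long_rows = [
    {"sample": "S1", "cell_type": "neuron", "gene": "Gapdh", "count": 3},
    {"sample": "S1", "cell_type": "neuron", "gene": "Gapdh", "count": 2},
    {"sample": "S1", "cell_type": "neuron", "gene": "Actb", "count": 1},
    {"sample": "S2", "cell_type": "neuron", "gene": "Gapdh", "count": 4},
]
genes, columns, matrix = pseudobulk(long_rows)
```

The raw counts are summed rather than averaged so that the result can be handed to count-based DEG tools, which expect integer counts per pseudobulk column.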

Output Expectations

Experimental unit, comparison groups, replicate counts, and covariates.
Explicit pseudo-replication warning if only spatial units/cells/bins are available.
Pseudobulk matrix or replicate-level summary table when applicable.
Design formula or planned statistical model, including batch/donor/section handling.
Blockers for missing replicate, condition, sample, donor, batch, count, or group columns.
Reused article code source, DOI, repository, original file name, and dataset-specific edits.
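
The blocker item above can be made concrete with a small check. This is a sketch only: which columns count as required is an assumption (here sample, condition, replicate), and the function name is hypothetical.

```python
def find_blockers(rows, required=("sample", "condition", "replicate")):
    """List blockers for required columns that are absent or contain empty values.

    `rows` is a list of dicts (one per design-table row); the default
    `required` tuple is a hypothetical choice, not a documented schema.
    """
    present = set().union(*(row.keys() for row in rows))
    blockers = []
    for name in required:
        if name not in present:
            blockers.append(f"missing column: {name}")
        elif any(row.get(name) in (None, "") for row in rows):
            blockers.append(f"empty values in column: {name}")
    return blockers


# Toy table: no replicate column, and one row lacks a condition.
table = [
    {"sample": "S1", "condition": "control", "batch": "B1"},
    {"sample": "S2", "condition": "", "batch": "B1"},
]
blockers = find_blockers(table)
```

Passing a different `required` tuple, for example one that adds donor or batch, adapts the check to designs where those columns are mandatory.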