name: stereo-seq-statistical-design
description: Use when Stereo-seq/STOmics analysis needs replicate-aware statistical design, condition comparisons, treatment/disease/control inference, pseudobulk DEG, donor/sample/batch-aware models, spatial-domain or cell-type proportion comparisons, mixed-model design checks, graph/spatial organization metrics across conditions, or warnings about pseudo-replication before claiming differential biology.
Stereo-seq Statistical Design
Use This For
- Designing condition, treatment, disease/control, time-point, donor, batch, or replicate comparisons for Stereo-seq data.
- Checking whether a proposed DEG, enrichment, cell-type proportion, domain-size, CCI, or spatial organization comparison has biological replication.
- Running pseudobulk aggregation from bin/cellbin/spatial units to sample/replicate/group-level matrices.
- Planning replicate-level statistical tests after annotation, domain discovery, spatial program scoring, CCI, or 3D reconstruction.
For ordinary marker discovery without replicate inference, use stereo-seq-spatial-programs. For sample metadata setup, use stereo-seq-project-orchestration.
Default Requirements
- Read source_code.md before writing statistical code. The preferred patterns come from spatialLIBD, Stereo-seq article code, and condition-level spatial organization tools.
- If no curated source fits, search code_candidates.tsv and stereo-seq-publication-story/references/github_code_registry.tsv before external search.
- State the experimental unit before analysis. For biological condition claims, the experimental unit is usually the donor/animal/sample/section group, not individual bins, cells, or spots.
- Do not present per-bin or per-cell FindMarkers results as replicate-aware condition inference unless the user explicitly accepts pseudo-replication.
- Prefer pseudobulk or replicate-level summaries when there are at least two biological replicates per group.
- Include batch/donor/section covariates in the design table when available, and flag full confounding (for example, every control sample processed in one batch and every treated sample in another).
- In the final response, state the reused paper, DOI, repository/source file, and dataset-specific edits.
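The replicate-count and confounding checks above can be sketched in plain Python. This is a minimal sketch that assumes the design table is already loaded as a list of dicts. The function name check_design and the columns sample, condition, and batch are hypothetical; this is not the interface of scripts/stereo_design_table_qc.py. Replicates are counted as distinct experimental units rather than as rows. The sketch also flags the case where each batch holds a single condition, because a batch covariate is then collinear with condition.

```python
from collections import defaultdict


def check_design(rows, group_col="condition", unit_col="sample",
                 batch_col="batch", min_reps=2):
    """Count biological replicates per group and flag blockers.

    rows: list of dicts (hypothetical design-table rows). Replicates are
    counted as distinct values of unit_col, never as rows, bins, or cells.
    """
    issues = []
    units = defaultdict(set)         # group -> distinct experimental units
    batch_groups = defaultdict(set)  # batch -> groups observed in that batch
    for i, row in enumerate(rows):
        missing = [c for c in (group_col, unit_col, batch_col)
                   if row.get(c) in (None, "")]
        if missing:
            issues.append(f"row {i}: missing {', '.join(missing)}")
            continue
        units[row[group_col]].add(row[unit_col])
        batch_groups[row[batch_col]].add(row[group_col])

    rep_counts = {g: len(u) for g, u in units.items()}
    for group, n in sorted(rep_counts.items()):
        if n < min_reps:
            issues.append(f"group {group!r}: {n} replicate(s) < {min_reps}; "
                          "condition tests would be pseudo-replicated")
    # If every batch contains exactly one condition, a batch covariate is
    # collinear with condition and the two effects cannot be separated.
    if len(batch_groups) > 1 and all(len(g) == 1 for g in batch_groups.values()):
        issues.append("condition is confounded with batch")
    return rep_counts, issues


design = [
    {"sample": "S1", "condition": "control", "batch": "B1"},
    {"sample": "S2", "condition": "control", "batch": "B1"},
    {"sample": "S3", "condition": "disease", "batch": "B2"},
]
counts, problems = check_design(design)
print(counts)  # {'control': 2, 'disease': 1}
for p in problems:
    print(p)
```

The example design should be reported as blocked for two reasons: the disease group has only one replicate, and batch fully tracks condition.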
Workflow
- Identify the claim:
  - gene-level DEG;
  - pathway/program score comparison;
  - cell-type/domain proportion shift;
  - spatial organization/graph metric;
  - CCI or ligand-receptor change;
  - 3D/section-level comparison.
- Validate the design table with scripts/stereo_design_table_qc.py.
- Choose the analysis unit:
  - pseudobulk by sample/replicate and cell type/domain for DEG;
  - replicate-level domain/cell-type proportions for abundance;
  - per-sample spatial graph metrics for organization;
  - section-level summary for serial sections.
- Use scripts/pseudobulk_count_matrix_template.py to aggregate a long count table when the input is not already pseudobulked.
- For R/Bioconductor DEG, adapt spatialLIBD or PseudoBulkDEG patterns from source_code.md, preserving design formulas and covariates.
- Report effect sizes, model design, replicate counts, confounding checks, and unsupported claims.
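The pseudobulk aggregation step can be sketched as follows. The sketch assumes a long table with one row per spatial unit and gene, each row carrying a raw count. The function name pseudobulk and the columns sample, cell_type, gene, and count are illustrative assumptions; this is not the interface of scripts/pseudobulk_count_matrix_template.py. Raw counts are summed rather than averaged or normalized. That way, count-based DEG tools receive a genes-by-(sample, cell type) matrix whose columns are biological replicates.

```python
from collections import defaultdict


def pseudobulk(long_rows, unit_cols=("sample", "cell_type"),
               gene_col="gene", count_col="count"):
    """Sum raw counts from bins/cells into pseudobulk columns.

    long_rows: iterable of dicts, one per (spatial unit, gene) with a raw count.
    Returns (genes, columns, matrix) where matrix[i][j] is the summed count of
    genes[i] in pseudobulk column columns[j] (a tuple of unit_cols values).
    """
    totals = defaultdict(int)
    genes, columns = set(), set()
    for row in long_rows:
        count = float(row[count_col])
        # Only raw, non-negative integer counts should be summed for count models.
        if count < 0 or not count.is_integer():
            raise ValueError(f"non-raw count {row[count_col]!r} in {row}")
        column = tuple(row[c] for c in unit_cols)
        totals[(row[gene_col], column)] += int(count)
        genes.add(row[gene_col])
        columns.add(column)
    genes, columns = sorted(genes), sorted(columns)
    # Genes absent from a pseudobulk column get an explicit zero.
    matrix = [[totals.get((g, c), 0) for c in columns] for g in genes]
    return genes, columns, matrix


rows = [
    {"bin": "b1", "sample": "S1", "cell_type": "T", "gene": "CD3E", "count": 2},
    {"bin": "b2", "sample": "S1", "cell_type": "T", "gene": "CD3E", "count": 3},
    {"bin": "b3", "sample": "S2", "cell_type": "T", "gene": "CD3E", "count": 1},
    {"bin": "b3", "sample": "S2", "cell_type": "T", "gene": "MS4A1", "count": 4},
]
genes, columns, matrix = pseudobulk(rows)
print(columns)  # [('S1', 'T'), ('S2', 'T')]
print(matrix)   # [[5, 1], [0, 4]]
```

Normalization and filtering of low-count pseudobulk columns are left to the downstream DEG package. The design formula is then written over the pseudobulk columns.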
Reusable Article Code
- scripts/stereo_design_table_qc.py: design-table validator inspired by spatialLIBD registration statistics, snRNA_NF pseudobulk DEG, Endo.R/GF-SPF condition workflows, and GraphCompass condition-level spatial metrics.
- scripts/pseudobulk_count_matrix_template.py: long-table pseudobulk count aggregation template for sample/replicate/group-aware DEG handoff.
When using any bundled script, report the paper and original source file from source_code.md.
Output Expectations
- Experimental unit, comparison groups, replicate counts, and covariates.
- Explicit pseudo-replication warning if only spatial units/cells/bins are available.
- Pseudobulk matrix or replicate-level summary table when applicable.
- Design formula or planned statistical model, including batch/donor/section handling.
- Blockers for missing replicate, condition, sample, donor, batch, count, or group columns.
- Reused article code source, DOI, repository, original file name, and dataset-specific edits.
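For abundance claims, a replicate-level summary can be sketched as follows. The sketch assumes per-bin or per-cell labels plus a sample-to-condition mapping, and all names are hypothetical. Proportions are computed within each sample first. As a result, the case-minus-control effect size is averaged over replicates, and n counts samples rather than bins. Any formal test or compositional model should then run on these per-sample values.

```python
from collections import Counter, defaultdict
from statistics import mean


def sample_proportions(labels):
    """labels: iterable of (sample, cell_type) pairs, one per bin or cell.
    Returns {sample: {cell_type: proportion within that sample}}."""
    per_sample = defaultdict(Counter)
    for sample, cell_type in labels:
        per_sample[sample][cell_type] += 1
    return {
        s: {ct: n / sum(c.values()) for ct, n in c.items()}
        for s, c in per_sample.items()
    }


def proportion_shift(props, sample_to_group, cell_type, case, control):
    """Case-minus-control difference in mean per-sample proportion.

    Each sample contributes one value, so n is the replicate count per group.
    Raises statistics.StatisticsError if a group has no samples.
    """
    by_group = defaultdict(list)
    for sample, p in props.items():
        # A sample without this cell type contributes a proportion of 0.
        by_group[sample_to_group[sample]].append(p.get(cell_type, 0.0))
    n = {g: len(v) for g, v in by_group.items()}
    return mean(by_group[case]) - mean(by_group[control]), n


labels = ([("S1", "T")] * 2 + [("S1", "B")] * 2
          + [("S2", "T")] + [("S2", "B")] * 3
          + [("S3", "T")] * 3 + [("S3", "B")]
          + [("S4", "T")] * 4)
groups = {"S1": "control", "S2": "control", "S3": "disease", "S4": "disease"}
props = sample_proportions(labels)
delta, n = proportion_shift(props, groups, "T", case="disease", control="control")
print(delta, n)  # 0.5 {'control': 2, 'disease': 2}
```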