stereo-seq-statistical-design

Use when Stereo-seq/STOmics analysis needs replicate-aware statistical design, condition comparisons, treatment/disease/control inference, pseudobulk DEG, donor/sample/batch-aware models, spatial-domain or cell-type proportion comparisons, mixed-model design checks, graph/spatial organization metrics across conditions, or warnings about pseudo-replication before claiming differential biology.

By fym0503. Updated 6/4/2026.

Stereo-seq Statistical Design

Use This For

  • Designing condition, treatment, disease/control, time-point, donor, batch, or replicate comparisons for Stereo-seq data.
  • Checking whether a proposed DEG, enrichment, cell-type proportion, domain-size, CCI, or spatial organization comparison has biological replication.
  • Aggregating bin/cellbin/spatial-unit counts into sample/replicate/group-level pseudobulk matrices.
  • Planning replicate-level statistical tests after annotation, domain discovery, spatial program scoring, CCI, or 3D reconstruction.

For ordinary marker discovery without replicate inference, use stereo-seq-spatial-programs. For sample metadata setup, use stereo-seq-project-orchestration.

Default Requirements

  • Read source_code.md before writing statistical code. The preferred patterns come from spatialLIBD, Stereo-seq article code, and condition-level spatial organization tools.
  • If no curated source fits, search code_candidates.tsv and stereo-seq-publication-story/references/github_code_registry.tsv before external search.
  • State the experimental unit before analysis. For biological condition claims, the experimental unit is usually donor/animal/sample/section group, not individual bins, cells, or spots.
  • Do not present per-bin or per-cell FindMarkers results as replicate-aware condition inference. If the user explicitly accepts pseudo-replication, such results may be reported, but label them as pseudo-replicated.
  • Prefer pseudobulk or replicate-level summaries when there are at least two biological replicates per group.
  • Include batch/donor/section covariates in the design table when available, and flag full confounding (for example, every control sample processed in one batch and every treated sample in another).
  • In the final response, state the reused paper, DOI, repository/source file, and dataset-specific edits.
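The sketch below illustrates the experimental-unit and confounding checks above. It assumes the design information is a list of dicts with hypothetical keys "sample", "condition", and "batch". It is not the bundled scripts/stereo_design_table_qc.py. It shows two rules only: replicates are counted as distinct samples rather than rows, and batch levels that never overlap across conditions are treated as full confounding.

```python
from collections import defaultdict


def check_design(rows, unit="sample", group="condition", covariates=("batch",), min_reps=2):
    """Count biological replicates per group and flag fully confounded covariates.

    `rows` is an iterable of dicts (one per bin, cell, or sample). Replicates are
    counted as distinct `unit` values, never as rows, so many bins from one
    sample still count as a single replicate. Key names are hypothetical.
    """
    units = defaultdict(set)                             # group -> unit IDs
    levels = {c: defaultdict(set) for c in covariates}   # covariate -> group -> levels
    for row in rows:
        g = row[group]
        units[g].add(row[unit])
        for c in covariates:
            if row.get(c) is not None:
                levels[c][g].add(row[c])

    counts = {g: len(u) for g, u in units.items()}
    warnings = [
        f"{group}={g!r}: {n} {unit}(s); below {min_reps}, so a condition test would be pseudo-replicated"
        for g, n in sorted(counts.items())
        if n < min_reps
    ]
    for c, by_group in levels.items():
        sets = list(by_group.values())
        if len(counts) > 1 and len(sets) == len(counts):
            # Disjoint level sets across groups mean the covariate alone predicts the group.
            if len(set().union(*sets)) == sum(len(s) for s in sets):
                warnings.append(f"{c} is fully confounded with {group}")
    return {"replicate_counts": counts, "warnings": warnings}
```

Donor is deliberately left out of the default covariates. In a between-donor design each donor belongs to exactly one condition, so this check would report donor as confounded. In that design donor is the replicate unit (or a random effect), not a fixed covariate.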

Workflow

  1. Identify the claim:
    • gene-level DEG;
    • pathway/program score comparison;
    • cell-type/domain proportion shift;
    • spatial organization/graph metric;
    • CCI or ligand-receptor change;
    • 3D/section-level comparison.
  2. Validate the design table with scripts/stereo_design_table_qc.py.
  3. Choose the analysis unit:
    • pseudobulk by sample/replicate and cell type/domain for DEG;
    • replicate-level domain/cell-type proportions for abundance;
    • per-sample spatial graph metrics for organization;
    • section-level summary for serial sections.
  4. Use scripts/pseudobulk_count_matrix_template.py to aggregate a long count table when the input is not already pseudobulked.
  5. For R/Bioconductor DEG, adapt spatialLIBD or PseudoBulkDEG patterns from source_code.md, preserving design formulas and covariates.
  6. Report effect sizes, model design, replicate counts, and confounding checks, and flag unsupported claims.
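The abundance branch of step 3 can be sketched without external libraries as a replicate-level comparison. Proportions are computed per sample first, then compared across samples with an exact two-sided permutation test on the difference in mean proportion. The permutation test is a swapped-in illustration, not the method used in the curated sources. The keys "sample" and "cell_type" are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def sample_proportions(cells, label, label_key="cell_type", unit="sample"):
    """Fraction of spatial units (bins or cells) carrying `label`, one value per sample."""
    totals, hits = Counter(), Counter()
    for c in cells:
        totals[c[unit]] += 1
        hits[c[unit]] += int(c[label_key] == label)
    return {s: hits[s] / totals[s] for s in totals}


def permutation_test(props, groups, a, b):
    """Exact two-sided permutation test on mean(a) - mean(b) of per-sample values.

    Only sample labels are permuted, so n is the number of samples, not bins.
    """
    xs = [p for s, p in props.items() if groups[s] == a]
    ys = [p for s, p in props.items() if groups[s] == b]
    observed = mean(xs) - mean(ys)
    pooled = xs + ys
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), len(xs)):
        picked = set(chosen)
        pa = [pooled[i] for i in picked]
        pb = [pooled[i] for i in range(len(pooled)) if i not in picked]
        total += 1
        # Small tolerance so float rounding does not drop ties with the observed split.
        if abs(mean(pa) - mean(pb)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / total
```

With three samples per group there are only 20 ways to relabel the samples. The smallest attainable two-sided p-value is therefore 2/20 = 0.1, however many bins each sample contains. This is the replicate count limiting inference, and per-bin tests hide that limit.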

Reusable Article Code

  • scripts/stereo_design_table_qc.py: design-table validator inspired by spatialLIBD registration statistics, snRNA_NF pseudobulk DEG, Endo.R/GF-SPF condition workflows, and GraphCompass condition-level spatial metrics.
  • scripts/pseudobulk_count_matrix_template.py: long-table pseudobulk count aggregation template for sample/replicate/group-aware DEG handoff.
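The next block is a hedged sketch of long-table pseudobulk aggregation. It follows the spirit of the template described above but is not its actual code: raw counts are summed per gene for each (sample, cell type) pair. The column names "sample", "cell_type", "gene", and "count" are assumptions about the long table.

```python
from collections import defaultdict


def pseudobulk(long_rows, unit_keys=("sample", "cell_type"), gene_key="gene", count_key="count"):
    """Sum raw counts from a long table into one column per (sample, cell type).

    Returns (genes, columns, matrix) where matrix[i][j] is the summed count of
    genes[i] in columns[j]. Column names are hypothetical.
    """
    sums = defaultdict(int)
    genes, cols = set(), set()
    for r in long_rows:
        col = tuple(r[k] for k in unit_keys)
        sums[(r[gene_key], col)] += int(r[count_key])  # raw integer counts only
        genes.add(r[gene_key])
        cols.add(col)
    genes, cols = sorted(genes), sorted(cols)
    # Gene/column pairs never observed in the long table are zero.
    matrix = [[sums.get((g, c), 0) for c in cols] for g in genes]
    return genes, cols, matrix
```

Summing rather than averaging keeps integer counts, which is what count-based pseudobulk DEG models expect. Each output column then needs one row of sample-level metadata (condition, batch, donor) for the design formula.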

When using any bundled script, report the paper and original source file from source_code.md.

Output Expectations

  • Experimental unit, comparison groups, replicate counts, and covariates.
  • Explicit pseudo-replication warning if only spatial units/cells/bins are available.
  • Pseudobulk matrix or replicate-level summary table when applicable.
  • Design formula or planned statistical model, including batch/donor/section handling.
  • Blockers for missing replicate, condition, sample, donor, batch, count, or group columns.
  • Reused article code source, DOI, repository, original file name, and dataset-specific edits.
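One way to check that a planned design formula is estimable is to build the treatment-coded fixed-effect design matrix from sample-level metadata and compare its rank to its column count. A rank deficit means some term is fully confounded. The number of samples minus the column count gives the residual degrees of freedom. This stdlib sketch uses hypothetical function names and keys; in practice R's model.matrix() or an equivalent would usually do the encoding.

```python
from fractions import Fraction


def design_matrix(samples, terms):
    """Treatment-coded design: intercept plus one 0/1 column per non-reference level.

    `samples` holds one dict per biological replicate; the first level of each
    term in sorted order is the reference.
    """
    cols = [("(Intercept)", None, None)]
    for t in terms:
        levels = sorted({s[t] for s in samples})
        cols += [(f"{t}{lv}", t, lv) for lv in levels[1:]]
    rows = [[1 if t is None else int(s[t] == lv) for _, t, lv in cols] for s in samples]
    return [name for name, _, _ in cols], rows


def rank(matrix):
    """Exact matrix rank by Gaussian elimination over fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier columns
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def check_formula(samples, covariates, group="condition"):
    """Return (R-style formula, full_rank, residual_df) for covariates + group."""
    terms = list(covariates) + [group]
    names, X = design_matrix(samples, terms)
    full_rank = rank(X) == len(names)
    return "~ " + " + ".join(terms), full_rank, len(samples) - len(names)
```

Adding a donor term that is nested in condition would make this fixed-effect matrix rank-deficient. That result is expected, and it is why donor is usually handled as the replicate unit or as a random effect in a mixed model.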

Install via CLI
npx skills add https://github.com/fym0503/stereo-seq-skills --skill stereo-seq-statistical-design

Repository Details
Branch: main
Path: SKILL.md