name: stereo-seq-quality-control-preprocessing
description: Use when Stereo-seq or STOmics data needs QC, preprocessing, raw GEM/GEF/SAW loading, raw-to-count-matrix export, binning, bin/cell filtering, mitochondrial/count/gene QC maps, StereoPy output handling, GEM-to-Seurat conversion, or export of cleaned h5ad/RDS objects before downstream analysis.
Stereo-seq Quality Control Preprocessing
Use This For
- Loading Stereo-seq GEM/GEF/SAW/StereoPy outputs and creating binned or filtered analysis objects.
- Converting raw Stereo-seq GEM rows into a sparse gene-by-bin/cell count matrix plus coordinate metadata before downstream tools.
- Plotting QC metrics in tissue coordinates before domain, mapping, CCI, trajectory, or GRN analysis.
- Converting GEM-style count coordinates into Seurat-compatible spatial objects or tabular QC summaries.
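The GEM-to-matrix conversion listed above can be sketched with the standard library alone. This is a minimal illustration, not the bundled template: it assumes a tab-separated GEM with columns named `geneID`, `x`, `y`, and `MIDCount` and optional `#` comment lines before the header, and it uses a hypothetical `x_y` bin-origin string as the spatial-unit ID. Check the actual header of your file before reusing it.

```python
import csv
import io
from collections import defaultdict

def bin_gem_rows(handle, bin_size=50):
    """Aggregate GEM rows into {(bin_id, gene): count} plus bin coordinates."""
    # Skip '#' comment lines that may precede the header (assumed layout).
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    counts = defaultdict(int)
    coords = {}
    for row in reader:
        # Snap each DNB coordinate to the origin corner of its bin.
        bx = int(row["x"]) // bin_size * bin_size
        by = int(row["y"]) // bin_size * bin_size
        bin_id = f"{bx}_{by}"  # hypothetical ID convention: "<bin x>_<bin y>"
        counts[(bin_id, row["geneID"])] += int(row["MIDCount"])
        coords[bin_id] = (bx, by)
    return dict(counts), coords

# Tiny in-memory GEM used only for illustration.
demo = io.StringIO(
    "#FileFormat=GEMv0.1\n"
    "geneID\tx\ty\tMIDCount\n"
    "Actb\t10\t12\t2\n"
    "Actb\t40\t49\t1\n"
    "mt-Co1\t55\t3\t4\n"
)
counts, coords = bin_gem_rows(demo, bin_size=50)
print(counts)
print(coords)
```

The sparse triplet dictionary keeps raw counts untouched, so the same structure can feed both a Matrix Market export and per-unit QC summaries. For full-size GEM files a pandas/scipy implementation would likely be faster.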
Default Requirements
- Use bundled article-derived scripts in `scripts/` before writing preprocessing code or searching external repositories.
- Read `source_code.md` and match templates by input object, spatial unit, and preprocessing goal. Do not hard-code tissue-to-tool rules.
- If no curated preprocessing entry fits, search code_candidates.tsv for additional article-linked repositories and reusable files before external search.
- Inspect local Python/R environments before running. Prefer `conda run -n stereo-skills-py python ...` for general Python scripts and `conda run -n stereo-skills-r Rscript ...` for R scripts. For StereoPy/GEF scripts that import `stereo`, use `envs/environment-python-stereopy.yml` if `stereo` is not already available. If `stereo`, `Seurat`, `Matrix`, or another required package is missing, stop and tell the user exactly what is missing and which preprocessing step is blocked.
- Preserve raw count outputs and record filtering thresholds.
- When converting raw GEM to a count matrix, export matrix, feature, barcode, coordinate, and provenance files; do not only create a plotting object.
- QC plots should use Arial, readable labels, equal-aspect tissue maps, and legends outside the data where possible.
- In the final response, state the reused paper, DOI, code repository or code DOI, original file, and dataset-specific edits.
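The environment inspection required above can be sketched as a pre-flight check that reports which step is blocked and by what. The step labels and the `requirements` mapping are hypothetical examples. Only the `stereo` import name comes from this document, and R packages such as `Seurat` and `Matrix` would need a separate check through `Rscript`.

```python
import importlib.util
import shutil

def missing_python_modules(modules):
    """Return the top-level module names that this interpreter cannot locate."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

def blocked_steps(requirements):
    """Map each preprocessing step to the modules it lacks (empty dict if none)."""
    blocked = {}
    for step, modules in requirements.items():
        missing = missing_python_modules(modules)
        if missing:
            blocked[step] = missing
    return blocked

def missing_executables(names):
    """Return the executables that are not on PATH (for example Rscript or conda)."""
    return [n for n in names if shutil.which(n) is None]

# Hypothetical step-to-module mapping; adapt it to the script being run.
requirements = {
    "GEF loading with StereoPy": ["stereo"],
    "provenance export": ["json"],
}
for step, missing in blocked_steps(requirements).items():
    print(f"BLOCKED: {step}; missing Python module(s): {', '.join(missing)}")
for exe in missing_executables(["conda", "Rscript"]):
    print(f"NOTE: executable not found on PATH: {exe}")
```

Running this inside the target environment (for instance through `conda run -n stereo-skills-py python ...`) checks the interpreter that will actually execute the template, not the shell's default one.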
Workflow
- Identify the raw input type: GEF, GEM, h5ad, Seurat RDS, or SAW output folder.
- Define the spatial unit and resolution: DNB, bin, pseudo-spot, or segmented cellbin.
- For raw GEM-to-count-matrix work, run `scripts/raw_gem_to_count_matrix_template.py` first unless StereoPy/SAW output is required by the user.
- Read `source_code.md`, then adapt the closest local script for GEF, Seurat, h5ad, or cellbin workflows.
- Export cleaned object plus QC tables and figures.
- Report thresholds, discarded units/genes if calculated, matrix dimensions, spatial-unit ID convention, and provenance.
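The QC, filtering, and reporting steps of the workflow can be sketched as follows. The thresholds (`min_counts`, `min_genes`) and the `mt-` mitochondrial prefix are illustrative assumptions, not the defaults of any bundled template. The input is a hypothetical `{(unit_id, gene): count}` dictionary of raw counts.

```python
from collections import defaultdict

def qc_and_filter(counts, min_counts=1, min_genes=1, mito_prefix="mt-"):
    """Compute per-unit QC metrics and select units that pass the thresholds."""
    total = defaultdict(int)
    genes = defaultdict(set)
    mito = defaultdict(int)
    for (unit, gene), n in counts.items():
        total[unit] += n
        genes[unit].add(gene)
        # Case-insensitive prefix match covers both "mt-" and "MT-" gene names.
        if gene.lower().startswith(mito_prefix.lower()):
            mito[unit] += n
    qc = {
        unit: {
            "total_counts": total[unit],
            "n_genes": len(genes[unit]),
            "mito_fraction": mito[unit] / total[unit] if total[unit] else 0.0,
        }
        for unit in total
    }
    kept = sorted(
        u for u, m in qc.items()
        if m["total_counts"] >= min_counts and m["n_genes"] >= min_genes
    )
    report = {
        "thresholds": {"min_counts": min_counts, "min_genes": min_genes},
        "n_units_before": len(qc),
        "n_units_after": len(kept),
        "n_units_discarded": len(qc) - len(kept),
    }
    return qc, kept, report

# Toy raw counts keyed by (spatial unit, gene).
raw = {
    ("0_0", "Actb"): 3,
    ("0_0", "mt-Co1"): 1,
    ("50_0", "mt-Co1"): 4,
    ("100_0", "Gapdh"): 1,
}
qc, kept, report = qc_and_filter(raw, min_counts=2, min_genes=1)
print(report)
```

The function returns the kept unit IDs instead of mutating `raw`, so the raw counts stay available alongside the filtered view and the report records exactly which thresholds were applied.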
Reusable Article Code
- `scripts/endo_stereopy_qc_preprocessing_template.py`: adapted from Endo.R StereoPy binning/QC for GEF-to-h5ad preprocessing.
- `scripts/raw_gem_to_count_matrix_template.py`: raw GEM-to-sparse-matrix/count-table exporter for gene-by-spatial-unit matrices, coordinates, QC metrics, and provenance.
- `scripts/gf_gem_to_seurat_qc_template.R`: adapted from GF/SPF cecum GEM-to-Seurat and spatial QC plotting.
- `scripts/h5ad_spatial_qc_overview_template.py`: adapted from Endo.R, human cortex, and GF/SPF cecum QC/stat plotting patterns for h5ad count/feature/spatial overview figures.
Output Expectations
- Cleaned h5ad/RDS or binned count object.
- Raw count matrix outputs when requested: Matrix Market `.mtx`, `features.tsv`, `barcodes.tsv`, `coordinates.tsv`, `qc.tsv`, and provenance JSON.
- QC tables for counts, detected genes, optional mitochondrial fraction, and coordinate bounds.
- Spatial QC maps and count/gene distributions.
- Filtering thresholds and whether they came from the template default or dataset-specific adaptation.
- Reused article code source, paper DOI, repository or code DOI, original file name, and dataset-specific edits.
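The raw count matrix outputs listed above might be written along these lines. This is a sketch under stated assumptions: the file names `matrix.mtx` and `provenance.json`, the `unit_id` column header, and the provenance fields are hypothetical, and `qc.tsv` is omitted for brevity. The Matrix Market layout (header line, `rows cols nnz` size line, 1-based `row col value` entries) follows the standard coordinate format.

```python
import json
import os
import tempfile

def export_count_matrix(counts, coords, out_dir, provenance):
    """Write a gene-by-unit matrix and its sidecar files to out_dir."""
    features = sorted({gene for _, gene in counts})
    barcodes = sorted({unit for unit, _ in counts})
    row = {g: i + 1 for i, g in enumerate(features)}  # Matrix Market is 1-based
    col = {u: j + 1 for j, u in enumerate(barcodes)}
    os.makedirs(out_dir, exist_ok=True)

    with open(os.path.join(out_dir, "matrix.mtx"), "w") as fh:
        fh.write("%%MatrixMarket matrix coordinate integer general\n")
        fh.write(f"{len(features)} {len(barcodes)} {len(counts)}\n")
        # Sort entries by column, then row, for a deterministic file.
        ordered = sorted(counts.items(), key=lambda kv: (col[kv[0][0]], row[kv[0][1]]))
        for (unit, gene), n in ordered:
            fh.write(f"{row[gene]} {col[unit]} {n}\n")

    with open(os.path.join(out_dir, "features.tsv"), "w") as fh:
        fh.writelines(g + "\n" for g in features)
    with open(os.path.join(out_dir, "barcodes.tsv"), "w") as fh:
        fh.writelines(u + "\n" for u in barcodes)
    with open(os.path.join(out_dir, "coordinates.tsv"), "w") as fh:
        fh.write("unit_id\tx\ty\n")
        for u in barcodes:
            fh.write(f"{u}\t{coords[u][0]}\t{coords[u][1]}\n")

    # Record matrix dimensions next to the caller-supplied provenance fields.
    record = dict(provenance, n_features=len(features), n_units=len(barcodes))
    with open(os.path.join(out_dir, "provenance.json"), "w") as fh:
        json.dump(record, fh, indent=2)
    return record

out_dir = tempfile.mkdtemp()
record = export_count_matrix(
    {("0_0", "Actb"): 3, ("50_0", "mt-Co1"): 4},
    {"0_0": (0, 0), "50_0": (50, 0)},
    out_dir,
    {"input": "example.gem", "bin_size": 50, "thresholds": {"min_counts": 2}},
)
print(sorted(os.listdir(out_dir)))
```

Writing features and barcodes in the same sorted order used for the matrix indices keeps the three files aligned, which is what downstream readers of this layout typically expect.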