name: genomics-qc
description: Load when running pre-alignment FASTQ quality control — Phred quality scores, Q20/Q30 rates, GC / N content, read-length distribution, adapter-contamination detection. Skip when working with already-aligned BAMs (use genomics-alignment) or when peak / variant files are the input (use the relevant downstream skill).
version: 0.5.0
author: OmicsClaw
license: MIT
tags:
- genomics
- qc
- fastq
- phred
- adapter
- fastqc requires:
- pandas
- numpy
genomics-qc
When to use
The user has a raw FASTQ file (.fastq or .fastq.gz) and wants
standard pre-alignment QC: total reads, mean Phred quality, Q20 /
Q30 rates, GC / N content, mean read length, adapter contamination
percentage, per-base quality profile. This skill mirrors a subset
of FastQC / fastp metrics in pure Python.
It does NOT trim adapters or filter reads — it only measures.
For BAM-level alignment QC use genomics-alignment.
Inputs & Outputs
| Input | Format | Required |
|---|---|---|
| Raw reads | .fastq / .fastq.gz (Phred+33) |
yes (unless --demo) |
| Output | Path | Notes |
|---|---|---|
| QC summary | tables/qc_metrics.csv |
one-row metrics table |
| Per-base quality | tables/per_base_quality.csv |
mean Phred per cycle |
| Read-length distribution | tables/read_length_distribution.csv |
length / count, top-20 lengths only (genomics_qc.py:287) |
| Reproducibility | reproducibility/ |
params + lineage |
| Report | report.md + result.json |
always |
Flow
- Load FASTQ (
--input <reads.fastq[.gz]>) or synthesise demo reads atoutput_dir/demo_reads.fastq(genomics_qc.py:170). - Stream up to
--max-readsrecords (default 500_000); aggregate Phred / GC / N / length stats. - Detect adapter contamination via fixed adapter motif scan.
- Write
tables/qc_metrics.csv(genomics_qc.py:272) +tables/per_base_quality.csv(genomics_qc.py:279) +report.md+result.json.
Gotchas
--max-readsdefaults to 500 000 (genomics_qc.py:247). For very deep libraries this is a hard cap — increase it for full-flowcell QC. Reads beyond the cap are silently ignored.- Empty FASTQ raises
ValueError("No reads found in {fastq_path}")atgenomics_qc.py:138. A truncated upload manifests as exit-1; check the file size first. --inputREQUIRED unless--demo.genomics_qc.py:258raisesValueError("--input required when not using --demo"); non-existent paths raiseFileNotFoundErrorat:261.- No trimming or filtering happens here. This is a pure measurement skill — to actually trim adapters or quality-filter, run fastp / Trimmomatic outside OmicsClaw before re-running this for post-trim QC.
- Phred encoding is assumed Phred+33. Old Solexa / Illumina 1.3+ Phred+64 files would mis-score; the script does NOT auto-detect encoding.
- Demo writes a synthetic FASTQ into
output_dir.genomics_qc.py:170writesdemo_reads.fastqdirectly into the user-supplied output directory — re-running--demooverwrites silently.
Key CLI
# Demo (synthetic FASTQ)
python omicsclaw.py run genomics-qc --demo --output /tmp/qc_demo
# Real FASTQ
python omicsclaw.py run genomics-qc \
--input reads.fastq.gz --output results/ \
--max-reads 1000000
See also
references/parameters.md— every CLI flagreferences/methodology.md— Phred / Q20 / Q30 definitions, adapter motifsreferences/output_contract.md—tables/qc_metrics.csv+ per-base schema- Adjacent skills:
genomics-alignment(downstream — alignment QC after mapping),bulkrna-read-qc(parallel — RNA-seq-flavoured FASTQ QC),sc-qc(parallel — single-cell QC after mapping)