name: genomics-sv-detection
description: Load when summarising structural variants from an SV VCF (DEL / DUP / INV / TRA) — BND-notation parsing, size classification, per-type counts. Skip when working with small SNVs / indels (use genomics-variant-calling) or calling SVs from BAM (run Manta / Delly / Sniffles first).
version: 0.5.0
author: OmicsClaw
license: MIT
tags:
- genomics
- structural-variants
- sv
- manta
- delly
- sniffles
- bnd
requires:
- pandas
- numpy
genomics-sv-detection
When to use
The user has an SV VCF (from Manta, Delly, Lumpy, Sniffles, etc.) and wants per-type counts (DEL / DUP / INV / TRA / INS) and size classification (small 50 bp–1 kb / medium 1 kb–100 kb / large 100 kb–10 Mb / very-large > 10 Mb). BND breakend records are not resolved into TRA calls: records without `INFO/SVTYPE` are counted as `UNKNOWN`.
The script does NOT call SVs from a BAM. Run an external SV caller first; this skill summarises its VCF output.
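The size classes listed above can be sketched as a small lookup. This is a minimal illustration, not the code in `sv_detection.py`: the function name `size_class`, the half-open intervals (so exactly 10 Mb counts as very-large), and the `below-threshold` label for lengths under 50 bp are all assumptions, since the source only states the class ranges.

```python
# Hypothetical helper: names, boundary inclusivity, and the "below-threshold"
# label are assumptions; only the class ranges come from the skill description.
SIZE_CLASSES = [
    (50, 1_000, "small"),            # 50 bp up to 1 kb
    (1_000, 100_000, "medium"),      # 1 kb up to 100 kb
    (100_000, 10_000_000, "large"),  # 100 kb up to 10 Mb
]


def size_class(svlen: int) -> str:
    """Return the size class for an SV length, ignoring its sign."""
    length = abs(svlen)
    if length < 50:
        return "below-threshold"  # assumption: not defined in the source
    for lower, upper, label in SIZE_CLASSES:
        if lower <= length < upper:
            return label
    return "very-large"


print(size_class(-1234))       # a 1234-bp deletion with a signed SVLEN
print(size_class(20_000_000))  # a 20 Mb event
```

Taking the absolute value first mirrors the skill's behaviour of classifying on `abs(SVLEN)`, so deletions with negative lengths land in the same bins as duplications of equal size.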
Inputs & Outputs
| Input | Format | Required |
|---|---|---|
| Structural variants | `.vcf` (SV-flavoured: `SVTYPE` in `INFO` and/or BND `ALT` rows) | yes (unless `--demo`) |

| Output | Path | Notes |
|---|---|---|
| SV table | `tables/structural_variants.csv` | per-SV CHROM/POS/SVTYPE/SVLEN/size_class |
| Report | `report.md` + `result.json` | always |
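As a rough illustration of consuming the SV table downstream, the sketch below tallies rows by type and size class using only the standard library. It assumes the CSV header uses exactly the column names listed above (`SVTYPE`, `size_class`); the authoritative schema is in `references/output_contract.md`. The function name and the inline rows are made up for the example.

```python
import csv
import io
from collections import Counter


def count_by_type_and_size(handle):
    """Count rows per (SVTYPE, size_class) pair from an open CSV handle."""
    reader = csv.DictReader(handle)
    return Counter((row["SVTYPE"], row["size_class"]) for row in reader)


# Illustrative rows only; a real run would open tables/structural_variants.csv.
example = io.StringIO(
    "CHROM,POS,SVTYPE,SVLEN,size_class\n"
    "chr1,10000,DEL,1234,medium\n"
    "chr1,50000,DEL,300,small\n"
    "chr2,70000,DUP,2000,medium\n"
)
counts = count_by_type_and_size(example)
for (svtype, size), n in sorted(counts.items()):
    print(svtype, size, n)
```

For a real output directory, replace the `io.StringIO` object with `open("results/tables/structural_variants.csv", newline="")`, adjusting the path to the `--output` value used.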
Flow
- Load VCF (`--input <sv.vcf>`) or generate a demo SV VCF at `output_dir/demo_structural_variants.vcf` with `--n-svs` records (`sv_detection.py:170`).
- Parse records; read `INFO/SVTYPE` (`sv_detection.py:103`). Records without `INFO/SVTYPE` (e.g. pure BND `ALT` notation from Manta) classify as `UNKNOWN`; there is NO BND-to-TRA resolution.
- Compute `abs(SVLEN)` for size classification (`sv_detection.py:105`); bin into size classes; aggregate per-type counts.
- Write `tables/structural_variants.csv` (`sv_detection.py:343`) + `report.md` + `result.json` (`:346`).
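The parse-and-count steps above can be sketched roughly as follows. This is an approximation under stated assumptions, not the contents of `sv_detection.py`: the helper names are hypothetical, `END` is assumed to fall back to `POS` when absent, and multi-valued `SVLEN` entries are not handled. The `UNKNOWN` default and the `abs(int(info.get("SVLEN", end - pos)))` expression follow the behaviour described in this document.

```python
from collections import Counter


def parse_info(info_field):
    """Split a VCF INFO string into a dict; flag entries map to an empty string."""
    info = {}
    for entry in info_field.split(";"):
        if entry and entry != ".":
            key, _, value = entry.partition("=")
            info[key] = value
    return info


def parse_sv_record(line):
    """Extract CHROM, POS, SVTYPE and absolute SVLEN from one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    info = parse_info(fields[7])
    svtype = info.get("SVTYPE", "UNKNOWN")  # no SVTYPE means UNKNOWN
    end = int(info.get("END", pos))         # assumption: END defaults to POS
    svlen = abs(int(info.get("SVLEN", end - pos)))
    return {"CHROM": chrom, "POS": pos, "SVTYPE": svtype, "SVLEN": svlen}


# Made-up records for illustration; the last one has BND ALT notation only.
demo_lines = [
    "chr1\t10000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=11234;SVLEN=-1234",
    "chr1\t50000\tsv2\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=52000",
    "chr2\t5000\tbnd1\tG\tG]chr7:90000]\t.\tPASS\tMATEID=bnd2",
]
records = [parse_sv_record(l) for l in demo_lines if not l.startswith("#")]
type_counts = Counter(r["SVTYPE"] for r in records)
print(type_counts)
```

The third record shows why BND-only rows end up as `UNKNOWN` with a length of 0: nothing in this flow inspects the `ALT` column.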
Gotchas
- No SV caller is invoked. This skill ingests an SV VCF; it does NOT run Manta / Delly / Lumpy / Sniffles. To CALL SVs, run an external pipeline first.
- `--input` REQUIRED unless `--demo`. `sv_detection.py:330` raises `ValueError("--input required when not using --demo")`; non-existent paths raise `FileNotFoundError` at `:333`.
- `--n-svs` only affects `--demo` (`sv_detection.py:319`, default 100). Silently ignored when `--input` is set.
- Pure BND records without `INFO/SVTYPE` classify as `UNKNOWN`. `sv_detection.py:103` reads only `INFO/SVTYPE`; there is no BND `ALT`-notation parser and no `MATEID` pairing logic. Manta callsets that emit translocations as paired BND records (without an `SVTYPE=TRA` INFO field) will appear as UNKNOWN, not TRA. Pre-process with `bcftools view -i 'INFO/SVTYPE!=""'` or with a Manta-specific BND→TRA resolver upstream.
- `SVLEN` is stored as absolute value in the CSV. `sv_detection.py:105` writes `abs(int(info.get("SVLEN", end - pos)))`, so a 1234-bp deletion becomes `1234` in the CSV regardless of the input sign. The original signed `SVLEN` is NOT preserved.
- Demo VCF mixes DEL / DUP / INV / TRA at fixed proportions. Useful for orchestrator smoke tests; not biologically meaningful.
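For the upstream BND→TRA step mentioned in the gotchas, one possible starting point is to parse the breakend `ALT` string and compare the mate contig with the record's own `CHROM`. The sketch below is not part of this skill and is not a Manta-specific resolver: `resolve_bnd` and its labels are hypothetical, it handles only the four bracketed breakend forms from the VCF specification, and it does no `MATEID` pairing or de-duplication of mate records.

```python
import re

# Matches the bracketed breakend forms: t[p[, t]p], ]p]t, [p[t
BND_ALT = re.compile(
    r"^[ACGTNacgtn.]*(?P<bracket>[\[\]])"
    r"(?P<chrom>.+):(?P<pos>\d+)"
    r"(?P=bracket)[ACGTNacgtn.]*$"
)


def resolve_bnd(chrom, alt):
    """Return (label, mate_chrom, mate_pos) for a BND ALT, or None otherwise.

    Assumption: a mate on a different contig is labelled TRA; a mate on the
    same contig is left as BND because its type cannot be inferred here.
    """
    match = BND_ALT.match(alt)
    if match is None:
        return None
    mate_chrom = match.group("chrom")
    mate_pos = int(match.group("pos"))
    label = "TRA" if mate_chrom != chrom else "BND"
    return label, mate_chrom, mate_pos


print(resolve_bnd("chr2", "G]chr7:90000]"))  # mate on another contig
print(resolve_bnd("chr2", "[chr2:6000[A"))   # mate on the same contig
print(resolve_bnd("chr1", "<DEL>"))          # symbolic allele, not a breakend
```

A resolver built on this idea would still need to write `SVTYPE=TRA` into `INFO` before the VCF reaches this skill, because only `INFO/SVTYPE` is read.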
Key CLI
```bash
# Demo (100 synthetic SVs)
python omicsclaw.py run genomics-sv-detection --demo --output /tmp/sv_demo

# Custom demo size
python omicsclaw.py run genomics-sv-detection --demo --n-svs 500 \
  --output /tmp/sv_demo_large

# Real SV VCF
python omicsclaw.py run genomics-sv-detection \
  --input manta_diploid.vcf --output results/
```
See also
- `references/parameters.md`: every CLI flag
- `references/methodology.md`: SVTYPE / BND semantics, size-class boundaries
- `references/output_contract.md`: `tables/structural_variants.csv` schema
- Adjacent skills: `genomics-alignment` (upstream; provides BAMs for SV callers), `genomics-variant-calling` (parallel; small SNVs / indels), `genomics-cnv-calling` (parallel; copy-number from depth, complementary to SV callers), `genomics-variant-annotation` (downstream; functional impact of breakpoints)