ngs-amplicon-microbiome


Kick off public 16S, 18S, ITS, COI, or other marker-gene amplicon microbiome workflows using nf-core/ampliseq, QIIME2, DADA2, and Cutadapt.

By openai. Updated 6/3/2026

name: ngs-amplicon-microbiome
description: Kick off public 16S, 18S, ITS, COI, or other marker-gene amplicon microbiome workflows using nf-core/ampliseq, QIIME2, DADA2, and Cutadapt.

Amplicon Microbiome

Use this skill for marker-gene microbiome analysis from amplicon FASTQs.

Essential Inputs

Confirm:

  • marker region: 16S, 18S, ITS, COI, or custom
  • primer sequences and orientation
  • paired-end or single-end reads
  • whether reads should be merged
  • taxonomy database and version
  • sample metadata
  • endpoint: ASV table, taxonomy, diversity, differential abundance, or plots

Public Defaults

Prefer nf-core/ampliseq for reproducible end-to-end runs. Use QIIME2 or DADA2 directly when the user wants notebook-level control or an existing lab protocol requires it.

Preflight

python plugins/ngs-analysis/scripts/ngs_preflight.py --pipeline amplicon_microbiome --emit-install-plan

Local Execution Package

For FASTQ intake/QC before primer, ASV, and taxonomy decisions, use:

python plugins/ngs-analysis/scripts/run_fastq_assay_package.py \
  --lane amplicon_microbiome \
  --sample-sheet amplicon_samples.tsv \
  --execute

This validates read paths and structure, runs seqkit stats and FastQC/MultiQC when available, and writes amplicon_analysis_status.json. The runner also emits methods/amplicon_methods.json and a backend handoff bundle under workflow/, so primer, denoiser, truncation, normalization, and taxonomy choices are machine-readable before any full backend is run.

If the user asks for a full amplicon analysis rather than QC/readiness, do not treat FASTQs alone as sufficient. Require primer sequences, primer orientation, taxonomy database plus version, and sample metadata before presenting the run as analysis-ready. Without that context, run the local execution package and describe the result as a read-QC/readiness bundle only.
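The gating rule above can be sketched as a small check. This is an illustration only, not the plugin's implementation: the function name `classify_request`, the dictionary keys, and the two labels are hypothetical names chosen for this example.

```python
# Hypothetical sketch of the "analysis-ready vs. read-QC only" rule.
# Key names and labels are assumptions made for illustration.
REQUIRED_FOR_ANALYSIS = (
    "primer_forward",
    "primer_reverse",
    "primer_orientation",
    "taxonomy_database",
    "taxonomy_version",
    "sample_metadata",
)


def classify_request(context: dict) -> dict:
    """Return a label plus the list of missing inputs for a run request."""
    missing = [key for key in REQUIRED_FOR_ANALYSIS if not context.get(key)]
    label = "analysis-ready" if not missing else "read-qc-readiness-only"
    return {"label": label, "missing": missing}


if __name__ == "__main__":
    # FASTQs alone: everything on the required list is missing.
    print(classify_request({"fastq_dir": "reads/"}))
```

Returning the missing keys alongside the label makes it easy to tell the user exactly what to supply before the run can be described as a full analysis.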

For backend ASV/taxonomy/diversity execution when primers, metadata, and taxonomy resources are available, use:

python plugins/ngs-analysis/scripts/run_amplicon_microbiome.py \
  --sample-sheet amplicon_samples.tsv \
  --backend qiime2 \
  --primer-forward GTGYCAGCMGCCGCGGTAA \
  --primer-reverse GGACTACNVGGGTWTCTAAT \
  --taxonomy-classifier silva-138-classifier.qza \
  --metadata sample_metadata.tsv \
  --execute
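Because primer orientation has to be confirmed before a run, it can help to check the degenerate primers against a few reads by hand. The sketch below is a standalone helper, not part of the plugin; `reverse_complement` and `primer_regex` are hypothetical names, and the mapping is the standard IUPAC nucleotide code.

```python
import re

# Standard IUPAC complement mapping (e.g. R <-> Y, K <-> M, B <-> V, D <-> H).
_IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Each IUPAC code expanded to the bases it stands for.
_IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def reverse_complement(primer: str) -> str:
    """Reverse-complement a primer, preserving degenerate IUPAC codes."""
    return primer.upper().translate(_IUPAC_COMPLEMENT)[::-1]


def primer_regex(primer: str) -> re.Pattern:
    """Compile a regex that matches every sequence the primer can encode."""
    return re.compile("".join(_IUPAC_REGEX[base] for base in primer.upper()))


if __name__ == "__main__":
    forward = "GTGYCAGCMGCCGCGGTAA"
    read = "GTGCCAGCAGCCGCGGTAATACGGAGG"  # made-up read for illustration
    print(bool(primer_regex(forward).match(read)))
    print(reverse_complement("GGACTACNVGGGTWTCTAAT"))
```

If the forward primer matches the start of R2 rather than R1, or only the reverse complement matches, the orientation recorded for the run is probably wrong and should be corrected before trimming.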

Use --backend dada2 for a direct R/Bioconductor ASV path. The plugin includes workflows/amplicon_microbiome/run_dada2_backend.R; the runner checks for Rscript and the dada2 R package before execution, then writes normalized ASV, representative-sequence, read-retention, and optional taxonomy tables under tables/.

For nf-core execution, use plugins/ngs-analysis/scripts/run_nfcore_pipeline.py --pipeline ampliseq.

The direct backend runner also emits resources/resource_plan.json, resource_manifest.tsv, resource_env.sh, and resource_readiness.md. The resource check is advisory by default when a QIIME classifier is supplied directly; add --bundle-root silva_138_amplicon=<path>, --include-optional-resources, and --require-resource-plan when missing registered taxonomy databases should block readiness.

The backend runner writes native normalized tables when QIIME2/DADA2/nf-core outputs are present:

  • tables/asv_table.tsv
  • tables/representative_sequences.fasta for direct DADA2 runs
  • tables/taxonomy.tsv
  • tables/read_retention.tsv
  • tables/amplicon_backend_summary.json
  • tables/alpha_diversity.tsv, tables/bray_curtis_distance.tsv, and tables/top_taxa_or_features.tsv when a normalized ASV/feature table is available
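For orientation, the two diversity quantities named in the last bullet can be computed from a feature table as in the sketch below. It assumes a tab-separated table with features in rows and samples in columns, and it uses the natural logarithm for Shannon diversity; the plugin's actual table layout, log base, and any rarefaction step are not stated in this document, so treat these as assumptions.

```python
import csv
import io
import math


def read_asv_table(text: str) -> dict:
    """Parse an assumed layout: header 'feature_id<TAB>sample...', one feature per row."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    samples = next(reader)[1:]
    counts = {sample: [] for sample in samples}
    for row in reader:
        if not row:
            continue  # skip blank lines
        for sample, value in zip(samples, row[1:]):
            counts[sample].append(float(value))
    return counts


def shannon(counts: list) -> float:
    """Shannon index H = -sum(p * ln p) over features with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a: list, b: list) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / (sum a + sum b)."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


if __name__ == "__main__":
    table = "feature_id\tS1\tS2\nASV1\t10\t0\nASV2\t10\t20\n"
    per_sample = read_asv_table(table)
    print(shannon(per_sample["S1"]), bray_curtis(per_sample["S1"], per_sample["S2"]))
```

Both functions operate on raw counts. Whether to rarefy or otherwise normalize first is a separate decision that should be recorded with the run.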

QIIME2 BIOM-only feature-table exports are recorded as requiring conversion, with a biom convert command in the backend summary. Do not claim diversity or taxonomy interpretation unless these normalized tables or equivalent supplied inputs exist.

Kickoff Pattern

nf-core preflight run, using the pipeline's test profile rather than real samples:

nextflow run nf-core/ampliseq \
  -profile test,docker \
  --outdir results/ampliseq_test

Before a real run, verify primer trimming and truncation choices from read-quality profiles.
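One way to turn a read-quality profile into a candidate truncation length is sketched below. The heuristic (cut at the first position whose median quality drops below a threshold) and the threshold of 25 are assumptions for illustration, not settings taken from nf-core/ampliseq or the plugin; the function names are hypothetical. The overlap check matters for paired-end data because the truncated reads must still overlap enough to merge (DADA2's `mergePairs` defaults to a minimum overlap of 12 bases).

```python
def suggest_truncation(median_quality: list, min_quality: int = 25) -> int:
    """Return the number of leading positions whose median quality stays >= min_quality."""
    for position, quality in enumerate(median_quality):
        if quality < min_quality:
            return position
    return len(median_quality)


def overlap_after_truncation(trunc_forward: int, trunc_reverse: int,
                             amplicon_length: int) -> int:
    """Bases of overlap left between truncated forward and reverse reads.

    amplicon_length should describe the same region the reads cover,
    i.e. without primers if primers are trimmed before truncation.
    """
    return trunc_forward + trunc_reverse - amplicon_length


if __name__ == "__main__":
    profile = [38, 37, 36, 30, 24, 20]  # toy per-position median qualities
    print(suggest_truncation(profile))
    print(overlap_after_truncation(240, 160, 253))
```

A suggestion from a heuristic like this is a starting point for review against the FastQC/MultiQC plots, not a replacement for looking at them.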

Visualization Outputs

The local FASTQ package always writes visualizations/index.html and visualizations/visualization_manifest.json. With only FASTQs, this is a read-QC/readiness bundle. If an ASV/feature table is available, pass it to the runner with --asv-table to generate alpha diversity, Bray-Curtis PCoA, and rarefaction artifacts. If a feature taxonomy table is available, pass --taxonomy-table to generate taxa barplots. When downstream tables are labeled synthetic or contain sample columns that are not present in the real sample sheet, the runner marks the run review-only and blocks beta-diversity/PCoA unless --allow-synthetic-diversity is set explicitly.
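The review-only rule described above can be illustrated with a short sketch. It is not the runner's code: the function name `diversity_gate`, its arguments, and the returned keys are hypothetical, and only the decision logic follows the paragraph.

```python
def diversity_gate(asv_samples, sheet_samples, labeled_synthetic=False,
                   allow_synthetic_diversity=False) -> dict:
    """Decide whether a downstream table may drive beta-diversity/PCoA plots."""
    # Sample columns in the table that the real sample sheet does not know about.
    unexpected = sorted(set(asv_samples) - set(sheet_samples))
    review_only = labeled_synthetic or bool(unexpected)
    return {
        "review_only": review_only,
        "unexpected_samples": unexpected,
        # Blocked for review-only runs unless the override is set explicitly.
        "beta_diversity_allowed": (not review_only) or allow_synthetic_diversity,
    }


if __name__ == "__main__":
    print(diversity_gate(["S1", "S2", "demo_3"], ["S1", "S2"]))
```

Listing the unexpected sample names in the result makes the reason for a review-only verdict visible instead of leaving it as a bare flag.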

The run also emits qc_verdict.json and, for amplicon runs, qc_interpretation.json with machine-readable reason codes, a readiness verdict, and follow-on command templates for generating ASV/taxonomy tables and re-rendering plugin-native plots. Backend runs additionally write tables/amplicon_backend_summary.json so exported ASV, taxonomy, read-retention, and BIOM-conversion status are auditable. When a normalized ASV/feature table is available, the backend runner also writes tables/amplicon_diversity_summary.json, visualizations/amplicon_backend_dashboard.html, and SVG plots for sample depth, Shannon diversity, and top taxa/features. If the ASV table is absent, these outputs remain explicitly unavailable rather than inferred from FASTQ QC.

Guardrails

  • Do not choose truncation lengths before looking at quality distributions.
  • Do not mix taxonomy database versions without recording them.
  • Preserve negative controls and extraction blanks in metadata.

Install via CLI

npx skills add https://github.com/openai/plugins --skill ngs-amplicon-microbiome