bio-longread-alignment

star 30

Align long reads using minimap2 for Oxford Nanopore and PacBio data. Supports various presets for different read types and applications. Use when aligning ONT or PacBio reads to a reference genome for variant calling, SV detection, or coverage analysis.

mdbabumiamssm By mdbabumiamssm schedule Updated 2/4/2026

name: bio-longread-alignment description: Align long reads using minimap2 for Oxford Nanopore and PacBio data. Supports various presets for different read types and applications. Use when aligning ONT or PacBio reads to a reference genome for variant calling, SV detection, or coverage analysis. tool_type: cli primary_tool: minimap2 measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools: - read_file - run_shell_command

Long-Read Alignment with minimap2

Oxford Nanopore Alignment

# Basic ONT alignment
minimap2 -ax map-ont reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam
samtools index aligned.bam

PacBio HiFi Alignment

# PacBio HiFi reads (high accuracy)
minimap2 -ax map-hifi reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam
samtools index aligned.bam

PacBio CLR Alignment

# PacBio CLR (continuous long reads, lower accuracy)
minimap2 -ax map-pb reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam
samtools index aligned.bam

Pre-Build Index for Multiple Runs

# Build index once
minimap2 -d reference.mmi reference.fa

# Use index for alignment
minimap2 -ax map-ont reference.mmi reads.fastq.gz | samtools sort -o aligned.bam

Common Options

minimap2 -ax map-ont \
    -t 8 \                         # Threads
    -R '@RG\tID:sample\tSM:sample' \  # Read group
    --secondary=no \               # No secondary alignments
    --MD \                         # Generate MD tag for variants
    -Y \                           # Use soft clipping for supplementary
    reference.fa reads.fastq.gz | \
    samtools sort -@ 4 -o aligned.bam

Splice-Aware Alignment (RNA)

# For direct RNA or cDNA sequencing
minimap2 -ax splice reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam

With Junction BED (Known Splice Sites)

# Provide known splice junctions
minimap2 -ax splice --junc-bed junctions.bed \
    reference.fa reads.fastq.gz | samtools sort -o aligned.bam

Assembly to Reference Alignment

# Assembly with ~0.1% divergence
minimap2 -ax asm5 reference.fa assembly.fa > aligned.sam

# Assembly with higher divergence (~5%)
minimap2 -ax asm20 reference.fa assembly.fa > aligned.sam

Output PAF (Faster, No BAM)

# PAF format (faster, for quick analysis)
minimap2 -x map-ont reference.fa reads.fastq.gz > alignments.paf

Keep Secondary and Supplementary

# Keep all alignments (for SV calling)
minimap2 -ax map-ont \
    --secondary=yes \
    -N 5 \                         # Max secondary alignments
    reference.fa reads.fastq.gz | samtools sort -o aligned.bam

Filter Alignments

# During alignment pipeline
minimap2 -ax map-ont reference.fa reads.fastq.gz | \
    samtools view -b -q 10 | \     # Min mapping quality 10
    samtools sort -o aligned.bam

Multiple FASTQ Files

# Concatenate inputs
minimap2 -ax map-ont reference.fa reads1.fastq.gz reads2.fastq.gz | \
    samtools sort -o aligned.bam

# Or use file list
cat file_list.txt | xargs minimap2 -ax map-ont reference.fa | \
    samtools sort -o aligned.bam

Output Statistics

# Get alignment statistics
samtools flagstat aligned.bam

# Detailed stats
samtools stats aligned.bam | grep ^SN

Convert PAF to BED

# Extract alignments to BED
awk 'OFS="\t" {print $6, $8, $9, $1, $12, ($5=="+")?"+":"-"}' alignments.paf > alignments.bed

Key Presets

Preset Description Best For
map-ont ONT reads Nanopore genomic
map-hifi PacBio HiFi PacBio genomic
map-pb PacBio CLR PacBio CLR
splice Long RNA reads cDNA, direct RNA
asm5 Low divergence Same species assembly
asm20 High divergence Cross-species assembly
sr Short reads Illumina (basic)

Key Parameters

Parameter Default Description
-t 3 CPU threads
-k 15 K-mer size
-w 10 Minimizer window
-a off Output SAM
-x none Preset
--secondary yes Output secondary
-N 5 Max secondary alignments
--MD off Generate MD tag
-R none Read group header
-Y off Soft clipping for supplementary

Output Formats

Format Flag Description
PAF (default) Pairwise Alignment Format
SAM -a Sequence Alignment Map
BAM -a | samtools Binary SAM

Related Skills

  • medaka-polishing - Polish consensus with medaka
  • structural-variants - Call SVs from alignments
  • alignment-files/sam-bam-basics - BAM manipulation
Install via CLI
npx skills add https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- --skill bio-longread-alignment
Repository Details
star Stars 30
call_split Forks 7
navigation Branch main
article Path SKILL.md
More from Creator
mdbabumiamssm
mdbabumiamssm Explore all skills →