nextflow-diagram-creation - SKILL.md Agent Skill

name: nextflow-diagram-creation description: Guide for creating Mermaid diagrams (.mmd files) from Nextflow pipeline code. Use this when asked to create workflow diagrams from subworkflows or main.nf scripts.

Creating Pipeline Diagrams from Nextflow Code

Overview

This skill guides you through analyzing Nextflow pipeline code and creating Mermaid diagrams that accurately represent the workflow structure, data flow, and decision logic.

Diagram Standards

Visual Conventions

Use consistent shapes and colors across all diagrams:

Stadium shapes ([text]) for inputs and outputs (purple #b48ead)
Rectangles [text] for processes (blue #5e81ac)
Diamonds {text} for decision points (orange #d08770)

Color Coding (via classDef)

Always include these class definitions at the end of your diagram:

classDef ioStyle fill:#b48ead,stroke:#d08770,color:#eceff4
classDef processStyle fill:#5e81ac,stroke:#88c0d0,color:#eceff4
classDef decisionStyle fill:#d08770,stroke:#bf616a,color:#eceff4

class InputNode,OutputNode ioStyle
class ProcessNode1,ProcessNode2 processStyle
class DecisionNode decisionStyle

Diagram Types

Subworkflow diagrams: Use graph TD (top-down flowchart)
Main pipeline diagrams: Use graph TD with subgraphs for each subworkflow

Analyzing Subworkflows

Step 1: Identify the Workflow Structure

Read the subworkflow file (e.g., subworkflows/local/<workflow_name>/main.nf) and extract:

Workflow name: From workflow WORKFLOW_NAME {
Inputs: From the take: block
Outputs: From the emit: block
Processes: All PROCESS_NAME() calls in the main: block
Decision points: Conditional logic (if/else statements, when parameters)

Step 2: Map the Data Flow

Trace how data flows through the workflow:

Start with inputs: Create input nodes using stadium shapes
Follow process calls: Each process becomes a rectangular node
Track channel transformations: Use .map, .join, .mix to understand connections
Identify branches: Look for parallel processing or conditional paths
End with outputs: Create output nodes using stadium shapes

Step 3: Identify Decision Points

Look for:

if(parameter) statements
Conditional process execution
Channel branching logic
Optional pipeline sections

Create diamond-shaped decision nodes with labeled edges (e.g., -->|Yes|, -->|No|)

Example Analysis

From subworkflows/local/<workflow_name>/main.nf:

workflow MY_WORKFLOW {
    take:
    input_data      // INPUT
    enable_feature  // PARAMETER
    
    main:
    PROCESS_A(...)             // PROCESS
    PROCESS_B(...)             // PROCESS
    
    if(enable_feature) {       // DECISION
        OPTIONAL_PROCESS(...)  // CONDITIONAL PROCESS
    }
    
    emit:
    result = ch_result         // OUTPUT
}

Becomes:

graph TD
    Input([Input Data]) --> PROC_A[PROCESS_A]
    PROC_A --> PROC_B[PROCESS_B]
    PROC_B --> FeatureCheck{Enable Feature?}
    FeatureCheck -->|Yes| OPT[OPTIONAL_PROCESS]
    FeatureCheck -->|No| Skip[SKIP]
    OPT --> Output([Result])
    
    classDef ioStyle fill:#b48ead,stroke:#d08770,color:#eceff4
    classDef processStyle fill:#5e81ac,stroke:#88c0d0,color:#eceff4
    classDef decisionStyle fill:#d08770,stroke:#bf616a,color:#eceff4
    
    class Input,Output ioStyle
    class PROC_A,PROC_B,OPT,Skip processStyle
    class FeatureCheck decisionStyle

Analyzing Main Pipeline (main.nf)

Step 1: Identify High-Level Structure

Pipeline sections: Look for major workflow stages (QC, alignment, variant calling, etc.)
Subworkflow calls: Find all SUBWORKFLOW_NAME() invocations
Technology or sample-type branches: Identify distinct processing paths (e.g., different input types or platforms)
Conditional sections: Parameters that enable or disable pipeline stages (e.g., run_qc, run_alignment)

Step 2: Use Subgraphs for Subworkflows

CRITICAL: When a subworkflow appears in main.nf, represent it as a subgraph that contains the COMPLETE workflow from its standalone diagram. The subgraph must include ALL processes, decisions, and data flow from the detailed subworkflow diagram.

graph TD
    Input([Samplesheet]) --> TypeCheck{Sample Type?}
    TypeCheck -->|Type A| QC_A[QC & Trim]
    
    subgraph SUB_WORKFLOW["SUB_WORKFLOW Subworkflow"]
        direction TB
        SubInput([Processed Files])
        SubInput --> PROC_A[PROCESS_A]
        PROC_A --> PROC_B[PROCESS_B]
        PROC_B --> PROC_C[PROCESS_C]
        PROC_C --> FeatureCheck{Enable Feature?}
        FeatureCheck -->|Yes| OPT[OPTIONAL_PROCESS]
        FeatureCheck -->|No| SubOutput
        OPT --> SubOutput([Final Output])
    end
    
    QC_A --> SubInput

Important:

Include ALL processes from the subworkflow, not just a simplified view
Show ALL decision points and conditional logic
Maintain the same node names as the standalone diagram
Use direction TB (top-bottom) or direction LR (left-right) to control layout within the subgraph
The subgraph box visually groups the subworkflow while maintaining full detail

Step 3: Maintain Consistency

Critical: The flow inside subgraph boxes must match the standalone subworkflow diagram EXACTLY. This is not a simplified view - it is the complete workflow contained within a visual boundary.

Step 4: Show Processing Branches

Use decision nodes to show distinct processing paths based on input type, platform, or parameter:

graph TD
    Input([Input Data]) --> TypeCheck{Input Type?}
    TypeCheck -->|Type A| PathA[Pipeline Branch A]
    TypeCheck -->|Type B| PathB[Pipeline Branch B]

Process Node Naming

Simplification Rules

Keep process names clear: Use actual process names from the code
Group similar processes: If multiple indexes/conversions, consider grouping
Show key transformations: Highlight important data transformations
Avoid clutter: Don't show every intermediate step if it doesn't add clarity

Examples

Good:

ALIGN_READS → ALIGN_READS
SOME_TOOL as ALIAS_NAME → ALIAS_NAME

Avoid:

Overly generic names like Process1, Process2
Including full Nextflow module paths

Common Patterns

Parallel Processing

Show multiple processes branching from the same input:

graph TD
    Input([Input Files]) --> Caller1[TOOL_A]
    Input --> Caller2[TOOL_B]
    Input --> Caller3[TOOL_C]
    Caller1 --> Merge[Merge Results]
    Caller2 --> Merge
    Caller3 --> Merge

Channel Mixing

Use a Mix node to show where channels are combined:

graph TD
    VCF1([VCF 1]) --> Mix[Mix VCFs]
    VCF2([VCF 2]) --> Mix
    VCF3([VCF 3]) --> Mix
    Mix --> Output([Combined Channel])

Masking/Filtering

Show mask generation and application:

graph TD
    Data([Input]) --> GenMask[Generate Mask]
    Data --> Process[Main Process]
    GenMask --> Apply[Apply Mask]
    Process --> Apply
    Apply --> Output([Masked Output])

Optional Branches

Always show both paths for optional processing:

graph TD
    Input([Data]) --> Check{Enable Feature?}
    Check -->|Yes| Process[Process Data]
    Check -->|No| Skip[SKIP]
    Process --> Output([Result])

Best Practices

Do's

✅ Analyze the entire workflow before starting the diagram ✅ Use consistent node shapes and colors ✅ Label decision edges clearly (Yes/No, Technology names, etc.) ✅ Keep input/output nodes at the top/bottom when possible ✅ Group related processes logically ✅ Show all major decision points ✅ Include classDef styling at the end ✅ Test your diagram with the renderer before finalizing

Don'ts

❌ Don't include every minor channel transformation ❌ Don't use inconsistent node shapes ❌ Don't forget to classify nodes with colors ❌ Don't create overly complex diagrams - simplify when needed ❌ Don't show implementation details (e.g., map, join operations) ❌ Don't mix diagram styles (stick to graph TD) ❌ Don't forget the SKIP node for conditional branches

Workflow Diagram Checklist

Before finalizing a diagram, verify:

All inputs are shown with stadium shapes ([text])
All outputs are shown with stadium shapes ([text])
All processes are shown with rectangles [text]
All decisions are shown with diamonds {text}
Decision edges are labeled (|Yes|, |No|, etc.)
classDef declarations are included
All nodes are assigned to a class (ioStyle, processStyle, decisionStyle)
Flow is clear and logical (top to bottom or left to right)
Diagram renders successfully with theme-nord-dark.css
Diagram matches the actual code logic

Example Complete Workflow

The following illustrates a workflow with parallel processing paths, masking, optional branches, and channel mixing:

graph TD
    Input([Input Files]) --> PROC_A[PROCESS_A]
    PROC_A --> PROC_B[PROCESS_B]
    PROC_B --> INDEX[INDEX_OUTPUT]
    
    PROC_B --> MASK_A[GENERATE_MASK_A]
    Input --> MASK_B[GENERATE_MASK_B]
    
    MASK_A --> APPLY_MASK[APPLY_MASK]
    MASK_B --> APPLY_MASK
    INDEX --> APPLY_MASK
    
    APPLY_MASK --> RENAME[RENAME_OUTPUT]
    RENAME --> MainOut([Primary Output])
    
    Input --> PROC_C[PROCESS_C]
    PROC_C --> SUMMARISE[SUMMARISE_RESULTS]
    
    SUMMARISE --> OptCheck{Enable Optional?}
    INDEX --> OptCheck
    
    OptCheck -->|Yes| OPT[OPTIONAL_PROCESS]
    OptCheck -->|No| Skip[SKIP]
    
    OPT --> POST_OPT[POST_PROCESS_OPT]
    POST_OPT --> OptOut([Optional Output])
    
    Input --> CALLER_A[CALLER_A]
    CALLER_A --> IDX_A[INDEX_A]
    IDX_A --> OutA([Output A])
    
    Input --> CALLER_B[CALLER_B]
    CALLER_B --> IDX_B[INDEX_B]
    IDX_B --> OutB([Output B])
    
    PROC_A --> MainVCF([Primary VCF])
    
    MainVCF --> MIX[Mix Outputs]
    OutA --> MIX
    OutB --> MIX
    
    classDef ioStyle fill:#b48ead,stroke:#d08770,color:#eceff4
    classDef processStyle fill:#5e81ac,stroke:#88c0d0,color:#eceff4
    classDef decisionStyle fill:#d08770,stroke:#bf616a,color:#eceff4
    
    class Input,MainOut,OptOut,OutA,OutB,MainVCF ioStyle
    class PROC_A,PROC_B,INDEX,MASK_A,MASK_B,APPLY_MASK,RENAME,PROC_C,SUMMARISE,OPT,POST_OPT,CALLER_A,IDX_A,CALLER_B,IDX_B,MIX,Skip processStyle
    class OptCheck decisionStyle

Storage and Organization

Store diagrams in structured directories mirroring the pipeline layout:

docs/diagrams/
├── subworkflows/
│   ├── <subworkflow_name>/
│   │   ├── <subworkflow_name>.mmd
│   │   ├── <subworkflow_name>.svg
│   │   └── index.html
│   └── <other_subworkflow>/
│       ├── <other_subworkflow>.mmd
│       ├── <other_subworkflow>.svg
│       └── index.html
└── main/
    ├── main_pipeline.mmd
    ├── main_pipeline.svg
    └── index.html

Next Steps

After creating a .mmd file:

Render it using the diagram-rendering skill
Generate HTML documentation for the diagram (see diagram-rendering skill)
Review the diagram with the rendered SVG
Iterate if the flow doesn't match the code exactly