name: nextflow-pipeline-debugging
description: Guide for analyzing pipeline output and debugging Nextflow workflows. Use this when you need to inspect channel contents, trace process execution, or analyze intermediate files.
Analyzing and Debugging Pipeline Output
This guide covers techniques for analyzing pipeline output and debugging Nextflow workflows effectively.
Prerequisites
- Nextflow pipeline execution (at least one run completed or in progress)
- Access to the pipeline working directory and results
Table of Contents
- Inspecting Channel Contents
- Using Workflow Trace Files
- Analyzing the Results Folder
- Working with the Work Directory
- Common Debugging Strategies
Inspecting Channel Contents
Using .view() to Debug Channels
The .view() operator is the simplest way to inspect what's flowing through your channels during pipeline execution.
Basic usage:
// View all channel contents
ch_data.view()
// View with a custom label
ch_data.view { "Processing: $it" }
// View with structured output
ch_data.view { meta, file ->
    "Sample: ${meta.id}, File: ${file.name}"
}
When to use .view():
- Debugging data structure issues (meta maps, file paths)
- Verifying channel emissions after operators
- Checking data flow between processes
- Confirming multiplicity (how many items are emitted)
Example debugging scenario:
// Problem: Not sure what structure the channel has
ch_input
    .view { "Before map: $it" }            // Debug original structure
    .map { meta, bam, bai -> [meta, bam] }
    .view { "After map: $it" }             // Debug transformed structure
    .set { ch_processed }
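To confirm multiplicity from the command line, one option is to capture the run's standard output (where .view() prints) and count the labelled lines. A minimal sketch, assuming the output was saved to a file such as run.log (a hypothetical name, for example via nextflow run main.nf | tee run.log) and that each emission prints on a single line. The helper count_view_lines is illustrative and not part of Nextflow:

```bash
# Count lines in a captured stdout file that contain a given .view() label.
# $1: label text to search for (matched literally), $2: captured output file
count_view_lines() {
    grep -c -F -- "$1" "$2"
}
```

For example, count_view_lines "Before map:" run.log and count_view_lines "After map:" run.log should report the same number if the map step neither drops nor duplicates items.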
Using Workflow Trace Files
Understanding Execution Traces
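When tracing is enabled (the -with-trace option, or trace.enabled = true in the configuration), Nextflow writes a trace file with detailed information about each task execution.
Typical location in nf-core-style pipelines: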
results/pipeline_info/execution_trace_YYYY-MM-DD_HH-MM-SS.txt
Reading Trace Files
The trace file is a tab-delimited file with columns including:
- task_id: Unique task identifier
- hash: Work directory hash (maps to work/XX/YYYYYY...)
- name: Process name
- status: COMPLETED, FAILED, CACHED, etc.
- exit: Exit code (0 = success)
- submit, start, complete: Timestamps
- duration, realtime: Execution times
- %cpu, %mem: Resource usage
- rss, vmem, peak_rss, peak_vmem: Memory metrics
- rchar, wchar: I/O metrics
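Because the set and order of columns depend on the pipeline's trace configuration, it can help to list the header with column positions before writing awk filters. A minimal sketch, where trace_columns is a hypothetical helper name:

```bash
# Print each column name of a trace file header with its 1-based position.
# $1: path to a trace file
trace_columns() {
    head -n 1 "$1" | tr '\t' '\n' | awk '{ print NR ": " $0 }'
}
```

Running trace_columns on one trace file shows which field number to use for status, exit, or hash in later commands.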
Finding Failed Tasks
Nextflow's CLI output reports failed tasks together with their work directory hash. The trace file provides further details about these tasks.
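# List tasks that are neither COMPLETED nor CACHED (the header row is printed too)
grep -v "COMPLETED" results/pipeline_info/execution_trace_*.txt | grep -v "CACHED"
# Find tasks with non-zero exit codes
# (assumes exit is the 6th column, as in the default trace field layout)
awk -F'\t' '$6 != 0 && NR > 1 {print $2, $3, $4, $6}' results/pipeline_info/execution_trace_*.txt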
# Find the work directory for a specific process
grep "PROCESS_NAME" results/pipeline_info/execution_trace_*.txt | awk -F'\t' '{print $2}'
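The commands above rely on fixed column positions. A header-aware variant, sketched below, looks up the columns by name instead. It assumes the trace file contains the hash, name, status and exit columns. The function name failed_tasks is hypothetical:

```bash
# Print hash, name, status and exit code (tab-separated) for every FAILED task.
# $@: one or more trace files
failed_tasks() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {                       # header row of each file: map names to positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $(col["status"]) == "FAILED" {
            print $(col["hash"]), $(col["name"]), $(col["status"]), $(col["exit"])
        }
    ' "$@"
}
```

Usage: failed_tasks results/pipeline_info/execution_trace_*.txt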
Analyzing the Results Folder
Published Outputs
The results folder contains outputs that have been explicitly published. Use results folders to verify a tool is producing expected outputs, check for anomalies, and compare across samples.
Typical structure:
results/
├── pipeline_info/          # Trace, timeline, DAG, reports
├── [process_name]/         # Process-specific outputs
│   ├── sample1_output.txt
│   └── sample2_output.txt
└── multiqc/                # Quality control reports (if applicable)
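One quick anomaly check on published outputs is to look for zero-byte files, which can indicate a step that exited cleanly but wrote nothing. A minimal sketch, where empty_outputs is a hypothetical helper and results is assumed as the default output directory:

```bash
# List zero-byte regular files under a results directory.
# $1: results directory (default: results)
empty_outputs() {
    find "${1:-results}" -type f -size 0
}
```

An empty listing means every published file has some content. It does not show that the content is correct.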
Working with the Work Directory
Understanding the Work Directory
Each process execution creates a unique subdirectory in work/ containing:
- Staging area: Input files (symlinks or copies)
- Output files: All files generated by the process
- .command.* files: Execution metadata and logs
Work directory structure:
work/
└── XX/
    └── YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY/
        ├── input_file.txt -> /path/to/actual/file
        ├── output_file.txt
        ├── .command.sh      # The actual script executed
        ├── .command.run     # Wrapper script (with container/env)
        ├── .command.out     # stdout
        ├── .command.err     # stderr
        ├── .command.log     # Combined log
        ├── .command.begin   # Marker file created when the task starts
        └── .exitcode        # Exit code
Finding the Work Directory for a Process
Method 1: Using execution trace
# Get the hash for a specific process/sample
grep "PROCESS_NAME.*sample_id" results/pipeline_info/execution_trace_*.txt | \
awk -F'\t' '{print $2}'
# Navigate to the work directory. The trace shows a shortened hash (XX/YYYYYY),
# so append * to let the shell expand it to the full directory name
cd work/[hash]*
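The lookup from a shortened trace hash to the full task directory can be wrapped in a small function. This is a sketch under the assumption that the work directory is work/ relative to the launch directory unless another root is passed. The name resolve_workdir is hypothetical:

```bash
# Expand a shortened hash from the trace (e.g. ab/123456) to the matching
# work directory path(s).
# $1: shortened hash, $2: work root (default: work)
resolve_workdir() {
    for dir in "${2:-work}/$1"*/; do
        if [ -d "$dir" ]; then
            printf '%s\n' "${dir%/}"
        fi
    done
}
```

If the function prints more than one path, the shortened hash is ambiguous and the candidates need to be checked individually.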
Method 2: Using Nextflow CLI
- When a task fails, the error report printed to the terminal includes the task's work directory. Use it to locate the directory directly.
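After a run has finished, the nextflow log command can also list work directories from the run history. The sketch below wraps it in a hypothetical function, failed_task_dirs, and assumes the run name is taken from the output of nextflow log with no arguments:

```bash
# List hash, name, exit code and work directory of the failed tasks of one run.
# $1: run name as listed by "nextflow log"
failed_task_dirs() {
    nextflow log "$1" -f hash,name,exit,workdir -F 'status == "FAILED"'
}
```

This must be run from the directory where the pipeline was launched, since that is where Nextflow keeps its run history.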
Debugging with Work Directory Files
Inspect what command was run:
cat .command.sh # The actual command
cat .command.run # Full execution wrapper (with container)
Check outputs and errors:
cat .command.out # Standard output
cat .command.err # Standard error
cat .command.log # Combined log
cat .exitcode # Exit code (0 = success)
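These checks can be combined into one helper that prints the exit code followed by the tail of stderr. A minimal sketch, where task_summary is a hypothetical name and the 20-line default is an arbitrary choice:

```bash
# Print the exit code and the last lines of stderr for a task work directory.
# $1: task work directory (default: current directory)
# $2: number of stderr lines to show (default: 20)
task_summary() {
    dir="${1:-.}"
    if [ -f "$dir/.exitcode" ]; then
        printf 'exit code: %s\n' "$(cat "$dir/.exitcode")"
    else
        printf '%s\n' 'exit code: (no .exitcode file)'
    fi
    if [ -s "$dir/.command.err" ]; then
        printf '%s\n' '--- last lines of .command.err ---'
        tail -n "${2:-20}" "$dir/.command.err"
    fi
}
```

A missing .exitcode file usually means the task did not run to completion, so the helper reports that case explicitly instead of failing.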
Common Debugging Strategies
- Start with the CLI output: Look for any error messages or failed tasks indicated in the terminal output during execution.
- Use .view() to inspect channels: Add .view() operators at key points in your workflow to check the structure and contents of channels.
- Check the execution trace: Use the trace files to find failed tasks, their work directory hashes, and resource usage patterns.
- Inspect the work directory: For failed tasks, navigate to the corresponding work directory and check the command scripts, outputs, and logs for clues about what went wrong.
- Compare outputs: If some samples succeed and others fail, compare the outputs and logs between them to identify differences that may indicate the issue.
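For the last strategy, comparing the scripts that actually ran for a passing and a failing sample is often a quick first step. A minimal sketch, assuming both task work directories are already known. The name compare_task_scripts is hypothetical:

```bash
# Show differences between the scripts executed in two task work directories.
# diff exits with 1 when the files differ, so that status is treated as success;
# any other non-zero status (e.g. a missing file) is passed on as a failure.
# $1, $2: task work directories
compare_task_scripts() {
    diff "$1/.command.sh" "$2/.command.sh" || [ $? -eq 1 ]
}
```

No output means the two tasks ran identical commands, which points towards the input data or the execution environment instead of the command line.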