name: bioqc-mcp
description: Automated sequencing quality control and advanced visualization wrapping FastQC, MultiQC, and custom chart generation. Exposes an MCP stdio server for live AI integration alongside a ClawBio CLI runner.
license: MIT
metadata:
version: 0.1.0
author: Dr. Babajan Banaganapalli
domain: genomics
tags:
- qc
- fastqc
- multiqc
- visualization
- sequencing
- mcp
inputs:
- name: input_dir
type: directory
format:
- any
description: Directory containing FASTQ files to analyze
required: true
outputs:
- name: report
type: file
format:
- md
description: ClawBio markdown quality control summary
- name: html_report
type: file
format:
- html
description: Interactive MultiQC HTML report
dependencies:
python: '>=3.11'
endpoints:
cli: python skills/bioqc-mcp/bioqc_mcp.py --input {input_dir} --output {output_dir}
openclaw:
requires:
bins:
- python3
- fastqc
- multiqc
always: false
homepage: https://github.com/Babajan-B/BioQC-MCP
os:
- darwin
- linux
install:
- kind: pip
package: multiqc
trigger_keywords:
- bioqc
- fastqc mcp
- multiqc mcp
- automated qc pipeline
- mcp qc
- fastq quality control
- sequencing quality control
- generate chart qc
BioQC (FastQC & MultiQC MCP)
You are BioQC Reporter, a specialised ClawBio agent for executing automated sequencing quality control pipelines, parsing QC reports, and generating custom visualizations. Your role is to run FastQC/MultiQC, extract quality scores and GC content, and produce beautiful visual summaries.
Trigger
Fire this skill when the user says any of:
- "run quality control on these FASTQ files"
- "run bioqc pipeline"
- "execute fastqc and multiqc"
- "mcp qc analysis"
- "generate charts for my FASTQ quality"
- "find all fastq files and run qc"
- "analyze fastq reports and visualize"
Do NOT fire when:
- The user only wants to run MultiQC on pre-existing tool outputs → route to multiqc-reporter
- The user wants differential expression analysis → route to rnaseq-de
- The user wants single-cell RNA-seq clustering → route to scrna-orchestrator
Why This Exists
- Without it: Running FastQC, aggregating with MultiQC, parsing text-based logs, and rendering publication-ready custom visualizations requires chaining multiple command line tools and writing verbose Matplotlib scripts.
- With it: A single command runs the full quality control workflow, extracts detailed metrics (per base quality, GC content), generates beautiful custom charts, and compiles a comprehensive Markdown summary.
- Why ClawBio: Merges the local-first execution pipeline with rich data visualizations (20+ chart types) and exposes a full stdio-based MCP server for interactive AI agent environments (like Cursor/Claude Desktop).
Core Capabilities
- Automated QC Execution: Finds FASTQ files automatically, runs FastQC across multiple threads, and aggregates the results via MultiQC.
- Quality Metric Extraction: Parses FastQC summary.txt and fastqc_data.txt to extract exact base quality and GC content distributions.
- Advanced Visualizations: Generates 20+ publication-quality chart types (line, violin, bar, scatter, heatmaps, box plots) using Matplotlib and Seaborn.
- Dual CLI/MCP Interface: Runs as a standard ClawBio CLI skill or starts an MCP stdio server to expose its tools directly to AI agents (Cursor, Claude Desktop).
Scope
One skill, one task. This skill executes quality control pipelines on sequencing data and generates visualizations. It does not perform alignment, trimming, or downstream differential expression.
Input Formats
| Format | Extension | Notes |
|---|---|---|
| Sequencing reads | .fastq, .fq, .fastq.gz, .fq.gz | Single or paired-end FASTQ reads |
| Plot/Chart data | .json | Structured JSON representing data points for visualization |
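As a rough illustration of how the extensions in the table above could drive file discovery, here is a minimal sketch. The function name `find_fastq_files`, the recursive search, and the case-insensitive match are assumptions for illustration; the skill's actual scanning logic is not shown in this document.

```python
from pathlib import Path

# Extensions taken from the Input Formats table.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def find_fastq_files(input_dir):
    """Return a sorted list of FASTQ paths found under input_dir.

    Hypothetical helper: searches recursively and ignores case,
    which the real skill may or may not do.
    """
    root = Path(input_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.name.lower().endswith(FASTQ_SUFFIXES)
    )
```

Matching on the full file name rather than `Path.suffix` keeps compound extensions such as `.fastq.gz` intact.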
Workflow
When the user requests QC analysis or chart generation:
- Verify: Ensure fastqc and multiqc are installed on the host system.
- Scan: Scan the input directory to discover all valid FASTQ files.
- Analyze: Run FastQC in parallel on all samples, then run MultiQC to aggregate.
- Extract: Parse fastqc_data.txt to extract per-base quality and GC content distributions.
- Visualize: Render custom Seaborn/Matplotlib charts and save them in the figures/ directory.
- Report: Compile a consolidated report.md with quality tables, images, and the ClawBio disclaimer.
- Bundle: Write a standard reproducibility/ bundle.
CLI Reference
# Run full QC pipeline
python skills/bioqc-mcp/bioqc_mcp.py --input <fastq_dir> --output <output_dir>
# Run in MCP stdio server mode (add to claude_desktop_config.json or cursor mcp.json)
python skills/bioqc-mcp/bioqc_mcp.py --mode mcp
# Generate a custom chart from JSON data
python skills/bioqc-mcp/bioqc_mcp.py --mode chart --chart-type violin --chart-data data.json --output <output_dir>
# Run demo mode (runs complete pipeline on synthetic data)
python skills/bioqc-mcp/bioqc_mcp.py --demo --output /tmp/bioqc_demo
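For the MCP stdio mode, clients such as Claude Desktop and Cursor read a JSON file with an `mcpServers` map of launch commands. The sketch below prints one such entry built from the command in the CLI reference; the server label `bioqc-mcp` is an assumed name, and the relative script path is copied as-is from this document.

```python
import json

# Launch command mirrors the "--mode mcp" line in the CLI reference.
server_entry = {
    "command": "python",
    "args": ["skills/bioqc-mcp/bioqc_mcp.py", "--mode", "mcp"],
}

# "bioqc-mcp" is a hypothetical label; any unique key works.
config = {"mcpServers": {"bioqc-mcp": server_entry}}

print(json.dumps(config, indent=2))
```

The printed object can be merged into claude_desktop_config.json or mcp.json. Because the client may launch the server from a different working directory, an absolute path to the script is likely the safer choice.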
Demo
To verify the skill works:
python clawbio.py run bioqc --demo
Expected output: A parsed quality control report in /tmp/bioqc_demo/report.md covering 2 synthetic samples, custom base quality and GC content distribution plots in /tmp/bioqc_demo/figures/, and a standard ClawBio reproducibility bundle.
Example Output
Running python clawbio.py run bioqc --demo produces:
output/bioqc-demo-<timestamp>/
├── report.md                  # QC summary (per-sample pass/warn/fail table)
├── figures/
│   ├── base_quality.png       # Per-base sequence quality plot (Phred scores)
│   └── gc_content.png         # GC content distribution across samples
├── fastqc_output/             # Raw FastQC ZIP + HTML per sample
├── multiqc_report.html        # Aggregated interactive MultiQC report
└── reproducibility/
    ├── commands.sh
    └── checksums.sha256
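The layout of checksums.sha256 is not specified here. One plausible shape is the `sha256sum` convention of a hex digest, two spaces, and a relative path per line. The sketch below writes such a manifest; the helper names and the choice to hash every file in the output directory are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large outputs are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(output_dir, manifest_name="reproducibility/checksums.sha256"):
    """Write '<digest>  <relative path>' lines for every file under output_dir.

    Hypothetical helper: the manifest itself is excluded from the listing.
    """
    root = Path(output_dir)
    manifest = root / manifest_name
    manifest.parent.mkdir(parents=True, exist_ok=True)
    lines = [
        f"{sha256_of(path)}  {path.relative_to(root).as_posix()}"
        for path in sorted(root.rglob("*"))
        if path.is_file() and path != manifest
    ]
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```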
Example report.md excerpt:
## Quality Control Summary
| Sample | Basic Statistics | Per Base Quality | GC Content | Adapter Content |
|--------|-----------------|-----------------|------------|----------------|
| SAMPLE_01 | PASS | PASS | PASS | PASS |
| SAMPLE_02 | PASS | WARN | PASS | PASS |
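A table like the one above can be produced from a per-sample status mapping in a few lines. This is a sketch only: `render_summary_table`, its argument shapes, and the "NA" placeholder for missing modules are illustrative assumptions, not the skill's actual report code.

```python
def render_summary_table(results, modules):
    """Render {sample: {module: status}} as a Markdown table.

    Hypothetical helper: modules missing for a sample are shown as "NA".
    """
    header = "| Sample | " + " | ".join(modules) + " |"
    divider = "|" + "---|" * (len(modules) + 1)
    rows = [
        "| " + " | ".join([sample] + [statuses.get(m, "NA") for m in modules]) + " |"
        for sample, statuses in results.items()
    ]
    return "\n".join([header, divider] + rows)


example = {
    "SAMPLE_01": {"Per Base Quality": "PASS", "GC Content": "PASS"},
    "SAMPLE_02": {"Per Base Quality": "WARN", "GC Content": "PASS"},
}
print(render_summary_table(example, ["Per Base Quality", "GC Content"]))
```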
Algorithm / Methodology
- FastQC Execution: Launches fastqc with -o and -t (threads) parameters on targeted files.
- MultiQC Aggregation: Invokes multiqc with -o and --force on the FastQC output directory to build aggregate interactive HTML reports.
- Summary Parser: Reads summary.txt and maps each QC module to a Pass/Warn/Fail status.
- Detailed Metrics Parser: Scans fastqc_data.txt for >>Per base sequence quality and >>Per sequence GC content blocks to extract position-specific quality scores and GC frequencies.
- Visualization Engine: Maps raw matrices into Pandas DataFrames and renders them using seaborn styles and matplotlib.pyplot drawing functions.
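The two parsing steps above can be sketched as follows. FastQC writes summary.txt as tab-separated `STATUS<TAB>Module<TAB>Filename` lines, and fastqc_data.txt as blocks that open with `>>Module name`, carry a `#`-prefixed column header, and close with `>>END_MODULE`. The function names are hypothetical and this is not the skill's own parser.

```python
def parse_summary(text):
    """Map each FastQC module name to its PASS/WARN/FAIL status."""
    statuses = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            statuses[parts[1]] = parts[0]
    return statuses


def parse_module(text, module_name):
    """Return (header, rows) for one >>module block of fastqc_data.txt.

    Values stay as strings because base positions can be ranges ("10-14").
    """
    header, rows, inside = [], [], False
    for line in text.splitlines():
        if line.startswith(">>END_MODULE"):
            if inside:
                break  # finished the block we wanted
        elif line.startswith(">>"):
            inside = line[2:].split("\t")[0] == module_name
        elif inside and line.startswith("#"):
            header = line[1:].split("\t")
        elif inside and line:
            rows.append(line.split("\t"))
    return header, rows
```

Calling `parse_module(data, "Per base sequence quality")` yields a header such as `Base`, `Mean`, ... plus one row per position, which converts directly into a DataFrame for plotting.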
Gotchas
- FastQC/MultiQC Missing: If fastqc or multiqc is missing from PATH, pipeline mode fails gracefully and explains how to install them (brew install fastqc / pip install multiqc).
- Interactive Plots: Custom charts are saved as static PNGs. Interactive reports are found in multiqc_report.html.
- Large FASTQ Files: For very large datasets, set a reasonable thread count via --threads to avoid saturating the CPU.
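The missing-tool check described above can be approximated with `shutil.which`. The following is a minimal sketch; `INSTALL_HINTS` and `missing_tools` are illustrative names, and the hints simply repeat the install commands quoted in the first gotcha.

```python
import shutil

# Install commands as quoted in the Gotchas list.
INSTALL_HINTS = {
    "fastqc": "brew install fastqc",
    "multiqc": "pip install multiqc",
}


def missing_tools(which=shutil.which):
    """Return {tool: install hint} for each required binary not on PATH.

    The lookup function is injectable so the check can be tested
    without touching the real PATH.
    """
    return {tool: hint for tool, hint in INSTALL_HINTS.items() if which(tool) is None}


if __name__ == "__main__":
    for tool, hint in missing_tools().items():
        print(f"{tool} not found on PATH; try: {hint}")
```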
Safety
- Local-first: All FastQC and MultiQC processing is performed strictly locally. No genetic data is ever uploaded.
- No code execution: All analysis is performed via explicit subprocess.run calls to fastqc and multiqc with no shell interpolation and no dynamic code evaluation.
- Disclaimer: Every generated report.md includes the standard ClawBio bioinformatics research disclaimer.
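To illustrate what "no shell interpolation" means in practice, the sketch below builds each command as an argument list and passes it to `subprocess.run` without `shell=True`, so file names with spaces or shell metacharacters are never interpreted by a shell. The helper names and the default thread count are assumptions; the flags (`-o`, `-t`, `--force`) are the ones named in the methodology section.

```python
import subprocess


def build_fastqc_command(fastq_files, output_dir, threads=2):
    """Argument list for FastQC; each path is passed as its own argument."""
    return ["fastqc", "-o", str(output_dir), "-t", str(threads), *map(str, fastq_files)]


def build_multiqc_command(fastqc_dir, output_dir):
    """Argument list for MultiQC over the FastQC output directory."""
    return ["multiqc", str(fastqc_dir), "-o", str(output_dir), "--force"]


def run(command):
    """Run a command list directly, without a shell; raises on non-zero exit."""
    return subprocess.run(command, check=True, capture_output=True, text=True)
```

A call such as `run(build_fastqc_command(files, "out/fastqc_output", threads=4))` would execute FastQC, provided the binary is installed.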
Agent Boundary
The agent dispatches parameters and visualizes outcomes. The skill executes the native binaries and processes logs.
Integration with Bio Orchestrator
Trigger conditions: routes here when:
- User mentions "bioqc", "mcp qc", "run fastqc", "fastq quality control".
- Raw FASTQ files are provided as input for pipeline execution.
Chaining partners:
- multiqc-reporter: Can consume raw data generated by the FastQC step.
- seq-wrangler: Can feed upstream raw reads into BioQC.
Citations
- Andrews S. FastQC: A Quality Control Tool for High Throughput Sequence Data (2010). http://www.bioinformatics.babraham.ac.uk/projects/fastqc
- Ewels P, et al. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics (2016).