name: grn-tools description: Gene regulatory network workflow guide for transcriptomics and single-cell expression matrices using Arboreto, GRNBoost2, and GENIE3. Use when the user asks to infer transcription factor-target links, score regulatory edges, or build a GRN from bulk or single-cell expression data. source: drugclaw updated_at: "2026-03-11"
GRN Tools
Use this skill when the user asks for gene regulatory network inference rather than basic dataset profiling.
Typical triggers:
- infer transcription factor to target edges from bulk RNA-seq or single-cell expression data
- run GRNBoost2 or GENIE3 on an expression matrix
- restrict GRN inference to a curated TF list
- export ranked regulatory edges for downstream SCENIC-style or network analysis
Environment Check
which python3 || true
python3 - <<'PY'
mods = ["pandas", "arboreto"]
extra = ["distributed"]
for name in mods + extra:
try:
__import__(name)
print(f"{name}: ok")
except Exception as exc:
print(f"{name}: missing ({exc})")
PY
distributed is only required when using --workers for a local Dask cluster.
Bundled Asset
templates/arboreto_grn.py
Preferred Workflow
- Confirm the matrix orientation before inference. Arboreto expects observations as rows and genes as columns.
- Drop sample-id columns or transpose the matrix before fitting.
- Provide a TF whitelist when the user wants biologically narrower networks.
- Save the full ranked edge table and a summary JSON.
- Treat the output as an inferred regulatory hypothesis set, not a validated causal network.
Quick Start
python3 templates/arboreto_grn.py \
--input expression.tsv \
--algorithm grnboost2 \
--tf-file tf_names.txt \
--min-importance 0.01 \
--top-edges 5000 \
--output grn/network.tsv \
--summary grn/network.json
If the input is genes-by-samples, transpose it first or use --transpose:
python3 templates/arboreto_grn.py \
--input expression_genes_by_samples.csv \
--transpose \
--algorithm genie3 \
--workers 4 \
--output grn/network.tsv \
--summary grn/network.json
Output Expectations
Good answers should mention:
- the exact matrix path and whether it was transposed
- which algorithm ran
- whether a TF list was used
- observation count, gene count, and retained edge count
- any
distributedor dependency limitation - where the network TSV and summary JSON were written
Related Skills
For h5ad, BAM, CRAM, or mzML dataset triage before GRN inference, activate omics-tools.
For statistical modeling on downstream regulon or score tables, activate stat-modeling-tools.
For figure generation from network summaries, activate scientific-visualization-tools.