hdwgcna

name: hdwgcna description: Co-expression network analysis for single-cell and spatial transcriptomics using hdWGCNA (R/Seurat). Covers the full pipeline — metacell construction, soft-power selection, network construction, module identification, module eigengenes (MEs/hMEs), hub gene scoring — plus three downstream analyses: differential module eigengenes between groups (FindDMEs / FindAllDMEs), module-trait correlation (ModuleTraitCorrelation), and enrichment via Enrichr / fgsea. Operates on Seurat objects. license: GPL-3.0-or-later metadata:

hdWGCNA: High-Dimensional Co-Expression Network Analysis

Overview

hdWGCNA is the modern adaptation of WGCNA for high-dimensional transcriptomics — single-cell RNA-seq, single-nucleus RNA-seq, and spatial transcriptomics. It solves WGCNA's two biggest problems with single-cell data:

Sparsity — single cells have many zeros, breaking WGCNA's correlation structure. hdWGCNA aggregates similar cells into "metacells" first.
Batch effects — module eigengenes (MEs) can be confounded by donor/sample. hdWGCNA computes batch-corrected harmonized MEs (hMEs) via Harmony.

The package operates on Seurat objects: every step adds data to seurat_obj@misc[[wgcna_name]], so you can run multiple parallel analyses (e.g. one network per cell type) on the same Seurat object.

When to Use This Skill

Identifying co-expressed gene modules within a cell type (e.g. "neuronal modules in inhibitory neurons")
Finding hub genes (genes most central to a module) for downstream wet-lab validation
Testing module activity differences between conditions (case vs control, treated vs untreated)
Correlating module eigengenes with continuous traits (age, disease severity, cell type proportion)
Functional enrichment of modules (GO, KEGG, custom gene sets) via Enrichr or fgsea
Module preservation testing across datasets (cross-condition / cross-species)

Not for: bulk RNA-seq (use vanilla WGCNA), trajectory analysis (use scVelo / Slingshot), differential expression at the gene level (use Seurat / DESeq2).

Prerequisites

R ≥ 4.2, Seurat ≥ 4.x, WGCNA, harmony, UCell, igraph
A Seurat object with: cell type / cluster labels, sample/donor identifiers, PCA + Harmony reductions if you want hMEs
Recommend ≥ 5,000 cells per group of interest after QC

# Install (one-time)
install.packages("BiocManager")
BiocManager::install(c("WGCNA", "GeneOverlap", "UCell"))
devtools::install_github('smorabit/hdWGCNA', ref='dev')

Quick Start — Full Pipeline

library(Seurat)
library(hdWGCNA)
library(tidyverse)
library(cowplot)
library(patchwork)
library(WGCNA)
allowWGCNAThreads(nThreads = 8)

# Assumes you already have a normalized Seurat object with cell type labels.
seurat_obj <- readRDS('your_annotated_seurat.rds')

# ── 1. Set up hdWGCNA experiment ───────────────────────────────────────────
seurat_obj <- SetupForWGCNA(
  seurat_obj,
  gene_select = "fraction",     # keep genes expressed in ≥ fraction of cells
  fraction = 0.05,
  wgcna_name = "INH"             # name this analysis (you can run multiple)
)

# ── 2. Build metacells ─────────────────────────────────────────────────────
# k-NN aggregation within each (cell_type × Sample) bin → "metacells"
# reduces sparsity while keeping per-sample biology distinct.
seurat_obj <- MetacellsByGroups(
  seurat_obj,
  group.by = c("cell_type", "Sample"),
  reduction = 'harmony',         # use the integrated embedding
  k = 25,                        # cells per metacell
  max_shared = 10,               # max cells shared across metacells (prevents cliques)
  ident.group = 'cell_type'
)
seurat_obj <- NormalizeMetacells(seurat_obj)

# ── 3. Pick the cell type / group to model ─────────────────────────────────
seurat_obj <- SetDatExpr(
  seurat_obj,
  group_name = "INH",            # the cell type of interest
  group.by = 'cell_type',
  assay = 'RNA',
  layer = 'data'
)

# ── 4. Soft-power selection ────────────────────────────────────────────────
seurat_obj <- TestSoftPowers(seurat_obj, networkType = 'signed')
plot_list <- PlotSoftPowers(seurat_obj)
wrap_plots(plot_list, ncol = 2)
# Pick the lowest power giving SFT R² ≥ 0.8 (default auto-selects from GetPowerTable).

# ── 5. Construct network ───────────────────────────────────────────────────
seurat_obj <- ConstructNetwork(
  seurat_obj,
  tom_name = 'INH',              # the TOM file (gene×gene topological overlap) is written here
  overwrite_tom = TRUE
)
PlotDendrogram(seurat_obj, main = 'INH hdWGCNA Dendrogram')

# ── 6. Module eigengenes (with Harmony batch correction) ───────────────────
seurat_obj <- ScaleData(seurat_obj, features = VariableFeatures(seurat_obj))
seurat_obj <- ModuleEigengenes(
  seurat_obj,
  group.by.vars = "Sample"      # harmonize MEs across samples
)

# ── 7. Connectivity (kME) + hub gene scoring ───────────────────────────────
seurat_obj <- ModuleConnectivity(
  seurat_obj,
  group.by = 'cell_type',
  group_name = 'INH'
)
seurat_obj <- ResetModuleNames(seurat_obj, new_name = "INH-M")

# Top-10 hub genes per module
hub_df <- GetHubGenes(seurat_obj, n_hubs = 10)

# UCell signature score for the top-25 hubs per module — fast, per-cell
library(UCell)
seurat_obj <- ModuleExprScore(seurat_obj, n_genes = 25, method = 'UCell')

# Save the analysis to disk — the network construction step is the slow one
saveRDS(seurat_obj, 'seurat_obj_hdwgcna.rds')

The result object is the same Seurat object, with the entire hdWGCNA experiment stored in seurat_obj@misc[["INH"]]. Inspect the three key outputs:

# Module assignments (one row per gene)
modules <- GetModules(seurat_obj) %>% subset(module != 'grey')
head(modules)
#   gene_name  module    color   kME_INH-M1   kME_INH-M2 ...
# 1 GRIN1      INH-M1    red     0.332        0.021
# 2 INTS11     INH-M2    blue    0.050        0.227

# Module eigengenes (cells × modules)
hMEs <- GetMEs(seurat_obj)             # harmonized
MEs  <- GetMEs(seurat_obj, harmonized = FALSE)   # raw

# Hub genes (top N per module by kME)
hub_df <- GetHubGenes(seurat_obj, n_hubs = 25)

Convenience runner: Rscript scripts/build_network.R --rds annotated.rds --cell-type INH --out seurat_hdwgcna.rds. See references/network_construction.md for parameter tuning, metacell diagnostics, and gotchas.

Downstream 1 — Differential Module Eigengenes (DMEs)

The question: does module-N activity differ between condition A and condition B (e.g. AD vs control, treated vs untreated)? This is the most common follow-up after network construction.

Two-group test — `FindDMEs`

# Pick the cells to compare. Often: same cell type, two conditions.
group1 <- seurat_obj@meta.data %>%
  filter(cell_type == 'INH', condition == 'AD')   %>% rownames()
group2 <- seurat_obj@meta.data %>%
  filter(cell_type == 'INH', condition == 'Ctrl') %>% rownames()

DMEs <- FindDMEs(
  seurat_obj,
  barcodes1 = group1,
  barcodes2 = group2,
  test.use  = 'wilcox',          # Mann-Whitney U
  pseudocount.use = 0.01,        # for log2FC stability
  wgcna_name = 'INH'
)

Returns a dataframe with one row per module: p_val, avg_log2FC, pct.1, pct.2, p_val_adj, module. The pct.* columns are the fraction of cells in each group whose ME is non-zero — useful when a module is "active" in one condition but silent in the other.

Visualize

PlotDMEsLollipop(seurat_obj, DMEs, group.by = 'cell_type', wgcna_name = 'INH')
PlotDMEsVolcano (seurat_obj, DMEs, plot_labels = TRUE,    wgcna_name = 'INH')

The lollipop is for a single comparison (great for figures). The volcano shows effect-size × significance at a glance — useful when you have many modules and want to spot the outliers.

One-vs-all — `FindAllDMEs`

Across each level of a grouping variable, runs that level vs all other cells:

DMEs_all <- FindAllDMEs(
  seurat_obj,
  group.by   = 'cell_type',
  wgcna_name = 'INH'
)
# adds a `group` column with the tested category

Module signature scores instead of MEs

If your modules don't have clean eigengenes (e.g. small modules with noisy first PC), test on UCell-based hub-gene signatures instead:

seurat_obj <- ModuleExprScore(seurat_obj, n_genes = 25, method = 'UCell')
DMEs_scores <- FindDMEs(
  seurat_obj,
  barcodes1 = group1, barcodes2 = group2,
  features  = 'ModuleScores',
  wgcna_name = 'INH'
)

Convenience: Rscript scripts/downstream_analyses.R --rds seurat_hdwgcna.rds --task dme --group-by condition --group1 AD --group2 Ctrl --cell-type INH

Source: hdWGCNA Differential MEs tutorial.

Downstream 2 — Module–Trait Correlation

The question: which modules correlate with continuous or ordered phenotypes (age, Braak stage, disease severity, % tumor cells per spot)?

cur_traits <- c('age', 'Braak_stage', 'sex_binary', 'cell_density')

seurat_obj <- ModuleTraitCorrelation(
  seurat_obj,
  traits   = cur_traits,
  group.by = 'cell_type',         # stratify the correlation by cell type
  wgcna_name = 'INH'
)

mt_cor <- GetModuleTraitCorrelation(seurat_obj)
# mt_cor is a list:
#   mt_cor$cor   — correlation coefficients (one matrix per group)
#   mt_cor$pval  — p-values
#   mt_cor$fdr   — FDR-corrected

PlotModuleTraitCorrelation(
  seurat_obj,
  label        = 'fdr',           # or 'pval' or 'cor'
  label_symbol = 'stars',         # *** < 0.001, ** < 0.01, * < 0.05
  text_size    = 2.5,
  high_color   = 'yellow',
  mid_color    = 'black',
  low_color    = 'purple',
  plot_max     = 0.2,             # cap colour scale at |r| = 0.2
  combine      = TRUE             # one figure with patchwork
)

Trait type compatibility

Trait type	Supported?	Notes
Continuous (numeric)	✅	Use as-is
Binary categorical	✅	Convert to 0/1 before passing
Ordered categorical	✅	Convert to factor with explicit `levels = c('low', 'mid', 'high')`, then `as.numeric()`
Unordered categorical (>2 levels)	❌	The correlation is meaningless. Either dummy-encode or skip.

Sample ID, mouse strain, batch — these are unordered categorical; if you want to test for batch effects on modules, use ANOVA via FindDMEs between groups instead.

Convenience: Rscript scripts/downstream_analyses.R --rds seurat_hdwgcna.rds --task trait --traits age,Braak_stage

Source: hdWGCNA Module-Trait Correlation tutorial.

Downstream 3 — Functional Enrichment

The question: what biological processes / pathways does each module represent?

Enrichr (web-based, fast, comprehensive)

library(enrichR)

dbs <- c(
  'GO_Biological_Process_2023',
  'GO_Cellular_Component_2023',
  'GO_Molecular_Function_2023',
  'KEGG_2021_Human',
  'Reactome_2022',
  'WikiPathway_2023_Human'
)

seurat_obj <- RunEnrichr(
  seurat_obj,
  dbs        = dbs,
  max_genes  = 100,                # genes per module sent to Enrichr (sorted by kME)
  wgcna_name = 'INH'
)

enrich_df <- GetEnrichrTable(seurat_obj)
# Columns: Term, Overlap, P.value, Adjusted.P.value, Odds.Ratio, Combined.Score, Genes, db, module

Visualize

# Per-module bar plots — writes one PDF per module to outdir/
EnrichrBarPlot(
  seurat_obj,
  outdir     = 'figures/enrichr/',
  n_terms    = 10,
  plot_size  = c(5, 7),
  logscale   = TRUE,
  wgcna_name = 'INH'
)

# Single dot plot across all modules (great for figures)
EnrichrDotPlot(
  seurat_obj,
  mods       = 'all',
  database   = 'GO_Biological_Process_2023',
  n_terms    = 2,                # top N per module
  term_size  = 8,
  p_adj      = FALSE,
  wgcna_name = 'INH'
)

GSEA via fgsea (when you have a ranked gene list)

For a continuous test (not over-representation), rank genes by kME within a module and use fgsea against any MSigDB collection:

library(fgsea)
library(msigdbr)

# Pull genes ranked by kME in module of interest
modules <- GetModules(seurat_obj) %>% filter(module != 'grey')
target_mod <- 'INH-M3'
ranks <- modules %>%
  arrange(desc(.data[[paste0('kME_', target_mod)]])) %>%
  pull(.data[[paste0('kME_', target_mod)]], name = gene_name)

# Get pathways from MSigDB
pathways <- msigdbr(species = 'Homo sapiens', category = 'H') %>%
  split(x = .$gene_symbol, f = .$gs_name)

gsea_res <- fgsea(pathways, ranks, minSize = 10, maxSize = 500)
head(gsea_res[order(padj)])

Convenience: Rscript scripts/downstream_analyses.R --rds seurat_hdwgcna.rds --task enrich --dbs GO_Biological_Process_2023,KEGG_2021_Human

Source: hdWGCNA Enrichment tutorial.

Visualization Cookbook

Module feature plot (UMAP × module)

plot_list <- ModuleFeaturePlot(seurat_obj, features = 'hMEs', order = TRUE)
wrap_plots(plot_list, ncol = 6)
# Alternatives: features = 'scores' (UCell hub scores), 'MEs' (raw, no Harmony)

Module dot plot across cell types

hMEs <- GetMEs(seurat_obj, harmonized = TRUE)
modules <- GetModules(seurat_obj)
mods <- levels(modules$module); mods <- mods[mods != 'grey']
seurat_obj@meta.data <- cbind(seurat_obj@meta.data, hMEs)
DotPlot(seurat_obj, features = mods, group.by = 'cell_type') +
  RotatedAxis() +
  scale_color_gradient2(high = 'red', mid = 'grey95', low = 'blue')

Hub gene network (top hubs from each module)

ModuleNetworkPlot(seurat_obj, n_hubs = 5)              # one figure per module
HubGeneNetworkPlot(seurat_obj, n_hubs = 5, n_other = 10)   # multi-module network

Module–module correlation

ModuleCorrelogram(seurat_obj, features = 'hMEs')

Module radar plot across cell subtypes

ModuleRadarPlot(
  seurat_obj,
  group.by = 'cluster_label',
  barcodes = seurat_obj@meta.data %>% filter(cell_type == 'INH') %>% rownames(),
  axis.label.size = 4
)

More patterns + custom layouts in references/plotting_guide.md.

Key Parameters to Adjust

Metacell construction (`MetacellsByGroups`)

k (default 25): cells per metacell. Lower → finer resolution but noisier. For small datasets (< 2000 cells/type), drop to 10–15.
max_shared (default 10): caps cell sharing across metacells. Higher → fewer metacells but more redundancy.
group.by: always include both cell type and sample — otherwise metacells will mix donors and you'll lose biology.

Soft power

The plot from PlotSoftPowers should show R² climbing past 0.8. Pick the lowest power that crosses this threshold.
Default networkType='signed' (recommended). 'unsigned' collapses positive and negative correlations — usually wrong.

Network construction (`ConstructNetwork`)

minModuleSize (default 50): smallest allowed module. Reduce for small datasets.
mergeCutHeight (default 0.2): merges similar modules. Increase (0.3–0.4) to consolidate noisy modules.
deepSplit (0–4): how aggressively to split. 2 is default; 4 produces many tiny modules.

Module eigengenes (`ModuleEigengenes`)

group.by.vars: pass your batch/donor variable. Without this you get raw MEs (MEs) — use harmonized (hMEs) for downstream stats.

kME / hub genes

n_hubs (default 25 for ModuleExprScore): how many top-kME genes count as "the module's signature". 25 is a good default; for sparse single-cell data, 50 makes the signature more stable.

Best Practices

Run one network per major cell type, not one big network across all cells. WGCNA's TOM is dominated by cell-type identity otherwise — the "modules" will just be marker-gene signatures for each cell type rather than meaningful within-cell-type co-expression.
Save the Seurat object after ConstructNetwork. It's the slow step — minutes to hours depending on n cells and n genes. Everything downstream (MEs, kME, hubs, DMEs, enrichment) is fast.
The grey module is the dumping ground for unassigned genes. Always subset(module != 'grey') before downstream analyses.
Inspect PlotDendrogram BEFORE downstream analysis. If the dendrogram has one giant module and a tiny grey blob, your soft power is too low. If it has 30+ skinny modules, soft power is too high or minModuleSize is too small.
For module-trait correlation with categorical traits, dummy-encode unordered factors instead of relying on alphabetical order. Otherwise the correlation is meaningless.
Cross-condition module preservation matters. A module discovered in controls may not exist in cases. Test with ModulePreservation before trusting cross-condition DME results.

End-to-End Template

assets/hdwgcna_template.R is a single parameterized script — set the variables at the top (input .rds, cell type, sample column, condition column) and run end-to-end through DME + module-trait correlation + Enrichr.

Rscript assets/hdwgcna_template.R

Convenience Scripts

scripts/build_network.R — network construction only (the slow step). Saves an annotated Seurat object.
scripts/downstream_analyses.R — DME / module-trait / enrichment. Subtask selected via --task dme|trait|enrich|all.

References

hdWGCNA documentation — Morabito et al., main project site
Basic tutorial
Differential Module Eigengenes
Module–Trait Correlation
Enrichment Analysis
Original WGCNA: Langfelder & Horvath (2008), BMC Bioinformatics
hdWGCNA paper: Morabito, Reese, Rahimzadeh et al. (2023), Cell Reports Methods

hdWGCNA: High-Dimensional Co-Expression Network Analysis

Overview

When to Use This Skill

Prerequisites

Quick Start — Full Pipeline

Downstream 1 — Differential Module Eigengenes (DMEs)

Two-group test — FindDMEs

Visualize

One-vs-all — FindAllDMEs

Module signature scores instead of MEs

Downstream 2 — Module–Trait Correlation

Trait type compatibility

Downstream 3 — Functional Enrichment

Enrichr (web-based, fast, comprehensive)

Visualize

GSEA via fgsea (when you have a ranked gene list)

Visualization Cookbook

Module feature plot (UMAP × module)

Module dot plot across cell types

Hub gene network (top hubs from each module)

Module–module correlation

Module radar plot across cell subtypes

Key Parameters to Adjust

Metacell construction (MetacellsByGroups)

Soft power

Network construction (ConstructNetwork)

Module eigengenes (ModuleEigengenes)

kME / hub genes

Best Practices

End-to-End Template

Convenience Scripts

References

Two-group test — `FindDMEs`

One-vs-all — `FindAllDMEs`

Metacell construction (`MetacellsByGroups`)

Network construction (`ConstructNetwork`)

Module eigengenes (`ModuleEigengenes`)