structure-aware-coreset-fc-benchmarking - SKILL.md Agent Skill

name: structure-aware-coreset-fc-benchmarking
description: "Accelerating benchmarking of functional connectivity (FC) modeling via structure-aware coreset selection for large-scale fMRI datasets. Reduces combinatorial explosion in model-data evaluation pairs. Activation: functional connectivity, fMRI benchmarking, coreset selection, FC modeling, brain network analysis, connectomics benchmarking."

Structure-aware Core-set Selection for FC Benchmarking

Methodology for accelerating functional connectivity (FC) modeling benchmarking on large-scale fMRI datasets by selecting representative subject subsets (coresets) that preserve structural brain properties.

Metadata

Source: arXiv:2602.05667
Authors: Ling Zhan, Zhen Li, Junjie Huang, et al.
Published: 2026-02-05
Categories: cs.LG

Core Methodology

Key Innovation

Addresses the combinatorial explosion problem in FC benchmarking: with hundreds of FC methods × thousands of subjects × multiple datasets, exhaustive evaluation is computationally prohibitive. The proposed structure-aware coreset selection identifies a small representative subset of subjects that preserves key brain network properties, enabling efficient yet reliable benchmarking.
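To make the scale concrete, here is a back-of-the-envelope count of FC fits. All sizes below are hypothetical and chosen only for illustration; they are not figures reported in the paper.

```python
# Illustrative sizes only (hypothetical, not taken from the paper)
n_methods = 200      # candidate FC estimation methods
n_subjects = 1000    # subjects per dataset
n_datasets = 3       # benchmark datasets
coreset_size = 50    # subjects kept per dataset after coreset selection


def evaluation_pairs(methods: int, subjects: int, datasets: int) -> int:
    """Number of (method, subject, dataset) FC fits an exhaustive benchmark needs."""
    return methods * subjects * datasets


full = evaluation_pairs(n_methods, n_subjects, n_datasets)
reduced = evaluation_pairs(n_methods, coreset_size, n_datasets)
print(f"exhaustive: {full:,} fits; coreset: {reduced:,} fits; {full / reduced:.0f}x fewer")
```

Because cost grows linearly in the number of subjects, the saving factor is simply `n_subjects / coreset_size`, independent of how many methods are compared.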

Technical Framework

Coreset Selection: Identify a representative subset of subjects from large fMRI datasets
Structure-Awareness: Preserve spatial brain structure (anatomical regions) and functional topology (network patterns) in the selection
Evaluation Proxy: Demonstrate that benchmarking on the coreset yields rankings consistent with full-dataset evaluation
Scalability: Reduce computational cost by orders of magnitude while maintaining benchmark validity
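One plausible way to make subject features "structure-aware" is to summarize each FC matrix at the level of predefined functional networks (mean within- and between-network connectivity) and add per-region node strength. This is a sketch under stated assumptions: the `network_labels` input (an ROI-to-network assignment from some parcellation atlas) and the exact feature set are hypothetical and not necessarily what the paper uses.

```python
import numpy as np


def structure_features(fc: np.ndarray, network_labels: np.ndarray) -> np.ndarray:
    """Summarize one subject's FC matrix with network-level and regional features.

    fc: (n_rois, n_rois) symmetric FC matrix.
    network_labels: (n_rois,) integer network id per ROI (hypothetical input,
        assumed to come from a parcellation atlas).
    Returns the upper triangle (incl. diagonal) of the network-by-network
    mean-connectivity matrix, followed by each ROI's node strength.
    """
    fc = np.asarray(fc, dtype=float).copy()
    np.fill_diagonal(fc, 0.0)  # ignore self-connections
    network_labels = np.asarray(network_labels)
    nets = np.unique(network_labels)
    k = len(nets)
    block = np.zeros((k, k))
    for a, na in enumerate(nets):
        ia = np.flatnonzero(network_labels == na)
        for b, nb in enumerate(nets):
            ib = np.flatnonzero(network_labels == nb)
            sub = fc[np.ix_(ia, ib)]
            if a == b:
                # Within-network mean over off-diagonal entries (diagonal was zeroed)
                n = len(ia)
                block[a, b] = sub.sum() / (n * (n - 1)) if n > 1 else 0.0
            else:
                block[a, b] = sub.mean()
    iu = np.triu_indices(k)
    strength = np.abs(fc).sum(axis=1)  # weighted degree of each ROI
    return np.concatenate([block[iu], strength])
```

Stacking these vectors for all subjects gives a lower-dimensional, anatomically grouped alternative to clustering on the raw upper-triangle FC values.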

Implementation Guide

Prerequisites

Large-scale fMRI datasets (e.g., HCP, UK Biobank)
FC estimation pipeline (partial correlation, regularized inverse covariance, etc.)
scikit-learn (or a similar library) for clustering and subset selection
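As a minimal sketch of the FC estimation step, the two functions below compute full (Pearson) correlation and partial correlation from a regional time series. The fixed `shrinkage` value is an assumption for this sketch; data-driven estimators such as Ledoit-Wolf or the graphical lasso choose the regularization from the data instead.

```python
import numpy as np


def pearson_fc(ts: np.ndarray) -> np.ndarray:
    """Full (Pearson) correlation FC from a (n_timepoints, n_rois) time series."""
    return np.corrcoef(ts, rowvar=False)


def partial_corr_fc(ts: np.ndarray, shrinkage: float = 0.1) -> np.ndarray:
    """Partial-correlation FC from the inverse of a shrunk covariance matrix.

    shrinkage: fixed blend toward the diagonal (assumed value, not tuned).
    """
    x = ts - ts.mean(axis=0)
    cov = x.T @ x / (x.shape[0] - 1)
    # Shrink toward the diagonal so the matrix stays invertible even when
    # there are more ROIs than timepoints
    cov = (1 - shrinkage) * cov + shrinkage * np.diag(np.diag(cov))
    prec = np.linalg.inv(cov)
    d = np.sqrt(np.diag(prec))
    # Partial correlation: -P_ij / sqrt(P_ii * P_jj)
    pcorr = -prec / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr
```

Applying either function to every subject and stacking the results with `np.stack` yields the `(n_subjects, n_rois, n_rois)` array that coreset selection operates on.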

Step-by-Step

Compute FC matrices for all subjects
Extract structural features (region-wise connectivity patterns, network topology metrics)
Apply structure-aware coreset selection (clustering + representative sampling)
Validate coreset by comparing FC method rankings against full-dataset rankings
Use coreset for rapid benchmarking of new FC methods
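The validation step can be sketched as follows: rank FC methods by their mean score on the full cohort and on the coreset, then measure agreement with Kendall's tau and top-k overlap. The `scores[m, s]` layout (score of method `m` on subject `s`, higher is better) is a hypothetical convention for this sketch, and the paper's actual agreement metric may differ.

```python
import numpy as np


def kendall_tau(a, b) -> float:
    """Kendall's tau-a between two score vectors (O(n^2), fine for method counts)."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    i, j = np.triu_indices(n, k=1)
    # +1 for concordant pairs, -1 for discordant, 0 for ties
    concord = np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
    return float(concord.sum() / (n * (n - 1) / 2))


def ranking_agreement(scores: np.ndarray, coreset, top_k: int = 5):
    """Compare FC-method rankings on the full cohort vs. a coreset.

    scores: (n_methods, n_subjects) per-subject benchmark scores (hypothetical layout).
    coreset: indices of the selected subjects.
    Returns (Kendall's tau, fraction of the full top-k also in the coreset top-k).
    """
    full_mean = scores.mean(axis=1)
    core_mean = scores[:, coreset].mean(axis=1)
    tau = kendall_tau(full_mean, core_mean)
    top_full = set(np.argsort(-full_mean)[:top_k].tolist())
    top_core = set(np.argsort(-core_mean)[:top_k].tolist())
    return tau, len(top_full & top_core) / top_k
```

A tau close to 1 together with high top-k overlap suggests the coreset can stand in for the full cohort when ranking methods.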

Code Example

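import numpy as np
from sklearn.cluster import KMeans

def structure_aware_coreset(fc_matrices, n_select=50, n_clusters=10, random_state=42):
    """Select representative subjects by clustering vectorized FC patterns.

    fc_matrices: array of shape (n_subjects, n_rois, n_rois), symmetric per subject.
    Returns a sorted array of exactly n_select subject indices.
    Simplified clustering-based sketch; not necessarily the paper's exact procedure.
    """
    fc_matrices = np.asarray(fc_matrices)
    n_subjects, n_rois, _ = fc_matrices.shape
    if not 0 < n_select <= n_subjects:
        raise ValueError("n_select must be in [1, n_subjects]")
    # Vectorize each subject's FC by its upper triangle (diagonal excluded)
    iu = np.triu_indices(n_rois, k=1)
    feats = fc_matrices[:, iu[0], iu[1]]
    # Cluster subjects by FC patterns
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(feats)
    # Split the budget proportionally to cluster size; largest-remainder rounding
    # makes the allocations sum to exactly n_select
    sizes = np.bincount(labels, minlength=n_clusters)
    quota = n_select * sizes / n_subjects
    alloc = np.floor(quota).astype(int)
    for c in np.argsort(-(quota - alloc))[: n_select - alloc.sum()]:
        alloc[c] += 1
    selected = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        # Keep the members closest to the cluster center
        dists = np.linalg.norm(feats[idx] - kmeans.cluster_centers_[c], axis=1)
        selected.extend(idx[np.argsort(dists)[: alloc[c]]])
    return np.sort(np.array(selected, dtype=int))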
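Picking the subjects nearest each cluster center favors typical subjects. For contrast, a sketch of greedy farthest-point (k-center) selection, a standard coreset heuristic swapped in here and not the paper's method, is shown below: each step adds the subject farthest from everything chosen so far, so atypical subjects are picked up early rather than averaged away.

```python
import numpy as np


def k_center_greedy(feats: np.ndarray, n_select: int, seed: int = 0) -> np.ndarray:
    """Greedy farthest-point (k-center) selection over subject feature vectors.

    feats: (n_subjects, n_features), e.g. vectorized FC upper triangles.
    Returns the selected subject indices in the order they were chosen.
    """
    feats = np.asarray(feats, dtype=float)
    n = feats.shape[0]
    if not 0 < n_select <= n:
        raise ValueError("n_select must be in [1, n_subjects]")
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]  # random first center
    # Distance from every subject to its nearest selected center
    min_dist = np.linalg.norm(feats - feats[selected[0]], axis=1)
    min_dist[selected[0]] = -np.inf  # never re-pick a selected subject
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(feats - feats[nxt], axis=1))
        min_dist[nxt] = -np.inf
    return np.array(selected)
```

The trade-off is that k-center is sensitive to noisy outliers, whereas centroid-based picks are robust but can under-represent rare phenotypes.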

Applications

FC Method Comparison: Rapid benchmarking of new FC estimation methods
Large-scale Studies: Efficient analysis of biobank-scale datasets
Reproducibility: Standardized benchmarking subsets for community comparison
Pipeline Optimization: Quick evaluation of preprocessing choices

Pitfalls

Coreset may miss rare but important subject phenotypes
Ranking agreement with the full dataset must be re-validated for each new dataset
Trade-off between coreset size and ranking fidelity
May not generalize across all FC metrics
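The size-versus-fidelity trade-off can be probed empirically. The sketch below computes a random-subset baseline: for each candidate size, it draws random subject subsets and records the median Spearman correlation between full-cohort and subset method rankings. It assumes the same hypothetical `scores[m, s]` layout as in the validation step; a structure-aware coreset of a given size would be expected to land at or above this baseline if the selection adds value.

```python
import numpy as np


def spearman(a, b) -> float:
    """Spearman rank correlation, assuming no ties (true for continuous scores)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])


def fidelity_curve(scores: np.ndarray, sizes, n_repeats: int = 20, seed: int = 0) -> dict:
    """Median rank agreement between full-cohort and random-subset method rankings.

    scores: (n_methods, n_subjects) per-subject benchmark scores (hypothetical layout).
    sizes: subset sizes to evaluate, each at most n_subjects.
    """
    rng = np.random.default_rng(seed)
    n_subjects = scores.shape[1]
    full = scores.mean(axis=1)
    curve = {}
    for k in sizes:
        rhos = []
        for _ in range(n_repeats):
            subset = rng.choice(n_subjects, k, replace=False)
            rhos.append(spearman(full, scores[:, subset].mean(axis=1)))
        curve[k] = float(np.median(rhos))
    return curve
```

Plotting `curve` against subset size shows where ranking fidelity saturates, which is one way to choose a coreset budget.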

Related Skills

brain-network-topology
gaussian-graphical-connectivity-analysis
functional-connectome-fingerprint