
name: modal-neural-modality-discovery
description: "MoDAl (Modality Decorrelation and Alignment) framework for self-supervised neural modality discovery in speech neuroprosthesis. Uses contrastive alignment with LLM text embeddings + decorrelation loss to discover complementary neurolinguistic modalities from multiple brain regions. Key innovation: proves contrastive alignment induces transitive modality coalescence, which decorrelation must counteract. Trigger words: MoDAl, neural modality discovery, speech neuroprosthesis, brain-to-text decoding, contrastive decorrelation, Broca area decoding, multi-region brain encoding."
category: ai_collection

MoDAl: Self-Supervised Neural Modality Discovery for Speech Neuroprosthesis

Paper

Title: MoDAl: Self-Supervised Neural Modality Discovery via Decorrelation for Speech Neuroprosthesis
arXiv: 2605.00025v1
Authors: Yuanhao Chen, Peter Chin
Date: 2026-04-22 (submitted) / 2026-05-04 (cross-listed)
Categories: q-bio.NC, cs.CL, cs.HC, cs.LG
ACM Classes: I.2.6; H.5.2; J.3

Abstract

Speech neuroprosthesis systems decode intended speech from neural activity, offering communication restoration for speech-impaired individuals. Current approaches decode predominantly from motor cortical areas, discarding others — such as area 44 (Broca's area) — that may encode complementary linguistic information. MoDAl discovers complementary neural modalities through two objectives in a shared projection space: (1) contrastive loss aligns parallel brain encoders with LLM text embeddings, (2) decorrelation loss prevents encoders from coalescing to duplicative representations. The framework proves these objectives are in productive tension: contrastive alignment induces transitive modality coalescence, which decorrelation must counteract. On Brain-to-Text Benchmark '24, MoDAl reduces WER from 26.3% to 21.6%. Area 44 signals capture structural/syntactic properties (sentence length, grammatical voice, wh-words).

Core Problem: Single-Region Decoding Limitation

Current Approaches

Most brain-to-text systems:

Decode from motor cortex only — primary motor areas for articulation
Discard non-motor regions — Broca's area, Wernicke's area, etc.
Single encoder architecture — one neural encoder for all input

Problem: Non-motor regions encode complementary linguistic information:

Area 44 (Broca's): Syntactic structure, grammatical processing
Sensory cortex: Phonological representations
Prefrontal cortex: Semantic planning, discourse structure

MoDAl Framework Architecture

Dual-Objective Learning

┌─────────────────────────────────────────────┐
│              Shared Projection Space         │
├──────────────┬──────────────┬───────────────┤
│  Encoder 1   │  Encoder 2   │  Encoder 3    │
│  (Motor)     │  (Area 44)   │  (Other)      │
│  Neural → Z1 │  Neural → Z2 │  Neural → Z3  │
└──────┬───────┴──────┬───────┴───────┬───────┘
       │              │               │
       ▼              ▼               ▼
┌─────────────────────────────────────────────┐
│         LLM Text Embedding Target           │
│         (frozen pretrained LLM)             │
└─────────────────────────────────────────────┘

Objective 1: Contrastive Alignment

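import torch
import torch.nn.functional as F

# Each encoder maps neural activity into the shared text embedding space
def contrastive_loss(encoder_outputs, text_embeddings, temperature=0.07):
    """
    Align one brain encoder's output with LLM text embeddings.
    Uses an InfoNCE-style contrastive loss over a batch of paired samples.
    """
    # Assumption: both sides are L2-normalized, so the logits are cosine
    # similarities scaled by the temperature
    z = F.normalize(encoder_outputs, dim=-1)
    t = F.normalize(text_embeddings, dim=-1)
    similarities = z @ t.T / temperature
    # Positive pairs lie on the diagonal (row i matches text i);
    # every other entry in a row is a mismatched (negative) pair
    targets = torch.arange(len(t), device=similarities.device)
    return F.cross_entropy(similarities, targets)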

Objective 2: Decorrelation Loss

def decorrelation_loss(encoder_outputs_list, eps=1e-6):
    """
    Prevent encoders from learning redundant representations.
    Penalizes the off-diagonal blocks of the cross-encoder correlation matrix,
    i.e. correlations between features of different encoders.
    """
    # Standardize each feature over the batch so the products below are
    # correlations rather than raw second moments
    normed = [
        (z - z.mean(dim=0)) / (z.std(dim=0, unbiased=False) + eps)
        for z in encoder_outputs_list
    ]
    batch_size = normed[0].shape[0]

    decorr = 0.0
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            # Cross-correlation block between encoder i and encoder j
            block = normed[i].T @ normed[j] / batch_size
            decorr = decorr + (block ** 2).mean()

    return decorr

Combined Loss

def modal_loss(encoders, neural_data, text_embeddings, alpha=0.1):
    """
    MoDAl combined loss: contrastive alignment + decorrelation.
    
    The key insight: these objectives are in productive tension.
    - Contrastive pushes all encoders toward the same target (coalescence)
    - Decorrelation pulls them apart (diversification)
    - The balance discovers functionally specialized modalities
    """
    encoder_outputs = [enc(neural_data[i]) for i, enc in enumerate(encoders)]
    
    # Contrastive: align each encoder to text
    contrastive = sum(contrastive_loss(out, text_embeddings) 
                      for out in encoder_outputs)
    
    # Decorrelation: prevent redundancy
    decorr = decorrelation_loss(encoder_outputs)
    
    return contrastive + alpha * decorr

Theoretical Foundation: Transitive Modality Coalescence

The Proof

MoDAl proves that contrastive alignment alone causes transitive coalescence:

If encoder A aligns to the text embedding and encoder B aligns to the same text embedding, then A and B become correlated with each other (the transitive step)
Without decorrelation, the encoders drift toward duplicative representations of the same target
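
One elementary way to see the transitive step, offered as an illustration and not as the paper's proof: for unit vectors a, b and t, angles on the sphere obey the triangle inequality, so cos(a, b) >= cos(a, t) cos(b, t) - sin(a, t) sin(b, t). The closer each encoder output gets to the text target, the closer the two outputs are forced to each other. The sketch below checks this bound numerically; `noisy_copy` is a hypothetical stand-in for a well-aligned encoder, and all data are synthetic.

```python
import torch
import torch.nn.functional as F

def coalescence_lower_bound(cos_at, cos_bt):
    """Lower bound on cos(a, b) given cos(a, t) and cos(b, t) for unit vectors."""
    sin_at = torch.sqrt(torch.clamp(1 - cos_at ** 2, min=0.0))
    sin_bt = torch.sqrt(torch.clamp(1 - cos_bt ** 2, min=0.0))
    return cos_at * cos_bt - sin_at * sin_bt

def noisy_copy(t, noise, generator):
    """Hypothetical 'aligned encoder output': the target plus Gaussian noise, renormalized."""
    return F.normalize(t + noise * torch.randn(t.shape, generator=generator), dim=-1)

g = torch.Generator().manual_seed(0)
t = F.normalize(torch.randn(1000, 64, generator=g), dim=-1)  # synthetic text embeddings
a = noisy_copy(t, 0.05, g)  # encoder A, independently perturbed
b = noisy_copy(t, 0.05, g)  # encoder B, independently perturbed

cos_at = (a * t).sum(-1)
cos_bt = (b * t).sum(-1)
cos_ab = (a * b).sum(-1)
bound = coalescence_lower_bound(cos_at, cos_bt)

print("mean cos(a, t):", cos_at.mean().item())
print("mean cos(a, b):", cos_ab.mean().item())
print("mean lower bound:", bound.mean().item())
```

Even though A and B never see each other, their similarity cannot fall below the bound, which is why a separate term is needed to keep them apart.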

The Counteracting Mechanism

Decorrelation loss specifically targets the off-diagonal correlations between encoders:

Forces each encoder to capture unique aspects of the text representation
Creates functional specialization — each encoder becomes expert in different linguistic features
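
To make the penalty concrete, the following sketch compares a coalesced pair of encoders with an independent pair. It assumes the penalty is the mean squared cross-correlation between standardized features; the function name `cross_encoder_penalty` and the synthetic tensors are illustrative, not taken from the paper.

```python
import torch

def cross_encoder_penalty(z_a, z_b, eps=1e-6):
    """Mean squared cross-correlation between two (batch, dim) embedding batches."""
    z_a = (z_a - z_a.mean(dim=0)) / (z_a.std(dim=0, unbiased=False) + eps)
    z_b = (z_b - z_b.mean(dim=0)) / (z_b.std(dim=0, unbiased=False) + eps)
    corr = z_a.T @ z_b / z_a.shape[0]  # (dim, dim) cross-correlation block
    return (corr ** 2).mean()

g = torch.Generator().manual_seed(0)
motor = torch.randn(4096, 16, generator=g)
# A second encoder that has coalesced onto the first (near copy)
coalesced = motor + 0.1 * torch.randn(4096, 16, generator=g)
# A second encoder whose features are unrelated to the first
independent = torch.randn(4096, 16, generator=g)

penalty_coalesced = cross_encoder_penalty(motor, coalesced).item()
penalty_independent = cross_encoder_penalty(motor, independent).item()
print("coalesced pair:", penalty_coalesced)
print("independent pair:", penalty_independent)
```

The coalesced pair is penalized far more heavily than the independent pair, so gradient descent on this term pushes the encoders away from duplicating each other.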

Results

Brain-to-Text Benchmark '24

Method	WER	Change in WER
Previous best (end-to-end)	26.3%	(baseline)
MoDAl (with Area 44)	21.6%	-4.7 points absolute (about 18% relative)
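
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The sketch below is a minimal standalone implementation plus the arithmetic behind the table; it is not the benchmark's official scoring script, and it assumes whitespace tokenization and a non-empty reference.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at zero cost
            ))
        prev = curr
    return prev[-1] / len(ref)

# Arithmetic behind the reported numbers
baseline_wer, modal_wer = 26.3, 21.6
absolute_gain = baseline_wer - modal_wer      # percentage points
relative_gain = absolute_gain / baseline_wer  # fraction of the baseline error

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
print(round(absolute_gain, 1), round(100 * relative_gain, 1))
```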

Discovered Modality Specialization

Analysis of learned representations reveals:

Brain Region	Discovered Specialization	Linguistic Feature
Motor cortex	Articulatory planning	Phoneme sequences, timing
Area 44 (Broca's)	Syntactic structure	Sentence length, grammatical voice, wh-words
Other regions	Complementary features	Semantic context, prosody

Key finding: The performance gain from incorporating Area 44 arises entirely from the decorrelation mechanism. Without decorrelation, the Area 44 encoder coalesces with the motor encoder and provides no additional information.

Implementation Guide

Prerequisites

Pretrained LLM (for text embeddings, e.g., LLaMA, GPT)
Multi-region neural recordings (motor + non-motor)
Paired neural-text data (e.g., speech production tasks)

Step-by-Step

Prepare neural data: Segment by utterance, align with text transcripts
Initialize encoders: One per brain region (e.g., 1D CNN, Transformer, or LSTM)
Freeze LLM: Use pretrained LLM for text embeddings only (no fine-tuning)
Train with dual loss: Keep the contrastive weight at 1.0 and set the decorrelation weight α to about 0.1
Validate specialization: Analyze encoder outputs for functional differences
Decode: Use ensemble of encoder outputs for final text generation
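
The training step in the list above can be sketched end to end on synthetic data. Everything here is a toy stand-in under stated assumptions: linear layers replace the real region encoders, random tensors replace neural recordings and frozen LLM embeddings, and the helper names `info_nce` and `cross_corr_penalty` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(z, t, temperature=0.07):
    """InfoNCE over a batch: row i of z should match row i of t."""
    z, t = F.normalize(z, dim=-1), F.normalize(t, dim=-1)
    logits = z @ t.T / temperature
    return F.cross_entropy(logits, torch.arange(len(t), device=logits.device))

def cross_corr_penalty(z_a, z_b, eps=1e-6):
    """Mean squared cross-correlation between two encoders' standardized features."""
    z_a = (z_a - z_a.mean(dim=0)) / (z_a.std(dim=0, unbiased=False) + eps)
    z_b = (z_b - z_b.mean(dim=0)) / (z_b.std(dim=0, unbiased=False) + eps)
    return ((z_a.T @ z_b / z_a.shape[0]) ** 2).mean()

torch.manual_seed(0)
batch, n_channels, proj_dim, alpha = 64, 32, 16, 0.1

# Synthetic stand-ins for frozen LLM text embeddings and two recorded regions
text_emb = torch.randn(batch, proj_dim)
regions = [torch.randn(batch, n_channels) for _ in range(2)]
encoders = nn.ModuleList([nn.Linear(n_channels, proj_dim) for _ in range(2)])
optimizer = torch.optim.Adam(encoders.parameters(), lr=1e-2)

history = []
for step in range(200):
    outputs = [enc(x) for enc, x in zip(encoders, regions)]
    contrastive = sum(info_nce(z, text_emb) for z in outputs)  # weight 1.0
    decorr = cross_corr_penalty(outputs[0], outputs[1])        # weight alpha
    loss = contrastive + alpha * decorr
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    history.append(loss.item())

print("first loss:", history[0], "last loss:", history[-1])
```

With real data the text embeddings would be computed once from the frozen LLM and cached, since no gradient flows into them.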

Hyperparameters

Parameter	Role	Suggested Value
α (decorrelation weight)	Balance alignment vs. diversity	0.05 - 0.2
Temperature	Contrastive sharpness	0.05 - 0.1
Encoder architecture	Neural representation capacity	1D-CNN or lightweight Transformer
Projection dimension	Shared space dimensionality	256 - 512
LLM model	Text embedding source	Any pretrained causal LM
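
One possible way to carry these settings through a codebase is a small frozen dataclass that flags values outside the suggested ranges. The class name, field names, and the `llm_name` placeholder are hypothetical; only the ranges come from the table above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MoDAlConfig:
    alpha: float = 0.1            # decorrelation weight
    temperature: float = 0.07     # contrastive sharpness
    projection_dim: int = 256     # shared space dimensionality
    encoder_arch: str = "cnn1d"   # hypothetical label for the encoder type
    llm_name: str = "placeholder-causal-lm"  # placeholder, not a real model id

    def out_of_range(self):
        """Return the names of fields that fall outside the suggested ranges."""
        checks = {
            "alpha": 0.05 <= self.alpha <= 0.2,
            "temperature": 0.05 <= self.temperature <= 0.1,
            "projection_dim": 256 <= self.projection_dim <= 512,
        }
        return [name for name, ok in checks.items() if not ok]

default_issues = MoDAlConfig().out_of_range()
aggressive_issues = MoDAlConfig(alpha=1.0, projection_dim=64).out_of_range()
print(default_issues, aggressive_issues)
```

The check only warns about departures from the suggested values; settings outside these ranges may still be reasonable for a given dataset.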

Code Structure

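import torch
import torch.nn as nn
import torch.nn.functional as F

class MoDAlFramework(nn.Module):
    def __init__(self, n_regions, text_embedding_dim=512, temperature=0.07, alpha=0.1):
        super().__init__()
        self.temperature = temperature  # contrastive sharpness
        self.alpha = alpha              # decorrelation weight
        # One encoder per brain region; NeuralEncoder is a placeholder for
        # whichever architecture is chosen (1D-CNN, Transformer, LSTM)
        self.encoders = nn.ModuleList([
            NeuralEncoder(output_dim=text_embedding_dim)
            for _ in range(n_regions)
        ])

    def _decorrelation_loss(self, region_embeddings):
        # Delegates to decorrelation_loss defined under Objective 2
        return decorrelation_loss(region_embeddings)

    def forward(self, neural_data_per_region, text_embeddings):
        # Encode each region
        region_embeddings = [
            enc(data) for enc, data in zip(self.encoders, neural_data_per_region)
        ]

        # Contrastive loss: matching neural-text pairs lie on the diagonal
        targets = torch.arange(len(text_embeddings), device=text_embeddings.device)
        contrastive = sum(
            F.cross_entropy(emb @ text_embeddings.T / self.temperature, targets)
            for emb in region_embeddings
        )

        # Decorrelation loss
        decorr = self._decorrelation_loss(region_embeddings)

        return contrastive + self.alpha * decorr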
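
The class above calls `NeuralEncoder` without defining it. A minimal sketch of one option follows, assuming a 1D-CNN over inputs shaped (batch, channels, time) with mean pooling over time. The layer sizes, `in_channels=64`, and `hidden=128` are arbitrary illustrative choices, not values from the paper.

```python
import torch
import torch.nn as nn

class NeuralEncoder(nn.Module):
    """Hypothetical 1D-CNN encoder: (batch, channels, time) -> (batch, output_dim)."""

    def __init__(self, output_dim, in_channels=64, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=5, padding=2),
            nn.GELU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.GELU(),
        )
        self.proj = nn.Linear(hidden, output_dim)

    def forward(self, x):
        h = self.conv(x)    # (batch, hidden, time)
        h = h.mean(dim=-1)  # average over time, so any utterance length works
        return self.proj(h)  # project into the shared text embedding space

encoder = NeuralEncoder(output_dim=512)
z = encoder(torch.randn(8, 64, 200))  # 8 utterances, 64 channels, 200 time bins
print(z.shape)
```

Mean pooling is the simplest way to collapse the time axis; attention pooling or a recurrent readout are alternatives when timing information matters.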

Applications

Speech neuroprosthesis: decode intended speech for patients with ALS or locked-in syndrome
Multi-region BCI: leverage complementary information from different brain areas
Neurolinguistic research: discover functional specialization across brain regions
Self-supervised learning for neural data: modalities are discovered without manual modality labels; only paired neural-text data are required
Brain-LLM alignment: map neural representations to language model space

Related Skills

brain-to-text-unified-decoding: Unified brain-to-text framework
iphoneme-brain-to-text-als-conformerxl: Brain-to-text for ALS
eeg2vision-multimodal-eeg-framework-2d-visual: EEG-to-image reconstruction
llm-eeg-graph-refinement: LLM as clinical graph refiner

Pitfalls

Alpha too high: Encoders become too independent, lose alignment to text
Alpha too low: Encoders coalesce, losing the benefit of multiple regions
Insufficient data: Contrastive learning requires substantial paired data
LLM mismatch: Text embeddings from mismatched LLM may not align with neural representations
Region selection: Not all brain regions provide complementary information — careful selection needed
Temporal alignment: Neural data must be precisely aligned with text timestamps