modal-neural-modality-discovery

MoDAl (Modality Decorrelation and Alignment) framework for self-supervised neural modality discovery in speech neuroprosthesis. Uses contrastive alignment with LLM text embeddings + decorrelation loss to discover complementary neurolinguistic modalities from multiple brain regions. Key innovation: proves contrastive alignment induces transitive modality coalescence, which decorrelation must counteract. Trigger words: MoDAl, neural modality discovery, speech neuroprosthesis, brain-to-text decoding, contrastive decorrelation, Broca area decoding, multi-region brain encoding.

By hiyenwong · Updated 6/4/2026

name: modal-neural-modality-discovery
description: "MoDAl (Modality Decorrelation and Alignment) framework for self-supervised neural modality discovery in speech neuroprosthesis. Uses contrastive alignment with LLM text embeddings + decorrelation loss to discover complementary neurolinguistic modalities from multiple brain regions. Key innovation: proves contrastive alignment induces transitive modality coalescence, which decorrelation must counteract. Trigger words: MoDAl, neural modality discovery, speech neuroprosthesis, brain-to-text decoding, contrastive decorrelation, Broca area decoding, multi-region brain encoding."
category: ai_collection

MoDAl: Self-Supervised Neural Modality Discovery for Speech Neuroprosthesis

Paper

  • Title: MoDAl: Self-Supervised Neural Modality Discovery via Decorrelation for Speech Neuroprosthesis
  • arXiv: 2605.00025v1
  • Authors: Yuanhao Chen, Peter Chin
  • Date: 2026-04-22 (submitted) / 2026-05-04 (cross-listed)
  • Categories: q-bio.NC, cs.CL, cs.HC, cs.LG
  • ACM Classes: I.2.6; H.5.2; J.3

Abstract

Speech neuroprosthesis systems decode intended speech from neural activity, offering communication restoration for speech-impaired individuals. Current approaches decode predominantly from motor cortical areas, discarding others — such as area 44 (Broca's area) — that may encode complementary linguistic information. MoDAl discovers complementary neural modalities through two objectives in a shared projection space: (1) contrastive loss aligns parallel brain encoders with LLM text embeddings, (2) decorrelation loss prevents encoders from coalescing to duplicative representations. The framework proves these objectives are in productive tension: contrastive alignment induces transitive modality coalescence, which decorrelation must counteract. On Brain-to-Text Benchmark '24, MoDAl reduces WER from 26.3% to 21.6%. Area 44 signals capture structural/syntactic properties (sentence length, grammatical voice, wh-words).

Core Problem: Single-Region Decoding Limitation

Current Approaches

Most brain-to-text systems:

  1. Decode from motor cortex only — primary motor areas for articulation
  2. Discard non-motor regions — Broca's area, Wernicke's area, etc.
  3. Single encoder architecture — one neural encoder for all input

Problem: Non-motor regions encode complementary linguistic information:

  • Area 44 (Broca's): Syntactic structure, grammatical processing
  • Sensory cortex: Phonological representations
  • Prefrontal cortex: Semantic planning, discourse structure

MoDAl Framework Architecture

Dual-Objective Learning

┌─────────────────────────────────────────────┐
│              Shared Projection Space         │
├──────────────┬──────────────┬───────────────┤
│  Encoder 1   │  Encoder 2   │  Encoder 3    │
│  (Motor)     │  (Area 44)   │  (Other)      │
│  Neural → Z1 │  Neural → Z2 │  Neural → Z3  │
└──────┬───────┴──────┬───────┴───────┬───────┘
       │              │               │
       ▼              ▼               ▼
┌─────────────────────────────────────────────┐
│         LLM Text Embedding Target           │
│         (frozen pretrained LLM)             │
└─────────────────────────────────────────────┘

Objective 1: Contrastive Alignment

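import torch
import torch.nn.functional as F

# Each encoder maps neural activity into the shared text-embedding space
def contrastive_loss(encoder_outputs, text_embeddings, temperature=0.07):
    """
    Align one brain encoder's output with LLM text embeddings.
    Uses an InfoNCE-style contrastive loss over a batch of paired samples.
    """
    # Assumption: similarity is cosine, so both sides are L2-normalized first
    z = F.normalize(encoder_outputs, dim=-1)
    t = F.normalize(text_embeddings, dim=-1)
    similarities = z @ t.T / temperature  # (batch, batch) logits
    # Positive pairs sit on the diagonal (matching neural-text pairs);
    # every off-diagonal entry is a mismatched (negative) pair
    targets = torch.arange(len(t), device=similarities.device)
    return F.cross_entropy(similarities, targets)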
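
To make the positive/negative structure concrete, here is a dependency-free sketch of the same InfoNCE computation on a tiny 3×3 similarity matrix. The function name `info_nce` and the example similarity values are made up for illustration; they are not taken from the paper.

```python
import math


def info_nce(similarities, temperature=0.07):
    """Mean cross-entropy in which row i's positive is column i."""
    losses = []
    for i, row in enumerate(similarities):
        logits = [s / temperature for s in row]
        # Log-sum-exp with the max subtracted for numerical stability
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Illustrative cosine similarities between 3 neural samples and 3 sentences
aligned = [[0.9, 0.1, 0.0],
           [0.2, 0.8, 0.1],
           [0.0, 0.1, 0.9]]
misaligned = [[0.1, 0.9, 0.0],
              [0.8, 0.2, 0.1],
              [0.0, 0.9, 0.1]]

print("aligned loss   :", round(info_nce(aligned), 4))
print("misaligned loss:", round(info_nce(misaligned), 4))
```

The loss is small when the largest entry of each row sits on the diagonal and grows when the strongest similarity points at a mismatched sentence.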

Objective 2: Decorrelation Loss

def decorrelation_loss(encoder_outputs_list, eps=1e-6):
    """
    Prevent encoders from learning redundant representations.
    Penalizes the cross-encoder (off-diagonal) blocks of the correlation matrix.
    Each element of encoder_outputs_list has shape (batch, dim).
    """
    # Standardize every feature over the batch so that the products below
    # are Pearson correlations rather than raw second moments
    normed = [(z - z.mean(dim=0)) / (z.std(dim=0, unbiased=False) + eps)
              for z in encoder_outputs_list]
    batch_size = normed[0].shape[0]

    decorr = normed[0].new_zeros(())
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            # Cross-correlation block between encoder i and encoder j
            block = normed[i].T @ normed[j] / batch_size
            decorr = decorr + (block ** 2).mean()

    return decorr

Combined Loss

def modal_loss(encoders, neural_data, text_embeddings, alpha=0.1):
    """
    MoDAl combined loss: contrastive alignment + decorrelation.
    
    The key insight: these objectives are in productive tension.
    - Contrastive pushes all encoders toward the same target (coalescence)
    - Decorrelation pulls them apart (diversification)
    - The balance discovers functionally specialized modalities
    """
    encoder_outputs = [enc(neural_data[i]) for i, enc in enumerate(encoders)]
    
    # Contrastive: align each encoder to text
    contrastive = sum(contrastive_loss(out, text_embeddings) 
                      for out in encoder_outputs)
    
    # Decorrelation: prevent redundancy
    decorr = decorrelation_loss(encoder_outputs)
    
    return contrastive + alpha * decorr

Theoretical Foundation: Transitive Modality Coalescence

The Proof

The paper proves that contrastive alignment alone causes transitive coalescence:

  • Encoder A is trained to align with the text embedding, and so is encoder B
  • Because both sit close to the same target, A and B become correlated with each other (the transitive step)
  • Without decorrelation, all encoders converge to identical representations
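
The intuition behind the transitive step can be checked numerically. The sketch below is not the paper's proof; it only uses the standard fact that angles between vectors obey the triangle inequality, so angle(A, B) ≤ angle(A, text) + angle(B, text). The helper names and the Gaussian-noise "encoders" are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def coalescence_bound(cos_a_text, cos_b_text):
    """Lower bound on cos(A, B) given each encoder's cosine to the text target."""
    theta_a = math.acos(max(-1.0, min(1.0, cos_a_text)))
    theta_b = math.acos(max(-1.0, min(1.0, cos_b_text)))
    # Cosine is decreasing on [0, pi], so an upper bound on the angle
    # becomes a lower bound on the cosine
    return math.cos(min(math.pi, theta_a + theta_b))


def perturb(target, noise, rng):
    """Hypothetical 'encoder output': the text target plus Gaussian noise."""
    return [x + rng.gauss(0.0, noise) for x in target]


rng = random.Random(0)
text = [rng.gauss(0.0, 1.0) for _ in range(16)]
for noise in (1.0, 0.3, 0.1):  # smaller noise = tighter alignment to text
    a, b = perturb(text, noise, rng), perturb(text, noise, rng)
    bound = coalescence_bound(cosine(a, text), cosine(b, text))
    print(f"noise={noise}: cos(A,B)={cosine(a, b):.3f} >= bound={bound:.3f}")
```

As alignment to the text target tightens, the lower bound on cos(A, B) approaches 1, which is the coalescence the decorrelation term is meant to resist.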

The Counteracting Mechanism

Decorrelation loss specifically targets the off-diagonal correlations between encoders:

  • Forces each encoder to capture unique aspects of the text representation
  • Creates functional specialization — each encoder becomes expert in different linguistic features
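
A small, dependency-free sketch of what the penalty measures: the mean squared Pearson correlation between the features of two encoders. The function names and the random data are assumptions for illustration; a duplicated encoder stands in for a coalesced one.

```python
import math
import random


def standardize(column):
    """Zero-mean, unit-variance version of one feature column."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var) or 1.0  # guard against constant features
    return [(x - mean) / std for x in column]


def cross_correlation_penalty(z1, z2):
    """Mean squared Pearson correlation between features of two encoders.

    z1 and z2 are lists of rows, one row per sample.
    """
    cols1 = [standardize(list(c)) for c in zip(*z1)]
    cols2 = [standardize(list(c)) for c in zip(*z2)]
    n = len(z1)
    total = 0.0
    for c1 in cols1:
        for c2 in cols2:
            corr = sum(a * b for a, b in zip(c1, c2)) / n
            total += corr ** 2
    return total / (len(cols1) * len(cols2))


rng = random.Random(0)
motor = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(500)]
coalesced = [row[:] for row in motor]  # second encoder copies the first
independent = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(500)]

print("coalesced  :", round(cross_correlation_penalty(motor, coalesced), 3))
print("independent:", round(cross_correlation_penalty(motor, independent), 3))
```

The duplicated encoder incurs a large penalty because every feature correlates perfectly with its copy, while statistically independent features score near zero. The same quantity can be logged during training as a coalescence diagnostic.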

Results

Brain-to-Text Benchmark '24

| Method | WER | Change in WER |
|---|---|---|
| Previous best (end-to-end) | 26.3% | (baseline) |
| MoDAl (with Area 44) | 21.6% | -4.7 percentage points |
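
For readers unfamiliar with the metric, the sketch below shows the standard definition of word error rate (word-level edit distance divided by reference length) and the arithmetic behind the reported gain. The example sentences are invented; only the 26.3% and 21.6% figures come from the results above.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words.

    Assumes the reference contains at least one word.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr.append(min(substitution, prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Fraction of the baseline error that was removed."""
    return (baseline - improved) / baseline


# One substitution ("is" -> "was") and one deletion ("now") over 5 words
print(word_error_rate("what time is it now", "what time was it"))  # 0.4

absolute = 26.3 - 21.6
relative = relative_reduction(26.3, 21.6)
print(f"{absolute:.1f} points absolute, {relative:.1%} relative")
# 4.7 points absolute, 17.9% relative
```

The 4.7-point drop corresponds to removing roughly 18% of the baseline's errors.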

Discovered Modality Specialization

Analysis of learned representations reveals:

| Brain Region | Discovered Specialization | Linguistic Feature |
|---|---|---|
| Motor cortex | Articulatory planning | Phoneme sequences, timing |
| Area 44 (Broca's) | Syntactic structure | Sentence length, grammatical voice, wh-words |
| Other regions | Complementary features | Semantic context, prosody |

Key finding: The performance gain from incorporating Area 44 arises entirely from the decorrelation mechanism. Without decorrelation, the Area 44 encoder coalesces with the motor encoder and provides no additional information.

Implementation Guide

Prerequisites

  • Pretrained LLM (for text embeddings, e.g., LLaMA, GPT)
  • Multi-region neural recordings (motor + non-motor)
  • Paired neural-text data (e.g., speech production tasks)

Step-by-Step

  1. Prepare neural data: Segment by utterance, align with text transcripts
  2. Initialize encoders: One per brain region (e.g., 1D CNN, Transformer, or LSTM)
  3. Freeze LLM: Use pretrained LLM for text embeddings only (no fine-tuning)
  4. Train with dual loss: Combine the contrastive term (weight 1.0) with the decorrelation term (α = 0.1)
  5. Validate specialization: Analyze encoder outputs for functional differences
  6. Decode: Use ensemble of encoder outputs for final text generation

Hyperparameters

| Parameter | Role | Suggested Value |
|---|---|---|
| α (decorrelation weight) | Balance alignment vs. diversity | 0.05 - 0.2 |
| Temperature | Contrastive sharpness | 0.05 - 0.1 |
| Encoder architecture | Neural representation capacity | 1D-CNN or lightweight Transformer |
| Projection dimension | Shared space dimensionality | 256 - 512 |
| LLM | Text embedding source | Any pretrained causal LM |
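
One way to keep these settings together is a small config object that flags values outside the suggested ranges. `MoDAlConfig` and its field names are hypothetical and not part of any published code; the defaults mirror values used elsewhere in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoDAlConfig:
    """Hypothetical container for the hyperparameters listed above."""
    alpha: float = 0.1           # decorrelation weight
    temperature: float = 0.07    # contrastive sharpness
    projection_dim: int = 512    # shared-space dimensionality
    encoder: str = "1d-cnn"      # or "transformer"

    def out_of_range(self):
        """Names of fields that fall outside the suggested ranges."""
        suggested = {
            "alpha": (0.05, 0.2),
            "temperature": (0.05, 0.1),
            "projection_dim": (256, 512),
        }
        return [name for name, (low, high) in suggested.items()
                if not low <= getattr(self, name) <= high]


config = MoDAlConfig(alpha=0.5)
print(config.out_of_range())  # ['alpha']
```

The ranges are starting points rather than hard limits, so the check reports field names instead of raising an error.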

Code Structure

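import torch
import torch.nn as nn
import torch.nn.functional as F

class MoDAlFramework(nn.Module):
    def __init__(self, n_regions, text_embedding_dim=512, temperature=0.07, alpha=0.1):
        super().__init__()
        self.temperature = temperature
        self.alpha = alpha
        # One encoder per brain region. NeuralEncoder is a placeholder for a
        # user-supplied nn.Module (e.g., a 1D-CNN); it is not defined here.
        self.encoders = nn.ModuleList([
            NeuralEncoder(output_dim=text_embedding_dim)
            for _ in range(n_regions)
        ])

    def _decorrelation_loss(self, region_embeddings, eps=1e-6):
        # Mean squared cross-correlation between every pair of encoders
        normed = [(z - z.mean(dim=0)) / (z.std(dim=0, unbiased=False) + eps)
                  for z in region_embeddings]
        loss = normed[0].new_zeros(())
        for i in range(len(normed)):
            for j in range(i + 1, len(normed)):
                block = normed[i].T @ normed[j] / normed[i].shape[0]
                loss = loss + (block ** 2).mean()
        return loss

    def forward(self, neural_data_per_region, text_embeddings):
        # Encode each region
        region_embeddings = [
            enc(data) for enc, data in zip(self.encoders, neural_data_per_region)
        ]

        # Contrastive loss (assumption: cosine similarity, hence the normalization)
        text = F.normalize(text_embeddings, dim=-1)
        targets = torch.arange(len(text), device=text.device)
        contrastive = sum(
            F.cross_entropy(F.normalize(emb, dim=-1) @ text.T / self.temperature, targets)
            for emb in region_embeddings
        )

        # Decorrelation loss
        decorr = self._decorrelation_loss(region_embeddings)

        return contrastive + self.alpha * decorr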

Applications

  • Speech neuroprosthesis — decode speech for ALS, locked-in syndrome patients
  • Multi-region BCI — leverage complementary information from different brain areas
  • Neurolinguistic research — discover functional specialization across brain regions
  • Self-supervised learning for neural data — no manual annotation needed
  • Brain-LLM alignment — map neural representations to language model space

Related Skills

  • brain-to-text-unified-decoding: Unified brain-to-text framework
  • iphoneme-brain-to-text-als-conformerxl: Brain-to-text for ALS
  • eeg2vision-multimodal-eeg-framework-2d-visual: EEG-to-image reconstruction
  • llm-eeg-graph-refinement: LLM as clinical graph refiner

Pitfalls

  • Alpha too high: Encoders become too independent, lose alignment to text
  • Alpha too low: Encoders coalesce, losing the benefit of multiple regions
  • Insufficient data: Contrastive learning requires substantial paired data
  • LLM mismatch: Text embeddings from mismatched LLM may not align with neural representations
  • Region selection: Not all brain regions provide complementary information — careful selection needed
  • Temporal alignment: Neural data must be precisely aligned with text timestamps
Install via CLI
npx skills add https://github.com/hiyenwong/ai_collection --skill modal-neural-modality-discovery