cross-modal-dispersion-convergence


Cross-modal convergence analysis methodology using Generalized Procrustes Algorithm to measure intra-modal representational convergence at single-stimulus level. Reveals how low intra-modal dispersion (high agreement among vision models) elicits significantly higher cross-modal alignment between vision and language models. Activation: cross-modal convergence, representational alignment, Procrustes analysis, vision-language alignment, neural representation, single-stimulus analysis

By hiyenwong, updated 6/3/2026


Cross-Modal Dispersion and Convergence Analysis

Overview

This methodology introduces a framework for understanding how individual stimuli elicit convergent representations across different neural networks and modalities. Using the Generalized Procrustes Algorithm (GPA), it measures intra-modal representational convergence at the single-stimulus level to reveal how stimulus-specific agreement modulates cross-modal alignment.

Key Insight

Low intra-modal dispersion → High cross-modal alignment

Stimuli on which vision models agree strongly (low intra-modal dispersion) elicit significantly higher cross-modal alignment between vision and language models than stimuli with high dispersion, with up to a 2x improvement for DINOv2-language model pairings.

Core Methodology

1. Generalized Procrustes Algorithm (GPA)

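The GPA aligns multiple configurations through iterative transformations (translation, rotation, scaling, reflection) that minimize the sum of squared distances between corresponding points. The implementation below applies only centering (translation) and an orthogonal transform (rotation and reflection); it does not rescale the configurations.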
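For reference, the objective without the scaling step can be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: $X_k$ is the centered (stimuli by features) matrix of model $k$, $R_k$ its orthogonal transform, and $M$ the consensus configuration.

```latex
\min_{M,\; R_1,\dots,R_K} \; \sum_{k=1}^{K} \lVert X_k R_k - M \rVert_F^2
\quad \text{subject to} \quad R_k^\top R_k = I .
% For fixed R_1, \dots, R_K the minimizing consensus is the mean:
M = \frac{1}{K} \sum_{k=1}^{K} X_k R_k .
```

The iteration alternates between the two parts: fit each $R_k$ to the current $M$, then recompute $M$ as the mean of the aligned matrices.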

def generalized_procrustes_analysis(representations_list, iterations=1000):
    """
    Align multiple representation spaces to a common reference
    
    Uses centering plus an orthogonal transform (rotation/reflection);
    no scaling step is applied.
    
    Parameters:
    - representations_list: List of representation matrices from different models,
                            all with the same shape (n_stimuli x n_features)
    - iterations: Maximum iterations for convergence
    
    Returns:
    - aligned_representations: List of aligned (and centered) representation matrices
    - mean_configuration: Consensus configuration
    """
    import numpy as np
    from scipy.linalg import orthogonal_procrustes
    
    reps = [np.asarray(rep, dtype=float) for rep in representations_list]
    if len({rep.shape for rep in reps}) != 1:
        raise ValueError("All representation matrices must have the same shape")
    
    # Center each representation once; this removes the translation component
    centered = [rep - rep.mean(axis=0) for rep in reps]
    
    # Initialize mean configuration
    mean_config = np.mean(centered, axis=0)
    aligned = centered
    
    # Iteratively align to mean configuration
    for _ in range(iterations):
        aligned = []
        for rep in centered:
            # Orthogonal Procrustes: R minimizes ||rep @ R - mean_config||_F
            R, _ = orthogonal_procrustes(rep, mean_config)
            aligned.append(rep @ R)
        
        # Update mean configuration
        new_mean = np.mean(aligned, axis=0)
        
        # Check convergence, keeping the mean consistent with `aligned`
        converged = np.allclose(mean_config, new_mean, rtol=1e-6)
        mean_config = new_mean
        if converged:
            break
    
    return aligned, mean_config
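Each GPA iteration relies on a single orthogonal Procrustes fit. As a quick sanity check of that inner step, the following sketch builds hypothetical toy data (a random matrix and a rotated copy of it, both invented for illustration) and confirms that `scipy.linalg.orthogonal_procrustes` recovers the rotation.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)

# Hypothetical toy data: 50 stimuli embedded in 8 dimensions, centered.
base = rng.standard_normal((50, 8))
base -= base.mean(axis=0)

# A second "model": the same geometry under a random orthogonal transform.
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
rotated = base @ Q

# orthogonal_procrustes(A, B) returns the orthogonal R minimizing ||A @ R - B||_F.
R, _ = orthogonal_procrustes(rotated, base)
residual = np.linalg.norm(rotated @ R - base)
print(f"residual after alignment: {residual:.2e}")
```

When two models differ only by a rotation or reflection, the residual is numerically zero, so any dispersion that remains after alignment reflects genuine disagreement rather than an arbitrary choice of coordinate axes.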

2. Intra-Modal Dispersion Calculation

def compute_intra_modal_dispersion(representations_list, stimulus_idx=None):
    """
    Compute intra-modal dispersion for stimuli
    
    For each stimulus, measures how much different models disagree
    about its representation (after Procrustes alignment)
    
    Parameters:
    - representations_list: List of aligned representation matrices from
                            models within the same modality (e.g., vision)
    - stimulus_idx: Specific stimulus index (None for all stimuli)
    
    Returns:
    - dispersion: Per-stimulus dispersion scores (a scalar if stimulus_idx is given)
    """
    import numpy as np
    
    # Stack to shape (n_models, n_stimuli, n_features)
    stacked = np.stack(representations_list)
    
    # Variance across models for each feature, averaged over features
    dispersion = stacked.var(axis=0).mean(axis=-1)
    
    if stimulus_idx is not None:
        # Single stimulus dispersion
        return dispersion[stimulus_idx]
    
    # All stimuli dispersion
    return dispersion
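To make the dispersion score concrete, here is a minimal sketch on hypothetical toy data in which the models already share a coordinate system. The noise scales and array sizes are assumptions chosen for illustration: the first three stimuli are represented almost identically by every model, the last three are not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_stimuli, n_features = 4, 6, 16

# Hypothetical aligned representations: a shared signal plus model-specific noise.
shared = rng.standard_normal((n_stimuli, n_features))
noise_scale = np.array([0.01, 0.01, 0.01, 1.0, 1.0, 1.0])  # last 3 are "contested"
reps = np.stack([
    shared + noise_scale[:, None] * rng.standard_normal((n_stimuli, n_features))
    for _ in range(n_models)
])  # shape: (n_models, n_stimuli, n_features)

# Variance across models for each feature, averaged over features.
dispersion = reps.var(axis=0).mean(axis=-1)  # shape: (n_stimuli,)

# Median split into low- and high-dispersion stimuli.
low_mask = dispersion <= np.median(dispersion)
print(np.round(dispersion, 4))
print(low_mask)
```

The median split is only one possible way to group stimuli; percentile bins or the continuous score may suit other analyses better.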

3. Cross-Modal Alignment Measurement

def measure_cross_modal_alignment(vision_reps, language_reps):
    """
    Measure alignment between vision and language representations
    
    Parameters:
    - vision_reps: Vision model representations (n_stimuli x n_features)
    - language_reps: Language model representations (n_stimuli x n_features)
    
    Returns:
    - alignment_score: Cross-modal alignment score
    """
    from scipy.stats import pearsonr
    from scipy.spatial.distance import pdist
    
    # Compute representational dissimilarity matrices (RDMs) in condensed form
    vision_rdm = pdist(vision_reps, metric='correlation')
    language_rdm = pdist(language_reps, metric='correlation')
    
    # Correlation between RDMs (alignment score)
    alignment, _ = pearsonr(vision_rdm, language_rdm)
    
    return alignment
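Because the comparison runs on stimulus-by-stimulus distance matrices, the two models do not need the same feature dimensionality. The sketch below illustrates this with hypothetical toy data: a shared latent structure projected into a 32-dimensional "vision" space and a 12-dimensional "language" space, plus an unrelated control. The helper name `rdm_alignment` and all sizes are assumptions made for this example.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

def rdm_alignment(reps_a, reps_b):
    """Pearson correlation between condensed correlation-distance RDMs."""
    r, _ = pearsonr(pdist(reps_a, metric="correlation"),
                    pdist(reps_b, metric="correlation"))
    return r

rng = np.random.default_rng(0)
latent = rng.standard_normal((40, 5))             # hypothetical shared structure
vision = latent @ rng.standard_normal((5, 32))    # 32-d "vision" features
language = latent @ rng.standard_normal((5, 12))  # 12-d "language" features
unrelated = rng.standard_normal((40, 12))         # control with no shared structure

related_score = rdm_alignment(vision, language)
unrelated_score = rdm_alignment(vision, unrelated)
print(f"related: {related_score:.3f}, unrelated: {unrelated_score:.3f}")
```

A rank-based correlation such as `scipy.stats.spearmanr` is a common alternative when the relationship between the two RDMs is monotonic but not linear.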

4. Complete Analysis Pipeline

def cross_modal_dispersion_analysis(
    vision_models_reps,      # Dict: {model_name: representations}
    language_models_reps,    # Dict: {model_name: representations}
    stimulus_labels=None,
    per_stimulus_alignment_fn=None
):
    """
    Full cross-modal dispersion and convergence analysis
    
    Parameters:
    - vision_models_reps: Dictionary of vision model representations
    - language_models_reps: Dictionary of language model representations
    - stimulus_labels: Optional stimulus category labels (currently unused)
    - per_stimulus_alignment_fn: Optional callable taking (vision_reps, language_reps)
      and returning one alignment score per stimulus. This document does not
      define such a score, so the hook is an assumption; when it is None,
      'per_stimulus_correlation' is set to None.
    
    Returns:
    - results: List of dicts, one per vision-language model pair
    """
    # Step 1: Align vision models within modality
    vision_list = list(vision_models_reps.values())
    aligned_vision, vision_mean = generalized_procrustes_analysis(vision_list)
    
    # Step 2: Align language models within modality  
    language_list = list(language_models_reps.values())
    aligned_language, language_mean = generalized_procrustes_analysis(language_list)
    
    # Step 3: Compute intra-modal dispersion for each stimulus
    vision_dispersion = compute_intra_modal_dispersion(aligned_vision)
    language_dispersion = compute_intra_modal_dispersion(aligned_language)
    
    # Step 4: Analyze cross-modal alignment vs dispersion
    results = []
    for vision_model_name, vision_aligned in zip(vision_models_reps.keys(), aligned_vision):
        for lang_model_name, lang_aligned in zip(language_models_reps.keys(), aligned_language):
            alignment = measure_cross_modal_alignment(vision_aligned, lang_aligned)
            
            # Relating dispersion to alignment needs one alignment score per
            # stimulus; the scalar `alignment` above cannot be used for that
            per_stimulus_correlation = None
            if per_stimulus_alignment_fn is not None:
                per_stimulus_alignment = per_stimulus_alignment_fn(vision_aligned, lang_aligned)
                per_stimulus_correlation = compute_dispersion_alignment_correlation(
                    vision_dispersion, per_stimulus_alignment
                )
            
            results.append({
                'vision_model': vision_model_name,
                'language_model': lang_model_name,
                'alignment': alignment,
                'vision_dispersion': vision_dispersion.mean(),
                'language_dispersion': language_dispersion.mean(),
                'per_stimulus_correlation': per_stimulus_correlation
            })
    
    return results


def compute_dispersion_alignment_correlation(dispersion, alignment_scores):
    """
    Correlation between dispersion and alignment at single-stimulus level
    
    Both arguments are arrays with one entry per stimulus
    """
    from scipy.stats import pearsonr
    
    # Lower dispersion should go with higher alignment,
    # so the raw correlation is expected to be negative
    correlation, p_value = pearsonr(dispersion, alignment_scores)
    
    return {
        'correlation': correlation,
        'p_value': p_value,
        'interpretation': 'Negative correlation expected: low dispersion → high alignment'
    }
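`compute_dispersion_alignment_correlation` needs one alignment score per stimulus, which this document does not define. One possible definition, offered purely as an assumption and not as the paper's method, is to correlate each stimulus's row of the vision RDM with the matching row of the language RDM. The helper name `per_stimulus_rdm_alignment` and the random inputs are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import pearsonr

def per_stimulus_rdm_alignment(vision_reps, language_reps):
    """Hypothetical per-stimulus score: for stimulus i, correlate its distances
    to all other stimuli in the vision space with those in the language space."""
    vision_rdm = squareform(pdist(vision_reps, metric="correlation"))
    language_rdm = squareform(pdist(language_reps, metric="correlation"))
    n = vision_rdm.shape[0]
    scores = np.empty(n)
    for i in range(n):
        others = np.arange(n) != i  # drop the zero self-distance
        scores[i], _ = pearsonr(vision_rdm[i, others], language_rdm[i, others])
    return scores

rng = np.random.default_rng(0)
vision = rng.standard_normal((30, 20))    # placeholder vision representations
language = rng.standard_normal((30, 10))  # placeholder language representations
dispersion = rng.random(30)               # placeholder per-stimulus dispersion

scores = per_stimulus_rdm_alignment(vision, language)
r, p = pearsonr(dispersion, scores)       # expected negative on real data, per the text
print(scores.shape, round(float(r), 3))
```

With random placeholders the correlation carries no meaning; the point is only the shape of the computation, with one score per stimulus on each side.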

Key Findings

1. Intra-Modal Dispersion Modulates Cross-Modal Alignment

| Dispersion Level | Cross-Modal Alignment (Example) |
| --- | --- |
| Low (high agreement) | ~2x higher alignment (DINOv2 + language models) |
| High (low agreement) | Lower alignment |
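A low-versus-high comparison of this kind could be set up as in the sketch below. The data are synthetic and deliberately constructed so that half the stimuli are noisy in the vision models; the output therefore illustrates the procedure only and does not reproduce the reported result. All names and sizes are assumptions, and the toy models share a coordinate system, so no Procrustes step is needed here.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_stimuli, n_latent = 200, 6
latent = rng.standard_normal((n_stimuli, n_latent))

# Toy assumption: the second half of the stimuli is noisy in every vision model.
noise_scale = np.where(np.arange(n_stimuli) < n_stimuli // 2, 0.1, 5.0)
w_vision = rng.standard_normal((n_latent, 24))
vision_models = [
    latent @ w_vision + noise_scale[:, None] * rng.standard_normal((n_stimuli, 24))
    for _ in range(4)
]
language = latent @ rng.standard_normal((n_latent, 16))

# Per-stimulus dispersion across vision models, then a median split.
dispersion = np.stack(vision_models).var(axis=0).mean(axis=-1)
low = dispersion <= np.median(dispersion)

def rdm_alignment(a, b):
    """Pearson correlation between condensed correlation-distance RDMs."""
    r, _ = pearsonr(pdist(a, metric="correlation"), pdist(b, metric="correlation"))
    return r

low_alignment = rdm_alignment(vision_models[0][low], language[low])
high_alignment = rdm_alignment(vision_models[0][~low], language[~low])
print(f"low-dispersion: {low_alignment:.3f}, high-dispersion: {high_alignment:.3f}")
```

Each subset gets its own RDMs, so alignment is measured only among stimuli in the same dispersion group.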

2. Generalization Across Model Pairings

  • Effect is robust across different vision-language model pairings
  • Independent of specific stimulus selection criteria
  • Consistent across architectural families

3. Single-Stimulus Resolution

  • Enables understanding of which specific stimuli drive alignment
  • Reveals stimulus-level factors contributing to convergence
  • Provides path toward understanding sources of convergence/divergence

Applications

1. Model-Brain Alignment Research

  • Identify stimuli that maximize model-brain alignment
  • Understand what makes representations "brain-like"
  • Guide model development toward more biological plausibility

2. Multimodal Model Evaluation

  • Evaluate vision-language model alignment quality
  • Identify poorly aligned stimulus categories
  • Guide data curation for multimodal training

3. Cognitive Science

  • Understand how humans represent stimuli across modalities
  • Study cross-modal transfer in human perception
  • Link computational models to cognitive theories

4. Explainable AI

  • Explain why certain stimuli are easy/hard for multimodal models
  • Identify ambiguous or multi-interpretable stimuli
  • Characterize model decision boundaries

Implementation Considerations

Data Requirements

  • Representations from multiple models within each modality
  • Paired stimuli: Same set of stimuli represented by all models
  • Sufficient samples: Enough stimuli to compute reliable statistics
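A lightweight pre-flight check can catch violations of these requirements before any alignment is run. The helper below is hypothetical (its name and error messages are not from the source) and verifies only what can be checked mechanically: matrix shape, finite values, and a matching stimulus count.

```python
import numpy as np

def validate_paired_representations(models_reps):
    """Hypothetical helper: confirm all models cover the same number of stimuli.

    Returns the shared number of stimuli, or raises ValueError."""
    if len(models_reps) < 2:
        raise ValueError("Need representations from at least two models")
    n_stimuli = set()
    for name, rep in models_reps.items():
        rep = np.asarray(rep)
        if rep.ndim != 2:
            raise ValueError(f"{name}: expected a 2-D (n_stimuli x n_features) matrix")
        if not np.isfinite(rep).all():
            raise ValueError(f"{name}: contains NaN or infinite values")
        n_stimuli.add(rep.shape[0])
    if len(n_stimuli) != 1:
        raise ValueError(f"Models disagree on the number of stimuli: {sorted(n_stimuli)}")
    return n_stimuli.pop()

reps = {"model_a": np.zeros((10, 4)), "model_b": np.ones((10, 7))}
print(validate_paired_representations(reps))
```

A matching row count does not prove that row i refers to the same stimulus in every model; that ordering has to be guaranteed when the representations are extracted.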

Model Selection

  • Include diverse architectures (CNNs, Transformers, etc.)
  • Cover different training objectives (supervised, self-supervised, CLIP-style)
  • Ensure representation dimensionality is compatible or use dimensionality reduction
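Orthogonal Procrustes requires every model in a modality to have the same number of features. One common way to get there, assumed here rather than prescribed by the source, is to project each model onto its top principal components. The function name, feature widths, and component count below are illustrative.

```python
import numpy as np

def reduce_to_common_dim(reps, n_components):
    """Project an (n_stimuli x n_features) matrix onto its top principal components."""
    reps = np.asarray(reps, dtype=float)
    if n_components > min(reps.shape):
        raise ValueError("n_components cannot exceed min(n_stimuli, n_features)")
    centered = reps - reps.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
vision_a = rng.standard_normal((100, 768))  # hypothetical feature widths
vision_b = rng.standard_normal((100, 384))

reduced = [reduce_to_common_dim(r, n_components=32) for r in (vision_a, vision_b)]
print([r.shape for r in reduced])
```

The number of components trades retained variance against noise, and it is bounded by the number of stimuli, so small stimulus sets limit how many dimensions can be kept.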

Statistical Validation

  • Bootstrap confidence intervals for dispersion and alignment
  • Control for stimulus set size
  • Test robustness to model selection
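As one possible sketch of the bootstrap item, the function below resamples stimuli and recomputes the RDM-based alignment each time. It keeps only the unique resampled indices, a design choice assumed here because duplicated stimuli would add zero distances to both RDMs and inflate the correlation. The function name, defaults, and toy inputs are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

def bootstrap_alignment_ci(vision_reps, language_reps, n_boot=200, ci=95, seed=0):
    """Percentile interval for RDM alignment, resampling stimuli."""
    rng = np.random.default_rng(seed)
    n = vision_reps.shape[0]
    scores = []
    for _ in range(n_boot):
        # Resample with replacement, then drop duplicates (see lead-in).
        idx = np.unique(rng.integers(0, n, size=n))
        r, _ = pearsonr(pdist(vision_reps[idx], metric="correlation"),
                        pdist(language_reps[idx], metric="correlation"))
        scores.append(r)
    tail = (100 - ci) / 2
    lower, upper = np.percentile(scores, [tail, 100 - tail])
    return lower, upper

rng = np.random.default_rng(1)
latent = rng.standard_normal((60, 5))             # toy shared structure
vision = latent @ rng.standard_normal((5, 20))
language = latent @ rng.standard_normal((5, 12))

lower, upper = bootstrap_alignment_ci(vision, language)
print(f"95% interval: [{lower:.3f}, {upper:.3f}]")
```

Running the same procedure on stimulus subsets of different sizes, or with individual models left out, gives a rough handle on the other two items in the list.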

Visualization

def plot_dispersion_alignment_analysis(dispersion, alignment, stimulus_labels=None):
    """
    Visualize the relationship between dispersion and alignment
    
    dispersion and alignment are per-stimulus arrays of equal length;
    stimulus_labels, if given, must be numeric category codes
    """
    import numpy as np
    import matplotlib.pyplot as plt
    
    dispersion = np.asarray(dispersion)
    alignment = np.asarray(alignment)
    
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    
    # Scatter plot
    ax1 = axes[0]
    scatter = ax1.scatter(dispersion, alignment, c=stimulus_labels, cmap='viridis', alpha=0.6)
    ax1.set_xlabel('Intra-Modal Dispersion')
    ax1.set_ylabel('Cross-Modal Alignment')
    ax1.set_title('Dispersion vs. Alignment')
    if stimulus_labels is not None:
        plt.colorbar(scatter, ax=ax1, label='Stimulus Category')
    
    # Binned analysis (equal-count bins from dispersion percentiles)
    ax2 = axes[1]
    n_bins = 5
    bins = np.percentile(dispersion, np.linspace(0, 100, n_bins + 1))
    bin_centers = (bins[:-1] + bins[1:]) / 2
    bin_alignments = []
    bin_stds = []
    
    for i in range(n_bins):
        # Include the upper edge in the last bin so the maximum is not dropped
        if i == n_bins - 1:
            mask = (dispersion >= bins[i]) & (dispersion <= bins[i+1])
        else:
            mask = (dispersion >= bins[i]) & (dispersion < bins[i+1])
        bin_alignments.append(alignment[mask].mean())
        bin_stds.append(alignment[mask].std())
    
    ax2.errorbar(bin_centers, bin_alignments, yerr=bin_stds, marker='o')
    ax2.set_xlabel('Intra-Modal Dispersion')
    ax2.set_ylabel('Mean Cross-Modal Alignment')
    ax2.set_title('Binned Analysis')
    
    plt.tight_layout()
    return fig

References

  • Paper: arXiv:2604.21836
  • Title: Modulating Cross-Modal Convergence with Single-Stimulus, Intra-Modal Dispersion
  • Authors: Eghbal A. Hosseini, Brian Cheung, Evelina Fedorenko, Alex H. Williams
  • Published: April 23, 2026
  • Categories: q-bio.NC, cs.AI
  • Workshop: ICLR 2026 Workshop on Representational Alignment (Re-Align)

Related Concepts

  • Generalized Procrustes Analysis (GPA)
  • Representational Similarity Analysis (RSA)
  • Model-brain alignment
  • Cross-modal learning
  • Vision-language models
  • Multimodal representations
  • Neural network interpretability
  • Brain-inspired AI
Install via CLI
npx skills add https://github.com/hiyenwong/ai_collection --skill cross-modal-dispersion-convergence