vision-bottleneck-v1 - SKILL.md Agent Skill

name: vision-bottleneck-v1
description: "Vision as looking and seeing through a bottleneck framework. Explains V1 as an information bottleneck that optimizes visual representation given retinal sampling constraints. Addresses why vision research progress is slower downstream than upstream of V1. Keywords: vision bottleneck, V1, primary visual cortex, information theory, visual representation, retinal sampling."

Vision as Looking and Seeing Through a Bottleneck

Framework explaining primary visual cortex (V1) as an information bottleneck that optimally transforms retinal inputs given sampling constraints.

Metadata

Source: arXiv:2604.23030v1
Authors: Li Zhaoping
Published: 2026-04-24
Institution: University College London

Core Methodology

The Vision Research Puzzle

Progress in vision research has been slower downstream than upstream of V1. Traditional frameworks overlook a critical constraint: only a tiny fraction of retinal information can be transmitted through the optic nerve (~1 million fibers) to the brain.

The Bottleneck Framework

RETINA (130 million photoreceptors)
         ↓
    ┌─────────────────────────────┐
    │   INFORMATION BOTTLENECK    │
    │   (Optic Nerve: 1M fibers)  │
    └─────────────────────────────┘
         ↓
         V1 (Primary Visual Cortex)
         ↓
    OPTIMAL REPRESENTATION
    - Edge detection
    - Orientation selectivity
    - Spatial frequency tuning
         ↓
    HIGHER VISUAL AREAS
    - Object recognition
    - Scene understanding

Key Insight: V1 as Optimal Encoder

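V1 sits directly downstream of the optic nerve. In this framework its neurons are positioned to:

Compress: Represent efficiently the signal that survives the reduction from ~130M photoreceptors to ~1M optic nerve fibers (the reduction itself takes place in the retina, upstream of V1)
Preserve: Maintain information critical for behavior
Transform: Create a representation optimal for downstream processing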
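
The counts quoted in this section imply a channel reduction of roughly 130 to 1. The short sketch below only does that arithmetic. It counts channels, not bits per second, so it should be read as an order-of-magnitude illustration rather than a measured information rate. The variable names are illustrative.

```python
# Approximate counts as quoted in this document
photoreceptors = 130e6       # retinal photoreceptors
optic_nerve_fibers = 1e6     # optic nerve fibers leaving the eye

# How many input channels share one output channel
compression_ratio = photoreceptors / optic_nerve_fibers

# Output channel count as a fraction of input channel count
fraction_kept = optic_nerve_fibers / photoreceptors

print(f"{compression_ratio:.0f} photoreceptors per optic nerve fiber")
print(f"output channels are {fraction_kept:.2%} of input channels")
```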

Implementation Guide

Information Bottleneck Principle

import numpy as np

class VisualBottleneck:
    """
    Model visual processing as information bottleneck optimization
    """
    def __init__(self, input_dim=1000, bottleneck_dim=100, output_dim=50):
        self.input_dim = input_dim        # Retinal inputs
        self.bottleneck_dim = bottleneck_dim  # V1 representation
        self.output_dim = output_dim      # Behavioral output
        
        # Encoder: Retina → V1
        self.W_encoder = np.random.randn(input_dim, bottleneck_dim) * 0.01
        
        # Decoder: V1 → Behavior
        self.W_decoder = np.random.randn(bottleneck_dim, output_dim) * 0.01
    
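    def encode(self, retina_input):
        """
        Encode one retinal input vector (1-D) into a sparse V1-like response.

        This is a random linear projection followed by rectification and
        top-k sparsification. It stands in for, but does not implement,
        oriented Gabor-like filtering (see V1SimpleCellBank for that).
        """
        # Linear projection: Retina → V1
        v1_response = np.dot(retina_input, self.W_encoder)

        # ReLU-like nonlinearity (thresholded response)
        v1_response = np.maximum(v1_response, 0)

        # Sparsification: keep the ~5% most active units, but at least one.
        # Without the max(1, ...) guard, k == 0 makes [-k:] select every
        # unit, which silently disables sparsification for small layers.
        k = max(1, int(0.05 * self.bottleneck_dim))
        top_k_indices = np.argsort(v1_response)[-k:]
        sparse_response = np.zeros_like(v1_response)
        sparse_response[top_k_indices] = v1_response[top_k_indices]

        return sparse_response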
    
    def decode(self, v1_response):
        """
        Decode V1 representation to behavioral output
        """
        return np.dot(v1_response, self.W_decoder)
    
    def information_bottleneck_loss(self, X, Y, beta=1.0):
        """
        Information bottleneck objective:
        minimize: I(X;T) - beta * I(T;Y)
        
        where T is the bottleneck representation (V1)
        """
        # Encode all inputs
        T = np.array([self.encode(x) for x in X])
        
        # Estimate mutual information I(X;T)
        # Simplified: binned (histogram) estimate, not an exact value
        i_x_t = self._estimate_mutual_info(X, T)
        
        # Estimate mutual information I(T;Y)
        i_t_y = self._estimate_mutual_info(T, Y)
        
        # IB objective: compression vs. prediction tradeoff
        loss = i_x_t - beta * i_t_y
        
        return loss, i_x_t, i_t_y
    
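    def _estimate_mutual_info(self, X, Y, bins=10):
        """
        Crude binned estimate of mutual information, in bits.

        X and Y hold one sample per row and may differ in width, so each
        row is first reduced to its mean. This discards most structure and
        is only a rough proxy for the true mutual information.
        """
        # One scalar per sample so both variables have the same length
        # (flattening arrays of different widths would break histogram2d)
        x_summary = np.asarray(X, dtype=float).reshape(len(X), -1).mean(axis=1)
        y_summary = np.asarray(Y, dtype=float).reshape(len(Y), -1).mean(axis=1)

        # Joint and marginal distributions from a 2-D histogram
        joint = np.histogram2d(x_summary, y_summary, bins=bins)[0]
        joint /= joint.sum()

        p_x = joint.sum(axis=1)
        p_y = joint.sum(axis=0)

        # Mutual information: sum over occupied cells only
        mi = 0.0
        for i in range(bins):
            for j in range(bins):
                if joint[i, j] > 0:
                    mi += joint[i, j] * np.log2(joint[i, j] / (p_x[i] * p_y[j]))

        return mi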
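
To make the objective I(X;T) - beta * I(T;Y) concrete, here is a small worked example on discrete variables, where mutual information can be computed exactly instead of estimated by binning. The setup is an assumption chosen for illustration: X is uniform over four values, the task variable Y is the coarse half of X, and two hypothetical codes T are compared. The helper names `joint_table`, `mutual_information_bits`, and `ib_loss` are not part of the class above.

```python
import numpy as np

def joint_table(a, b):
    """Joint probability table of two integer-coded variables (equal weights)."""
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1.0)   # count co-occurrences
    return table / table.sum()

def mutual_information_bits(joint):
    """Exact mutual information (bits) of a discrete joint distribution."""
    p_row = joint.sum(axis=1, keepdims=True)
    p_col = joint.sum(axis=0, keepdims=True)
    independent = p_row @ p_col     # what the joint would be if independent
    occupied = joint > 0            # skip empty cells to avoid log(0)
    return float(np.sum(joint[occupied] * np.log2(joint[occupied] / independent[occupied])))

def ib_loss(x, t, y, beta=1.0):
    """Information bottleneck objective I(X;T) - beta * I(T;Y)."""
    i_x_t = mutual_information_bits(joint_table(x, t))
    i_t_y = mutual_information_bits(joint_table(t, y))
    return i_x_t - beta * i_t_y

x = np.arange(4)        # four equally likely inputs
y = x // 2              # the task only needs the coarse half of x

t_full = x              # code that keeps everything about x
t_coarse = x // 2       # code that keeps only the task-relevant bit

loss_full = ib_loss(x, t_full, y)
loss_coarse = ib_loss(x, t_coarse, y)
print("full code loss:", loss_full)
print("coarse code loss:", loss_coarse)
```

Both codes carry the same information about Y, but the coarse code carries less about X, so it scores lower on the objective. Raising `beta` weights prediction more heavily than compression.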

Gabor Filter Bank (V1 Simple Cells)

class V1SimpleCellBank:
    """
    Gabor filter bank modeling V1 simple cell receptive fields
    """
    def __init__(self, image_size=256, n_orientations=8, n_scales=4):
        self.image_size = image_size
        self.n_orientations = n_orientations
        self.n_scales = n_scales
        self.filters = self._create_gabor_filters()
    
    def _create_gabor_filters(self):
        """Create Gabor filter bank"""
        filters = []
        
        for scale in range(self.n_scales):
            for orientation in range(self.n_orientations):
                # Gabor parameters
                sigma = 4 * (2 ** scale)  # Scale-dependent
                theta = orientation * np.pi / self.n_orientations
                
                # Create filter
                filt = self._gabor(sigma, theta)
                filters.append(filt)
        
        return filters
    
    def _gabor(self, sigma, theta, wavelength=None, phase=0):
        """Create single Gabor filter"""
        if wavelength is None:
            wavelength = sigma * 2
        
        # Grid
        size = int(sigma * 4)
        x = np.linspace(-size, size, 2*size+1)
        y = np.linspace(-size, size, 2*size+1)
        X, Y = np.meshgrid(x, y)
        
        # Rotation
        X_rot = X * np.cos(theta) + Y * np.sin(theta)
        Y_rot = -X * np.sin(theta) + Y * np.cos(theta)
        
        # Gabor function
        gaussian = np.exp(-(X_rot**2 + Y_rot**2) / (2 * sigma**2))
        sinusoid = np.cos(2 * np.pi * X_rot / wavelength + phase)
        
        return gaussian * sinusoid
    
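    def filter_image(self, image):
        """
        Apply V1-like filtering to a 2-D grayscale image.

        Returns:
            Half-wave rectified filter responses with shape
            (n_scales * n_orientations, height, width).
            Rectification alone does not make the output sparse.
        """
        from scipy.signal import convolve2d

        # Note: the kernel side length grows as 8 * sigma + 1, so the
        # coarsest default scale is larger than a 256 x 256 image and
        # direct convolution becomes slow.
        responses = []
        for filt in self.filters:
            # boundary='fill' pads the image with zeros
            response = convolve2d(image, filt, mode='same', boundary='fill')
            responses.append(response)

        # Stack and apply nonlinearity
        v1_response = np.array(responses)
        v1_response = np.maximum(v1_response, 0)  # Half-wave rectification

        return v1_response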
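
The orientation selectivity that the filter bank is meant to capture can be checked on a single filter. The sketch below is self-contained and uses NumPy only. It rebuilds one Gabor kernel with the same formula as `_gabor` and compares its rectified response to a grating at the preferred orientation against a grating rotated by 90 degrees. The function names, the grating stimulus, and the parameter values (`sigma=4`, `wavelength=8`) are assumptions made for illustration.

```python
import numpy as np

def gabor_kernel(sigma, theta, wavelength, phase=0.0):
    """Isotropic Gaussian envelope times a cosine carrier along direction theta."""
    half = int(4 * sigma)
    coords = np.arange(-half, half + 1)
    X, Y = np.meshgrid(coords, coords)
    x_rot = X * np.cos(theta) + Y * np.sin(theta)
    envelope = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_rot / wavelength + phase)
    return envelope * carrier

def grating_patch(half, theta, wavelength):
    """Cosine grating on the same grid, with luminance varying along theta."""
    coords = np.arange(-half, half + 1)
    X, Y = np.meshgrid(coords, coords)
    return np.cos(2 * np.pi * (X * np.cos(theta) + Y * np.sin(theta)) / wavelength)

sigma, wavelength = 4.0, 8.0
kernel = gabor_kernel(sigma, theta=0.0, wavelength=wavelength)
half = kernel.shape[0] // 2

matched = grating_patch(half, theta=0.0, wavelength=wavelength)
orthogonal = grating_patch(half, theta=np.pi / 2, wavelength=wavelength)

# Response at the patch centre: inner product, then half-wave rectification
response_matched = max(0.0, float(np.sum(kernel * matched)))
response_orthogonal = max(0.0, float(np.sum(kernel * orthogonal)))

print("matched orientation:   ", response_matched)
print("orthogonal orientation:", response_orthogonal)
```

The matched grating drives the filter far more strongly than the orthogonal one. A fixed-phase filter like this is also sensitive to the grating's phase, which is why the bank would need more than one phase to respond uniformly.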

Predictive Coding Extension

class PredictiveVisualBottleneck(VisualBottleneck):
    """
    Extend bottleneck with predictive coding (minimize prediction error)
    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        
        # Predictive model
        self.W_predict = np.random.randn(self.bottleneck_dim, self.input_dim) * 0.01
    
    def predict_input(self, v1_response):
        """
        Predict retinal input from V1 representation
        (top-down prediction in predictive coding)
        """
        return np.dot(v1_response, self.W_predict)
    
    def predictive_coding_loss(self, retina_input, v1_response):
        """
        Predictive coding: minimize prediction error at each level
        """
        # Top-down prediction
        predicted_input = self.predict_input(v1_response)
        
        # Prediction error
        prediction_error = retina_input - predicted_input
        
        # Loss: sum of squared prediction errors
        # (no precision weighting is applied in this simplified version)
        loss = np.sum(prediction_error ** 2)
        
        return loss, prediction_error
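
The class above only evaluates the prediction error for a given V1 response. One common way to use that error, sketched here under stated assumptions, is to hold the top-down weights fixed and adjust the representation itself by gradient descent until the prediction matches the input as well as it can. This is a generic illustration, not the method of the source paper. The dimensions, random weights, step-size rule, and the name `code` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, code_dim = 32, 8

# Fixed top-down weights, same layout as W_predict: (code_dim, input_dim)
W_predict = rng.normal(size=(code_dim, input_dim))
retina_input = rng.normal(size=input_dim)

# Start from a silent representation and refine it iteratively
code = np.zeros(code_dim)

# Step size 1 / (largest singular value)^2 keeps gradient descent on this
# quadratic loss from overshooting, so the loss never increases
step = 1.0 / np.linalg.norm(W_predict, 2) ** 2

losses = []
for _ in range(50):
    prediction_error = retina_input - code @ W_predict   # bottom-up error
    losses.append(float(np.sum(prediction_error ** 2)))
    # Move the code along the negative gradient of the squared error
    code = code + step * (W_predict @ prediction_error)

print("initial loss:", losses[0])
print("final loss:  ", losses[-1])
```

Because the code has fewer dimensions than the input, the error generally does not reach zero. It settles at the part of the input that the top-down weights cannot express.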

Applications

V1 Function Understanding: Why V1 has specific response properties
Visual Prosthetics: Optimal encoding for retinal implants
Compression Algorithms: Bio-inspired image compression
Deep Learning Architectures: Bottleneck-based network design

Pitfalls

Simplification: Real V1 is more complex than a simple bottleneck
Static Model: Does not capture V1 dynamics or adaptation
Feedforward Only: Ignores the extensive feedback connections to V1
Single Modality: Does not account for multisensory integration

Related Skills

untrained-cnns-match-backprop-v1
vlm-visual-cortex-alignment-robustness
primary-visual-cortex-v1-functions
brain-inspired-capture-evidence-driven-neuromimetic-perceptual

References

Zhaoping (2026) Vision as looking and seeing through a bottleneck, arXiv:2604.23030
Linsker (1988) Self-organization in a perceptual network
Olshausen & Field (1996) Emergence of simple-cell receptive field properties
Zhaoping (2014) Understanding vision: theory, models, and data