face-perception-inverse-generative - SKILL.md Agent Skill

name: face-perception-inverse-generative
description: >
  Human face perception methodology using controversial stimulus pairs to
  distinguish between theoretically distinct DNN models. Shows that human face
  perception is shaped by inverse-generative mechanisms that infer latent
  causes of facial appearance and discount nuisance variation, tuned by
  natural image statistics. arXiv:2605.12619.
category: neuroscience
tags: [face-perception, inverse-rendering, neural-representations, DNN-neuroscience, representational-similarity, controversial-stimuli, ventral-stream]
related_skills:
  - neuroscience-of-transformers
  - vlm-visual-cortex-alignment-robustness
  - neural-encoding-evaluation-ground-truth
activation_keywords:
  - face perception inverse generative
  - inverse rendering face perception
  - controversial face pairs
  - human face dissimilarity judgments
  - natural image statistics face perception
  - neural representations face recognition
  - DNN face perception models

Face Perception via Inverse-Generative and Naturalistic Discriminative Objectives

Paper: Human face perception reflects inverse-generative and naturalistic discriminative objectives
Authors: Wenxuan Guo, Heiko H. Schutt, Kamila Maria Jozwik, Katherine R. Storrs, Nikolaus Kriegeskorte, Tal Golan
arXiv: 2605.12619 (May 12, 2026)
Categories: q-bio.NC, cs.CV

Overview

This paper asks which computational objectives shape human face perception. It compares six DNN models trained on distinct tasks, using controversial face pairs (optimized to elicit contrasting model predictions) instead of randomly sampled faces. The results indicate that human face perception is shaped by inverse-generative mechanisms that infer the latent causes of facial appearance.

Core Problem

Theoretically distinct DNN models often make indistinguishable representational predictions for randomly sampled faces. Standard RSA with random stimuli cannot expose diagnostic differences among competing computational hypotheses about face perception.
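One way to check whether a stimulus set is diagnostic is to measure how strongly two models agree on it. The sketch below is a minimal illustration, not the paper's analysis. It assumes each model exposes one embedding vector per face and predicts pair dissimilarity as Euclidean distance. The toy embeddings and the names `pair_dissimilarities` and `pearson` are hypothetical.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def pair_dissimilarities(embeddings, pairs):
    """Predicted dissimilarity per face pair (assumed here: Euclidean distance)."""
    return [math.dist(embeddings[i], embeddings[j]) for i, j in pairs]


# Hypothetical toy embeddings of four faces under two models.
# Model B's geometry is a slightly perturbed copy of model A's.
model_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
model_b = [[0.0, 0.1], [1.1, 0.0], [0.1, 2.0], [3.0, 2.9]]

pairs = list(itertools.combinations(range(len(model_a)), 2))
agreement = pearson(
    pair_dissimilarities(model_a, pairs),
    pair_dissimilarities(model_b, pairs),
)
print(f"model-model agreement on these pairs: {agreement:.3f}")
```

When agreement is close to 1, human judgments on those pairs cannot favor one model over the other, whatever the judgments turn out to be.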

Key Innovation: Controversial Stimuli

Controversial pairs are face pairs specifically optimized to maximize disagreement between model predictions. This diagnostic approach reveals which computational objectives best match human perceptual judgments.
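The idea can be sketched as a selection problem. The paper optimizes stimuli directly; the example below swaps in a simpler stand-in that picks the most controversial pairs from a fixed candidate pool. It assumes Euclidean distance in each model's embedding space as the predicted dissimilarity and uses z-scoring to put the two models on a common scale. The function names and toy embeddings are hypothetical.

```python
import itertools
import math


def zscore(values):
    """Standardize values so predictions from different models are comparable."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("all predicted dissimilarities are identical")
    return [(v - mean) / std for v in values]


def controversial_pairs(emb_a, emb_b, top_k=1):
    """Return the top_k face pairs on which two models disagree most."""
    pairs = list(itertools.combinations(range(len(emb_a)), 2))
    z_a = zscore([math.dist(emb_a[i], emb_a[j]) for i, j in pairs])
    z_b = zscore([math.dist(emb_b[i], emb_b[j]) for i, j in pairs])
    # Disagreement score: absolute gap between standardized predictions.
    scored = sorted(
        zip(pairs, z_a, z_b),
        key=lambda item: abs(item[1] - item[2]),
        reverse=True,
    )
    return [pair for pair, _, _ in scored[:top_k]]


# Hypothetical toy case: model A sees faces 0 and 1 as near-identical,
# while model B sees faces 0 and 2 as near-identical.
emb_a = [[0.0], [0.1], [1.0]]
emb_b = [[0.0], [1.0], [0.1]]
top_pairs = controversial_pairs(emb_a, emb_b, top_k=2)
print(top_pairs)
```

With more than two models, the same score could be computed for every model pairing so that each comparison gets its own diagnostic stimuli.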

Methodology

Models Compared (6 models, shared architecture, different objectives)

Inverse rendering - infers latent 3D causes of facial appearance
Face identification - identity classification
Object classification - general object categorization
Self-supervised - contrastive learning
Pixel reconstruction - autoencoder-style
Random/naive - baseline

Experimental Design

864 human participants for face-dissimilarity judgments
Stimulus sets varying in realism and pose variation
Controversial pairs + random pairs for comparison
RSA between model representations and human judgments

Key Findings

Inverse-generative and discriminative models lead: Models trained on inverse rendering, face identification, or object classification matched human judgments most robustly
Natural image advantage: Models trained on natural images outperformed those trained on synthetic images
Controversial pairs are diagnostic: Random pairs cannot distinguish competing models
Latent cause inference: Face perception infers underlying 3D structure and discounts nuisance variation
Natural statistics tuning: Face perception is tuned by natural image statistics

Core Principle

Human face perception reflects inverse problem solving - the brain infers latent causal structure (identity, 3D shape) from appearance, rather than mere pattern matching.

Workflow for Agents

Controversial Stimuli Design Pattern

1. Train multiple models with different objectives
2. Find input pairs that maximize disagreement between models
3. Collect human perceptual judgments on these diagnostic pairs
4. Compare each model's predicted dissimilarities against human judgments (the RSA step)
5. The model that best predicts human dissimilarity wins
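Steps 4 and 5 can be sketched as a ranking of models by how well their predicted dissimilarities track human ratings on the same pairs. The example below is a minimal illustration under stated assumptions: Spearman rank correlation is chosen here as the comparison metric, which may differ from the paper's statistics, and the model names and all numbers are made-up toy values.

```python
import math


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def rank_models(model_predictions, human_judgments):
    """Sort models from best to worst match with human dissimilarity ratings."""
    scores = {
        name: spearman(predicted, human_judgments)
        for name, predicted in model_predictions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: one human rating and one prediction per model for each of four pairs.
human = [0.9, 0.2, 0.6, 0.4]
predictions = {
    "model_a": [0.8, 0.1, 0.5, 0.45],  # same ordering as the human ratings
    "model_b": [0.1, 0.9, 0.3, 0.5],   # reversed ordering
}
ranking = rank_models(predictions, human)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

A rank-based metric is used because it does not require model distances and human ratings to share a scale, only an ordering.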

Applications

Visual neuroscience - understanding ventral stream computation
Computer vision - designing human-aligned vision systems
AI safety - understanding AI vs human perception divergence
Computational psychiatry - modeling face perception deficits

Pitfalls

Random stimuli are non-diagnostic - cannot distinguish competing hypotheses
Natural vs synthetic gap - natural image training consistently outperforms synthetic
Inverse rendering is computationally expensive - it is among the best-matching model types and also the most costly

References

arXiv:2605.12619 (Guo et al., 2026)
Kriegeskorte et al. (2008): Representational Similarity Analysis