spatial-intelligence-li

star 0

Knowledge base on spatial intelligence as the next frontier in AI, based on Fei-Fei Li's insights from her Y Combinator AI Startup School talk. Use this skill when users ask about spatial intelligence concepts, 3D world modeling, the evolution of computer vision from ImageNet to World Labs, AI research strategy and problem selection, or when seeking advice on AI entrepreneurship and founding AI companies. Also trigger when discussing the relationship between vision/spatial understanding and AGI, differentiating generative vs discriminative models in 3D contexts, or exploring the data-algorithm-compute trinity for AI breakthroughs.

jona By jona schedule Updated 1/28/2026

name: spatial-intelligence-li description: Knowledge base on spatial intelligence as the next frontier in AI, based on Fei-Fei Li's insights from her Y Combinator AI Startup School talk. Use this skill when users ask about spatial intelligence concepts, 3D world modeling, the evolution of computer vision from ImageNet to World Labs, AI research strategy and problem selection, or when seeking advice on AI entrepreneurship and founding AI companies. Also trigger when discussing the relationship between vision/spatial understanding and AGI, differentiating generative vs discriminative models in 3D contexts, or exploring the data-algorithm-compute trinity for AI breakthroughs.

Fei-Fei Li: Spatial Intelligence Knowledge Base

Core Thesis

AGI will not be complete without spatial intelligence—the ability to understand, generate, reason about, and interact with 3D worlds. This represents the hardest and most important unsolved problem in AI.

Key Concepts

Spatial Intelligence

The multifaceted ability encompassing:

  1. Understanding 3D world structure - Perceiving depth, objects, relationships
  2. Generating 3D worlds - Creating novel 3D environments and content
  3. Reasoning about 3D space - Making inferences about spatial relationships
  4. Navigating and interacting - Moving through and manipulating physical world
  5. Communicating spatially - Describing and understanding spatial language

Why Spatial Intelligence is Fundamental

Use evolutionary evidence to explain importance:

  • Vision took ~540 million years to evolve (Cambrian explosion)
  • Language took ~300-500 million years
  • What evolution spent longest developing is likely most fundamental to intelligence
  • Visual/spatial intelligence predates and underlies linguistic intelligence

World Models

AI models that capture 3D structure and spatial intelligence, going beyond:

  • Flat pixels (2D image understanding)
  • Language tokens (text-only reasoning)
  • To represent true physical reality

Generation-Reconstruction Continuum

3D AI applications exist on a spectrum:

Pure Generation <-------------------------> Pure Reconstruction
   |                                                    |
Gaming, Metaverse,                              Robotics,
Creative content                           Physical manipulation
   |                                                    |
   +------ Most applications fall somewhere between ----+

World models must serve the entire continuum.

The Data-Algorithm-Compute Trinity

Major AI breakthroughs require convergence of three elements:

Element ImageNet Revolution Example
Data ImageNet dataset (14M+ labeled images)
Algorithm Convolutional Neural Networks (CNNs, published 1980s)
Compute GPUs enabling parallel processing

Key insight: CNNs existed for decades before working. The algorithm was ready—data and compute were not.

Technical References

Foundational Technologies for Spatial AI

Technology Description Use Case
NeRF (Neural Radiance Fields) Neural network representing 3D scenes as continuous volumetric functions Novel view synthesis, 3D reconstruction
Gaussian Splatting Point-based 3D representation using Gaussian primitives Real-time rendering, efficient 3D capture
Differentiable Rendering Rendering that allows gradient backpropagation Learning 3D from 2D supervision
Pulsar Point-based neural rendering (Christoph Lassner) Efficient 3D scene representation

Discriminative vs Generative Models

Approach Function 3D Application
Discriminative Classify, recognize patterns Object detection, scene understanding
Generative Create new content 3D world generation, content creation

The tension between these represents different approaches to spatial AI.

AI Research Strategy

Problem Selection Framework

When choosing PhD research or startup ideas:

  1. Avoid collision courses - Don't compete where industry scaling advantages dominate
  2. Find compute-immune problems - Focus on areas where throwing more compute alone won't solve it
  3. Embrace delusional problems - If a problem seems impossibly hard, that's often a good sign
  4. Follow curiosity - Burning curiosity sustains multi-year research efforts

The Delusional Problem Mindset

"My entire career is going after problems that are just so hard, 
bordering delusional... if they were easy, someone would have solved them."

Apply this framework:

  1. Identify problems others dismiss as impossible
  2. Ask: "What would need to be true for this to work?"
  3. Find the leverage points (data, algorithm, or compute gaps)
  4. Commit fully despite uncertainty

AI Entrepreneurship Advice

Ground Zero Mindset

When starting a company:

  • Forget past achievements
  • Forget what others think of you
  • Hunker down and build
  • Focus entirely on building from scratch

Open Source Strategy

Decide based on business model alignment:

Scenario Recommendation
Research acceleration needed Open source to enable global collaboration
Network effects valuable Open source to build ecosystem
Proprietary advantage critical Selective openness
Community contribution beneficial Open source with contribution model

ImageNet example: Open sourcing enabled AlexNet's emergence and accelerated the entire field.

Hiring for AI Teams

Primary trait to seek: Intellectual fearlessness

Characteristics:

  • Courage to embrace impossibly hard problems
  • Willingness to go all-in on uncertain bets
  • Comfort with ambiguity and potential failure
  • Driven by curiosity over career optimization

Graduate School Decision Framework

Pursue a PhD only if:

  • Driven by burning curiosity about a specific problem
  • Cannot imagine doing anything else
  • The research question obsesses you
  • Willing to accept opportunity cost

Do not pursue if:

  • Primarily for credentials or career advancement
  • Uncertain about research interests
  • Industry experience would serve goals better

Handling Imposter Syndrome

When feeling like a minority or outsider:

"Don't over-index on feeling like a minority... 
that moment will pass if you focus on the work."

Apply:

  1. Acknowledge the feeling without dwelling
  2. Redirect attention to the problem at hand
  3. Let results speak over time
  4. Build community with fellow researchers

Historical Context

The ImageNet Story (2007-2012)

Timeline for understanding AI breakthrough patterns:

Year Event
2007 ImageNet conceived at Princeton
2009 Initial CVPR poster publication
2009-2011 Open sourced, challenge created, ~30% error rate
2012 AlexNet achieves breakthrough, error drops dramatically

Key decisions that enabled success:

  1. Open sourcing from the beginning
  2. Creating competitive challenge to attract talent
  3. Patience through years of modest results
  4. Betting on data-driven paradigm shift

AI Timeline Perspective

Era Focus
1956 Dartmouth Conference - "machines that think"
1980s CNNs published (Yann LeCun et al.)
2009 ImageNet dataset
2012 Deep learning revolution (AlexNet)
2015 Image captioning breakthroughs
2020s Spatial intelligence frontier

Applying These Insights

When Explaining Spatial Intelligence

  1. Start with evolutionary argument (540M years for vision)
  2. Distinguish from language-only AI
  3. Describe the generation-reconstruction continuum
  4. Reference concrete technologies (NeRF, Gaussian Splatting)

When Advising AI Researchers

  1. Assess their curiosity level and problem obsession
  2. Guide toward compute-immune research areas
  3. Encourage intellectual fearlessness
  4. Emphasize the data-algorithm-compute trinity

When Discussing AI Startups

  1. Apply ground zero mindset
  2. Evaluate open source strategy for their context
  3. Emphasize hiring for intellectual fearlessness
  4. Frame problems in terms of the trinity convergence

Key Organizations and Resources

Resource Description
World Labs (worldlabs.ai) Fei-Fei Li's spatial intelligence startup
Stanford HAI Human-Centered AI Institute
ImageNet (imagenet.org) Original dataset
CVPR, ICCV Primary computer vision conferences
"The Worlds I See" Fei-Fei Li's book
Install via CLI
npx skills add https://github.com/jona/ycombinator-skills --skill spatial-intelligence-li
Repository Details
star Stars 0
call_split Forks 2
navigation Branch main
article Path SKILL.md
More from Creator