3dgs-mcp-renderer

name: 3dgs-mcp-renderer
description: "MCP protocol integration with 3DGS rendering pipeline: Agent-controlled Three.js/WebGPU rendering, voice-driven scene reconstruction, real-time parameter manipulation, light tracing backend. Prototype for Agent↔3DGS interaction."
when_to_use: "MCP rendering, agent-controlled 3DGS, voice-driven reconstruction, real-time 3DGS editing, Three.js 3DGS, WebGPU Gaussian splatting, interactive rendering control, speech-to-3D, light tracing, HiGS accelerated rendering"
version: 0.6.0
author: jaccen
tags: ["mcp", "3dgs", "gaussian-splatting", "rendering", "three.js", "webgpu", "voice", "agent", "interactive"]
disable-model-invocation: true
user-invocable: true

3DGS MCP Renderer — Agent-3DGS Interaction via MCP Protocol

Prototype specification for integrating the Model Context Protocol (MCP) with 3D Gaussian Splatting (3DGS) rendering pipelines, so that AI agents can manipulate Three.js/3DGS rendering parameters directly and drive 3D scene reconstruction by voice.

Architecture

┌─────────────┐     ┌─────────────┐     ┌──────────────────┐     ┌──────────────────┐
│ Voice/Text  │────▶│   Agent     │────▶│  MCP Server      │────▶│  3DGS Renderer   │
│ (Whisper/   │     │ (Claude/    │     │  (Node.js/       │     │  (Three.js/      │
│  Prompt)    │     │  TeleClaw)  │     │   Python)        │     │   WebGPU/HiGS/   │
│             │◀────│             │◀────│                  │◀────│   DDF-GS)        │
└─────────────┘     └─────────────┘     └──────────────────┘     └──────────────────┘
                        │                      │                       │
                        │  Tool calls          │  WebSocket/HTTP       │  WebGL/WebGPU/
                        │  (MCP protocol)      │  transport            │  HiGS/DDF-GS
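At the protocol level, each "Tool calls" arrow from the Agent to the MCP Server is a JSON-RPC 2.0 request with method `tools/call`, whose params carry the tool name and an `arguments` object matching that tool's `inputSchema`. A minimal sketch of building one such request for `set_camera`; `makeToolCall`, the request id, and the argument values are illustrative, and the transport layer is omitted.

```typescript
// Shape of an MCP tool-call request (JSON-RPC 2.0, method "tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments into a request object.
function makeToolCall(id: number, toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id: id, method: "tools/call", params: { name: toolName, arguments: args } };
}

const cameraRequest = makeToolCall(1, "set_camera", {
  position: [0, 10, 0],
  target: [0, 0, 0],
  up: [0, 0, -1],
});

// Serialized form that would travel over the MCP transport.
const wire = JSON.stringify(cameraRequest);
```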

MCP Tools Specification

Tool 1: `import_scene`

{
  "name": "import_scene",
  "description": "Load a 3DGS scene from a .ply, .splat, .spz, or .ksplat file (local path or URL) into the renderer",
  "inputSchema": {
    "type": "object",
    "properties": {
      "source": { "type": "string", "description": "File path or URL to a .ply, .splat, .spz, or .ksplat file" },
      "format": { "enum": ["ply", "splat", "spz", "ksplat"], "description": "File format" }
    },
    "required": ["source"]
  },
  "output": { "type": "object", "properties": { "scene_id": "string", "gaussian_count": "number", "bbox": "object" } }
}
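The schema makes `format` optional. Assuming the server falls back to the file extension when it is omitted (the spec does not say so), resolution could look like the following sketch; `resolveFormat` is a hypothetical helper.

```typescript
type SplatFormat = "ply" | "splat" | "spz" | "ksplat";

const KNOWN_FORMATS: SplatFormat[] = ["ply", "splat", "spz", "ksplat"];

// Hypothetical helper: prefer the explicit `format` argument, otherwise
// infer it from the extension of `source` (a local path or a URL).
function resolveFormat(source: string, format?: SplatFormat): SplatFormat {
  if (format !== undefined) return format;
  // Drop any URL query string or fragment before reading the extension.
  const clean = source.split(/[?#]/)[0];
  const match = /\.([A-Za-z0-9]+)$/.exec(clean);
  const ext = match ? match[1].toLowerCase() : "";
  for (const known of KNOWN_FORMATS) {
    if (known === ext) return known;
  }
  throw new Error("Cannot infer splat format from source: " + source);
}
```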

Tool 2: `set_camera`

{
  "name": "set_camera",
  "description": "Set camera position, target, and field of view",
  "inputSchema": {
    "type": "object",
    "properties": {
      "position": { "type": "array", "items": {"type": "number"}, "description": "[x, y, z]" },
      "target": { "type": "array", "items": {"type": "number"}, "description": "[x, y, z] look-at point" },
      "fov": { "type": "number", "description": "Field of view in degrees" },
      "up": { "type": "array", "items": {"type": "number"}, "description": "[x, y, z] up vector" }
    },
    "required": ["position", "target"]
  }
}
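The "Validate before render" rule can start here. A sketch of a hypothetical `validateCamera` check that rejects degenerate look-at setups: coincident position and target, a zero up vector, and an up vector parallel to the viewing direction, which is what a straight-down view produces if `up` is left at +Y. The +Y default (the Three.js convention) and the (0, 180) degree FOV range are assumptions.

```typescript
type Vec3 = [number, number, number];

function vsub(a: Vec3, b: Vec3): Vec3 {
  return [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
}
function vcross(a: Vec3, b: Vec3): Vec3 {
  return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]];
}
function vlen(a: Vec3): number {
  return Math.sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]);
}

// Hypothetical validator for set_camera arguments; returns a list of problems (empty if valid).
// `up` defaults to +Y (assumed); `fov` is only checked when provided.
function validateCamera(position: number[], target: number[], up: number[] = [0, 1, 0], fov?: number): string[] {
  if (position.length !== 3 || target.length !== 3 || up.length !== 3) {
    return ["position, target, and up must each have 3 components"];
  }
  const problems: string[] = [];
  const view = vsub(target as Vec3, position as Vec3);
  const upVec = up as Vec3;
  if (vlen(view) < 1e-9) {
    problems.push("position and target coincide");
  } else if (vlen(upVec) < 1e-9) {
    problems.push("up must be non-zero");
  } else if (vlen(vcross(view, upVec)) < 1e-6 * vlen(view) * vlen(upVec)) {
    // An up vector parallel to the view direction leaves the look-at basis undefined.
    problems.push("up is parallel to the view direction");
  }
  if (fov !== undefined && !(fov > 0 && fov < 180)) problems.push("fov must be in (0, 180) degrees");
  return problems;
}
```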

Tool 3: `modify_gaussians`

{
  "name": "modify_gaussians",
  "description": "Modify properties of Gaussians by selection criteria",
  "inputSchema": {
    "type": "object",
    "properties": {
      "select": {
        "type": "object",
        "properties": {
          "ids": { "type": "array", "items": {"type": "integer"}, "description": "Specific Gaussian IDs" },
          "region": { "type": "object", "properties": { "center": { "type": "array", "items": {"type": "number"}, "description": "[x, y, z]" }, "radius": { "type": "number" } }, "description": "Sphere selection" },
          "label": { "type": "string", "description": "Semantic label from segmentation" }
        }
      },
      "operations": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "property": { "enum": ["opacity", "color", "position", "scale", "rotation"] },
            "action": { "enum": ["set", "add", "multiply"] },
            "value": {}
          }
        }
      }
    },
    "required": ["select", "operations"]
  }
}
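A sketch of how a server might execute one `modify_gaussians` call: sphere selection over Gaussian centers, then a `set`/`add`/`multiply` opacity operation. It assumes opacities are held as activated values in [0, 1]; a standard 3DGS .ply stores them as pre-sigmoid logits, so a real adapter would convert first. `SceneBuffers`, `selectSphere`, and `applyOpacityOp` are hypothetical.

```typescript
type OpAction = "set" | "add" | "multiply";

// Hypothetical in-memory scene: flat per-Gaussian arrays.
interface SceneBuffers {
  positions: Float32Array; // x, y, z interleaved; length 3N
  opacities: Float32Array; // activated opacity in [0, 1]; length N
}

// Indices of Gaussians whose centers lie inside the sphere (center, radius).
function selectSphere(scene: SceneBuffers, center: number[], radius: number): number[] {
  const picked: number[] = [];
  const r2 = radius * radius;
  for (let i = 0; i < scene.opacities.length; i++) {
    const dx = scene.positions[3 * i] - center[0];
    const dy = scene.positions[3 * i + 1] - center[1];
    const dz = scene.positions[3 * i + 2] - center[2];
    if (dx * dx + dy * dy + dz * dz <= r2) picked.push(i);
  }
  return picked;
}

// Apply one scalar opacity operation to the selected Gaussians, clamping the result to [0, 1].
function applyOpacityOp(scene: SceneBuffers, ids: number[], action: OpAction, value: number): void {
  for (const i of ids) {
    const old = scene.opacities[i];
    const next = action === "set" ? value : action === "add" ? old + value : old * value;
    scene.opacities[i] = Math.min(1, Math.max(0, next));
  }
}
```

Usage mirroring the "make it transparent" example, but with a sphere instead of a label: `applyOpacityOp(scene, selectSphere(scene, [0, 0, 0], 1), "multiply", 0.2)`.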

Tool 4: `render_frame`

{
  "name": "render_frame",
  "description": "Render the current scene from the current camera and return it as an image",
  "inputSchema": {
    "type": "object",
    "properties": {
      "width": { "type": "integer", "default": 1920 },
      "height": { "type": "integer", "default": 1080 },
      "format": { "enum": ["png", "jpeg", "webp"], "default": "png" },
      "background": { "type": "string", "default": "#000000" }
    }
  },
  "output": { "type": "object", "properties": { "image": "string (base64)", "render_time_ms": "number" } }
}
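Schema defaults are easy to lose between the Agent and the renderer, so a server could resolve them in one place. This hypothetical `resolveRenderArgs` fills in the defaults listed above and parses `background`, assuming it is always a `#RRGGBB` hex string (the schema only shows that form as the default).

```typescript
type ImageFormat = "png" | "jpeg" | "webp";

interface RenderArgs { width?: number; height?: number; format?: ImageFormat; background?: string; }
interface ResolvedRender { width: number; height: number; format: ImageFormat; background: [number, number, number]; }

// Hypothetical: apply the render_frame schema defaults and parse "#RRGGBB" into 0..1 RGB floats.
function resolveRenderArgs(args: RenderArgs): ResolvedRender {
  const width = args.width !== undefined ? args.width : 1920;
  const height = args.height !== undefined ? args.height : 1080;
  const format: ImageFormat = args.format !== undefined ? args.format : "png";
  const hex = args.background !== undefined ? args.background : "#000000";
  if (Math.floor(width) !== width || Math.floor(height) !== height || width <= 0 || height <= 0) {
    throw new Error("width and height must be positive integers");
  }
  const m = /^#([0-9a-fA-F]{2})([0-9a-fA-F]{2})([0-9a-fA-F]{2})$/.exec(hex);
  if (m === null) throw new Error("background must be #RRGGBB, got " + hex);
  return {
    width: width,
    height: height,
    format: format,
    background: [parseInt(m[1], 16) / 255, parseInt(m[2], 16) / 255, parseInt(m[3], 16) / 255],
  };
}
```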

Tool 5: `query_scene`

{
  "name": "query_scene",
  "description": "Query scene information: statistics, geometry, semantics",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query_type": { "enum": ["stats", "bbox", "gaussian_at_point", "segmentation", "materials"] },
      "point": { "type": "array", "items": {"type": "number"}, "description": "[x, y, z] for point queries" }
    },
    "required": ["query_type"]
  }
}
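For `bbox` and `gaussian_at_point`, a brute-force sketch over a flat array of Gaussian centers. It assumes "Gaussian at point" means the Gaussian whose center is nearest to the query point, and it ignores each Gaussian's spatial extent when computing bounds; a large scene would need a spatial index instead of this O(N) scan. The function names are hypothetical.

```typescript
type Triple = [number, number, number];

interface Bounds { min: Triple; max: Triple; }

// Axis-aligned bounds of Gaussian centers (x, y, z interleaved).
function computeBounds(positions: Float32Array): Bounds {
  if (positions.length === 0 || positions.length % 3 !== 0) throw new Error("expected a non-empty xyz array");
  const min: Triple = [Infinity, Infinity, Infinity];
  const max: Triple = [-Infinity, -Infinity, -Infinity];
  for (let i = 0; i < positions.length; i += 3) {
    for (let k = 0; k < 3; k++) {
      const v = positions[i + k];
      if (v < min[k]) min[k] = v;
      if (v > max[k]) max[k] = v;
    }
  }
  return { min: min, max: max };
}

// Index of the Gaussian whose center is closest to `point` (linear scan).
function nearestGaussian(positions: Float32Array, point: number[]): number {
  let best = -1;
  let bestD2 = Infinity;
  for (let i = 0; 3 * i < positions.length; i++) {
    const dx = positions[3 * i] - point[0];
    const dy = positions[3 * i + 1] - point[1];
    const dz = positions[3 * i + 2] - point[2];
    const d2 = dx * dx + dy * dy + dz * dz;
    if (d2 < bestD2) {
      bestD2 = d2;
      best = i;
    }
  }
  return best;
}
```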

Tool 6: `cast_ray`

{
  "name": "cast_ray",
  "description": "Cast a ray from origin along direction and return the distance to the first surface hit. Uses the DDF-GS (arXiv:2606.00817) neural field distilled from a trained 3DGS model.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "origin": { "type": "array", "items": {"type": "number"}, "description": "[x, y, z] ray origin" },
      "direction": { "type": "array", "items": {"type": "number"}, "description": "[x, y, z] ray direction (normalized)" }
    },
    "required": ["origin", "direction"]
  },
  "output": { "type": "object", "properties": { "distance": "number", "hit": "boolean", "normal": "array [x,y,z]" } }
}

Use cases: Shadow rendering, ambient occlusion, reflection rays, global illumination

Limitation: Requires a DDF distillation step after 3DGS training (adds ~10 min; the resulting DDF model is ~52MB)
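A directed distance function answers a ray query with a single distance lookup for (origin, direction) instead of marching through Gaussians. The trained DDF-GS network is not reproduced here; as a stand-in with the same signature, the sketch uses the exact ray-sphere distance for a unit sphere, then builds the shadow test from the use-case list on top of it. `sphereDDF` and `inShadow` are hypothetical.

```typescript
type V3 = [number, number, number];

interface RayHit { hit: boolean; distance: number; normal: V3 | null; }

// Stand-in for a learned DDF: exact distance along the ray to a unit sphere at the origin.
// A DDF-GS backend would evaluate its neural field here instead.
function sphereDDF(o: V3, d: V3): RayHit {
  const len = Math.sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
  if (len === 0) throw new Error("direction must be non-zero");
  const u: V3 = [d[0] / len, d[1] / len, d[2] / len]; // normalize defensively
  // Solve |o + t u|^2 = 1, i.e. t^2 + 2bt + c = 0, for the smallest t >= 0.
  const b = o[0] * u[0] + o[1] * u[1] + o[2] * u[2];
  const c = o[0] * o[0] + o[1] * o[1] + o[2] * o[2] - 1;
  const disc = b * b - c;
  if (disc < 0) return { hit: false, distance: Infinity, normal: null };
  const s = Math.sqrt(disc);
  let t = -b - s;
  if (t < 0) t = -b + s; // ray starts inside the sphere
  if (t < 0) return { hit: false, distance: Infinity, normal: null };
  const p: V3 = [o[0] + t * u[0], o[1] + t * u[1], o[2] + t * u[2]];
  return { hit: true, distance: t, normal: p }; // on a unit sphere the normal equals the hit point
}

// Shadow test: a point is shadowed if the ray toward the light hits something before reaching it.
function inShadow(point: V3, light: V3, castRay: (o: V3, d: V3) => RayHit): boolean {
  const dir: V3 = [light[0] - point[0], light[1] - point[1], light[2] - point[2]];
  const distToLight = Math.sqrt(dir[0] * dir[0] + dir[1] * dir[1] + dir[2] * dir[2]);
  const r = castRay(point, dir);
  return r.hit && r.distance < distToLight;
}
```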

Tool 7: `simulate_physics`

MCP Tool: simulate_physics — Invoke external physics engine (MPM/SPH/PBD) on 3DGS scene via RAF-style representation abstraction; parameters: object_ids, force, solver_type; returns: updated Gaussian positions/covariances

{
  "name": "simulate_physics",
  "description": "Invoke external physics engine (MPM/SPH/PBD) on 3DGS scene via RAF-style representation abstraction",
  "inputSchema": {
    "type": "object",
    "properties": {
      "object_ids": { "type": "array", "items": {"type": "integer"}, "description": "IDs of objects to simulate" },
      "force": { "type": "object", "properties": { "linear": { "type": "array", "items": {"type": "number"} }, "angular": { "type": "array", "items": {"type": "number"} } }, "description": "Applied force/torque" },
      "solver_type": { "enum": ["mpm", "sph", "pbd", "rigid_body"], "description": "Physics solver backend" },
      "dt": { "type": "number", "description": "Time step in seconds", "default": 0.016 },
      "steps": { "type": "integer", "description": "Number of simulation steps", "default": 1 }
    },
    "required": ["object_ids", "solver_type"]
  },
  "output": { "type": "object", "properties": { "updated_positions": "array", "updated_covariances": "array", "energy": "number" } }
}

Use cases: Physics-driven scene editing, collapse/fall simulation, fluid interaction with Gaussian objects
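The `dt` and `steps` parameters imply a fixed-step integrator. MPM, SPH, and PBD solvers are well beyond a short example, so this sketch substitutes the simplest case: the selected Gaussian centers move as free particles under `force.linear` (torque is ignored), integrated with semi-implicit Euler, and the returned value is their kinetic energy, one plausible reading of the `energy` output. Uniform mass, no gravity or collisions, and all names are assumptions. Covariances stay unchanged because pure translation does not deform a Gaussian; a deforming solver would map each covariance as Σ' = F Σ Fᵀ using the local deformation gradient F.

```typescript
// Hypothetical particle view of the simulated Gaussians.
interface Particles {
  positions: number[][]; // one [x, y, z] per simulated Gaussian center
  velocities: number[][];
  mass: number; // uniform mass per Gaussian (assumption)
}

// Semi-implicit Euler: update velocity from the force first, then position from the new velocity.
// Returns the kinetic energy after the final step.
function stepFreeParticles(p: Particles, force: number[], dt: number = 0.016, steps: number = 1): number {
  const accel = [force[0] / p.mass, force[1] / p.mass, force[2] / p.mass];
  for (let s = 0; s < steps; s++) {
    for (let i = 0; i < p.positions.length; i++) {
      for (let k = 0; k < 3; k++) {
        p.velocities[i][k] += accel[k] * dt;
        p.positions[i][k] += p.velocities[i][k] * dt;
      }
    }
  }
  let energy = 0;
  for (const v of p.velocities) energy += 0.5 * p.mass * (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
  return energy;
}
```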

Tool 8: `query_4d_scene`

MCP Tool: query_4d_scene — Query dynamic 3D scene at arbitrary (x,y,t) coordinates; returns: 3D position, flow vector, segmentation label; enables voice-driven temporal navigation

{
  "name": "query_4d_scene",
  "description": "Query dynamic 3D scene at arbitrary (x,y,t) coordinates; enables voice-driven temporal navigation via D4RT unified query mechanism",
  "inputSchema": {
    "type": "object",
    "properties": {
      "x": { "type": "number", "description": "X coordinate in scene space" },
      "y": { "type": "number", "description": "Y coordinate in scene space" },
      "t": { "type": "number", "description": "Time index in dynamic sequence" },
      "query_fields": { "type": "array", "items": {"enum": ["position_3d", "flow_vector", "segmentation_label", "depth"]}, "description": "Fields to return" }
    },
    "required": ["x", "y", "t"]
  },
  "output": { "type": "object", "properties": { "position_3d": "array [x,y,z]", "flow_vector": "array [dx,dy,dz]", "segmentation_label": "string", "depth": "number" } }
}

Use cases: "What was here at time t=5?", temporal object tracking, voice-driven time scrubbing
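Voice-driven time scrubbing implies queries at non-integer `t`. Assuming the backend stores a per-frame 3D track for each queried point (an assumption; D4RT's actual query mechanism is not reproduced here), a server could interpolate linearly between neighboring frames and report the frame-to-frame displacement as the flow vector. `Track` and `queryTrackAt` are hypothetical.

```typescript
// Hypothetical per-point track: one [x, y, z] position per integer frame index.
type Track = number[][];

interface TrackSample { position_3d: number[]; flow_vector: number[]; }

// Sample a track at a possibly fractional time t, clamped to the sequence range.
function queryTrackAt(track: Track, t: number): TrackSample {
  const last = track.length - 1;
  if (last < 0) throw new Error("empty track");
  const tc = Math.min(Math.max(t, 0), last);
  const i0 = Math.min(Math.floor(tc), Math.max(last - 1, 0));
  const i1 = Math.min(i0 + 1, last);
  const w = i1 === i0 ? 0 : tc - i0; // interpolation weight within [i0, i1]
  const a = track[i0];
  const b = track[i1];
  const position_3d = [0, 1, 2].map(k => a[k] + w * (b[k] - a[k]));
  // Flow: per-frame displacement over the bracketing interval (zero for a single-frame track).
  const flow_vector = [0, 1, 2].map(k => b[k] - a[k]);
  return { position_3d: position_3d, flow_vector: flow_vector };
}
```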

Tool 9: `deform_elastic`

MCP Tool: deform_elastic — Apply particle-skinned eigenmode deformation to 3DGS object; parameters: object_id, mode_indices, amplitudes; returns: deformed Gaussian positions

{
  "name": "deform_elastic",
  "description": "Apply particle-skinned eigenmode deformation to 3DGS object (FreeForm-style elastic deformation)",
  "inputSchema": {
    "type": "object",
    "properties": {
      "object_id": { "type": "integer", "description": "ID of object to deform" },
      "mode_indices": { "type": "array", "items": {"type": "integer"}, "description": "Eigenmode indices to activate" },
      "amplitudes": { "type": "array", "items": {"type": "number"}, "description": "Amplitude per eigenmode" },
      "interpolation": { "enum": ["linear", "smoothstep"], "description": "Interpolation method for deformation", "default": "smoothstep" }
    },
    "required": ["object_id", "mode_indices", "amplitudes"]
  },
  "output": { "type": "object", "properties": { "deformed_positions": "array", "eigenmode_energies": "array" } }
}

Use cases: Elastic soft-body deformation, eigenmode-based shape editing, physically plausible object bending
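In modal deformation, each activated eigenmode contributes a displacement field scaled by its amplitude: x' = x₀ + Σₖ aₖ φₖ(x₀). The sketch applies that sum to Gaussian centers, eases the amplitudes in over a parameter s in [0, 1] using the `linear` or `smoothstep` option, and reports per-mode energy as ½ λₖ aₖ², which holds for mass-normalized modes with eigenvalue λₖ. Precomputed mode shapes, the easing parameter `s`, and all names are assumptions; skinning from simulation particles to Gaussians is omitted.

```typescript
// modes[m][i] is the [dx, dy, dz] displacement of Gaussian i in eigenmode m (precomputed, assumed).
interface ModalBasis { modes: number[][][]; eigenvalues: number[]; }

function ease(s: number, kind: "linear" | "smoothstep"): number {
  const x = Math.min(Math.max(s, 0), 1);
  return kind === "linear" ? x : x * x * (3 - 2 * x);
}

// Hypothetical deform_elastic core: sum the scaled mode shapes onto copies of the rest positions.
function deformElastic(
  rest: number[][], basis: ModalBasis, modeIndices: number[], amplitudes: number[],
  s: number = 1, interpolation: "linear" | "smoothstep" = "smoothstep",
): { deformed_positions: number[][]; eigenmode_energies: number[] } {
  if (modeIndices.length !== amplitudes.length) throw new Error("mode_indices and amplitudes must have equal length");
  const w = ease(s, interpolation);
  const out = rest.map(p => [p[0], p[1], p[2]]); // do not mutate the rest pose
  const energies: number[] = [];
  for (let j = 0; j < modeIndices.length; j++) {
    const m = modeIndices[j];
    const a = w * amplitudes[j];
    const shape = basis.modes[m];
    for (let i = 0; i < out.length; i++) {
      for (let k = 0; k < 3; k++) out[i][k] += a * shape[i][k];
    }
    energies.push(0.5 * basis.eigenvalues[m] * a * a); // modal energy for mass-normalized modes
  }
  return { deformed_positions: out, eigenmode_energies: energies };
}
```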

Tool 10: `query_spatial_context`

{
  "name": "query_spatial_context",
  "description": "Query spatial understanding of the current 3DGS scene using spatial intelligence models (Spatial-TTT/Holi-Spatial pipeline). Returns spatial relations, grounding, and scene graph.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "scene_id": { "type": "string", "description": "Scene identifier from import_scene" },
      "query": { "type": "string", "description": "Natural language spatial query about the scene" },
      "mode": { "enum": ["grounding", "relation", "measurement", "scene_graph"], "description": "Type of spatial query" }
    },
    "required": ["scene_id", "query", "mode"]
  },
  "output": { "type": "object", "properties": { "answer": "string", "spatial_data": "object", "confidence": "number" } }
}

This tool integrates the Holi-Spatial (ICML 2026 Oral) data pipeline for automated spatial annotation and Spatial-TTT (ECCV 2026) for streaming spatial memory updates.

Voice Intent Mapping

Voice Intent Example	Intent Type	MCP Tool Call
"What is to the left of the chair?"	Spatial grounding query	`query_spatial_context` (mode="grounding")
"How far is the table from the door?"	Spatial measurement	`query_spatial_context` (mode="measurement")

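In the full pipeline the Agent performs this mapping with a language model. For exercising the MCP side without one, a keyword router is a crude stand-in that turns the two example utterances above into `query_spatial_context` calls; the patterns and `routeIntent` are hypothetical and would misroute many real requests.

```typescript
type SpatialMode = "grounding" | "relation" | "measurement" | "scene_graph";

interface SpatialCall {
  tool: "query_spatial_context";
  args: { scene_id: string; query: string; mode: SpatialMode };
}

// Keyword heuristics only; returns null when no spatial intent is recognized.
function routeIntent(utterance: string, sceneId: string): SpatialCall | null {
  const text = utterance.toLowerCase();
  let mode: SpatialMode | null = null;
  if (/\bhow (far|big|tall|wide|long)\b|\bdistance\b/.test(text)) {
    mode = "measurement";
  } else if (/\b(left|right|above|below|behind|in front) of\b|\bnext to\b/.test(text)) {
    mode = "grounding";
  }
  if (mode === null) return null;
  return { tool: "query_spatial_context", args: { scene_id: sceneId, query: utterance, mode: mode } };
}
```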
Voice-Driven Reconstruction Flow

User: "Show me the scene from above"
  │
  ▼
Whisper STT ──▶ Text: "Show me the scene from above"
  │
  ▼
Agent (Claude/TeleClaw) interprets:
  - Intent: Change camera to bird's-eye view
  - Parameters: position=[0, 10, 0], target=[0, 0, 0], up=[0, 0, -1]
  │
  ▼
MCP tool call: set_camera(position=[0, 10, 0], target=[0, 0, 0], up=[0, 0, -1])
  │
  ▼
MCP tool call: render_frame(width=1920, height=1080)
  │
  ▼
Agent receives the base64 image, verifies it, and reports back to the user

User: "Make the left wall transparent"
  │
  ▼
Agent:
  1. query_scene(query_type="segmentation") → find "left wall" label
  2. modify_gaussians(select={label: "left wall"}, operations=[{property: "opacity", action: "multiply", value: 0.2}])
  3. render_frame() → verify visual result
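The bird's-eye parameters in the first example hard-code a height of 10, which only suits a scene near the origin and of matching size. A sketch of deriving them from the bounding box that `query_scene(query_type="bbox")` returns, assuming that box arrives as `{min, max}`: center the camera over the box and raise it above the box top until the larger horizontal half-extent r fits the vertical field of view, d = margin · r / tan(fov/2). The 60° default, the 1.1 margin, and `birdsEyeCamera` are assumptions; fitting only the vertical FOV is conservative when the image is at least as wide as it is tall.

```typescript
interface AABB { min: number[]; max: number[]; }
interface CameraPose { position: number[]; target: number[]; up: number[]; }

// Hypothetical top-down framing: look straight down -Y with -Z as screen-up,
// so the up vector is never parallel to the view direction.
function birdsEyeCamera(box: AABB, fovDeg: number = 60, margin: number = 1.1): CameraPose {
  const cx = (box.min[0] + box.max[0]) / 2;
  const cy = (box.min[1] + box.max[1]) / 2;
  const cz = (box.min[2] + box.max[2]) / 2;
  const halfExtent = Math.max(box.max[0] - box.min[0], box.max[2] - box.min[2]) / 2;
  const halfFov = (fovDeg * Math.PI) / 360; // half of the FOV, in radians
  // Distance above the box top at which halfExtent spans half the vertical FOV.
  const height = box.max[1] + (margin * halfExtent) / Math.tan(halfFov);
  return { position: [cx, height, cz], target: [cx, cy, cz], up: [0, 0, -1] };
}
```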

Implementation Stack

Component	Technology	Status
MCP Server	Node.js + @modelcontextprotocol/sdk	Prototype
3DGS Renderer	Three.js + GaussianSplats3D / gsplat.js	Available
WebGPU backend	WebGPU + WGSL compute shaders	Experimental
HiGS backend	Dual-scale tile rasterization (arXiv:2606.00352)	Planned
DDF-GS backend	Neural distance field for ray queries (arXiv:2606.00817)	Planned
Transport	WebSocket (localhost)	Working
Voice STT	Whisper API / Web Speech API	Available
Agent integration	Claude Code / TeleClaw MCP client	Pending

Current Renderer Compatibility

Renderer	Format	WebGPU	MCP-Ready	Stars
gsplat.js	.ply/.splat	Yes	Needs adapter	—
GaussianSplats3D	.ply	No (WebGL)	Needs adapter	—
viser/nerfstudio	.ply	No (WebGL)	Partial	—
PlayCanvas	.ply	Yes	Needs adapter	—
brush (Rust/WebGPU)	.ply	Yes	Closest	4.3k
HiGS	.ply	Yes	Planned	—
DDF-GS	.ply + .ddf	Yes	Planned	—
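Several renderers in the table are marked "Needs adapter". One plausible shape for that adapter, assumed here rather than taken from any of these libraries, is a small interface the MCP server codes against, with one implementation per renderer. The in-memory mock below implements it so tool handlers can be exercised without a GPU; every name in it is hypothetical.

```typescript
// Hypothetical adapter contract between the MCP server and a concrete renderer.
interface RendererAdapter {
  loadScene(source: string): { sceneId: string; gaussianCount: number };
  setCamera(position: number[], target: number[], up: number[]): void;
  renderFrame(width: number, height: number): { pixels: Uint8Array; renderTimeMs: number };
}

// In-memory mock: records calls and returns a blank RGBA frame.
class MockAdapter implements RendererAdapter {
  private sceneCount = 0;
  lastSource: string | null = null;
  camera: { position: number[]; target: number[]; up: number[] } | null = null;

  loadScene(source: string): { sceneId: string; gaussianCount: number } {
    this.sceneCount += 1;
    this.lastSource = source;
    return { sceneId: "scene-" + this.sceneCount, gaussianCount: 0 };
  }

  setCamera(position: number[], target: number[], up: number[]): void {
    this.camera = { position: position, target: target, up: up };
  }

  renderFrame(width: number, height: number): { pixels: Uint8Array; renderTimeMs: number } {
    const start = Date.now();
    const pixels = new Uint8Array(width * height * 4); // RGBA bytes, all zero
    return { pixels: pixels, renderTimeMs: Date.now() - start };
  }
}
```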

DDF-GS Distillation Pipeline

1. Train the 3DGS scene normally.
2. Distill it into a Directed Distance Function (DDF) neural field:
   - Input: trained 3DGS model (.ply)
   - Output: DDF model (~52MB, size independent of Gaussian count)
   - Training time: ~10 minutes
   - Quality: shadows at 30.3 dB PSNR, AO at 21.3 dB PSNR
3. Use the DDF for shadow maps, AO, reflections, and global illumination.
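The quality figures above are PSNR values. For images with values in [0, 1], PSNR = 10 log₁₀(1 / MSE), so 30.3 dB corresponds to a mean squared error of about 9.3 × 10⁻⁴ and 21.3 dB to about 7.4 × 10⁻³. A small helper for checking a distilled shadow or AO map against a reference render; the [0, 1] value range and the function names are assumptions.

```typescript
// PSNR in dB between two equally sized images stored as flat arrays with values in [0, 1].
function psnr(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length || a.length === 0) throw new Error("images must be non-empty and equal-sized");
  let se = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    se += d * d;
  }
  const mse = se / a.length;
  // log10 via natural log; identical images give infinite PSNR.
  return mse === 0 ? Infinity : (10 * Math.log(1 / mse)) / Math.LN10;
}

// Inverse: the MSE implied by a PSNR value (peak value 1).
function mseFromPsnr(db: number): number {
  return Math.pow(10, -db / 10);
}
```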

HiGS Hierarchical Rendering Integration

HiGS (arXiv:2606.00352) reports a 15.8x rendering speedup from its dual-scale tile architecture
MCP integration: render_frame() can use the HiGS backend for real-time rendering
Architecture: Agent → MCP → HiGS Renderer (macro-tile partitioning + micro-tile rasterization)
Performance target: 950+ FPS on an NVIDIA GPU for interactive scene exploration
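The macro-tile/micro-tile split can be illustrated with generic two-level binning: each splat's screen-space bounding square is first mapped to coarse macro tiles, which culls off-screen splats cheaply, and only surviving splats are mapped to fine micro tiles. This is an illustration of the idea, not the HiGS implementation; the 64 px and 16 px tile sizes and the function names are assumptions. For scale, a 950 FPS target leaves a frame budget of about 1.05 ms.

```typescript
// Inclusive tile-index range covered by a pixel rectangle.
interface TileSpan { x0: number; y0: number; x1: number; y1: number; }

// Tiles of size `tile` px overlapped by [minX, maxX] x [minY, maxY], clipped to the screen;
// null when the rectangle lies entirely off-screen.
function tileSpan(minX: number, minY: number, maxX: number, maxY: number, tile: number, width: number, height: number): TileSpan | null {
  const x0 = Math.max(0, Math.floor(minX / tile));
  const y0 = Math.max(0, Math.floor(minY / tile));
  const x1 = Math.min(Math.ceil(width / tile) - 1, Math.floor(maxX / tile));
  const y1 = Math.min(Math.ceil(height / tile) - 1, Math.floor(maxY / tile));
  return x0 > x1 || y0 > y1 ? null : { x0: x0, y0: y0, x1: x1, y1: y1 };
}

// Hypothetical two-level binning of one splat with screen center (cx, cy) and radius r in pixels.
function binSplat(cx: number, cy: number, r: number, width: number, height: number, macroTile: number = 64, microTile: number = 16): { macro: TileSpan; micro: TileSpan } | null {
  const macro = tileSpan(cx - r, cy - r, cx + r, cy + r, macroTile, width, height);
  if (macro === null) return null; // culled at the coarse level
  // The fine pass can still cull splats that only touch the padded part of an edge macro tile.
  const micro = tileSpan(cx - r, cy - r, cx + r, cy + r, microTile, width, height);
  return micro === null ? null : { macro: macro, micro: micro };
}
```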

Known Limitations

Latency: Large scenes (>1M Gaussians) require progressive loading; a single render_frame call may take 100-500 ms
Selection precision: Sphere- and label-based Gaussian selection may miss thin structures; ray picking is needed
State management: The MCP server must maintain scene state across tool calls; there is no built-in undo
GPU memory: WebGL/WebGPU share GPU memory with the browser, so scenes larger than 2GB cannot be loaded on most devices
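Because there is no built-in undo, the MCP server would have to keep its own history. A minimal sketch, assuming each edit touches a single scalar-per-Gaussian attribute array such as opacity: before applying an edit, store the touched indices and their previous values, and restore them on undo. Storing only the touched entries keeps memory proportional to the edit size rather than the scene size. `EditHistory` and `EditRecord` are hypothetical.

```typescript
// Hypothetical record of one edit: which attribute, which entries, and their old values.
interface EditRecord { attribute: string; indices: number[]; previous: number[]; }

class EditHistory {
  private stack: EditRecord[] = [];

  // Save the touched entries' current values, then overwrite them with mutate(i, old).
  apply(attribute: string, values: Float32Array, indices: number[], mutate: (i: number, old: number) => number): void {
    const previous = indices.map(i => values[i]);
    this.stack.push({ attribute: attribute, indices: indices.slice(), previous: previous });
    for (const i of indices) values[i] = mutate(i, values[i]);
  }

  // Undo the most recent edit; returns false when the history is empty.
  undo(buffers: { [attribute: string]: Float32Array }): boolean {
    if (this.stack.length === 0) return false;
    const edit = this.stack[this.stack.length - 1];
    const buffer = buffers[edit.attribute];
    if (buffer === undefined) throw new Error("no buffer named " + edit.attribute);
    this.stack.pop();
    for (let j = 0; j < edit.indices.length; j++) buffer[edit.indices[j]] = edit.previous[j];
    return true;
  }
}
```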

Roadmap

v0.1: MCP tool specification (this document)
v0.2: Node.js MCP server + gsplat.js adapter + DDF-GS cast_ray tool + HiGS backend
v0.3: Voice-to-MCP pipeline (Whisper → Agent → MCP → render) + simulate_physics (RAF) + query_4d_scene (D4RT) + deform_elastic (FreeForm)
v0.4: Semantic querying (integrate OP2GS/Gaga for label-based selection)
v0.5: Real-time streaming (WebSocket-based progressive rendering)
v0.6: DDF-GS distillation integration (shadow/AO/reflection rendering)
v0.7: HiGS hierarchical rendering backend (950+ FPS target)

Rules

Never modify original PLY files: All operations are in-memory only; export requires an explicit user command
Validate before render: Always verify camera parameters and Gaussian bounds before rendering
Respect GPU limits: Check available VRAM before loading large scenes; provide a downsampling option
Report rendering time: Always include render_time_ms in render_frame output for performance monitoring
Safety gate: Operations affecting >10% of Gaussians require explicit user confirmation
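Two of these rules reduce to simple checks. For the GPU-limit rule, a rough memory estimate follows from the standard uncompressed 3DGS per-Gaussian layout: 3 position, 3 scale, 4 rotation, 1 opacity, and 3 × (d + 1)² spherical-harmonic color coefficients for SH degree d, i.e. 59 float32 values (236 bytes) at degree 3. Real renderers often compress or repack this and add overhead such as sort buffers, so treat the result as an estimate only. The function names are hypothetical; the 10% threshold comes from the safety-gate rule.

```typescript
// Float32 values per Gaussian in the uncompressed 3DGS layout for a given SH degree.
function floatsPerGaussian(shDegree: number): number {
  const shCoeffs = 3 * (shDegree + 1) * (shDegree + 1); // RGB x SH basis functions
  return 3 /* position */ + 3 /* scale */ + 4 /* rotation quaternion */ + 1 /* opacity */ + shCoeffs;
}

// Estimated bytes for `gaussianCount` Gaussians stored as float32.
function estimateBytes(gaussianCount: number, shDegree: number = 3): number {
  return gaussianCount * floatsPerGaussian(shDegree) * 4;
}

// Largest Gaussian count that fits a memory budget; a downsampling step could target this.
function maxGaussiansForBudget(budgetBytes: number, shDegree: number = 3): number {
  return Math.floor(budgetBytes / (floatsPerGaussian(shDegree) * 4));
}

// Safety gate: edits touching more than 10% of the scene need explicit confirmation.
function needsConfirmation(affected: number, total: number, threshold: number = 0.1): boolean {
  return total > 0 && affected / total > threshold;
}
```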

Part of Awesome-Gaussian-Skills

Related Skills

3dgs-engineering-guide — Production deployment (use for end-to-end deployment workflows)
3dgs-spatial-agent — Spatial intelligence agent (use for agent-driven 3D interaction)
3dgs-articulated-reasoner — Articulated object reasoning (use for interactive object manipulation)
3dgs-visualizer — Visualization (use for rendering pipeline output quality assessment)

Guardrail: Do Not Apply From Memory

Do NOT try to apply the logic, method data, bug patterns, or technical details described in this skill from memory. Always read the SKILL.md and referenced files from disk before producing any output. The knowledge base is updated frequently; stale memory may produce outdated, inaccurate, or fabricated results.

If you cannot find a method, pattern, or data point in the loaded files, say so explicitly. Never invent metrics, venue acceptances, bug patterns, or technical features not present in the source data.