desktop-control - SKILL.md Agent Skill

name: desktop-control
description: Remote desktop control via WebRTC. Connect to paired desktop agents, capture screens, send mouse/keyboard input, execute shell commands, and transfer files.
metadata:
  openclaw:
    emoji: "🖥️"
    requires:
      bins: ["node"]

Desktop Control Skill

Control remote desktops via WebRTC through the desktop-mcp-server.

Architecture

OpenClaw (this skill) → MCP Server (stdio) → WebRTC → Desktop Agent (remote PC)

MCP Server: Lives in /tmp/desktop-mcp-server by default (see DESKTOP_MCP_DIR), exposes tools via JSON-RPC over stdio
Desktop Agent: Runs on the target PC, handles screen capture + input injection
Signaling: WebSocket server for WebRTC offer/answer exchange
Auth: Pairing codes for device registration, JWT tokens for sessions
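
To make the stdio hop concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client would write to the server's stdin to invoke a tool. The `tools/call` method and the `name`/`arguments` params follow the Model Context Protocol. The `buildToolCall` helper is hypothetical, and the newline-delimited framing is an assumption about this particular server, not something taken from desktop-mcp-server itself.

```javascript
// Hypothetical helper: builds one JSON-RPC 2.0 request line for an MCP tool call.
function buildToolCall(id, toolName, args = {}) {
  const request = {
    jsonrpc: "2.0",
    id,
    method: "tools/call", // MCP method for invoking a tool
    params: { name: toolName, arguments: args },
  };
  // One message per line (stdio framing assumed to be newline-delimited JSON).
  return JSON.stringify(request) + "\n";
}

const line = buildToolCall(1, "mouse_move", { x: 500, y: 300 });
process.stdout.write(line);
```

In normal use the helper script described under Tool Reference hides this layer, so the sketch is only useful for debugging the server directly.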

Setup

1. Build the server (one-time)

cd /tmp/desktop-mcp-server
npm install && npm run build

2. Start the agent on the target desktop

Install and run the agent package on the PC you want to control:

cd /tmp/desktop-mcp-server
npm run start:agent

The agent displays a pairing code, which is used to register the device. Once the device is registered, connect to it with desktop_connect and its device ID.

Tool Reference

All tools are called via the helper script:

node <skill_dir>/scripts/mcp-call.mjs <tool_name> '<json_args>'
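
When the helper is driven from another Node script rather than typed by hand, a small wrapper avoids shell quoting altogether. The sketch below is an assumption-laden illustration: `helperArgv` and `callTool` are hypothetical names, `/path/to/skill` stands in for `<skill_dir>`, and it assumes the helper prints its result to stdout and exits non-zero on failure.

```javascript
// Build the argv for the helper script. JSON.stringify produces the JSON
// argument, and passing argv as an array (no shell) sidesteps quoting issues.
function helperArgv(skillDir, toolName, args) {
  const argv = [`${skillDir}/scripts/mcp-call.mjs`, toolName];
  if (args !== undefined) argv.push(JSON.stringify(args));
  return argv;
}

// Run a tool and resolve with the helper's raw stdout.
// Assumption: the helper writes its result to stdout and exits non-zero on error.
async function callTool(skillDir, toolName, args) {
  // Dynamic import keeps this valid as both a CommonJS script and an ES module.
  const { execFile } = await import("node:child_process");
  return new Promise((resolve, reject) => {
    execFile("node", helperArgv(skillDir, toolName, args), (err, stdout) =>
      err ? reject(err) : resolve(stdout)
    );
  });
}

// Show the argv that would be executed, without running anything.
console.log(helperArgv("/path/to/skill", "mouse_move", { x: 500, y: 300 }));
```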

Connection Management

# Connect to a device
node <skill_dir>/scripts/mcp-call.mjs desktop_connect '{"deviceId":"abc-123"}'

# Check connection status
node <skill_dir>/scripts/mcp-call.mjs desktop_status

# Disconnect
node <skill_dir>/scripts/mcp-call.mjs desktop_disconnect
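
A scripted session should disconnect even when a step in the middle fails. The following sketch shows one way to guarantee that with `try`/`finally`. `withConnection` is a hypothetical helper, and `call(tool, args)` is any function that invokes a tool and returns a promise, for example a thin wrapper around scripts/mcp-call.mjs.

```javascript
// Connect, run the task, and always disconnect afterwards.
async function withConnection(call, deviceId, task) {
  await call("desktop_connect", { deviceId });
  try {
    return await task(call);
  } finally {
    // Runs even if the task throws, so the session is always closed.
    await call("desktop_disconnect");
  }
}

// Demo with a stub that only logs the calls it receives.
const logCall = async (tool, args) => console.log(tool, args ?? "");
withConnection(logCall, "abc-123", (call) => call("desktop_status"));
```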

Screen Capture

# Get latest frame (saved to /tmp/desktop-mcp-state/frames/)
node <skill_dir>/scripts/mcp-call.mjs get_frame '{"quality":80,"format":"jpeg"}'

# Get multiple frames
node <skill_dir>/scripts/mcp-call.mjs get_frames '{"count":3,"quality":60}'

# Get screen resolution and cursor position
node <skill_dir>/scripts/mcp-call.mjs get_screen_info

Frame images are saved to /tmp/desktop-mcp-state/frames/ as JPEG/PNG files. Use the image tool to analyze captured frames.
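
If the tool output does not tell you which file was just written, one fallback is to pick the most recently modified image in the frames directory. This is a sketch under that assumption. `newestFrame` is a hypothetical helper, and the file names in the demo are made up because the real naming scheme is not documented here.

```javascript
// Pick the most recently modified image from a directory listing.
// `entries` is an array of { name, mtimeMs }, e.g. built from fs.readdir + fs.stat.
function newestFrame(entries) {
  const images = entries.filter((e) => /\.(jpe?g|png)$/i.test(e.name));
  if (images.length === 0) return null;
  return images.reduce((a, b) => (b.mtimeMs > a.mtimeMs ? b : a)).name;
}

console.log(
  newestFrame([
    { name: "frame-a.jpg", mtimeMs: 1000 },
    { name: "frame-b.png", mtimeMs: 2000 },
    { name: "notes.txt", mtimeMs: 3000 }, // ignored: not an image
  ])
); // frame-b.png
```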

Mouse Control

# Move mouse
node <skill_dir>/scripts/mcp-call.mjs mouse_move '{"x":500,"y":300}'

# Click (left, right, middle)
node <skill_dir>/scripts/mcp-call.mjs mouse_click '{"x":500,"y":300,"button":"left"}'

# Double-click
node <skill_dir>/scripts/mcp-call.mjs mouse_click '{"x":500,"y":300,"button":"left","double":true}'

# Drag from A to B
node <skill_dir>/scripts/mcp-call.mjs mouse_drag '{"fromX":100,"fromY":100,"toX":500,"toY":500}'

# Scroll (positive=up, negative=down)
node <skill_dir>/scripts/mcp-call.mjs mouse_scroll '{"amount":-3}'
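
Coordinates found by analyzing a captured frame only match the screen if the frame has the same pixel size as the screen. Whether frames are downscaled is not stated here, so the sketch below treats that as a possibility: it rescales a point from frame space to screen space and clamps it to the screen bounds. `frameToScreen` is a hypothetical helper, and the `width`/`height` field names are assumptions about what get_screen_info returns.

```javascript
// Map a point found in a captured frame to screen coordinates, then clamp it
// so it always lands inside the screen.
function frameToScreen(point, frameSize, screenSize) {
  const x = Math.round((point.x * screenSize.width) / frameSize.width);
  const y = Math.round((point.y * screenSize.height) / frameSize.height);
  return {
    x: Math.min(Math.max(x, 0), screenSize.width - 1),
    y: Math.min(Math.max(y, 0), screenSize.height - 1),
  };
}

// A button seen at (250, 150) in a 960x540 frame of a 1920x1080 screen.
const target = frameToScreen(
  { x: 250, y: 150 },
  { width: 960, height: 540 },
  { width: 1920, height: 1080 }
);
console.log(JSON.stringify({ ...target, button: "left" }));
// {"x":500,"y":300,"button":"left"}
```

The printed JSON can be passed as the argument to mouse_click.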

Keyboard Control

# Type text
node <skill_dir>/scripts/mcp-call.mjs keyboard_type '{"text":"Hello World"}'

# Press key combination
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["ctrl","c"]}'

# Common shortcuts
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["ctrl","a"]}'     # Select all
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["ctrl","v"]}'     # Paste
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["alt","tab"]}'    # Switch window
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["enter"]}'        # Enter

# Hold/release key
node <skill_dir>/scripts/mcp-call.mjs keyboard_hold '{"key":"shift","action":"down"}'
node <skill_dir>/scripts/mcp-call.mjs keyboard_hold '{"key":"shift","action":"up"}'
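
Shortcuts are often written as strings such as "Ctrl+C", while keyboard_press takes an array of key names. A minimal sketch of the conversion, assuming key names are lowercase as in the examples above (`parseShortcut` is a hypothetical helper):

```javascript
// Turn "Ctrl+Shift+T" into the keys array used by keyboard_press.
function parseShortcut(shortcut) {
  return shortcut
    .split("+")
    .map((k) => k.trim().toLowerCase())
    .filter((k) => k.length > 0);
}

console.log(JSON.stringify({ keys: parseShortcut("Ctrl + C") }));
// {"keys":["ctrl","c"]}
```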

Clipboard

# Read clipboard
node <skill_dir>/scripts/mcp-call.mjs clipboard_read

# Write to clipboard
node <skill_dir>/scripts/mcp-call.mjs clipboard_write '{"text":"copied text"}'

Shell Execution

# Run command on remote desktop
node <skill_dir>/scripts/mcp-call.mjs shell_exec '{"command":"ls -la","timeout":10}'

# With working directory
node <skill_dir>/scripts/mcp-call.mjs shell_exec '{"command":"git status","workingDirectory":"/home/user/project"}'
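
Commands that contain quotes are awkward to embed in a hand-written '<json_args>' string. The sketch below builds the argument with `JSON.stringify` instead. `shellExecArgs` is a hypothetical helper, and `timeout` is passed through unchanged because its unit is not stated in this document.

```javascript
// Build shell_exec arguments without hand-writing JSON.
function shellExecArgs(command, { timeout, workingDirectory } = {}) {
  const args = { command };
  if (timeout !== undefined) args.timeout = timeout;
  if (workingDirectory !== undefined) args.workingDirectory = workingDirectory;
  return JSON.stringify(args);
}

// Double quotes inside the command are escaped by JSON.stringify.
console.log(shellExecArgs('grep -r "TODO" .', { workingDirectory: "/home/user/project" }));
// {"command":"grep -r \"TODO\" .","workingDirectory":"/home/user/project"}
```

Pass the resulting string to the helper as a single argv entry (for example with `execFile`) so that no shell re-interprets it.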

Audio

# Text-to-speech on remote desktop
node <skill_dir>/scripts/mcp-call.mjs audio_speak '{"text":"Hello from Jarvis"}'

# Record from microphone (5 seconds)
node <skill_dir>/scripts/mcp-call.mjs audio_listen '{"duration":5}'

File Transfer

node <skill_dir>/scripts/mcp-call.mjs file_transfer '{"path":"/tmp/file.txt","direction":"download"}'

Workflow: Visual Desktop Automation

For visual tasks (click on button, fill form, etc.):

1. Capture screen: get_frame saves the image to disk
2. Analyze: Use the image tool to understand what's on screen
3. Act: Send mouse/keyboard commands based on the analysis
4. Verify: Capture again and confirm the action worked

Example automation loop:

get_frame → image analysis → mouse_click → get_frame → verify
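
The loop above can be sketched as a function with a bounded number of attempts. Everything here is illustrative: `clickUntil` is a hypothetical helper, `call(tool, args)` invokes a tool, `locate(frame)` stands in for the image analysis step and returns `{ x, y }` or `null`, and `isDone(frame)` decides whether the action worked. The shape of the value returned by get_frame is not specified in this document, so it is passed through opaquely.

```javascript
// Capture, analyze, act, verify; repeat up to maxAttempts times.
async function clickUntil(call, locate, isDone, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const before = await call("get_frame", { quality: 80, format: "jpeg" });
    const target = locate(before); // analysis step, supplied by the caller
    if (target) {
      await call("mouse_click", { x: target.x, y: target.y, button: "left" });
      const after = await call("get_frame", { quality: 80, format: "jpeg" });
      if (isDone(after)) return attempt; // verified: report which attempt succeeded
    }
  }
  throw new Error(`not verified after ${maxAttempts} attempts`);
}
```

Bounding the attempts keeps a misidentified target from producing an endless stream of clicks on the remote desktop.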

Environment Variables

Variable	Default	Description
`DESKTOP_MCP_DIR`	`/tmp/desktop-mcp-server`	Path to MCP server repo
`DESKTOP_MCP_STATE`	`/tmp/desktop-mcp-state`	State/frame storage directory
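
A script that reads frames or locates the server should honor these variables rather than hard-code the /tmp paths. A minimal sketch (`resolvePaths` is a hypothetical helper, and the `frames` subdirectory is an assumption that mirrors the default path /tmp/desktop-mcp-state/frames/):

```javascript
// Resolve the two directories, falling back to the documented defaults.
function resolvePaths(env = process.env) {
  const serverDir = env.DESKTOP_MCP_DIR || "/tmp/desktop-mcp-server";
  const stateDir = env.DESKTOP_MCP_STATE || "/tmp/desktop-mcp-state";
  // Assumption: frames live in a "frames" subdirectory of the state directory.
  return { serverDir, stateDir, framesDir: `${stateDir}/frames` };
}

console.log(resolvePaths({}));
```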

Frame Storage

Captured frames are saved to /tmp/desktop-mcp-state/frames/ with timestamps. Clean up periodically:

find /tmp/desktop-mcp-state/frames/ -type f \( -name "*.jpg" -o -name "*.jpeg" -o -name "*.png" \) -mmin +60 -delete

Troubleshooting

"No device connected": Run desktop_connect first with a valid device ID
"No frame available": Agent might not be streaming yet, wait and retry
Timeout: Check that the agent is running and that the network is reachable
Build errors: Rebuild with cd /tmp/desktop-mcp-server && npm install && npm run build
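
For the "No frame available" case, "wait and retry" can be automated with a small retry helper. This is a sketch, not part of the skill: `retry` is a hypothetical helper, the default attempt count and delay are arbitrary, and it assumes the tool call rejects (throws) when the agent has no frame yet.

```javascript
// Retry an async action a few times with a fixed delay between attempts.
async function retry(action, { attempts = 5, delayMs = 1000 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (i < attempts - 1) await new Promise((r) => setTimeout(r, delayMs));
    }
  }
  throw lastError;
}
```

Typical use would wrap a single capture, for example `retry(() => call("get_frame", { quality: 80, format: "jpeg" }))`, where `call` is whatever function you use to invoke tools.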