name: desktop-control
description: Remote desktop control via WebRTC. Connect to paired desktop agents, capture screens, send mouse/keyboard input, execute shell commands, and transfer files.
metadata:
  openclaw:
    emoji: "🖥️"
    requires:
      bins: ["node"]
Desktop Control Skill
Control remote desktops via WebRTC through the desktop-mcp-server.
Architecture
OpenClaw (this skill) → MCP Server (stdio) → WebRTC → Desktop Agent (remote PC)
- MCP Server: Runs at /tmp/desktop-mcp-server, exposes tools via JSON-RPC stdio
- Desktop Agent: Runs on the target PC, handles screen capture + input injection
- Signaling: WebSocket server for WebRTC offer/answer exchange
- Auth: Pairing codes for device registration, JWT tokens for sessions
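For orientation, a single tool call over JSON-RPC stdio might look like the sketch below. It assumes the server follows the standard MCP conventions (a `tools/call` method and one JSON message per line); this document does not confirm the exact wire format, and `toolsCallRequest` is a hypothetical helper name.

```javascript
// Build one JSON-RPC 2.0 request line for an MCP-style tool call.
// Assumption: the server accepts the standard MCP "tools/call" method.
function toolsCallRequest(id, name, args = {}) {
  const message = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  // stdio transports typically frame messages as one JSON object per line.
  return JSON.stringify(message) + "\n";
}

const line = toolsCallRequest(1, "get_frame", { quality: 80, format: "jpeg" });
process.stdout.write(line);
```

In normal use the helper script hides this layer, so the sketch is only useful when debugging the server directly.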
Setup
1. Build the server (one-time)
cd /tmp/desktop-mcp-server
npm install && npm run build
2. Start the agent on the target desktop
Install and run the agent package on the PC you want to control:
cd /tmp/desktop-mcp-server
npm run start:agent
The agent displays a pairing code, which is used to register the device. Once the device is paired, use desktop_connect with its device ID to connect.
Tool Reference
All tools are called via the helper script:
node <skill_dir>/scripts/mcp-call.mjs <tool_name> '<json_args>'
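When driving the helper from a Node script rather than a shell, a thin wrapper avoids shell-quoting the JSON argument. The following is a minimal sketch: `buildArgv` and `callTool` are hypothetical names, and because the helper's output format is not specified here, the wrapper simply returns stdout as text.

```javascript
// Build the argv for one helper-script invocation.
// The JSON argument is omitted when a tool takes no arguments.
function buildArgv(skillDir, tool, args) {
  const argv = [`${skillDir}/scripts/mcp-call.mjs`, tool];
  if (args !== undefined) argv.push(JSON.stringify(args));
  return argv;
}

// Run the helper and return whatever it prints on stdout.
// Assumption: the output format is unspecified, so it is returned as raw text.
async function callTool(skillDir, tool, args) {
  const { execFile } = await import("node:child_process");
  const { promisify } = await import("node:util");
  const { stdout } = await promisify(execFile)("node", buildArgv(skillDir, tool, args));
  return stdout;
}

console.log(buildArgv("/path/to/skill", "desktop_connect", { deviceId: "abc-123" }));
```

Using `execFile` passes each argument directly to the process, so quotes or spaces inside the JSON never reach a shell.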
Connection Management
# Connect to a device
node <skill_dir>/scripts/mcp-call.mjs desktop_connect '{"deviceId":"abc-123"}'
# Check connection status
node <skill_dir>/scripts/mcp-call.mjs desktop_status
# Disconnect
node <skill_dir>/scripts/mcp-call.mjs desktop_disconnect
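A session is easiest to keep tidy if the disconnect always runs, even when a step in between fails. The sketch below assumes a hypothetical async function `call(tool, args)` that invokes the helper script; `withConnection` is likewise an illustrative name.

```javascript
// Connect, run a task, and always disconnect, even if the task throws.
// `call(tool, args)` is a hypothetical async function that invokes the helper script.
async function withConnection(call, deviceId, task) {
  await call("desktop_connect", { deviceId });
  try {
    return await task();
  } finally {
    await call("desktop_disconnect");
  }
}

// Demo with a stand-in `call` that only logs what would be invoked.
const logCall = async (tool, args) => console.log(tool, args ?? "");
withConnection(logCall, "abc-123", () => logCall("desktop_status"));
```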
Screen Capture
# Get latest frame (saved to /tmp/desktop-mcp-state/frames/)
node <skill_dir>/scripts/mcp-call.mjs get_frame '{"quality":80,"format":"jpeg"}'
# Get multiple frames
node <skill_dir>/scripts/mcp-call.mjs get_frames '{"count":3,"quality":60}'
# Get screen resolution and cursor position
node <skill_dir>/scripts/mcp-call.mjs get_screen_info
Frame images are saved to /tmp/desktop-mcp-state/frames/ as JPEG/PNG files.
Use the image tool to analyze captured frames.
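Since this document does not state whether get_frame prints the saved file's path, one way to locate the frame to analyze is to pick the newest image in the frames directory. This is a sketch under that assumption; `newestFrame` and `latestFramePath` are hypothetical names, and the extensions matched (.jpg, .jpeg, .png) are a guess based on the formats mentioned above.

```javascript
// Pick the most recently modified image from a list of { name, mtimeMs } entries.
function newestFrame(entries) {
  const images = entries.filter((e) => /\.(jpe?g|png)$/i.test(e.name));
  if (images.length === 0) return null;
  return images.reduce((a, b) => (b.mtimeMs > a.mtimeMs ? b : a)).name;
}

// List a frames directory and return the full path of the newest image, or null.
async function latestFramePath(dir = "/tmp/desktop-mcp-state/frames") {
  const { readdir, stat } = await import("node:fs/promises");
  const names = await readdir(dir);
  const entries = await Promise.all(
    names.map(async (name) => ({ name, mtimeMs: (await stat(`${dir}/${name}`)).mtimeMs }))
  );
  const newest = newestFrame(entries);
  return newest === null ? null : `${dir}/${newest}`;
}

// Prints the newest frame path, or an error message if the directory is missing.
latestFramePath().then(console.log, (err) => console.error(err.message));
```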
Mouse Control
# Move mouse
node <skill_dir>/scripts/mcp-call.mjs mouse_move '{"x":500,"y":300}'
# Click (left, right, middle)
node <skill_dir>/scripts/mcp-call.mjs mouse_click '{"x":500,"y":300,"button":"left"}'
# Double-click
node <skill_dir>/scripts/mcp-call.mjs mouse_click '{"x":500,"y":300,"button":"left","double":true}'
# Drag from A to B
node <skill_dir>/scripts/mcp-call.mjs mouse_drag '{"fromX":100,"fromY":100,"toX":500,"toY":500}'
# Scroll (positive=up, negative=down)
node <skill_dir>/scripts/mcp-call.mjs mouse_scroll '{"amount":-3}'
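Coordinates found by analyzing a frame only match screen coordinates if the image has the same size as the screen. The sketch below shows one way to rescale and clamp a point before clicking. It assumes the frame may have been resized and that the screen resolution is known (for example from get_screen_info); `toScreenPoint` and `clickArgs` are hypothetical helpers.

```javascript
// Map a point found in an analyzed image to screen coordinates.
// Assumption: the image may have been resized, so scale by the size ratio.
function toScreenPoint(point, imageSize, screenSize) {
  const x = Math.round((point.x * screenSize.width) / imageSize.width);
  const y = Math.round((point.y * screenSize.height) / imageSize.height);
  // Clamp so the click never lands outside the screen.
  return {
    x: Math.min(Math.max(x, 0), screenSize.width - 1),
    y: Math.min(Math.max(y, 0), screenSize.height - 1),
  };
}

// Build the JSON arguments for mouse_click from an image-space point.
function clickArgs(point, imageSize, screenSize, button = "left") {
  return { ...toScreenPoint(point, imageSize, screenSize), button };
}

const args = clickArgs({ x: 250, y: 150 }, { width: 960, height: 540 }, { width: 1920, height: 1080 });
console.log(JSON.stringify(args)); // {"x":500,"y":300,"button":"left"}
```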
Keyboard Control
# Type text
node <skill_dir>/scripts/mcp-call.mjs keyboard_type '{"text":"Hello World"}'
# Press key combination
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["ctrl","c"]}'
# Common shortcuts
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["ctrl","a"]}' # Select all
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["ctrl","v"]}' # Paste
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["alt","tab"]}' # Switch window
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["enter"]}' # Enter
# Hold/release key
node <skill_dir>/scripts/mcp-call.mjs keyboard_hold '{"key":"shift","action":"down"}'
node <skill_dir>/scripts/mcp-call.mjs keyboard_hold '{"key":"shift","action":"up"}'
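If shortcuts arrive as human-readable strings, they can be converted into the `keys` array that keyboard_press expects. A minimal sketch, assuming the agent wants lower-case key names as in the examples above; `parseShortcut` is a hypothetical helper and does not validate key names against the agent.

```javascript
// Turn a shortcut such as "Ctrl+Shift+T" into keyboard_press arguments.
// Assumption: the agent expects lower-case key names, as in the examples above.
function parseShortcut(shortcut) {
  const keys = shortcut
    .split("+")
    .map((part) => part.trim().toLowerCase())
    .filter((part) => part.length > 0);
  if (keys.length === 0) throw new Error(`empty shortcut: "${shortcut}"`);
  return { keys };
}

console.log(JSON.stringify(parseShortcut("Ctrl+C")));    // {"keys":["ctrl","c"]}
console.log(JSON.stringify(parseShortcut("Alt + Tab"))); // {"keys":["alt","tab"]}
```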
Clipboard
# Read clipboard
node <skill_dir>/scripts/mcp-call.mjs clipboard_read
# Write to clipboard
node <skill_dir>/scripts/mcp-call.mjs clipboard_write '{"text":"copied text"}'
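The clipboard tools combine naturally with keyboard_press: writing text to the clipboard and then pasting it is an alternative to keyboard_type for long or multi-line text. The sketch assumes a hypothetical async function `call(tool, args)` that invokes the helper script, and that the target application pastes on Ctrl+V. Note that it overwrites whatever was on the remote clipboard.

```javascript
// Put text on the remote clipboard, then paste it with Ctrl+V.
// `call(tool, args)` is a hypothetical async function that invokes the helper script.
async function pasteText(call, text) {
  await call("clipboard_write", { text });
  await call("keyboard_press", { keys: ["ctrl", "v"] });
}

// Demo with a stand-in `call` that only logs what would be invoked.
pasteText(async (tool, args) => console.log(tool, JSON.stringify(args)), "line 1\nline 2");
```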
Shell Execution
# Run command on remote desktop
node <skill_dir>/scripts/mcp-call.mjs shell_exec '{"command":"ls -la","timeout":10}'
# With working directory
node <skill_dir>/scripts/mcp-call.mjs shell_exec '{"command":"git status","workingDirectory":"/home/user/project"}'
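Because shell_exec takes a single command string, arguments containing spaces or quotes need escaping before they are joined. The following sketch assumes the remote desktop runs a POSIX shell (it would not suit cmd.exe or PowerShell); `shellQuote` and `shellExecArgs` are hypothetical helpers.

```javascript
// Quote one argument for a POSIX shell by wrapping it in single quotes.
// Embedded single quotes are closed, escaped, and reopened: ' becomes '\''
function shellQuote(arg) {
  return "'" + String(arg).replace(/'/g, "'\\''") + "'";
}

// Build shell_exec arguments from a program name and its arguments.
function shellExecArgs(argv, options = {}) {
  return { command: argv.map(shellQuote).join(" "), ...options };
}

const args = shellExecArgs(["ls", "-la", "My Documents"], { workingDirectory: "/home/user" });
console.log(args.command); // 'ls' '-la' 'My Documents'
```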
Audio
# Text-to-speech on remote desktop
node <skill_dir>/scripts/mcp-call.mjs audio_speak '{"text":"Hello from Jarvis"}'
# Record from microphone (5 seconds)
node <skill_dir>/scripts/mcp-call.mjs audio_listen '{"duration":5}'
File Transfer
node <skill_dir>/scripts/mcp-call.mjs file_transfer '{"path":"/tmp/file.txt","direction":"download"}'
Workflow: Visual Desktop Automation
For visual tasks (click on button, fill form, etc.):
- Capture screen: get_frame → saves image to disk
- Analyze: Use image tool to understand what's on screen
- Act: Send mouse/keyboard commands based on analysis
- Verify: Capture again and confirm the action worked
Example automation loop:
get_frame → image analysis → mouse_click → get_frame → verify
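The loop above can be sketched as a small driver that repeats until verification passes or a step limit is hit. Everything here is illustrative: `call(tool, args)` is a hypothetical async function that invokes the helper script, and `analyze(step)` stands in for whatever image analysis decides the next action.

```javascript
// One capture → analyze → act cycle, repeated until the analysis reports success.
// `call(tool, args)` and `analyze(step)` are hypothetical async functions:
// `call` invokes the helper script; `analyze` inspects the latest frame and
// returns { done: boolean, action?: { tool, args } }.
async function automate(call, analyze, maxSteps = 5) {
  for (let step = 1; step <= maxSteps; step++) {
    await call("get_frame", { quality: 80, format: "jpeg" }); // capture
    const result = await analyze(step);                        // analyze / verify
    if (result.done) return step;                              // goal confirmed
    await call(result.action.tool, result.action.args);        // act
  }
  throw new Error(`goal not reached after ${maxSteps} steps`);
}
```

The step limit keeps a misreading of the screen from turning into an endless stream of clicks; the function returns the number of captures it needed.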
Environment Variables
| Variable | Default | Description |
|---|---|---|
| DESKTOP_MCP_DIR | /tmp/desktop-mcp-server | Path to MCP server repo |
| DESKTOP_MCP_STATE | /tmp/desktop-mcp-state | State/frame storage directory |
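A script that needs the same locations can resolve them from the environment with the documented defaults. This is a sketch; `resolvePaths` is a hypothetical helper, and deriving the frames directory from the state directory is an assumption based on the default paths shown in this document.

```javascript
// Resolve the two directories from an environment object, using the documented defaults.
function resolvePaths(env = process.env) {
  const stateDir = env.DESKTOP_MCP_STATE || "/tmp/desktop-mcp-state";
  return {
    serverDir: env.DESKTOP_MCP_DIR || "/tmp/desktop-mcp-server",
    stateDir,
    // Assumption: frames live in a "frames" subdirectory of the state directory.
    framesDir: `${stateDir}/frames`,
  };
}

console.log(resolvePaths({ DESKTOP_MCP_STATE: "/var/tmp/desktop-state" }));
```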
Frame Storage
Captured frames are saved to /tmp/desktop-mcp-state/frames/ with timestamps.
Clean up periodically, for example by deleting frames older than 60 minutes:
find /tmp/desktop-mcp-state/frames/ -type f \( -name "*.jpg" -o -name "*.png" \) -mmin +60 -delete
Troubleshooting
- "No device connected": Run desktop_connect first with a valid device ID
- "No frame available": Agent might not be streaming yet, wait and retry
- Timeout: Check that the agent is running and network is reachable
- Build errors: Run cd /tmp/desktop-mcp-server && npm run build
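The "wait and retry" advice for "No frame available" can be automated with a bounded retry. The sketch assumes a hypothetical async function `call(tool, args)` that invokes the helper script and rejects with an Error whose message contains the server's text; how the helper actually reports failures is not specified here, so the matching may need adjusting.

```javascript
// Retry get_frame while the agent reports that no frame is available yet.
// Assumption: `call(tool, args)` is a hypothetical async function that invokes the
// helper script and rejects with an Error whose message contains the server's text.
async function getFrameWithRetry(call, { attempts = 5, delayMs = 1000 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call("get_frame", { quality: 80, format: "jpeg" });
    } catch (err) {
      const retryable = /No frame available/.test(String(err && err.message));
      // Give up on unrelated errors, or once the attempt budget is spent.
      if (!retryable || attempt >= attempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Other errors, such as "No device connected", are rethrown immediately because waiting does not fix them.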