desktop-control

Remote desktop control via WebRTC. Connect to paired desktop agents, capture screens, send mouse/keyboard input, execute shell commands, and transfer files.

By jagjerez-org. Updated 2/19/2026.

name: desktop-control
description: Remote desktop control via WebRTC. Connect to paired desktop agents, capture screens, send mouse/keyboard input, execute shell commands, and transfer files.
metadata:
  openclaw:
    emoji: "🖥️"
    requires:
      bins: ["node"]


Desktop Control Skill

Control remote desktops via WebRTC through the desktop-mcp-server.

Architecture

OpenClaw (this skill) → MCP Server (stdio) → WebRTC → Desktop Agent (remote PC)
  • MCP Server: Runs at /tmp/desktop-mcp-server, exposes tools via JSON-RPC stdio
  • Desktop Agent: Runs on the target PC, handles screen capture + input injection
  • Signaling: WebSocket server for WebRTC offer/answer exchange
  • Auth: Pairing codes for device registration, JWT tokens for sessions
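
To make the "JSON-RPC stdio" hop concrete, here is a minimal sketch of the request an MCP client could write to the server's stdin to invoke a tool. The `tools/call` method and the `name`/`arguments` parameters come from the Model Context Protocol; whether the helper script sends exactly this message is an assumption, and `buildToolCallRequest` is a name made up for illustration.

```javascript
// Build one JSON-RPC 2.0 request line that asks an MCP server to run a tool.
// `id` is any client-chosen request identifier.
function buildToolCallRequest(id, toolName, toolArgs = {}) {
  const request = {
    jsonrpc: "2.0",
    id,
    method: "tools/call", // MCP method for invoking a tool
    params: { name: toolName, arguments: toolArgs },
  };
  // The MCP stdio transport carries one JSON message per line.
  return JSON.stringify(request) + "\n";
}

// Example: request a JPEG frame at quality 80.
const requestLine = buildToolCallRequest(1, "get_frame", { quality: 80, format: "jpeg" });
console.log(requestLine.trim());
```

A real MCP session also performs an `initialize` handshake before the first `tools/call`; that step is left out of this sketch.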

Setup

1. Build the server (one-time)

cd /tmp/desktop-mcp-server
npm install && npm run build

2. Start the agent on the target desktop

Install and run the agent package on the PC you want to control:

cd /tmp/desktop-mcp-server
npm run start:agent

The agent displays a pairing code, which is used to register the device. Once the device is paired, call desktop_connect with its device ID to connect.

Tool Reference

All tools are called via the helper script:

node <skill_dir>/scripts/mcp-call.mjs <tool_name> '<json_args>'
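
When the helper is driven from another Node script rather than a shell, the same invocation can be sketched as below. `buildHelperArgv` and `callTool` are hypothetical names, `/path/to/skill` stands in for `<skill_dir>`, and the sketch assumes the helper prints its result to stdout and exits non-zero on failure.

```javascript
// Build the argv for: node <skill_dir>/scripts/mcp-call.mjs <tool_name> '<json_args>'
// Tools without arguments (such as desktop_status) get no JSON argument.
function buildHelperArgv(skillDir, toolName, toolArgs) {
  const argv = [`${skillDir}/scripts/mcp-call.mjs`, toolName];
  if (toolArgs !== undefined) argv.push(JSON.stringify(toolArgs));
  return argv;
}

// Run the helper and resolve with whatever it prints to stdout.
// Assumption: a failed call makes the helper exit non-zero, which rejects the promise.
async function callTool(skillDir, toolName, toolArgs) {
  const { execFile } = await import("node:child_process");
  const { promisify } = await import("node:util");
  const argv = buildHelperArgv(skillDir, toolName, toolArgs);
  const { stdout } = await promisify(execFile)("node", argv);
  return stdout;
}

// Show the argv that would be used to connect to a device.
console.log(buildHelperArgv("/path/to/skill", "desktop_connect", { deviceId: "abc-123" }));
```

Because `execFile` passes arguments directly to the process without a shell, the JSON string needs no extra quoting.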

Connection Management

# Connect to a device
node <skill_dir>/scripts/mcp-call.mjs desktop_connect '{"deviceId":"abc-123"}'

# Check connection status
node <skill_dir>/scripts/mcp-call.mjs desktop_status

# Disconnect
node <skill_dir>/scripts/mcp-call.mjs desktop_disconnect

Screen Capture

# Get latest frame (saved to /tmp/desktop-mcp-state/frames/)
node <skill_dir>/scripts/mcp-call.mjs get_frame '{"quality":80,"format":"jpeg"}'

# Get multiple frames
node <skill_dir>/scripts/mcp-call.mjs get_frames '{"count":3,"quality":60}'

# Get screen resolution and cursor position
node <skill_dir>/scripts/mcp-call.mjs get_screen_info

Frame images are saved to /tmp/desktop-mcp-state/frames/ as JPEG/PNG files. Use the image tool to analyze captured frames.

Mouse Control

# Move mouse
node <skill_dir>/scripts/mcp-call.mjs mouse_move '{"x":500,"y":300}'

# Click (left, right, middle)
node <skill_dir>/scripts/mcp-call.mjs mouse_click '{"x":500,"y":300,"button":"left"}'

# Double-click
node <skill_dir>/scripts/mcp-call.mjs mouse_click '{"x":500,"y":300,"button":"left","double":true}'

# Drag from A to B
node <skill_dir>/scripts/mcp-call.mjs mouse_drag '{"fromX":100,"fromY":100,"toX":500,"toY":500}'

# Scroll (positive=up, negative=down)
node <skill_dir>/scripts/mcp-call.mjs mouse_scroll '{"amount":-3}'
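
Click coordinates usually come from a region located in a captured frame. The sketch below turns such a region into mouse_click arguments, aiming at its center and clamping to the screen size. The `box` shape and the function name are hypothetical, and it assumes frame pixels map 1:1 to screen coordinates, which this document does not confirm.

```javascript
// Turn a bounding box found by image analysis into mouse_click arguments.
// `box` is { left, top, width, height } in pixels (a hypothetical shape);
// `screen` is { width, height }, for example taken from get_screen_info.
function clickArgsForBox(box, screen, button = "left") {
  // Round to whole pixels and keep the point inside [0, max - 1].
  const clamp = (value, max) => Math.min(Math.max(Math.round(value), 0), max - 1);
  return {
    x: clamp(box.left + box.width / 2, screen.width),
    y: clamp(box.top + box.height / 2, screen.height),
    button,
  };
}

const clickArgs = clickArgsForBox(
  { left: 450, top: 280, width: 100, height: 40 },
  { width: 1920, height: 1080 }
);
console.log(JSON.stringify(clickArgs)); // {"x":500,"y":300,"button":"left"}
```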

Keyboard Control

# Type text
node <skill_dir>/scripts/mcp-call.mjs keyboard_type '{"text":"Hello World"}'

# Press key combination
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["ctrl","c"]}'

# Common shortcuts
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["ctrl","a"]}'     # Select all
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["ctrl","v"]}'     # Paste
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["alt","tab"]}'    # Switch window
node <skill_dir>/scripts/mcp-call.mjs keyboard_press '{"keys":["enter"]}'        # Enter

# Hold/release key
node <skill_dir>/scripts/mcp-call.mjs keyboard_hold '{"key":"shift","action":"down"}'
node <skill_dir>/scripts/mcp-call.mjs keyboard_hold '{"key":"shift","action":"up"}'
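
A held key that is never released leaves the remote desktop in a confusing state, so it helps to pair the "down" and "up" calls structurally. The following is a sketch, not part of the skill: `withKeyHeld` is a hypothetical helper, and `call(toolName, args)` is a caller-supplied synchronous function that invokes one tool (for example a wrapper around the helper script).

```javascript
// Run `body` while `key` is held, and always release the key afterwards.
function withKeyHeld(call, key, body) {
  call("keyboard_hold", { key, action: "down" });
  try {
    return body();
  } finally {
    // Runs even if body() throws, so the key is not left stuck down.
    call("keyboard_hold", { key, action: "up" });
  }
}

// Example with a stand-in `call` that only logs: shift-click at (500, 300).
const logCall = (tool, args) => console.log(tool, JSON.stringify(args));
withKeyHeld(logCall, "shift", () =>
  logCall("mouse_click", { x: 500, y: 300, button: "left" })
);
```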

Clipboard

# Read clipboard
node <skill_dir>/scripts/mcp-call.mjs clipboard_read

# Write to clipboard
node <skill_dir>/scripts/mcp-call.mjs clipboard_write '{"text":"copied text"}'

Shell Execution

# Run command on remote desktop
node <skill_dir>/scripts/mcp-call.mjs shell_exec '{"command":"ls -la","timeout":10}'

# With working directory
node <skill_dir>/scripts/mcp-call.mjs shell_exec '{"command":"git status","workingDirectory":"/home/user/project"}'
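
Commands that contain quotes are easy to break when the JSON argument is written by hand. One way around that, sketched here with a hypothetical `shellExecJson` helper, is to let `JSON.stringify` do the escaping. Only the `command`, `timeout`, and `workingDirectory` fields shown above are used.

```javascript
// Build the JSON argument string for shell_exec.
// JSON.stringify escapes double quotes and backslashes inside `command`.
function shellExecJson(command, { timeout, workingDirectory } = {}) {
  const args = { command };
  if (timeout !== undefined) args.timeout = timeout;
  if (workingDirectory !== undefined) args.workingDirectory = workingDirectory;
  return JSON.stringify(args);
}

// Example: a command with embedded double quotes.
console.log(shellExecJson('grep -r "TODO" src', { workingDirectory: "/home/user/project" }));
```

If the resulting string is pasted into a shell command line, single quotes inside the command still need shell-level escaping; passing it as a process argument from Node avoids that.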

Audio

# Text-to-speech on remote desktop
node <skill_dir>/scripts/mcp-call.mjs audio_speak '{"text":"Hello from Jarvis"}'

# Record from microphone (5 seconds)
node <skill_dir>/scripts/mcp-call.mjs audio_listen '{"duration":5}'

File Transfer

node <skill_dir>/scripts/mcp-call.mjs file_transfer '{"path":"/tmp/file.txt","direction":"download"}'

Workflow: Visual Desktop Automation

For visual tasks (clicking a button, filling in a form, etc.):

  1. Capture screen: get_frame → saves image to disk
  2. Analyze: Use image tool to understand what's on screen
  3. Act: Send mouse/keyboard commands based on analysis
  4. Verify: Capture again and confirm the action worked

Example automation loop:

get_frame → image analysis → mouse_click → get_frame → verify
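
The loop above can be sketched as a small driver with a bounded number of retries. Everything here is illustrative: `runVisualStep` and its four callbacks are hypothetical, and in real use `capture` would wrap get_frame, `decide` and `isDone` would wrap the image analysis, and `act` would invoke a mouse or keyboard tool.

```javascript
// One capture -> analyze -> act -> verify cycle, retried up to maxAttempts times.
//   capture()        takes a frame and returns a reference to it
//   decide(frame)    returns { tool, args } for the next action
//   act(tool, args)  performs the action
//   isDone(frame)    returns true when the frame shows the desired result
function runVisualStep({ capture, decide, act, isDone, maxAttempts = 3 }) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const before = capture();
    const action = decide(before);
    act(action.tool, action.args);
    const after = capture(); // capture again to verify the effect
    if (isDone(after)) return { ok: true, attempts: attempt };
  }
  return { ok: false, attempts: maxAttempts };
}

// Example with stand-in callbacks: the "screen" shows the result after two clicks.
let clickCount = 0;
const outcome = runVisualStep({
  capture: () => ({ clickCount }),
  decide: () => ({ tool: "mouse_click", args: { x: 500, y: 300, button: "left" } }),
  act: () => { clickCount += 1; },
  isDone: (frame) => frame.clickCount >= 2,
});
console.log(outcome); // { ok: true, attempts: 2 }
```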

Environment Variables

| Variable | Default | Description |
|---|---|---|
| DESKTOP_MCP_DIR | /tmp/desktop-mcp-server | Path to MCP server repo |
| DESKTOP_MCP_STATE | /tmp/desktop-mcp-state | State/frame storage directory |
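
A script that works alongside the skill can resolve these two variables the same way, falling back to the documented defaults. This is a sketch with a hypothetical `resolveConfig` function; it assumes an empty value counts as unset and that the frames directory is always `frames/` under the state directory, neither of which is confirmed here.

```javascript
// Resolve the skill's directories from an environment object such as process.env.
function resolveConfig(env) {
  const serverDir = env.DESKTOP_MCP_DIR || "/tmp/desktop-mcp-server";
  const stateDir = env.DESKTOP_MCP_STATE || "/tmp/desktop-mcp-state";
  // Assumption: frames live in "frames" under the state directory.
  return { serverDir, stateDir, framesDir: `${stateDir}/frames` };
}

console.log(resolveConfig(process.env));
```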

Frame Storage

Captured frames are saved to /tmp/desktop-mcp-state/frames/ with timestamps. Clean up periodically; the command below deletes .jpg frames last modified more than 60 minutes ago:

find /tmp/desktop-mcp-state/frames/ -name "*.jpg" -mmin +60 -delete
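
Since frames can be JPEG or PNG, a cleanup script may want to cover both. The sketch below shows only the selection logic, roughly mirroring `-mmin +60`. `staleFrames` is a hypothetical helper, the file names in the example are invented, and the `.jpg`/`.jpeg`/`.png` extensions are an assumption about how frames are named.

```javascript
// Pick frame files last modified more than `maxAgeMinutes` ago.
// `entries` is a list of { name, mtimeMs }, e.g. built with fs.readdirSync and fs.statSync.
function staleFrames(entries, nowMs, maxAgeMinutes = 60) {
  const cutoff = nowMs - maxAgeMinutes * 60 * 1000;
  const isFrame = (name) => /\.(jpe?g|png)$/i.test(name);
  return entries
    .filter((entry) => isFrame(entry.name) && entry.mtimeMs < cutoff)
    .map((entry) => entry.name);
}

// Example with made-up file names: one old JPEG, one fresh PNG, one unrelated file.
const now = Date.now();
const minutes = (n) => n * 60 * 1000;
console.log(
  staleFrames(
    [
      { name: "frame-a.jpg", mtimeMs: now - minutes(90) },
      { name: "frame-b.png", mtimeMs: now - minutes(5) },
      { name: "notes.txt", mtimeMs: now - minutes(500) },
    ],
    now
  )
); // [ 'frame-a.jpg' ]
```

The returned names can then be passed to `fs.unlinkSync` or a similar call to delete the files.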

Troubleshooting

  • "No device connected": Run desktop_connect first with a valid device ID
  • "No frame available": The agent might not be streaming yet; wait and retry
  • Timeout: Check that the agent is running and the network is reachable
  • Build errors: Run cd /tmp/desktop-mcp-server && npm run build

Install via CLI

npx skills add https://github.com/jagjerez-org/openclaw-skill-desktop-control --skill desktop-control