harbor-daytona


Use Harbor's Daytona sandbox platform for computer use — creating sandboxes, taking screenshots, sending mouse/keyboard input, and building agent loops. Use when the user wants to interact with a GUI, automate a desktop, do computer use, control a browser visually, or run Claude computer use against a Daytona sandbox.

By av. Updated 6/13/2026.

name: harbor-daytona
description: Use Harbor's Daytona sandbox platform for computer use: creating sandboxes, taking screenshots, sending mouse/keyboard input, and building agent loops. Use when the user wants to interact with a GUI, automate a desktop, do computer use, control a browser visually, or run Claude computer use against a Daytona sandbox.
allowed-tools: Bash(curl:), Bash(python3:), Bash(harbor:), Bash(docker:), Bash(file:), Bash(base64:), Read(), Write()

Harbor Daytona: Computer Use

Daytona is a self-hosted sandbox platform running inside Harbor. Each sandbox provides an isolated Linux environment with a full XFCE4 desktop (Xvfb + x11vnc + noVNC) that is controlled through a REST API covering screenshots, mouse and keyboard input, and process management.

Quick Start

# Start the Daytona platform (14 containers)
harbor up daytona

# Verify it's healthy
harbor ps | grep daytona

Dashboard: http://localhost:$(harbor config get daytona.host_port)/dashboard

Default credentials: dev@daytona.io / password (via Dex OIDC; log in with the email address)

Auth

All API calls use the admin API key as a Bearer token:

API_KEY=$(harbor config get daytona.admin_api_key)
AUTH="Authorization: Bearer $API_KEY"
API="http://localhost:$(harbor config get daytona.host_port)"

Default key: harbor-daytona-admin-key
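For scripts that outgrow curl, the same setup can be sketched in Python. This is a minimal sketch, not part of Harbor: `DaytonaConfig` and its methods are hypothetical names, and the port 35000 is assumed from the ports table in this document. In practice both values would come from `harbor config get`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DaytonaConfig:
    """Hypothetical holder for the values `harbor config get` returns."""
    api_key: str = "harbor-daytona-admin-key"  # default key from this document
    host_port: int = 35000                      # assumed default API port

    @property
    def api(self) -> str:
        # Equivalent of the $API shell variable.
        return f"http://localhost:{self.host_port}"

    def headers(self, json_body: bool = False) -> dict:
        # Equivalent of -H "$AUTH", plus Content-Type for POST bodies.
        h = {"Authorization": f"Bearer {self.api_key}"}
        if json_body:
            h["Content-Type"] = "application/json"
        return h


cfg = DaytonaConfig()
print(cfg.api)
print(cfg.headers(json_body=True))
```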

Sandbox Lifecycle

Create a sandbox

curl -s -X POST "$API/api/sandbox" -H "$AUTH" -H "Content-Type: application/json" \
  -d '{
    "snapshot": "daytonaio/sandbox:v0.185.0-amd64",
    "user": "daytona",
    "cpu": 2,
    "memory": 4,
    "disk": 10,
    "autoStopInterval": 30
  }'

The response includes the sandbox id. Save it (the examples below use $SANDBOX_ID) and pass it in all subsequent calls. The sandbox starts in "state": "creating" and transitions to "started", typically within 20-30 seconds.

CreateSandbox fields: name, snapshot, user, env (object), labels (object), public (bool), cpu, gpu, memory (GB), disk (GB), autoStopInterval (minutes, 0=disabled), autoArchiveInterval, autoDeleteInterval (-1=disabled), target ("us"), volumes, linkedSandbox.
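The create call can also be assembled in Python using only the standard library. The sketch below builds the request without sending it. `build_create_request` is a hypothetical helper, and the API address and key are the assumed defaults from this document.

```python
import json
import urllib.request

API = "http://localhost:35000"        # assumed default port
API_KEY = "harbor-daytona-admin-key"  # default key from this document


def build_create_request(cpu=2, memory=4, disk=10, auto_stop_minutes=30, **extra):
    """Return an unsent urllib Request for POST /api/sandbox."""
    body = {
        "snapshot": "daytonaio/sandbox:v0.185.0-amd64",
        "user": "daytona",
        "cpu": cpu,
        "memory": memory,                       # GB
        "disk": disk,                           # GB
        "autoStopInterval": auto_stop_minutes,  # minutes, 0 disables
    }
    body.update(extra)  # e.g. name="demo", env={...}, labels={...}
    return urllib.request.Request(
        f"{API}/api/sandbox",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
        method="POST",
    )


req = build_create_request(labels={"purpose": "demo"})
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the `id` field of the JSON response is the sandbox id.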

Poll until started

STATE=""
while [ "$STATE" != "started" ]; do
  STATE=$(curl -s "$API/api/sandbox/$SANDBOX_ID" -H "$AUTH" | python3 -c "import sys,json; print(json.load(sys.stdin)['state'])")
  sleep 2
done
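The shell loop above polls forever if the sandbox never reaches "started". A bounded variant might look like the following sketch. `wait_until_started` is a hypothetical helper, and the state lookup is passed in as a callable so the logic does not depend on any particular HTTP client.

```python
import time


def wait_until_started(get_state, timeout_s=120.0, interval_s=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns "started".

    get_state is any zero-argument callable returning the sandbox's "state"
    string, for example a wrapper around GET /api/sandbox/$SANDBOX_ID.
    Raises TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == "started":
            return state
        if clock() >= deadline:
            raise TimeoutError(
                f"sandbox still in state {state!r} after {timeout_s}s")
        sleep(interval_s)
```

The 120-second default is an arbitrary choice, picked to sit well above the 20-30 seconds mentioned earlier.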

Other lifecycle operations

# List sandboxes
curl -s "$API/api/sandbox" -H "$AUTH"

# Get sandbox details
curl -s "$API/api/sandbox/$SANDBOX_ID" -H "$AUTH"

# Stop / start / delete
curl -s -X POST "$API/api/sandbox/$SANDBOX_ID/stop" -H "$AUTH"
curl -s -X POST "$API/api/sandbox/$SANDBOX_ID/start" -H "$AUTH"
curl -s -X DELETE "$API/api/sandbox/$SANDBOX_ID" -H "$AUTH"

Computer Use API

Base URL: $API/api/toolbox/$SANDBOX_ID/toolbox/computeruse

The examples in this section assume the shell variable BASE is set to this URL.

Start the desktop

Computer use processes must be running before you take screenshots or send input. The default snapshot boots them automatically, but they may still report a "partial" status, so check the status first and start them if needed.

# Check status — returns {"status": "active"|"partial"|"inactive"|"error"}
curl -s "$BASE/status" -H "$AUTH"

# Start all desktop processes (xvfb, xfce4, x11vnc, novnc, atspi)
curl -s -X POST "$BASE/start" -H "$AUTH"

# Stop desktop processes
curl -s -X POST "$BASE/stop" -H "$AUTH"
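The check-then-start sequence can be wrapped in a small helper. This is a sketch under stated assumptions: `ensure_desktop_active` is a hypothetical name, both HTTP calls are injected as callables, and treating anything other than "active" as a reason to call start is a choice made here rather than documented behavior.

```python
import time


def ensure_desktop_active(get_status, start, attempts=10, interval_s=1.0,
                          sleep=time.sleep):
    """Return True once get_status() reports "active", False otherwise.

    get_status: callable returning the "status" string from GET $BASE/status.
    start:      callable that issues POST $BASE/start.
    """
    if get_status() == "active":
        return True
    start()  # ask the sandbox to start the desktop processes
    for _ in range(attempts):
        if get_status() == "active":
            return True
        sleep(interval_s)
    return False
```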

Screenshot

# Full screenshot — returns {"screenshot": "<base64 PNG>", "cursorPosition": {"x","y"}, "sizeBytes": N}
curl -s "$BASE/screenshot" -H "$AUTH"

# Compressed (smaller file, lower quality)
curl -s "$BASE/screenshot/compressed" -H "$AUTH"

# Region screenshot (query params: x, y, width, height)
curl -s "$BASE/screenshot/region?x=0&y=0&width=512&height=384" -H "$AUTH"

# Compressed region
curl -s "$BASE/screenshot/region/compressed?x=0&y=0&width=512&height=384" -H "$AUTH"

Default resolution: 1024x768. Response is JSON with base64-encoded PNG in the screenshot field.
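When a script needs to confirm that a screenshot really is a PNG of the expected size, the dimensions can be read straight from the image header without an imaging library. The sketch below assumes the response shape described above. `decode_screenshot` is a hypothetical helper, and whether the compressed endpoints also return PNG is not confirmed here.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(response: dict) -> tuple:
    """Decode the "screenshot" field and return (png_bytes, width, height)."""
    png = base64.b64decode(response["screenshot"])
    if png[:8] != PNG_SIGNATURE or png[12:16] != b"IHDR":
        raise ValueError("screenshot field does not contain a PNG image")
    # The IHDR payload starts at byte 16: big-endian width, then height.
    width, height = struct.unpack(">II", png[16:24])
    return png, width, height
```

A result other than 1024x768 on a default sandbox would suggest the display was reconfigured or a region endpoint was used.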

To decode and save:

curl -s "$BASE/screenshot" -H "$AUTH" | python3 -c "
import sys, json, base64
data = json.load(sys.stdin)
with open('/tmp/screenshot.png', 'wb') as f:
    f.write(base64.b64decode(data['screenshot']))
"

Mouse

# Click — body: {x, y, button?: "left"|"right"|"middle", double?: bool}
curl -s -X POST "$BASE/mouse/click" -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"x": 500, "y": 400, "button": "left"}'

# Double-click
curl -s -X POST "$BASE/mouse/click" -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"x": 500, "y": 400, "double": true}'

# Move — body: {x, y}
curl -s -X POST "$BASE/mouse/move" -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"x": 300, "y": 200}'

# Drag — body: {startX, startY, endX, endY, button?: "left"|"right"|"middle"}
curl -s -X POST "$BASE/mouse/drag" -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"startX": 100, "startY": 200, "endX": 400, "endY": 300}'

# Scroll — body: {x, y, direction: "up"|"down", amount?: N}
curl -s -X POST "$BASE/mouse/scroll" -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"x": 500, "y": 400, "direction": "down", "amount": 3}'

# Get position — returns {x, y}
curl -s "$BASE/mouse/position" -H "$AUTH"
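Coordinates produced by a model are worth validating before they are sent. The following sketch builds click and drag bodies in the shapes listed above and rejects points outside the default 1024x768 display. The function names are hypothetical, and the bounds check is a client-side precaution rather than something the API is known to require.

```python
SCREEN_W, SCREEN_H = 1024, 768  # default resolution stated in this document


def _check_point(x, y, width=SCREEN_W, height=SCREEN_H):
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError(f"({x}, {y}) is outside the {width}x{height} display")


def click_payload(x, y, button="left", double=False):
    """Body for POST $BASE/mouse/click."""
    _check_point(x, y)
    if button not in ("left", "right", "middle"):
        raise ValueError(f"unknown button: {button!r}")
    body = {"x": x, "y": y, "button": button}
    if double:
        body["double"] = True
    return body


def drag_payload(start, end, button="left"):
    """Body for POST $BASE/mouse/drag; start and end are (x, y) tuples."""
    _check_point(*start)
    _check_point(*end)
    return {"startX": start[0], "startY": start[1],
            "endX": end[0], "endY": end[1], "button": button}
```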

Keyboard

# Type text — body: {text, delay?: ms_between_keystrokes}
curl -s -X POST "$BASE/keyboard/type" -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"text": "hello world"}'

# Press a single key — body: {key, modifiers?: ["ctrl","alt","shift","cmd"]}
curl -s -X POST "$BASE/keyboard/key" -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"key": "Return"}'

# Key with modifiers
curl -s -X POST "$BASE/keyboard/key" -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"key": "c", "modifiers": ["ctrl"]}'

# Hotkey combination — body: {keys: "modifier+key"}
curl -s -X POST "$BASE/keyboard/hotkey" -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"keys": "ctrl+l"}'

Key names: Return, Tab, Escape, BackSpace, Delete, space, Up, Down, Left, Right, Home, End, Page_Up, Page_Down, F1-F12, plus single characters.
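If an agent emits combinations as strings such as "ctrl+shift+t", they can be converted into the key-plus-modifiers body shown above. This is a sketch: `key_payload` is a hypothetical helper, and lowercasing modifier names is an assumption about what the endpoint accepts, based on the lowercase names in the example.

```python
MODIFIERS = ("ctrl", "alt", "shift", "cmd")


def key_payload(combo: str) -> dict:
    """Turn "ctrl+shift+t" into a body for POST $BASE/keyboard/key."""
    parts = [p.strip() for p in combo.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty key combination")
    *mods, key = parts  # the last element is the key, the rest are modifiers
    mods = [m.lower() for m in mods]
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifiers: {unknown}")
    body = {"key": key}
    if mods:
        body["modifiers"] = mods
    return body
```

The key name itself is passed through unchanged, so names like Return or Page_Down keep their capitalization.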

Display Info

# Display geometry — returns {displays: [{id, x, y, width, height, isActive}]}
curl -s "$BASE/display/info" -H "$AUTH"

# List windows — returns {windows: [{id, title}], count: N}
curl -s "$BASE/display/windows" -H "$AUTH"
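The window list is handy for checking that an application actually opened before clicking into it. The sketch below filters a response of the documented shape by title. `find_windows` is a hypothetical helper, and the sample data (including the id values) is invented for illustration.

```python
def find_windows(response: dict, title_substring: str) -> list:
    """Return windows whose title contains title_substring (case-insensitive)."""
    needle = title_substring.lower()
    return [w for w in response.get("windows", [])
            if needle in str(w.get("title", "")).lower()]


# Hypothetical response in the shape {windows: [{id, title}], count: N}.
sample = {
    "windows": [{"id": 1, "title": "Terminal - daytona"},
                {"id": 2, "title": "Mozilla Firefox"}],
    "count": 2,
}
print(find_windows(sample, "firefox"))
```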

Process Management

Desktop processes: xvfb, xfce4, x11vnc, novnc, atspi

# Status of a process — returns {processName, running: bool}
curl -s "$BASE/process/xfce4/status" -H "$AUTH"

# Logs / errors
curl -s "$BASE/process/novnc/logs" -H "$AUTH"
curl -s "$BASE/process/x11vnc/errors" -H "$AUTH"

# Restart a process
curl -s -X POST "$BASE/process/xfce4/restart" -H "$AUTH"
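To see at a glance which desktop processes need a restart, the per-process status calls can be folded into one check. This is a minimal sketch: `stopped_processes` is a hypothetical helper, and the status lookup is injected so it can wrap whichever HTTP client is in use.

```python
DESKTOP_PROCESSES = ("xvfb", "xfce4", "x11vnc", "novnc", "atspi")


def stopped_processes(fetch_status, names=DESKTOP_PROCESSES):
    """Return the names whose status response does not report running.

    fetch_status(name) should return the parsed JSON from
    GET $BASE/process/<name>/status, i.e. {"processName": ..., "running": bool}.
    A missing "running" field is treated as not running.
    """
    return [n for n in names if not fetch_status(n).get("running", False)]
```

Each returned name can then be passed to the restart endpoint shown above.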

Agent Loop Pattern

A computer use agent loop follows this cycle: screenshot -> send to model -> execute action -> repeat.

API_KEY=$(harbor config get daytona.admin_api_key)
AUTH="Authorization: Bearer $API_KEY"
API="http://localhost:$(harbor config get daytona.host_port)"

# 1. Create sandbox
SANDBOX_ID=$(curl -s -X POST "$API/api/sandbox" -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"snapshot":"daytonaio/sandbox:v0.185.0-amd64","user":"daytona","cpu":2,"memory":4,"disk":10,"autoStopInterval":30}' \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")

# 2. Wait for started
while true; do
  STATE=$(curl -s "$API/api/sandbox/$SANDBOX_ID" -H "$AUTH" | python3 -c "import sys,json; print(json.load(sys.stdin)['state'])")
  [ "$STATE" = "started" ] && break
  sleep 2
done

# 3. Start desktop
BASE="$API/api/toolbox/$SANDBOX_ID/toolbox/computeruse"
curl -s -X POST "$BASE/start" -H "$AUTH"

# 4. Loop: screenshot -> decide -> act
# Take screenshot
curl -s "$BASE/screenshot" -H "$AUTH" | python3 -c "
import sys,json,base64
d=json.load(sys.stdin)
with open('/tmp/screen.png','wb') as f: f.write(base64.b64decode(d['screenshot']))
"

# Send /tmp/screen.png to the model for analysis, get back an action, then run one of
# the following (each also needs -H "$AUTH" -H "Content-Type: application/json"):
# - click:    curl -X POST "$BASE/mouse/click" -d '{"x":N,"y":N}'
# - type:     curl -X POST "$BASE/keyboard/type" -d '{"text":"..."}'
# - key:      curl -X POST "$BASE/keyboard/key" -d '{"key":"Return"}'
# - hotkey:   curl -X POST "$BASE/keyboard/hotkey" -d '{"keys":"ctrl+c"}'
# - scroll:   curl -X POST "$BASE/mouse/scroll" -d '{"x":N,"y":N,"direction":"down","amount":3}'

# 5. Cleanup
curl -s -X DELETE "$API/api/sandbox/$SANDBOX_ID" -H "$AUTH"
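The "decide, then act" step of the loop amounts to mapping a model action onto one of the endpoints listed in step 4. The sketch below shows one way to do that. The action schema (a dict with a "type" key) is hypothetical and depends on how the model is prompted; only the endpoint paths and body fields come from this document.

```python
def action_to_request(action: dict) -> tuple:
    """Translate a model action into (path, json_body), relative to $BASE."""
    kind = action["type"]
    if kind == "click":
        return "/mouse/click", {"x": action["x"], "y": action["y"]}
    if kind == "type":
        return "/keyboard/type", {"text": action["text"]}
    if kind == "key":
        return "/keyboard/key", {"key": action["key"]}
    if kind == "hotkey":
        return "/keyboard/hotkey", {"keys": action["keys"]}
    if kind == "scroll":
        return "/mouse/scroll", {"x": action["x"], "y": action["y"],
                                 "direction": action["direction"],
                                 "amount": action.get("amount", 3)}
    # Fail loudly instead of sending an unknown action to the sandbox.
    raise ValueError(f"unsupported action type: {kind!r}")
```

The caller would POST the body to `$BASE` plus the returned path, take a fresh screenshot, and repeat until the model signals that it is done.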

Command Execution (Non-GUI)

For tasks that don't need the desktop, use the toolbox process API instead:

TBOX="$API/api/toolbox/$SANDBOX_ID/toolbox"

# Execute a command — returns stdout/stderr
curl -s -X POST "$TBOX/process/execute" -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"command": "ls -la /home/daytona"}'

# File operations
curl -s "$TBOX/files?path=/home/daytona" -H "$AUTH"                    # list files
curl -s "$TBOX/files/download?path=/home/daytona/file.txt" -H "$AUTH"  # download
curl -s -X POST "$TBOX/files/upload?path=/home/daytona" -H "$AUTH" \
  -F "file=@local-file.txt"                                            # upload
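Commands built from variable input are easy to get wrong when hand-written into a JSON string. One option is to quote each argument before joining, as in the sketch below. `execute_body` is a hypothetical helper, and it assumes the endpoint parses the command string with shell-like quoting rules, which this document does not confirm.

```python
import json
import shlex


def execute_body(argv: list) -> str:
    """JSON body for POST $TBOX/process/execute, each argument shell-quoted."""
    return json.dumps({"command": shlex.join(argv)})


print(execute_body(["ls", "-la", "/home/daytona/my files"]))
```

Arguments containing spaces are wrapped in single quotes by `shlex.join`, so a path such as `/home/daytona/my files` stays a single argument.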

Troubleshooting

Sandbox stuck in "creating"

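The runner takes host disk usage into account. When usage is above roughly 80%, sandbox creation loops with "No available runners." Free disk space, then restart Daytona. Note that the restart command below deletes services/daytona/data/db, so any state Daytona keeps there is lost: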

docker system prune -f
harbor down daytona && rm -rf services/daytona/data/db && harbor up daytona
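Before pruning anything, it can help to confirm that disk usage is actually the cause. The following sketch checks usage with the standard library. The 80% threshold is the approximate figure quoted above, the helper names are hypothetical, and the runner may compute usage differently (for example, per Docker data directory rather than for `/`).

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Used space as a percentage of total space."""
    return 100.0 * used / total


def disk_is_tight(path="/", threshold=80.0) -> bool:
    """True if the filesystem holding path is above the threshold (percent)."""
    usage = shutil.disk_usage(path)
    return percent_used(usage.total, usage.used) > threshold


print(disk_is_tight("/"))
```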

Desktop processes not starting

# Check individual process status
curl -s "$BASE/process/xvfb/status" -H "$AUTH"
curl -s "$BASE/process/xvfb/errors" -H "$AUTH"

# Force restart
curl -s -X POST "$BASE/stop" -H "$AUTH"
curl -s -X POST "$BASE/start" -H "$AUTH"

Screenshot returns empty/black

Wait 2-3 seconds after start for the desktop to initialize. Check that xfce4 is running:

curl -s "$BASE/process/xfce4/status" -H "$AUTH"

Ports Reference

| Port | Service |
| --- | --- |
| 35000 | API + Dashboard |
| 35001 | Sandbox preview proxy |
| 35002 | Runner API |
| 35003 | SSH gateway |