name: desktop-control description: "Unified desktop + browser automation. Routes DOM targets to CDP (cli-jaw browser), desktop apps to Computer Use, hybrid combos to both. Codex desktop/CLI + macOS required for Computer Use." metadata: { "openclaw": { "emoji": "๐ฅ๏ธ", "requires": { "bins": ["cli-jaw"], "system": ["macOS", "Google Chrome"] }, "install": [ { "id": "brew-cliclick", "kind": "brew", "formula": "cliclick", "bins": ["cliclick"], "label": "Install cliclick (optional โ pointer-action fallback)", }, ], },
}
Desktop Control
Unified skill for all UI automation. Chooses between CDP and Computer Use based on the target, and reports meaningful actions with a path= + action_class= transcript.
This skill is already injected into your system prompt. Do not run
sed,cat,head, orReadto load it from disk. Guessing absolute paths like/Users/*/.codex/skills/...or/Users/*/.cli-jaw-*/skills/...wastes a turn and often targets a file that doesn't exist. If you need a specific reference file (e.g.,reference/computer-use.md), usecli-jaw skill read desktop-control <ref-name>.
When to use
Trigger on any request that touches a visible UI:
- User message contains
$computer-useor/computer-useโ skip routing analysis, jump straight toreference/computer-use.md. Explicit user opt-in. If Computer Use tools are not available, stop withprecondition failed: computer-use unavailable. - "open this URL / click this button / type in this field" โ read
reference/cdp.md - "switch Chrome tab / open Finder / click System Settings" โ read
reference/computer-use.md - "click the thing inside this Canvas / WebGL / iframe" โ read
reference/vision-click.md - Not sure which path โ read
reference/intent-routing.mdFIRST - Want a real end-to-end example โ read
reference/control-workflow.md
Absolute rules
- Announce the path before acting. First line of every task must be
path=cdp,path=computer-use, orpath=cdp+cu. - Computer Use always starts each assistant turn with
get_app_state(app)before interacting with that app. Re-call it on stale warnings, after actions that change UI state, and whenever confidence drops. - Every meaningful action records an
action_class. Classes:state-read,element-action,value-injection,keyboard-action,pointer-action,pointer-action+vision,scroll-action,drag-action,secondary-action. - Never fall back silently. If the required path is unavailable, stop and report which precondition failed.
- Never claim the cursor was visible. Cursor overlay is best-effort in the current build.
- When uncertain, take a screenshot FIRST. If you ever find yourself guessing โ "is that tab 342 or 357?", "did the click actually land?", "is this the right page?" โ stop and re-ground via
get_app_state(app)(Computer Use) orcli-jaw browser snapshot(CDP). Never chain actions through uncertainty. Guessing indices or URLs leads to infinite correction loops. If two consecutive actions produced ambiguous state, the next call must be a state-read, not another action.
Preconditions (Computer Use path)
- macOS only.
- Active agent runtime must expose Computer Use tools:
list_apps,get_app_state,click,drag,press_key,scroll,select_text,set_value,type_text, andperform_secondary_action. - Start a session by selecting the app display name, bundle identifier, or full app path. Use
list_appsif the app is unknown. - If packaged through cli-jaw,
/Applications/Jaw.appand/Applications/Codex Computer Use.appmay be required for TCC attribution. Missing app bundles are a setup issue, not a reason to silently switch paths. - TCC Accessibility and AppleEvents must be granted to the controlling app.
Transcript format (standard)
CDP action:
path=cdp
url=https://example.com
action=click e3
result=ok
Computer Use action:
path=computer-use
app=Google Chrome
action_class=element-action
action=click(element_index=730)
stale_warning=no
result=ok
Hybrid (lookup via CDP, action via Computer Use):
path=cdp+cu
lookup=cli-jaw browser snapshot โ bbox of "Play"
action_class=pointer-action
action=click(x=812, y=514)
result=ok
Related skills
browserโ CDP command reference (this skill supersedes its coverage).screen-captureโ generic macOS screenshot / webcam / video recording (unchanged).vision-clickโ no longer auto-active. Absorbed as a tactic inreference/vision-click.md. If you need the low-level recipe (NDJSON parsing, DPR correction), runcli-jaw skill install vision-click.
Common failures and the only correct responses
| Symptom | Correct report |
|---|---|
| "I don't see a cursor" | cursor overlay is best-effort in the current build โ action=click(...) succeeded; visible cursor not guaranteed |
| CDP server not running | precondition failed: cli-jaw serve not running. Start with 'jaw serve' and retry. |
| Computer Use tools missing | precondition failed: computer-use unavailable |
| cli-jaw CU app missing in packaged install | precondition failed: /Applications/Codex Computer Use.app missing. Recover: jaw doctor --tcc --fix |
| Stale warning on action | re-call get_app_state(app) then retry; log stale_warning=yes in the transcript |
| Non-GUI task routed here | needs boss follow-up: not GUI automation |