desktop-control

Stars: 4

Unified desktop + browser automation. Routes DOM targets to CDP (cli-jaw browser), desktop apps to Computer Use, hybrid combos to both. Codex desktop/CLI + macOS required for Computer Use.

By lidge-jun. Updated 5/30/2026

name: desktop-control
description: "Unified desktop + browser automation. Routes DOM targets to CDP (cli-jaw browser), desktop apps to Computer Use, hybrid combos to both. Codex desktop/CLI + macOS required for Computer Use."
metadata:
  {
    "openclaw":
      {
        "emoji": "🖥️",
        "requires": { "bins": ["cli-jaw"], "system": ["macOS", "Google Chrome"] },
        "install":
          [
            {
              "id": "brew-cliclick",
              "kind": "brew",
              "formula": "cliclick",
              "bins": ["cliclick"],
              "label": "Install cliclick (optional - pointer-action fallback)",
            },
          ],
      },
  }

Desktop Control

Unified skill for all UI automation. Chooses between CDP and Computer Use based on the target, and reports meaningful actions with a path= + action_class= transcript.
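The routing decision can be sketched as a small function. This is a minimal illustration, not the skill's actual implementation: the `Target` type and the `choose_path` name are hypothetical, and only the three `path=` values and the non-GUI report string come from this skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical description of what a task needs to touch."""
    needs_dom: bool      # element reachable through the browser DOM
    needs_desktop: bool  # native app UI outside the DOM


def choose_path(target: Target) -> str:
    """Return the path= value to announce before acting."""
    if target.needs_dom and target.needs_desktop:
        return "cdp+cu"
    if target.needs_dom:
        return "cdp"
    if target.needs_desktop:
        return "computer-use"
    # Neither surface is involved, so this is not GUI automation.
    raise ValueError("needs boss follow-up: not GUI automation")
```

For example, `choose_path(Target(needs_dom=True, needs_desktop=False))` yields `cdp`, which would become the first line of the transcript.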

This skill is already injected into your system prompt. Do not run sed, cat, head, or Read to load it from disk. Guessing absolute paths like /Users/*/.codex/skills/... or /Users/*/.cli-jaw-*/skills/... wastes a turn and often targets a file that doesn't exist. If you need a specific reference file (e.g., reference/computer-use.md), use cli-jaw skill read desktop-control <ref-name>.

When to use

Trigger on any request that touches a visible UI:

Absolute rules

  1. Announce the path before acting. First line of every task must be path=cdp, path=computer-use, or path=cdp+cu.
  2. Computer Use always starts each assistant turn with get_app_state(app) before interacting with that app. Re-call it on stale warnings, after actions that change UI state, and whenever confidence drops.
  3. Every meaningful action records an action_class. Classes: state-read, element-action, value-injection, keyboard-action, pointer-action, pointer-action+vision, scroll-action, drag-action, secondary-action.
  4. Never fall back silently. If the required path is unavailable, stop and report which precondition failed.
  5. Never claim the cursor was visible. Cursor overlay is best-effort in the current build.
  6. When uncertain, take a screenshot FIRST. If you ever find yourself guessing ("is that tab 342 or 357?", "did the click actually land?", "is this the right page?"), stop and re-ground via get_app_state(app) (Computer Use) or cli-jaw browser snapshot (CDP). Never chain actions through uncertainty. Guessing indices or URLs leads to infinite correction loops. If two consecutive actions produced ambiguous state, the next call must be a state-read, not another action.
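Rules 3 and 6 lend themselves to a mechanical check. The sketch below is one possible way to enforce them, assuming the caller can judge whether a result was ambiguous. `UncertaintyGuard` is a hypothetical helper, not part of cli-jaw or Computer Use. The class names in `ACTION_CLASSES` are taken from rule 3.

```python
ACTION_CLASSES = frozenset({
    "state-read", "element-action", "value-injection", "keyboard-action",
    "pointer-action", "pointer-action+vision", "scroll-action",
    "drag-action", "secondary-action",
})


class UncertaintyGuard:
    """Tracks consecutive ambiguous results (hypothetical helper)."""

    def __init__(self) -> None:
        self.ambiguous_streak = 0

    def check(self, action_class: str) -> None:
        """Raise if this action is not allowed to run next."""
        if action_class not in ACTION_CLASSES:
            raise ValueError(f"unknown action_class: {action_class}")
        if self.ambiguous_streak >= 2 and action_class != "state-read":
            raise RuntimeError(
                "two ambiguous results: next call must be a state-read"
            )

    def record(self, action_class: str, ambiguous: bool) -> None:
        """Update the streak after an action completes."""
        if action_class == "state-read" or not ambiguous:
            # A state-read re-grounds; a clear result ends the streak.
            self.ambiguous_streak = 0
        else:
            self.ambiguous_streak += 1
```

Calling `check` before each action and `record` after it means a third action in a row after two ambiguous results is rejected unless it is a `state-read`.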

Preconditions (Computer Use path)

  • macOS only.
  • Active agent runtime must expose Computer Use tools: list_apps, get_app_state, click, drag, press_key, scroll, select_text, set_value, type_text, and perform_secondary_action.
  • Start a session by selecting the app display name, bundle identifier, or full app path. Use list_apps if the app is unknown.
  • If packaged through cli-jaw, /Applications/Jaw.app and /Applications/Codex Computer Use.app may be required for TCC attribution. Missing app bundles are a setup issue, not a reason to silently switch paths.
  • TCC Accessibility and AppleEvents must be granted to the controlling app.
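The first two preconditions can be checked up front so that a failure is reported rather than silently worked around. This is a sketch under assumptions: `check_computer_use` is a hypothetical helper, the caller is assumed to know which tool names its runtime exposes, and the parenthesised detail appended to the report string is an addition for illustration. App bundles and TCC grants are not checked here.

```python
import sys

# Tool names listed in the preconditions above.
REQUIRED_TOOLS = (
    "list_apps", "get_app_state", "click", "drag", "press_key", "scroll",
    "select_text", "set_value", "type_text", "perform_secondary_action",
)


def check_computer_use(available_tools, platform=sys.platform):
    """Return None if the checks pass, else a failure report string."""
    if platform != "darwin":  # sys.platform is "darwin" on macOS
        return "precondition failed: computer-use unavailable (macOS only)"
    available = set(available_tools)
    missing = [tool for tool in REQUIRED_TOOLS if tool not in available]
    if missing:
        detail = ", ".join(missing)
        return (
            "precondition failed: computer-use unavailable "
            f"(missing tools: {detail})"
        )
    return None
```

Returning the report instead of switching paths keeps the behaviour in line with rule 4: the caller stops and surfaces the string.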

Transcript format (standard)

CDP action:

path=cdp
url=https://example.com
action=click e3
result=ok

Computer Use action:

path=computer-use
app=Google Chrome
action_class=element-action
action=click(element_index=730)
stale_warning=no
result=ok

Hybrid (lookup via CDP, action via Computer Use):

path=cdp+cu
lookup=cli-jaw browser snapshot → bbox of "Play"
action_class=pointer-action
action=click(x=812, y=514)
result=ok
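The three transcripts share one shape: `path=` first, then `key=value` lines in order. A minimal formatter could look like the following. `format_transcript` is a hypothetical name used for illustration, not a cli-jaw API.

```python
VALID_PATHS = ("cdp", "computer-use", "cdp+cu")


def format_transcript(path, **fields):
    """Render a transcript as key=value lines with path= first.

    Keyword arguments keep their call order, so fields appear in the
    order the caller wrote them.
    """
    if path not in VALID_PATHS:
        raise ValueError(f"unknown path: {path}")
    lines = [f"path={path}"]
    lines.extend(f"{key}={value}" for key, value in fields.items())
    return "\n".join(lines)


print(format_transcript(
    "computer-use",
    app="Google Chrome",
    action_class="element-action",
    action="click(element_index=730)",
    stale_warning="no",
    result="ok",
))
```

The call above reproduces the Computer Use example line for line.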

Related skills

  • browser: CDP command reference (this skill supersedes its coverage).
  • screen-capture: generic macOS screenshot / webcam / video recording (unchanged).
  • vision-click: no longer auto-active. Absorbed as a tactic in reference/vision-click.md. If you need the low-level recipe (NDJSON parsing, DPR correction), run cli-jaw skill install vision-click.

Common failures and the only correct responses

Symptom | Correct report
"I don't see a cursor" | cursor overlay is best-effort in the current build - action=click(...) succeeded; visible cursor not guaranteed
CDP server not running | precondition failed: cli-jaw serve not running. Start with 'jaw serve' and retry.
Computer Use tools missing | precondition failed: computer-use unavailable
cli-jaw CU app missing in packaged install | precondition failed: /Applications/Codex Computer Use.app missing. Recover: jaw doctor --tcc --fix
Stale warning on action | re-call get_app_state(app) then retry; log stale_warning=yes in the transcript
Non-GUI task routed here | needs boss follow-up: not GUI automation

Install via CLI
npx skills add https://github.com/lidge-jun/cli-jaw-skills --skill desktop-control
Repository Details
Stars: 4
Forks: 0
Branch: main
Path: SKILL.md