gui-automation

star 18.4k

Use when you need to visually interact with a GUI — test buttons, fill forms, verify visual layouts, fuzz web pages, automate user flows, take screenshots, or perform end-to-end QA on any application. Works on cloud VMs, Docker containers, local machines, and sandboxes. Install: pip install cua.

trycua By trycua schedule Updated 3/2/2026

name: gui-automation description: >- Use when you need to visually interact with a GUI — test buttons, fill forms, verify visual layouts, fuzz web pages, automate user flows, take screenshots, or perform end-to-end QA on any application. Works on cloud VMs, Docker containers, local machines, and sandboxes. Install: pip install cua.

GUI Automation

CUA gives you eyes and hands on a real computer: see the screen, move the mouse, click, type, drag, and manage windows — like a human at the keyboard.

Use this skill for visual interaction that can't be done via shell or API.

Setup

cua --version          # check install; if missing: pip install cua

# Connect to target (pick one)
cua do switch cloud my-vm
cua do switch docker my-container
cua do-host-consent && cua do switch host   # local machine (one-time consent)

ANTHROPIC_API_KEY is optional. With it, cua do snapshot returns an AI-annotated screen with element coordinates. Without it, use screenshot and read the image yourself.

Workflow

Look → Act → Verify — repeat until done, then share:

cua do screenshot          # look
cua do click 450 280       # act
cua do screenshot          # verify
cua trajectory share       # share replay link with user

Re-screenshot after every UI change — coordinates go stale when the screen changes.

Scenarios

Click a button

cua do screenshot
cua do click 450 280
cua do screenshot

Fill a form

cua do screenshot
cua do click 400 200 && cua do type "Jane Doe"
cua do key tab            && cua do type "jane@example.com"
cua do key tab            && cua do type "SecureP@ss123"
cua do click 400 500
cua do screenshot

File upload dialog

cua do click 350 400       # "Choose File"
cua do type "/home/user/report.pdf"
cua do key enter
cua do screenshot

Zoom in for precision clicks (host or small targets)

When clicking small or dense UI elements — especially on the host machine — zoom into the target window first. Coordinates become window-relative and screenshots show only that window, giving you higher effective resolution.

cua do zoom "Google Chrome"   # crop to Chrome window; coords are now window-relative
cua do screenshot              # zoomed view — easier to locate small elements
cua do click 112 44            # precise click on a small tab or button
cua do screenshot              # verify
cua do unzoom                  # restore full-screen coords when done
cua do screenshot              # back to full desktop view

Use zoom any time click accuracy is uncertain. unzoom before switching windows or when you need to see the full desktop again.

Drag and drop

cua do window ls               # list open windows
cua do drag 150 300 650 400    # source → destination
cua do screenshot

Fuzz a form

cua do screenshot
cua do click 400 200
cua do type "<script>alert(1)</script>"
cua do key tab && cua do type "'; DROP TABLE users; --"
cua do key tab && cua do type "AAAAAAAAAAAAAAAAAAAAAAA"
cua do click 400 500
cua do screenshot              # check for errors, crashes, unexpected behavior

Trajectory

Every action is auto-recorded to ~/.cua/trajectories/{machine}/{session}/.

cua trajectory share           # upload and get shareable HTTPS link (always do this at end)
cua trajectory ls              # list sessions
cua trajectory export          # generate HTML report
cua do --no-record click 100 200   # disable recording for a single action

Tell the user: "Here is the trajectory of my session: {url}"

Quick Reference

Action Command
Connect to target cua do switch <provider> [name]
Screenshot cua do screenshot
AI-annotated screen cua do snapshot ["instructions"]
Click cua do click <x> <y> [left|right|middle]
Double-click cua do dclick <x> <y>
Type text cua do type "text"
Press key cua do key <key>
Hotkey cua do hotkey <combo> (e.g. ctrl+c)
Scroll cua do scroll <direction> [amount]
Drag cua do drag <x1> <y1> <x2> <y2>
Move cursor cua do move <x> <y>
Shell command cua do shell "command"
Open URL/file cua do open <url|path>
List windows cua do window ls [app]
Focus window cua do window focus <id>
Zoom to window cua do zoom "App Name"
Unzoom cua do unzoom
Share trajectory cua trajectory share

Providers

Provider Example
cloud cua do switch cloud my-vm
cloudv2 cua do switch cloudv2 my-vm
docker cua do switch docker my-container
lume cua do switch lume my-vm
lumier cua do switch lumier my-vm
winsandbox cua do switch winsandbox
host cua do switch host

See references/command-reference.md for full argument syntax.

Install via CLI
npx skills add https://github.com/trycua/cua --skill gui-automation
Repository Details
star Stars 18,352
call_split Forks 1,183
navigation Branch main
article Path SKILL.md
More from Creator