crusty

Desktop automation — control mouse, keyboard, take screenshots, and automate Chrome browser via CLI commands. Use when the user asks to interact with the desktop, browser, or any GUI application.

By turinglabsorg. Updated 2/7/2026.

name: crusty
description: Desktop automation — control mouse, keyboard, take screenshots, and automate Chrome browser via CLI commands. Use when the user asks to interact with the desktop, browser, or any GUI application.
allowed-tools: Bash(./target/release/crusty:), Bash(osascript:), Bash(system_profiler:), Bash(sleep:), Read(/tmp/**), Read

Crusty — Desktop Automation

You have access to crusty, a desktop automation CLI. The binary is at ./target/release/crusty (relative to the project root).

IMPORTANT: Always use ./target/release/crusty directly — do NOT use shell variables or aliases.

Finding & Clicking Elements

Always use find-text to get coordinates before clicking. Never guess coordinates from screenshots.

./target/release/crusty find-text "Submit"            # Find text on screen via OCR
# Output: Submit	450	312                              # text, center_x, center_y (logical coords)
./target/release/crusty mouse move-to 450 312         # Move to measured coordinates
./target/release/crusty mouse click left              # Click

The first run of find-text is slow (~15s) because Swift compiles the OCR script. Subsequent runs are fast.
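
The find-then-click pattern above can be wrapped in a small shell helper for standalone scripts. This is a minimal sketch, assuming find-text prints one tab-separated line per match (text, center_x, center_y) and prints no such line on a miss. The names parse_coords and click_text are hypothetical and not part of crusty. Every call still invokes ./target/release/crusty directly.

```shell
# parse_coords: read find-text output on stdin and print "x y" for the first
# line that has at least three tab-separated fields. Exits non-zero on a miss.
# Assumption: the output format is text<TAB>center_x<TAB>center_y.
parse_coords() {
  awk -F '\t' 'NF >= 3 { print $2, $3; found = 1; exit } END { exit !found }'
}

# click_text: locate the text in $1 via OCR, then move there and left-click.
# Returns non-zero without clicking if no coordinates were found.
click_text() {
  coords=$(./target/release/crusty find-text "$1" | parse_coords) || return 1
  # Split "x y" into the positional parameters $1 and $2.
  set -- $coords
  ./target/release/crusty mouse move-to "$1" "$2" &&
    ./target/release/crusty mouse click left
}

# Usage (hypothetical):
#   click_text "Submit" || echo "Submit not found on screen" >&2
```

If find-text reports several matches, this sketch uses only the first line; whether crusty orders matches in any particular way is not stated here.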

Commands

Find Text (OCR)

./target/release/crusty find-text "About This Mac"    # Find text, returns: text\tcenter_x\tcenter_y

Screenshot

./target/release/crusty screenshot --logical -o /tmp/shot.png    # Save to file (1px = 1 mouse coord)
./target/release/crusty screenshot --logical                     # Print base64 PNG to stdout
./target/release/crusty screenshot --logical --grid 50           # Overlay coordinate grid
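
When the screenshot is printed to stdout rather than saved with -o, the base64 text has to be decoded before it can be viewed. The sketch below assumes the output is plain base64 with no data: URI prefix, and that the local base64 tool accepts --decode (GNU coreutils and macOS both do). The name save_b64_png is hypothetical.

```shell
# save_b64_png: decode base64 data from stdin into the file named by $1,
# then succeed only if the file starts with the 8-byte PNG signature.
save_b64_png() {
  base64 --decode > "$1" || return 1
  # Hex-dump the first 8 bytes and strip whitespace for comparison.
  sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "89504e470d0a1a0a" ]
}

# Usage (hypothetical):
#   ./target/release/crusty screenshot --logical | save_b64_png /tmp/shot.png
```

The signature check only catches empty or obviously wrong output; it does not validate the rest of the image.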

Mouse

./target/release/crusty mouse move-to 500 300             # Absolute move (logical px)
./target/release/crusty mouse click left                  # Click (left/right/middle)
./target/release/crusty mouse click right
./target/release/crusty mouse double-click left
./target/release/crusty mouse position                    # Print current position
./target/release/crusty mouse scroll -3                   # Scroll down
./target/release/crusty mouse scroll 3                    # Scroll up

Keyboard

./target/release/crusty keyboard type "hello world"       # Type text
./target/release/crusty keyboard combo "meta+c"           # Key combination
./target/release/crusty keyboard combo "ctrl+shift+a"
./target/release/crusty keyboard tap return               # Single key tap
./target/release/crusty keyboard tap tab
./target/release/crusty keyboard tap escape

Browser (Chrome via CDP)

./target/release/crusty browser open                      # Launch Chrome with CDP
./target/release/crusty browser open "https://x.com"      # Launch and navigate
./target/release/crusty browser navigate "https://x.com"  # Navigate active tab
./target/release/crusty browser tabs                      # List open tabs
./target/release/crusty browser eval "document.title"     # Execute JavaScript
./target/release/crusty browser find-text "Post"          # Find element by text
./target/release/crusty browser find-selector ".btn"      # Find element by CSS

Workflow

  1. Screenshot: Take a screenshot to see the current state
  2. Find: Use find-text to measure exact coordinates of the target element
  3. Click: Run mouse move-to with the measured coordinates, then mouse click
  4. Verify: Screenshot again to confirm

Example: Click a menu item

./target/release/crusty screenshot --logical -o /tmp/shot.png
# View screenshot to understand the screen state
./target/release/crusty find-text "File"
# Output: File	45	12
./target/release/crusty mouse move-to 45 12
./target/release/crusty mouse click left
# Menu opens — find the item
./target/release/crusty find-text "Save"
# Output: Save	62	145
./target/release/crusty mouse move-to 62 145
./target/release/crusty mouse click left
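
In the example above, the menu may not have finished opening by the time the second find-text runs. One way to handle that in a standalone script is to poll with a short sleep between attempts. This is a sketch under assumptions: retry and text_visible are hypothetical helper names, and text_visible assumes a miss produces no tab-separated output line.

```shell
# retry: run a command up to $1 times, sleeping $2 seconds between attempts.
# Succeeds as soon as the command succeeds; fails if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Only sleep if another attempt is still to come.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# text_visible: succeed only if find-text prints a line containing a tab,
# i.e. a line that carries coordinates (assumed output format).
text_visible() {
  ./target/release/crusty find-text "$1" 2>/dev/null | grep -q "$(printf '\t')"
}

# Usage (hypothetical): wait up to about 5 seconds for the menu item.
#   retry 5 1 text_visible "Save" || echo "Save never appeared" >&2
```

Keep in mind that the first find-text call is slow, so the total wait can exceed attempts times delay.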

Tips

  • Always use find-text for coordinates — never guess pixel positions from screenshots
  • Use --logical for screenshots — pixel coordinates = mouse coordinates, no math needed
  • Use browser find-text / browser find-selector for web elements (faster than OCR)
  • Key names: meta (Cmd), ctrl, alt, shift, return, tab, escape, space, backspace, delete, up, down, left, right, f1-f12, or any single character
  • Save screenshots to files (--logical -o /tmp/shot.png) so you can view them with the Read tool
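
The key names listed above can be checked before a combo is sent, which catches typos such as "cmd+c" early. This is a minimal sketch: is_key and is_combo are hypothetical helpers, the "+" separator is taken from the combo examples in this document, and whether crusty accepts names beyond this list is not known.

```shell
# is_key: succeed if $1 is one of the listed key names or a single character.
is_key() {
  case "$1" in
    meta|ctrl|alt|shift|return|tab|escape|space|backspace|delete) return 0 ;;
    up|down|left|right) return 0 ;;
    f[1-9]|f1[0-2]) return 0 ;;   # f1 through f12
    ?) return 0 ;;                # any single character
    *) return 1 ;;
  esac
}

# is_combo: succeed if every "+"-separated part of $1 passes is_key.
# Limitation: a literal "+" key cannot be expressed in this sketch.
is_combo() {
  rest=$1
  while :; do
    part=${rest%%+*}              # text before the first "+"
    is_key "$part" || return 1
    [ "$rest" = "$part" ] && return 0
    rest=${rest#*+}               # drop the part just checked
  done
}

# Usage (hypothetical):
#   is_combo "meta+c" && ./target/release/crusty keyboard combo "meta+c"
```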

Install via CLI

npx skills add https://github.com/turinglabsorg/crusty --skill crusty