name: crusty
description: Desktop automation for controlling the mouse and keyboard, taking screenshots, and automating the Chrome browser via CLI commands. Use when the user asks to interact with the desktop, browser, or any GUI application.
allowed-tools: Bash(./target/release/crusty:), Bash(osascript:), Bash(system_profiler:), Bash(sleep:), Read(/tmp/**), Read
Crusty — Desktop Automation
You have access to crusty, a desktop automation CLI. The binary is at ./target/release/crusty (relative to the project root).
IMPORTANT: Always use ./target/release/crusty directly — do NOT use shell variables or aliases.
Finding & Clicking Elements
Always use find-text to get coordinates before clicking. Never guess coordinates from screenshots.
./target/release/crusty find-text "Submit" # Find text on screen via OCR
# Output: Submit 450 312 # text, center_x, center_y (logical coords)
./target/release/crusty mouse move-to 450 312 # Move to measured coordinates
./target/release/crusty mouse click left # Click
The first run of find-text is slow (~15s) because Swift compiles the OCR script. Subsequent runs are fast.
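The find-then-click sequence above can be wrapped in a small shell function. This is a minimal sketch, not part of crusty: `click_text` and `is_int` are hypothetical helper names, and it assumes that find-text prints tab-separated `text`, `center_x`, `center_y` and that the first output line is the match you want.

```shell
# Sketch only: click_text and is_int are hypothetical helpers, not part of crusty.
# Assumes find-text prints "text<TAB>center_x<TAB>center_y" and that the
# first output line is the wanted match.
is_int() {
  # Succeeds only for a non-empty string made of digits.
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
}

click_text() {
  line=$(./target/release/crusty find-text "$1" | head -n 1)
  # Cut on tabs so multi-word matches such as "About This Mac" stay intact.
  x=$(printf '%s\n' "$line" | cut -f 2)
  y=$(printf '%s\n' "$line" | cut -f 3)
  # Refuse to click unless both coordinates were actually measured.
  if ! is_int "$x" || ! is_int "$y"; then
    echo "click_text: no coordinates for '$1'" >&2
    return 1
  fi
  ./target/release/crusty mouse move-to "$x" "$y"
  ./target/release/crusty mouse click left
}
```

Calling `click_text "Submit"` then moves and clicks only when OCR returned usable coordinates, which keeps to the rule of never guessing positions.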
Commands
Find Text (OCR)
./target/release/crusty find-text "About This Mac" # Find text, returns: text\tcenter_x\tcenter_y
Screenshot
./target/release/crusty screenshot --logical -o /tmp/shot.png # Save to file (1px = 1 mouse coord)
./target/release/crusty screenshot --logical # Print base64 PNG to stdout
./target/release/crusty screenshot --logical --grid 50 # Overlay coordinate grid
Mouse
./target/release/crusty mouse move-to 500 300 # Absolute move (logical px)
./target/release/crusty mouse click left # Click (left/right/middle)
./target/release/crusty mouse click right
./target/release/crusty mouse double-click left
./target/release/crusty mouse position # Print current position
./target/release/crusty mouse scroll -3 # Scroll down
./target/release/crusty mouse scroll 3 # Scroll up
Keyboard
./target/release/crusty keyboard type "hello world" # Type text
./target/release/crusty keyboard combo "meta+c" # Key combination
./target/release/crusty keyboard combo "ctrl+shift+a"
./target/release/crusty keyboard tap return # Single key tap
./target/release/crusty keyboard tap tab
./target/release/crusty keyboard tap escape
Browser (Chrome via CDP)
./target/release/crusty browser open # Launch Chrome with CDP
./target/release/crusty browser open "https://x.com" # Launch and navigate
./target/release/crusty browser navigate "https://x.com" # Navigate active tab
./target/release/crusty browser tabs # List open tabs
./target/release/crusty browser eval "document.title" # Execute JavaScript
./target/release/crusty browser find-text "Post" # Find element by text
./target/release/crusty browser find-selector ".btn" # Find element by CSS
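Since the browser lookups and the OCR lookup can target the same visible text, one possible pattern is to try the browser first and fall back to OCR. The sketch below is an illustration under assumptions: `find_anywhere` is a hypothetical helper, a failed browser lookup is assumed to exit non-zero or print nothing, and the output of `browser find-text` is passed through untouched because its format is not described here.

```shell
# Sketch only: find_anywhere is a hypothetical helper, not part of crusty.
# Assumes a failed browser lookup exits non-zero or prints nothing.
find_anywhere() {
  # Try the browser lookup first.
  if result=$(./target/release/crusty browser find-text "$1" 2>/dev/null) \
      && [ -n "$result" ]; then
    printf 'browser\t%s\n' "$result"
    return 0
  fi
  # Fall back to OCR on the whole screen.
  result=$(./target/release/crusty find-text "$1")
  if [ -n "$result" ]; then
    printf 'ocr\t%s\n' "$result"
    return 0
  fi
  return 1
}
```

The first field of the output records which lookup answered, so the caller can decide how to interpret the rest of the line instead of assuming both commands report the same kind of coordinates.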
Workflow
- Screenshot: Take a screenshot to see the current state
- Find: Use find-text to measure exact coordinates of the target element
- Click: mouse move-to the measured coordinates, then mouse click
- Verify: Screenshot again to confirm
Example: Click a menu item
./target/release/crusty screenshot --logical -o /tmp/shot.png
# View screenshot to understand the screen state
./target/release/crusty find-text "File"
# Output: File 45 12
./target/release/crusty mouse move-to 45 12
./target/release/crusty mouse click left
# Menu opens — find the item
./target/release/crusty find-text "Save"
# Output: Save 62 145
./target/release/crusty mouse move-to 62 145
./target/release/crusty mouse click left
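A menu may take a moment to open, so the second find-text in the example above can run before "Save" is on screen. One way to handle that is to poll, sketched below. `wait_for_text` is a hypothetical helper, and the sketch assumes find-text prints nothing when the text is not visible, which is not confirmed here.

```shell
# Sketch only: wait_for_text is a hypothetical helper, not part of crusty.
# Assumes find-text prints nothing when the text is not on screen.
wait_for_text() {
  target=$1
  attempts=${2:-5}          # number of OCR attempts before giving up
  while [ "$attempts" -gt 0 ]; do
    match=$(./target/release/crusty find-text "$target" | head -n 1)
    if [ -n "$match" ]; then
      printf '%s\n' "$match"   # pass the matched line through to the caller
      return 0
    fi
    attempts=$((attempts - 1))
    # Pause between attempts, but not after the last one.
    if [ "$attempts" -gt 0 ]; then
      sleep 1
    fi
  done
  return 1
}
```

For example, `wait_for_text "Save" 3` after clicking "File" prints the matched line once the menu item appears, or returns a non-zero status after three attempts.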
Tips
- Always use find-text for coordinates; never guess pixel positions from screenshots
- Use --logical for screenshots: pixel coordinates = mouse coordinates, no math needed
- Use browser find-text / browser find-selector for web elements (faster than OCR)
- Key names: meta (Cmd), ctrl, alt, shift, return, tab, escape, space, backspace, delete, up, down, left, right, f1-f12, or any single character
- Save screenshots to files (--logical -o /tmp/shot.png) so you can view them with the Read tool
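The key names in the tips can be checked before a combo is sent. The sketch below is a hypothetical pre-flight check (`valid_key` and `valid_combo` are not crusty commands) that only knows the names listed here; crusty itself may accept more.

```shell
# Sketch only: valid_key and valid_combo are hypothetical helpers.
# They accept exactly the key names listed in the tips above.
valid_key() {
  case "$1" in
    meta|ctrl|alt|shift|return|tab|escape|space|backspace|delete) return 0 ;;
    up|down|left|right) return 0 ;;
    f[1-9]|f1[0-2]) return 0 ;;   # f1 through f12
    ?) return 0 ;;                # any single character
  esac
  return 1
}

valid_combo() {
  rest=$1
  [ -n "$rest" ] || return 1
  # Walk the "+"-separated parts, e.g. "ctrl+shift+a" -> ctrl, shift, a.
  while [ -n "$rest" ]; do
    key=${rest%%+*}
    valid_key "$key" || return 1
    case "$rest" in
      *+*) rest=${rest#*+}; [ -n "$rest" ] || return 1 ;;  # reject a trailing "+"
      *) rest= ;;
    esac
  done
}
```

A guarded call could then look like `valid_combo "meta+c" && ./target/release/crusty keyboard combo "meta+c"`.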