---
name: device-operator
description: Use when controlling a visible target device UI through screenshots, touch, mouse, or keyboard.
metadata:
  preferred_model: primary
  allowed_tools: [screenshot, quick_action, touch_gesture, mouse_click, mouse_move, mouse_scroll, keyboard_tap, keyboard_text, shell]
---
Use this skill when interacting with the connected device screen, app UI, keyboard, touch input, or mouse pointer.
Core Loop
Always operate through a visual feedback loop:
- Observe the current screen with `screenshot`.
- Decide the smallest next UI action.
- Act with one input tool.
- Inspect the resulting screenshot.
- Continue only after confirming what changed.
Do not perform multiple blind UI actions in a row.
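A minimal Python sketch of the loop follows. It assumes hypothetical callables: `observe` stands in for `screenshot`, `decide` for choosing the next action, `act` for one input tool call, and `changed_as_expected` for the visual check. It is not the agent runtime. It only makes concrete the rule that every action sits between two observations.

```python
from typing import Callable, List, Optional


def run_loop(
    observe: Callable[[], str],
    decide: Callable[[str], Optional[str]],
    act: Callable[[str], None],
    changed_as_expected: Callable[[str, str, str], bool],
    max_steps: int = 20,
) -> List[str]:
    """Observe, decide, act, inspect: one input action per fresh observation.

    All four callables are hypothetical stand-ins for the agent's tools.
    Returns the actions whose effect was confirmed by a new observation.
    """
    confirmed: List[str] = []
    screen = observe()  # always start from a real screenshot
    for _ in range(max_steps):
        action = decide(screen)  # smallest next UI action, or None when finished
        if action is None:
            break
        act(action)  # exactly one input tool call
        after = observe()  # inspect the resulting screenshot
        if not changed_as_expected(screen, action, after):
            break  # stop here and switch to failed-attempt handling
        confirmed.append(action)
        screen = after  # continue only from a confirmed state
    return confirmed
```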
Screenshot Failure Recovery
If screenshot fails, a post-action screenshot fails, or the tool output mentions SERVICE_RECOVERING, socket errors, empty image data, or invalid screenshot JSON, stop UI actions. Do not tap, type, swipe, or guess from stale visual state.
Recovery sequence:
- Retry `screenshot` once if the error suggests transient recovery, such as `SERVICE_RECOVERING`.
- If the second screenshot also fails, diagnose the frame service with `shell`:
  - `/etc/init.d/S52frame_service status`
  - `frame_service_cli --socket /run/frame_service/frame_service.sock health`
  - `ls -l /run/frame_service/frame_service.sock`
- If `health` reports a bad state or the service is recovering, request a capture-manager restart first:
  `frame_service_cli --socket /run/frame_service/frame_service.sock restart`
- Verify recovery with `health`, then call `screenshot` again.
- If the CLI restart fails, the socket is missing, or the service is not running, restart the init service:
  `/etc/init.d/S52frame_service restart`
- After the service restart, verify in order: `status`, `health`, then `screenshot`.
If recovery still fails, inspect recent logs before asking the user to intervene:
`tail -n 80 /var/log/frame_service/frame_service.log`
When reporting a blocker, include the screenshot error, which recovery commands were tried, and the latest health or log signal. Do not claim the device UI task is complete without a fresh screenshot confirming the target screen.
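The recovery order can be sketched as a decision procedure. This Python version is an illustration under stated assumptions. `take_screenshot` and `run_shell` are hypothetical wrappers around the `screenshot` and `shell` tools. The success tests (exit code 0, `ok` in the health output) are guesses, because the skill does not document what `frame_service_cli` or the init script print. Routing a running service with a present socket to the CLI restart is also an assumption.

```python
from typing import Callable, List, Tuple

SOCK = "/run/frame_service/frame_service.sock"
CLI = f"frame_service_cli --socket {SOCK}"
INIT = "/etc/init.d/S52frame_service"
LOG = "/var/log/frame_service/frame_service.log"

# Hypothetical signatures: take_screenshot() -> (ok, error_text),
# run_shell(cmd) -> (exit_code, output).
Shot = Callable[[], Tuple[bool, str]]
Shell = Callable[[str], Tuple[int, str]]


def recover(take_screenshot: Shot, run_shell: Shell) -> Tuple[bool, List[str]]:
    """Walk the recovery ladder; return (recovered, shell commands tried)."""
    tried: List[str] = []

    def sh(cmd: str) -> Tuple[int, str]:
        tried.append(cmd)  # kept for the blocker report
        return run_shell(cmd)

    def healthy() -> bool:
        # ASSUMPTION: exit code 0 plus "ok" in the output means healthy.
        code, out = sh(f"{CLI} health")
        return code == 0 and "ok" in out.lower()

    ok, err = take_screenshot()
    if not ok and "SERVICE_RECOVERING" in err:
        ok, _ = take_screenshot()  # transient error: retry exactly once
    if ok:
        return True, tried

    # Diagnose before changing anything. ASSUMPTION: exit code 0 means success.
    running = sh(f"{INIT} status")[0] == 0
    healthy()  # diagnostic only; the output belongs in the report
    socket_present = sh(f"ls -l {SOCK}")[0] == 0

    # ASSUMPTION: with the service running and the socket present, a failing
    # screenshot is treated as a bad state and goes to the CLI restart first.
    if running and socket_present:
        if sh(f"{CLI} restart")[0] == 0 and healthy() and take_screenshot()[0]:
            return True, tried

    # CLI restart failed, socket missing, or service not running.
    sh(f"{INIT} restart")
    if sh(f"{INIT} status")[0] == 0 and healthy() and take_screenshot()[0]:
        return True, tried

    sh(f"tail -n 80 {LOG}")  # inspect recent logs before asking the user
    return False, tried
```

The returned command list is what a blocker report would cite alongside the screenshot error and the latest health or log signal.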
Action Choice
- Use `quick_action` first when the goal matches a catalog shortcut (back, home, app switch, search, copy/paste, browser operations, etc.). Pass the correct `platform` (`ios`/`android`/`mac`).
- If `quick_action` is reserved, returns `ok=false`, or the screen does not change as expected, do not retry the same binding. Try `alternative=true` once when it is listed, then fall back to direct input tools and continue.
- Use `touch_gesture` for taps, swipes, drags, and mobile-style navigation.
- Use `keyboard_text` for ASCII text entry after confirming the input field is focused. Never pass Chinese or emoji; use pinyin/English keywords and select on-screen candidates.
- Use `keyboard_tap` for keys such as enter, escape, tab, arrows, or shortcuts not covered by `quick_action`.
- Use `mouse_click`, `mouse_move`, and `mouse_scroll` only when touch gestures are not appropriate.
Prefer semantic shortcuts and gestures when available:
- Use `quick_action` for cataloged keyboard/touch bindings before improvising raw `keyboard_tap` calls or custom swipes.
- Use `touch_gesture` with `type: "back"` or `type: "home"` when `quick_action` is unavailable or ineffective.
- Use scroll or swipe instead of repeatedly tapping uncertain controls.
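The `quick_action` fallback rule allows at most two binding attempts before switching tools. In this Python sketch, `call_quick_action` and `screen_changed` are hypothetical callables. The result is assumed to be a dict with an `ok` key, mirroring `ok=false`. The real tool's result shape may differ, and reserved bindings are not modeled.

```python
from typing import Callable, Dict


def quick_action_or_fallback(
    call_quick_action: Callable[..., Dict],
    screen_changed: Callable[[], bool],
    action: str,
    platform: str,
    has_alternative: bool,
) -> str:
    """Try the binding once, then alternative=True once if listed, then fall back."""
    variants = [False, True] if has_alternative else [False]
    for alternative in variants:  # each binding is tried at most once
        result = call_quick_action(action=action, platform=platform, alternative=alternative)
        if result.get("ok") and screen_changed():
            return "done"
    # Neither binding worked: switch to touch_gesture, keyboard_tap, and so on.
    return "fallback"
```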
Coordinate Discipline
Before using coordinates:
- Inspect the screenshot.
- Identify the intended target visually.
- Use `coord_space: "normalized"` with 0-1000 coordinates when possible: (0,0) is top-left, (1000,1000) is bottom-right, and (500,500) is center. Do not use 0-1 coordinates.
- Avoid edges unless performing an edge gesture.
- Do not guess a coordinate if the target is not visible.
If a tap misses, do not repeat the exact same coordinate blindly.
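Mapping a pixel position read from a screenshot into the normalized 0-1000 space is a linear scale per axis. The helpers below are hypothetical: the names and the rounding choice are assumptions, not the tool API. They also assume that the screenshot's width and height in pixels are known.

```python
from typing import Tuple


def to_normalized(x_px: float, y_px: float, width: int, height: int) -> Tuple[int, int]:
    """Map a pixel position to 0-1000 normalized coordinates (0,0 = top-left)."""
    if not (0 <= x_px <= width and 0 <= y_px <= height):
        raise ValueError("target is outside the screenshot; do not guess")
    return round(x_px * 1000 / width), round(y_px * 1000 / height)


def to_pixels(x_n: float, y_n: float, width: int, height: int) -> Tuple[int, int]:
    """Inverse mapping, useful to check where a normalized tap will land."""
    return round(x_n * width / 1000), round(y_n * height / 1000)
```

For a 1080x2400 screenshot, the pixel center (540, 1200) maps to (500, 500). A 0-1 value such as (0.5, 0.5), however, lands about one pixel from the top-left corner, which is why 0-1 coordinates are ruled out.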
Failed Attempt Handling
Track failed UI actions during the current task.
A failed attempt is any action where:
- The expected screen change did not happen.
- The same control still appears unchanged.
- Text was not entered.
- Navigation did not move.
- The screen changed to an unexpected state.
- The action output or screenshot indicates an error.
After a failed attempt:
- Observe with `screenshot`.
- Compare the expected and observed results.
- Do not repeat the exact same action more than once.
- Change one variable at a time: target location, gesture type, coordinate space, navigation path, or input method.
- After 2 failed attempts on the same goal, choose a different strategy.
- After 3 failed attempts total, summarize what was tried and ask the user or switch to diagnosis.
Keep an internal attempt log:
Goal:
Attempt:
Expected:
Observed:
Next adjustment:
Only report the log when the task is blocked or the user asks.
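One way to keep the log and enforce the thresholds is to store one record per attempt. This Python sketch is hypothetical. The class and field names mirror the log template above. The limits (two failures per goal, three in total, at most one repeat of an identical action) come from the rules in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attempt:
    goal: str
    attempt: str
    expected: str
    observed: str
    next_adjustment: str = ""
    failed: bool = False


@dataclass
class AttemptLog:
    entries: List[Attempt] = field(default_factory=list)

    def record(self, entry: Attempt) -> str:
        """Store an attempt and return the advised next step."""
        self.entries.append(entry)
        total_failed = sum(e.failed for e in self.entries)
        goal_failed = sum(e.failed for e in self.entries if e.goal == entry.goal)
        if total_failed >= 3:
            return "summarize and ask the user, or switch to diagnosis"
        if goal_failed >= 2:
            return "choose a different strategy"
        if entry.failed:
            return "change one variable and retry"
        return "continue"

    def can_repeat(self, goal: str, attempt: str) -> bool:
        """An identical action may be repeated at most once."""
        tries = sum(1 for e in self.entries if e.goal == goal and e.attempt == attempt)
        return tries < 2
```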
Recovery Strategies
If a tap does not work:
- Observe the screen again before retrying.
- If the app may be slow, allow one extra observation before changing strategy.
- Retry at most once with a slightly adjusted target.
- If it still does not work, use another visible control, back navigation, or a different path.
If a swipe does not work:
- If the target is a partial scrollable region (picker, embedded list, modal, or in-page scroll area), place both start and end points well inside the visible bounds of that region. A gesture that starts on a fixed header, bottom navigation bar, or outside the scrollable area will be consumed by the outer container and the inner control will not move.
- For list scrolling and search results, use calibrated strength:
  - Start with `strength: "medium"` and take a screenshot immediately after.
  - Use `image_diff` or visual inspection to confirm that scrolling occurred and to estimate how many rows moved.
  - If far from the target, use `large`. If close, use `small` or `tiny`.
  - If you overshoot, reverse direction and drop one strength level.
  - If the screen did not move at all (same content, no diff), the gesture likely missed the scrollable area; move the start point inward rather than only changing the distance.
- Do not repeat the same `distance` or `strength` if it failed once. Change one variable per attempt.
- Change the start point away from screen edges, fixed headers, or bottom navigation bars.
- If the content appears to be at an edge, try the opposite direction once.
- If the same list boundary appears again, stop searching that direction.
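The calibration rule amounts to stepping along an ordered list of strengths. This Python sketch assumes that the levels the skill names (`tiny`, `small`, `medium`, `large`) run from weakest to strongest. `next_scroll` is a hypothetical helper. Its `moved`, `overshot`, and `far_from_target` inputs are judgments the agent makes from the screenshot or `image_diff`. The first swipe is still `medium`.

```python
from typing import Tuple

LEVELS = ["tiny", "small", "medium", "large"]  # assumed order, weakest first
OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}


def next_scroll(
    current: str, direction: str, moved: bool, overshot: bool, far_from_target: bool
) -> Tuple[str, str, bool]:
    """Return (strength, direction, move_start_inward) for the next swipe."""
    i = LEVELS.index(current)
    if not moved:
        # Same content, no diff: the swipe missed the scrollable area, so change
        # only the start point and keep the strength (one variable per attempt).
        return current, direction, True
    if overshot:
        return LEVELS[max(i - 1, 0)], OPPOSITE[direction], False
    if far_from_target:
        return "large", direction, False
    # Close to the target: step down to small, or tiny if already small or weaker.
    return ("small" if i > LEVELS.index("small") else "tiny"), direction, False
```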
If the current screen is unrelated to the task:
- Use `touch_gesture` with `type: "back"` to return when possible.
- If back does not change the screen, look for a visible back, close, cancel, or X control.
- After recovery, observe again before continuing the original task.
Search, Lists, and Choices
Prefer search or filtering controls before long manual browsing when looking for an app, contact, setting, file, item, or page content.
If the target is not found:
- Try one alternate search term when reasonable.
- Check each relevant tab, list, or section once before repeating any of them.
- Do not repeatedly search or scroll the same unchanged list.
- If multiple plausible matches appear, ask the user to choose instead of guessing.
When selecting from a list, verify the selected item matches the user's requested name, label, or visible details before acting on it.
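The selection rule can be sketched as a filter over labels read from the screenshot. The helper names are hypothetical. Treating case and whitespace as insignificant is an assumption about what counts as a match. Anything other than a single exact match returns candidates for the user rather than a choice.

```python
from typing import List, Optional, Tuple


def _norm(label: str) -> str:
    # ASSUMPTION: case and repeated whitespace do not change the meaning of a label.
    return " ".join(label.lower().split())


def pick_match(requested: str, visible: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return (item_to_select, candidates_to_ask_about)."""
    want = _norm(requested)
    exact = [label for label in visible if _norm(label) == want]
    if len(exact) == 1:
        return exact[0], []  # verified match: safe to act on
    if len(exact) > 1:
        return None, exact  # duplicates: ask the user to choose
    # No exact match: partial hits are only plausible, so ask instead of guessing.
    # An empty list means not found (try one alternate search term).
    return None, [label for label in visible if want in _norm(label)]
```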
Sensitive Actions
Stop and ask the user before actions involving:
- payment, purchase, order placement, or subscription
- deleting data or changing account settings
- login, verification code, captcha, or identity verification
- privacy permissions, contacts, photos, microphone, camera, or location
- sending messages, emails, posts, or comments on behalf of the user
- starting calls, video calls, or other real-world communication
Do not confirm sensitive dialogs unless the user explicitly asked for that exact action.
Text Entry
Before typing:
- Confirm the text field is focused.
- `keyboard_text` is US-keyboard ASCII only (letters, digits, common punctuation). Do not pass Chinese, emoji, or other non-ASCII characters; the tool errors without typing anything.
- For Chinese content, use pinyin or English keywords in `keyboard_text` (e.g. `weixin`, `zhangsan`), then tap the matching on-screen candidate or search result.
- Prefer one `keyboard_text` call for normal ASCII text.
- Use `keyboard_tap` for submit or enter only after verifying the text appears.
- If text does not appear, stop and re-check focus before typing again.
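Because `keyboard_text` rejects non-ASCII input, a pre-check avoids a wasted call. This Python sketch is hypothetical. The helper names are illustrative. Treating printable ASCII (space through `~`) as the accepted set is an assumption based on the "letters, digits, common punctuation" description.

```python
from typing import List


def keyboard_text_safe(text: str) -> bool:
    """True if every character is printable US-ASCII (space through '~').

    Newlines and tabs are excluded on purpose: send them with keyboard_tap.
    """
    return all(" " <= ch <= "~" for ch in text)


def rejected_chars(text: str) -> List[str]:
    """Characters that would make the call fail under this assumption."""
    return [ch for ch in text if not (" " <= ch <= "~")]
```

For Chinese content, a failed check is the cue to switch to pinyin. For example, `keyboard_text_safe("weixin")` is true, while `keyboard_text_safe("微信")` is false.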
Navigation
For navigation tasks:
- First identify the current screen.
- Prefer visible buttons and semantic gestures.
- After back, home, or navigation, verify the destination.
- If navigation loops or returns to the same screen twice, stop and reassess.
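Loop detection only needs a count of verified destinations. In this Python sketch, `screen_id` is a hypothetical label the agent derives from each screenshot (for example, the visible page title). Stopping on the second arrival at the same screen is one reading of the rule above.

```python
from collections import Counter


class NavGuard:
    """Flags a navigation loop when a verified destination is seen again."""

    def __init__(self, limit: int = 2) -> None:
        self.limit = limit  # ASSUMPTION: the second arrival at a screen means a loop
        self.visits: Counter = Counter()

    def arrive(self, screen_id: str) -> bool:
        """Record a verified destination; True means stop and reassess."""
        self.visits[screen_id] += 1
        return self.visits[screen_id] >= self.limit
```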
App Switching
When the task requires switching apps or opening recents:
Step 0 — prefer quick_action when platform is known.
- iOS/Android/macOS: try `quick_action` with `action=app_switch` and the correct `platform` first.
- Android search: try the `quick_action` `spotlight_search` binding (Meta) before manual UI search.
- If `quick_action` fails or the screen does not change, continue with the probe flow below without asking the user.
Step 1 — recall cached method first.
Call recall_memory with tags ["app-switch", "device"]. If a record exists for the current device, use it directly and skip probing.
Step 2 — if no cache, identify OS from the screenshot:
- iOS/iPadOS: home bar at bottom, no nav buttons
- Android gesture nav: thin gesture bar, no buttons
- Android 3-button nav: visible Back / Home / Recents buttons at bottom
- Unknown: treat as gesture nav and probe
Step 3 — probe for the task switcher. Try in order, stop at first success:
- Bottom-edge swipe and hold: `type: "swipe"`, start y≈990, end y≈550, `hold_after_ms: 500`. Take a screenshot; if the task switcher appeared, this is the method.
- If 3-button nav is visible, tap the Recents button (bottom-right square icon).
- If still on the same app screen, go home first (`type: "home"`), then retry the bottom-edge swipe from the home screen.
After each probe take a screenshot to confirm whether the switcher appeared before trying the next method.
Step 4 — once the switcher is open:
- If the target app card is visible, tap its center.
- If not visible, swipe left/right in the switcher to find it.
- If still not found, dismiss the switcher (`type: "home"`) and find the app icon on the home screen or via system search (Spotlight swipe-down on iOS, app drawer swipe-up on Android).
Step 5 — save on success.
After successfully opening the switcher, call save_memory with: device name or model (from screenshot or prior context), OS, the method that worked, and the exact gesture parameters used. Tags: ["app-switch", "device"].
If switching fails after all probes:
- Do not loop. Report the blocker to the user and suggest they open the target app manually.
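Steps 1, 3, and 5 can be sketched together: recall a cached method, otherwise probe in order, and save the first method that works. Everything here is illustrative. The gesture dict fields and `x = 500` are assumptions, since the skill gives only the y coordinates. A plain list stands in for `recall_memory`/`save_memory`. `try_probe` is a hypothetical callable that runs a gesture sequence and reports whether a screenshot shows the switcher. Falling back to probing when a cached method stops working is also an assumption.

```python
from typing import Callable, Dict, List, Optional

TAGS = ["app-switch", "device"]

# Hypothetical touch_gesture payloads; field names and x=500 are assumptions.
BOTTOM_SWIPE_HOLD = {"type": "swipe", "coord_space": "normalized",
                     "start": (500, 990), "end": (500, 550), "hold_after_ms": 500}
RECENTS_TAP = {"type": "tap", "target": "Recents button (bottom-right square)"}
HOME = {"type": "home"}

Probe = List[Dict]


def probe_plan(os_kind: str) -> List[Probe]:
    """Probes in the skill's order; an unknown OS is treated as gesture nav."""
    plan: List[Probe] = [[BOTTOM_SWIPE_HOLD]]
    if os_kind == "android-3-button":
        plan.append([RECENTS_TAP])
    plan.append([HOME, BOTTOM_SWIPE_HOLD])  # go home, then retry the swipe
    return plan


def open_switcher(device: str, os_kind: str, memory: List[Dict],
                  try_probe: Callable[[Probe], bool]) -> Optional[Probe]:
    """Return the gesture sequence that opened the switcher, or None (blocker)."""
    # Step 1: stand-in for recall_memory with the app-switch tags.
    cached = next((r for r in memory if r["tags"] == TAGS and r["device"] == device), None)
    if cached is not None:
        if try_probe(cached["method"]):
            return cached["method"]
        memory.remove(cached)  # ASSUMPTION: drop a stale record and re-probe
    # Step 3: stop at the first probe whose screenshot shows the switcher.
    for probe in probe_plan(os_kind):
        if try_probe(probe):
            # Step 5: stand-in for save_memory with device, OS, and exact gesture.
            memory.append({"tags": TAGS, "device": device, "os": os_kind, "method": probe})
            return probe
    return None  # do not loop; report the blocker
```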
Completion
A device operation is complete only when the screenshot confirms the requested outcome.
Before saying the task is complete:
- Observe the screen one last time.
- Check that the requested target, selection, text, or destination is correct.
- Check for wrong selections, missing selections, duplicate selections, and unfinished dialogs.
- If a failed action was skipped, mention it in the final answer.
If the outcome cannot be verified visually, say what was done and what remains uncertain.