name: phoneuse
description: |
  Control an Android phone remotely via OpenClaw PhoneUse node.
  Use when:
  - User asks to interact with their phone (open app, tap, type, swipe, screenshot)
  - User wants to automate a phone task (order food, fill forms, send messages)
  - User asks to see what's on their phone screen
  - User needs files from their phone or wants to transfer files
  - User wants to unlock/lock their phone
  NOT for: tasks that don't involve an Android device, or when no PhoneUse node is connected.
PhoneUse: Android Remote Control
Control a connected Android phone via the nodes tool with action: invoke.
Context Cost Management
Screenshots and recordings are expensive in context. Follow these rules:
Rule 1: Use a sub-agent for PhoneUse tasks
Spawn an isolated sub-agent with sessions_spawn(mode: "run") for phone operations. This keeps screenshots/UI trees out of the main conversation context.
Main agent receives: "Open Bilibili and show me the trending videos"
→ sessions_spawn({task: "Use phoneUse on node <nodeId> to open Bilibili and report the top 3 trending videos", mode: "run"})
→ Sub-agent executes all phoneUse commands internally (screenshots stay in sub-agent context)
→ Sub-agent returns text result: "Opened Bilibili. Top 3 trending: 1. xxx 2. xxx 3. xxx"
→ Sub-agent exits immediately after completing the task
The sub-agent MUST exit after completing its task — do not keep it running.
Rule 2: Minimize screenshots
- Prefer `getUITree` (~5KB text) over `screenshot` (~100KB image) when you need to find elements
- Only screenshot to verify visual state or when UI tree alone is insufficient
- Use lower quality when possible: `{maxWidth: 480, quality: 30}` → ~20KB
- Never reference raw base64 in your responses; describe what you see in words
Rule 3: Use nodes tool actions for media (not invoke)
For screenshots and recordings, use nodes tool actions (camera_snap, screen_record) instead of invoke. The Gateway automatically saves media to server-side files, so only a file path enters your context — not the raw image/video data.
| Method | Context cost | Use when |
|---|---|---|
| nodes camera_snap (action) | ✅ File path only | Camera photo (front/back) |
| nodes screen_record (action) | ✅ File path only | Default for recordings |
| nodes invoke phoneUse.screenshot | ⚠️ base64 in payload | Custom params needed |
Before You Start
- Check node is connected: run `nodes describe` and look for a node with `phoneUse` in caps
- Note the `nodeId`; you'll need it for every invoke call
- Ensure Gateway allows phoneUse commands. Run once:
openclaw config set gateway.nodes.allowCommands '["phoneUse.tap","phoneUse.doubleTap","phoneUse.longTap","phoneUse.swipe","phoneUse.pinch","phoneUse.setText","phoneUse.typeText","phoneUse.findAndClick","phoneUse.screenshot","phoneUse.getUITree","phoneUse.getScreenInfo","phoneUse.launch","phoneUse.back","phoneUse.home","phoneUse.recents","phoneUse.openNotifications","phoneUse.openQuickSettings","phoneUse.scrollDown","phoneUse.scrollUp","phoneUse.scrollLeft","phoneUse.scrollRight","phoneUse.waitForElement","phoneUse.inputKey","phoneUse.requestScreenCapture","phoneUse.unlock","phoneUse.lockScreen","phoneUse.wakeScreen","phoneUse.isScreenOn","phoneUse.listApps","phoneUse.getForegroundApp","phoneUse.openUrl","phoneUse.startActivity","phoneUse.clipboard","phoneUse.getDeviceStatus","phoneUse.openAllApps","phoneUse.queryIntents","camera.snap","camera.clip","screen.record","system.run","system.notify","file.read","file.write","file.info","file.list","file.delete"]'
This only needs to be done once per Gateway.
Core Workflow: Sense → Think → Act → Verify
Every phone interaction follows this loop:
1. getUITree → understand current screen (package, elements, coordinates)
2. Think → decide action based on UI tree
3. Act → tap/swipe/type based on UI tree data
4. getUITree → verify the action worked (check package changed, element appeared)
UI Tree first, always. getUITree is cheap (~5KB text), tells you the current app (package), all elements, and their coordinates. Only use screenshot when you need to see visual content that isn't in the UI tree (images, videos, layout verification).
Never screenshot as the first step. Screenshot is expensive (~100KB), may be black on lock screen, and UI tree gives you more actionable data (exact coordinates, element IDs, text content).
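One cycle of the loop above can be sketched as a small guard around a single action. This is a minimal illustration, not the skill's own implementation: `invoke` is a hypothetical callable standing in for a nodes-tool invoke call that returns the parsed result, and the only response field it relies on is `package`, which the text above says getUITree reports.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: invoke(command, params) -> parsed JSON result.
Invoke = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def act_in_package(invoke: Invoke, expected_package: str,
                   command: str, params: Dict[str, Any]) -> bool:
    """Run one Sense -> Act -> Verify cycle; True if we end in the expected app."""
    # Sense: confirm we are in the expected app before touching anything.
    before = invoke("phoneUse.getUITree", {})
    if before.get("package") != expected_package:
        return False
    # Act: choosing command/params (the "think" step) is the caller's job.
    invoke(command, params)
    # Verify: re-read the tree instead of assuming the action worked.
    after = invoke("phoneUse.getUITree", {})
    return after.get("package") == expected_package


# Demo against a stand-in that always reports Bilibili in the foreground.
calls = []


def fake_invoke(command: str, params: Dict[str, Any]) -> Dict[str, Any]:
    calls.append(command)
    return {"package": "tv.danmaku.bili"}


ok = act_in_package(fake_invoke, "tv.danmaku.bili",
                    "phoneUse.findAndClick", {"text": "搜索"})
wrong_app = act_in_package(fake_invoke, "com.tencent.mm",
                           "phoneUse.findAndClick", {"text": "搜索"})
```

When the sense step reports a different package, the helper refuses to act, which matches the rule of confirming the app before clicking.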
Invoking Commands
All commands use the nodes tool:
{"action": "invoke", "node": "<nodeId>", "invokeCommand": "<command>", "invokeParamsJson": "<json>"}
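Because `invokeParamsJson` is a JSON string nested inside a JSON object, hand-escaping the quotes is easy to get wrong. A small builder along these lines may help; the function name `build_invoke` is illustrative and not part of the nodes tool.

```python
import json
from typing import Any, Dict, Optional


def build_invoke(node_id: str, command: str,
                 params: Optional[Dict[str, Any]] = None) -> Dict[str, str]:
    """Build the argument object for one nodes-tool call with action "invoke"."""
    return {
        "action": "invoke",
        "node": node_id,
        "invokeCommand": command,
        # invokeParamsJson carries a JSON-encoded string, not a nested object.
        "invokeParamsJson": json.dumps(params or {}, ensure_ascii=False),
    }


tap_call = build_invoke("<nodeId>", "phoneUse.tap", {"x": 540, "y": 1200})
home_call = build_invoke("<nodeId>", "phoneUse.home")
print(json.dumps(tap_call, indent=2))
```

Passing `ensure_ascii=False` keeps non-ASCII text such as "搜索" readable in the payload instead of turning it into `\u` escapes.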
Essential Commands
See the screen
Use the nodes tool's camera_snap action (not invoke):
{"action": "camera_snap", "node": "<nodeId>"}
Gateway automatically saves the image to a server-side file and returns a local path. The raw image data does NOT enter your context — you get a MEDIA: path reference.
For custom resolution/quality, use invoke:
{"action": "invoke", "node": "<nodeId>", "invokeCommand": "phoneUse.screenshot", "invokeParamsJson": "{\"maxWidth\": 480, \"quality\": 30}"}
Get interactive elements
{"invokeCommand": "phoneUse.getUITree", "invokeParamsJson": "{\"interactiveOnly\": true}"}
Returns JSON with all clickable/editable elements, their text, bounds, and center coordinates. Use this to find tap targets precisely instead of guessing coordinates.
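As a rough illustration of turning that result into a tap target, the sketch below looks up an element by its text and returns its center. The field names `elements`, `text`, and `center` are assumptions made for this example; check them against a real getUITree response before relying on them.

```python
from typing import Any, Dict, Optional, Tuple


def find_center(tree: Dict[str, Any], text: str) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the first element whose text contains `text`, else None."""
    # ASSUMPTION: elements live under "elements", each with "text" and "center".
    for element in tree.get("elements", []):
        if text in (element.get("text") or ""):
            center = element["center"]
            return center["x"], center["y"]
    return None


# Hand-written sample in the assumed shape, not captured from a device.
sample_tree = {
    "package": "com.android.settings",
    "elements": [
        {"text": "Wi-Fi", "center": {"x": 540, "y": 420}},
        {"text": "Bluetooth", "center": {"x": 540, "y": 600}},
    ],
}

bluetooth_xy = find_center(sample_tree, "Bluetooth")
missing_xy = find_center(sample_tree, "NFC")
```

The returned pair can be passed straight to phoneUse.tap as `x` and `y`; a `None` result means the element is not on screen and the caller should scroll or re-check the app.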
Tap
{"invokeCommand": "phoneUse.tap", "invokeParamsJson": "{\"x\": 540, \"y\": 1200}"}
Type text
{"invokeCommand": "phoneUse.setText", "invokeParamsJson": "{\"text\": \"Hello world\"}"}
Sets text in the currently focused input field. Tap the field first if needed.
Find and click by text
{"invokeCommand": "phoneUse.findAndClick", "invokeParamsJson": "{\"text\": \"Submit\"}"}
Finds a visible element containing the text and clicks it. Preferred over coordinate tapping when text is known.
Swipe / scroll
{"invokeCommand": "phoneUse.swipe", "invokeParamsJson": "{\"x1\":540,\"y1\":1500,\"x2\":540,\"y2\":500,\"duration\":300}"}
Or use shortcuts: phoneUse.scrollDown, phoneUse.scrollUp.
Launch app
{"invokeCommand": "phoneUse.launch", "invokeParamsJson": "{\"app\": \"Settings\"}"}
Launches by display name or package name. In practice, pass the package name, as required by the Navigation Rules.
Navigate
- `phoneUse.back`: back button
- `phoneUse.home`: home button
Screen Management & Unlock
Step-by-step unlock flow (follow this exactly):
1. getUITree {}
→ Check the "package" field in response:
- "com.android.systemui" = lock screen
- anything else = already unlocked, skip to your task
2. If locked and you have a PIN:
phoneUse.unlock {pin: "1234"}
→ This command handles everything: wake screen, swipe up, enter PIN
3. Verify unlock:
getUITree {}
→ package should NOT be "com.android.systemui"
→ If still locked, try unlock one more time
4. If unlock fails twice: report failure, do NOT keep retrying
Important:
- Never screenshot on lock screen — Android security blocks it, you'll get a black image
- Use getUITree to check lock state: look at the `package` field
- `phoneUse.unlock` already calls `wakeScreen` internally; don't call it separately
- Omit `pin` for swipe/no-lock screens
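The unlock flow, including the two-attempt limit, can be sketched as follows. `invoke` is again a hypothetical callable that sends one command and returns the parsed result, and `FakePhone` is a stand-in written only to exercise the logic.

```python
from typing import Any, Dict, Optional

LOCK_SCREEN_PACKAGE = "com.android.systemui"


def ensure_unlocked(invoke, pin: Optional[str] = None, max_attempts: int = 2) -> bool:
    """Return True once the phone is unlocked, False after max_attempts failures."""
    def locked() -> bool:
        return invoke("phoneUse.getUITree", {}).get("package") == LOCK_SCREEN_PACKAGE

    if not locked():
        return True  # already unlocked, go straight to the task
    params: Dict[str, Any] = {"pin": pin} if pin else {}  # omit pin for swipe screens
    for _ in range(max_attempts):
        invoke("phoneUse.unlock", params)  # wakes the screen internally
        if not locked():
            return True
    return False  # report failure to the user, do not keep retrying


class FakePhone:
    """Stand-in device that unlocks only after `unlocks_after` unlock calls."""

    def __init__(self, unlocks_after: int):
        self.unlocks_after = unlocks_after
        self.unlock_calls = 0

    def __call__(self, command: str, params: Dict[str, Any]) -> Dict[str, Any]:
        if command == "phoneUse.unlock":
            self.unlock_calls += 1
            return {}
        is_locked = self.unlock_calls < self.unlocks_after
        return {"package": LOCK_SCREEN_PACKAGE if is_locked else "com.miui.home"}


flaky = FakePhone(unlocks_after=2)
stuck = FakePhone(unlocks_after=99)
flaky_result = ensure_unlocked(flaky, pin="1234")
stuck_result = ensure_unlocked(stuck, pin="1234")
```

The helper never calls wakeScreen itself and gives up after the second failed attempt, mirroring steps 3 and 4 of the flow.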
File Operations
Read, write, and browse files on the phone. Paths are typically under /sdcard/.
{"invokeCommand": "file.list", "invokeParamsJson": "{\"path\": \"/sdcard/DCIM\"}"}
{"invokeCommand": "file.read", "invokeParamsJson": "{\"path\": \"/sdcard/DCIM/photo.jpg\", \"offset\": 0, \"size\": 2097152}"}
{"invokeCommand": "file.write", "invokeParamsJson": "{\"path\": \"/sdcard/Download/note.txt\", \"base64\": \"SGVsbG8=\"}"}
For large files, use chunked transfer. See references/file-transfer.md.
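As a sketch of what chunked reads might look like, the helper below plans `file.read` parameters in 2097152-byte (2 MiB) pieces, the size used in the example above, and builds `file.write` parameters from raw bytes. It assumes the total size is already known (for instance from file.info); the authoritative procedure is the one in references/file-transfer.md.

```python
import base64
from typing import Any, Dict, List

CHUNK_SIZE = 2 * 1024 * 1024  # 2097152 bytes, as in the file.read example


def plan_reads(path: str, total_size: int,
               chunk_size: int = CHUNK_SIZE) -> List[Dict[str, Any]]:
    """Return one file.read params dict per chunk, covering the whole file."""
    return [
        {"path": path, "offset": offset, "size": min(chunk_size, total_size - offset)}
        for offset in range(0, total_size, chunk_size)
    ]


def write_params(path: str, data: bytes) -> Dict[str, str]:
    """Return file.write params with the payload base64-encoded."""
    return {"path": path, "base64": base64.b64encode(data).decode("ascii")}


reads = plan_reads("/sdcard/DCIM/photo.jpg", 5_000_000)
note = write_params("/sdcard/Download/note.txt", b"Hello")
```

For a 5,000,000-byte file this plans three reads, the last one shorter than the others, and `b"Hello"` encodes to the same `SGVsbG8=` string shown in the file.write example.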
Navigation Rules (MUST follow)
1. Always use package name for launch, never display names like "哔哩哔哩"
   - ✅ `phoneUse.launch {package: "tv.danmaku.bili"}`
   - ❌ `phoneUse.launch {app: "哔哩哔哩"}`
   - If package unknown → `phoneUse.listApps {}` first
2. Don't press Home as recovery: use `back` to stay in context
   - Already in Bilibili? Stay there, navigate from within
   - Pressing Home loses all app state
3. Verify after every critical action: `getUITree` to confirm
   - Check the `package` field to confirm which app you're in
   - Don't assume an action worked
4. Use deep links when available: `openUrl("bilibili://search?keyword=xxx")` beats manual UI navigation
5. Check context before acting: `getUITree` first to see where you are
   - Text "搜索" (Search) exists on the home screen AND in apps; wrong one = wrong app
   - Always confirm package before clicking
Common Patterns
Pattern: Open app and navigate
1. getUITree {} — where am I?
2. phoneUse.launch {package: "tv.danmaku.bili"} — always use package name
3. waitForElement {text: "首页", timeout: 5000} — wait for app to load
4. getUITree {interactiveOnly: true} — find navigation targets
5. findAndClick {text: "搜索"} — navigate
6. getUITree {} — verify (check package!)
Pattern: Fill a form
1. getUITree {interactiveOnly: true} — find all input fields + coordinates
2. phoneUse.tap {x, y} — tap first field
3. phoneUse.setText {text: "value"} — enter text
4. phoneUse.tap {x, y} — next field
5. phoneUse.setText {text: "value"}
6. findAndClick {text: "Submit"}
7. getUITree {} — verify submission result
Pattern: Scroll to find content
1. getUITree {} — check visible elements
2. If target not found: scrollDown {}
3. getUITree {} — check again
4. Repeat until found or 5 scrolls max
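The scroll pattern can be sketched with a hard cap on scrolls. To avoid assuming a particular UI tree schema, this example searches the serialized tree for the target text, which is crude (it can also match field names) but enough to show the control flow. `invoke` and `FakeList` are hypothetical stand-ins.

```python
import json
from typing import Any, Dict, List

MAX_SCROLLS = 5


def scroll_to_text(invoke, target: str, max_scrolls: int = MAX_SCROLLS) -> bool:
    """Check the screen, scrolling down at most max_scrolls times to find target."""
    for scrolls_done in range(max_scrolls + 1):
        tree = invoke("phoneUse.getUITree", {})
        if target in json.dumps(tree, ensure_ascii=False):
            return True
        if scrolls_done < max_scrolls:
            invoke("phoneUse.scrollDown", {})
    return False  # give up and report instead of scrolling forever


class FakeList:
    """Stand-in screen: each scrollDown moves to the next page of texts."""

    def __init__(self, pages: List[List[str]]):
        self.pages, self.index, self.scrolls = pages, 0, 0

    def __call__(self, command: str, params: Dict[str, Any]) -> Dict[str, Any]:
        if command == "phoneUse.scrollDown":
            self.scrolls += 1
            self.index = min(self.index + 1, len(self.pages) - 1)
            return {}
        return {"elements": [{"text": t} for t in self.pages[self.index]]}


feed = FakeList([["A", "B"], ["C", "D"], ["E", "Target"]])
found = scroll_to_text(feed, "Target")
short_list = FakeList([["A"]])
not_found = scroll_to_text(short_list, "Z")
```

The tree is re-read after every scroll because coordinates and visible elements change once the list moves.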
Pattern: Unlock → Open app → Do task
1. getUITree {} — check if locked (package = com.android.systemui?)
2. phoneUse.unlock {pin: "1234"} — only if locked
3. getUITree {} — verify unlocked
4. phoneUse.launch {package: "..."} — open target app
5. getUITree {} — verify app loaded
6. ... continue with task
App & System Info
List installed apps
{"invokeCommand": "phoneUse.listApps", "invokeParamsJson": "{}"}
Returns app names and package names. Use package name with phoneUse.launch.
Get current foreground app
{"invokeCommand": "phoneUse.getForegroundApp", "invokeParamsJson": "{}"}
Device status (battery, WiFi, storage)
{"invokeCommand": "phoneUse.getDeviceStatus", "invokeParamsJson": "{}"}
Clipboard
{"invokeCommand": "phoneUse.clipboard", "invokeParamsJson": "{}"}
{"invokeCommand": "phoneUse.clipboard", "invokeParamsJson": "{\"set\": \"copied text\"}"}
Open URL or deep link
{"invokeCommand": "phoneUse.openUrl", "invokeParamsJson": "{\"url\": \"https://bilibili.com\"}"}
Open system settings or send Intent
{"invokeCommand": "phoneUse.startActivity", "invokeParamsJson": "{\"action\": \"android.settings.WIFI_SETTINGS\"}"}
For Intent patterns and deep links, see references/intents.md.
Discover app capabilities
{"invokeCommand": "phoneUse.queryIntents", "invokeParamsJson": "{\"package\": \"tv.danmaku.bili\"}"}
Tips
- getUITree is your eyes — use it before and after every action. It's cheap (~5KB).
- Screenshot only for visual content — photos, videos, layouts. Never as the default "see" step.
- Never screenshot on lock screen — it returns black. Use getUITree to check lock state.
- Prefer `findAndClick` over coordinate tapping when button text is known.
- Wait for loading: use `phoneUse.waitForElement` after launching apps.
- `getForegroundApp` is the fastest way to check which app is active (1 field vs full UI tree).
- Don't retry endlessly: if an action fails twice, report failure and ask the user.
- `system.run` cannot launch Activities (no shell permissions). Use `phoneUse.launch` or `phoneUse.startActivity`.
Best Practice Cases
Case 1: Search for a video on Bilibili
Task: "Search Bilibili for 凡人修仙传"
Step 1: getUITree {}
→ package: com.android.systemui → locked!
Step 2: phoneUse.unlock {pin: "654312"}
→ "Unlocked with PIN"
Step 3: getUITree {}
→ package: com.miui.home → unlocked, on home screen ✅
Step 4: phoneUse.openUrl {url: "bilibili://search?keyword=凡人修仙传"}
→ Deep link directly to search results (fastest path!)
Step 5: getUITree {}
→ package: tv.danmaku.bili, search results visible ✅
→ Done!
Why this is optimal:
- Deep link skips: launch app → wait → find search → tap → type → submit
- 5 steps instead of 10+
- If `openUrl` is not available, fallback plan:
Step 4-alt: phoneUse.launch {package: "tv.danmaku.bili"}
Step 5-alt: waitForElement {text: "搜索", timeout: 5000}
Step 6-alt: getUITree {interactiveOnly: true}
→ Find search entry field, note its coordinates
→ VERIFY package is still tv.danmaku.bili before clicking!
Step 7-alt: phoneUse.tap {x, y} on the search entry
Step 8-alt: phoneUse.setText {text: "凡人修仙传"}
Step 9-alt: phoneUse.inputKey {key: "enter"}
Step 10-alt: getUITree {} → verify results
Case 2: Send a WeChat message
Task: "Send 张三 a WeChat message: 明天下午3点开会 (meeting at 3 pm tomorrow)"
Step 1: getUITree {}
→ Determine current state
Step 2: phoneUse.launch {package: "com.tencent.mm"}
Step 3: waitForElement {text: "微信", timeout: 5000}
Step 4: getUITree {interactiveOnly: true}
→ Find search icon/entry at top
Step 5: phoneUse.tap on search icon
Step 6: phoneUse.setText {text: "张三"}
Step 7: waitForElement {text: "张三", timeout: 3000}
Step 8: getUITree {interactiveOnly: true}
→ Find the contact entry, tap it
Step 9: getUITree {interactiveOnly: true}
→ Find message input field at bottom
Step 10: phoneUse.tap on input field
Step 11: phoneUse.setText {text: "明天下午3点开会"}
Step 12: findAndClick {text: "发送"}
Step 13: getUITree {}
→ Verify message appears in chat ✅
Case 3: Take and share a photo
Task: "Take a photo and send it to me"
Step 1: camera.snap {facing: "back"}
→ Returns photo saved to server-side file
Step 2: Done! Photo is already on the server, send to user directly.
→ No need to open camera app on phone
Note: camera.snap uses the physical camera (Camera2 API), NOT a screenshot. The photo is transferred to the server automatically. For selfies use facing: "front".
Case 4: Check and report phone status
Task: "Check the phone's battery and storage"
Step 1: phoneUse.getDeviceStatus {}
→ Returns: battery level, WiFi status, storage info, screen state
Step 2: Report to user.
1 command, no UI interaction needed.
Case 5: Open an app that may not be installed
Task: "Open Xiaohongshu (小红书) for me"
Step 1: phoneUse.listApps {}
→ Search the results for "小红书" or "com.xingin.xhs"
→ If found: note package name
Step 2: phoneUse.launch {package: "com.xingin.xhs"}
Step 3: getUITree {}
→ Verify app loaded
If NOT found in listApps:
→ Tell user: "Xiaohongshu is not installed; it needs to be downloaded from the app store first"
→ phoneUse.openUrl {url: "market://details?id=com.xingin.xhs"} to open store
Case 6: Navigate system settings
Task: "Open the WiFi settings for me"
Step 1: phoneUse.startActivity {action: "android.settings.WIFI_SETTINGS"}
→ Directly opens WiFi settings page
Step 2: getUITree {}
→ Verify we're in settings ✅
Don't manually navigate Settings → WiFi. Use Intent actions for direct access.
Case 7: Copy text from screen
Task: "Copy the verification code on the screen for me"
Step 1: getUITree {}
→ Find the text element containing the verification code
Step 2: If code is visible in UI tree text:
→ Extract it directly
Step 3: If code is in an image (not in UI tree):
→ phoneUse.screenshot {}
→ Analyze image to read the code
Step 4: phoneUse.clipboard {set: "123456"}
→ Put the extracted code (here "123456" as an example) on the clipboard, done!
Case 8: Multi-step form with scrolling
Task: "Fill in the registration form for me"
Step 1: getUITree {interactiveOnly: true}
→ Map all visible input fields
Step 2: For each field:
→ tap field → setText → verify
Step 3: If more fields below:
→ scrollDown {}
→ getUITree {interactiveOnly: true}
→ Continue filling
Step 4: findAndClick {text: "提交"} or {text: "注册"}
Step 5: getUITree {}
→ Verify success page / error message
Key rules for forms:
- Fill fields top-to-bottom
- After scrolling, re-read UI tree (coordinates change!)
- Check for error messages after submit
Error Recovery
When things go wrong, follow this decision tree:
Action failed?
├── Check getUITree → Did a dialog/popup appear?
│ └── Yes → Dismiss it (back, or tap close button)
├── Wrong app? (package mismatch)
│ └── phoneUse.back {} (1-3 times to return)
├── Screen off?
│ └── phoneUse.wakeScreen {} → getUITree {}
├── Locked?
│ └── phoneUse.unlock {pin}
├── App crashed?
│ └── phoneUse.launch {package: "..."} (relaunch)
└── Failed twice?
└── STOP. Report to user. Do not loop.
Never retry more than 2 times. Infinite retry loops waste tokens and time.
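The decision tree can be read as a function from what was observed to the next command. The sketch below is one possible encoding, with hypothetical boolean flags that the caller derives from getUITree or getDeviceStatus. The retry limit is checked first, and the screen-off and lock checks come before the wrong-app check because a lock screen also shows up as a package mismatch.

```python
from typing import Any, Dict, Optional, Tuple

MAX_FAILURES = 2
Step = Tuple[str, Dict[str, Any]]


def next_recovery_step(failures: int, *, dialog_visible: bool = False,
                       screen_off: bool = False, locked: bool = False,
                       wrong_app: bool = False, app_crashed: bool = False,
                       package: str = "", pin: Optional[str] = None) -> Optional[Step]:
    """Return the next (command, params) to try, or None to stop and report."""
    if failures >= MAX_FAILURES:
        return None  # STOP: report to the user, do not loop
    if dialog_visible:
        return ("phoneUse.back", {})  # dismiss the popup
    if screen_off:
        return ("phoneUse.wakeScreen", {})
    if locked:
        return ("phoneUse.unlock", {"pin": pin} if pin else {})
    if wrong_app:
        return ("phoneUse.back", {})  # may need repeating, 1-3 times
    if app_crashed:
        return ("phoneUse.launch", {"package": package})
    return ("phoneUse.getUITree", {})  # nothing obvious: sense again


stop = next_recovery_step(2, locked=True)
unlock = next_recovery_step(1, locked=True, wrong_app=True, pin="1234")
relaunch = next_recovery_step(0, app_crashed=True, package="tv.danmaku.bili")
```

Returning `None` rather than raising keeps the caller's loop simple: it stops as soon as no further step is offered.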