name: phoneagent
description: Control a connected iPhone, iOS simulator, Android emulator, or Android device from macOS through PhoneAgent's JSON-RPC bridge. Use when users ask to automate mobile UI actions, inspect accessibility trees, toggle Settings switches, navigate apps, or capture screenshots by sending RPC methods like get_tree, get_screen_image, get_context, tap_element, enter_text, scroll, swipe, and open_app.
PhoneAgent
Use this workflow to drive iOS or Android UI through PhoneAgent's JSON-RPC bridge.
All shell commands below assume you are in the repo root:
cd "$(git rev-parse --show-toplevel)"
Start the RPC bridge
- Choose a platform bridge (both listen on 127.0.0.1:45678 by default).
# iOS (XCTest-hosted bridge)
./.agents/skills/phoneagent/scripts/start_rpc_bridge_local.sh
# Android (adb bridge; emulator or physical device)
./.agents/skills/phoneagent/scripts/start_android_rpc_bridge_local.sh
Notes:
- start_rpc_bridge_local.sh is interactive and will show a numbered list of iOS devices/simulators. Enter the number for the destination you want.
- start_rpc_bridge_local.sh starts a localhost-only forwarder.
  - On Xcode "Connect via network", it uses the CoreDevice tunnel automatically (no extra deps).
  - For USB fallback forwarding, install pymobiledevice3 into a local venv: python3 -m venv .venv && ./.venv/bin/python -m pip install -U pip && ./.venv/bin/python -m pip install pymobiledevice3
- start_android_rpc_bridge_local.sh uses adb; if multiple devices are connected, it prompts for the serial.
- Keep the bridge process running.
- Wait for PHONEAGENT_RPC_READY ... in logs before sending RPC calls.
- Confirm socket readiness before first RPC:
./.agents/skills/phoneagent/scripts/rpc.py get-tree >/dev/null && echo rpc-ready
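If a script needs to wait for the listener before running the check above, a minimal polling sketch could look like the following. The function name, timeout, and retry interval are illustrative assumptions, not part of PhoneAgent. A successful TCP connect only shows that something is listening on the port, so the get-tree check remains the real confirmation.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=45678, timeout=30.0, interval=0.5):
    """Poll until a TCP connect to host:port succeeds or timeout elapses.

    Hypothetical helper: returns True once a connection is accepted,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection; no RPC is sent.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```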
Resolve host and port
- Always use 127.0.0.1:45678 as the RPC endpoint (or rpc.py --port <port> if customized).
Notes:
- Both bridges are localhost-only.
- iOS physical-device flow uses a localhost forwarder.
- If you need to forward manually, first get a device UDID via xcrun devicectl list devices, then run: python3 ./.agents/skills/phoneagent/scripts/forward_rpc_localhost.py --udid <UDID> (binds 127.0.0.1:45678)
Send RPC calls
Use the helper CLI:
# iOS bundle identifier
./.agents/skills/phoneagent/scripts/rpc.py open-app com.apple.Preferences
# Android package name
./.agents/skills/phoneagent/scripts/rpc.py open-app com.android.settings
./.agents/skills/phoneagent/scripts/rpc.py get-tree | head
# Use coordinates copied from the tree (XCUI frame string).
./.agents/skills/phoneagent/scripts/rpc.py enter-text \
--coordinate '{{33.0, 861.0}, {364.0, 38.0}}' \
--text 'Display'
./.agents/skills/phoneagent/scripts/rpc.py tap-element \
--coordinate '{{37.7, 969.7}, {199.7, 29.0}}'
Core operating loop
- Call get_tree.
- Identify the best target element in the tree (label/identifier) and copy its frame coordinate string.
- Prefer coordinate-based actions (tap_element / enter_text).
- Use the returned tree from the action response to verify the UI changed as expected.
- Repeat until complete.
- When the task is complete, always capture a screenshot for the user:
  - Prefer get_context and write result.screenshot_base64 to a PNG (or use ./.agents/skills/phoneagent/scripts/rpc.py get-screen-image, which writes PNG files to /tmp/phoneagent-artifacts).
  - Include the PNG path in your final message so the user can open it.
Use swipe to reveal off-screen content, then use the returned tree (or call get_tree if needed).
Use one request at a time per server. Do not fire concurrent batches.
Split long keyboard input into chunks; do not send giant enter_text payloads in one call.
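The swipe-then-inspect guidance can be sketched as a small sequential loop. This is an illustration under stated assumptions: `call` is a caller-supplied function (hypothetical) that sends one RPC request and returns its `result` object, the tree format is not specified here so the match is a plain substring test, and `max_swipes` is an arbitrary bound.

```python
def swipe_until_visible(call, label, x, y, direction="up", max_swipes=5):
    """Swipe until label appears in the tree; return the tree or None.

    call(method, params) is an assumed helper that performs one RPC and
    returns the "result" object. Requests are issued strictly one at a time.
    """
    tree = call("get_tree", {})["tree"]
    for _ in range(max_swipes):
        if label in tree:
            return tree
        # Reuse the tree returned by swipe instead of calling get_tree again.
        tree = call("swipe", {"x": x, "y": y, "direction": direction})["tree"]
    return tree if label in tree else None
```

Because each iteration waits for the previous response before sending the next request, the loop respects the one-request-at-a-time rule.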
RPC method reference
All RPC requests are newline-delimited JSON objects with this shape:
{"id":1,"method":"<method>","params":{...}}
All success responses look like:
{"id":1,"result":{...}}
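For callers that do not use rpc.py, the framing above can be sketched in a few lines of Python. This assumes the bridge accepts a plain TCP connection on the default endpoint and replies with exactly one line per request. The helper names and the 30-second timeout are illustrative, and because failure responses are not described here, the sketch simply raises when `result` is absent.

```python
import json
import socket

HOST, PORT = "127.0.0.1", 45678  # default endpoint named in this document


def encode_request(request_id, method, params=None):
    """Serialize one request as a single newline-terminated JSON line."""
    message = {"id": request_id, "method": method, "params": params or {}}
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def decode_response(line, expected_id):
    """Parse one response line and return its "result" object."""
    message = json.loads(line)
    if message.get("id") != expected_id:
        raise ValueError(f"unexpected response id: {message.get('id')!r}")
    if "result" not in message:
        # Failure shape is unspecified here, so surface the raw message.
        raise RuntimeError(f"RPC did not succeed: {message!r}")
    return message["result"]


def call(method, params=None, request_id=1, host=HOST, port=PORT, timeout=30.0):
    """Send one request and block until its response line arrives."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_request(request_id, method, params))
        with sock.makefile("rb") as reader:
            line = reader.readline()
    if not line:
        raise ConnectionError("bridge closed the connection without replying")
    return decode_response(line, request_id)
```

With a bridge running, `call("get_tree")["tree"]` would return the tree string.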
get_tree
- Does: Returns the accessibility tree of the currently focused app.
- Params: none.
- Returns: {"tree": "<string>"}
Example:
{"id":1,"method":"get_tree","params":{}}
get_screen_image
- Does: Captures the current screen as a base64-encoded PNG plus image dimensions (when available).
- Params: none.
- Returns: {"screenshot_base64":"<base64>","metadata":{"width":<number>,"height":<number>}}
Example:
{"id":2,"method":"get_screen_image","params":{}}
get_context
- Does: Convenience method that returns both the current accessibility tree and the current screen image.
- Params: none.
- Returns: {"tree":"<string>","screenshot_base64":"<base64>","metadata":{"width":<number>,"height":<number>}}
Example:
{"id":3,"method":"get_context","params":{}}
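Writing the returned image to disk is a base64 decode followed by a binary write. The sketch below assumes `result` is the already-parsed `result` object from get_context or get_screen_image. The function name and the output path are hypothetical.

```python
import base64
from pathlib import Path


def save_screenshot(result, path):
    """Decode result["screenshot_base64"] and write the bytes to path."""
    png_bytes = base64.b64decode(result["screenshot_base64"])
    target = Path(path)
    # Create the parent directory so a fresh artifacts folder works.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(png_bytes)
    return target
```

The returned path is what you would include in the final message to the user.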
open_app
- Does: Brings the specified app to the foreground (and makes it the focused app for subsequent calls).
- Params: bundle_identifier (string, required).
  - iOS: pass bundle identifier (example com.apple.Preferences).
  - Android: pass package name (example com.android.settings).
- Returns: {"bundle_identifier":"<string>", "tree":"<string>"} (Android also includes package_name).
Example:
{"id":4,"method":"open_app","params":{"bundle_identifier":"com.apple.Preferences"}}
tap
- Does: Taps an absolute point in the current app.
- Params: x (number, required), y (number, required). Coordinates are in absolute screen points as reported by the tree.
- Returns: {"tree":"<string>"}
Example:
{"id":5,"method":"tap","params":{"x":120,"y":300}}
tap_element
- Does: Taps the center of an element using its XCUI frame string from the accessibility tree.
- Params:
  - coordinate (string, required). Must look like {{x, y}, {w, h}} (copied from the tree).
  - count (integer, optional; default 1). Use 2 for double-tap.
  - longPress (boolean, optional; default false). When true, performs a long-press gesture.
- Returns: {"coordinate":"<string>", "count":<number>, "longPress":<bool>, "tree":"<string>"}
Example:
{"id":6,"method":"tap_element","params":{"coordinate":"{{20.0, 165.0}, {390.0, 90.0}}","count":1,"longPress":false}}
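To see which point a frame string corresponds to, or to derive x and y for the plain tap method, the frame can be parsed client-side. This is a sketch that assumes the center is the arithmetic midpoint of the frame. The bridge's own computation is not shown in this document, and the helper names are hypothetical.

```python
import re

_NUM = r"(-?\d+(?:\.\d+)?)"
_FRAME = re.compile(
    r"^\{\{\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*\}\s*,\s*"
    r"\{\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*\}\}$"
)


def parse_frame(coordinate):
    """Parse '{{x, y}, {w, h}}' into an (x, y, w, h) tuple of floats."""
    match = _FRAME.match(coordinate.strip())
    if match is None:
        raise ValueError(f"not a frame string: {coordinate!r}")
    x, y, w, h = (float(group) for group in match.groups())
    return x, y, w, h


def frame_center(coordinate):
    """Return the midpoint (cx, cy) of a frame string."""
    x, y, w, h = parse_frame(coordinate)
    return x + w / 2, y + h / 2
```

For the frame in the example above, `frame_center("{{20.0, 165.0}, {390.0, 90.0}}")` gives `(215.0, 210.0)`.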
enter_text
- Does: Taps the center of the target element (to focus it), waits briefly for the keyboard, then types the provided text followed by a newline (Return).
- Params:
  - coordinate (string, required). Must look like {{x, y}, {w, h}} (copied from the tree).
  - text (string, required).
- Returns: {"coordinate":"<string>", "tree":"<string>"}
Example:
{"id":7,"method":"enter_text","params":{"coordinate":"{{33.0, 861.0}, {364.0, 38.0}}","text":"hello"}}
scroll
- Does: Scrolls by dragging from a starting point by the provided deltas.
- Params: x (number, required), y (number, required), distanceX (number, required), distanceY (number, required).
- Returns: {"tree":"<string>"}
Example:
{"id":8,"method":"scroll","params":{"x":215,"y":760,"distanceX":0,"distanceY":-460}}
swipe
- Does: Swipes in a direction starting from a given point (implemented as a bounded drag gesture).
- Params: x (number, required), y (number, required), direction (string, required; one of up, down, left, right).
- Returns: {"tree":"<string>"}
Example:
{"id":9,"method":"swipe","params":{"x":215,"y":760,"direction":"up"}}
stop
- Does: Stops the RPC server test (ends the xcodebuild test session).
- Params: none.
- Returns: {}
Example:
{"id":10,"method":"stop","params":{}}
iOS app bundle IDs
- Settings: com.apple.Preferences
- Camera: com.apple.camera
- Photos: com.apple.mobileslideshow
- Messages: com.apple.MobileSMS
- Home Screen: com.apple.springboard
Android package names
- Settings: com.android.settings
- Camera (AOSP): com.android.camera2
- Photos (Google): com.google.android.apps.photos
- Messages (Google): com.google.android.apps.messaging
- Home Screen: launcher package varies by emulator/device
Recovery playbook
- If RPC hangs after open_app, restart the test-hosted server and retry with a known-good bundle ID.
- If taps fail due to stale UI, call get_tree again and recalculate the target.
- If the iOS bridge becomes unresponsive, stop and restart xcodebuild test, then resume from the latest verified app state.
- If the Android bridge becomes unresponsive, restart adb (adb kill-server && adb start-server), relaunch the bridge, and retry.
End session
- Send stop only when the task is complete.
- If stop is not sent, terminate the xcodebuild session manually.