name: android-use
description: Control Android devices via ADB. Use when user asks to tap, swipe, type, launch apps, take screenshots, or automate Android device interactions.
version: 0.1.0
author: iurysza
license: MIT
keywords:
  - android
  - adb
  - device-control
  - automation
category: automation
tech-stack:
  - Bun
  - ADB
allowed-tools: Bash(android-use:*)
Android Device Control
CLI for controlling Android devices via ADB. Requires a connected device with USB debugging enabled.
Prerequisites
Before using this skill:
- Android device with USB debugging enabled
- ADB installed (`adb version` should work)
- Device connected via USB or emulator running
Agent Installation (Autonomous Setup)
When this skill is not installed, agents should:
Run installer (recommended):
curl -fsSL https://raw.githubusercontent.com/iurysza/android-use/main/install.sh | bash
Installed structure:
- `~/.config/opencode/skill/android-use/SKILL.md`
- `~/.config/opencode/skill/android-use/scripts/android-use`
- `~/.config/opencode/skill/android-use/references/`
- `~/.config/opencode/skill/android-use/assets/`
- `~/.config/opencode/skill/android-use/repo/`
Verify installation:
~/.config/opencode/skill/android-use/scripts/android-use check-device
Quick Start
android-use check-device # List connected devices
android-use get-screen # Get UI hierarchy (compact JSON)
android-use tap 540 960 # Tap at coordinates
android-use type-text "Hello" # Type text
android-use key HOME # Press key
android-use screenshot ./screen.png # Capture screen
Core Agent Workflow
STEP 1: Check device
android-use check-device
- Lists all connected devices
- Note the serial number if multiple devices
- For "phone" use [PHYSICAL] device
- For "emulator" use [EMULATOR] device
STEP 2: Get screen state
android-use get-screen
- Returns compact JSON with pre-calculated tap coordinates
- Search for: `text`, `contentDesc`, `resourceId`
- Use the `center` field for coordinates (e.g., `"center": [540, 289]`)
- Cache location: `/tmp/.ai-artifacts/skills/android-use/screen.json`
STEP 3: Execute action
- Tap: `android-use tap <center_x> <center_y>`
- Type: `android-use tap <field_x> <field_y>` then `android-use type-text "text"`
- Swipe: `android-use swipe <x1> <y1> <x2> <y2> [duration_ms]`
- Key: `android-use key <KEY_NAME>` (HOME, BACK, ENTER, etc.)
STEP 4: Verify and repeat
- Run `get-screen` again to verify state change
- Handle any dialogs that appeared
- Repeat until goal achieved
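Steps 2 and 3 can be sketched in Python as follows. This is a minimal sketch, assuming the compact JSON shape shown in this document; `find_center` and `tap_command` are hypothetical helper names and are not part of android-use. The sample payload is trimmed from the Settings example so the snippet runs without a device.

```python
import json


def find_center(screen, label):
    """Return the [x, y] center of the first element whose text or
    contentDesc equals label, or None when nothing matches."""
    for element in screen.get("elements", []):
        if label in (element.get("text"), element.get("contentDesc")):
            return element["center"]
    return None


def tap_command(center):
    """Build the argv for `android-use tap <x> <y>` (hypothetical helper)."""
    return ["android-use", "tap", str(center[0]), str(center[1])]


# Stand-in for the output of `android-use get-screen`.
sample = json.loads(
    '{"elements": [{"text": "Settings", "contentDesc": "", "center": [540, 289]}]}'
)

center = find_center(sample, "Settings")
print(tap_command(center))  # ['android-use', 'tap', '540', '289']
```

The argv list could be passed to `subprocess.run` on a machine with a connected device. After that, run `get-screen` again to verify the change (step 4).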
Commands
Device & Screen
| Command | Args | Description |
|---|---|---|
| `check-device` | `[serial]` | List/verify connected devices |
| `wake` | `[serial]` | Wake device + dismiss lock |
| `get-screen` | `[serial]` | Dump UI accessibility tree (compact JSON) |
| `screenshot` | `[path] [serial]` | Capture screen image |
Input Actions
| Command | Args | Description |
|---|---|---|
| `tap` | `<x> <y> [serial]` | Tap at coordinates |
| `type-text` | `<text> [serial]` | Type text string |
| `swipe` | `<x1> <y1> <x2> <y2> [ms] [serial]` | Swipe gesture |
| `key` | `<keycode\|name> [serial]` | Press key |
App Management
| Command | Args | Description |
|---|---|---|
| `launch-app` | `<package> [serial]` | Launch app by package name |
| `install-apk` | `<path> [serial]` | Install APK file |
Global Options
| Option | Description |
|---|---|
| `-s, --serial <id>` | Target specific device |
| `--json` | Output as JSON |
| `--verbose` | Verbose logging |
| `--timeout <ms>` | Timeout (default: 15000) |
| `--adb-path <path>` | Path to ADB binary |
| `--full` | Full XML output (for get-screen) |
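When commands are generated from a script, the global options can be assembled in one place. The sketch below is illustrative only: `build_argv` is a hypothetical helper, and it assumes global options go before the subcommand, as in the `android-use -s <serial> get-screen` and `android-use --json check-device` examples in this document.

```python
def build_argv(command, *args, serial=None, as_json=False, timeout_ms=None):
    """Assemble an android-use command line as a list of strings.

    Assumption: global options precede the subcommand, mirroring the
    examples in this document.
    """
    argv = ["android-use"]
    if serial:
        argv += ["-s", serial]
    if as_json:
        argv.append("--json")
    if timeout_ms is not None:
        argv += ["--timeout", str(timeout_ms)]
    return argv + [command] + [str(arg) for arg in args]


print(build_argv("tap", 540, 960, serial="1A051FDF6007PA"))
# ['android-use', '-s', '1A051FDF6007PA', 'tap', '540', '960']
```

Building a list instead of a shell string avoids quoting problems when `type-text` arguments contain spaces or quotes.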
Common App Package Names
Use these package names with launch-app:
| App Name | Package Name |
|---|---|
| Chrome | com.android.chrome |
| Settings | com.android.settings |
| Phone / Dialer | com.android.dialer |
| Messages / SMS | com.google.android.apps.messaging |
| Camera | com.android.camera |
| Photos | com.google.android.apps.photos |
| Gmail | com.google.android.gm |
| Maps | com.google.android.apps.maps |
| YouTube | com.google.android.youtube |
| Play Store | com.android.vending |
| Calendar | com.google.android.calendar |
| Clock | com.google.android.deskclock |
| Calculator | com.google.android.calculator |
| Contacts | com.android.contacts |
| Files | com.google.android.documentsui |
| WhatsApp | com.whatsapp |
| Instagram | com.instagram.android |
| Facebook | com.facebook.katana |
| Twitter / X | com.twitter.android |
| Spotify | com.spotify.music |
| Netflix | com.netflix.mediaclient |
| Telegram | org.telegram.messenger |
| Discord | com.discord |
| Slack | com.Slack |
| Zoom | us.zoom.videomeetings |
| Teams | com.microsoft.teams |
| Outlook | com.microsoft.office.outlook |
| Drive | com.google.android.apps.docs |
| Keep / Notes | com.google.android.keep |
| Reddit | com.reddit.frontpage |
| Bluesky | xyz.blueskyweb.app |
Reading Screen Data
Compact JSON (default):
get-screen outputs pre-calculated tap coordinates and filtered elements (99% smaller than XML):
{
"elements": [
{
"text": "Settings",
"resourceId": "com.android.settings:id/title",
"contentDesc": "",
"clickable": true,
"scrollable": false,
"focused": false,
"bounds": [42, 234, 1038, 345],
"center": [540, 289]
}
],
"clickable": [...],
"scrollable": [...],
"withText": [...],
"withContentDesc": [...]
}
Key attributes:
- `text` - Visible text
- `contentDesc` - Accessibility description (icons)
- `resourceId` - Element identifier
- `clickable`/`scrollable` - Interaction states
- `center` - Pre-calculated tap coordinates `[x, y]`
- `bounds` - Original bounds `[left, top, right, bottom]`
Use pre-calculated center:
- Already calculated: `tap 540 289` (no manual math needed)
Full XML (when needed):
- Use `get-screen --full` for raw XML
- Saves to `/tmp/.ai-artifacts/skills/android-use/screen.xml`
Cache location (memory-backed):
- `/tmp/.ai-artifacts/skills/android-use/screen.json` (compact)
- `/tmp/.ai-artifacts/skills/android-use/screen.xml` (full)
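For reference, the relationship between `bounds` and `center` in the example above can be reproduced with integer midpoints. This is a sketch under an assumption: the tool's exact rounding is not documented here, but floor division matches the sample values (`[42, 234, 1038, 345]` gives `[540, 289]`). `center_of` is a hypothetical name.

```python
def center_of(bounds):
    """Midpoint of [left, top, right, bottom], rounded down.

    Assumption: floor division, which matches the sample element in
    this document; the tool's own rounding may differ.
    """
    left, top, right, bottom = bounds
    return [(left + right) // 2, (top + bottom) // 2]


print(center_of([42, 234, 1038, 345]))  # [540, 289]
```

In normal use this calculation is unnecessary, since `get-screen` already provides `center`. It is mainly useful when working from the raw XML returned by `--full`.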
Key Names
HOME, BACK, MENU, POWER, ENTER, TAB, DEL, ESCAPE, VOLUME_UP, VOLUME_DOWN, DPAD_UP, DPAD_DOWN, DPAD_LEFT, DPAD_RIGHT
Multi-Device Support
When multiple devices are connected:
- Run `check-device` to see all devices with types
- User says "phone/physical" -> use `[PHYSICAL]` device
- User says "emulator" -> use `[EMULATOR]` device
- Pass `-s <serial>` to ALL subsequent commands
android-use check-device
# Multiple devices connected (2):
# [PHYSICAL] 1A051FDF6007PA - Pixel 6
# [EMULATOR] emulator-5554 - sdk_gphone64_arm64
android-use -s 1A051FDF6007PA get-screen
android-use -s 1A051FDF6007PA tap 540 960
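Selecting the serial can also be scripted. The following sketch parses a listing in the format shown in the example above. The regular expression is an assumption derived from those two sample lines only, and `parse_devices` and `pick_serial` are hypothetical helpers. Where available, the `--json` output is likely a sturdier source than text parsing.

```python
import re

# Matches lines such as "[PHYSICAL] 1A051FDF6007PA - Pixel 6".
DEVICE_LINE = re.compile(r"\[(PHYSICAL|EMULATOR)\]\s+(\S+)\s+-\s+(.+)")


def parse_devices(output):
    """Extract type, serial, and model from check-device text output."""
    devices = []
    for line in output.splitlines():
        match = DEVICE_LINE.search(line)
        if match:
            kind, serial, model = match.groups()
            devices.append({"type": kind, "serial": serial, "model": model.strip()})
    return devices


def pick_serial(devices, kind):
    """Return the serial of the first device of the given type, or None."""
    for device in devices:
        if device["type"] == kind:
            return device["serial"]
    return None


listing = """Multiple devices connected (2):
[PHYSICAL] 1A051FDF6007PA - Pixel 6
[EMULATOR] emulator-5554 - sdk_gphone64_arm64
"""

devices = parse_devices(listing)
print(pick_serial(devices, "PHYSICAL"))  # 1A051FDF6007PA
print(pick_serial(devices, "EMULATOR"))  # emulator-5554
```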
Common Patterns
Tap a button
android-use get-screen # Get JSON with pre-calculated centers
# Search JSON for button, find center: [540, 289]
android-use tap 540 289 # Tap at center
Enter text in field
android-use tap 540 184 # Focus field
android-use type-text "search term"
android-use key ENTER # Submit
Scroll to find content
android-use get-screen # Check if visible
android-use swipe 540 1500 540 500 # Swipe up (scroll down)
android-use get-screen # Check again
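The check, swipe, check-again loop above can be sketched as a bounded search. In this sketch `get_screen` and `swipe` are injected callables (hypothetical wrappers around `android-use get-screen` and `android-use swipe`), so the logic can be exercised with fake screens. The swipe coordinates are the ones used in the example above.

```python
def scroll_until_found(get_screen, swipe, label, max_swipes=5):
    """Return the center of the element whose text equals label,
    swiping up between checks; None if not found within max_swipes."""
    for attempt in range(max_swipes + 1):
        for element in get_screen().get("elements", []):
            if element.get("text") == label:
                return element["center"]
        if attempt < max_swipes:
            swipe(540, 1500, 540, 500)  # swipe up (scroll down)
    return None


# Fake two-page UI used to exercise the loop without a device.
pages = [
    {"elements": [{"text": "Network", "center": [540, 300]}]},
    {"elements": [{"text": "About phone", "center": [540, 1200]}]},
]
position = {"index": 0}
swipes = []


def fake_get_screen():
    return pages[position["index"]]


def fake_swipe(x1, y1, x2, y2):
    swipes.append((x1, y1, x2, y2))
    position["index"] = min(position["index"] + 1, len(pages) - 1)


found = scroll_until_found(fake_get_screen, fake_swipe, "About phone")
print(found, len(swipes))  # [540, 1200] 1
```

Capping the number of swipes keeps an agent from scrolling forever when the target is not on the screen at all.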
Handle dialogs
# Look for "OK", "Allow", "Accept" buttons in the get-screen output
android-use tap <button-center>
# Or dismiss with back
android-use key BACK
Open app and navigate
android-use launch-app com.android.chrome
android-use get-screen
# Find URL bar, tap it
android-use tap 540 184
android-use type-text "example.com"
android-use key ENTER
Agent Examples
See the references/ directory for:
- `AGENTS_GETTING_STARTED.md` - Setup and command guide
- `AGENT_WORKFLOWS.md` - End-to-end agent workflows
- `01-taking-a-screenshot.md` - Capture and verify screenshots
- `02-opening-an-app.md` - App launch and navigation
- `03-tapping-a-button.md` - Tap flow from screen JSON
- `04-filling-a-form.md` - Multi-step form input pattern
- `05-scrolling-to-find-content.md` - Scroll/search interaction loop
- `06-handling-dialogs.md` - Popup detection and dismissal
JSON Output
Use --json for structured output:
android-use --json check-device
{
"success": true,
"exitCode": 0,
"data": { "devices": [...], "count": 1 },
"message": "Found 1 device(s)",
"trace": { ... }
}
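A caller might unwrap this envelope as sketched below. The field names `success`, `exitCode`, `data`, and `message` come from the example above. `unwrap` and `CommandFailed` are hypothetical names, and the shape of each entry in `devices` is an assumption made only for illustration.

```python
import json


class CommandFailed(Exception):
    """Raised when the envelope reports a failure."""


def unwrap(raw):
    """Parse --json output and return its data, or raise on failure."""
    result = json.loads(raw)
    if not result.get("success") or result.get("exitCode", 0) != 0:
        raise CommandFailed(result.get("message", "unknown error"))
    return result.get("data")


# Sample envelope; the device entry shape is hypothetical.
raw = json.dumps({
    "success": True,
    "exitCode": 0,
    "data": {"devices": [{"serial": "emulator-5554"}], "count": 1},
    "message": "Found 1 device(s)",
})

data = unwrap(raw)
print(data["count"])  # 1
```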
Error Handling
- No device: Check USB, verify USB debugging enabled, accept "Allow USB debugging?" prompt
- Element not found: Get fresh screen dump, try scrolling
- Action didn't work: Add minimal delay (300ms max) only if retrying, then verify coordinates, check for popups/dialogs, get fresh screen dump
- Device offline: Reconnect USB, run `adb kill-server && adb start-server`
Agent Best Practices
- Always get-screen first - understand current UI state
- No artificial delays needed - Commands execute synchronously; UI is ready for next command immediately
- Check your work - get-screen after each action to verify
- Use screenshots - when JSON doesn't capture enough info
- Be consistent - use same serial for all commands in session
- Compact JSON default - 99% smaller, pre-calculated tap coords, cached to `/tmp/.ai-artifacts/skills/android-use/`
- Handle dialogs - popups often block interactions
- Use center coordinates - from JSON output, no manual calculation needed
Compact JSON Filter Logic
The compact JSON includes elements with ANY of:
- Non-empty `text` (visible labels)
- Non-empty `contentDesc` (accessibility descriptions)
- `clickable = true` (interactive elements)
- `scrollable = true` (scrollable containers)
This filters ~336 raw nodes → ~55 useful elements (roughly 6x fewer nodes)
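The "ANY of" rule above amounts to a single predicate per node. The sketch below illustrates that rule only; it is not the tool's actual implementation. `is_useful` is a hypothetical name, and the node dictionaries reuse the field names from the compact JSON example.

```python
def is_useful(node):
    """Keep a node if it has text, a content description, or is
    clickable or scrollable (the ANY-of rule described above)."""
    return bool(
        node.get("text")
        or node.get("contentDesc")
        or node.get("clickable")
        or node.get("scrollable")
    )


raw_nodes = [
    {"text": "Settings", "contentDesc": "", "clickable": True, "scrollable": False},
    {"text": "", "contentDesc": "", "clickable": False, "scrollable": False},  # layout-only wrapper
    {"text": "", "contentDesc": "Search", "clickable": False, "scrollable": False},
    {"text": "", "contentDesc": "", "clickable": False, "scrollable": True},
]

kept = [node for node in raw_nodes if is_useful(node)]
print(len(kept))  # 3
```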
Repository
- GitHub: https://github.com/iurysza/android-use
- Issues: https://github.com/iurysza/android-use/issues
- References: See `references/` folder