Cookbook¶
Recipes for driving Android with hs. Each one is copy-pasteable into a
real terminal.
These build on the agent loop in the README and the verb
table in hs --help. If you've never run hs use, start there. The
shared action flags (--timeout, --retries, --visible, --unique,
--nth, --json, …) work on every action verb; this page shows them in
context rather than restating the table.
Fast agent loop¶
Use the UI table as the primary model input. It is small, text-first, and already contains the center coordinates Handsets will tap.
Only add an image when the visual layout matters. Keep it agent-sized:
hs see out.jpg without --size is a native-resolution export for bug
reports and pixel debugging. It is slower and much larger than the 768px
agent path. PNG is lossless/debug only:
Logging in¶
The canonical flow: find the email field, fill it, find the password field, fill it, submit, and wait for the dashboard.
hs use
hs wait idle 200ms
hs tap "Sign in" --visible --unique --timeout 5s
hs fill 'EditText[resource-id=com.example:id/email]' "you@example.com"
hs fill 'EditText[resource-id=com.example:id/password]' "hunter2"
hs submit
hs wait "Dashboard" --timeout 15s
If the app has two "Sign in" buttons (header and footer, for example),
--unique will fail with exit 4 (AMBIGUOUS); pick the one you want
with --nth 1 or tighten the selector. When IDs aren't stable, lean on
the relational pseudo-classes:
hs fill 'EditText:below(TextView[text=Email])' "you@example.com"
hs fill 'EditText:below(TextView[text=Password])' "hunter2"
If the field already has focus and you want the real keyboard path rather
than atomic set-text, use type:
Waiting for the next screen¶
hs wait is overloaded by argument shape so it stays readable:
hs wait idle 200ms # block until the UI settles for 200 ms
hs wait "Welcome" # block until that text appears anywhere
hs wait com.foo # block until that package is in the foreground
hs wait com.foo/.MainAct # block on a specific activity
hs wait 500ms # client-side sleep (no daemon hop)
All forms honour --timeout. The default is 10 s — bump it for slow
launches. Prefer these waits over shell sleep; a fixed sleep is both
slower on fast devices and flaky on slow ones.
Do then verify in one call¶
Tap a button and only succeed if the expected next state appears. This is
the usual replacement for tap && sleep && wait chains:
hs act --tap "Send" --until "Sent" --timeout 8s
hs act --fill '[id=q]' "llm" --until-selector RecyclerView --timeout 5s
hs act --swipe up 300 --until-idle --timeout 2s
Exit 0 only if both halves succeeded. TIMEOUT means the predicate never
appeared; NOT_FOUND means the action itself could not execute.
Retrying a flaky tap¶
Daemon-side waits get retried automatically when a TIMEOUT comes back,
so the script stays linear:
hs tap "Refresh" --retries 4 --retry-delay 500ms --timeout 3s
hs wait "Synced" --retries 4 --timeout 5s
For unattended jobs, branch on the exit code instead of parsing stderr.
There are only three codes worth a case arm:
hs tap "Submit" --unique --timeout 5s
case $? in
0) echo "submitted" ;;
2) echo "no Submit button on screen" ;; # NOT_FOUND
3) echo "tap acked but UI hung — escalating" # TIMEOUT
hs see --size 768 /tmp/hung.jpg ;;
4) echo "ambiguous — narrow the selector" ;; # AMBIGUOUS
*) echo "unexpected failure"; exit 1 ;;
esac
The full structured ErrCode (DAEMON_ERROR, PRECONDITION,
SECURE_WINDOW, …) is preserved in --json output as error.code —
use that when you need to dispatch on the long tail.
Scrolling until something is visible¶
A 12-attempt scroll-then-tap loop in five lines:
for _ in $(seq 1 12); do
hs tap "Settings" --visible --unique --timeout 500ms && break
hs swipe up 250
done
If you already know one scroll is enough, batch the scroll and tap over a single warm socket:
hs run executes one command per line. Do not put multiple verbs on one
line with semicolons. Use the shell for loop above when you need
conditional looping.
Debugging selectors¶
Start broad, inspect the JSON, then tighten. This is faster than guessing at coordinates.
hs find 'Button:has-text("Continue")' --json | jq .
hs find '*[text=Continue], *[text=Next]' --visible --json | jq .
If there are multiple matches, either make the selector structural or pick an index explicitly:
hs tap 'Button:has-text("Continue"):in(LinearLayout[id~=footer])' --unique
hs tap 'Button:has-text("Continue")' --visible --nth 2
Two-factor SMS¶
hs sms reads the inbox directly via IContentProvider — no installed
helper app, no permission grants.
hs wait "Enter the code"
CODE=$(hs sms inbox --limit 1 --json | jq -r '.[0].body' | grep -oE '[0-9]{6}')
hs type "$CODE"
hs submit
For apps that put the code on the clipboard, read or paste it directly:
Clipboard watch is useful for magic-link and OTP handoff scripts:
Notifications and push flows¶
Trigger a server-side push, then read recent notifications. Filter by package when you know it:
For login links that open a browser or app, wait on the resulting package or activity instead of sleeping:
Running against many devices¶
hs fan runs the same verb against each serial in parallel and exits
non-zero if any device fails:
For machine-readable per-device output, add the global --json:
Screenshotting on failure¶
A one-line bash -e trap that captures whatever was on screen the
moment a verb failed. Use the fast 768px JPEG path for unattended runs:
set -e
trap 'hs see --size 768 /tmp/handsets-$$-failure.jpg' ERR
hs tap "Sign in" --unique
hs wait "Dashboard" --timeout 10s
When you need a full artifact bundle for CI, collect the UI, device state, logs, and screenshot together:
DIR=/tmp/handsets-failure-$$
mkdir -p "$DIR"
trap 'hs ui --json > "$DIR/ui.json"; hs ui --xml > "$DIR/ui.xml"; hs info > "$DIR/info.txt"; hs logs --tail 200 > "$DIR/logs.txt"; hs see --size 768 "$DIR/screen.jpg"; echo "$DIR"' ERR
If you expect a black screenshot from a banking, password, or incognito screen, opt into the slower FLAG_SECURE check:
Selectors when text alone isn't enough¶
Real Android UIs rarely give you stable resource IDs. Lean on the relational pseudo-classes:
# The OK button inside a specific dialog:
hs tap '*[text="OK"]:in(LinearLayout[id~=dialog])' --unique
# The EditText below the "Email" label, visible only:
hs tap 'EditText:below(TextView[text=Email]):visible' --unique
# Buttons near a specific icon:
hs find 'Button:near(ImageView[desc~=cart], 200)'
# OR groups — "Continue" or "Next", whichever shipped this build:
hs tap '*[text=Continue], *[text=Next]' --visible --unique --timeout 5s
# Playwright-style sugar:
hs tap 'Button:has-text("Sign in")'
hs tap 'Button:text-is("Sign in")'
Driving from Python¶
There's a first-party SDK that wraps the CLI as a context manager and maps the exit codes to typed exceptions:
# pip install handsets
from handsets import Session, NotFound, Timeout
with Session() as d:
d.tap("Continue", visible=True, unique=True, timeout="5s")
d.fill("[resource-id=com.example:id/email]", "you@example.com")
d.submit()
try:
d.wait(text="Dashboard", timeout="15s")
except Timeout:
d.go("back")
For tight loops, the SDK opens a warm-socket batching context that
shares one hs run - subprocess across calls — per-call process
startup collapses into one startup for the whole batch:
with Session() as d:
with d.batch(timeout="5s", retries=2) as b:
for label in labels:
b.tap(label, visible=True)
b.wait(text="Done")
For Node, Go, or anything else, drive hs --json as a subprocess and
parse one JSON line per call:
import json, subprocess
r = subprocess.run(
["hs", "--json", "tap", "Sign in", "--visible", "--unique", "--timeout", "5s"],
capture_output=True, text=True,
)
match r.returncode:
case 0: print("tapped at", json.loads(r.stdout)["result"]["x"])
case 2: raise RuntimeError("no Sign-in button on screen")
case 3: raise TimeoutError("tap timed out")
Batch scripts¶
For longer flows where the per-call socket churn matters, hs run reads
verb lines from a file (or stdin) and executes them over one warm
socket. The set directives raise defaults without touching every line:
# flow.hs
set timeout=8s
set retries=2
set continue-on-error
set dump-ttl=200ms
wait idle
tap "Continue" --visible --unique
fill [id=email] "you@example.com"
fill [id=pw] "hunter2"
submit
wait "Dashboard" --timeout 15s
For ad-hoc batching, pipe directly:
App lifecycle setup¶
Bring an app to the foreground, close it, or use raw shell commands for
setup that is not yet a first-class verb. Pipe shell commands into
hs shell so streamed command output is drained correctly:
hs open com.example.app/.MainActivity
hs close com.example.app
printf '%s\n' 'pm clear com.example.app' | hs shell
For deeplink-driven flows, inspect what the APK declares: