<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Elliot Gao</title>
    <description>The latest articles on Forem by Elliot Gao (@elliotgao2).</description>
    <link>https://forem.com/elliotgao2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3949216%2F30aefaac-3307-4d31-8c63-c21343f6e9b3.png</url>
      <title>Forem: Elliot Gao</title>
      <link>https://forem.com/elliotgao2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/elliotgao2"/>
    <language>en</language>
    <item>
      <title>Mobile RPA on Android Without Root</title>
      <dc:creator>Elliot Gao</dc:creator>
      <pubDate>Tue, 26 May 2026 12:31:10 +0000</pubDate>
      <link>https://forem.com/elliotgao2/mobile-rpa-on-android-without-root-45l0</link>
      <guid>https://forem.com/elliotgao2/mobile-rpa-on-android-without-root-45l0</guid>
      <description>&lt;p&gt;Mobile RPA usually starts with a simple sentence:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We need to do the same thing in this Android app every day.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Maybe it is checking an account. Maybe it is downloading a report. Maybe it is entering a code, reading a status, or moving data between an app and an internal system.&lt;/p&gt;

&lt;p&gt;If the app has no API, automation falls back to the UI.&lt;/p&gt;

&lt;p&gt;You can automate many Android app workflows without root. The trick is to keep the workflow close to what a human sees: tap labels, fill fields, wait for screens, and save evidence when something breaks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick answer
&lt;/h2&gt;

&lt;p&gt;A no-root Android RPA flow can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs use
hs go com.example.app
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Sign in"&lt;/span&gt; &lt;span class="nt"&gt;--timeout&lt;/span&gt; 15s
hs fill &lt;span class="s2"&gt;"Email"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$APP_EMAIL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Password"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$APP_PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt; &lt;span class="nt"&gt;--visible&lt;/span&gt; &lt;span class="nt"&gt;--unique&lt;/span&gt;
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Dashboard"&lt;/span&gt; &lt;span class="nt"&gt;--timeout&lt;/span&gt; 20s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It runs over standard Android debugging access (ADB). No root. No custom ROM. No app integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why root is the wrong default
&lt;/h2&gt;

&lt;p&gt;Root can make automation powerful, but it also changes the device.&lt;/p&gt;

&lt;p&gt;For business workflows, that is often a problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The device no longer behaves like a normal user device.&lt;/li&gt;
&lt;li&gt;Some apps detect rooted environments.&lt;/li&gt;
&lt;li&gt;Security assumptions change.&lt;/li&gt;
&lt;li&gt;Maintenance becomes harder.&lt;/li&gt;
&lt;li&gt;Enterprise teams get nervous.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the workflow can be driven through visible UI, start there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model the workflow as states
&lt;/h2&gt;

&lt;p&gt;Good mobile RPA scripts are not just tap sequences. They are state transitions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Launch app
Wait for Sign in
Fill credentials
Tap Continue
Wait for Dashboard
Open Reports
Download file
Capture confirmation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In shell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs go com.example.app
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Sign in"&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Email"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$APP_EMAIL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Password"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$APP_PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Dashboard"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;wait&lt;/code&gt; commands are as important as the actions. They make the workflow resilient to slow devices and network delays.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prefer labels over coordinates
&lt;/h2&gt;

&lt;p&gt;Coordinate automation is tempting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;adb shell input tap 540 860
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But RPA workflows need to survive small layout changes.&lt;/p&gt;

&lt;p&gt;Use visible labels instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs tap &lt;span class="s2"&gt;"Download"&lt;/span&gt;
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Saved"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When labels repeat, tighten the selector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs tap &lt;span class="s1"&gt;'Button:has-text("Download")'&lt;/span&gt; &lt;span class="nt"&gt;--visible&lt;/span&gt; &lt;span class="nt"&gt;--unique&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If there are multiple matches, &lt;code&gt;--unique&lt;/code&gt; fails instead of tapping the wrong one.&lt;/p&gt;
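
&lt;p&gt;One way to act on that failure is to retry once with a tighter selector. This is a minimal sketch, assuming &lt;code&gt;hs&lt;/code&gt; exits with code 4 on an ambiguous match (check the exit codes for your version). &lt;code&gt;tap_unique&lt;/code&gt; is a hypothetical helper name, not part of &lt;code&gt;hs&lt;/code&gt;.&lt;/p&gt;

```shell
# Hypothetical helper: tap a label, and fall back to a tighter selector
# only when the first attempt is rejected as ambiguous.
# Assumption: exit code 4 means the selector matched more than one node.
tap_unique() {
  # $1 = visible label, $2 = tighter selector to try on ambiguity
  tap_code=0
  hs tap "$1" --visible --unique || tap_code="$?"
  if [ "$tap_code" -eq 4 ]; then
    tap_code=0
    hs tap "$2" --visible --unique || tap_code="$?"
  fi
  # Any other failure is passed through unchanged.
  return "$tap_code"
}

# Example call (commented out so the sketch does not touch a device):
# tap_unique "Download" 'Button:has-text("Download")'
```

&lt;p&gt;The fallback runs only on ambiguity, so a missing label or a timeout still surfaces as its own error.&lt;/p&gt;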

&lt;h2&gt;
  
  
  Capture proof
&lt;/h2&gt;

&lt;p&gt;Business automation needs evidence.&lt;/p&gt;

&lt;p&gt;At the end of a successful run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs see &lt;span class="nt"&gt;--size&lt;/span&gt; 768 &lt;span class="s2"&gt;"/tmp/run-success.jpg"&lt;/span&gt;
hs ui &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"/tmp/run-ui.txt"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



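&lt;p&gt;On failure, a trap installed before the workflow steps can collect the same evidence (the &lt;code&gt;ERR&lt;/code&gt; trap needs a shell such as bash; it is not part of POSIX sh):&lt;br&gt;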
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;ARTIFACTS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/tmp/mobile-rpa-&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ARTIFACTS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;trap&lt;/span&gt; &lt;span class="s1"&gt;'hs ui &amp;gt; "$ARTIFACTS/ui.txt"; hs see --size 768 "$ARTIFACTS/screen.jpg"; hs logs --tail 200 &amp;gt; "$ARTIFACTS/logs.txt"; echo "$ARTIFACTS"'&lt;/span&gt; ERR
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you a screenshot, a UI dump, and recent logs for debugging.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handle OTP and notification flows
&lt;/h2&gt;

&lt;p&gt;Many mobile workflows involve one-time passwords, push notifications, or deep links.&lt;/p&gt;

&lt;p&gt;When the app shows a code field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Enter the code"&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Code"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$OTP_CODE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
hs tap &lt;span class="s2"&gt;"Verify"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a push opens the app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs &lt;span class="nb"&gt;wait &lt;/span&gt;com.example.app &lt;span class="nt"&gt;--timeout&lt;/span&gt; 15s
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Approve"&lt;/span&gt; &lt;span class="nt"&gt;--timeout&lt;/span&gt; 15s
hs tap &lt;span class="s2"&gt;"Approve"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part is still the same: wait for real UI state instead of sleeping.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where LLM agents fit
&lt;/h2&gt;

&lt;p&gt;Some RPA workflows are too variable for a fixed script.&lt;/p&gt;

&lt;p&gt;An LLM agent can help decide the next action when the screen changes. But the tool surface should still be small, such as a compact table of the actions available on the current screen:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tap   Button    "Approve"  #approve  540,860
fill  EditText  "Code"     #otp      540,640
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent should choose labels and actions, not raw pixels. That keeps the run auditable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;No-root mobile RPA still follows Android's security model.&lt;/p&gt;

&lt;p&gt;Some things may require device-owner policy, app integration, or root:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reading private app data.&lt;/li&gt;
&lt;li&gt;Bypassing secure windows.&lt;/li&gt;
&lt;li&gt;Changing protected system settings.&lt;/li&gt;
&lt;li&gt;Automating apps that intentionally block accessibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For many workflows, though, visible UI automation is enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can Android RPA run without root?
&lt;/h3&gt;

&lt;p&gt;Yes. Many Android workflows can be automated without root by using ADB, visible UI labels, text input, waits, screenshots, and logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is this the same as Appium?
&lt;/h3&gt;

&lt;p&gt;No. Appium is a full mobile testing framework. A CLI workflow with Handsets is smaller and better suited to scripts, RPA jobs, and agent loops.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can this run on real devices?
&lt;/h3&gt;

&lt;p&gt;Yes. It works on real Android devices and emulators as long as &lt;code&gt;adb&lt;/code&gt; can reach the device.&lt;/p&gt;

&lt;h3&gt;
  
  
  What should I log for compliance?
&lt;/h3&gt;

&lt;p&gt;At minimum: start time, device/session id, action timeline, final status, screenshots for important states, and logs around failures.&lt;/p&gt;
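
&lt;p&gt;A plain-text timeline covers most of that list. This is a minimal sketch, assuming a POSIX shell. &lt;code&gt;run_step&lt;/code&gt;, &lt;code&gt;log_step&lt;/code&gt;, &lt;code&gt;TIMELINE&lt;/code&gt;, and &lt;code&gt;SESSION_ID&lt;/code&gt; are hypothetical names for this example, not Handsets features.&lt;/p&gt;

```shell
# Hypothetical defaults for this sketch; override them per run.
TIMELINE="${TIMELINE:-/tmp/rpa-timeline.log}"
SESSION_ID="${SESSION_ID:-local-$$}"

log_step() {
  # $1 = exit status of the step, remaining args = the command that ran
  step_status="$1"
  shift
  # One line per step: UTC timestamp, session id, status, command.
  printf '%s session=%s status=%s cmd=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$SESSION_ID" "$step_status" "$*" >> "$TIMELINE"
}

run_step() {
  # Run the command, record its exit code, then pass the code through.
  step_code=0
  "$@" || step_code="$?"
  log_step "$step_code" "$@"
  return "$step_code"
}
```

&lt;p&gt;A step is then wrapped as &lt;code&gt;run_step hs tap "Continue" --visible --unique&lt;/code&gt;, and the last line of the timeline shows the final status.&lt;/p&gt;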

&lt;h2&gt;
  
  
  Related guides
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/how-to-automate-android-without-appium/" rel="noopener noreferrer"&gt;How to Automate Android Without Appium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/fast-android-ui-automation-with-adb/" rel="noopener noreferrer"&gt;Fast Android UI Automation with ADB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/how-to-run-mobile-qa-tests-without-rooted-phones/" rel="noopener noreferrer"&gt;How to Run Mobile QA Tests Without Rooted Phones&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Originally published at &lt;a href="https://handsets.dev/blog/mobile-rpa-android-without-root/" rel="noopener noreferrer"&gt;https://handsets.dev/blog/mobile-rpa-android-without-root/&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>android</category>
      <category>rpa</category>
      <category>automation</category>
      <category>testing</category>
    </item>
    <item>
      <title>How to Debug LLM-Driven Android Automation Runs</title>
      <dc:creator>Elliot Gao</dc:creator>
      <pubDate>Tue, 26 May 2026 12:21:43 +0000</pubDate>
      <link>https://forem.com/elliotgao2/how-to-debug-llm-driven-android-automation-runs-3eej</link>
      <guid>https://forem.com/elliotgao2/how-to-debug-llm-driven-android-automation-runs-3eej</guid>
      <description>&lt;p&gt;LLM-driven Android automation fails in strange ways.&lt;/p&gt;

&lt;p&gt;The model may tap the wrong label. The screen may change between observation and action. A keyboard may cover the button. A permission dialog may appear. The app may still be loading. The UI dump may expose two identical "Continue" buttons.&lt;/p&gt;

&lt;p&gt;If all you saved is the final screenshot, debugging is painful.&lt;/p&gt;

&lt;p&gt;You need a run trace.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick answer
&lt;/h2&gt;

&lt;p&gt;For every Android agent step, save:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the compact UI dump&lt;/li&gt;
&lt;li&gt;the screenshot when needed&lt;/li&gt;
&lt;li&gt;the model's chosen action&lt;/li&gt;
&lt;li&gt;the actual device command&lt;/li&gt;
&lt;li&gt;the result or structured error&lt;/li&gt;
&lt;li&gt;recent logs&lt;/li&gt;
&lt;li&gt;the top package/activity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The minimum useful trace looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;observe: tap Button "Continue" #continue 540,860
model:   tap "Continue"
action:  hs tap "Continue" --visible --unique
result:  ok
wait:    hs wait "Dashboard" --timeout 15s
result:  TIMEOUT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is much easier to debug than "the agent failed."&lt;/p&gt;

&lt;h2&gt;
  
  
  The failure modes
&lt;/h2&gt;

&lt;p&gt;Android agent failures usually fall into a few buckets.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;NOT_FOUND&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The target label or selector was not visible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AMBIGUOUS&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;More than one visible node matched&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TIMEOUT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The expected next state never appeared&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SECURE_WINDOW&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Android blocked screenshots for the current window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wrong action&lt;/td&gt;
&lt;td&gt;The model chose a bad label or command&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stale observation&lt;/td&gt;
&lt;td&gt;The UI changed after the model saw it&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Good tooling should record which bucket each failure falls into.&lt;/p&gt;

&lt;p&gt;If everything becomes "click failed", the agent cannot recover intelligently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Save the UI dump before the action
&lt;/h2&gt;

&lt;p&gt;The UI dump is the agent's view of the world.&lt;/p&gt;

&lt;p&gt;Save it before each model decision:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs ui &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; run/0007-ui.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For LLM agents, a compact action table is usually better than full XML:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fill  EditText  "Email"     #email     540,540
fill  EditText  "Password"  #password  540,640  [password]
tap   Button    "Continue"  #continue  540,860
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a model picks the wrong action, this file tells you whether the model had a reasonable choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Save screenshots selectively
&lt;/h2&gt;

&lt;p&gt;Screenshots are valuable, but you do not need a full native PNG on every step.&lt;/p&gt;

&lt;p&gt;For most agent debugging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs see &lt;span class="nt"&gt;--size&lt;/span&gt; 768 run/0007-screen.jpg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use screenshots when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the UI dump has too little information&lt;/li&gt;
&lt;li&gt;the app renders custom controls&lt;/li&gt;
&lt;li&gt;visual layout matters&lt;/li&gt;
&lt;li&gt;a failure needs human review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use the text UI as the default. Use screenshots as evidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Record the model action separately
&lt;/h2&gt;

&lt;p&gt;Do not only save the final command.&lt;/p&gt;

&lt;p&gt;Save what the model actually emitted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"step"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model_action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tap &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Continue&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool_call"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"hs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tap"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Continue"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--visible"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--unique"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The login form is filled and Continue is visible."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters because the bug may be in translation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model chose the right label, but the tool call used the wrong selector.&lt;/li&gt;
&lt;li&gt;The model chose a coordinate when a label was available.&lt;/li&gt;
&lt;li&gt;The model ignored an ambiguity warning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep the model layer and tool layer separate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prefer structured errors
&lt;/h2&gt;

&lt;p&gt;Exit codes and error codes are better than stderr scraping.&lt;/p&gt;

&lt;p&gt;Handsets has common exit codes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0  ok
2  NOT_FOUND
3  TIMEOUT
4  AMBIGUOUS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In JSON mode, preserve the structured error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs &lt;span class="nt"&gt;--json&lt;/span&gt; tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt; &lt;span class="nt"&gt;--visible&lt;/span&gt; &lt;span class="nt"&gt;--unique&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then your agent can decide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;NOT_FOUND&lt;/code&gt;: dump UI again or scroll&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AMBIGUOUS&lt;/code&gt;: ask for a narrower selector&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TIMEOUT&lt;/code&gt;: capture screenshot and logs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SECURE_WINDOW&lt;/code&gt;: continue without screenshot&lt;/li&gt;
&lt;/ul&gt;
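
&lt;p&gt;The decision list above can be sketched as a small dispatch on the exit code. It assumes the exit codes listed in this post (0, 2, 3, 4). &lt;code&gt;recovery_for_exit_code&lt;/code&gt; and the hint strings are hypothetical names. No exit code is given here for &lt;code&gt;SECURE_WINDOW&lt;/code&gt;, so the sketch leaves it to the default branch.&lt;/p&gt;

```shell
# Map an hs exit code to a recovery hint.
# The hint strings are labels invented for this sketch, not hs output.
recovery_for_exit_code() {
  case "$1" in
    0) echo "continue" ;;
    2) echo "redump-or-scroll" ;;             # NOT_FOUND
    3) echo "capture-screenshot-and-logs" ;;  # TIMEOUT
    4) echo "narrow-selector" ;;              # AMBIGUOUS
    *) echo "stop-and-inspect" ;;             # any code not listed in this post
  esac
}

# Example use (commented out so the sketch does not touch a device):
# code=0
# hs tap "Continue" --visible --unique || code="$?"
# recovery_for_exit_code "$code"
```

&lt;p&gt;Keeping the mapping in one function means the agent loop and the trace files can share the same vocabulary for failures.&lt;/p&gt;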

&lt;h2&gt;
  
  
  Keep logs close to the failing step
&lt;/h2&gt;

&lt;p&gt;Android logs are noisy. A small tail near the failure is usually enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs logs &lt;span class="nt"&gt;--tail&lt;/span&gt; 200 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; run/0007-logcat.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pair logs with the UI dump and screenshot from the same step. Otherwise you end up with artifacts that are technically present but hard to correlate.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple artifact layout
&lt;/h2&gt;

&lt;p&gt;Use numbered files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;run/
  0001-ui.txt
  0001-action.json
  0001-result.json
  0002-ui.txt
  0002-screen.jpg
  0002-action.json
  0002-result.json
  0002-logcat.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not fancy. That is the point.&lt;/p&gt;

&lt;p&gt;Before building a dashboard, make the run inspectable with plain files.&lt;/p&gt;
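
&lt;p&gt;The numbered layout can be produced by one small helper. This is a sketch. &lt;code&gt;artifact_path&lt;/code&gt; is a hypothetical name, and the step number is assumed to be a plain integer without leading zeros.&lt;/p&gt;

```shell
# Build a zero-padded artifact path such as run/0007-ui.txt.
artifact_path() {
  # $1 = run directory, $2 = step number (plain integer), $3 = artifact name
  printf '%s/%04d-%s\n' "$1" "$2" "$3"
}

# Example use (commented out so the sketch does not touch a device):
# mkdir -p run
# hs ui > "$(artifact_path run 7 ui.txt)"
# hs see --size 768 "$(artifact_path run 7 screen.jpg)"
```

&lt;p&gt;Zero padding keeps the files in step order when the directory is listed or sorted.&lt;/p&gt;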

&lt;h2&gt;
  
  
  Replay is the next step
&lt;/h2&gt;

&lt;p&gt;Once you have traces, replay becomes possible.&lt;/p&gt;

&lt;p&gt;The useful replay is not pixel-perfect video. It is a timeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step 1: observed Sign in
Step 2: tapped Sign in
Step 3: filled Email
Step 4: filled Password
Step 5: tapped Continue
Step 6: timed out waiting for Dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For teams, this timeline becomes the product. It lets an engineer see whether the model, the tool, or the app caused the failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why are LLM Android agents hard to debug?
&lt;/h3&gt;

&lt;p&gt;Because failures can come from the model, the app, the Android UI state, the automation tool, or timing. A final screenshot does not tell you which layer failed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I save screenshots for every step?
&lt;/h3&gt;

&lt;p&gt;Not always. Save compact UI dumps for every step. Add screenshots for visual states, failures, and custom-rendered screens.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the most important artifact?
&lt;/h3&gt;

&lt;p&gt;The pre-action UI dump. It shows what the model saw when it chose the action.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does this help reliability?
&lt;/h3&gt;

&lt;p&gt;Structured traces let you build targeted recovery: scroll on &lt;code&gt;NOT_FOUND&lt;/code&gt;, narrow selectors on &lt;code&gt;AMBIGUOUS&lt;/code&gt;, capture logs on &lt;code&gt;TIMEOUT&lt;/code&gt;, and avoid retrying blindly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related guides
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/android-automation-for-llm-agents/" rel="noopener noreferrer"&gt;Android Automation for LLM Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/android-device-cloud-for-llm-agents/" rel="noopener noreferrer"&gt;Android Device Cloud for LLM Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/stop-wasting-tokens-on-android-automation/" rel="noopener noreferrer"&gt;Stop Wasting Tokens on Android Automation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Originally published at &lt;a href="https://handsets.dev/blog/debug-llm-android-automation-runs/" rel="noopener noreferrer"&gt;https://handsets.dev/blog/debug-llm-android-automation-runs/&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>android</category>
      <category>debugging</category>
      <category>automation</category>
      <category>ai</category>
    </item>
    <item>
      <title>Android Device Cloud for LLM Agents</title>
      <dc:creator>Elliot Gao</dc:creator>
      <pubDate>Tue, 26 May 2026 12:21:40 +0000</pubDate>
      <link>https://forem.com/elliotgao2/android-device-cloud-for-llm-agents-lh9</link>
      <guid>https://forem.com/elliotgao2/android-device-cloud-for-llm-agents-lh9</guid>
      <description>&lt;p&gt;Browser agents have a clear infrastructure model now.&lt;/p&gt;

&lt;p&gt;You create a browser session, give it to a model, watch the actions, collect logs, and tear it down when the run ends.&lt;/p&gt;

&lt;p&gt;Android agents need the same thing, but the device is harder.&lt;/p&gt;

&lt;p&gt;An Android session is not just a webpage. It has an OS, apps, permissions, push notifications, keyboards, secure screens, package state, and a UI tree that was not designed for language models.&lt;/p&gt;

&lt;p&gt;If you want reliable Android agents, you eventually need an Android device cloud built for agents, not just a generic mobile testing grid.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick answer
&lt;/h2&gt;

&lt;p&gt;An Android device cloud for LLM agents should provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ephemeral Android sessions.&lt;/li&gt;
&lt;li&gt;Fast &lt;code&gt;tap&lt;/code&gt;, &lt;code&gt;fill&lt;/code&gt;, &lt;code&gt;swipe&lt;/code&gt;, and &lt;code&gt;wait&lt;/code&gt; actions.&lt;/li&gt;
&lt;li&gt;Compact UI observations for prompts.&lt;/li&gt;
&lt;li&gt;Screenshots only when visual context matters.&lt;/li&gt;
&lt;li&gt;Logs, screenshots, and UI dumps for every step.&lt;/li&gt;
&lt;li&gt;Session recording and replay.&lt;/li&gt;
&lt;li&gt;Isolation between runs.&lt;/li&gt;
&lt;li&gt;A Python/API surface simple enough for agent loops.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The device is the runtime. The UI dump is the observation. The action API is the actuator.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why mobile agents need different infrastructure
&lt;/h2&gt;

&lt;p&gt;Traditional device clouds were built for test suites.&lt;/p&gt;

&lt;p&gt;The core workflow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Upload an app.&lt;/li&gt;
&lt;li&gt;Start a test.&lt;/li&gt;
&lt;li&gt;Run a framework such as Appium, Espresso, or XCTest.&lt;/li&gt;
&lt;li&gt;Collect a report.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That model is useful, but LLM agents behave differently.&lt;/p&gt;

&lt;p&gt;An agent may inspect the screen dozens or hundreds of times. It may need to retry actions, ask for a screenshot, inspect notifications, or recover from an unexpected permission dialog.&lt;/p&gt;

&lt;p&gt;The infrastructure has to support a tight loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;observe -&amp;gt; decide -&amp;gt; act -&amp;gt; wait -&amp;gt; observe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If each loop step is slow, verbose, or hard to debug, the agent becomes expensive and unreliable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observation: compact first, visual second
&lt;/h2&gt;

&lt;p&gt;Most Android agent loops start with a screenshot or UIAutomator XML.&lt;/p&gt;

&lt;p&gt;Both are useful. Neither should be the only observation.&lt;/p&gt;

&lt;p&gt;Screenshots are great for visual layout, but they are heavy. XML is structured, but it contains a lot of layout noise.&lt;/p&gt;

&lt;p&gt;For agents, a better default observation is an action table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fill  EditText  "Email"     #email     540,540
fill  EditText  "Password"  #password  540,640  [password]
tap   Button    "Continue"  #continue  540,860
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That tells the model what it can do. Add a screenshot when the text UI is not enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs ui
hs see &lt;span class="nt"&gt;--size&lt;/span&gt; 768 /tmp/screen.jpg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps the prompt smaller and the action easier to audit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Action: labels beat pixels
&lt;/h2&gt;

&lt;p&gt;A generic device cloud can expose raw taps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tap"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;540&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;860&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is sometimes necessary, but it is not the best default.&lt;/p&gt;

&lt;p&gt;For agent runs, label-based actions are easier to understand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tap"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Continue"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The run transcript is now readable. A human can see what the agent intended. A retry policy can distinguish &lt;code&gt;NOT_FOUND&lt;/code&gt; from &lt;code&gt;AMBIGUOUS&lt;/code&gt; from &lt;code&gt;TIMEOUT&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Debugging is the product
&lt;/h2&gt;

&lt;p&gt;The hard part of agent infrastructure is not only running the device. It is understanding why a run failed.&lt;/p&gt;

&lt;p&gt;A useful Android agent cloud should keep a timeline:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Data&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Observation&lt;/td&gt;
&lt;td&gt;UI table, screenshot, top activity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model output&lt;/td&gt;
&lt;td&gt;Intended action and reasoning if available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Action&lt;/td&gt;
&lt;td&gt;Tap/fill/swipe/wait payload&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Result&lt;/td&gt;
&lt;td&gt;Success or structured error&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Artifacts&lt;/td&gt;
&lt;td&gt;Logs, screenshot, UI dump&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When the agent fails, you should be able to replay the path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. saw "Sign in"
2. tapped "Sign in"
3. filled "Email"
4. filled "Password"
5. tapped "Continue"
6. timed out waiting for "Dashboard"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without that, every failure becomes a mystery screenshot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Isolation matters
&lt;/h2&gt;

&lt;p&gt;Android sessions carry state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Installed apps.&lt;/li&gt;
&lt;li&gt;Login sessions.&lt;/li&gt;
&lt;li&gt;Runtime permissions.&lt;/li&gt;
&lt;li&gt;Clipboard.&lt;/li&gt;
&lt;li&gt;Notifications.&lt;/li&gt;
&lt;li&gt;System settings.&lt;/li&gt;
&lt;li&gt;Cached data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An agent cloud has to decide how sessions are reset. Emulators are easier to snapshot and restore. Real devices are harder to reset but closer to production.&lt;/p&gt;

&lt;p&gt;For most agent experiments, emulator sessions are enough. For mobile RPA or app-store reality checks, real devices become important.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Handsets fits
&lt;/h2&gt;

&lt;p&gt;Handsets is not a full device cloud by itself.&lt;/p&gt;

&lt;p&gt;It is the control plane a cloud can build on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no-root device actions through ADB&lt;/li&gt;
&lt;li&gt;compact UI dumps&lt;/li&gt;
&lt;li&gt;label-based selectors&lt;/li&gt;
&lt;li&gt;screenshots and logs&lt;/li&gt;
&lt;li&gt;Python and subprocess integration&lt;/li&gt;
&lt;li&gt;a terminal UI for human debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The local loop looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs use
hs ui
hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Dashboard"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A hosted version would wrap that in session management, auth, billing, isolation, recording, and replay.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is an Android device cloud the same as Appium cloud testing?
&lt;/h3&gt;

&lt;p&gt;Not exactly. Appium clouds are usually optimized for test suites. An Android agent cloud needs lower-latency observations, compact prompt-friendly UI output, and better step-by-step replay for model-driven runs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do LLM agents need real Android devices?
&lt;/h3&gt;

&lt;p&gt;Sometimes. Emulators are enough for many app flows and experiments. Real devices matter when hardware behavior, OEM skins, push delivery, biometrics, or production-like behavior matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why not just use screenshots?
&lt;/h3&gt;

&lt;p&gt;Screenshots are useful, but they are expensive and ambiguous. A compact UI table gives the model actionable labels and controls. Use screenshots as an additional observation, not the only one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does this require root?
&lt;/h3&gt;

&lt;p&gt;No. A useful Android agent runtime can operate through ADB and the shell user for normal UI automation. Some protected screens and app-private data remain protected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related guides
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/android-automation-for-llm-agents/" rel="noopener noreferrer"&gt;Android Automation for LLM Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/stop-wasting-tokens-on-android-automation/" rel="noopener noreferrer"&gt;Stop Wasting Tokens on Android Automation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/android-terminal-ui/" rel="noopener noreferrer"&gt;A Terminal UI for Driving Android Apps&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Originally published at &lt;a href="https://handsets.dev/blog/android-device-cloud-for-llm-agents/" rel="noopener noreferrer"&gt;https://handsets.dev/blog/android-device-cloud-for-llm-agents/&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>android</category>
      <category>ai</category>
      <category>automation</category>
      <category>llm</category>
    </item>
    <item>
      <title>How to Automate Android Without Appium</title>
      <dc:creator>Elliot Gao</dc:creator>
      <pubDate>Mon, 25 May 2026 09:00:39 +0000</pubDate>
      <link>https://forem.com/elliotgao2/how-to-automate-android-without-appium-4fn3</link>
      <guid>https://forem.com/elliotgao2/how-to-automate-android-without-appium-4fn3</guid>
      <description>&lt;p&gt;You do not need Appium for every Android automation task.&lt;/p&gt;

&lt;p&gt;Appium is the right tool when you need a full WebDriver-based mobile testing framework. But many Android workflows are smaller than that. You may only need to open an app, tap visible buttons, type into fields, wait for a result, and collect a screenshot on failure.&lt;/p&gt;

&lt;p&gt;For those jobs, a CLI can be enough.&lt;/p&gt;

&lt;p&gt;Handsets lets you automate Android from the terminal without root and without installing a visible helper app on the phone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick answer
&lt;/h2&gt;

&lt;p&gt;The fastest way to automate Android without Appium is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs use
hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Email"&lt;/span&gt; &lt;span class="s2"&gt;"you@example.com"&lt;/span&gt;
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Dashboard"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you the core automation loop: connect to the device, act on visible UI labels, and wait for the next state. You still use normal Android debugging access. You do not need root, WebDriver, or an Appium server.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you need
&lt;/h2&gt;

&lt;p&gt;You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An Android phone or emulator.&lt;/li&gt;
&lt;li&gt;USB debugging enabled.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;adb&lt;/code&gt; on your path.&lt;/li&gt;
&lt;li&gt;Handsets installed on your host machine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/elliotgao2/handsets/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Connect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs use
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can control the device with commands.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tap by text
&lt;/h2&gt;

&lt;p&gt;Raw &lt;code&gt;adb&lt;/code&gt; can tap coordinates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;adb shell input tap 540 860
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is brittle. It depends on screen size, density, orientation, and layout.&lt;/p&gt;

&lt;p&gt;With Handsets, tap the visible label:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;--visible&lt;/code&gt; and &lt;code&gt;--unique&lt;/code&gt; when a script should fail rather than guess:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt; &lt;span class="nt"&gt;--visible&lt;/span&gt; &lt;span class="nt"&gt;--unique&lt;/span&gt; &lt;span class="nt"&gt;--timeout&lt;/span&gt; 5s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the main difference from raw &lt;code&gt;adb shell input tap&lt;/code&gt;. The script says what it means. It does not encode where a button happened to be on one device.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fill fields
&lt;/h2&gt;

&lt;p&gt;Use &lt;code&gt;fill&lt;/code&gt; for text fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs fill &lt;span class="s2"&gt;"Email"&lt;/span&gt; &lt;span class="s2"&gt;"you@example.com"&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Password"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
hs tap &lt;span class="s2"&gt;"Sign in"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If labels are repeated, use selectors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs fill &lt;span class="s1"&gt;'EditText:below(TextView[text=Email])'&lt;/span&gt; &lt;span class="s2"&gt;"you@example.com"&lt;/span&gt;
hs fill &lt;span class="s1"&gt;'EditText:below(TextView[text=Password])'&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The selector syntax is CSS-like and built for real Android UI trees.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wait for the next screen
&lt;/h2&gt;

&lt;p&gt;Do not write sleeps unless you truly need a fixed delay.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Dashboard"&lt;/span&gt; &lt;span class="nt"&gt;--timeout&lt;/span&gt; 15s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes the script faster on fast devices and more reliable on slow devices.&lt;/p&gt;
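
&lt;p&gt;The reason is easier to see in code. The sketch below is a generic polling loop in Python, not how &lt;code&gt;hs wait&lt;/code&gt; is implemented. The &lt;code&gt;wait_for&lt;/code&gt; name and its parameters are assumptions for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time


def wait_for(check, timeout=15.0, interval=0.25, sleep=time.sleep):
    """Poll check() until it returns True or the poll budget is spent.

    Returns the number of polls used, or raises TimeoutError.
    The budget is counted in polls, so time spent inside check() is ignored.
    """
    polls = max(1, round(timeout / interval))
    for used in range(1, polls + 1):
        if check():
            # Return as soon as the condition holds: no wasted delay.
            return used
        sleep(interval)
    raise TimeoutError(f"condition not met after {timeout}s")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A fixed sleep always pays the full delay. A polling wait returns on the first check that succeeds and only pays the full timeout when something is actually wrong.&lt;/p&gt;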

&lt;h2&gt;
  
  
  A complete script
&lt;/h2&gt;

&lt;p&gt;Here is a no-Appium Android login script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="c"&gt;# Connect to the device, open the app, and wait for the sign-in screen.&lt;/span&gt;
hs use
hs go com.example.app
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Sign in"&lt;/span&gt; &lt;span class="nt"&gt;--timeout&lt;/span&gt; 15s

&lt;span class="c"&gt;# Credentials come from the environment; set -u aborts if either is unset.&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Email"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$APP_EMAIL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Password"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$APP_PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="c"&gt;# Fail instead of guessing if "Continue" is hidden or matches more than once.&lt;/span&gt;
hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt; &lt;span class="nt"&gt;--visible&lt;/span&gt; &lt;span class="nt"&gt;--unique&lt;/span&gt;
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Dashboard"&lt;/span&gt; &lt;span class="nt"&gt;--timeout&lt;/span&gt; 20s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add failure artifacts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;ARTIFACTS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/tmp/android-run-&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ARTIFACTS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;trap&lt;/span&gt; &lt;span class="s1"&gt;'hs ui &amp;gt; "$ARTIFACTS/ui.txt"; hs see --size 768 "$ARTIFACTS/screen.jpg"; hs logs --tail 200 &amp;gt; "$ARTIFACTS/logs.txt"; echo "$ARTIFACTS"'&lt;/span&gt; ERR
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the script leaves behind the UI, screenshot, and logs when something breaks.&lt;/p&gt;
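
&lt;p&gt;If the workflow is driven from Python instead of bash, the same capture can be sketched with &lt;code&gt;subprocess&lt;/code&gt;. It shells out to the three commands used in the trap above. The &lt;code&gt;capture_artifacts&lt;/code&gt; name and the injectable &lt;code&gt;run&lt;/code&gt; parameter are assumptions for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess
import tempfile
from pathlib import Path


def capture_artifacts(run=subprocess.run, root=None):
    """Save the UI dump, a screenshot, and recent logs; return the directory.

    No check=True here: a failing capture should not mask the original error.
    """
    out = Path(root or tempfile.mkdtemp(prefix="android-run-"))
    out.mkdir(parents=True, exist_ok=True)
    # hs ui prints the current screen to stdout.
    ui = run(["hs", "ui"], capture_output=True, text=True)
    (out / "ui.txt").write_text(ui.stdout)
    # hs see writes the screenshot to the given path.
    run(["hs", "see", "--size", "768", str(out / "screen.jpg")])
    logs = run(["hs", "logs", "--tail", "200"], capture_output=True, text=True)
    (out / "logs.txt").write_text(logs.stdout)
    return out
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Call it from an &lt;code&gt;except&lt;/code&gt; block around the workflow and print the returned path, the same way the trap echoes &lt;code&gt;$ARTIFACTS&lt;/code&gt;.&lt;/p&gt;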

&lt;h2&gt;
  
  
  Can I just use adb?
&lt;/h2&gt;

&lt;p&gt;Sometimes, yes.&lt;/p&gt;

&lt;p&gt;Raw &lt;code&gt;adb&lt;/code&gt; is great for low-level device commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;adb shell am start &lt;span class="nt"&gt;-n&lt;/span&gt; com.example/.MainActivity
adb shell input keyevent BACK
adb shell wm size
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It becomes awkward when you need semantic UI automation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tap the button labeled "Continue".&lt;/li&gt;
&lt;li&gt;Fill the password field below "Password".&lt;/li&gt;
&lt;li&gt;Wait until "Dashboard" appears.&lt;/li&gt;
&lt;li&gt;Fail if there are two matching buttons.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is where a higher-level tool helps. Handsets still uses &lt;code&gt;adb&lt;/code&gt; underneath, but it gives you label-based actions and structured failure modes.&lt;/p&gt;
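
&lt;p&gt;The last bullet is the part coordinates cannot express. A rough Python sketch of what "fail if there are two matching buttons" means. The row shape, the &lt;code&gt;resolve&lt;/code&gt; function, and the error names are assumptions for illustration, not Handsets internals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def resolve(label, rows, unique=True):
    """Return the row whose label matches, or raise a structured error.

    rows is a list of dicts with a "label" key (hypothetical shape).
    """
    matches = [row for row in rows if row["label"] == label]
    if not matches:
        raise LookupError(f"NOT_FOUND: {label!r}")
    if unique and len(matches) != 1:
        # Refuse to pick one of several candidates on the caller's behalf.
        raise LookupError(f"AMBIGUOUS: {label!r} matched {len(matches)} elements")
    return matches[0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A coordinate tap never raises either error. It just lands somewhere.&lt;/p&gt;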

&lt;h2&gt;
  
  
  When Appium is still better
&lt;/h2&gt;

&lt;p&gt;Use Appium if you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;iOS support.&lt;/li&gt;
&lt;li&gt;WebDriver compatibility.&lt;/li&gt;
&lt;li&gt;A full test framework.&lt;/li&gt;
&lt;li&gt;Cloud device farm integrations.&lt;/li&gt;
&lt;li&gt;Rich reports and recorders.&lt;/li&gt;
&lt;li&gt;A large QA ecosystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are real strengths.&lt;/p&gt;

&lt;p&gt;But if your goal is Android-only CLI automation, Appium may be more stack than you need.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for LLM agents
&lt;/h2&gt;

&lt;p&gt;LLM-driven Android automation benefits from a small text interface.&lt;/p&gt;

&lt;p&gt;Handsets can print the current screen as an action table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fill  EditText  "Email"     #email     540,540
fill  EditText  "Password"  #password  540,640  [password]
tap   Button    "Continue"  #continue  540,860
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is easier for a model to consume than a full XML tree. It also reduces prompt size for long trajectories.&lt;/p&gt;
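
&lt;p&gt;The table is also easy to parse when a script needs structured data. A small Python sketch follows. The column meanings are inferred from the sample output above, not from a documented format, and &lt;code&gt;parse_row&lt;/code&gt; is a hypothetical helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import shlex


def parse_row(line):
    """Split one action-table line into named fields (assumed column order)."""
    # shlex keeps quoted labels such as "Sign in" together as one token.
    verb, widget, label, ref, center, *flags = shlex.split(line)
    x, y = (int(part) for part in center.split(","))
    return {
        "verb": verb,
        "class": widget,
        "label": label,
        "id": ref.lstrip("#"),
        "center": (x, y),
        "flags": [flag.strip("[]") for flag in flags],
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each row becomes one small dict, which is about as much structure as an agent loop needs to choose its next action.&lt;/p&gt;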

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;You can automate Android without Appium if your workflow is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Android only.&lt;/li&gt;
&lt;li&gt;CLI-first.&lt;/li&gt;
&lt;li&gt;Label-based.&lt;/li&gt;
&lt;li&gt;No-root.&lt;/li&gt;
&lt;li&gt;Scripted from shell or Python.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs use
hs ui
hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Welcome"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That covers more Android automation work than you might expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Do I need Appium to automate Android?
&lt;/h3&gt;

&lt;p&gt;No. Appium is useful for full mobile test frameworks, especially cross-platform suites, but Android can also be automated from the command line with &lt;code&gt;adb&lt;/code&gt; and tools like Handsets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I automate Android without root?
&lt;/h3&gt;

&lt;p&gt;Yes. For normal UI automation, root is not required. You can tap, type, swipe, inspect visible UI, wait for text, and capture screenshots when the current app allows it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is this better than Appium?
&lt;/h3&gt;

&lt;p&gt;It depends. Handsets is better for Android-only CLI scripts, LLM agents, and fast tap-heavy flows. Appium is better for cross-platform QA infrastructure and WebDriver-based test suites.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I run this in CI?
&lt;/h3&gt;

&lt;p&gt;Yes, as long as your CI runner can access an Android emulator or connected device with &lt;code&gt;adb&lt;/code&gt;. The commands are shell-friendly and return normal exit codes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related guides
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/fast-android-ui-automation-with-adb/" rel="noopener noreferrer"&gt;Fast Android UI Automation with ADB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/tap-android-buttons-by-text-command-line/" rel="noopener noreferrer"&gt;How to Tap Android Buttons by Text from the Command Line&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/handsets-vs-appium/" rel="noopener noreferrer"&gt;Handsets vs Appium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/best-appium-alternative-for-android-automation/" rel="noopener noreferrer"&gt;Best Appium Alternative for Android Automation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/how-to-automate-android-apps-without-root/" rel="noopener noreferrer"&gt;How to Automate Android Apps Without Root&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Originally published at &lt;a href="https://handsets.dev/blog/how-to-automate-android-without-appium/" rel="noopener noreferrer"&gt;https://handsets.dev/blog/how-to-automate-android-without-appium/&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>android</category>
      <category>automation</category>
      <category>appium</category>
      <category>testing</category>
    </item>
    <item>
      <title>Handsets vs Appium: Which Android Automation Tool Should You Use?</title>
      <dc:creator>Elliot Gao</dc:creator>
      <pubDate>Mon, 25 May 2026 08:55:10 +0000</pubDate>
      <link>https://forem.com/elliotgao2/handsets-vs-appium-which-android-automation-tool-should-you-use-2ddd</link>
      <guid>https://forem.com/elliotgao2/handsets-vs-appium-which-android-automation-tool-should-you-use-2ddd</guid>
      <description>&lt;p&gt;Appium is the default answer for mobile automation.&lt;/p&gt;

&lt;p&gt;It is mature, cross-platform, WebDriver-compatible, and supported by a large ecosystem. If a QA team needs one framework for Android and iOS, reports, Selenium-style infrastructure, and cloud device farms, Appium is usually the right place to start.&lt;/p&gt;

&lt;p&gt;Handsets solves a smaller problem.&lt;/p&gt;

&lt;p&gt;It is an Android-only CLI for driving phones from shell scripts, Python, or LLM agents. It does not try to be a test-management platform. It tries to make &lt;code&gt;tap&lt;/code&gt;, &lt;code&gt;fill&lt;/code&gt;, &lt;code&gt;wait&lt;/code&gt;, screenshots, and UI inspection fast enough that the automation layer disappears from the critical path.&lt;/p&gt;

&lt;p&gt;The short version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;Appium&lt;/strong&gt; when you need a full cross-platform mobile test framework.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Handsets&lt;/strong&gt; when you need fast Android UI control from the command line, especially for tap-heavy scripts and LLM agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you searched for "Handsets vs Appium" or "Appium alternative for Android automation", the practical answer is this: Appium is the safer default for broad QA infrastructure, while Handsets is the sharper tool for Android-only automation where speed, scripting, and prompt size matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best answer by use case
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Better choice&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cross-platform Android + iOS test suite&lt;/td&gt;
&lt;td&gt;Appium&lt;/td&gt;
&lt;td&gt;One WebDriver-style framework for both platforms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Android-only shell automation&lt;/td&gt;
&lt;td&gt;Handsets&lt;/td&gt;
&lt;td&gt;Small CLI, no server ceremony, easy CI scripts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM-driven Android agent&lt;/td&gt;
&lt;td&gt;Handsets&lt;/td&gt;
&lt;td&gt;Compact UI table and low per-action latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise device farm with reports&lt;/td&gt;
&lt;td&gt;Appium&lt;/td&gt;
&lt;td&gt;Larger ecosystem and reporting integrations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tap-heavy RPA workflow&lt;/td&gt;
&lt;td&gt;Handsets&lt;/td&gt;
&lt;td&gt;Warm daemon path keeps repeated calls cheap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Existing Selenium/WebDriver team&lt;/td&gt;
&lt;td&gt;Appium&lt;/td&gt;
&lt;td&gt;Familiar mental model and tooling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That table is the whole comparison in one place. The rest of this post explains the tradeoffs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Need&lt;/th&gt;
&lt;th&gt;Appium&lt;/th&gt;
&lt;th&gt;Handsets&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Android support&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iOS support&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Protocol&lt;/td&gt;
&lt;td&gt;WebDriver / HTTP&lt;/td&gt;
&lt;td&gt;Length-prefixed frames over &lt;code&gt;adb forward&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Install on device&lt;/td&gt;
&lt;td&gt;Driver/helper APKs&lt;/td&gt;
&lt;td&gt;One small jar, no visible app&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Root required&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tap by visible text&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI-first workflow&lt;/td&gt;
&lt;td&gt;Not really&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM-friendly UI dump&lt;/td&gt;
&lt;td&gt;No, usually XML/page source&lt;/td&gt;
&lt;td&gt;Yes, compact action table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typical tap latency&lt;/td&gt;
&lt;td&gt;100-500 ms&lt;/td&gt;
&lt;td&gt;2-7 ms after daemon warmup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best fit&lt;/td&gt;
&lt;td&gt;QA infrastructure&lt;/td&gt;
&lt;td&gt;Scripts, agents, fast Android control&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Appium is broader. Handsets is narrower and faster.&lt;/p&gt;

&lt;p&gt;That is the tradeoff.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup difference
&lt;/h2&gt;

&lt;p&gt;An Appium setup usually has several moving parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Node.js.&lt;/li&gt;
&lt;li&gt;Install Appium.&lt;/li&gt;
&lt;li&gt;Install the Android driver.&lt;/li&gt;
&lt;li&gt;Start the Appium server.&lt;/li&gt;
&lt;li&gt;Configure desired capabilities.&lt;/li&gt;
&lt;li&gt;Connect a client library.&lt;/li&gt;
&lt;li&gt;Run a test session.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is normal for a full framework. It is also more machinery than you want for a small script.&lt;/p&gt;

&lt;p&gt;Handsets starts from the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/elliotgao2/handsets/main/install.sh | bash
hs use
hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The device side is a small jar started through &lt;code&gt;app_process&lt;/code&gt; as the Android shell user. There is no root step and no visible app to install.&lt;/p&gt;

&lt;h2&gt;
  
  
  API difference
&lt;/h2&gt;

&lt;p&gt;An Appium test usually looks like WebDriver:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;el&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_element&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xpath&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;//*[@text=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Continue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Handsets keeps the same action as a CLI verb:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or from Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;handsets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Session&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Continue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;visible&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Welcome&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference is not just syntax. It changes how easy it is to compose automation from shell scripts, CI jobs, and LLM tool calls.&lt;/p&gt;
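
&lt;p&gt;For example, an LLM tool call can be turned into a command line with a few lines of Python. This is a sketch under assumptions: the dict shape (&lt;code&gt;type&lt;/code&gt;, &lt;code&gt;text&lt;/code&gt;, &lt;code&gt;value&lt;/code&gt;, &lt;code&gt;timeout&lt;/code&gt;) and the &lt;code&gt;to_argv&lt;/code&gt; helper are hypothetical, and &lt;code&gt;--timeout&lt;/code&gt; is the flag shown in the author's other Handsets posts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def to_argv(action):
    """Translate a tool-call dict into an hs argument list (assumed schema)."""
    kind = action["type"]
    if kind == "tap":
        return ["hs", "tap", action["text"]]
    if kind == "fill":
        return ["hs", "fill", action["text"], action["value"]]
    if kind == "wait":
        argv = ["hs", "wait", action["text"]]
        if "timeout" in action:
            argv += ["--timeout", action["timeout"]]
        return argv
    # Reject anything the tool surface does not name explicitly.
    raise ValueError(f"unsupported action type: {kind!r}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing the result to &lt;code&gt;subprocess.run(to_argv(action), check=True)&lt;/code&gt; as a list keeps model-written text out of the shell, so labels with quotes or spaces need no escaping.&lt;/p&gt;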

&lt;h2&gt;
  
  
  Performance difference
&lt;/h2&gt;

&lt;p&gt;Appium's architecture is designed around WebDriver. That buys compatibility and ecosystem support, but every action passes through an HTTP session layer.&lt;/p&gt;

&lt;p&gt;For normal test suites, that overhead is often fine. A test that waits for screens, network calls, animations, and assertions will not notice an extra 100 ms per action.&lt;/p&gt;

&lt;p&gt;For tap-heavy workflows, it matters.&lt;/p&gt;

&lt;p&gt;In Handsets benchmarks, a warm &lt;code&gt;tap("Continue")&lt;/code&gt; including text lookup runs in roughly &lt;strong&gt;2-7 ms&lt;/strong&gt;. Appium calls commonly land around &lt;strong&gt;100-500 ms&lt;/strong&gt; depending on the device, driver, and session state.&lt;/p&gt;

&lt;p&gt;That difference matters when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An LLM agent takes many small actions.&lt;/li&gt;
&lt;li&gt;A script taps through hundreds of rows.&lt;/li&gt;
&lt;li&gt;A mobile RPA flow spends most of its time in UI actions.&lt;/li&gt;
&lt;li&gt;You want fast failure feedback in a CLI loop.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It matters less when your test spends most of its time waiting on network requests, animations, or backend state. In those suites, Appium's overhead may be a small part of total runtime.&lt;/p&gt;
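
&lt;p&gt;A back-of-the-envelope calculation makes the gap concrete. It uses the latency ranges quoted above. The run size of 300 actions is an arbitrary example, and the helper name is made up for this sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def ui_overhead_seconds(actions, per_action_ms):
    """Total time a run spends inside the automation layer."""
    return actions * per_action_ms / 1000


# Latency ranges quoted in this post; 300 actions is an arbitrary run size.
ACTIONS = 300
handsets = [ui_overhead_seconds(ACTIONS, ms) for ms in (2, 7)]
appium = [ui_overhead_seconds(ACTIONS, ms) for ms in (100, 500)]

print(f"Handsets: {handsets[0]:.1f}-{handsets[1]:.1f} s")
print(f"Appium:   {appium[0]:.0f}-{appium[1]:.0f} s")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For 300 actions that is about 0.6-2.1 seconds of automation overhead against 30-150 seconds. If the same run also waits several minutes on the network, either number disappears into the total.&lt;/p&gt;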

&lt;h2&gt;
  
  
  UI dump difference
&lt;/h2&gt;

&lt;p&gt;Appium usually exposes the Android UI tree as page source. That is useful for tools, but verbose for LLM agents.&lt;/p&gt;

&lt;p&gt;Handsets has a compact UI table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fill  EditText  "Email"     #email     540,540
fill  EditText  "Password"  #password  540,640  [password]
tap   Button    "Continue"  #continue  540,860
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For one Settings screen, a UIAutomator XML dump measured &lt;strong&gt;5,762 tokens&lt;/strong&gt;. The compact Handsets table measured &lt;strong&gt;729 tokens&lt;/strong&gt;, roughly one eighth of the size. The model still gets the labels and actions it needs.&lt;/p&gt;

&lt;p&gt;That matters if your Android automation is driven by an LLM.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Appium is better
&lt;/h2&gt;

&lt;p&gt;Choose Appium if you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Android and iOS in one framework.&lt;/li&gt;
&lt;li&gt;WebDriver compatibility.&lt;/li&gt;
&lt;li&gt;Cloud device farm integrations.&lt;/li&gt;
&lt;li&gt;Recorders and reporting.&lt;/li&gt;
&lt;li&gt;A mature QA ecosystem.&lt;/li&gt;
&lt;li&gt;Team workflows built around Selenium-style tests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Appium is not slow because it is bad. It is slower because it solves a bigger problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Handsets is better
&lt;/h2&gt;

&lt;p&gt;Choose Handsets if you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast Android-only automation.&lt;/li&gt;
&lt;li&gt;Shell-first commands.&lt;/li&gt;
&lt;li&gt;No-root device control.&lt;/li&gt;
&lt;li&gt;Label-based tapping without coordinate scripts.&lt;/li&gt;
&lt;li&gt;A small tool surface for LLM agents.&lt;/li&gt;
&lt;li&gt;Python or subprocess integration without a WebDriver server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core loop is small:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs use
hs ui
hs tap &lt;span class="s2"&gt;"Sign in"&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Email"&lt;/span&gt; &lt;span class="s2"&gt;"you@example.com"&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Password"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Dashboard"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the lane Handsets is built for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommendation
&lt;/h2&gt;

&lt;p&gt;If you are building a company-wide mobile QA platform, start with Appium.&lt;/p&gt;

&lt;p&gt;If you are building Android-only scripts, LLM agents, CLI automation, RPA flows, or fast smoke checks, Handsets is worth trying first.&lt;/p&gt;

&lt;p&gt;The tools are not enemies. They are optimized for different jobs.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Handsets a full Appium replacement?
&lt;/h3&gt;

&lt;p&gt;No. Handsets is Android-only and CLI-first. It does not replace Appium for iOS, WebDriver infrastructure, cloud device farms, or report-heavy QA platforms.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Handsets faster than Appium?
&lt;/h3&gt;

&lt;p&gt;For small Android UI actions, yes. A warm Handsets tap, including the text lookup, typically takes &lt;strong&gt;2-7 ms&lt;/strong&gt;, while Appium actions commonly land around &lt;strong&gt;100-500 ms&lt;/strong&gt; depending on setup and device state.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Handsets require root?
&lt;/h3&gt;

&lt;p&gt;No. Handsets runs through &lt;code&gt;adb&lt;/code&gt; and a small device-side daemon under the Android shell user. The phone does not need to be rooted.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use Handsets from Python?
&lt;/h3&gt;

&lt;p&gt;Yes. You can use the Python package with &lt;code&gt;from handsets import Session&lt;/code&gt;, or call &lt;code&gt;hs --json&lt;/code&gt; from any language that can run a subprocess.&lt;/p&gt;
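
&lt;p&gt;A minimal sketch of the subprocess route, assuming &lt;code&gt;hs&lt;/code&gt; is on your &lt;code&gt;PATH&lt;/code&gt;. The helper name &lt;code&gt;run_hs&lt;/code&gt;, the placement of &lt;code&gt;--json&lt;/code&gt; before the subcommand, and the idea that the output is a single JSON document are assumptions made for illustration, not documented Handsets behavior:&lt;/p&gt;

```python
import json
import subprocess


def run_hs(args, runner=subprocess.run):
    """Run `hs --json` followed by args and return the decoded JSON output.

    `runner` is injectable so the helper can be exercised without a device.
    The flag placement and the JSON shape are assumptions, not a documented API.
    """
    cmd = ["hs", "--json", *args]
    # check=True raises CalledProcessError when the command exits non-zero.
    result = runner(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

&lt;p&gt;With a device attached, &lt;code&gt;run_hs(["ui"])&lt;/code&gt; would be the equivalent of the first step in the core loop. Passing a fake &lt;code&gt;runner&lt;/code&gt; lets you unit-test the calling code without a phone.&lt;/p&gt;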

&lt;h3&gt;
  
  
  Which tool should I choose for LLM agents?
&lt;/h3&gt;

&lt;p&gt;For Android-only LLM agents, Handsets is usually the better fit because it can provide a compact action table instead of a large XML tree, and because each action has low overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related guides
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/best-appium-alternative-for-android-automation/" rel="noopener noreferrer"&gt;Best Appium Alternative for Android Automation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/how-to-automate-android-without-appium/" rel="noopener noreferrer"&gt;How to Automate Android Without Appium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/tapping-android-in-5ms-vs-appium-uiautomator2/" rel="noopener noreferrer"&gt;Tapping Android in 5 ms&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Originally published at &lt;a href="https://handsets.dev/blog/handsets-vs-appium/" rel="noopener noreferrer"&gt;https://handsets.dev/blog/handsets-vs-appium/&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>android</category>
      <category>testing</category>
      <category>automation</category>
      <category>appium</category>
    </item>
    <item>
      <title>A Terminal UI for Driving Android Apps</title>
      <dc:creator>Elliot Gao</dc:creator>
      <pubDate>Mon, 25 May 2026 08:55:09 +0000</pubDate>
      <link>https://forem.com/elliotgao2/a-terminal-ui-for-driving-android-apps-1g0d</link>
      <guid>https://forem.com/elliotgao2/a-terminal-ui-for-driving-android-apps-1g0d</guid>
      <description>&lt;p&gt;Most Android automation tools make you choose between two awkward modes.&lt;/p&gt;

&lt;p&gt;You can write scripts, which are repeatable, but working out what to put in them is slow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs ui
hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Email"&lt;/span&gt; &lt;span class="s2"&gt;"you@example.com"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or you can use a visual tool, which is easier to explore but often separate from the thing you later automate.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;hs tui&lt;/code&gt; is the missing middle: a terminal UI that lets you drive an Android app from the keyboard while showing the same action rows you would put in a script.&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://handsets.dev/assets/tui.gif?v=20260525" rel="noopener noreferrer"&gt;
    &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fhandsets.dev%2Fassets%2Ftui.gif%3Fv%3D20260525" alt="hs tui driving Android from the terminal" width="600" height="413"&gt;
  &lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;It is not a remote desktop. It is not a recorder. It is a live, keyboard-driven inspector for Android's interactive UI.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs use
hs tui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The TUI opens in your terminal and shows the current interactive elements on the device:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fill  EditText  "Email"     #email     540,540
fill  EditText  "Password"  #password  540,640  [password]
tap   Button    "Continue"  #continue  540,860
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Move with the keyboard. Press Enter to act. If the selected row is a button, it taps. If it is an input field, it opens a small text modal and fills the field.&lt;/p&gt;

&lt;p&gt;The useful part is that the TUI speaks the same vocabulary as the CLI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tap Button "Continue" #continue&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fill EditText "Email" #email&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;swipe up&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;back&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the flow works by hand in &lt;code&gt;hs tui&lt;/code&gt;, you already know what the script should look like.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a terminal UI?
&lt;/h2&gt;

&lt;p&gt;Android automation has a discovery problem.&lt;/p&gt;

&lt;p&gt;When a script fails, you often ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What does the device see right now?&lt;/li&gt;
&lt;li&gt;What text is actually exposed?&lt;/li&gt;
&lt;li&gt;Is this button clickable?&lt;/li&gt;
&lt;li&gt;Is there more than one matching "Continue"?&lt;/li&gt;
&lt;li&gt;Did the screen change after the tap?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The usual answer is to bounce between commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs ui
hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
hs ui
hs see screen.jpg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That works, but it has friction. You are copying labels out of one command and pasting them into another.&lt;/p&gt;

&lt;p&gt;The terminal UI removes that loop. It keeps the UI list on screen and lets you act on the highlighted row.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keyboard model
&lt;/h2&gt;

&lt;p&gt;The controls are intentionally boring:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;↑&lt;/code&gt; / &lt;code&gt;↓&lt;/code&gt; or &lt;code&gt;j&lt;/code&gt; / &lt;code&gt;k&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Move through interactive elements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Enter&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tap or fill the selected element&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;PgDn&lt;/code&gt; / &lt;code&gt;PgUp&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Swipe the device&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Shift+J&lt;/code&gt; / &lt;code&gt;Shift+K&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Swipe faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;←&lt;/code&gt; / &lt;code&gt;Esc&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Android back&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;q&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Quit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is enough for a surprising amount of app navigation.&lt;/p&gt;

&lt;p&gt;The point is not to replace touch. The point is to make exploratory Android automation feel like using a terminal tool instead of a mouse, a screenshot viewer, and a pile of copy-paste.&lt;/p&gt;

&lt;h2&gt;
  
  
  Live UI, not stale dumps
&lt;/h2&gt;

&lt;p&gt;The TUI watches the device state in the background. It polls the accessibility tree and refreshes the list as the app changes.&lt;/p&gt;

&lt;p&gt;That matters because Android screens are not static:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keyboards appear and disappear.&lt;/li&gt;
&lt;li&gt;Lists scroll.&lt;/li&gt;
&lt;li&gt;Buttons enable after validation.&lt;/li&gt;
&lt;li&gt;Loading states replace content.&lt;/li&gt;
&lt;li&gt;Animations keep the app from becoming "idle".&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional scripts often wait for idle, dump the tree, act, and repeat. That is safe, but it makes exploration feel choppy.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;hs tui&lt;/code&gt; keeps the display live so you can tap, type, swipe, and watch the list update.&lt;/p&gt;
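
&lt;p&gt;One way to picture the refresh step is to compare each new snapshot of rows with the previous one and redraw only when something changed. This is an illustrative sketch, not the &lt;code&gt;hs tui&lt;/code&gt; implementation; &lt;code&gt;diff_rows&lt;/code&gt; is a hypothetical helper and the snapshots are hand-written sample rows:&lt;/p&gt;

```python
def diff_rows(previous, current):
    """Compare two snapshots of action rows and report what changed.

    Returns (added, removed), each in the order the rows appeared.
    """
    before, after = set(previous), set(current)
    added = [row for row in current if row not in before]
    removed = [row for row in previous if row not in after]
    return added, removed


# Hand-written sample snapshots: the button appears once validation passes.
loading = ['fill  EditText  "Email"     #email     540,540']
ready = loading + ['tap   Button    "Continue"  #continue  540,860']

added, removed = diff_rows(loading, ready)
print("added:", added)
print("removed:", removed)
```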

&lt;h2&gt;
  
  
  The rows are the API
&lt;/h2&gt;

&lt;p&gt;The TUI is built on the same compact UI model as &lt;code&gt;hs ui&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Each row is an action-shaped description:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tap   Button    "Continue"  #continue  540,860
fill  EditText  "Email"     #email     540,540
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That format is doing two jobs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It is readable enough for a human in a terminal.&lt;/li&gt;
&lt;li&gt;It is structured enough to turn into automation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the main design choice. The TUI does not show the full XML tree because that is not what you act on. It shows the controls you can use.&lt;/p&gt;
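
&lt;p&gt;To show what "structured enough" can mean in practice, here is a small parser for rows shaped like the samples above. It assumes every row has the five leading columns seen in those samples (action, widget, quoted label, &lt;code&gt;#id&lt;/code&gt;, tap point); real output may contain rows that do not, and &lt;code&gt;Row&lt;/code&gt; and &lt;code&gt;parse_row&lt;/code&gt; are names invented for this sketch:&lt;/p&gt;

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Row:
    action: str
    widget: str
    label: str
    ref: str
    x: int
    y: int
    flags: list = field(default_factory=list)


def parse_row(line):
    """Split one action-table row into its fields."""
    # shlex keeps the quoted label together and strips the quotes.
    action, widget, label, ref, point, *rest = shlex.split(line)
    x, y = (int(n) for n in point.split(","))
    # Trailing markers such as [password] are kept without the brackets.
    flags = [token.strip("[]") for token in rest]
    return Row(action, widget, label, ref, x, y, flags)


row = parse_row('fill  EditText  "Password"  #password  540,640  [password]')
print(row.action, row.label, row.ref, (row.x, row.y), row.flags)
```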

&lt;h2&gt;
  
  
  Why this helps scripting
&lt;/h2&gt;

&lt;p&gt;Many automation flows start with exploration:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the app.&lt;/li&gt;
&lt;li&gt;Find the sign-in path.&lt;/li&gt;
&lt;li&gt;Learn the labels.&lt;/li&gt;
&lt;li&gt;Discover which waits are needed.&lt;/li&gt;
&lt;li&gt;Turn that into a script.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without a TUI, you do that with repeated dumps and screenshots.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;hs tui&lt;/code&gt;, you can walk the app once from the keyboard, then write the script using the labels you saw:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs use
hs tap &lt;span class="s2"&gt;"Sign in"&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Email"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$APP_EMAIL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Password"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$APP_PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Dashboard"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The manual path and the scripted path share the same model.&lt;/p&gt;
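
&lt;p&gt;If you keep notes while walking the app, turning them into a script is mostly string assembly. The sketch below builds command lines using only the &lt;code&gt;hs&lt;/code&gt; subcommands shown in this post; &lt;code&gt;to_command&lt;/code&gt; and the &lt;code&gt;steps&lt;/code&gt; list are hypothetical, and nothing here talks to a device:&lt;/p&gt;

```python
import shlex


def to_command(action, label, value=None):
    """Build one `hs` command line from an action, a label, and an optional value."""
    parts = ["hs", action, label]
    if value is not None:
        parts.append(value)
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    return shlex.join(parts)


# Steps noted down while exploring the sign-in flow by hand.
steps = [
    ("tap", "Sign in", None),
    ("fill", "Email", "you@example.com"),
    ("tap", "Continue", None),
    ("wait", "Dashboard", None),
]
script = "\n".join(to_command(*step) for step in steps)
print(script)
```

&lt;p&gt;Secrets such as passwords are better left as shell variables in the final script, as in the example above, rather than baked into generated text.&lt;/p&gt;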

&lt;h2&gt;
  
  
  Why this helps LLM agents
&lt;/h2&gt;

&lt;p&gt;LLM agents need good observations and cheap actions.&lt;/p&gt;

&lt;p&gt;A screenshot is useful, but it is heavy and often makes the model infer text visually. A full Android XML dump is faithful, but it can be thousands of tokens of layout noise.&lt;/p&gt;

&lt;p&gt;The action table is smaller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fill  EditText  "Email"     #email     540,540
fill  EditText  "Password"  #password  540,640  [password]
tap   Button    "Continue"  #continue  540,860
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The TUI shows this same representation, which humans and agents can both read. That makes it a useful debugging surface for agent runs: if the model picked the wrong label, you can open the same screen and see what choices it had.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation notes
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;hs tui&lt;/code&gt; is implemented as a sibling binary to the core CLI.&lt;/p&gt;

&lt;p&gt;The main &lt;code&gt;hs&lt;/code&gt; binary stays small. When you run &lt;code&gt;hs tui&lt;/code&gt;, it locates and launches &lt;code&gt;handsets-tui&lt;/code&gt; with the current daemon host and port.&lt;/p&gt;

&lt;p&gt;The TUI itself is built with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rust&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ratatui&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;crossterm&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;the same length-prefixed Handsets wire protocol&lt;/li&gt;
&lt;li&gt;the same interactive-node filtering used by &lt;code&gt;hs ui&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The device side is still the normal Handsets daemon: one small jar running as the Android shell user through &lt;code&gt;adb&lt;/code&gt;. No root required.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it is not
&lt;/h2&gt;

&lt;p&gt;It is not meant to replace Appium, Espresso, or a full QA platform.&lt;/p&gt;

&lt;p&gt;It is also not a pixel-perfect remote desktop. If you need a visual mirror, &lt;code&gt;hs see&lt;/code&gt; can open the viewer or save screenshots.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;hs tui&lt;/code&gt; is for the moment before and between scripts: when you want to drive the device quickly, learn the UI, and turn that knowledge into repeatable automation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related guides
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/fast-android-ui-automation-with-adb/" rel="noopener noreferrer"&gt;Fast Android UI Automation with ADB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/tap-android-buttons-by-text-command-line/" rel="noopener noreferrer"&gt;How to Tap Android Buttons by Text from the Command Line&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/android-automation-for-llm-agents/" rel="noopener noreferrer"&gt;Android Automation for LLM Agents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Originally published at &lt;a href="https://handsets.dev/blog/android-terminal-ui/" rel="noopener noreferrer"&gt;https://handsets.dev/blog/android-terminal-ui/&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>android</category>
      <category>automation</category>
      <category>terminal</category>
      <category>rust</category>
    </item>
    <item>
      <title>Stop Wasting Tokens on Android Automation</title>
      <dc:creator>Elliot Gao</dc:creator>
      <pubDate>Sun, 24 May 2026 15:19:41 +0000</pubDate>
      <link>https://forem.com/elliotgao2/stop-wasting-tokens-on-android-automation-1mep</link>
      <guid>https://forem.com/elliotgao2/stop-wasting-tokens-on-android-automation-1mep</guid>
      <description>&lt;p&gt;Most LLM-driven Android automation starts by showing the model a screen.&lt;/p&gt;

&lt;p&gt;That sounds reasonable. A human looks at the phone, decides what to tap, and taps it. Give the model the same view.&lt;/p&gt;

&lt;p&gt;The problem is that "the same view" is expensive.&lt;/p&gt;

&lt;p&gt;A full screenshot is expensive. A raw Android UI XML dump is also expensive, just in a quieter way. The model reads thousands of tokens of layout machinery before it reaches the handful of labels that matter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Email
Password
Continue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For one step, that waste is easy to ignore. For a 50-step mobile agent trajectory, it becomes the bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  The loop
&lt;/h2&gt;

&lt;p&gt;An Android agent usually does this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read the current screen.&lt;/li&gt;
&lt;li&gt;Decide what to do.&lt;/li&gt;
&lt;li&gt;Tap, type, or swipe.&lt;/li&gt;
&lt;li&gt;Wait for the next screen.&lt;/li&gt;
&lt;li&gt;Repeat.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first step is where the token leak begins.&lt;/p&gt;

&lt;p&gt;If you use &lt;code&gt;uiautomator dump&lt;/code&gt;, the model gets XML like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;node&lt;/span&gt; &lt;span class="na"&gt;index=&lt;/span&gt;&lt;span class="s"&gt;"0"&lt;/span&gt; &lt;span class="na"&gt;text=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="na"&gt;resource-id=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;
      &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"android.widget.FrameLayout"&lt;/span&gt;
      &lt;span class="na"&gt;package=&lt;/span&gt;&lt;span class="s"&gt;"com.google.android.apps.nexuslauncher"&lt;/span&gt;
      &lt;span class="na"&gt;content-desc=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;
      &lt;span class="na"&gt;checkable=&lt;/span&gt;&lt;span class="s"&gt;"false"&lt;/span&gt; &lt;span class="na"&gt;checked=&lt;/span&gt;&lt;span class="s"&gt;"false"&lt;/span&gt;
      &lt;span class="na"&gt;clickable=&lt;/span&gt;&lt;span class="s"&gt;"false"&lt;/span&gt; &lt;span class="na"&gt;enabled=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;
      &lt;span class="na"&gt;focusable=&lt;/span&gt;&lt;span class="s"&gt;"false"&lt;/span&gt; &lt;span class="na"&gt;focused=&lt;/span&gt;&lt;span class="s"&gt;"false"&lt;/span&gt;
      &lt;span class="na"&gt;scrollable=&lt;/span&gt;&lt;span class="s"&gt;"false"&lt;/span&gt; &lt;span class="na"&gt;long-clickable=&lt;/span&gt;&lt;span class="s"&gt;"false"&lt;/span&gt;
      &lt;span class="na"&gt;password=&lt;/span&gt;&lt;span class="s"&gt;"false"&lt;/span&gt; &lt;span class="na"&gt;selected=&lt;/span&gt;&lt;span class="s"&gt;"false"&lt;/span&gt;
      &lt;span class="na"&gt;bounds=&lt;/span&gt;&lt;span class="s"&gt;"[0,0][1440,3120]"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is one layout node. It says almost nothing an agent can act on.&lt;/p&gt;

&lt;p&gt;That is not a bug in UIAutomator. The XML is a faithful serialization of the accessibility tree, but faithful is not the same as useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;On a few ordinary Android screens, the difference looks like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;screen&lt;/th&gt;
&lt;th&gt;UIAutomator XML&lt;/th&gt;
&lt;th&gt;Handsets &lt;code&gt;hs ui -i&lt;/code&gt;
&lt;/th&gt;
&lt;th&gt;reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Launcher home&lt;/td&gt;
&lt;td&gt;3,153 tokens&lt;/td&gt;
&lt;td&gt;246 tokens&lt;/td&gt;
&lt;td&gt;12.8x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Settings home&lt;/td&gt;
&lt;td&gt;5,762 tokens&lt;/td&gt;
&lt;td&gt;729 tokens&lt;/td&gt;
&lt;td&gt;7.9x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Settings -&amp;gt; Apps&lt;/td&gt;
&lt;td&gt;4,050 tokens&lt;/td&gt;
&lt;td&gt;320 tokens&lt;/td&gt;
&lt;td&gt;12.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Token counts are from &lt;code&gt;tiktoken&lt;/code&gt; with the GPT-4 encoding. The deeper write-up is &lt;a href="//2026-05-22-android-ui-dump-for-llms.md"&gt;An Android UI Dump for LLMs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The short version: a typical screen that costs 4,000-6,000 tokens as XML can often be represented in a few hundred tokens as an action table.&lt;/p&gt;

&lt;p&gt;Across 50 steps, that is the difference between sending roughly 200k-300k tokens of screen state as XML and sending roughly 12k-36k as action tables, going by the per-screen counts above.&lt;/p&gt;

&lt;p&gt;The agent usually makes the same decision either way.&lt;/p&gt;
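
&lt;p&gt;The arithmetic can be reproduced from the table. The token counts below are copied from it; the 50-step trajectory is the one used in the text, and the simplifying assumption is that every step sends a screen of the same size:&lt;/p&gt;

```python
# Per-screen token counts copied from the table: (UIAutomator XML, hs ui -i).
SCREENS = {
    "Launcher home": (3153, 246),
    "Settings home": (5762, 729),
    "Settings -> Apps": (4050, 320),
}
STEPS = 50  # trajectory length used in the text


def reduction(xml_tokens, table_tokens):
    """How many times smaller the action table is than the XML dump."""
    return round(xml_tokens / table_tokens, 1)


for name, (xml_tokens, table_tokens) in SCREENS.items():
    print(
        f"{name}: {reduction(xml_tokens, table_tokens)}x, "
        f"{STEPS * xml_tokens:,} vs {STEPS * table_tokens:,} tokens "
        f"over {STEPS} steps"
    )
```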

&lt;h2&gt;
  
  
  What the model actually needs
&lt;/h2&gt;

&lt;p&gt;For UI automation, the model does not need a DOM-shaped tree.&lt;/p&gt;

&lt;p&gt;It needs a list of things it can act on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fill  EditText  "Email"     #email     540,540
fill  EditText  "Password"  #password  540,640  [password]
tap   Button    "Continue"  #continue  540,860
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That table gives the model the useful facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What action is available.&lt;/li&gt;
&lt;li&gt;What label a human sees.&lt;/li&gt;
&lt;li&gt;What type of control it is.&lt;/li&gt;
&lt;li&gt;Where the tool will tap or type.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model can now answer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tap "Continue"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It does not have to parse layout ancestors, negative booleans, fully-qualified class names, or four-number bounds rectangles.&lt;/p&gt;

&lt;h2&gt;
  
  
  The rule
&lt;/h2&gt;

&lt;p&gt;For LLM tool output, the optimization rule is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do not serialize facts the model cannot use in its next action.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Android XML violates that rule constantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;clickable="false"&lt;/code&gt; on nodes the agent will never click.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;enabled="true"&lt;/code&gt; repeated on almost every node.&lt;/li&gt;
&lt;li&gt;Empty &lt;code&gt;FrameLayout&lt;/code&gt; and &lt;code&gt;LinearLayout&lt;/code&gt; containers.&lt;/li&gt;
&lt;li&gt;Full class names like &lt;code&gt;android.widget.TextView&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Bounds rectangles when the agent only needs a tap point.&lt;/li&gt;
&lt;li&gt;The same attribute names spelled out on every node, JSON-style, when the reader is a language model, not a parser.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Handsets drops the defaults, shortens the names, computes the center point, and keeps the labels.&lt;/p&gt;

&lt;p&gt;The result is not a smaller XML file. It is a different interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs ui
hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Dashboard"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
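
&lt;p&gt;To make the reduction concrete, here is a rough sketch of the same idea applied to a single &lt;code&gt;uiautomator&lt;/code&gt; node: skip non-clickable nodes, shorten the class name, and replace the bounds rectangle with its center point. It is not how Handsets is implemented. &lt;code&gt;compact&lt;/code&gt; is a hypothetical helper, it only handles tappable nodes, its output omits the &lt;code&gt;#id&lt;/code&gt; column, and the button's bounds are made-up sample values:&lt;/p&gt;

```python
import re
import xml.etree.ElementTree as ET

# uiautomator writes bounds as "[left,top][right,bottom]".
BOUNDS = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def compact(node):
    """Reduce one uiautomator node to a short row, or None if it is not tappable."""
    if node.get("clickable") != "true":
        return None  # drop nodes the agent will never click
    match = BOUNDS.fullmatch(node.get("bounds"))
    left, top, right, bottom = map(int, match.groups())
    center = f"{(left + right) // 2},{(top + bottom) // 2}"
    # android.widget.Button becomes Button.
    short_class = node.get("class", "").rsplit(".", 1)[-1]
    label = node.get("text") or node.get("content-desc") or ""
    return f'tap  {short_class}  "{label}"  {center}'


# Nodes are built directly here; ET.parse would give the same Element objects.
layout = ET.Element("node", {
    "class": "android.widget.FrameLayout",
    "clickable": "false",
    "bounds": "[0,0][1440,3120]",
})
button = ET.Element("node", {
    "class": "android.widget.Button",
    "text": "Continue",
    "clickable": "true",
    "bounds": "[440,800][640,920]",
})

print(compact(layout))
print(compact(button))
```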



&lt;h2&gt;
  
  
  Screenshots are still useful
&lt;/h2&gt;

&lt;p&gt;This is not an argument against screenshots.&lt;/p&gt;

&lt;p&gt;Screenshots are useful when layout matters, when visual state matters, or when an app renders important information without accessible labels.&lt;/p&gt;

&lt;p&gt;But screenshots are a poor default for every step. They are large, slow to move, and often force the model to do OCR-like work for text that Android already exposes.&lt;/p&gt;

&lt;p&gt;A better loop is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs ui &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /tmp/screen.txt
hs see &lt;span class="nt"&gt;--size&lt;/span&gt; 768 /tmp/screen.jpg   &lt;span class="c"&gt;# only when visual context matters&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Give the model the text UI first. Add the image when the text is not enough.&lt;/p&gt;

&lt;p&gt;That usually saves tokens and makes the action easier to audit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters more for agents than tests
&lt;/h2&gt;

&lt;p&gt;Traditional mobile tests do not care much about token count. A test runner is not paying to read XML.&lt;/p&gt;

&lt;p&gt;LLM agents are different. Every loop step has a context budget and a cost. If half the prompt is a UI tree full of dead layout nodes, the model is spending attention on junk.&lt;/p&gt;

&lt;p&gt;This shows up in three places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; repeated screen state dominates long trajectories.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency:&lt;/strong&gt; large prompts take longer to send and process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability:&lt;/strong&gt; shorter action-oriented context leaves less room for the model to latch onto irrelevant structure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The best tool output for an agent is not the most complete representation of the system. It is the smallest representation that preserves the next correct action.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical pattern
&lt;/h2&gt;

&lt;p&gt;For Android, the pattern looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hs use
hs ui
hs tap &lt;span class="s2"&gt;"Sign in"&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Email"&lt;/span&gt; &lt;span class="s2"&gt;"you@example.com"&lt;/span&gt;
hs fill &lt;span class="s2"&gt;"Password"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
hs tap &lt;span class="s2"&gt;"Continue"&lt;/span&gt;
hs &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"Dashboard"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For an LLM, the important handoff is even smaller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Here is the current Android UI. Pick the next action by label.

fill  EditText  "Email"     #email     540,540
fill  EditText  "Password"  #password  540,640  [password]
tap   Button    "Continue"  #continue  540,860
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model does not need to know that these nodes live inside three nested &lt;code&gt;FrameLayout&lt;/code&gt;s. It needs to know that "Continue" is a button.&lt;/p&gt;
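
&lt;p&gt;Because the handoff is this small, it is also cheap to check the model's answer before touching the device. The sketch below assumes the model replies in the &lt;code&gt;tap "Continue"&lt;/code&gt; form shown earlier; &lt;code&gt;actions_by_label&lt;/code&gt; and &lt;code&gt;check_reply&lt;/code&gt; are hypothetical helpers, and the validation step is a suggestion rather than something Handsets does for you:&lt;/p&gt;

```python
import shlex

TABLE = '''
fill  EditText  "Email"     #email     540,540
fill  EditText  "Password"  #password  540,640  [password]
tap   Button    "Continue"  #continue  540,860
'''


def actions_by_label(table):
    """Map each label in the action table to the action its row offers."""
    found = {}
    for line in table.strip().splitlines():
        action, _widget, label, *_rest = shlex.split(line)
        found[label] = action
    return found


def check_reply(reply, table):
    """Return the parsed reply if it names a row in the table, else None."""
    try:
        parts = shlex.split(reply)
    except ValueError:
        return None  # unbalanced quotes in the model output
    # Expect: action label, or action label value (for fill).
    if len(parts) not in (2, 3):
        return None
    action, label = parts[0], parts[1]
    if actions_by_label(table).get(label) != action:
        return None  # unknown label, or the wrong action for that row
    return parts


print(check_reply('tap "Continue"', TABLE))
print(check_reply('tap "Submit"', TABLE))
```

&lt;p&gt;A rejected reply can be sent back to the model with the same table, which is still far cheaper than re-sending an XML dump.&lt;/p&gt;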

&lt;h2&gt;
  
  
  Related guides
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="//2026-05-24-uiautomator2-alternative-for-android-automation.md"&gt;uiautomator2 Alternative for Android Automation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//2026-05-22-android-ui-dump-for-llms.md"&gt;An Android UI Dump for LLMs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handsets.dev/blog/tapping-android-in-5ms-vs-appium-uiautomator2/" rel="noopener noreferrer"&gt;Tapping Android in 5 ms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//2026-05-24-how-to-automate-android-apps-without-root.md"&gt;How to Automate Android Apps Without Root&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Originally published at &lt;a href="https://handsets.dev/blog/stop-wasting-tokens-on-android-automation/" rel="noopener noreferrer"&gt;https://handsets.dev/blog/stop-wasting-tokens-on-android-automation/&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>automation</category>
      <category>android</category>
    </item>
  </channel>
</rss>
