<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: krish pavuluri</title>
    <description>The latest articles on Forem by krish pavuluri (@krish_pavuluri).</description>
    <link>https://forem.com/krish_pavuluri</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3817490%2F05f4f8f8-0d38-4318-983f-f58bc03ff8ad.png</url>
      <title>Forem: krish pavuluri</title>
      <link>https://forem.com/krish_pavuluri</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/krish_pavuluri"/>
    <language>en</language>
    <item>
      <title>How We Built a Chat AI Agent Into Live Device Testing Sessions</title>
      <dc:creator>krish pavuluri</dc:creator>
      <pubDate>Tue, 10 Mar 2026 20:40:36 +0000</pubDate>
      <link>https://forem.com/krish_pavuluri/how-we-built-a-chat-ai-agent-into-live-device-testing-sessions-eg2</link>
      <guid>https://forem.com/krish_pavuluri/how-we-built-a-chat-ai-agent-into-live-device-testing-sessions-eg2</guid>
      <description>&lt;p&gt;We ship a cloud device farm — real Android and iOS devices you can control from a browser. Our users are mostly SDETs and QA engineers running Appium tests.&lt;/p&gt;

&lt;p&gt;The problem we kept hearing: &lt;strong&gt;finding the right locator wastes too much time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The typical workflow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open a device session&lt;/li&gt;
&lt;li&gt;Notice an element on screen&lt;/li&gt;
&lt;li&gt;Switch to Appium Inspector&lt;/li&gt;
&lt;li&gt;Inspect the element tree&lt;/li&gt;
&lt;li&gt;Copy the locator&lt;/li&gt;
&lt;li&gt;Paste it into your test&lt;/li&gt;
&lt;li&gt;Run the test, fail, go back to step 3&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We wanted to collapse that loop. So we built a Chat AI Agent that lives inside the device session.&lt;/p&gt;

&lt;h2&gt;What It Does&lt;/h2&gt;

&lt;p&gt;The agent can see the live device screen. You can ask it in plain English:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"What's the XPath for the equals button?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Give me a UIAutomator2 selector for the digit 7"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"What's the Accessibility ID of the login button?"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And it responds instantly with working locators — in whatever language or framework you're using (Java, Python, Swift, Kotlin, WebDriverIO).&lt;/p&gt;

&lt;p&gt;No switching tools. No Appium Inspector. Just ask.&lt;/p&gt;

&lt;h2&gt;How We Built It&lt;/h2&gt;

&lt;h3&gt;Screen visibility&lt;/h3&gt;

&lt;p&gt;Our sessions already stream device screens via WebRTC. We grab frames from the stream at the point of the user's question — a single screenshot at query time. This keeps latency low and avoids sending a continuous video feed to the model.&lt;/p&gt;
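
&lt;p&gt;In pseudocode, the capture path is just "one frame per question." A minimal sketch — the session object and its &lt;code&gt;capture_frame&lt;/code&gt; method here are illustrative stand-ins, not our actual API:&lt;/p&gt;

```python
import base64
from dataclasses import dataclass


@dataclass
class DeviceSession:
    """Stand-in for a live session; the real one is fed by a WebRTC stream."""
    last_frame: bytes  # most recently decoded frame from the stream

    def capture_frame(self) -> bytes:
        # One frame at query time -- no continuous video goes to the model.
        return self.last_frame


def frame_for_query(session: DeviceSession) -> str:
    """Base64-encode a single screenshot for the vision-model request."""
    return base64.b64encode(session.capture_frame()).decode("ascii")
```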

&lt;h3&gt;The model&lt;/h3&gt;

&lt;p&gt;We send the screenshot + user message to a vision-capable LLM. The prompt is structured to return locators in a specific format — we parse the response and render it with syntax highlighting in the UI.&lt;/p&gt;
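
&lt;p&gt;The request itself is just a screenshot plus a system prompt that pins down the output format. A hedged sketch — the field names and model identifier below are placeholders, not any specific provider's API:&lt;/p&gt;

```python
def build_locator_request(image_b64: str, question: str) -> dict:
    """Assemble a vision-LLM request; field names are illustrative."""
    system = (
        "You are a mobile test assistant. Reply ONLY with JSON of the form "
        '{"locators": {"xpath": "...", "accessibility_id": "..."}} describing '
        "the element the user asks about on the attached screenshot."
    )
    return {
        "model": "vision-capable-llm",  # placeholder identifier
        "messages": [
            {"role": "system", "content": system},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image", "data": image_b64},
                ],
            },
        ],
    }
```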

&lt;h3&gt;Locator formats&lt;/h3&gt;

&lt;p&gt;We support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XPath&lt;/li&gt;
&lt;li&gt;CSS Selector&lt;/li&gt;
&lt;li&gt;UIAutomator2 (Android)&lt;/li&gt;
&lt;li&gt;XCUITest (iOS)&lt;/li&gt;
&lt;li&gt;Accessibility ID&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model is instructed to return all applicable formats for the visible element, not just one.&lt;/p&gt;
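
&lt;p&gt;On the way back, the response is parsed per strategy, keeping whichever of the five the model was able to produce. A sketch — the key names are assumptions about the internal format, not a spec:&lt;/p&gt;

```python
import json

# The five strategies we support; key names are illustrative.
STRATEGIES = ("xpath", "css", "uiautomator2", "xcuitest", "accessibility_id")


def parse_locators(raw: str) -> dict:
    """Keep whichever supported strategies the model returned."""
    found = json.loads(raw).get("locators", {})
    return {name: found[name] for name in STRATEGIES if name in found}
```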

&lt;h3&gt;Code output&lt;/h3&gt;

&lt;p&gt;Users pick their language or framework from a dropdown (Java, Python, Swift, Kotlin, WebDriverIO). We wrap the locator in idiomatic framework code for each:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Python / Appium
&lt;/span&gt;&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_element&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AppiumBy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;XPATH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;//android.widget.Button[@content-desc=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;equals&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Java / Appium&lt;/span&gt;
&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findElement&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;By&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;xpath&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"//android.widget.Button[@content-desc='equals']"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
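
&lt;p&gt;Under the hood this is a small template table rather than codegen. A sketch with two of the targets — the template strings are simplified versions of what the UI actually renders:&lt;/p&gt;

```python
# Per-language snippet templates; simplified relative to the real output.
TEMPLATES = {
    "python": 'driver.find_element(AppiumBy.XPATH, "{loc}")',
    "java": 'driver.findElement(By.xpath("{loc}"));',
}


def wrap_locator(language: str, locator: str) -> str:
    """Wrap a raw locator in snippet text for the chosen language."""
    return TEMPLATES[language].format(loc=locator)
```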



&lt;h3&gt;UI integration&lt;/h3&gt;

&lt;p&gt;The panel sits alongside the device stream — it doesn't overlay the screen. Users can keep testing while asking questions. The conversation history stays within the session.&lt;/p&gt;

&lt;h2&gt;What We Learned&lt;/h2&gt;

&lt;p&gt;The hardest part wasn't the AI integration — it was the prompt engineering. Getting the model to return clean, parseable locator output (not prose with embedded code) required iteration.&lt;/p&gt;
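
&lt;p&gt;One concrete fix from that iteration: tolerate the model wrapping its JSON in a code fence and strip it before parsing. A sketch of that defensive step — the production parser does more validation than this:&lt;/p&gt;

```python
import json

FENCE = "`" * 3  # a literal triple backtick


def extract_json(reply: str) -> dict:
    """Strip an optional code fence the model may wrap around its JSON."""
    text = reply.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()
        # Drop the opening fence line (which may carry a language tag)
        # and the closing fence line.
        text = "\n".join(lines[1:-1])
    return json.loads(text)
```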

&lt;p&gt;We also found that grounding the model on the &lt;em&gt;visible&lt;/em&gt; screen state (not a DOM or accessibility tree) made responses feel more natural. Users think in terms of what they &lt;em&gt;see&lt;/em&gt;, not what's in the XML hierarchy.&lt;/p&gt;

&lt;h2&gt;Try It&lt;/h2&gt;

&lt;p&gt;The Chat AI Agent is live now in the &lt;a href="https://robotactions.com" rel="noopener noreferrer"&gt;RobotActions portal&lt;/a&gt;. Free trial available.&lt;/p&gt;

&lt;p&gt;We'd love feedback from anyone doing Appium or mobile automation — especially if you've built similar tooling. Drop a comment or reach out directly.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>ai</category>
      <category>automation</category>
      <category>mobile</category>
    </item>
  </channel>
</rss>
