<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sharmin Sirajudeen</title>
    <description>The latest articles on Forem by Sharmin Sirajudeen (@sharminsirajudeen).</description>
    <link>https://forem.com/sharminsirajudeen</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3861545%2Feed19cc1-58d6-44b1-9667-eebf139712d5.jpg</url>
      <title>Forem: Sharmin Sirajudeen</title>
      <link>https://forem.com/sharminsirajudeen</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sharminsirajudeen"/>
    <language>en</language>
    <item>
      <title>How a Web Worker Fixed My Dying-Battery Audio (And What I Learned About PWAs the Hard Way)</title>
      <dc:creator>Sharmin Sirajudeen</dc:creator>
      <pubDate>Mon, 13 Apr 2026 11:54:40 +0000</pubDate>
      <link>https://forem.com/sharminsirajudeen/how-a-web-worker-fixed-my-dying-battery-audio-and-what-i-learned-about-pwas-the-hard-way-3e0n</link>
      <guid>https://forem.com/sharminsirajudeen/how-a-web-worker-fixed-my-dying-battery-audio-and-what-i-learned-about-pwas-the-hard-way-3e0n</guid>
      <description>&lt;p&gt;I spent the last week modifying an open-source NES emulator to run in the browser as a PWA. I'm an Android developer by trade — Kotlin, Jetpack Compose, Flutter when the project calls for it. This was my first real dive into Web Workers, SharedArrayBuffer, and turning a browser tab into something that feels like a native app.&lt;/p&gt;

&lt;p&gt;Here's what I learned. Some of it was obvious in hindsight. Most of it wasn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem That Started Everything
&lt;/h2&gt;

&lt;p&gt;I wanted to add real-time game modification sliders to a browser-based NES emulator. Speed multiplier, firepower boost, infinite lives — the kind of thing that's trivial if you have access to the game's memory. The emulator (JSNES, open-source) gives you direct access to the NES CPU's RAM via JavaScript. Writing a slider that tweaks &lt;code&gt;cpu.mem[0x0487]&lt;/code&gt; every frame is maybe 10 lines of code.&lt;/p&gt;

&lt;p&gt;I set up a GitHub Codespace, got the emulator running, and tested it in the browser. Everything worked beautifully. Then I opened the same URL on an older Android phone sitting on my desk.&lt;/p&gt;

&lt;p&gt;The game visuals were smooth enough. But the audio — the iconic 8-bit music — sounded like a toy running out of battery. Slow, dragging, painful. Like someone was holding the NES's APU underwater.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Single-Threaded Was the Root Cause
&lt;/h2&gt;

&lt;p&gt;Here's what was happening. The NES generates audio samples at 44,100 Hz, tied directly to CPU emulation. Each frame of emulation produces ~735 audio samples. The browser's Web Audio API expects those samples delivered at a consistent rate.&lt;/p&gt;

&lt;p&gt;On a decent machine, the main thread easily ran the emulator at 60fps, rendered the canvas, and fed audio samples with no contention. On the slow Android phone, canvas rendering choked the main thread and frames dropped to 30fps, so only half the needed audio samples were generated each second. The Web Audio API played them back at the expected rate, ran out partway through, and produced that dying-battery sound.&lt;/p&gt;
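&lt;p&gt;The arithmetic of the deficit is worth a quick sketch (same numbers as above; the helper function is just for illustration):&lt;/p&gt;

```javascript
// Back-of-envelope starvation math from the numbers above.
const SAMPLE_RATE = 44100;                               // Hz, expected by Web Audio
const SAMPLES_PER_FRAME = Math.round(SAMPLE_RATE / 60);  // ~735 per NES frame

// Sample generation is tied to the emulation loop:
// fewer frames per second means fewer samples per second.
function samplesGeneratedPerSecond(actualFps) {
  return SAMPLES_PER_FRAME * actualFps;
}

// At 30fps the device produces 22,050 samples but playback
// consumes 44,100 — the buffer starves halfway through.
const deficit = SAMPLE_RATE - samplesGeneratedPerSecond(30);
```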

&lt;p&gt;I tried every hack I could think of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Adaptive sample dropping&lt;/strong&gt; — monitored FPS and dropped audio samples when the device struggled. Result: choppy audio instead of slow audio. Not better.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Rate Control&lt;/strong&gt; — stretched available samples via interpolation (the algorithm RetroArch uses). Result: alien communication sounds. The pitch was wrong because you can't stretch 22K samples to fill 44K slots without changing the fundamental frequency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-frame catch-up&lt;/strong&gt; — ran 2 NES frames per &lt;code&gt;requestAnimationFrame&lt;/code&gt; when the device fell behind. Result: even slower, because the device couldn't handle 2 frames if it was already struggling with 1.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of it worked because I was treating the symptom, not the disease. The disease was: &lt;strong&gt;audio generation and canvas rendering were fighting for the same thread.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix: Web Workers
&lt;/h2&gt;

&lt;p&gt;The solution was architecturally simple. Move the NES emulation (CPU + audio generation) to a &lt;strong&gt;Web Worker&lt;/strong&gt;. The main thread only handles canvas rendering, user input, and UI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Worker Thread (setInterval @ 60fps)
├── JSNES emulation (CPU, PPU, APU)
├── Audio sample generation → SharedArrayBuffer
├── Game mod logic (speed, firepower, lives)
└── Frame pixel conversion → postMessage (Transferable)

Main Thread (requestAnimationFrame)
├── Canvas rendering (receives pixels from Worker)
├── Audio playback (reads from SharedArrayBuffer)
├── Keyboard/touch input → postMessage to Worker
└── UI (sliders, toggles, save/load, fullscreen)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: &lt;code&gt;setInterval&lt;/code&gt; in a Web Worker is &lt;strong&gt;not throttled&lt;/strong&gt; when the tab is backgrounded. &lt;code&gt;requestAnimationFrame&lt;/code&gt; on the main thread is. This means the Worker keeps generating audio at a consistent rate regardless of what the renderer is doing. The audio buffer never starves.&lt;/p&gt;
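&lt;p&gt;A minimal sketch of the Worker-side loop, assuming a &lt;code&gt;nes&lt;/code&gt; object like the one JSNES exposes; the wrapper and its names are mine:&lt;/p&gt;

```javascript
// Hypothetical Worker-side wrapper: emulation ticks on setInterval,
// which browsers do not throttle inside a dedicated Worker the way
// they throttle main-thread timers in background tabs.
function createEmulationLoop(nes, intervalMs = 1000 / 60) {
  let timer = null;
  return {
    start() {
      if (timer !== null) return;         // already running
      timer = setInterval(() => nes.frame(), intervalMs);
    },
    stop() {
      if (timer === null) return;
      clearInterval(timer);
      timer = null;
    },
    get running() { return timer !== null; },
  };
}
```

&lt;p&gt;In the real Worker, each &lt;code&gt;nes.frame()&lt;/code&gt; call would also push audio samples into the shared ring buffer and transfer the finished frame's pixels to the main thread.&lt;/p&gt;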

&lt;h2&gt;
  
  
  SharedArrayBuffer: The Zero-Copy Audio Bridge
&lt;/h2&gt;

&lt;p&gt;This was the part I found most interesting, coming from a mobile background where inter-thread communication usually means &lt;code&gt;Handler.post()&lt;/code&gt; or Kotlin coroutine channels.&lt;/p&gt;

&lt;p&gt;The Worker generates ~735 audio samples per frame. Those samples need to reach the main thread's &lt;code&gt;ScriptProcessorNode&lt;/code&gt; with minimal latency. &lt;code&gt;postMessage&lt;/code&gt; adds serialization overhead and scheduling jitter — fine for input events, not great for 44,100 samples per second.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SharedArrayBuffer&lt;/strong&gt; gives both threads access to the same memory. The Worker writes audio samples into a ring buffer. The main thread's audio processor reads from the same buffer. Zero copy, zero serialization, microsecond access.&lt;/p&gt;

&lt;p&gt;The layout is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SharedArrayBuffer:
[0-3]   Int32: write index (Worker writes via Atomics.store)
[4-7]   Int32: read index (Main writes via Atomics.store)
[8+]    Float32[]: interleaved L/R audio samples
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Worker writes samples after each &lt;code&gt;nes.frame()&lt;/code&gt; call. The &lt;code&gt;ScriptProcessorNode&lt;/code&gt; on the main thread reads them in its &lt;code&gt;onaudioprocess&lt;/code&gt; callback. The &lt;code&gt;Atomics&lt;/code&gt; operations provide memory ordering guarantees — no locks needed for a single-producer, single-consumer ring buffer.&lt;/p&gt;

&lt;p&gt;One gotcha that cost me an hour: &lt;strong&gt;interleaved audio samples must always be written in pairs&lt;/strong&gt; (left + right). If the available buffer space is odd, you write one L sample without its R, and every subsequent read is shifted by one channel. The fix is one line: &lt;code&gt;samplesToWrite = available &amp;amp; ~1&lt;/code&gt; — force even.&lt;/p&gt;
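&lt;p&gt;Here's a rough sketch of that single-producer, single-consumer ring buffer, including the force-even fix; the capacity and names are illustrative, not the exact code from the project:&lt;/p&gt;

```javascript
// Illustrative SPSC ring buffer over a SharedArrayBuffer:
// bytes [0-3] write index, [4-7] read index (Int32),
// byte 8 onward: interleaved L/R Float32 samples.
const CAPACITY = 8192; // sample slots; even, so L/R pairs always fit

function createRing(sab) {
  const idx = new Int32Array(sab, 0, 2);  // [writeIdx, readIdx]
  const data = new Float32Array(sab, 8);  // samples start at byte 8
  return { idx, data };
}

// Worker side: write as many whole L/R pairs as fit.
function ringWrite(ring, samples) {
  const write = Atomics.load(ring.idx, 0);
  const read = Atomics.load(ring.idx, 1);
  const free = CAPACITY - ((write - read + CAPACITY) % CAPACITY) - 1;
  let n = Math.min(samples.length, free);
  n &= ~1; // force even: never split an L/R pair, or channels drift
  for (let i = 0; i < n; i++) {
    ring.data[(write + i) % CAPACITY] = samples[i];
  }
  Atomics.store(ring.idx, 0, (write + n) % CAPACITY); // publish
  return n;
}

// Main-thread audio callback: drain whole pairs into `out`.
function ringRead(ring, out) {
  const write = Atomics.load(ring.idx, 0);
  const read = Atomics.load(ring.idx, 1);
  const avail = (write - read + CAPACITY) % CAPACITY;
  const n = Math.min(out.length, avail) & ~1;
  for (let i = 0; i < n; i++) {
    out[i] = ring.data[(read + i) % CAPACITY];
  }
  Atomics.store(ring.idx, 1, (read + n) % CAPACITY);
  return n;
}
```

&lt;p&gt;Because there is exactly one writer of each index, &lt;code&gt;Atomics.load&lt;/code&gt;/&lt;code&gt;Atomics.store&lt;/code&gt; are enough; no lock is needed.&lt;/p&gt;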

&lt;p&gt;&lt;strong&gt;SharedArrayBuffer requires specific HTTP headers:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without these, &lt;code&gt;typeof SharedArrayBuffer === 'undefined'&lt;/code&gt; in every browser. I built a fallback path using &lt;code&gt;postMessage&lt;/code&gt; with Transferable &lt;code&gt;Float32Array&lt;/code&gt; for environments where the headers can't be set.&lt;/p&gt;
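&lt;p&gt;The capability check that picks the path can be this small (the function name and return values are mine):&lt;/p&gt;

```javascript
// Pick the audio bridge based on what the environment exposes.
// Without the COOP/COEP headers above, browsers hide
// SharedArrayBuffer entirely, so a typeof check is sufficient.
function pickAudioBridge() {
  const isolated =
    typeof crossOriginIsolated === 'undefined' || crossOriginIsolated;
  if (typeof SharedArrayBuffer !== 'undefined' && isolated) {
    return 'shared-ring-buffer';   // zero-copy path
  }
  return 'postmessage-transfer';   // Transferable Float32Array fallback
}
```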

&lt;h2&gt;
  
  
  Frame Transfer: Transferable Objects
&lt;/h2&gt;

&lt;p&gt;The NES outputs 256×240 pixels per frame. That's ~245KB of pixel data at 60fps. Copying it via &lt;code&gt;postMessage&lt;/code&gt; would be expensive. &lt;strong&gt;Transferable objects&lt;/strong&gt; solve this — the &lt;code&gt;ArrayBuffer&lt;/code&gt; is moved between threads, not copied. The sending thread loses access to it (it gets "neutered"), but the transfer is essentially free.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Worker: convert pixels and transfer&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pixels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Uint32Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;61440&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// ... fill pixels from JSNES frameBuffer ...&lt;/span&gt;
&lt;span class="nf"&gt;postMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;frame&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pixels&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;pixels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="c1"&gt;// pixels.buffer is now neutered — length 0 in Worker&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I used double-buffering: two pixel arrays in the Worker, alternating which one gets filled and transferred. In practice, I found that just reallocating a new &lt;code&gt;Uint32Array(61440)&lt;/code&gt; after each transfer was simpler and fast enough — 245KB allocation at 60fps is well within V8's comfort zone.&lt;/p&gt;
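&lt;p&gt;On the main thread, the transferred buffer just gets reinterpreted as bytes for &lt;code&gt;putImageData&lt;/code&gt;; no further copy is needed. A sketch, with the canvas calls shown as comments and the helper name mine:&lt;/p&gt;

```javascript
// Main thread: view the transferred pixel buffer as RGBA bytes.
// Both typed arrays share the same underlying ArrayBuffer — zero copy.
function toImageBytes(pixels /* Uint32Array sent from the Worker */) {
  return new Uint8ClampedArray(pixels.buffer);
}

// Browser-only wiring (sketch):
// worker.onmessage = ({ data }) => {
//   if (data.type !== 'frame') return;
//   const image = new ImageData(toImageBytes(data.pixels), 256, 240);
//   ctx.putImageData(image, 0, 0);
// };
```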

&lt;h2&gt;
  
  
  The PWA Part
&lt;/h2&gt;

&lt;p&gt;Turning this into a Progressive Web App was its own education. A few things I learned:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;iOS Safari has no Fullscreen API.&lt;/strong&gt; Not &lt;code&gt;requestFullscreen&lt;/code&gt;, not &lt;code&gt;webkitRequestFullscreen&lt;/code&gt;, not any vendor-prefixed variant. I discovered this when the fullscreen button simply did nothing on an iPhone. The only way to get "fullscreen" on iPhone is &lt;code&gt;"display": "standalone"&lt;/code&gt; in your web app manifest plus adding the app to the home screen. Even then, the status bar stays — Apple never lets you hide it.&lt;/p&gt;

&lt;p&gt;I ended up building a &lt;strong&gt;CSS-simulated fullscreen&lt;/strong&gt;: toggling a body class that hides everything except the game canvas and touch controls. But then the exit button didn't work. Turns out, on iOS, a &lt;code&gt;position: fixed&lt;/code&gt; button placed outside the main touch-responsive container &lt;strong&gt;silently fails to receive touch events&lt;/strong&gt;. The button renders, you can see it, but tapping does nothing. I had to move the exit control inside the same overlay that handles game input. That one cost me a few hours of confused debugging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PWA icons on iOS must be PNG, not SVG, and RGB not RGBA.&lt;/strong&gt; Safari ignores SVG &lt;code&gt;apple-touch-icon&lt;/code&gt; links entirely. And if your PNG has an alpha channel, iOS sometimes renders a blank or uses its default icon. My custom pixel-art icon only appeared after I converted it from RGBA to RGB using Pillow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Service Worker caching is aggressive and separate from Safari's cache.&lt;/strong&gt; Deleting Safari data doesn't clear a PWA's cache. You have to delete the home screen app icon first, then clear Safari data, then re-add. Learned this the hard way when testers kept seeing old versions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The &lt;code&gt;viewport-fit: cover&lt;/code&gt; meta tag&lt;/strong&gt; is what lets your app extend under the iPhone notch. Without it, you get black bars.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: Background Execution Control
&lt;/h2&gt;

&lt;p&gt;One thing I didn't expect — the Worker architecture gives you easy control over background behavior. Since the emulation loop runs on &lt;code&gt;setInterval&lt;/code&gt; inside a Web Worker (which browsers don't throttle in background tabs), the game keeps running even when the user switches apps or tabs. That's great for audio continuity, but terrible for battery life.&lt;/p&gt;

&lt;p&gt;The fix is trivial: listen for &lt;code&gt;visibilitychange&lt;/code&gt; on the main thread and send a pause/resume message to the Worker. The emulation stops completely when the app is backgrounded and picks up exactly where it left off when the user returns. No state loss, no audio glitch on resume. If you ever need background execution (say, for a music player or a long-running computation), just don't send the pause — the Worker keeps ticking regardless of what the main thread is doing. Having that as a conscious choice rather than a browser-imposed limitation is a nice side effect of the architecture.&lt;/p&gt;
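&lt;p&gt;The Worker side of that protocol reduces to two message cases; a sketch, where &lt;code&gt;loop&lt;/code&gt; stands for whatever object drives the &lt;code&gt;setInterval&lt;/code&gt;:&lt;/p&gt;

```javascript
// Worker side: map control messages onto the emulation loop.
// `loop` is any object with start()/stop(), like the wrapper
// around setInterval that calls nes.frame().
function handleControlMessage(loop, msg) {
  switch (msg.type) {
    case 'pause':  loop.stop();  break; // tab hidden: save battery
    case 'resume': loop.start(); break; // tab visible: pick up where we left off
  }
  return loop;
}

// Main-thread wiring (browser-only, sketch):
// document.addEventListener('visibilitychange', () => {
//   worker.postMessage({ type: document.hidden ? 'pause' : 'resume' });
// });
```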

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;On the same slow Android phone that produced dying-battery audio with the single-threaded architecture: smooth, consistent, correct-speed audio. The Web Worker generates samples at a steady 60fps via &lt;code&gt;setInterval&lt;/code&gt;, completely independent of the main thread's rendering frame rate. The SharedArrayBuffer bridge adds effectively zero latency.&lt;/p&gt;

&lt;p&gt;The visual frames might drop to 30fps on a slow device — the game looks a bit less smooth — but the audio is untouched. That's the right tradeoff. Humans tolerate choppy video far better than choppy audio.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Thread architecture is a day-one decision, not an optimization.&lt;/strong&gt; I built the single-threaded version first because it was faster to prototype. Then I spent more time patching audio hacks than the Worker migration ultimately took. If your app does real-time audio/video processing, put the producer on a separate thread from the start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SharedArrayBuffer is the right tool for high-frequency inter-thread data.&lt;/strong&gt; For audio at 44,100 samples/second, &lt;code&gt;postMessage&lt;/code&gt; adds too much jitter. For input events at 10-30/second, &lt;code&gt;postMessage&lt;/code&gt; is perfectly fine. Match the tool to the frequency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transferable objects are free.&lt;/strong&gt; If you're passing large &lt;code&gt;ArrayBuffer&lt;/code&gt;s between threads via &lt;code&gt;postMessage&lt;/code&gt;, mark them as transferable. Zero copy, zero overhead. Just remember the sender loses access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PWAs on iOS are a different platform entirely.&lt;/strong&gt; Don't assume web APIs work the same. The Fullscreen API doesn't exist. Touch events behave differently for fixed-position elements. Icons need specific formats. Test on an actual iPhone, not just Chrome DevTools mobile emulation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test on the slowest device first.&lt;/strong&gt; If I'd tested on the old Android phone on day one, I would have designed for Workers from the start. Testing only on fast hardware hides architectural problems that become very expensive to fix later.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm a mobile developer (Android/Kotlin, Flutter) exploring the browser as a platform for real-time applications. If you've dealt with Web Workers, SharedArrayBuffer, or PWA quirks on iOS, I'd love to hear about your experiences in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>pwa</category>
      <category>webworkers</category>
    </item>
    <item>
      <title>Building in Public: The Architecture of a Solo Rust Project</title>
      <dc:creator>Sharmin Sirajudeen</dc:creator>
      <pubDate>Mon, 06 Apr 2026 08:18:27 +0000</pubDate>
      <link>https://forem.com/sharminsirajudeen/building-in-public-the-architecture-of-a-solo-rust-project-40jm</link>
      <guid>https://forem.com/sharminsirajudeen/building-in-public-the-architecture-of-a-solo-rust-project-40jm</guid>
      <description>&lt;h1&gt;
  
  
  Building in Public: The Architecture of a Solo Rust Project
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;I'm the creator of &lt;a href="https://drengr.dev" rel="noopener noreferrer"&gt;Drengr&lt;/a&gt;, an MCP server that gives AI agents eyes and hands on mobile devices. I started this blog to share the engineering behind it. No pretending to be a neutral observer writing a think piece — I built this, and I'm here to talk about it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I'm a solo developer building a Rust project, and I want to talk about what that actually looks like. Not the polished "launched on Product Hunt and got 500 stars" version, but the real one — the architecture decisions made at midnight, the bugs that took days, and the strange irony of using AI to build AI tooling.&lt;/p&gt;

&lt;p&gt;Drengr started as a research question: can I give an AI agent a phone? No venture capital, no team, no timeline pressure. Just curiosity and a problem that felt important enough to spend months on. Building in public means sharing the journey honestly, including the parts that don't look impressive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I'm Building This Alone
&lt;/h2&gt;

&lt;p&gt;The honest answer is that this project started before I knew it was a project. After ten years of writing Android apps and watching every UI test suite I touched decay faster than we could maintain it, I started experimenting with whether AI could do better. I hacked together a Python script that captured screenshots and sent them to an LLM API with action instructions. It worked badly, but it worked. That script became a prototype, the prototype became an architecture, and the architecture demanded Rust.&lt;/p&gt;

&lt;p&gt;At no point did I sit down and say "I'm going to build a product." I kept solving the next problem. The next problem kept being interesting. Drengr is my first real tool — the first thing I've built that isn't an internal script or a weekend experiment. Six months later, I have about 6,300 lines of Rust, a working MCP server, and the beginning of something I think could matter.&lt;/p&gt;

&lt;p&gt;Solo development has real trade-offs. I don't have anyone to review my code. I don't have anyone to challenge my architectural decisions. When I make a mistake, there's no one to catch it until a user reports a bug. The upside is speed — I can refactor the entire transport layer on a Saturday without scheduling a meeting.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Drengr's architecture is built around one core abstraction: the transport layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Transport Trait
&lt;/h3&gt;

&lt;p&gt;A single Rust trait defines what it means to "talk to a device":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;trait&lt;/span&gt; &lt;span class="n"&gt;Transport&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;capture_screen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;lt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="n"&gt;Screenshot&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;;&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;get_ui_tree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;lt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;lt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="n"&gt;UiElement&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;;&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;execute_action&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;lt&lt;/span&gt;&lt;span class="p"&gt;;()&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;;&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;query_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;lt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="n"&gt;StateResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three implementations exist: &lt;strong&gt;ADB&lt;/strong&gt; for Android devices and emulators, &lt;strong&gt;simctl&lt;/strong&gt; for iOS simulators, and &lt;strong&gt;Appium&lt;/strong&gt; for cloud device farms. Each speaks a completely different protocol. ADB uses shell commands and binary protocols. Simctl uses Apple's command-line tools. Appium uses HTTP/WebDriver.&lt;/p&gt;

&lt;p&gt;The rest of the codebase doesn't know or care which one is active. The MCP handler, the OODA loop, the screen annotation system — they all work through the trait. Adding a new platform means implementing four methods.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP Handler
&lt;/h3&gt;

&lt;p&gt;The MCP server reads JSON-RPC from stdin and writes responses to stdout. This sounds simple until you realize that the device interactions also write to stdout (ADB commands, for instance, produce output). One of my earliest architectural decisions was redirecting child process I/O to avoid polluting the MCP channel.&lt;/p&gt;

&lt;p&gt;The handler routes incoming tool calls to one of three paths: &lt;code&gt;drengr_look&lt;/code&gt; triggers a screen capture and UI tree extraction, &lt;code&gt;drengr_do&lt;/code&gt; dispatches an action to the transport layer, and &lt;code&gt;drengr_query&lt;/code&gt; reads state without side effects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Screen Annotation
&lt;/h3&gt;

&lt;p&gt;When the agent calls &lt;code&gt;drengr_look&lt;/code&gt;, it doesn't just get a raw screenshot. Drengr extracts the UI hierarchy, identifies interactive elements, assigns each a number, and returns both the annotated information and the element metadata. The agent can then say "tap element 7" instead of "tap at coordinates (342, 891)."&lt;/p&gt;

&lt;p&gt;This annotation system is more important than it might seem. It bridges the gap between how the AI perceives the screen (as a visual field) and how the device accepts input (as structured commands). Without it, every interaction requires the agent to estimate pixel coordinates from visual inspection, which is unreliable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What 6,300 Lines of Rust Taught Me
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The compiler is your strictest code reviewer.&lt;/strong&gt; I've lost count of the number of times the borrow checker rejected code that I was confident was correct, only to realize on reflection that it was catching a real problem. Not always a bug — sometimes a design issue. "You can't hold a mutable reference to the transport while also iterating over its UI tree results" is the compiler's way of saying "your data flow is tangled."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If it compiles, it probably works.&lt;/strong&gt; This cliche has limits — logic errors still exist, integration tests still matter — but the density of runtime bugs per line of code is lower than anything I've experienced in other languages. When I do hit a bug, it's almost always in my logic, not in my memory management, not in my error handling, and not in my concurrency model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ownership semantics forced better architecture.&lt;/strong&gt; In Python or JavaScript, I'd have passed the transport connection around freely, probably storing references in three different places. Rust forced me to think about who owns the connection and who borrows it. That constraint produced a cleaner architecture than I would have designed voluntarily.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hardest Bug
&lt;/h2&gt;

&lt;p&gt;MCP over stdio means Drengr reads JSON-RPC requests from stdin and writes responses to stdout. Simple enough — until you spawn an ADB shell command that also writes to stdout.&lt;/p&gt;

&lt;p&gt;The first time this happened, the MCP client received a response that started with a valid JSON-RPC frame, continued with "List of devices attached," and then had another JSON-RPC frame. The client understandably choked.&lt;/p&gt;

&lt;p&gt;The fix required redirecting all child process stdout to &lt;code&gt;/dev/null&lt;/code&gt; or to a captured buffer, using &lt;code&gt;os::unix::io&lt;/code&gt; and &lt;code&gt;dup2&lt;/code&gt; to manage file descriptors at the system call level. It's about 30 lines of code. It took me two full days to debug, because the symptoms were intermittent — ADB only writes to stdout under certain conditions, so the MCP corruption was sporadic.&lt;/p&gt;

&lt;p&gt;This is the kind of bug that doesn't exist in simpler architectures. If Drengr were an HTTP server instead of a stdio server, the problem would never have arisen. But MCP over stdio is the standard for local tool servers, and for good reason — it's simpler for the client, requires no port management, and works inside sandboxed environments. The complexity is justified; the bug was the price of admission.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Irony of Using AI to Build AI Tooling
&lt;/h2&gt;

&lt;p&gt;I use Claude Code daily to work on Drengr. Claude helps me write the code that teaches Claude to use phones. The recursion is not lost on me.&lt;/p&gt;

&lt;p&gt;It's genuinely productive. Claude is good at Rust — it understands ownership patterns, suggests idiomatic approaches, and catches issues I miss. When I was implementing the situation engine, Claude helped me think through the state comparison logic. When I was wrestling with async trait objects, Claude explained the &lt;code&gt;Pin&amp;lt;Box&amp;lt;dyn Future&amp;gt;&amp;gt;&lt;/code&gt; pattern in a way that finally clicked.&lt;/p&gt;

&lt;p&gt;The irony runs deeper, though. Every improvement I make to Drengr makes Claude slightly better at interacting with mobile devices. A better screen annotation system means Claude gets better information. A better situation engine means Claude makes fewer mistakes. I'm building a tool that improves the capability of the AI that helps me build the tool.&lt;/p&gt;

&lt;p&gt;I don't think this is unique to my project. Every developer using AI to build AI tools is in this feedback loop. But working on it daily makes the loop very tangible.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's on the Roadmap
&lt;/h2&gt;

&lt;p&gt;Three things I'm actively working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dashboard.&lt;/strong&gt; A web interface for visualizing test runs, reviewing agent decisions, and correlating UI actions with network traffic. The technical spec is written; implementation is next.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Real-time steering.&lt;/strong&gt; The ability to watch an agent run and redirect it mid-session. "Stop exploring settings, go test the checkout flow instead." This requires a WebSocket connection between the dashboard and the running Drengr process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Network monitoring.&lt;/strong&gt; An SDK that apps can integrate to capture network traffic during Drengr sessions. This lets the dashboard show what API calls happened alongside each UI action — invaluable for debugging integration issues.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Get Involved
&lt;/h2&gt;

&lt;p&gt;Drengr is proprietary, but the community is open. I've set up &lt;a href="https://github.com/SharminSirajudeen/drengr-community/discussions" rel="noopener noreferrer"&gt;GitHub Discussions&lt;/a&gt; for questions, feedback, and feature requests. The areas where I'd most appreciate input:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Testing on diverse devices.&lt;/strong&gt; I develop on a limited set of emulator configurations. Reports of how Drengr behaves on different Android versions, screen sizes, and manufacturer overlays are extremely valuable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt engineering for test scenarios.&lt;/strong&gt; The quality of Drengr's autonomous testing depends heavily on how the goal is expressed. I'm collecting effective prompts and would love contributions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bug reports and feature ideas.&lt;/strong&gt; The best way to shape Drengr's direction is to use it and tell me what's missing.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Or just try it. &lt;code&gt;curl -fsSL https://drengr.dev/install.sh | bash&lt;/code&gt;. Connect a device. Point Claude at it. Tell me what happens.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The best feedback isn't "great project." It's "I tried this and it broke." That's how the tool gets better.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Building in public means accepting that people will see the rough edges. I'm okay with that. The rough edges are where the interesting problems live.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Drengr is free to use and available on &lt;a href="https://www.npmjs.com/package/drengr" rel="noopener noreferrer"&gt;npm&lt;/a&gt;. It supports Android (physical devices, emulators), iOS simulators (full gesture support), and cloud device farms (BrowserStack, SauceLabs, AWS Device Farm, LambdaTest, Perfecto, Kobiton). Built in Rust. Single binary. No runtime dependencies.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>appium</category>
      <category>mobiledev</category>
      <category>testing</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Why I Chose Rust Over C and C++ for Drengr</title>
      <dc:creator>Sharmin Sirajudeen</dc:creator>
      <pubDate>Mon, 06 Apr 2026 08:14:23 +0000</pubDate>
      <link>https://forem.com/sharminsirajudeen/why-i-chose-rust-over-c-and-c-for-drengr-2fk3</link>
      <guid>https://forem.com/sharminsirajudeen/why-i-chose-rust-over-c-and-c-for-drengr-2fk3</guid>
      <description>&lt;h1&gt;
  
  
  Why I Chose Rust Over C and C++ for Drengr
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;I'm the creator of &lt;a href="https://drengr.dev" rel="noopener noreferrer"&gt;Drengr&lt;/a&gt;, an MCP server that gives AI agents eyes and hands on mobile devices. I started this blog to share the engineering behind it. No pretending to be a neutral observer writing a think piece — I built this, and I'm here to talk about it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When I tell people I built a mobile automation tool in Rust, the first question is always "why not Python?" I've written about that in &lt;a href="https://dev.to/blog/why-not-python"&gt;a separate post&lt;/a&gt;. But the question that actually kept me up at night during the early architecture phase was different: &lt;strong&gt;why not C or C++?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Drengr is a CLI tool that talks to Android devices over ADB, iOS simulators over simctl, and cloud devices over Appium WebDriver. It parses UI trees, captures screenshots, manages concurrent device sessions, and serves as an MCP server over stdio. This is systems programming territory. C and C++ have owned this space for decades. So why Rust?&lt;/p&gt;

&lt;p&gt;This isn't a "Rust vs C++" holy war post. I've worked with C and C++ in different contexts over the years — JNI bridges and NDK modules at work when Java wasn't fast enough for real-time audio processing or custom camera pipelines, a raytracer in C++ during university that taught me more about segfaults than about light, and the usual Arduino/embedded experiments that every CS student does at some point. Enough to know what these languages are good at and where they hurt. This is an honest account of a specific decision for a specific project, with the trade-offs I actually faced.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Case for C
&lt;/h2&gt;

&lt;p&gt;C was tempting. ADB itself is written in C++. The Android debug bridge protocol is well-documented at the C level. I could have called into ADB's libraries directly, skipping the subprocess overhead entirely. A C binary would be tiny — potentially under 1MB with static linking and aggressive stripping.&lt;/p&gt;

&lt;p&gt;I seriously considered it. For about two days.&lt;/p&gt;

&lt;p&gt;The problem crystallized when I started sketching the MCP server. MCP is JSON-RPC 2.0 over stdio. That means parsing JSON, routing method calls, managing request/response correlation, handling concurrent tool invocations. In C, I'd need a JSON parser (jansson? cJSON? write my own?), string handling that doesn't segfault, and manual memory management for every request/response lifecycle.&lt;/p&gt;

&lt;p&gt;I've seen enough C codebases to know what this looks like. It looks like 60% of your code being memory management boilerplate, and the remaining 40% being the actual logic you care about. For a research project where I need to iterate fast and try experimental approaches to screen parsing and AI agent loops, that ratio is fatal.&lt;/p&gt;
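&lt;p&gt;To make the contrast concrete, here is a minimal sketch of that request/response lifecycle in safe Rust, std only, with naive string matching standing in for a real JSON parser. The method name mirrors MCP but the routing is deliberately simplified; this illustrates the ownership model, not Drengr's actual server:&lt;/p&gt;

```rust
// Minimal sketch (std only): route a raw JSON-RPC 2.0 line to a response.
// Naive substring matching stands in for real JSON parsing here.
fn handle_request(raw: &str) -> String {
    // Every String below is freed automatically when it goes out of scope:
    // the per-request malloc/free discipline a C version needs has no
    // equivalent in this code.
    if raw.contains("\"method\":\"tools/list\"") {
        r#"{"jsonrpc":"2.0","id":1,"result":{"tools":[]}}"#.to_string()
    } else {
        r#"{"jsonrpc":"2.0","id":1,"error":{"code":-32601,"message":"method not found"}}"#.to_string()
    }
}

fn main() {
    let resp = handle_request(r#"{"jsonrpc":"2.0","id":1,"method":"tools/list"}"#);
    println!("{resp}");
}
```

&lt;p&gt;The whole request lifecycle is owned values and borrows; there is no cleanup code to get wrong.&lt;/p&gt;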

&lt;h2&gt;
  
  
  The Case for C++
&lt;/h2&gt;

&lt;p&gt;C++ was a stronger contender. Modern C++ (17/20) has smart pointers, string_view, std::optional, std::variant — many of the ergonomic features that make Rust pleasant to write. The ADB ecosystem is native C++. I could use nlohmann/json for parsing. The standard library has threads, mutexes, condition variables.&lt;/p&gt;

&lt;p&gt;Three things killed it for me:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Build System Problem
&lt;/h3&gt;

&lt;p&gt;I wanted a single static binary that anyone could curl and run. No shared library dependencies, no runtime requirements, no "install libfoo-dev first." In Rust, this is &lt;code&gt;cargo build --release --target x86_64-unknown-linux-musl&lt;/code&gt;. Done.&lt;/p&gt;

&lt;p&gt;In C++, static linking is an odyssey. CMake or Meson? Which standard library — libstdc++ or libc++? Static linking glibc is technically possible but discouraged and produces larger binaries with potential compatibility issues. Musl works but you need a separate toolchain. Cross-compilation for Apple Silicon from Linux? I'd need a cross-compiler toolchain per target triple.&lt;/p&gt;

&lt;p&gt;Cargo handles all of this. I add a target, run the build, get a binary. The CI matrix in my GitHub Actions workflow is 20 lines. The equivalent CMake + cross-compilation setup would be 200+.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Concurrency Without Fear
&lt;/h3&gt;

&lt;p&gt;Drengr manages multiple concurrent operations: the MCP server handles requests while the SDK server listens for in-app network events, the OODA loop runs autonomous agent sessions, and the explore mode does BFS traversal with concurrent screen captures. These all share state — the current device transport, the screen annotation cache, the situation engine.&lt;/p&gt;

&lt;p&gt;In C++, shared mutable state across threads means choosing between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Raw mutexes with manual lock/unlock discipline (and hoping you never forget)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Atomic operations for primitives (and hoping your lock-free algorithm is actually correct)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Higher-level abstractions like folly::Synchronized (and adding Facebook's folly as a dependency)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data races in C++ are undefined behavior. Not "your program crashes." Undefined behavior. The compiler is allowed to do literally anything. Time travel. Nasal demons. In practice, it means subtle corruption that shows up three hours into a test session as a garbled screenshot or a silently wrong element count.&lt;/p&gt;

&lt;p&gt;In Rust, the type system prevents data races at compile time. If I try to share a mutable reference across threads without proper synchronization, it doesn't compile. Period. The compiler forces me to use &lt;code&gt;Arc&amp;lt;Mutex&amp;lt;T&amp;gt;&amp;gt;&lt;/code&gt; or channels or atomics explicitly. I can't accidentally share a raw pointer to a screen buffer across two async tasks.&lt;/p&gt;

&lt;p&gt;For a tool that manages real device sessions — where a bug could mean sending the wrong tap to the wrong device — this isn't a nice-to-have. It's a requirement.&lt;/p&gt;
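&lt;p&gt;A std-only sketch of what that enforcement looks like in practice. The cache name is hypothetical; the point is that this program compiles only because the sharing is explicit:&lt;/p&gt;

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Spawn `n` worker threads that each record one entry in a shared cache,
// then return the final entry count. Without Arc + Mutex this does not
// compile: the borrow checker rejects sharing a mutable Vec across threads.
fn fill_cache(n: usize) -> usize {
    let cache = Arc::new(Mutex::new(Vec::<String>::new()));
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || cache.lock().unwrap().push(format!("screen-{i}")))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let len = cache.lock().unwrap().len();
    len
}

fn main() {
    // All writes land; no data race was possible by construction.
    assert_eq!(fill_cache(4), 4);
}
```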

&lt;h3&gt;
  
  
  3. The Dependency Story
&lt;/h3&gt;

&lt;p&gt;Drengr depends on reqwest (HTTP client), tokio (async runtime), serde (serialization), image (screenshot processing), and about 30 other crates. Adding a dependency in Rust is one line in Cargo.toml. Cargo downloads, compiles, and statically links it. Version resolution is automatic. Security advisories are tracked by &lt;code&gt;cargo audit&lt;/code&gt;.&lt;/p&gt;
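&lt;p&gt;For a sense of scale, a dependency really is one line each in &lt;code&gt;Cargo.toml&lt;/code&gt;. The versions and feature flags below are illustrative, not Drengr's actual manifest:&lt;/p&gt;

```toml
[dependencies]
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
reqwest = { version = "0.12", features = ["json"] }
image = "0.25"
```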

&lt;p&gt;In C++, every dependency is a project. Do they use CMake? Meson? Autotools? Their own bespoke build system? Do they support static linking? Are their transitive dependencies compatible with mine? The Conan and vcpkg package managers have improved this, but they're still far from Cargo's "it just works" experience.&lt;/p&gt;

&lt;p&gt;I estimated that managing C++ dependencies alone would cost me 2-3 weeks of the early development timeline. In a solo project where every week counts, that's not acceptable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Miss From C/C++
&lt;/h2&gt;

&lt;p&gt;Honesty requires admitting what Rust costs me.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compile Times
&lt;/h3&gt;

&lt;p&gt;A clean build of Drengr takes about 90 seconds. An incremental build after touching one file takes 8-12 seconds. The equivalent C project would compile in under 5 seconds clean, under 1 second incremental. When I'm iterating on screen parsing logic and want to test against a real device, those seconds add up.&lt;/p&gt;

&lt;p&gt;I've mitigated this with &lt;code&gt;cargo watch&lt;/code&gt; and by structuring the crate to minimize recompilation, but it's a real cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Learning Curve
&lt;/h3&gt;

&lt;p&gt;I knew C and C++ before I knew Rust. The borrow checker's mental model — ownership, borrowing, lifetimes — took weeks to internalize. There were days early in the project where I spent more time fighting the compiler than writing features. Async Rust made it worse: pinning, Send/Sync bounds, the colored function problem.&lt;/p&gt;

&lt;p&gt;If I'd written Drengr in C++, the first prototype would have been done a week earlier. No question. But I believe the Rust version has fewer bugs, and I spend almost zero time debugging memory issues. That trade-off has compounded in my favor over the months since.&lt;/p&gt;

&lt;h3&gt;
  
  
  FFI Friction
&lt;/h3&gt;

&lt;p&gt;ADB is a C++ tool. Some interactions would be more natural in C++ — direct FFI into ADB's libraries, for example. Instead, I shell out to the adb binary as a subprocess. It works, but it adds latency (spawning a process per command) and complexity (parsing stdout). A C++ implementation could potentially link against libadb directly.&lt;/p&gt;

&lt;p&gt;In practice, the subprocess approach has been fine. ADB commands complete in 10-50ms typically, and the parsing is straightforward. But it's an architectural compromise I wouldn't need in C++.&lt;/p&gt;
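&lt;p&gt;The subprocess pattern itself is a few lines of std. In this sketch, &lt;code&gt;echo&lt;/code&gt; stands in for the &lt;code&gt;adb&lt;/code&gt; binary, since Drengr's actual wrapper isn't shown here:&lt;/p&gt;

```rust
use std::io;
use std::process::Command;

// Run a command and return trimmed stdout, or an error on a nonzero exit.
// In Drengr this would wrap the `adb` binary; "echo" is a stand-in below.
fn run(cmd: &str, args: &[&str]) -> io::Result<String> {
    let out = Command::new(cmd).args(args).output()?;
    if !out.status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "command failed"));
    }
    Ok(String::from_utf8_lossy(&out.stdout).trim().to_string())
}

fn main() -> io::Result<()> {
    // A real call might look like:
    // run("adb", &["shell", "input", "tap", "500", "300"])?;
    println!("{}", run("echo", &["devices"])?);
    Ok(())
}
```

&lt;p&gt;The cost is one process spawn per command; the benefit is zero unsafe FFI surface.&lt;/p&gt;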

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;After six months of development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;~6,300 lines of Rust&lt;/strong&gt; — this includes the MCP server, three device transports (ADB, simctl, Appium), the OODA loop, the explore mode, the test runner, the SDK server, screen annotation, and the situation engine&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Zero memory-related bugs&lt;/strong&gt; in production. Not one use-after-free, double-free, buffer overflow, or data race&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;189 tests&lt;/strong&gt;, all passing. The test suite runs in under 3 seconds&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Binary size: ~15MB&lt;/strong&gt; stripped, with fat LTO enabled. A C equivalent might be 3-5MB, but 15MB for a tool that includes an HTTP client, JSON parser, image processing, and async runtime is reasonable&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cold start: ~15ms&lt;/strong&gt; to first MCP response. This matters when AI agents are waiting&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;If I started over tomorrow, I'd still choose Rust. But I'd do a few things differently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start with synchronous code, add async later.&lt;/strong&gt; I went async-first with tokio, which complicated the early prototyping phase. Many of the ADB interactions don't benefit from async — they're sequential command-response pairs. I could have started synchronous and migrated the concurrent parts later.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use fewer abstractions early.&lt;/strong&gt; I over-engineered the transport trait in the first version. Three concrete implementations of a simple interface would have been clearer than a trait with twelve methods and two associated types.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Accept more unsafe.&lt;/strong&gt; I avoided unsafe entirely for the first four months. Some of the ADB binary protocol parsing would have been cleaner with unsafe pointer arithmetic in a well-tested, isolated module. Rust's unsafe isn't C — it's a clearly bounded region where you tell the compiler "I've verified this manually." I was too cautious.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
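&lt;p&gt;As an illustration of the "fewer abstractions" point, the minimal version of a transport interface might look like this. The names are hypothetical, not Drengr's actual trait:&lt;/p&gt;

```rust
// Hypothetical minimal transport trait: one method per verb the tool
// actually needs, instead of twelve methods and associated types.
trait Transport {
    fn tap(&self, x: u32, y: u32) -> Result<(), String>;
    fn screenshot(&self) -> Result<Vec<u8>, String>;
}

// A stub standing in for the ADB backend; simctl and Appium would each
// get their own equally plain impl.
struct AdbTransport;

impl Transport for AdbTransport {
    fn tap(&self, _x: u32, _y: u32) -> Result<(), String> {
        Ok(()) // would shell out to `adb shell input tap x y`
    }
    fn screenshot(&self) -> Result<Vec<u8>, String> {
        Ok(Vec::new()) // would capture and return PNG bytes
    }
}

fn main() {
    let t = AdbTransport;
    assert!(t.tap(500, 300).is_ok());
}
```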

&lt;h2&gt;
  
  
  The Real Answer
&lt;/h2&gt;

&lt;p&gt;The real reason I chose Rust over C and C++ isn't any single technical argument. It's this: Rust lets me write systems-level code at the speed I think, with the confidence that the compiler has caught the classes of bugs that would otherwise cost me debugging days.&lt;/p&gt;

&lt;p&gt;For a solo developer building a research project that interacts with real hardware, manages concurrent sessions, and serves as infrastructure for AI agents — that confidence isn't a luxury. It's the difference between shipping and not shipping.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I don't have a team to review my pointer arithmetic. I don't have a QA department to catch my data races. I have the Rust compiler. And it's the most reliable colleague I've ever worked with.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;C and C++ are extraordinary languages. They power the systems Drengr sits on top of — the operating systems, the ADB daemon, the simctl infrastructure. I have deep respect for them. But for this project, at this scale, as a solo developer? Rust was the right call.&lt;/p&gt;

&lt;p&gt;The binary works. The code is correct. And I sleep well at night knowing the compiler has my back.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Drengr is free to use and available on &lt;a href="https://www.npmjs.com/package/drengr" rel="noopener noreferrer"&gt;npm&lt;/a&gt;. It supports Android (physical devices, emulators), iOS simulators (full gesture support), and cloud device farms (BrowserStack, SauceLabs, AWS Device Farm, LambdaTest, Perfecto, Kobiton). Built in Rust. Single binary. No runtime dependencies.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>appium</category>
      <category>mobiledev</category>
      <category>testing</category>
      <category>mcp</category>
    </item>
    <item>
      <title>From ADB Shell to AI Agent: The Quiet Revolution in Mobile Automation</title>
      <dc:creator>Sharmin Sirajudeen</dc:creator>
      <pubDate>Mon, 06 Apr 2026 08:12:22 +0000</pubDate>
      <link>https://forem.com/sharminsirajudeen/from-adb-shell-to-ai-agent-the-quiet-revolution-in-mobile-automation-7jn</link>
      <guid>https://forem.com/sharminsirajudeen/from-adb-shell-to-ai-agent-the-quiet-revolution-in-mobile-automation-7jn</guid>
      <description>&lt;h1&gt;
  
  
  From ADB Shell to AI Agent: The Quiet Revolution in Mobile Automation
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;I'm the creator of &lt;a href="https://drengr.dev" rel="noopener noreferrer"&gt;Drengr&lt;/a&gt;, an MCP server that gives AI agents eyes and hands on mobile devices. I started this blog to share the engineering behind it. No pretending to be a neutral observer writing a think piece — I built this, and I'm here to talk about it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Mobile test automation has a longer history than most developers realize, and the AI-driven approach I'm exploring with Drengr sits at the end of a progression that started with raw ADB shell commands in 2009. Understanding that progression matters — not because history is inherently interesting (though I think it is), but because each generation solved real problems while creating new ones. Every mobile automation tool, including mine, is a response to the limitations of what came before. Knowing those limitations helps evaluate what's genuinely new and what's just repackaging.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ADB Era (2009-2012)
&lt;/h2&gt;

&lt;p&gt;Android Debug Bridge shipped with the Android SDK, and it included a deceptively simple capability: &lt;code&gt;adb shell input&lt;/code&gt;. You could inject taps, swipes, and key events from a terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;adb shell input tap 500 300
adb shell input text &lt;span class="s2"&gt;"hello"&lt;/span&gt;
adb shell input swipe 500 1500 500 500 300
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Developers wrote bash scripts that chained these commands together. Open the app, wait 2 seconds, tap the login button at coordinates (340, 780), type the username, tap the next field at (340, 860), type the password.&lt;/p&gt;

&lt;p&gt;The problems were immediate and severe. Coordinates were absolute pixels. A script written for a 1080p phone broke on a 720p phone. A script written for one app version broke when the developer moved a button 50 pixels down. There was no way to query the UI state — you sent commands blind and hoped for the best.&lt;/p&gt;

&lt;p&gt;But ADB shell automation proved something important: developers wanted to automate mobile testing, even with terrible tools. The demand was real.&lt;/p&gt;

&lt;h2&gt;
  
  
  UIAutomator and Espresso (2012-2015)
&lt;/h2&gt;

&lt;p&gt;Google responded with proper frameworks. UIAutomator provided black-box testing — you could find elements by resource ID, text, or description, rather than coordinates. Espresso provided white-box testing for Android — fast, deterministic tests that ran inside the app process.&lt;/p&gt;

&lt;p&gt;These were real, production-quality tools. Espresso, in particular, is excellent. Its automatic synchronization with the UI thread eliminates an entire category of flaky tests. If you're doing Android-only testing with access to the source code, Espresso remains hard to beat in 2026.&lt;/p&gt;

&lt;p&gt;The limitations: both are Android-only, language-locked to Java or Kotlin, and require compilation against the app. You can't use Espresso to test someone else's app. You can't use UIAutomator for iOS. And for teams building cross-platform products, maintaining separate test suites for each platform is expensive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Appium's Universal Vision (2013-2020)
&lt;/h2&gt;

&lt;p&gt;Appium had an ambitious idea: apply the WebDriver protocol — the same standard that powered Selenium for web testing — to mobile devices. Write tests in any language. Run them against any platform. One API to rule them all.&lt;/p&gt;

&lt;p&gt;The vision was compelling, and Appium built a real foundation. It proved that cross-platform mobile testing was possible. Major companies adopted it. A huge ecosystem of plugins, drivers, and integrations grew around it.&lt;/p&gt;

&lt;p&gt;But the architecture carried inherent weight. Appium runs a Node.js server that translates WebDriver commands into platform-specific actions through a chain of drivers. Setting up Appium meant installing Node.js, Java (for the Android driver), the appropriate SDKs, and getting all the versions to align. Session management was fragile. Tests that passed on one Appium version broke on the next. "Flaky tests" became almost synonymous with mobile automation in many teams.&lt;/p&gt;

&lt;p&gt;Appium built the foundation. I want to be clear about that — a lot of what exists today in mobile automation stands on Appium's groundwork.&lt;/p&gt;

&lt;h2&gt;
  
  
  Maestro's Simplification (2022-2024)
&lt;/h2&gt;

&lt;p&gt;Maestro, from mobile.dev, asked a sharp question: what if mobile testing was actually simple? Their answer was YAML-based test flows that you could write in minutes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;appId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;com.example.app&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;launchApp&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;tapOn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sign&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;In"&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;inputText&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user@example.com"&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_field"&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;tapOn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Continue"&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;assertVisible&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Welcome&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;back"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five-minute setup. No WebDriver server. No driver management. Just a CLI that talked directly to the device. Maestro proved that developer UX matters in testing tools — that a tool people actually enjoy using gets adopted, even if it has fewer features than the heavyweight alternative.&lt;/p&gt;

&lt;p&gt;What Maestro didn't change: you still wrote every test manually. Every flow, every assertion, every edge case had to be authored by a human who understood the app. The tool was simpler, but the work was the same.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI Shift (2024-2026)
&lt;/h2&gt;

&lt;p&gt;Two things happened in 2024-2025 that opened a genuinely new direction for mobile automation.&lt;/p&gt;

&lt;p&gt;First, multimodal LLMs became good enough to reliably interpret screenshots. Not perfectly — I've written about the limitations — but well enough to identify buttons, text fields, navigation elements, and app state from a screenshot alone. The agent could &lt;em&gt;see&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Second, Anthropic published the Model Context Protocol. MCP gave those capable-but-isolated LLMs a standard way to discover and invoke external tools. An AI model could now say "I want to tap element 5 on this screen" and have that intention reliably translated into a device action through a well-defined protocol.&lt;/p&gt;

&lt;p&gt;These two ingredients — vision and tool use — are what make AI-driven mobile testing possible. Not just theoretically possible, but practically achievable by a solo developer building in Rust on weekends.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I Think This Is Heading
&lt;/h2&gt;

&lt;p&gt;The progression I see is from &lt;strong&gt;imperative&lt;/strong&gt; to &lt;strong&gt;declarative&lt;/strong&gt; to &lt;strong&gt;goal-oriented&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;ADB was imperative: tap here, swipe there, type this. Espresso was declarative: find this element, verify this state. Maestro was declarative with better DX.&lt;/p&gt;

&lt;p&gt;Drengr is my attempt at goal-oriented: "verify that a user can sign up, log in, and post a message." The agent figures out the how. It adapts to the specific app. It handles UI variations and unexpected states. You describe what should work, not how to test it.&lt;/p&gt;

&lt;p&gt;I'm not claiming this is solved. The previous sections of this blog document the limitations in detail. But I do believe the direction is correct: AI agents that explore apps like humans do, finding bugs through curiosity rather than scripts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison
&lt;/h2&gt;

&lt;p&gt;This is my honest assessment of the current landscape. I've tried to be fair — every tool on this list solves real problems for real teams.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Feature&lt;/th&gt;&lt;th&gt;Appium&lt;/th&gt;&lt;th&gt;Maestro&lt;/th&gt;&lt;th&gt;Detox&lt;/th&gt;&lt;th&gt;Espresso / XCUITest&lt;/th&gt;&lt;th&gt;Drengr&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Setup complexity&lt;/td&gt;&lt;td&gt;High&lt;/td&gt;&lt;td&gt;Low&lt;/td&gt;&lt;td&gt;Medium&lt;/td&gt;&lt;td&gt;Medium&lt;/td&gt;&lt;td&gt;Minimal&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Cross-platform&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;React Native only&lt;/td&gt;&lt;td&gt;No (platform-specific)&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;AI-driven&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Script-free testing&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Single binary install&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;N/A (built-in)&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;MCP support&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Native&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Deterministic results&lt;/td&gt;&lt;td&gt;Mostly&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;No (AI-dependent)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Test authoring effort&lt;/td&gt;&lt;td&gt;High&lt;/td&gt;&lt;td&gt;Low&lt;/td&gt;&lt;td&gt;Medium&lt;/td&gt;&lt;td&gt;Medium-High&lt;/td&gt;&lt;td&gt;Minimal&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Maturity&lt;/td&gt;&lt;td&gt;Very mature&lt;/td&gt;&lt;td&gt;Mature&lt;/td&gt;&lt;td&gt;Mature&lt;/td&gt;&lt;td&gt;Very mature&lt;/td&gt;&lt;td&gt;Early&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;I want to call attention to the "Deterministic results" row. Drengr is the only "No" in that column, and that matters. When you run an Espresso test, you get the same result every time. When you run a Drengr exploration, you might get different paths, different findings, different coverage. That's a feature for exploratory testing and a limitation for regression testing. Both are valid use cases; the right tool depends on what you need.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Appium built the foundation. Maestro proved that developer UX matters. I built Drengr because I saw a gap: what if the test itself was intelligent?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Whether that intelligence proves more valuable than determinism in practice is still an open question. I have early evidence that it is, for certain types of testing. But I'd rather present the question honestly than claim to have answered it definitively.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Drengr is free to use and available on &lt;a href="https://www.npmjs.com/package/drengr" rel="noopener noreferrer"&gt;npm&lt;/a&gt;. It supports Android (physical devices, emulators), iOS simulators (full gesture support), and cloud device farms (BrowserStack, SauceLabs, AWS Device Farm, LambdaTest, Perfecto, Kobiton). Built in Rust. Single binary. No runtime dependencies.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>appium</category>
      <category>mobiledev</category>
      <category>testing</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Giving Claude a Phone: How I Built an MCP Server for Mobile Devices</title>
      <dc:creator>Sharmin Sirajudeen</dc:creator>
      <pubDate>Mon, 06 Apr 2026 08:08:19 +0000</pubDate>
      <link>https://forem.com/sharminsirajudeen/giving-claude-a-phone-how-i-built-an-mcp-server-for-mobile-devices-270b</link>
      <guid>https://forem.com/sharminsirajudeen/giving-claude-a-phone-how-i-built-an-mcp-server-for-mobile-devices-270b</guid>
      <description>&lt;h1&gt;
  
  
  Giving Claude a Phone: How I Built an MCP Server for Mobile Devices
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;I'm the creator of &lt;a href="https://drengr.dev" rel="noopener noreferrer"&gt;Drengr&lt;/a&gt;, an MCP server that gives AI agents eyes and hands on mobile devices. I started this blog to share the engineering behind it. No pretending to be a neutral observer writing a think piece — I built this, and I'm here to talk about it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;After a decade of writing Android apps, I'd accepted a certain rhythm: write the code, build, deploy to a device, tap around manually, find the bug, go back to the IDE, fix it, repeat. When AI coding assistants arrived, they changed most of that loop — Claude could write a RecyclerView adapter with DiffUtil callbacks and proper coroutine scoping faster than I could type the class name. But it couldn't tap a single button on the emulator sitting right next to it. The code was flawless. The app was running. And the AI that wrote it had absolutely no way to verify its own work.&lt;/p&gt;

&lt;p&gt;That disconnect — combined with years of watching Espresso and Appium test suites rot faster than teams could maintain them — made me think there had to be a better way. What if the AI could see the screen, understand what it's looking at, and interact with the app directly? Not through brittle element IDs, but through actual comprehension. This is the story of how I built Drengr — my first real tool, an MCP server that gives AI agents eyes and hands on mobile devices.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Frustration That Started It
&lt;/h2&gt;

&lt;p&gt;If you've used Claude or any capable LLM for mobile development, you've hit this wall. The AI helps you write code, debug layouts, even architect entire features. But the moment you need to verify something on an actual device, you're on your own. Copy the code, build, deploy, tap around, find the bug, go back to the AI, describe what you saw in words.&lt;/p&gt;

&lt;p&gt;It's 2026, and the feedback loop between AI and mobile devices is still mediated entirely by human hands and human descriptions. As someone who spent ten years in that loop, it felt wrong to me. Not because automation is always better, but because the information loss is enormous. I can describe a broken layout to Claude, but Claude seeing the broken layout is fundamentally different.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Insight: MCP as the Bridge
&lt;/h2&gt;

&lt;p&gt;Anthropic's Model Context Protocol gave me the architecture I needed. MCP defines a standard way for AI models to discover and invoke tools — a JSON-RPC protocol over stdio or HTTP. Instead of building a bespoke integration, I could build an MCP server that exposes mobile device capabilities as tools that any MCP-compatible client can call.&lt;/p&gt;

&lt;p&gt;The key insight was constraint. I didn't need to expose every possible device operation. I needed exactly three tools that would give an AI agent enough capability to understand and interact with any mobile app.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three Tools, Three Verbs
&lt;/h3&gt;

&lt;p&gt;Drengr exposes exactly three MCP tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;drengr_look&lt;/strong&gt; — Observes the current screen. Captures a screenshot, extracts the UI hierarchy, and returns an annotated view where every interactive element is numbered. The agent sees what a user would see, but with machine-readable structure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;drengr_do&lt;/strong&gt; — Executes an action. Tap element 3, type "hello world", swipe up, press back. These are the hands.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;drengr_query&lt;/strong&gt; — Reads device state without side effects. Check if an element exists, read text content, get the current activity name. This is the quiet observer — it never changes anything.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. Three tools. Every mobile interaction I've needed — from opening apps to navigating complex flows to filling forms — reduces to sequences of look, do, and query.&lt;/p&gt;
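&lt;p&gt;The three verbs can be modeled as a small enum. This is a sketch of the surface area, not Drengr's real internal types:&lt;/p&gt;

```rust
// Hypothetical model of the three-tool surface (names mirror the post,
// not necessarily Drengr's internals).
enum Tool {
    Look,                                // observe: annotated screenshot + UI tree
    Do { action: String, element: u32 }, // act: tap, type, swipe, back
    Query { selector: String },          // read-only: no side effects
}

// Only `Do` mutates device state; Look and Query are pure observations.
fn is_read_only(t: &Tool) -> bool {
    !matches!(t, Tool::Do { .. })
}

fn main() {
    let tap = Tool::Do { action: "tap".into(), element: 14 };
    assert!(!is_read_only(&tap));
    assert!(is_read_only(&Tool::Look));
    assert!(is_read_only(&Tool::Query { selector: "current_activity".into() }));
}
```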

&lt;h2&gt;
  
  
  What Claude Actually Does With a Phone
&lt;/h2&gt;

&lt;p&gt;Let me describe a real session. I asked Claude, through Drengr, to "open YouTube and find a video about the Model Context Protocol."&lt;/p&gt;

&lt;p&gt;Claude called &lt;code&gt;drengr_look&lt;/code&gt; first. It received back an annotated screenshot showing the home screen with numbered elements — the app drawer, status bar icons, and the YouTube icon labeled as element 14. Claude called &lt;code&gt;drengr_do&lt;/code&gt; with &lt;code&gt;{"action": "tap", "element": 14}&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;YouTube opened. Claude called &lt;code&gt;drengr_look&lt;/code&gt; again. Now it could see the YouTube home feed with a search icon at element 2. It tapped that, got a keyboard and search field, typed "Model Context Protocol MCP", and hit enter. Results appeared. Claude called &lt;code&gt;drengr_look&lt;/code&gt; one more time, identified the first relevant result, and tapped it.&lt;/p&gt;

&lt;p&gt;Total time: about 40 seconds. Total human intervention: zero. Claude navigated an app it had never been configured to use, adapting to whatever UI state it encountered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting It Up
&lt;/h2&gt;

&lt;p&gt;The MCP configuration is minimal. Here's what goes in your &lt;code&gt;claude_desktop_config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"drengr"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"drengr"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"DRENGR_PLATFORM"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"android"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire integration. Drengr ships as a single binary — no Python virtualenv, no npm dependencies, no Docker container. You install it, point your MCP client at it, and Claude gains the ability to interact with whatever device is connected.&lt;/p&gt;
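&lt;p&gt;If you want to sanity-check the config before restarting your MCP client, a few lines of Python will do it. The JSON below is the same fragment shown above; the shape checks reflect what any MCP client expects from a &lt;code&gt;mcpServers&lt;/code&gt; entry.&lt;/p&gt;

```python
# Validate the config fragment from above: a "mcpServers" map whose
# entries each name a command, its args, and optional env vars.
import json

config_text = """
{
  "mcpServers": {
    "drengr": {
      "command": "drengr",
      "args": ["mcp"],
      "env": {"DRENGR_PLATFORM": "android"}
    }
  }
}
"""

config = json.loads(config_text)
server = config["mcpServers"]["drengr"]
assert server["command"] == "drengr"
assert server["args"] == ["mcp"]
assert server["env"]["DRENGR_PLATFORM"] == "android"
```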

&lt;h2&gt;
  
  
  Honest Limitations
&lt;/h2&gt;

&lt;p&gt;I want to be transparent about where this breaks down, because it does break down.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vision isn't perfect.&lt;/strong&gt; The UI hierarchy doesn't always capture everything visible on screen. Custom-drawn views, game canvases, and some Flutter widgets can appear as opaque rectangles. The agent can see the screenshot, but without structured element data, it's guessing at tap coordinates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Some gestures are hard to express.&lt;/strong&gt; A simple tap or swipe works reliably. But complex gestures — pinch to zoom, long-press-then-drag, multi-finger interactions — are difficult to represent in a tool call. I've implemented the common ones, but there's a long tail of interactions that don't map cleanly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency adds up.&lt;/strong&gt; Each look-do cycle involves capturing a screenshot, extracting the UI tree, sending it to the AI, waiting for a decision, and executing the action. On a fast local setup, each cycle takes 3-5 seconds. Over a network to a cloud device, it can be 8-12 seconds. For a 20-step flow, that's minutes of wall time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token costs are real.&lt;/strong&gt; Screenshots and UI trees are not small. A single &lt;code&gt;drengr_look&lt;/code&gt; response can be several thousand tokens. A complex navigation flow might consume 50,000-100,000 tokens. This isn't free, and it's something I think about when designing how much context to include in each response.&lt;/p&gt;
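&lt;p&gt;The latency and token numbers above compound quickly. Here's the back-of-envelope arithmetic for a 20-step flow; the per-look token range of 3,000 to 5,000 is my assumption for the sake of the math ("several thousand" in the text), not a measured figure.&lt;/p&gt;

```python
# Scale the per-cycle figures above to a 20-step flow. The per-look
# token range (3k-5k) is an assumption standing in for "several thousand".

steps = 20
local_cycle_s = (3, 5)      # seconds per look-do cycle, fast local setup
cloud_cycle_s = (8, 12)     # seconds per cycle, cloud device

local_total = tuple(s * steps for s in local_cycle_s)
cloud_total = tuple(s * steps for s in cloud_cycle_s)
assert local_total == (60, 100)    # 1.0-1.7 minutes of wall time
assert cloud_total == (160, 240)   # 2.7-4.0 minutes

# At 3k-5k tokens per drengr_look, 20 looks lands in the quoted
# 50,000-100,000 token range for a complex flow.
tokens_per_look = (3_000, 5_000)
flow_tokens = tuple(t * steps for t in tokens_per_look)
assert flow_tokens == (60_000, 100_000)
```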

&lt;h2&gt;
  
  
  What This Changes
&lt;/h2&gt;

&lt;p&gt;The immediate application is testing — give Claude a goal, let it explore the app, report what it finds. But I think the more interesting implication is broader. MCP mobile support means AI agents can participate in workflows that were previously human-only. Filing bug reports with actual screenshots. Verifying that a deployment worked on a real device. Walking through a user flow to understand it before writing code.&lt;/p&gt;

&lt;p&gt;The gap between "AI that understands code" and "AI that understands the product" has always been the device. Drengr is my attempt to close that gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;I'm working on a dashboard for visualizing test runs, real-time network monitoring so the agent can correlate UI actions with API calls, and a steering system that lets you redirect the agent mid-run. The core — three tools, one binary, MCP-native — won't change. Everything else is about making that core more useful.&lt;/p&gt;

&lt;p&gt;If you want to try it: &lt;code&gt;curl -fsSL https://drengr.dev/install.sh | bash&lt;/code&gt;. It takes about 10 seconds. I'd genuinely appreciate feedback on what works, what doesn't, and what you'd want it to do that it can't yet.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Drengr is free to use and available on &lt;a href="https://www.npmjs.com/package/drengr" rel="noopener noreferrer"&gt;npm&lt;/a&gt;. It supports Android (physical devices, emulators), iOS simulators (full gesture support), and cloud device farms (BrowserStack, SauceLabs, AWS Device Farm, LambdaTest, Perfecto, Kobiton). Built in Rust. Single binary. No runtime dependencies.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>appium</category>
      <category>mobiledev</category>
      <category>testing</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Field Notes: How Drengr's Architecture Aligns with (and Diverges from) Current Research</title>
      <dc:creator>Sharmin Sirajudeen</dc:creator>
      <pubDate>Mon, 06 Apr 2026 08:06:18 +0000</pubDate>
      <link>https://forem.com/sharminsirajudeen/field-notes-how-drengrs-architecture-aligns-with-and-diverges-from-current-research-1ifh</link>
      <guid>https://forem.com/sharminsirajudeen/field-notes-how-drengrs-architecture-aligns-with-and-diverges-from-current-research-1ifh</guid>
      <description>&lt;h1&gt;
  
  
  Field Notes: How Drengr's Architecture Aligns with (and Diverges from) Current Research
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;I'm the creator of &lt;a href="https://drengr.dev" rel="noopener noreferrer"&gt;Drengr&lt;/a&gt;, an MCP server that gives AI agents eyes and hands on mobile devices. I started this blog to share the engineering behind it. No pretending to be a neutral observer writing a think piece — I built this, and I'm here to talk about it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I've been an Android engineer for about ten years. I still remember the first time I discovered Espresso. I was genuinely thrilled — here was a framework from Google, deeply integrated with the Android SDK, that could simulate real user behavior and verify UI state. I dove in headfirst. Wrote hundreds of tests. Felt like I was doing engineering the right way.&lt;/p&gt;

&lt;p&gt;Then reality set in. Tests that passed locally failed on CI because of animation timing. Tests that worked on a Pixel broke on a Samsung because of slightly different view hierarchies. A designer moved a button into a BottomSheet and forty tests turned red overnight — none of them testing anything related to that button. I spent more time maintaining the test suite than it saved me in bug prevention. And this was Google's own tool, built by one of the most capable engineering organizations on the planet. If Espresso was the best we had, the problem wasn't implementation — it was the entire approach.&lt;/p&gt;

&lt;p&gt;I moved through Appium, UIAutomator, tried Maestro when it came out. Each one was a refinement of the same fundamental idea: match elements by ID or XPath, perform actions, assert state. And each one broke the same way — the moment the UI evolved, the tests fossilized. I've sat in sprint retrospectives where someone says "the UI tests are red again" and everyone nods like it's weather.&lt;/p&gt;

&lt;p&gt;The part that quietly frustrated me most: the decision-makers above me — experienced, respected leaders who'd built careers on shipping great mobile products — had accepted these tools as the ceiling. Not out of laziness, but out of familiarity. When every conference talk, every "best practices" blog post, and every Google I/O session tells you Espresso is the answer, questioning it feels like questioning gravity. So the test suites stayed brittle, the teams stayed frustrated, and the leadership stayed confident they were using the best tools available. After a while, you start to wonder whether brittle UI tests are any better than having no tests at all.&lt;/p&gt;

&lt;p&gt;That frustration is where Drengr started. Not from a paper. Not from a hackathon. From years of watching test suites rot faster than we could maintain them, and a quiet conviction that AI could do something fundamentally better — tests that understand what they're looking at instead of matching on fragile element IDs.&lt;/p&gt;

&lt;p&gt;I started prototyping in late 2024. A simple idea: what if an AI agent could look at a screen, understand what it sees, and interact with the app the way a human would? No hardcoded selectors. No XPath expressions that shatter on the next release. Just "navigate to the settings page and verify the toggle works." If the UI changes, the agent adapts. Self-evolving tests.&lt;/p&gt;

&lt;p&gt;Drengr is still early. I'm still figuring things out, still iterating, still learning what works and what doesn't. But recently I took some time to look at what the academic research community has been publishing — and I was surprised to find that researchers at Google, Meta, Microsoft, Tencent, and Princeton have been circling the same problems from different angles. Some of their solutions look like mine. Some are fundamentally different. A few of their insights are already changing how I think about what I'm building.&lt;/p&gt;

&lt;p&gt;This post is my attempt to map the territory honestly — where Drengr's early architecture converges with published research, where it diverges, and what I've learned from reading the papers after building the first version of the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Observe-Act Loop: Independent Convergence
&lt;/h2&gt;

&lt;p&gt;Drengr's core architecture is three MCP tools: &lt;code&gt;drengr_look&lt;/code&gt; (observe the screen), &lt;code&gt;drengr_do&lt;/code&gt; (execute an action), and &lt;code&gt;drengr_query&lt;/code&gt; (read structured data). An AI agent calls these in a loop — look at the screen, decide what to do, do it, look again.&lt;/p&gt;

&lt;p&gt;In late 2023, Zhang et al. at Tencent published &lt;strong&gt;AppAgent: Multimodal Agents as Smartphone Users&lt;/strong&gt; (arXiv:2312.13771). Their system does the same thing — observe the screen, decide, act — but as a Python agent framework. What struck me was their screen annotation approach: they number interactive elements on the screenshot so the LLM can reference them by ID. I'd independently arrived at the same design for Drengr's element numbering system. When two teams solve the same problem the same way without talking to each other, it usually means the solution is natural to the problem space.&lt;/p&gt;
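&lt;p&gt;The element-numbering idea is simple enough to sketch: walk the UI tree, assign IDs only to interactive elements, and hand the model a compact index it can reference. The node format below is invented; Drengr's real accessibility-tree schema carries much more (bounds, classes, states).&lt;/p&gt;

```python
# Screen annotation in miniature: number only the clickable nodes,
# in visual order. Node format is made up for the sketch.

def number_elements(nodes):
    """Assign 1-based IDs to clickable nodes."""
    numbered = {}
    next_id = 1
    for node in nodes:
        if node.get("clickable"):
            numbered[next_id] = node
            next_id = next_id + 1
    return numbered

tree = [
    {"label": "Status bar", "clickable": False},
    {"label": "Search", "clickable": True},
    {"label": "Banner image", "clickable": False},
    {"label": "Sign in", "clickable": True},
]

elements = number_elements(tree)
assert list(elements) == [1, 2]
assert elements[1]["label"] == "Search"
```

&lt;p&gt;The payoff is token economy: the model references "element 2" instead of describing a region of pixels, and the tool can resolve that ID back to precise bounds.&lt;/p&gt;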

&lt;p&gt;A month later, Wang et al. published &lt;strong&gt;Mobile-Agent&lt;/strong&gt; (arXiv:2401.16158), taking a purely vision-centric approach — no XML dumps, no accessibility tree, just screenshots plus detection and OCR models. Their finding that you &lt;em&gt;don't need&lt;/em&gt; system metadata to navigate apps effectively was an important validation. Drengr deliberately uses both screenshots and the accessibility tree — the tree is faster to parse, costs almost nothing in tokens, and gives precise element bounds that vision models still struggle with. But Mobile-Agent's results are a useful signal: as vision models improve, the tree may become optional, and Drengr's architecture is designed to make that transition seamless when the time is right.&lt;/p&gt;

&lt;p&gt;The key difference between Drengr and these systems: they're agent frameworks. Drengr is infrastructure. AppAgent and Mobile-Agent are Python applications that contain both the perception logic and the decision-making. Drengr separates these entirely — it handles perception and action, and delegates all decision-making to whatever LLM is on the other end of the MCP connection. This is a fundamentally different deployment model, and it's what lets Drengr work with Claude Desktop, Cursor, Windsurf, or any other MCP client without modification.&lt;/p&gt;

&lt;h2&gt;
  
  
  The OODA Loop: Military Theory Meets AI Agents
&lt;/h2&gt;

&lt;p&gt;When I implemented &lt;code&gt;drengr run&lt;/code&gt; — the autonomous agent mode — I structured it as an OODA loop: Observe (capture screen), Orient (situation engine analyzes what changed), Decide (LLM picks an action), Act (execute it). I chose OODA because it maps cleanly to the problem. The alternatives — simple while loops, state machines, behavior trees — all felt either too rigid or too unstructured.&lt;/p&gt;

&lt;p&gt;I was genuinely surprised to find that Schneier and Raghavan published &lt;strong&gt;Agentic AI's OODA Loop Problem&lt;/strong&gt; in IEEE Security &amp;amp; Privacy in 2025, analyzing exactly this pattern from a security perspective. Their key insight is that every stage of the OODA loop is a distinct attack surface. Prompt injection corrupts the Observe phase. Data poisoning corrupts Orient. Probabilistic decision-making without output verification corrupts Act. They specifically mention MCP and tool-calling systems as creating compounded vulnerabilities.&lt;/p&gt;

&lt;p&gt;Reading this paper directly influenced Drengr's security model. The &lt;code&gt;drengr_look&lt;/code&gt; observation phase cross-references the visual screenshot against the accessibility tree — if the two disagree (an element is visible but not in the tree, or vice versa), that inconsistency is surfaced in the situation report. It's not full tamper-evidence yet, but the dual-source design gives Drengr a foundation that purely vision-based systems don't have. Schneier and Raghavan's framing helped me see that as a security property, not just an implementation detail.&lt;/p&gt;

&lt;p&gt;More recently, Yasuno published &lt;strong&gt;RAPTOR-AI for Disaster OODA Loop&lt;/strong&gt; (arXiv:2602.00030) in early 2026, applying the OODA pattern to disaster response with entropy-aware strategy selection. The concept of adjusting confidence thresholds based on situational entropy maps directly to what Drengr's situation engine does — detecting when the screen hasn't changed (stuck detection), when the app has crashed, or when the agent is in an unfamiliar state.&lt;/p&gt;
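&lt;p&gt;The Orient stage is where the situation engine earns its keep. A minimal sketch of its stuck detection, assuming screens can be reduced to digests (hashing a screenshot or a serialized UI tree works the same way):&lt;/p&gt;

```python
# OODA's Orient stage, reduced to stuck detection: flag when the screen
# digest has not changed between cycles. Screens are stand-in strings
# here; the real engine also watches for crashes and unfamiliar states.
import hashlib

def observe(screen):
    return hashlib.sha256(screen.encode()).hexdigest()

def orient(history, digest):
    """Return True when the observation is identical to the last one."""
    stuck = bool(history) and history[-1] == digest
    history.append(digest)
    return stuck

history = []
assert orient(history, observe("home screen")) is False  # first observation
assert orient(history, observe("home screen")) is True   # unchanged: stuck
assert orient(history, observe("settings screen")) is False
```

&lt;p&gt;A stuck signal feeds the Decide stage: the agent knows its last action did nothing, so repeating it would be wasted tokens.&lt;/p&gt;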

&lt;h2&gt;
  
  
  BFS App Exploration: An Old Idea, Reimagined
&lt;/h2&gt;

&lt;p&gt;Drengr's &lt;code&gt;drengr explore&lt;/code&gt; mode does BFS traversal of an app — systematically tapping every interactive element, recording the resulting screens, and building a navigation graph. I built this because I needed a way to map unfamiliar apps before writing test suites for them.&lt;/p&gt;

&lt;p&gt;The academic lineage here goes back to &lt;strong&gt;DroidBot&lt;/strong&gt; by Li et al. (IEEE/ACM ICSE-C 2017), which built state transition models from live UI interactions. DroidBot used hard-coded heuristics to decide what to tap next. Drengr replaces those heuristics with an LLM decision layer — the agent can reason about whether a button is likely to navigate somewhere useful or just dismiss a dialog.&lt;/p&gt;

&lt;p&gt;Wen et al. at Microsoft Research took this further with &lt;strong&gt;AutoDroid&lt;/strong&gt; (ACM MobiCom 2024), combining LLM-driven exploration with a reusable knowledge graph. Their publication at MobiCom — a top-tier systems conference — establishes this as a recognized systems contribution, not just an ML exercise. Drengr's approach is architecturally simpler — a single Rust binary versus a Python/LLM stack — but the core insight is the same: BFS exploration is dramatically more effective when guided by a language model than by heuristics.&lt;/p&gt;
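&lt;p&gt;The exploration itself is textbook BFS once you treat screens as nodes and taps as edges. A sketch over a toy navigation map, where the &lt;code&gt;transitions&lt;/code&gt; dict plays the role of "tap an element and see where it leads" (in the real tool, the LLM decides which taps are worth taking):&lt;/p&gt;

```python
# BFS over a toy app graph. Screens are nodes, taps are edges; the
# output is the navigation graph `drengr explore` builds. The map below
# is invented for the sketch.
from collections import deque

transitions = {
    "home": ["feed", "settings"],
    "feed": ["post", "home"],
    "settings": ["notifications", "home"],
    "post": [],
    "notifications": [],
}

def explore(start):
    graph = {}
    queue = deque([start])
    seen = {start}
    while queue:
        screen = queue.popleft()
        graph[screen] = transitions[screen]
        for nxt in transitions[screen]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return graph

graph = explore("home")
assert set(graph) == {"home", "feed", "settings", "post", "notifications"}
```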

&lt;h2&gt;
  
  
  ReAct and Tool Use: The Conceptual Foundation
&lt;/h2&gt;

&lt;p&gt;Two papers form the conceptual bedrock of what Drengr enables, even though I hadn't read either when I started building.&lt;/p&gt;

&lt;p&gt;Yao et al. at Princeton published &lt;strong&gt;ReAct: Synergizing Reasoning and Acting in Language Models&lt;/strong&gt; (ICLR 2023, arXiv:2210.03629). ReAct interleaves chain-of-thought reasoning with executable actions — the model reasons about what to do, issues an action, observes the result, reasons again. Every time Claude calls &lt;code&gt;drengr_look&lt;/code&gt;, reasons about what to tap, then calls &lt;code&gt;drengr_do&lt;/code&gt;, it's executing a ReAct loop. Drengr is, architecturally, a ReAct-compatible tool suite for mobile devices.&lt;/p&gt;

&lt;p&gt;Schick et al. at Meta published &lt;strong&gt;Toolformer&lt;/strong&gt; (NeurIPS 2023, arXiv:2302.04761), demonstrating that LLMs can learn when and how to call external tools. Toolformer's tools were information retrieval APIs — calculators, search engines, QA systems. Drengr's tools have physical side effects. When &lt;code&gt;drengr_do&lt;/code&gt; taps a button, a real device changes state. That distinction matters — the consequences of a wrong action are much more significant than a wrong search query.&lt;/p&gt;

&lt;h2&gt;
  
  
  Screen Understanding: Where the Field Is Heading
&lt;/h2&gt;

&lt;p&gt;Two papers from Google Research point to where Drengr's perception layer might evolve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ScreenAI&lt;/strong&gt; (Baechler et al., IJCAI 2024, arXiv:2402.04615) is a 4.6B-parameter vision-language model fine-tuned specifically for UI understanding. It identifies UI elements — buttons, text fields, images — at the pixel level from raw screenshots. Currently, Drengr uses the Android accessibility tree alongside screenshots for element identification. ScreenAI suggests that the screenshot alone might eventually be sufficient, which would make Drengr's perception layer identical across Android, iOS, and any other platform with a display.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spotlight&lt;/strong&gt; (Li and Li, arXiv:2209.14927, 2023) goes even further — a vision-only model for mobile UI understanding that &lt;em&gt;outperforms&lt;/em&gt; methods using both screenshots and view hierarchies. This directly challenges Drengr's current design of using the accessibility tree as a primary data source. If vision-only models can outperform metadata-enhanced models, then Drengr's &lt;code&gt;drengr_query&lt;/code&gt; tool (which reads the UI tree) might eventually become redundant — replaced by richer visual understanding from the LLM itself.&lt;/p&gt;

&lt;p&gt;For now, the accessibility tree remains the right default — it's reliable, fast, and doesn't require a specialized vision model. But Drengr's perception layer is designed as a swappable trait, so when vision-only models reach the point where they consistently outperform metadata-enhanced approaches across device types and screen densities, the switch is an implementation change, not an architectural one.&lt;/p&gt;
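&lt;p&gt;The shape of that swappable layer, sketched in Python for brevity (in Drengr it's a Rust trait): two backends satisfy one interface, and swapping them changes no calling code. The vision backend here is a placeholder, not a working model.&lt;/p&gt;

```python
# Python analogue of the swappable perception trait. TreePerception
# mirrors today's accessibility-tree default; VisionPerception is a
# stand-in for a future vision-only backend (a ScreenAI-style model).
from abc import ABC, abstractmethod

class Perception(ABC):
    @abstractmethod
    def elements(self, screen):
        """Return the interactive elements visible on a screen."""

class TreePerception(Perception):
    """Default: read elements from the accessibility tree."""
    def elements(self, screen):
        return screen["tree"]

class VisionPerception(Perception):
    """Placeholder vision-only backend; echoes pre-labeled detections."""
    def elements(self, screen):
        return screen["detections"]

screen = {"tree": ["Search", "Sign in"], "detections": ["Search", "Sign in"]}
for backend in (TreePerception(), VisionPerception()):
    assert backend.elements(screen) == ["Search", "Sign in"]
```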

&lt;h2&gt;
  
  
  The Survey: Situating Drengr in the Field
&lt;/h2&gt;

&lt;p&gt;Wang et al. published &lt;strong&gt;GUI Agents with Foundation Models: A Comprehensive Survey&lt;/strong&gt; (arXiv:2411.04890, 2024) — a systematic review of 100+ papers on LLM-based GUI agents across web, desktop, and mobile. Reading this survey was like looking at a map after you've already hiked the trail. I recognized the landmarks.&lt;/p&gt;

&lt;p&gt;Drengr's three-tool architecture fits cleanly into the survey's taxonomy of perception-grounding-action pipelines. What the survey made clear is that most systems in this space are tightly coupled — the perception, grounding, and action components are part of the same codebase, usually Python. Drengr's contribution is decoupling these: it handles perception and action, and lets any MCP-compatible LLM handle grounding and reasoning. This is a systems architecture choice, not an ML innovation — but it's one that the survey suggests is underexplored.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Drengr Diverges
&lt;/h2&gt;

&lt;p&gt;After reading all of this, here's what I think Drengr is doing differently — or at least trying to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Infrastructure, not framework.&lt;/strong&gt; Almost every paper describes an end-to-end agent. Drengr is deliberately not an agent — it's the hands and eyes that agents use. This separation came from ten years of watching tightly-coupled testing tools become unmaintainable. I saw the pattern repeat across every mobile organization I worked in — teams with brilliant leadership, directors who'd shipped apps to hundreds of millions of users — and the testing infrastructure always calcified the same way. The tooling forced coupling, but there was a deeper issue too: the mental model at the top often stopped at "we need more test coverage" without asking whether the testing paradigm itself was the bottleneck. When you've shipped successful apps for years with a certain approach, it takes a particular kind of intellectual honesty to ask whether that approach has a ceiling; most organizations optimized within the paradigm instead. When your test framework is also your test runner is also your assertion library is also your device manager, everything breaks together. The industry internalized that pain as normal. Drengr's hypothesis is that it doesn't have to be: separate the perception and action layer from the intelligence layer, and each can evolve independently. The agent will change. The tools should remain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rust, not Python.&lt;/strong&gt; Every system cited above is Python. Drengr is a single static Rust binary. As an Android engineer, I know what it's like to ask a team to install a tool with twelve dependencies. I wanted &lt;code&gt;curl | bash&lt;/code&gt; and done. That choice has trade-offs — I wrote about them in &lt;a href="https://dev.to/blog/why-not-python"&gt;a separate post&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP first — by design.&lt;/strong&gt; People ask why I released Drengr as an MCP server before building a standalone CLI agent. The answer comes from watching this industry long enough to know what survives and what doesn't. AI models improve every few months. The agent that's state-of-the-art today will be obsolete by next year. But the ability to observe a screen, tap a button, and read a UI tree? That's stable. That's the invariant. By releasing the tool layer first — as an MCP server that any AI client can consume — I'm building on the part that lasts. Claude Desktop uses it today. Cursor uses it today. Whatever comes next year will use it too, because the interface is standardized. If I'd built a monolithic agent instead, I'd be rewriting it every time a better model dropped. The Model Context Protocol didn't exist when most of these papers were written. Drengr's bet is that a standard protocol between AI agents and tools is more valuable than another custom agent framework. I might be wrong. But ten years of watching tightly-coupled tools age badly makes me think this bet is right.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Born from the field, not the lab.&lt;/strong&gt; This isn't a research project with a team, compute budget, and publication timeline. It's one Android engineer who got tired of writing tests that broke every sprint and decided to try a different approach. The architecture reflects that — pragmatic, incremental, shaped by what I actually needed rather than what's theoretically optimal.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Reading these papers after building the first version taught me something I didn't expect: the problems I was solving alone — in my apartment, after work, on weekends — are the same problems that well-funded research teams at Google and Microsoft are working on. That's both humbling and encouraging.&lt;/p&gt;

&lt;p&gt;The convergence gives me confidence that I'm not building something crazy. The divergence — particularly Drengr's choice to be infrastructure rather than an agent, and to use the accessibility tree alongside vision rather than vision alone — reflects deliberate trade-offs, not gaps. Where the academic work explores what's theoretically optimal, Drengr is built around what's practically reliable today while keeping the architecture open to what's coming.&lt;/p&gt;

&lt;p&gt;I'm not an academic. I don't have a lab or a publication record. I'm an Android engineer with a decade of scar tissue from brittle test suites, building a tool shaped by what I actually needed in the field. The researchers cited here are formalizing the theory behind problems I've been solving through iteration and observation. We're approaching the same territory from different directions — and I think both directions produce insights the other can't.&lt;/p&gt;

&lt;p&gt;If you're working in this area — whether you're writing papers or building tools or just frustrated with your own test suite — I'd love to hear from you. This space is wide open, and I think we're all just getting started.&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Zhang et al. "AppAgent: Multimodal Agents as Smartphone Users." arXiv:2312.13771, 2023.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wang et al. "Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception." arXiv:2401.16158, 2024.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wen et al. "AutoDroid: LLM-powered Task Automation in Android." ACM MobiCom, 2024.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Schneier &amp;amp; Raghavan. "Agentic AI's OODA Loop Problem." IEEE Security &amp;amp; Privacy, 2025.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Yasuno. "RAPTOR-AI for Disaster OODA Loop." arXiv:2602.00030, 2026.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Li et al. "DroidBot: A Lightweight UI-Guided Test Input Generator for Android." IEEE/ACM ICSE-C, 2017.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wang et al. "GUI Agents with Foundation Models: A Comprehensive Survey." arXiv:2411.04890, 2024.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Schick et al. "Toolformer: Language Models Can Teach Themselves to Use Tools." NeurIPS, 2023.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Yao et al. "ReAct: Synergizing Reasoning and Acting in Language Models." ICLR, 2023.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Baechler et al. "ScreenAI: A Vision-Language Model for UI and Infographics Understanding." IJCAI, 2024.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Li &amp;amp; Li. "Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus." arXiv:2209.14927, 2023.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Drengr is free to use and available on &lt;a href="https://www.npmjs.com/package/drengr" rel="noopener noreferrer"&gt;npm&lt;/a&gt;. It supports Android (physical devices, emulators), iOS simulators (full gesture support), and cloud device farms (BrowserStack, SauceLabs, AWS Device Farm, LambdaTest, Perfecto, Kobiton). Built in Rust. Single binary. No runtime dependencies.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>appium</category>
      <category>mobiledev</category>
      <category>testing</category>
      <category>mcp</category>
    </item>
    <item>
      <title>What Happens When You Let AI Test Your App for a Week</title>
      <dc:creator>Sharmin Sirajudeen</dc:creator>
      <pubDate>Mon, 06 Apr 2026 07:51:06 +0000</pubDate>
      <link>https://forem.com/sharminsirajudeen/what-happens-when-you-let-ai-test-your-app-for-a-week-53md</link>
      <guid>https://forem.com/sharminsirajudeen/what-happens-when-you-let-ai-test-your-app-for-a-week-53md</guid>
      <description>&lt;h1&gt;
  
  
  What Happens When You Let AI Test Your App for a Week
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;I'm the creator of &lt;a href="https://drengr.dev" rel="noopener noreferrer"&gt;Drengr&lt;/a&gt;, an MCP server that gives AI agents eyes and hands on mobile devices. I started this blog to share the engineering behind it. No pretending to be a neutral observer writing a think piece — I built this, and I'm here to talk about it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AI mobile testing is either the future of QA or an expensive way to generate false bug reports, depending on who you ask. I decided to find out for myself. I pointed Drengr's OODA-loop agent at three different apps — a calculator, a weather app, and a social media client — and let it run autonomously for a week. Here's what happened, including the parts that didn't work.&lt;/p&gt;

&lt;p&gt;This wasn't a controlled experiment in any scientific sense. The sample size is tiny, the apps are specific, and the results may not generalize. I'm sharing this as a data point, not a proof. Autonomous mobile testing is genuinely new territory and I think honest reporting matters more than impressive claims.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;Each app got the same treatment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10 autonomous exploration runs&lt;/strong&gt; per day, each with a different high-level goal prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Each run capped at 50 actions&lt;/strong&gt; to limit token costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goals ranged from specific&lt;/strong&gt; ("calculate 15% tip on $47.50") &lt;strong&gt;to open-ended&lt;/strong&gt; ("explore the app and report anything that seems broken")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet 4&lt;/strong&gt; as the decision-making model, chosen for the balance of capability and cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Android emulator&lt;/strong&gt;, Pixel 7 image, API 34&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total over the week: 210 runs across the three apps, approximately 2.1 million tokens consumed, about $14 in API costs.&lt;/p&gt;
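&lt;p&gt;The totals are easy to check, and the per-run figures fall out of the same arithmetic:&lt;/p&gt;

```python
# The week's totals, derived from the setup above: 10 runs per day,
# per app, 3 apps, 7 days.

runs = 10 * 3 * 7
assert runs == 210

total_tokens = 2_100_000
tokens_per_run = total_tokens // runs
assert tokens_per_run == 10_000      # ~10k tokens per 50-action run

cost = 14.00
cost_per_run = cost / runs
assert round(cost_per_run, 2) == 0.07   # about seven cents per run
```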

&lt;h3&gt;
  
  
  The Prompts
&lt;/h3&gt;

&lt;p&gt;I learned quickly that prompt design matters enormously. "Test the calculator" produced aimless tapping. "Verify that the calculator handles edge cases in arithmetic operations, including negative numbers, decimal precision, division by zero, and very large numbers" produced useful, targeted exploration.&lt;/p&gt;

&lt;p&gt;The sweet spot was specific enough to guide the agent but open enough to let it discover things I hadn't anticipated.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Found
&lt;/h2&gt;

&lt;h3&gt;
  
  
  App 1: Calculator — The Negative Number Bug
&lt;/h3&gt;

&lt;p&gt;The calculator app was a personal project, something I'd built and considered "done" for months. The agent found a bug on the second day that I'd never noticed: entering a negative number, then pressing the percent button, then pressing equals produced &lt;code&gt;NaN&lt;/code&gt; instead of a numeric result.&lt;/p&gt;

&lt;p&gt;I'd never tested that sequence manually. Why would I? Negative percent of a number isn't a common operation. But the agent, exploring combinations I wouldn't think to try, stumbled into it. The underlying issue was a missing absolute value check in the percentage calculation path.&lt;/p&gt;

&lt;p&gt;That alone made the experiment worthwhile for me. It's a trivial bug, but it had shipped. A real user could have hit it.&lt;/p&gt;
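&lt;p&gt;The calculator's actual percent code isn't shown here, but the general mechanism behind "ships a NaN to the display" is worth spelling out: IEEE-754 NaN propagates silently through arithmetic and fails every equality check, so a naive &lt;code&gt;result == expected&lt;/code&gt; guard never catches it.&lt;/p&gt;

```python
# Why one unchecked NaN reaches the UI: it propagates through later
# arithmetic and is never equal to anything, itself included. Only an
# explicit isnan() check stops it. (Generic IEEE-754 behavior, not the
# app's actual code.)
import math

nan = float("nan")
assert math.isnan(nan * 47.50)   # propagates through later operations
assert (nan == nan) is False     # slips past equality-based guards
assert math.isnan(abs(nan))      # even abs() passes it along
```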

&lt;h3&gt;
  
  
  App 2: Weather App — The Broken Deep Link
&lt;/h3&gt;

&lt;p&gt;The weather app supported deep links for sharing forecast URLs. The agent, when given the goal "navigate to the settings page using every available path," discovered that the deep link &lt;code&gt;weather://settings/notifications&lt;/code&gt; crashed the app. The crash was caught by Drengr's logcat monitoring before the agent even had to report it — the situation engine flagged a fatal exception.&lt;/p&gt;

&lt;p&gt;The root cause was a missing null check on a fragment argument. The deep link handler assumed a bundle parameter would always be present, but the notifications settings fragment expected it to be passed by the parent activity, not by a deep link.&lt;/p&gt;

&lt;h3&gt;
  
  
  App 3: Social Media Client — The Accessibility Issue
&lt;/h3&gt;

&lt;p&gt;This was the most interesting finding. The social media client had several icon buttons — like, share, bookmark — with no content descriptions. The agent reported them as "unlabeled interactive elements" because the UI hierarchy showed clickable views with no text and no accessibility labels.&lt;/p&gt;

&lt;p&gt;The agent wasn't doing accessibility testing on purpose. It was trying to describe what it saw, and it couldn't identify those buttons. The same problem that confused the AI would confuse a screen reader. Inaccessible UI is ambiguous UI, and ambiguity hurts both automated agents and human users who rely on assistive technology.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Missed
&lt;/h2&gt;

&lt;p&gt;Equally important is what the agent did &lt;em&gt;not&lt;/em&gt; catch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A timing-sensitive race condition.&lt;/strong&gt; The weather app had a bug where rapidly switching between cities while forecasts were loading could display the wrong city's data. This required specific timing — switching during the 200-400ms window between the API response arriving and the UI updating. The agent's action cycle was too slow (3-5 seconds between actions) to ever trigger this window.&lt;/p&gt;
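
&lt;p&gt;The standard fix for this bug class is to tag each in-flight request with the selection it was made for and drop any response that no longer matches. Here's a minimal Python sketch of that guard (the weather app itself is Android code, so every name below is illustrative):&lt;/p&gt;

```python
import asyncio

# Sketch of the bug class in Python (the weather app itself is
# Android code; every name here is illustrative). A forecast request
# races a city switch; the fix is to drop any response whose city is
# no longer the selected one instead of rendering whatever arrives.
selected_city = "Oslo"

async def fetch_forecast(city: str) -> str:
    await asyncio.sleep(0.01)  # simulated network latency
    return f"forecast for {city}"

async def load_and_render(city: str) -> str:
    forecast = await fetch_forecast(city)
    if city != selected_city:  # the guard the app was missing
        return "stale response discarded"
    return forecast

async def main() -> None:
    global selected_city
    selected_city = "Bergen"                              # user picks Bergen
    task = asyncio.create_task(load_and_render("Bergen"))
    selected_city = "Oslo"                                # switches back mid-flight
    print(await task)  # prints: stale response discarded

asyncio.run(main())
```

&lt;p&gt;An agent with a 3-5 second action cycle will never hit the window; a guard like this makes the window irrelevant either way.&lt;/p&gt;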

&lt;p&gt;&lt;strong&gt;Visual alignment issues.&lt;/strong&gt; The social media client had a layout bug where long usernames caused text to overlap with the timestamp on certain screen widths. The UI hierarchy reported correct element bounds — the overlap was a rendering issue, not a layout issue. The elements were "correctly positioned" according to the layout engine but visually overlapping. The agent, which relies on the UI tree more than pixel-level analysis, didn't notice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Subtle UX problems.&lt;/strong&gt; The calculator's history feature was confusing — it showed results in reverse chronological order with no clear timestamps, and old entries looked identical to new ones. A human tester would flag this as a usability issue. The agent, which has no concept of "confusing," saw a functioning list and moved on.&lt;/p&gt;

&lt;h2&gt;
  
  
  False Positives
&lt;/h2&gt;

&lt;p&gt;The agent reported 23 "issues" across the week. After manual review, 14 were genuine findings and 9 were false positives. That's a 39% false positive rate — high enough to require human review of every report.&lt;/p&gt;

&lt;p&gt;The most common false positive: &lt;strong&gt;interpreting slow loads as crashes.&lt;/strong&gt; The agent would tap a button, wait for the screen to change, and if nothing happened within its patience window (about 8 seconds), report a failure. Several of these were just slow network responses on the emulator.&lt;/p&gt;

&lt;p&gt;The second most common: &lt;strong&gt;misinterpreting intentional UI states as errors.&lt;/strong&gt; A dismissed bottom sheet was reported as "content disappeared unexpectedly." An empty search results page was reported as "app failed to load content." These are correct observations — the content did disappear, the page is empty — but the agent's interpretation was wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Analysis
&lt;/h2&gt;

&lt;p&gt;Across 210 runs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total tokens:&lt;/strong&gt; ~2.1 million (input + output)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total API cost:&lt;/strong&gt; ~$14&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Average per run:&lt;/strong&gt; ~10,000 tokens, ~$0.07&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Average run time:&lt;/strong&gt; 3-4 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human time:&lt;/strong&gt; ~5 hours total (setup, prompt design, and report review)&lt;/li&gt;
&lt;/ul&gt;
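
&lt;p&gt;The per-run averages follow directly from the totals:&lt;/p&gt;

```python
# Recomputing the per-run averages above from the weekly totals.
runs = 210
total_tokens = 2_100_000
total_cost_usd = 14.0

print(total_tokens / runs)               # 10000.0 tokens per run
print(round(total_cost_usd / runs, 2))   # 0.07 dollars per run
```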

&lt;p&gt;For comparison, manual QA testing of those three apps at a similar depth would have taken me roughly 15-20 hours. The AI testing took about 5 hours of my time (setup, prompt design, and report review) plus $14 in API costs.&lt;/p&gt;

&lt;p&gt;That's a meaningful efficiency gain, but it's not zero-effort. The human is still in the loop, reviewing reports and separating signal from noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Honest Take
&lt;/h2&gt;

&lt;p&gt;AI QA testing is not a replacement for human QA. It finds different kinds of bugs through different kinds of exploration. A human tester applies domain knowledge, aesthetic judgment, and intuition about what "feels wrong." An AI agent applies exhaustive combinatorial exploration, patience for repetitive tasks, and zero assumptions about how the app "should" work.&lt;/p&gt;

&lt;blockquote&gt;The most valuable bugs the agent found were the ones I'd never have thought to test for. The most valuable bugs it missed were the ones that required human judgment to even recognize as bugs.
&lt;/blockquote&gt;

&lt;p&gt;The two approaches are complementary. The agent explores the spaces I wouldn't think to explore. I evaluate the findings with context the agent doesn't have. Together, that coverage is better than either alone.&lt;/p&gt;

&lt;p&gt;I plan to keep running these experiments with more apps and more sophisticated prompting strategies. The 39% false positive rate is the number I most want to bring down — that's where the agent goes from "interesting research tool" to "practical QA assistant."&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Drengr is free to use and available on &lt;a href="https://www.npmjs.com/package/drengr" rel="noopener noreferrer"&gt;npm&lt;/a&gt;. It supports Android (physical devices, emulators), iOS simulators (full gesture support), and cloud device farms (BrowserStack, SauceLabs, AWS Device Farm, LambdaTest, Perfecto, Kobiton). Built in Rust. Single binary. No runtime dependencies.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>appium</category>
      <category>mobiledev</category>
      <category>testing</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Why Not Python? The Language Everyone Expected Me to Use for Drengr</title>
      <dc:creator>Sharmin Sirajudeen</dc:creator>
      <pubDate>Mon, 06 Apr 2026 07:50:59 +0000</pubDate>
      <link>https://forem.com/sharminsirajudeen/why-not-python-the-language-everyone-expected-me-to-use-for-drengr-3d8n</link>
      <guid>https://forem.com/sharminsirajudeen/why-not-python-the-language-everyone-expected-me-to-use-for-drengr-3d8n</guid>
      <description>&lt;h1&gt;
  
  
  Why Not Python? The Language Everyone Expected Me to Use for Drengr
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;I'm the creator of &lt;a href="https://drengr.dev" rel="noopener noreferrer"&gt;Drengr&lt;/a&gt;, an MCP server that gives AI agents eyes and hands on mobile devices. I started this blog to share the engineering behind it. No pretending to be a neutral observer writing a think piece — I built this, and I'm here to talk about it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When I started building Drengr — a tool that gives AI agents eyes and hands on mobile devices — the default choice was obvious. Every AI agent project in 2025-2026 is Python. LangChain is Python. CrewAI is Python. AutoGen is Python. Most MCP server implementations are Python. The ecosystem, the tutorials, the community, the hiring market — all Python.&lt;/p&gt;

&lt;p&gt;I chose Rust instead. This is the honest explanation of why, what it cost me, and whether I'd make the same choice again.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Distribution Problem
&lt;/h2&gt;

&lt;p&gt;The single biggest reason I didn't use Python is distribution. Drengr is a developer tool that other people need to install on machines I'll never see. The install experience is the first impression. And with Python, that first impression is often painful.&lt;/p&gt;

&lt;p&gt;Consider what a Python-based Drengr install looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;drengr
&lt;span class="c"&gt;# ERROR: requires Python 3.11+, you have 3.9&lt;/span&gt;
&lt;span class="c"&gt;# or: conflicts with existing package versions&lt;/span&gt;
&lt;span class="c"&gt;# or: needs a virtual environment&lt;/span&gt;
&lt;span class="c"&gt;# or: pip install fails because of a C extension dependency&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every Android engineer has been on the receiving end of this. You follow the install instructions for some Python-based tooling — a test runner, a code generator, a device farm client — and you're greeted with a &lt;code&gt;ModuleNotFoundError&lt;/code&gt; or a version conflict with something else in your environment. I've lost count of the hours I've spent debugging other people's dependency trees instead of doing my actual work.&lt;/p&gt;

&lt;p&gt;The Rust alternative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://drengr.dev/install.sh | bash
drengr doctor

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One binary. No runtime. No dependencies. No virtual environment. No version conflicts. It either works or it doesn't, and if it doesn't, it's a bug I can actually reproduce and fix — because the binary is the same on every machine.&lt;/p&gt;

&lt;p&gt;This isn't a theoretical concern. Drengr interacts with ADB, simctl, and Appium — tools that already have their own dependency and version requirements. Adding Python's dependency management on top of that would create a combinatorial explosion of "works on my machine" problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cold Start Matters for MCP
&lt;/h2&gt;

&lt;p&gt;Drengr runs as an MCP server. When Claude Desktop or Cursor connects to it, the server needs to start and respond to the first tool call. The user is waiting. The AI agent is waiting. Every millisecond of startup time is friction.&lt;/p&gt;

&lt;p&gt;Drengr's cold start to first MCP response: &lt;strong&gt;~15ms&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A Python MCP server with typical imports (json, asyncio, an HTTP client, a CLI framework) starts in &lt;strong&gt;200-500ms&lt;/strong&gt;. Add heavier libraries — image processing, XML parsing, the Anthropic SDK — and you're looking at &lt;strong&gt;1-2 seconds&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For a one-off script, nobody cares. For a tool that an AI agent might start and stop multiple times during a session, or that needs to respond to tool calls in real time during an autonomous OODA loop, the difference is significant. The agent's thinking time is already the bottleneck — the tool layer shouldn't add to it.&lt;/p&gt;
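
&lt;p&gt;You can probe that floor yourself. A rough measurement, in Python naturally; absolute numbers vary by machine, and the module list just mirrors the typical imports above:&lt;/p&gt;

```python
import subprocess, sys, time

# Rough cold-start probe: time a fresh interpreter that does nothing
# but import a few "typical MCP server" stdlib modules and exit.
# The result is the floor a Python process pays before it can answer
# its first tool call; heavier third-party imports only add to it.
start = time.perf_counter()
subprocess.run([sys.executable, "-c", "import json, asyncio, argparse"], check=True)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"interpreter start plus stdlib imports: {elapsed_ms:.0f} ms")
```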

&lt;h2&gt;
  
  
  Memory and Reliability
&lt;/h2&gt;

&lt;p&gt;Drengr manages long-running device sessions. An autonomous test run might interact with a device for 30 minutes or more, capturing hundreds of screenshots, parsing hundreds of UI trees, maintaining situation engine state. This is the kind of workload where Python's memory management gets interesting.&lt;/p&gt;

&lt;p&gt;Python's garbage collector is good enough for most applications. But "good enough" means occasional GC pauses. It means memory growing over time as objects are allocated and collected. It means that a screenshot buffer you thought was freed is actually being held by a reference cycle until the GC gets around to collecting it.&lt;/p&gt;
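
&lt;p&gt;The reference-cycle case is easy to demonstrate in a few lines. In this sketch the &lt;code&gt;bytearray&lt;/code&gt; stands in for a screenshot buffer:&lt;/p&gt;

```python
import gc

# Two objects that hold references to each other stay resident after
# the last external reference dies, until the cyclic collector runs.
gc.disable()  # make the moment of collection explicit

class Frame:
    def __init__(self):
        self.pixels = bytearray(1_000_000)  # stand-in for a screenshot buffer
        self.linked = None

a, b = Frame(), Frame()
a.linked, b.linked = b, a   # reference cycle
del a, b                    # unreachable, but refcounts never reach zero

freed = gc.collect()        # only now are the buffers actually released
print(freed != 0)           # True: it took the cyclic GC to free them
gc.enable()
```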

&lt;p&gt;Rust's ownership model means memory is freed deterministically — at the exact point where the owning variable goes out of scope. No GC pauses. No reference cycles. No "why is my process using 2GB after running for an hour?" investigations. Drengr's memory usage is flat and predictable regardless of session length.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Python Would Have Given Me
&lt;/h2&gt;

&lt;p&gt;I want to be fair. Python would have given me real advantages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prototyping speed.&lt;/strong&gt; The first working version of Drengr took me about three weeks in Rust. In Python, I estimate it would have taken one week. The borrow checker adds friction during exploration — when I'm trying three different approaches to screen parsing, Rust demands I think through ownership at each step. Python lets me hack first and clean up later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The AI ecosystem.&lt;/strong&gt; When I needed to add LLM integration for the OODA loop, the Python path was obvious: &lt;code&gt;pip install anthropic&lt;/code&gt;, call the API, get structured responses. In Rust, I made raw HTTP calls to the API and wrote my own response parsing. It works fine, but it was more work than it needed to be.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Community and contributors.&lt;/strong&gt; More developers know Python than Rust. If Drengr were Python, more people could read the code, understand it, and potentially contribute. Rust's learning curve is a barrier to contribution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Faster iteration on AI-adjacent features.&lt;/strong&gt; Some of Drengr's planned features — smarter situation analysis, better stuck detection, screen diffing — would benefit from rapid experimentation. Python is better for that kind of exploratory work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Python Would Have Cost Me
&lt;/h2&gt;

&lt;p&gt;But the costs are real too.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every user becomes a debugger.&lt;/strong&gt; With a Python tool, a meaningful percentage of support interactions would be "it doesn't install on my machine" or "it crashes with this import error." I've been that user enough times with other people's Python tools to know exactly how it goes. The first time someone opens an issue about a dependency conflict, I'd spend a day I could have spent building features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The packaging problem is unsolved.&lt;/strong&gt; PyInstaller, Nuitka, cx_Freeze, Briefcase — Python has many tools for creating standalone executables, and all of them have sharp edges. Platform-specific behavior, missing dependencies at runtime, binary size inflation. Rust's &lt;code&gt;cargo build --release&lt;/code&gt; produces a binary that just works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concurrency complexity.&lt;/strong&gt; Drengr's MCP server, SDK event listener, and OODA loop can all be active simultaneously, sharing state about the current device session. In Python, this means threading (with the GIL), multiprocessing (with serialization overhead), or asyncio (with the colored function problem). In Rust, the type system enforces safe concurrency. The compiler catches data races. I don't have to choose between correctness and performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Process weight.&lt;/strong&gt; A Python process carries the interpreter, the standard library, and all imported modules. Drengr running as a Rust binary uses about &lt;strong&gt;8MB&lt;/strong&gt; of resident memory. An equivalent Python process would use &lt;strong&gt;40-80MB&lt;/strong&gt;. For a background tool running alongside an IDE, a browser, and whatever else the developer has open, this matters.&lt;/p&gt;
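
&lt;p&gt;If you want the baseline for your own machine, here's a quick Unix-only probe of the interpreter's resident set size:&lt;/p&gt;

```python
import resource, sys

# Peak resident set size of this interpreter, which has imported
# nothing beyond the two stdlib modules above. ru_maxrss is reported
# in kilobytes on Linux and bytes on macOS, so normalize to MB.
raw = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
rss_mb = raw / (1024 * 1024) if sys.platform == "darwin" else raw / 1024
print(f"resident set size: {rss_mb:.1f} MB")
```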

&lt;h2&gt;
  
  
  The Decision Framework
&lt;/h2&gt;

&lt;p&gt;Here's how I think about it now, after having shipped Drengr in Rust:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Python when:&lt;/strong&gt; You're building an AI application where the AI logic is the product. When you need rapid experimentation with LLM APIs, prompt engineering, and agent orchestration. When your users are data scientists or ML engineers who already have Python installed and configured. When distribution is pip or Docker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Rust when:&lt;/strong&gt; You're building infrastructure that AI applications consume. When the tool needs to install in one command on any machine. When cold start time matters. When the tool runs for long periods and memory predictability matters. When you're a solo developer and can't afford to spend time debugging environment issues on machines you've never seen.&lt;/p&gt;

&lt;p&gt;Drengr is infrastructure, not an application. It doesn't contain AI logic — it provides tools that AI logic consumes. That distinction made Rust the right choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Would I Do It Again?
&lt;/h2&gt;

&lt;p&gt;Yes. Without hesitation.&lt;/p&gt;

&lt;p&gt;The two extra weeks of development time cost me once. The zero-friction install experience pays off every time someone tries Drengr. The 15ms cold start pays off on every MCP tool call. The predictable memory usage pays off on every long-running test session.&lt;/p&gt;

&lt;p&gt;Python is the right language for AI applications. Rust is the right language for AI infrastructure. Drengr is infrastructure.&lt;/p&gt;

&lt;p&gt;If you want to read about why I chose Rust over C and C++ — the other systems languages I seriously considered — I've written about that &lt;a href="https://dev.to/blog/why-rust-not-c-cpp"&gt;here&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Drengr is free to use and available on &lt;a href="https://www.npmjs.com/package/drengr" rel="noopener noreferrer"&gt;npm&lt;/a&gt;. It supports Android (physical devices, emulators), iOS simulators (full gesture support), and cloud device farms (BrowserStack, SauceLabs, AWS Device Farm, LambdaTest, Perfecto, Kobiton). Built in Rust. Single binary. No runtime dependencies.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>appium</category>
      <category>mobiledev</category>
      <category>testing</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Corellium Sold for $170M. Here's What They Couldn't Do.</title>
      <dc:creator>Sharmin Sirajudeen</dc:creator>
      <pubDate>Mon, 06 Apr 2026 07:44:05 +0000</pubDate>
      <link>https://forem.com/sharminsirajudeen/corellium-sold-for-170m-heres-what-they-couldnt-do-5bh1</link>
      <guid>https://forem.com/sharminsirajudeen/corellium-sold-for-170m-heres-what-they-couldnt-do-5bh1</guid>
      <description>&lt;h1&gt;
  
  
  Corellium Sold for $170M. Here's What They Couldn't Do.
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;I'm the creator of &lt;a href="https://drengr.dev" rel="noopener noreferrer"&gt;Drengr&lt;/a&gt;, an MCP server that gives AI agents eyes and hands on mobile devices. I started this blog to share the engineering behind it. No pretending to be a neutral observer writing a think piece — I built this, and I'm here to talk about it.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  $170M for a Virtual Phone
&lt;/h2&gt;

&lt;p&gt;Cellebrite — the company law enforcement calls when they need to crack a phone — just acquired Corellium for $170 million. Corellium virtualizes iOS and Android devices in the cloud. You get a full device image running on a remote server, with root access, JTAG debugging, and kernel introspection. Security researchers use it to hunt vulnerabilities. Governments use it for forensic analysis.&lt;/p&gt;

&lt;p&gt;$170M. For the ability to look inside a phone.&lt;/p&gt;

&lt;p&gt;That number tells you something: programmatic access to mobile devices is not a niche. It's infrastructure. And it's being valued like infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Corellium Does (and Doesn't)
&lt;/h2&gt;

&lt;p&gt;Corellium gives you a virtualized device. You can boot it, inspect its memory, modify its filesystem, attach a debugger. It's a microscope.&lt;/p&gt;

&lt;p&gt;What it can't do: use the phone like a human.&lt;/p&gt;

&lt;p&gt;It can't tap a button. It can't type a search query. It can't swipe through a feed, navigate a checkout flow, or verify that a login screen actually works after a deploy. It wasn't built for that. It was built for reverse engineering and security research — looking at the internals of the device, not interacting with its UI.&lt;/p&gt;

&lt;p&gt;That's a fundamentally different problem. Corellium answers: "What is this device doing internally?" Drengr answers: "Can an AI agent operate this device the way a user would?"&lt;/p&gt;

&lt;h2&gt;
  
  
  The Missing Layer: Actuation
&lt;/h2&gt;

&lt;p&gt;The mobile device stack has three layers of programmatic access:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Observation&lt;/strong&gt; — See what's on screen. Take screenshots, read the UI tree, dump logs. Every testing tool does this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: Virtualization&lt;/strong&gt; — Run the device as a virtual machine. Inspect memory, modify the OS, simulate hardware. This is Corellium's $170M business.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: Actuation&lt;/strong&gt; — Interact with the device as a user. Tap, type, swipe, long press, launch apps, navigate flows. Not through scripts with hardcoded selectors, but through an AI agent that sees the screen and decides what to do.&lt;/p&gt;

&lt;p&gt;Layers 1 and 2 have billion-dollar companies behind them. Layer 3 — AI-driven actuation on real mobile devices — is where the gap is. That's the layer Drengr occupies.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Drengr Fills the Gap
&lt;/h2&gt;

&lt;p&gt;Drengr is a single Rust binary that exposes mobile devices to AI agents via the Model Context Protocol (MCP). Three tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;drengr_look&lt;/code&gt;&lt;/strong&gt; — The agent sees the screen, either as a compact ~300-token text description or as an annotated image with numbered elements. Text-first by default — 100x cheaper than sending screenshots.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;drengr_do&lt;/code&gt;&lt;/strong&gt; — The agent acts. Tap, type, swipe, long press, back, home, launch, scroll — 13 actions that cover the full interaction surface. Each action returns a situation report: what changed, what appeared, what disappeared, whether the app crashed or got stuck.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;drengr_query&lt;/code&gt;&lt;/strong&gt; — The agent asks questions. What's the current activity? Did the app crash? What HTTP calls happened? What does the UI tree look like?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI client — Claude Desktop, Cursor, Windsurf, VS Code — is the brain. Drengr is the hands. The agent looks at a screen it has never seen before, reasons about what to do, and does it. No pre-programmed selectors. No XPath. No brittle scripts that break when the designer moves a button.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;com.example.app&lt;/span&gt;
&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;login&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Log&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;user@test.com&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;password123"&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;checkout&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;headphones&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cart&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;purchase"&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;90s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That YAML survived three redesigns. The AI adapted every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP Matters Here
&lt;/h2&gt;

&lt;p&gt;The MCP ecosystem is exploding. MCPNest indexes over 5,000 MCP servers. MCP Shield audits them for supply chain attacks. Scoring platforms rank them by quality. The protocol is becoming the standard interface between AI agents and external tools — the same way LSP became the standard between editors and language servers.&lt;/p&gt;

&lt;p&gt;Drengr is the MCP server for mobile devices. It connects to any MCP-compatible AI client without modification. When a better model comes out, you swap the brain. The hands stay the same. When someone builds a better orchestrator, it works with Drengr out of the box.&lt;/p&gt;

&lt;p&gt;This is why the architecture matters more than the features. Corellium is a proprietary platform — you use their cloud, their API, their tools. Drengr is a protocol-native server. It plugs into the ecosystem that's forming right now, not a walled garden.&lt;/p&gt;

&lt;h2&gt;
  
  
  The $170M Signal
&lt;/h2&gt;

&lt;p&gt;When Cellebrite pays $170M for Corellium, they're not buying a product. They're buying a position in the mobile device access market. They're saying: programmatic control of mobile devices is critical infrastructure, and we'll pay nine figures to own a piece of it.&lt;/p&gt;

&lt;p&gt;Virtualization was the first wave. Observation was the zeroth. Actuation — letting AI agents operate devices autonomously — is the next.&lt;/p&gt;

&lt;p&gt;The companies that figured out how to let machines &lt;em&gt;look at&lt;/em&gt; phones built hundred-million-dollar businesses. The companies that figure out how to let machines &lt;em&gt;use&lt;/em&gt; phones will build bigger ones.&lt;/p&gt;

&lt;p&gt;I don't know if Drengr becomes that. But I know the layer it occupies — AI-native device actuation via an open protocol — is the layer that doesn't exist yet at scale. And $170M says the market is paying attention.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Drengr Stands Today
&lt;/h2&gt;

&lt;p&gt;Real devices. Real interactions. Real results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Android&lt;/strong&gt;: physical phones, emulators, cloud device farms (BrowserStack, SauceLabs, AWS Device Farm, LambdaTest, Perfecto, Kobiton)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;iOS&lt;/strong&gt;: full simulator support — tap, type, swipe, pinch zoom, long press, scroll&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-device&lt;/strong&gt;: connect Android and iOS simultaneously, switch with a parameter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Any AI client&lt;/strong&gt;: Claude Desktop, Cursor, Windsurf, VS Code — anything that speaks MCP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One install&lt;/strong&gt;: &lt;code&gt;npm install -g drengr&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The binary is 5MB. Written in Rust. No runtime dependencies. It runs on your machine, talks to your devices, and gives any AI agent the ability to operate a phone.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Drengr is free to use and available on &lt;a href="https://www.npmjs.com/package/drengr" rel="noopener noreferrer"&gt;npm&lt;/a&gt;. It supports Android (physical devices, emulators), iOS simulators (full gesture support), and cloud device farms (BrowserStack, SauceLabs, AWS Device Farm, LambdaTest, Perfecto, Kobiton). Built in Rust. Single binary. No runtime dependencies.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>appium</category>
      <category>mobiledev</category>
      <category>testing</category>
      <category>mcp</category>
    </item>
    <item>
      <title>From Intent Classification to Open-Ended Action Spaces: Why Mobile Testing Needed a New Paradigm</title>
      <dc:creator>Sharmin Sirajudeen</dc:creator>
      <pubDate>Mon, 06 Apr 2026 01:50:48 +0000</pubDate>
      <link>https://forem.com/sharminsirajudeen/from-intent-classification-to-open-ended-action-spaces-why-mobile-testing-needed-a-new-paradigm-2lpb</link>
      <guid>https://forem.com/sharminsirajudeen/from-intent-classification-to-open-ended-action-spaces-why-mobile-testing-needed-a-new-paradigm-2lpb</guid>
      <description>&lt;h1&gt;
  
  
  From Intent Classification to Open-Ended Action Spaces: Why Mobile Testing Needed a New Paradigm
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;I'm the creator of &lt;a href="https://drengr.dev" rel="noopener noreferrer"&gt;Drengr&lt;/a&gt;, an MCP server that gives AI agents eyes and hands on mobile devices. I started this blog to share the engineering behind it. No pretending to be a neutral observer writing a think piece — I built this, and I'm here to talk about it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Google recently shipped &lt;a href="https://github.com/google-ai-edge/gallery" rel="noopener noreferrer"&gt;AI Edge Gallery&lt;/a&gt; — an on-device AI sandbox app with a feature called "Mobile Actions" that lets you control your phone with natural language. Say "turn on the flashlight," and a 270M parameter model called FunctionGemma figures out the intent, extracts the parameters, and dispatches the right function call. It runs entirely offline. It clocks 1,916 tokens/sec prefill on a Pixel 7 Pro. And it's impressive.&lt;/p&gt;

&lt;p&gt;But it also reveals a ceiling.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Closed-World Assumption
&lt;/h2&gt;

&lt;p&gt;FunctionGemma is, at its core, a tiny NLU engine performing intent classification and slot filling. You speak. It classifies your sentence into one of a fixed set of intents — &lt;code&gt;turnOnFlashlight&lt;/code&gt;, &lt;code&gt;createCalendarEvent&lt;/code&gt;, &lt;code&gt;showLocationOnMap&lt;/code&gt; — and extracts the relevant slots: a time, a location, a contact name. The native app code then dispatches the structured output to the corresponding platform API.&lt;/p&gt;

&lt;p&gt;This is a &lt;strong&gt;closed-world system&lt;/strong&gt;. Every possible action is known at compile time. Every function is pre-registered. Every slot is pre-defined. The model's job is pattern matching over a bounded action space — the same fundamental design that Dialogflow, Alexa Skills, and SiriKit Intents have used for years, now running on-device at remarkable speed. These platforms have evolved over time — Apple's App Intents, Alexa's generative AI features — but the underlying intent-schema architecture remains fundamentally closed-world by design.&lt;/p&gt;
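
&lt;p&gt;The whole design fits in a dozen lines. The intent names below come from the article; the table and the dispatch logic are my illustration, not FunctionGemma's actual output format:&lt;/p&gt;

```python
# Miniature of a closed-world intent system. Every action and every
# slot is registered up front; anything outside the table is not
# merely unsupported, it is unrepresentable.
INTENTS = {
    "turnOnFlashlight": [],
    "createCalendarEvent": ["title", "time"],
    "showLocationOnMap": ["location"],
}

def dispatch(intent: str, slots: dict) -> str:
    if intent not in INTENTS:
        raise ValueError(f"no registered intent named {intent!r}")
    missing = [s for s in INTENTS[intent] if s not in slots]
    if missing:
        raise ValueError(f"missing slots: {missing}")
    return f"call {intent} with {slots}"

print(dispatch("showLocationOnMap", {"location": "Oslo"}))
# An open-world request simply has no row in the table:
try:
    dispatch("applyPromoCodeAtCheckout", {"code": "SPRING10"})
except ValueError as err:
    print(err)
```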

&lt;p&gt;It works beautifully for what it is. But it cannot do what it has never been told exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Open-World Problem
&lt;/h2&gt;

&lt;p&gt;Now consider a different scenario. You're a QA engineer. You need to verify that a flower delivery app correctly applies a promo code at checkout, that the cart total updates, and that the confirmation screen renders the right order summary. The app was built by your team. No one pre-registered its UI elements as callable functions. No one fine-tuned a model on its screen taxonomy.&lt;/p&gt;

&lt;p&gt;This is an &lt;strong&gt;open-world problem&lt;/strong&gt;. The action space is unbounded. The UI is arbitrary. The screens have never been seen by the testing agent before.&lt;/p&gt;

&lt;p&gt;This is the problem &lt;a href="https://www.npmjs.com/package/drengr" rel="noopener noreferrer"&gt;Drengr&lt;/a&gt; solves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Text-First Perception, Schema-Never
&lt;/h2&gt;

&lt;p&gt;Drengr is an MCP (Model Context Protocol) server — the open protocol that connects AI models to external tools and data sources, in the same way LSP (Language Server Protocol) connects editors to language servers. Drengr is purpose-built for mobile UI interaction. It doesn't require your app to expose an API. It doesn't need accessibility labels (though it uses them when available). It doesn't ask you to define intents or register functions.&lt;/p&gt;

&lt;p&gt;Instead, it operates through three primitives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;drengr_look&lt;/code&gt;&lt;/strong&gt; — Captures the current screen state as a compact text description (~300 tokens per screen) or an annotated image with numbered elements. Text-first by default — it escalates to vision only when fewer than 60% of elements have labels. Roughly 10x cheaper than sending screenshots every step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;drengr_do&lt;/code&gt;&lt;/strong&gt; — Performs 13 actions on the device: tap, type, swipe, long press, back, home, launch, wait, key press, install, clear and type, scroll to top, scroll to bottom. Each action returns a situation report — a structured diff of what changed on screen (new elements, disappeared elements, crash detection, stuck detection).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;drengr_query&lt;/code&gt;&lt;/strong&gt; — Structured queries about device and app state: list connected devices, check current activity, detect crashes, find elements by text, explore app navigation, read network calls, check keyboard state, dump the raw UI tree, and more.&lt;/li&gt;
&lt;/ul&gt;
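&lt;p&gt;The situation report that &lt;code&gt;drengr_do&lt;/code&gt; returns is essentially a set diff over what the screen shows. A minimal sketch of the idea (the report shape here is an assumption, not Drengr's actual format):&lt;/p&gt;

```typescript
// A situation report pictured as a set diff over the screen's visible labels.
// The shape (appeared / disappeared / stuck) is illustrative only.
function situationReport(before: string[], after: string[]) {
  const b = new Set(before);
  const a = new Set(after);
  const appeared = after.filter((e) => !b.has(e));
  const disappeared = before.filter((e) => !a.has(e));
  // Stuck detection in the article's sense: the action changed nothing on screen.
  const stuck = appeared.length + disappeared.length === 0;
  return { appeared, disappeared, stuck };
}

const report = situationReport(
  ["Checkout", "Your Cart: 3 items"],
  ["Payment method", "Your Cart: 3 items"],
);
console.log(report.appeared, report.disappeared, report.stuck);
```

&lt;p&gt;A real report also covers crash detection, which needs process-level signals that a pure screen diff can't supply.&lt;/p&gt;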

&lt;p&gt;The AI client — Claude Desktop, Cursor, Windsurf, VS Code, any MCP-compatible host — acts as the brain. Drengr provides the eyes and hands. The agent looks at a screen it has never seen, understands what's there, decides what to do, and does it. No pre-training on your app. No test script maintenance. No brittle XPath selectors that break every sprint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Distinction Matters
&lt;/h2&gt;

&lt;p&gt;The difference between closed-world function dispatch and open-world UI interaction is not incremental. It is architectural.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Closed-World (FunctionGemma)&lt;/th&gt;
&lt;th&gt;Open-World (Drengr)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Action space&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fixed, pre-defined functions&lt;/td&gt;
&lt;td&gt;Arbitrary, discovered at runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;UI knowledge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Compiled into the model&lt;/td&gt;
&lt;td&gt;Observed per-screen via text scenes + vision fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;New app support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Requires fine-tuning or function registration&lt;/td&gt;
&lt;td&gt;Works immediately against any app&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Failure mode&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"I don't have a function for that"&lt;/td&gt;
&lt;td&gt;"I can see the screen — let me figure it out"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NLU → function dispatch&lt;/td&gt;
&lt;td&gt;Perception → reasoning → action&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;FunctionGemma is a classifier. Drengr is an agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The MCP Advantage
&lt;/h2&gt;

&lt;p&gt;Drengr is built as an MCP server — the same architectural pattern that made LSP the backbone of every modern code editor. Anthropic themselves draw this parallel in the MCP specification: both protocols solve the M×N integration problem. LSP connects M editors to N language servers; MCP connects M AI clients to N tool servers. Both are built on JSON-RPC 2.0 messages.&lt;/p&gt;
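&lt;p&gt;On the wire, every tool invocation is an ordinary JSON-RPC 2.0 request. The envelope below follows the MCP &lt;code&gt;tools/call&lt;/code&gt; shape; the &lt;code&gt;drengr_do&lt;/code&gt; argument names are illustrative, not the server's documented schema:&lt;/p&gt;

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "drengr_do",
    "arguments": { "action": "tap", "element": 1 }
  }
}
```

&lt;p&gt;Any client that can emit that envelope can drive the server, which is exactly why the brain is swappable.&lt;/p&gt;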

&lt;p&gt;This means Drengr isn't married to a single LLM. Today, a developer can wire up Claude Code, Cursor, or Windsurf as the reasoning layer, and Drengr handles the device interaction. Tomorrow, when a better model drops, you swap the brain without touching the tools.&lt;/p&gt;

&lt;p&gt;This separation of concerns — &lt;strong&gt;the model thinks, the server acts&lt;/strong&gt; — is what makes the architecture durable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who This Is For
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;QA engineers&lt;/strong&gt; tired of maintaining Appium scripts that break every release cycle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile developers&lt;/strong&gt; who want to validate user flows without writing test code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engineering leads&lt;/strong&gt; exploring agentic testing as a force multiplier for small teams.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI tooling teams&lt;/strong&gt; evaluating MCP-compatible infrastructure for mobile automation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Testing Problem, Reframed
&lt;/h2&gt;

&lt;p&gt;Traditional mobile test automation asks: &lt;em&gt;"How do I script a robot to press the right buttons?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Drengr asks: &lt;em&gt;"What if the robot could just look at the screen and figure it out?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That reframing — from scripted automation to perceptual agency — is the paradigm shift. It's the difference between giving someone a map with every turn pre-marked, and giving them eyes and the ability to navigate.&lt;/p&gt;

&lt;p&gt;Google proved that on-device NLU can dispatch to a handful of OS functions at blazing speed. Drengr proves that an LLM with the right tools can operate across any app, any screen, any flow — without ever being told what to expect.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Drengr is free to use and available on &lt;a href="https://www.npmjs.com/package/drengr" rel="noopener noreferrer"&gt;npm&lt;/a&gt;. It supports Android (physical devices, emulators), iOS simulators (full gesture support), and cloud device farms (BrowserStack, SauceLabs, AWS Device Farm, LambdaTest, Perfecto, Kobiton). Built in Rust. Single binary. No runtime dependencies.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>appium</category>
      <category>mobiledev</category>
      <category>testing</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Connecting Claude to a Real Phone via MCP</title>
      <dc:creator>Sharmin Sirajudeen</dc:creator>
      <pubDate>Sun, 05 Apr 2026 03:01:57 +0000</pubDate>
      <link>https://forem.com/sharminsirajudeen/connecting-claude-to-a-real-phone-via-mcp-dfj</link>
      <guid>https://forem.com/sharminsirajudeen/connecting-claude-to-a-real-phone-via-mcp-dfj</guid>
      <description>&lt;h1&gt;
  
  
  I Gave Claude My Phone and It Tested My App
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;I'm the creator of &lt;a href="https://drengr.dev" rel="noopener noreferrer"&gt;Drengr&lt;/a&gt;, an MCP server that gives AI agents eyes and hands on mobile devices. I started this blog to share the engineering behind it. No pretending to be a neutral observer writing a think piece — I built this, and I'm here to talk about it.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup: 90 seconds
&lt;/h2&gt;

&lt;p&gt;I plugged an Android phone into my MacBook. Opened Claude Desktop. Added one line to the MCP config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"drengr"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"drengr"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire setup. No Appium. No Selenium grid. No environment variables pointing to Java homes and Android SDK paths. Just &lt;code&gt;npm install -g drengr&lt;/code&gt;, plug in the phone, and tell Claude what to do.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Open YouTube and find a video about MCP servers"
&lt;/h2&gt;

&lt;p&gt;I typed that into Claude. Here's what happened over the next 40 seconds:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Claude called &lt;code&gt;drengr_look&lt;/code&gt; — got a text description of the home screen&lt;/li&gt;
&lt;li&gt;It saw YouTube in the app list and called &lt;code&gt;drengr_do&lt;/code&gt; to launch it&lt;/li&gt;
&lt;li&gt;YouTube opened. Claude called &lt;code&gt;drengr_look&lt;/code&gt; again — got the YouTube home feed as a list of labeled elements&lt;/li&gt;
&lt;li&gt;It tapped the search bar, typed "MCP servers," and hit search&lt;/li&gt;
&lt;li&gt;Results appeared. Claude read the titles and tapped the most relevant video&lt;/li&gt;
&lt;li&gt;The video started playing. Claude confirmed: "Found and playing 'MCP Server Explained' by IBM Technology"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Six actions. No scripts. No selectors. Claude read the screen, made decisions, and executed actions — exactly like a human would, except it took 40 seconds instead of 2 minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The moment it got interesting
&lt;/h2&gt;

&lt;p&gt;I then asked: "Now go to Shorts and swipe through a few."&lt;/p&gt;

&lt;p&gt;Claude navigated to the Shorts tab, swiped up three times, read the titles of each short, and told me what it saw. It handled the vertical scroll, the full-screen video player, the overlay buttons — all without any special configuration.&lt;/p&gt;

&lt;p&gt;This is the kind of interaction that breaks traditional test frameworks. Shorts uses a custom renderer, its UI tree is minimal, and its scroll behavior is non-standard. A selector-based test would need a custom handler for every quirk. Claude just... used the app.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's actually happening under the hood
&lt;/h2&gt;

&lt;p&gt;Claude doesn't see the phone directly. Drengr sits in between:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude&lt;/strong&gt; → calls MCP tools → &lt;strong&gt;Drengr&lt;/strong&gt; → talks to the device → &lt;strong&gt;Phone&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Drengr handles the messy parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capturing the screen and parsing the UI tree into a format the AI can read&lt;/li&gt;
&lt;li&gt;Translating "tap element 3" into the right platform command&lt;/li&gt;
&lt;li&gt;Reporting back what changed after every action (the situation report)&lt;/li&gt;
&lt;li&gt;Detecting if the app crashed or the UI got stuck&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude handles the smart parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Looking at the screen description and deciding what to do&lt;/li&gt;
&lt;li&gt;Adapting when something unexpected happens&lt;/li&gt;
&lt;li&gt;Knowing when the task is complete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation is deliberate. The AI is the brain, Drengr is the hands. When better AI models come out, Drengr doesn't need to change — the hands stay the same, the brain gets smarter.&lt;/p&gt;
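&lt;p&gt;That division of labor is, at bottom, a loop. A minimal sketch, with &lt;code&gt;look&lt;/code&gt;, &lt;code&gt;decide&lt;/code&gt;, and &lt;code&gt;act&lt;/code&gt; standing in for &lt;code&gt;drengr_look&lt;/code&gt;, the LLM, and &lt;code&gt;drengr_do&lt;/code&gt; (none of this is Drengr's actual API):&lt;/p&gt;

```typescript
// Observe-decide-act loop. The three callbacks are stand-ins for
// drengr_look, the LLM, and drengr_do; the types are illustrative.
function runTask(
  look: () => string,
  decide: (scene: string) => { action: string } | "done",
  act: (step: { action: string }) => void,
  maxSteps: number = 20,
): number {
  let steps = 0;
  while (maxSteps > steps) {
    const scene = look();       // eyes: text description of the screen
    const next = decide(scene); // brain: pick the next step, or stop
    if (next === "done") return steps;
    act(next);                  // hands: perform the step on the device
    steps += 1;
  }
  throw new Error("task did not finish within " + maxSteps + " steps");
}

// A fake two-screen "device" to show the flow end to end.
let screen = "home: [1] YouTube (App)";
const trace: string[] = [];
const result = runTask(
  () => screen,
  (s) => (s.startsWith("home") ? { action: "launch YouTube" } : "done"),
  (step) => { trace.push(step.action); screen = "youtube: [1] Search (Button)"; },
);
console.log(result, trace);
```

&lt;p&gt;Swapping the brain means swapping only the &lt;code&gt;decide&lt;/code&gt; callback; the eyes and hands never change.&lt;/p&gt;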

&lt;h2&gt;
  
  
  The things that surprised me
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;It recovers from mistakes.&lt;/strong&gt; At one point Claude tapped the wrong video. It noticed the title didn't match what it expected, pressed back, and picked the right one. No retry logic, no error handling code — the AI just adapted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It works across apps.&lt;/strong&gt; I asked Claude to "check my notifications" after the YouTube test. It pressed home, pulled down the notification shade, read the notifications, and summarized them. No app-specific setup needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Text mode is almost always enough.&lt;/strong&gt; Out of ~30 actions across the session, Claude only needed the annotated screenshot twice — both times on screens with custom-rendered content. The rest worked with the ~300 token text description. That's 10x cheaper than sending images every step.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it can't do (yet)
&lt;/h2&gt;

&lt;p&gt;I'm not going to pretend this replaces manual testing today. Some limits are real:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt; — Each step takes 2-3 seconds (LLM round-trip). A human tester can tap faster. But the human can't run 50 test flows in parallel on a device farm.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual verification&lt;/strong&gt; — Claude can tell if an element exists, but not if it "looks right." Color, alignment, spacing — these need human eyes or a visual regression tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex gestures&lt;/strong&gt; — Standard taps, swipes, long presses, and pinch zooms work. But game-specific multi-touch patterns aren't there yet.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The sweet spot today is regression testing: "does the checkout flow still work after this deploy?" That's the 80% of QA time that's spent running the same flows every sprint. Let the AI handle that, and let humans focus on exploratory testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; drengr
drengr doctor          &lt;span class="c"&gt;# check your setup&lt;/span&gt;
drengr setup &lt;span class="nt"&gt;--client&lt;/span&gt; claude-desktop  &lt;span class="c"&gt;# generate MCP config&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Connect a device, open your AI client, and tell it what to test. The first time an AI agent navigates your app without a single line of test code, you'll understand why I built this.&lt;/p&gt;

</description>
      <category>appium</category>
      <category>mobiledev</category>
      <category>testing</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Your Mobile QA Team Is Still Writing XPath. In 2026.</title>
      <dc:creator>Sharmin Sirajudeen</dc:creator>
      <pubDate>Sun, 05 Apr 2026 02:54:42 +0000</pubDate>
      <link>https://forem.com/sharminsirajudeen/your-mobile-qa-team-is-still-writing-xpath-in-2026-104g</link>
      <guid>https://forem.com/sharminsirajudeen/your-mobile-qa-team-is-still-writing-xpath-in-2026-104g</guid>
      <description>&lt;h1&gt;
  
  
  Your Mobile QA Team Is Still Writing XPath. In 2026.
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;I'm the creator of &lt;a href="https://drengr.dev" rel="noopener noreferrer"&gt;Drengr&lt;/a&gt;, an MCP server that gives AI agents eyes and hands on mobile devices. I started this blog to share the engineering behind it. No pretending to be a neutral observer writing a think piece — I built this, and I'm here to talk about it.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The test that breaks every sprint
&lt;/h2&gt;

&lt;p&gt;You know the drill. Your QA engineer writes a beautiful test suite. Login, browse catalog, add to cart, checkout. Fifty selectors, careful waits, retry logic for flaky network calls. It passes on Monday.&lt;/p&gt;

&lt;p&gt;Tuesday, the design team moves the checkout button. Three selectors break. The test fails. The QA engineer spends half a day updating locators. The test passes again.&lt;/p&gt;

&lt;p&gt;Wednesday, a new feature adds a bottom sheet that overlaps the cart icon. The tap lands on the sheet instead of the cart. The test fails. Another half day.&lt;/p&gt;

&lt;p&gt;This cycle repeats every sprint, in every mobile team, everywhere. The test suite doesn't test the app anymore — it tests whether the selectors still match the UI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The root cause: selectors were never the right abstraction
&lt;/h2&gt;

&lt;p&gt;XPath, resource IDs, accessibility identifiers — they're all addresses. "Tap the element at this path in the view hierarchy." The moment the hierarchy changes, the address is wrong.&lt;/p&gt;

&lt;p&gt;Humans don't navigate apps by address. They look at the screen, see "Checkout," and tap it. They don't care that the button moved from &lt;code&gt;//android.widget.Button[@resource-id='checkout_btn']&lt;/code&gt; to &lt;code&gt;//android.widget.FrameLayout/android.widget.Button[2]&lt;/code&gt;. They just see the button and tap it.&lt;/p&gt;

&lt;p&gt;AI agents can do the same thing — if you give them the screen, not a selector tree.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "giving them the screen" looks like
&lt;/h2&gt;

&lt;p&gt;When an AI agent connects to Drengr, it asks: "What's on screen?" Drengr responds with either a compact text description:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1] "Checkout" (Button)
[2] "Your Cart: 3 items" (TextView)
[3] "Remove" (Button)
[4] "Continue Shopping" (Button)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or an annotated image with numbered elements. The AI reads this, decides "tap element 1," and calls &lt;code&gt;drengr_do&lt;/code&gt;. After the action, it gets a situation report telling it what changed.&lt;/p&gt;

&lt;p&gt;No selectors. No XPath. No element IDs to maintain. The AI sees the screen the way a human does — by what's visible, not by where it lives in the view tree.&lt;/p&gt;
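&lt;p&gt;That text format is trivially machine-readable. A sketch of parsing the scene above and resolving "tap the Checkout button" by visible label (the &lt;code&gt;[n] "label" (Kind)&lt;/code&gt; line format comes from the example; the types and helper names are illustrative):&lt;/p&gt;

```typescript
// Parse a drengr_look-style text scene into elements. The line format comes
// from the example above; everything else here is an illustrative sketch.
type ScreenElement = { id: number; label: string; kind: string };

function parseScene(scene: string): ScreenElement[] {
  const pattern = /^\[(\d+)\] "(.+)" \((\w+)\)$/;
  return scene
    .split("\n")
    .map((line) => pattern.exec(line.trim()))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => ({ id: Number(m[1]), label: m[2], kind: m[3] }));
}

// "Tap the Checkout button" becomes: find the element by its visible label.
function findByLabel(elements: ScreenElement[], label: string): number {
  const hit = elements.find((e) => e.label.toLowerCase().includes(label.toLowerCase()));
  if (hit === undefined) throw new Error("no element labeled: " + label);
  return hit.id;
}

const scene = '[1] "Checkout" (Button)\n[2] "Your Cart: 3 items" (TextView)';
console.log(findByLabel(parseScene(scene), "Checkout"));
```

&lt;p&gt;Note what's absent: nothing here depends on where the button sits in the view hierarchy, which is why a redesign doesn't break it.&lt;/p&gt;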

&lt;h2&gt;
  
  
  "But what about reliability?"
&lt;/h2&gt;

&lt;p&gt;Fair question. If the AI is interpreting the screen every time, doesn't that introduce non-determinism?&lt;/p&gt;

&lt;p&gt;Yes. And that's the point. A deterministic test that breaks when the UI changes isn't reliable — it's rigid. An AI agent that adapts to UI changes is more reliable in practice because it handles the variations that break selector-based tests.&lt;/p&gt;

&lt;p&gt;Drengr adds guardrails:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stuck detection&lt;/strong&gt; — if the screen doesn't change after an action, the agent knows to try something else&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crash detection&lt;/strong&gt; — if the app dies, the agent knows immediately and can restart&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Situation reports&lt;/strong&gt; — after every action, the agent gets a structured diff of what changed, so it stays oriented&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI doesn't just blindly tap. It observes, acts, and adapts. That's more robust than a fixed script that works exactly one way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost argument
&lt;/h2&gt;

&lt;p&gt;"AI calls are expensive." Sure, if you're sending screenshots to GPT-4o on every step.&lt;/p&gt;

&lt;p&gt;Drengr's text-only mode compresses a screen to ~300 tokens. A 15-step test flow costs about $0.05 on GPT-4o pricing. The same flow with screenshots costs $0.45.&lt;/p&gt;

&lt;p&gt;But here's the real cost comparison: how much does your QA team spend maintaining selectors? If one engineer loses even half a day a week to updating broken tests, that's thousands of dollars a month in salary going to XPath maintenance. The AI API costs are rounding errors next to that.&lt;/p&gt;

&lt;h2&gt;
  
  
  A test suite that survives redesigns
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;com.example.shop&lt;/span&gt;
&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;browse&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;wireless&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;earbuds&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;open&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;first&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;result"&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;purchase&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cart&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;checkout&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;test&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;card"&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;90s&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;verify&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Go&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;order&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;history&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;verify&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;order&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;appears"&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;45s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This YAML survived 3 redesigns of our test app. The checkout flow moved from a separate page to a bottom sheet to a full-screen modal. The YAML didn't change. The AI adapted every time because it reads the screen, not the selectors.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;drengr test tests.yml&lt;/code&gt; runs it. JUnit XML output plugs into any CI pipeline. No Appium server to maintain, no Selenium grid, no element locator spreadsheet.&lt;/p&gt;
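&lt;p&gt;In CI, that's a couple of steps. A sketch of a GitHub Actions fragment — the workflow scaffolding is generic; only &lt;code&gt;npm install -g drengr&lt;/code&gt; and &lt;code&gt;drengr test tests.yml&lt;/code&gt; come from this post:&lt;/p&gt;

```yaml
# Hypothetical CI fragment. It assumes a device or emulator is already
# attached to the runner, which this snippet does not set up.
- name: Install Drengr
  run: npm install -g drengr
- name: Run AI test flows
  run: drengr test tests.yml
```

&lt;p&gt;From there, point your CI's JUnit reporter at the XML output and failures show up like any other test.&lt;/p&gt;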

&lt;h2&gt;
  
  
  This isn't theoretical
&lt;/h2&gt;

&lt;p&gt;Drengr runs on real Android phones, iOS simulators, and cloud device farms (BrowserStack, SauceLabs, AWS Device Farm, LambdaTest, Perfecto, Kobiton). It connects to any MCP-compatible AI client — Claude Desktop, Cursor, Windsurf, VS Code.&lt;/p&gt;

&lt;p&gt;One binary. One install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; drengr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your QA team can stop writing XPath. The AI can read the screen.&lt;/p&gt;

</description>
      <category>appium</category>
      <category>mobiledev</category>
      <category>testing</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
