<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shubham Verma</title>
    <description>The latest articles on Forem by Shubham Verma (@shubham_verma_8f24ba13c9b).</description>
    <link>https://forem.com/shubham_verma_8f24ba13c9b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3704737%2Fa382c0d3-1e76-4236-b59a-9f0bb185134f.png</url>
      <title>Forem: Shubham Verma</title>
      <link>https://forem.com/shubham_verma_8f24ba13c9b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shubham_verma_8f24ba13c9b"/>
    <language>en</language>
    <item>
      <title>Why Streaming AI Responses Feels Faster Than It Is (Android + SSE)</title>
      <dc:creator>Shubham Verma</dc:creator>
      <pubDate>Mon, 12 Jan 2026 08:01:49 +0000</pubDate>
      <link>https://forem.com/shubham_verma_8f24ba13c9b/why-streaming-ai-responses-feels-faster-than-it-is-android-sse-2o6f</link>
      <guid>https://forem.com/shubham_verma_8f24ba13c9b/why-streaming-ai-responses-feels-faster-than-it-is-android-sse-2o6f</guid>
      <description>&lt;p&gt;AI models have become incredibly fast. &lt;br&gt;
Network latency has improved.&lt;br&gt;
Yet many AI chat apps still &lt;strong&gt;feel&lt;/strong&gt; slow.&lt;br&gt;
This isn’t a hardware problem or a model problem.&lt;br&gt;&lt;br&gt;
It’s a &lt;strong&gt;user experience problem&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Problem: AI Chat Apps &lt;em&gt;Feel&lt;/em&gt; Slow
&lt;/h2&gt;

&lt;p&gt;When a user sends a message and the UI stays blank, even briefly, the brain interprets that silence as delay.&lt;/p&gt;

&lt;p&gt;From the user’s perspective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Did my message go through?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Is the app frozen?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Is the model slow?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In most cases, none of this is true.&lt;/p&gt;

&lt;p&gt;But perception matters more than reality.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Latency in AI apps is psychological before it is technical.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why Waiting for the Full Response Breaks UX
&lt;/h2&gt;

&lt;p&gt;Many AI chat apps follow a simple pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send the prompt
&lt;/li&gt;
&lt;li&gt;Wait for the full response
&lt;/li&gt;
&lt;li&gt;Render everything at once
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Technically, this works.&lt;/p&gt;

&lt;p&gt;From a UX standpoint, it fails.&lt;/p&gt;

&lt;p&gt;Humans are extremely sensitive to silence in interactive systems. Even a few hundred milliseconds without visible feedback creates uncertainty. Loading spinners help, but they still feel disconnected from the response itself.&lt;/p&gt;

&lt;p&gt;This is the difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Actual latency&lt;/strong&gt; → how long the system takes
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perceived latency&lt;/strong&gt; → how long it &lt;em&gt;feels&lt;/em&gt; like it takes
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most AI apps optimize the former and ignore the latter.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foi050mo59xogyuvc8h2e.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foi050mo59xogyuvc8h2e.gif" alt="Demo video" width="358" height="711"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Streaming Is the Obvious Fix (and Why It’s Not Enough)
&lt;/h2&gt;

&lt;p&gt;Streaming responses token by token improves responsiveness immediately.&lt;/p&gt;

&lt;p&gt;As soon as text starts appearing, users know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The system is working&lt;/li&gt;
&lt;li&gt;Their input was received&lt;/li&gt;
&lt;li&gt;Progress is happening&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Technologies like &lt;strong&gt;Server-Sent Events (SSE)&lt;/strong&gt; make this straightforward.&lt;/p&gt;
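
&lt;p&gt;To make that concrete, here is a minimal sketch of the receiving side. SSE frames each event as lines, with payload lines prefixed &lt;code&gt;data:&lt;/code&gt; and events separated by blank lines; the helper name &lt;code&gt;parseSseData&lt;/code&gt; is illustrative, not an API from the repo:&lt;/p&gt;

```kotlin
// Hypothetical helper: pulls the payloads out of a raw SSE stream chunk.
// Per the SSE wire format, payload lines start with "data:" and events
// are separated by blank lines.
fun parseSseData(raw: String) =
    raw.lineSequence()
        .filter { it.startsWith("data:") }          // keep only payload lines
        .map { it.removePrefix("data:").trim() }    // strip the field name
        .toList()
```

&lt;p&gt;Each extracted payload is one token (or token batch) to hand to the rendering layer.&lt;/p&gt;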

&lt;p&gt;However, naive streaming introduces a new problem.&lt;/p&gt;

&lt;p&gt;Modern models can generate text extremely fast. Rendering tokens as they arrive causes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bursty text updates
&lt;/li&gt;
&lt;li&gt;Jittery sentence formation
&lt;/li&gt;
&lt;li&gt;Broken reading flow
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, entire words or clauses can appear at once, breaking natural reading rhythm.&lt;/p&gt;

&lt;p&gt;At that point, the interface is fast but exhausting.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Streaming fixes &lt;strong&gt;speed&lt;/strong&gt;, but can hurt &lt;strong&gt;readability&lt;/strong&gt; if done carelessly.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Core Insight: Decoupling Network Speed from Visual Speed
&lt;/h2&gt;

&lt;p&gt;Network speed and human reading speed are fundamentally different.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Servers operate in milliseconds
&lt;/li&gt;
&lt;li&gt;Humans read in chunks, pauses, and patterns
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the UI mirrors the network exactly, users are forced to adapt to machine behaviour.&lt;/p&gt;

&lt;p&gt;A better approach is the opposite:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Make the UI adapt to humans, not servers.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Instead of rendering text immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Incoming tokens are buffered
&lt;/li&gt;
&lt;li&gt;The UI consumes them at a controlled pace
&lt;/li&gt;
&lt;li&gt;The experience feels calm, intentional, and readable
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To do this, I introduced a &lt;strong&gt;&lt;code&gt;StreamingTextController&lt;/code&gt;&lt;/strong&gt;, a small but critical layer that sits between the network and the UI.&lt;/p&gt;

&lt;p&gt;Streaming isn’t just about showing text earlier.&lt;br&gt;&lt;br&gt;
It’s about showing it &lt;strong&gt;at the right pace&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the StreamingTextController Works (Conceptual)
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;StreamingTextController&lt;/code&gt; exists to separate &lt;strong&gt;arrival speed&lt;/strong&gt; from &lt;strong&gt;rendering speed&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Keeping this logic outside the ViewModel prevents timing concerns from leaking into state management.&lt;/p&gt;

&lt;p&gt;At a high level:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tokens arrive via SSE&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tokens are buffered&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Controlled consumption&lt;/strong&gt; at a steady, human-friendly rate
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Progressive UI rendering&lt;/strong&gt; via state updates
&lt;/li&gt;
&lt;/ol&gt;
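
&lt;p&gt;The four steps above can be sketched in a few lines. This is a simplified, hypothetical version of the controller, stripped of coroutines so the pacing idea stands alone; a production version would drive &lt;code&gt;tick()&lt;/code&gt; on a cadence from a coroutine and publish each frame through state updates:&lt;/p&gt;

```kotlin
// Simplified sketch: tokens are buffered as they arrive from SSE,
// and the UI drains at most a few characters per tick, regardless of
// how bursty the network delivery was.
class StreamingTextController(private val charsPerTick: Int = 3) {
    private val buffer = StringBuilder()   // arrived but not yet shown
    private val shown = StringBuilder()    // what the UI currently renders

    // Step 1-2: the network layer calls this for every incoming token.
    fun onToken(token: String) {
        buffer.append(token)
    }

    // Step 3-4: called on a steady cadence (e.g. every frame); releases
    // at most charsPerTick characters and returns the full visible text.
    fun tick(): String {
        val n = minOf(charsPerTick, buffer.length)
        shown.append(buffer, 0, n)
        buffer.delete(0, n)
        return shown.toString()
    }
}
```

&lt;p&gt;Note that &lt;code&gt;onToken&lt;/code&gt; can be called as fast as the server emits; the visible text still grows at the same steady rate.&lt;/p&gt;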

&lt;p&gt;From the UI’s perspective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text grows smoothly
&lt;/li&gt;
&lt;li&gt;Sentences form naturally
&lt;/li&gt;
&lt;li&gt;Network volatility is invisible
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This mirrors how humans process information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We read in bursts, not characters
&lt;/li&gt;
&lt;li&gt;Predictable pacing improves comprehension
&lt;/li&gt;
&lt;li&gt;Reduced jitter lowers cognitive load
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What this controller is &lt;em&gt;not&lt;/em&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Not a typing animation
&lt;/li&gt;
&lt;li&gt;Not an artificial delay
&lt;/li&gt;
&lt;li&gt;Not a workaround for slow models
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s a &lt;strong&gt;UX boundary&lt;/strong&gt; that translates machine output into human-paced interaction.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Decisions: Making Streaming Production-Ready
&lt;/h2&gt;

&lt;p&gt;Streaming only works long-term if it remains stable and testable.&lt;/p&gt;

&lt;p&gt;Responsibilities are clearly separated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Network layer&lt;/strong&gt; → emits raw tokens
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;StreamingTextController&lt;/strong&gt; → pacing &amp;amp; buffering
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ViewModel (MVVM)&lt;/strong&gt; → lifecycle &amp;amp; immutable state
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI (Jetpack Compose)&lt;/strong&gt; → declarative rendering
&lt;/li&gt;
&lt;/ul&gt;
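
&lt;p&gt;As a small illustration of the ViewModel’s “immutable state” role, each paced frame produces a new state value rather than mutating the old one, which is what lets Compose rendering stay declarative. The &lt;code&gt;ChatUiState&lt;/code&gt; shape below is hypothetical, not the repo’s actual state class:&lt;/p&gt;

```kotlin
// Hypothetical immutable UI state for the streaming message. Every update
// is a copy, so observers always see a consistent snapshot.
data class ChatUiState(
    val streamingText: String = "",
    val isStreaming: Boolean = false
)

// Applied once per paced frame from the controller.
fun ChatUiState.withFrame(frame: String) =
    copy(streamingText = frame, isStreaming = true)

// Applied when the stream (and the drained buffer) is finished.
fun ChatUiState.completed() = copy(isStreaming = false)
```

&lt;p&gt;In the real app these values would flow through a &lt;code&gt;StateFlow&lt;/code&gt; that the Compose UI collects.&lt;/p&gt;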

&lt;p&gt;Technologies used intentionally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kotlin Coroutines + Flow
&lt;/li&gt;
&lt;li&gt;Jetpack Compose
&lt;/li&gt;
&lt;li&gt;Hilt
&lt;/li&gt;
&lt;li&gt;Clean Architecture
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal wasn’t novelty.&lt;br&gt;&lt;br&gt;
It was &lt;strong&gt;predictable behaviour under load and across devices.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpid3hvrb7pp6vusi1ft7.jpeg" alt="Structure diagram" width="800" height="1200"&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Mistakes When Building Streaming UIs
&lt;/h2&gt;

&lt;p&gt;Some easy mistakes to make:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Updating the UI on every token
&lt;/li&gt;
&lt;li&gt;Binding rendering speed to model speed
&lt;/li&gt;
&lt;li&gt;No buffering or back-pressure
&lt;/li&gt;
&lt;li&gt;Timing logic inside UI code
&lt;/li&gt;
&lt;li&gt;Treating streaming as an animation
&lt;/li&gt;
&lt;/ul&gt;
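
&lt;p&gt;One way to avoid the first two mistakes is to coalesce tokens into time-windowed batches before they ever reach the UI. A dependency-free sketch of the idea, with an injected clock so the behaviour is testable (&lt;code&gt;TokenCoalescer&lt;/code&gt; is illustrative, not part of the repo):&lt;/p&gt;

```kotlin
// Hypothetical token coalescer: tokens arriving within the same time
// window are merged into one batch, so the UI gets a handful of updates
// per second instead of one per token.
class TokenCoalescer(
    private val windowMillis: Long,
    private val clock: () -> Long        // injected for testability
) {
    private val pending = StringBuilder()
    private var windowStart = -1L

    // Returns the merged batch when the window closes, or null while collecting.
    fun offer(token: String): String? {
        val now = clock()
        if (windowStart == -1L) windowStart = now
        pending.append(token)
        if (now - windowStart >= windowMillis) {
            val batch = pending.toString()
            pending.setLength(0)
            windowStart = -1L
            return batch
        }
        return null
    }
}
```

&lt;p&gt;The same effect can be had with Flow operators such as conflation or sampling; the point is that batching lives outside the UI, not in it.&lt;/p&gt;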

&lt;p&gt;Streaming is not about visual flair.&lt;br&gt;&lt;br&gt;
It’s about &lt;strong&gt;reducing cognitive load&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Beyond Chat Apps
&lt;/h2&gt;

&lt;p&gt;The same principles apply to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live transcription&lt;/li&gt;
&lt;li&gt;AI summaries&lt;/li&gt;
&lt;li&gt;Code assistants&lt;/li&gt;
&lt;li&gt;Search explainers&lt;/li&gt;
&lt;li&gt;Multimodal copilots&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As AI systems get faster, &lt;strong&gt;UX, not model speed, becomes the differentiator&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo &amp;amp; Source Code
&lt;/h2&gt;

&lt;p&gt;This project is open source and meant as a reference implementation.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;GitHub:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/sh7verma/AiChat" rel="noopener noreferrer"&gt;https://github.com/sh7verma/AiChat&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SSE streaming setup
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;StreamingTextController&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Jetpack Compose chat UI
&lt;/li&gt;
&lt;li&gt;Clean, production-ready structure
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Takeaway
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Users don’t care how fast your model is.&lt;/li&gt;
&lt;li&gt;They care how fast your product &lt;strong&gt;feels&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Streaming reduces uncertainty.
&lt;/li&gt;
&lt;li&gt;Pacing restores clarity.&lt;/li&gt;
&lt;li&gt;Good AI UX sits at the intersection of both.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>android</category>
      <category>ai</category>
      <category>ux</category>
      <category>kotlin</category>
    </item>
  </channel>
</rss>
