<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ably Blog</title>
    <description>The latest articles on Forem by Ably Blog (@ablyblog).</description>
    <link>https://forem.com/ablyblog</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F987387%2Ff9d0ea92-d06e-46d6-8efc-6e92b510943e.png</url>
      <title>Forem: Ably Blog</title>
      <link>https://forem.com/ablyblog</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ablyblog"/>
    <language>en</language>
    <item>
      <title>Appends for AI apps: Stream into a single message with Ably AI Transport</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Thu, 26 Feb 2026 12:05:30 +0000</pubDate>
      <link>https://forem.com/ablyblog/appends-for-ai-apps-stream-into-a-single-message-with-ably-ai-transport-398a</link>
      <guid>https://forem.com/ablyblog/appends-for-ai-apps-stream-into-a-single-message-with-ably-ai-transport-398a</guid>
      <description>&lt;p&gt;Streaming tokens is easy. Resuming cleanly is not. A user refreshes mid-response, another client joins late, a mobile connection drops for 10 seconds, and suddenly your "one answer" is 600 tiny messages that your UI has to stitch back together. Message history turns into fragments. You start building a side store just to reconstruct "the response so far".&lt;/p&gt;

&lt;p&gt;This is not a model problem. It's a delivery problem.&lt;/p&gt;

&lt;p&gt;That's why we developed message appends for &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt;. Appends let you stream AI output tokens into a single message as they are produced, so you get progressive rendering for live subscribers and a clean, compact response in history.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The failure mode we're fixing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The usual implementation is to stream each token as a single message, which is simple and works perfectly on a stable connection. In production, clients disconnect and resume mid-stream: refreshes, mobile dropouts, backgrounded tabs, and late joins.&lt;/p&gt;

&lt;p&gt;Once you have real reconnects and refreshes, you inherit work you did not plan for: ordering, dedupe, buffering, "latest wins" logic, and replay rules that make history and realtime agree. You can build it, but it is the kind of work that quietly eats weeks of engineering time.&lt;/p&gt;
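&lt;p&gt;To make that inherited work concrete, here is a minimal sketch of the fragment-stitching layer a message-per-token design forces you to build. The &lt;code&gt;responseId&lt;/code&gt; and &lt;code&gt;index&lt;/code&gt; fields are illustrative, not part of any Ably API:&lt;/p&gt;

```javascript
// A sketch of the reassembly layer a message-per-token design forces on you.
// Assumes each fragment carries a hypothetical responseId and sequence index.
function createReassembler() {
  const responses = new Map(); // responseId -> Map<index, text>

  return {
    // Accepts fragments in any order and ignores duplicates from replays.
    accept({ responseId, index, text }) {
      if (!responses.has(responseId)) {
        responses.set(responseId, new Map());
      }
      const parts = responses.get(responseId);
      if (!parts.has(index)) parts.set(index, text); // dedupe
    },
    // Rebuilds "the response so far" from whatever fragments have arrived.
    textFor(responseId) {
      const parts = responses.get(responseId);
      if (!parts) return '';
      return [...parts.keys()]
        .sort((a, b) => a - b) // restore ordering
        .map((i) => parts.get(i))
        .join('');
    },
  };
}
```

&lt;p&gt;Even this toy version has to handle out-of-order arrival and duplicate replays; the real thing also needs buffering, expiry, and a story for history.&lt;/p&gt;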

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaxg5b1j6o07bcp2xkds.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaxg5b1j6o07bcp2xkds.png" alt=" " width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With appends you can avoid that by changing the shape of the data. Instead of hundreds of token messages, you have one response message whose content grows over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  The pattern: create once, append many
&lt;/h3&gt;

&lt;p&gt;In Ably AI Transport, you publish an initial response message and capture its server-assigned serial. That serial is what you append to.&lt;/p&gt;

&lt;p&gt;It's a small detail that ends up doing a lot of work for you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;response&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;serials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;msgSerial&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, as your model yields tokens, you append each fragment to that same message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;token&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msgSerial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;What changes for clients&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Subscribers still see progressive output, but they see it as actions on the same message serial. A response starts with a create, tokens arrive as appends, and occasionally clients may receive a full-state update to resynchronise (for example after a reconnection).&lt;/p&gt;

&lt;p&gt;Most UIs end up implementing this shape anyway. With appends, it becomes boring and predictable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;message.append&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;renderAppend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;message.update&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;renderReplace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important difference is that history and realtime stop disagreeing, without your client code doing any extra work. You render progressively for live users, and you still treat the response as one message for storage, retrieval, and rewind.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Reconnects and refresh stop being special cases&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Short disconnects are one thing. Refresh is the painful case: local state is gone, and a message-per-token design forces you into replaying fragments and hoping the client reconstructs the same response.&lt;/p&gt;

&lt;p&gt;With message-per-response, hydration is straightforward because there is always a current accumulated version of the response message. Clients joining late or reloading can fetch the latest state as a single message and continue.&lt;/p&gt;
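&lt;p&gt;As a sketch of that hydration step: assuming a history query returns message versions newest-first, each carrying its &lt;code&gt;serial&lt;/code&gt; and the full accumulated &lt;code&gt;data&lt;/code&gt;, rebuilding the view is a single pass (the field names follow the examples above; treat this as illustrative rather than exact SDK output):&lt;/p&gt;

```javascript
// Rebuild "latest response per serial" from a newest-first history page.
// With message-per-response there is one entry per response, not per token.
function hydrate(historyPage) {
  const latest = new Map();
  for (const msg of historyPage) {
    // The newest version of each message wins; older versions are ignored.
    if (!latest.has(msg.serial)) latest.set(msg.serial, msg.data);
  }
  return latest;
}
```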

&lt;p&gt;&lt;a href="https://ably.com/docs/channels/options/rewind" rel="noopener noreferrer"&gt;Rewind&lt;/a&gt; and history become useful again because you are rewinding meaningful messages, not token confetti:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;realtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai:chat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;rewind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2m&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Token rates without token-rate pain
&lt;/h3&gt;

&lt;p&gt;Models can emit tokens far faster than most realtime setups want to publish. If you publish a message per token, rate limits become your problem and your agent code has to handle batching itself.&lt;/p&gt;

&lt;p&gt;Appends are designed for high-frequency workloads and include automatic rollups. Subscribers still receive progressive updates, but Ably can roll up rapid appends under the hood so you do not have to build your own throttling layer.&lt;/p&gt;

&lt;p&gt;If you need to tune the tradeoff between smoothness and message rate, you can adjust &lt;code&gt;appendRollupWindow&lt;/code&gt;. Smaller windows feel more responsive but consume more message-rate capacity. Larger windows batch more aggressively but arrive in bigger chunks.&lt;/p&gt;
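&lt;p&gt;The tradeoff is easy to see with a toy model that batches token events into fixed windows. This is just the arithmetic of the tradeoff, not Ably's actual rollup implementation:&lt;/p&gt;

```javascript
// Batch token events (each with a timestamp in ms) into rollup windows.
// A larger windowMs means fewer, bigger messages; a smaller one means
// smoother output at a higher message rate.
function rollup(tokenEvents, windowMs) {
  const batches = [];
  let current = null;
  for (const { at, text } of tokenEvents) {
    if (!current || at - current.start >= windowMs) {
      current = { start: at, text: '' }; // open a new window
      batches.push(current);
    }
    current.text += text; // fold the token into the current window
  }
  return batches;
}
```

&lt;p&gt;With a 50ms window, four tokens arriving over 60ms collapse into two messages instead of four.&lt;/p&gt;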

&lt;h3&gt;
  
  
  &lt;strong&gt;Enabling appends&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Appends require the "Message annotations, updates, appends, and deletes" channel rule for the namespace you're using. Enabling it also means messages are persisted, which affects usage and billing.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why this is a better default for AI output&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you are shipping agentic AI apps, you eventually need three things at the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;streaming UX&lt;/li&gt;
&lt;li&gt;history that's usable&lt;/li&gt;
&lt;li&gt;recovery that does not depend on luck&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Appends are how you get there without building your own "message reconstruction" subsystem. If you want the deeper mechanics (including the message-per-response pattern and rollup tuning), the &lt;a href="https://ably.com/docs/ai-transport" rel="noopener noreferrer"&gt;AI Transport docs&lt;/a&gt; are the best place to start.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>streaming</category>
      <category>realtime</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>Realtime steering: interrupt, barge-in, redirect, and guide the AI</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Mon, 09 Feb 2026 09:58:06 +0000</pubDate>
      <link>https://forem.com/ablyblog/realtime-steering-interrupt-barge-in-redirect-and-guide-the-ai-22ai</link>
      <guid>https://forem.com/ablyblog/realtime-steering-interrupt-barge-in-redirect-and-guide-the-ai-22ai</guid>
      <description>&lt;p&gt;Start typing, change your mind, redirect the AI mid-response. It just works. That is the promise of realtime steering. Users expect to interrupt an answer, correct its direction, or inject new instructions on the fly without losing context or restarting the session. It feels simple, but delivering it requires low-latency control signals, reliable cancellation, and shared conversational state that survives disconnects and device switches. This post explores why expectations have shifted, why today's stacks struggle with these patterns, and what your infrastructure needs to support proper realtime steering.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What's changing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;AI tools are moving beyond static, one-turn interactions. Users expect to interact dynamically, especially in chat. But most AI systems today force users to wait while the assistant responds in full, even if it's off-track or no longer relevant. That's not how human conversations work.&lt;/p&gt;

&lt;p&gt;Expectations are shifting toward something more natural. Users want to jump in mid-stream, adjust the AI's course, or stop it altogether. These patterns (barge-in, redirect, steer) are becoming table stakes for responsive, agentic assistants.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What users want, and why this enhances the experience&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Users want to stay in control of the conversation. If the AI starts drifting, they want to say "stop" or "try a different angle" and get an immediate course correction. They want to guide the assistant's direction without breaking the flow or starting over.&lt;/p&gt;

&lt;p&gt;This improves trust, keeps sessions on-topic, and avoids wasted time. It also brings AI interactions closer to how real collaboration works: iterative, reactive, fast.&lt;/p&gt;

&lt;p&gt;Users now expect a few technical behaviours as part of that experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Responses can be interrupted in real time&lt;/li&gt;
&lt;li&gt;New instructions are applied mid-stream without reset&lt;/li&gt;
&lt;li&gt;The AI keeps context and adjusts without losing the thread&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why realtime steering is proving hard to build&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most AI systems treat generation as a one-way stream. Once the model starts producing tokens, the system just plays them out to the client. If the user wants to interrupt or change direction, the only real option is to cancel and send a new prompt, often from scratch. Most systems cannot support mid-stream redirection because their underlying communication model does not allow it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stateless HTTP cannot carry steering signals&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional request–response models push output in one direction only. Once a long-running generation begins, there is no reliable way to send control signals back to the server. Cancelling or redirecting usually means tearing down the stream and starting again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Browser-held state breaks immediately&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most apps keep the state of an active generation in the browser. If the user refreshes or switches device, the in-flight response loses continuity. Any client-side steering logic tied to that state vanishes too, which forces a full reset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backend models often run without shared conversational state&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the orchestration layer is not tracking what the AI is currently doing, it cannot apply corrections cleanly. The model receives a brand-new prompt instead of a context-preserving instruction layered onto an active task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The default stack was never designed for low-latency control loops&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Steering requires coordinated signalling between UI, transport, orchestration, and model inference. That means ordering guarantees, durable state, and fast propagation of control messages. Without these, the AI continues generating tokens after a user says stop, causing confusion and wasted compute.&lt;/p&gt;

&lt;p&gt;Steering mid-stream looks like a simple UX gesture. It is not. It is a distributed-systems problem sitting under a conversational interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why you need a drop-in AI transport layer for steering&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Delivering realtime control requires more than token streaming. It requires a transport layer that keeps context alive, supports low-latency bidirectional messaging, and ensures that user instructions and model output remain synchronised.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bi-directional, low-latency messaging&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Client-side signals such as "stop" or "try this instead" must reach the backend quickly and reliably. WebSockets or similar long-lived connections make this possible by enabling client-to-server control while the &lt;a href="https://ably.com/blog/token-streaming-for-ai-ux" rel="noopener noreferrer"&gt;AI continues to stream output.&lt;/a&gt;&lt;/p&gt;
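&lt;p&gt;The shape of that control loop can be sketched with an in-memory stand-in for the channel. In production the &lt;code&gt;control&lt;/code&gt; messages would travel over the same realtime connection the tokens stream down; the message names here are illustrative:&lt;/p&gt;

```javascript
// Minimal in-memory stand-in for a bidirectional channel.
function createChannel() {
  const handlers = [];
  return {
    subscribe: (fn) => handlers.push(fn),
    publish: (msg) => handlers.forEach((fn) => fn(msg)),
  };
}

// The backend streams tokens while listening for control signals on the
// same channel, so "stop" halts generation exactly where it lands.
function startGeneration(channel, tokens) {
  let stopped = false;
  const sent = [];
  channel.subscribe((msg) => {
    if (msg.name === 'control' && msg.data === 'stop') stopped = true;
  });
  for (const token of tokens) {
    if (stopped) break; // barge-in takes effect mid-stream
    sent.push(token);
    channel.publish({ name: 'token', data: token });
  }
  return sent;
}
```

&lt;p&gt;The point is structural: the backend consumes control signals on the same connection it produces output on, so a "stop" lands mid-stream instead of after the response completes.&lt;/p&gt;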

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpa1kjj7ko14raf19ty0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpa1kjj7ko14raf19ty0.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reliable interrupt and cancellation primitives&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stopping generation must be instant and clean. The transport must carry cancellation events with ordering guarantees so the backend halts inference exactly where intended, without corrupting state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session continuity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system needs persistent session identity so instructions and outputs are tied to the same conversational thread. Redirection should extend the session, not rebuild it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rwg6p1z9am78x5bck9a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rwg6p1z9am78x5bck9a.png" alt=" " width="800" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presence and focus tracking&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If users have &lt;a href="https://ably.com/blog/cross-device-ai-sync" rel="noopener noreferrer"&gt;multiple tabs or devices&lt;/a&gt; open, the system needs to know where instructions are coming from. Steering messages must route to the correct active session without collisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frx70zhamocl1d04pebxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frx70zhamocl1d04pebxr.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Realtime steering relies on a transport layer designed for conversational control, not just message delivery.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How the experience maps to the transport layer&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;User experience desired&lt;/th&gt;
&lt;th&gt;Required transport layer features&lt;/th&gt;
&lt;th&gt;Underlying technical implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Interrupt and redirect responses in real time&lt;/td&gt;
&lt;td&gt;Bi-directional messaging&lt;/td&gt;
&lt;td&gt;WebSocket-based channels enabling client-to-server signals during output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cancel generation cleanly&lt;/td&gt;
&lt;td&gt;Interrupt primitives&lt;/td&gt;
&lt;td&gt;Server-side control hooks to stop model inference and close stream pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Preserve continuity after steering&lt;/td&gt;
&lt;td&gt;Session continuity&lt;/td&gt;
&lt;td&gt;Persistent session or conversation IDs with context caching&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Update response direction on the fly&lt;/td&gt;
&lt;td&gt;Dynamic state sync&lt;/td&gt;
&lt;td&gt;Shared state model where new input is merged into active conversational context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Steer across devices&lt;/td&gt;
&lt;td&gt;Identity-aware multiplexing&lt;/td&gt;
&lt;td&gt;Fan-out model updates across all user sessions in sync&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Realtime steering for AI you can ship today&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You don't need a new architecture to support real-time steering, cancellation, or recovery. You need a transport layer that can keep the session alive, deliver messages in order, and preserve state across disconnects. &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; provides those foundations out of the box, so you can build controllable, resilient AI interactions without rebuilding your entire stack.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/sign-up" rel="noopener noreferrer"&gt;Sign-up for a free account&lt;/a&gt; and try today.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>systemdesign</category>
      <category>ux</category>
    </item>
    <item>
      <title>Why orchestrators become a bottleneck in multi-agent AI</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Tue, 03 Feb 2026 12:42:23 +0000</pubDate>
      <link>https://forem.com/ablyblog/why-orchestrators-become-a-bottleneck-in-multi-agent-ai-published-4mgf</link>
      <guid>https://forem.com/ablyblog/why-orchestrators-become-a-bottleneck-in-multi-agent-ai-published-4mgf</guid>
      <description>&lt;p&gt;Complex user tasks often need multiple AI agents working together, not just a single assistant. That's what agent collaboration enables. Each agent has its own specialism - planning, fetching, checking, summarising - and they work in tandem to get the job done. The experience feels intelligent and joined-up, not monolithic or linear. But making that work means more than prompt chaining or orchestration logic. It requires shared state, reliable coordination, and user-visible progress as agents branch out and converge again. This post explores what users now expect, why traditional infrastructure falls short, and how to support truly collaborative AI systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What's changing?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The shift from simple question-response to collaborative AI experiences goes beyond continuity or conversation. It's about delegation. Users are starting to expect AI systems that can take a complex request and break it down behind the scenes. That means not one big model doing everything, but a network of agents, each focused on a part of the task, coordinating to deliver a coherent outcome. We've seen this in tools like travel planners, research assistants, and document generators. You don't just want answers, you want progress, structure, and coordination you can see. The AI system shouldn't just feel like a chat thread, it should feel like a team quietly getting on with things while keeping you informed.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What users want, and why this enhances the experience&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When users interact with a system powered by multiple agents, they want to feel the benefits of parallelism without the overhead of managing complexity. If one agent is fetching flight data, another handling hotel options, and a third reviewing visa requirements, the user doesn't care about the internal plumbing. They care that their travel plan is evolving visibly and coherently. They want to see that agents are working, understand what's happening in realtime, and be able to intervene or revise things if needed.&lt;/p&gt;

&lt;p&gt;Crucially, users expect the state of their task to reflect reality, not just the conversation. If they change a hotel selection manually, the system should adapt. If an agent crashes or stalls, the UI should show it. The value isn't just in faster results, it's in reliability, transparency, and the sense that multiple agents are genuinely collaborating, with each other and with the user - toward a shared goal.&lt;/p&gt;

&lt;p&gt;To deliver this, agent systems need to stay in sync. State needs to be shared across agents and user sessions. Progress needs to be surfaced incrementally, not hidden behind a final answer. And context must be preserved so agents don't overwrite or duplicate each other's work. That's what turns a bunch of isolated model calls into a coordinated assistant.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why this is proving challenging&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Multi-agent systems &lt;em&gt;can&lt;/em&gt; work today, but the default pattern most tools push you toward is an orchestration-first user experience. Even when multiple agents are running behind the scenes, their activity is typically funnelled through a single orchestrator that becomes the only "voice" the user can see. That hides useful progress, creates a single bottleneck for updates, and limits how fluid the experience can feel.&lt;/p&gt;

&lt;p&gt;That's because traditional LLM interfaces assume a single stream of input and a single stream of output. Orchestration frameworks may invoke multiple agents in parallel, but the UI still tends to expose a linear, synchronous workflow: the orchestrator collects results, then reports back. If the user changes direction mid-process, or if an agent needs to react immediately to something in shared state, you're often forced back into "wait for the orchestrator" loops.&lt;/p&gt;

&lt;p&gt;The underlying infrastructure assumptions reinforce this. HTTP request/response cycles work well when one component is responsible for coordinating everything, but they make it awkward for &lt;em&gt;multiple&lt;/em&gt; agents to maintain an ongoing, direct connection to the user and to shared context. Token streaming helps, but it usually represents one agent's output to one user - not concurrent updates from a group of agents reacting in real time to a changing state.&lt;/p&gt;

&lt;p&gt;Ultimately, the challenge isn't that orchestration fails. It's that it constrains app developers. Most systems don't give you fine-grained control over which agent communicates what, when, and how, or an easy way to reflect multi-agent activity directly in the user experience. To build confidence and responsiveness, clients need to know which agents are active, what they're doing, and how that activity relates to the shared, realtime session context - without everything having to be mediated by a heavyweight orchestrator.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm6e7vwnv8qc56l22wz5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm6e7vwnv8qc56l22wz5h.png" alt=" " width="800" height="556"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why you need a drop-in AI transport layer&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To make multi-agent collaboration work in practice, you need infrastructure that handles concurrency, coordination, and visibility - not just messaging.&lt;/p&gt;

&lt;p&gt;The transport layer must support persistent, multiplexed communication where multiple agents can publish updates independently while still participating in the same user session. That gives app developers fine-grained control over the user experience: which agents speak to the user, when they speak, and how progress is presented. Orchestrators can still exist, but they don't have to mediate every user-facing update.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrkj6qtpjfddywuyvza6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrkj6qtpjfddywuyvza6.png" alt=" " width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  State synchronisation is non-negotiable
&lt;/h3&gt;

&lt;p&gt;Structured data, like a list of selected hotels or the current trip itinerary, should live in a realtime session store that agents and UIs can both read from and write to. This creates a single source of truth, even when updates happen asynchronously, across devices, or outside the chat interface.&lt;/p&gt;
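&lt;p&gt;A minimal sketch of such a store, assuming a last-write-wins merge per key (the field names are illustrative; a production store would also need versioning and conflict rules):&lt;/p&gt;

```javascript
// A shared session store: agents and UIs apply patches, and every
// subscriber observes the same resulting state.
function createSessionStore(initial = {}) {
  let state = { ...initial };
  const listeners = [];
  return {
    get: () => state,
    update(patch) {
      state = { ...state, ...patch }; // last write wins per key
      listeners.forEach((fn) => fn(state));
    },
    onChange: (fn) => listeners.push(fn),
  };
}
```

&lt;p&gt;A hotel agent can write &lt;code&gt;hotels&lt;/code&gt;, the user can override &lt;code&gt;selectedHotel&lt;/code&gt; from the UI, and both read back the same state.&lt;/p&gt;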

&lt;h3&gt;
  
  
  Presence adds another layer of confidence
&lt;/h3&gt;

&lt;p&gt;When users see which agents are online and working, it sets expectations and builds trust. If an agent goes offline, the system should detect it, not leave the user guessing. This becomes even more important as these systems scale up in production environments where reliability is critical.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interruption handling rounds it out
&lt;/h3&gt;

&lt;p&gt;Users will change their minds mid-task. Your system needs to respond without the orchestrator agent tearing down and restarting everything. That means listening for user input while processing, cancelling or rerouting tasks, and updating the shared state cleanly so individual agents can pick up where they left off or switch strategies on the fly.&lt;/p&gt;
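&lt;p&gt;One way to sketch that cancellation path is with the standard &lt;code&gt;AbortController&lt;/code&gt;: each agent step checks the signal before running, so a mid-task change stops work at the next step boundary rather than tearing everything down. The task and step shapes here are illustrative:&lt;/p&gt;

```javascript
// Run an agent's steps, checking for cancellation between each one.
// The caller aborts the controller when shared state changes direction.
function runAgentTask(signal, steps) {
  const done = [];
  for (const step of steps) {
    if (signal.aborted) return { done, cancelled: true };
    done.push(step());
  }
  return { done, cancelled: false };
}
```

&lt;p&gt;Because completed steps are recorded, an agent can later pick up where it left off instead of restarting from scratch.&lt;/p&gt;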

&lt;h2&gt;
  
  
  &lt;strong&gt;How the experience maps to the transport layer&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;User experience desired&lt;/th&gt;
&lt;th&gt;Required transport layer features&lt;/th&gt;
&lt;th&gt;Underlying technical implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Visible, concurrent agent progress&lt;/td&gt;
&lt;td&gt;Multiplexed pub/sub channels&lt;/td&gt;
&lt;td&gt;Multiple agents publish progress updates to a shared realtime channel the UI subscribes to&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared, up-to-date task state&lt;/td&gt;
&lt;td&gt;Structured state synchronisation&lt;/td&gt;
&lt;td&gt;Shared session state with clear schemas to reflect selections, status, and choices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Seamless agent-to-agent coordination&lt;/td&gt;
&lt;td&gt;Out-of-band messaging support&lt;/td&gt;
&lt;td&gt;Internal HTTP APIs or RPC protocols between agents, decoupled from user-facing updates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Awareness of system activity and health&lt;/td&gt;
&lt;td&gt;Presence tracking&lt;/td&gt;
&lt;td&gt;Agents register presence on connection and broadcast availability or error states&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graceful handling of mid-task changes&lt;/td&gt;
&lt;td&gt;Event-driven state updates and recovery&lt;/td&gt;
&lt;td&gt;Listen to user changes in shared state and cancel or adjust in-flight work accordingly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Making it work today&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Multi-agent collaboration is already happening in planning tools, research systems, and internal automation workflows. The models are not the limiting factor. The hard part is the infrastructure that keeps agents in sync, shares state reliably, and exposes progress to users in real time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; gives you the infrastructure needed to support this pattern. Realtime channels, shared state objects, presence, and resilient connections provide the foundations for agents that coordinate reliably and surface their work as it happens. No rebuilds, no custom multiplexing, no home-grown state machinery.&lt;/p&gt;

&lt;p&gt;Sign up for a free developer account and try it out.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Multi-agent AI systems need infrastructure that can keep up</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Fri, 30 Jan 2026 10:49:09 +0000</pubDate>
      <link>https://forem.com/ablyblog/multi-agent-ai-systems-need-infrastructure-that-can-keep-up-3aj7</link>
      <guid>https://forem.com/ablyblog/multi-agent-ai-systems-need-infrastructure-that-can-keep-up-3aj7</guid>
      <description>&lt;h2&gt;
  
  
  An Ably AI Transport demo
&lt;/h2&gt;

&lt;p&gt;When you're building agentic AI applications with multiple agents working together, the infrastructure challenges show up fast. Agents need to coordinate, users need visibility into what's happening, and the whole system needs to stay responsive even as tasks branch out across specialised workers.&lt;/p&gt;

&lt;p&gt;We built a multi-agent travel planning system to understand these problems better. What we learned applies well beyond holiday booking.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/mO53IQcHDaQ"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  The coordination problem
&lt;/h2&gt;

&lt;p&gt;The demo uses four agents: one orchestrator and three specialists (flights, hotels, activities). When a user asks to plan a trip, the orchestrator delegates sub-tasks to the specialists. Each specialist queries data sources, evaluates options, and reports back. The orchestrator synthesises everything and presents choices to the user.&lt;/p&gt;
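
&lt;p&gt;The delegation shape can be sketched as follows (the specialist logic is stubbed; all names are illustrative, not the demo's actual code):&lt;/p&gt;

```typescript
// An orchestrator fans a request out to focused specialists and collects
// their results for synthesis.
type Specialist = (request: string) => string;

const specialists: Record<string, Specialist> = {
  flights: (req) => `2 flight options for "${req}"`,
  hotels: (req) => `3 hotels for "${req}"`,
  activities: (req) => `5 activities for "${req}"`,
};

function orchestrate(request: string): string[] {
  // Each specialist works on its own slice of the task; the orchestrator
  // gathers the pieces and presents combined choices to the user.
  return Object.values(specialists).map((run) => run(request));
}

const plan = orchestrate("weekend in Lisbon");
```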

&lt;p&gt;This mirrors how most teams are actually building agentic systems. You don't build one massive agent that tries to do everything. You build focused agents, give them specific tools, and coordinate between them.&lt;/p&gt;

&lt;p&gt;The infrastructure question is: how do you keep everyone (the agents and the user) synchronized as work happens?&lt;/p&gt;

&lt;h2&gt;
  
  
  Why streaming alone isn't enough
&lt;/h2&gt;

&lt;p&gt;Token streaming solves part of this. The orchestrator can stream its responses back to the user so they're not waiting for complete answers. That's table stakes now for any AI interface.&lt;/p&gt;

&lt;p&gt;But streaming tokens from the orchestrator is only part of the problem. Users want visibility into the behaviour of each specialised agent – through their own token streams, structured updates like pagination progress, or the current reasoning of an agent working through a task.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fep58p3x4aoo90vi4oxrx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fep58p3x4aoo90vi4oxrx.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Prompt: Plan a weekend trip to a nearby city&lt;/p&gt;

&lt;p&gt;In our AI Transport demo, we also use &lt;a href="https://ably.com/liveobjects" rel="noopener noreferrer"&gt;Ably LiveObjects&lt;/a&gt; to publish progress updates from each specialist agent. The user sees which agent is active (&lt;a href="https://ably.com/docs/presence-occupancy/presence" rel="noopener noreferrer"&gt;tracked via presence&lt;/a&gt;), what it's querying, and how much data it's processing. These aren't logs or debug output. They're structured state updates that drive the UI. The agent even decides how to represent its progress to the user, taking raw database query parameters and turning them into natural language descriptions through a separate model call.&lt;/p&gt;

&lt;p&gt;This requires infrastructure that can handle multiple publishers updating different parts of the shared state concurrently. The flight agent publishes its progress. The hotel agent publishes its progress. The orchestrator streams tokens (and it doesn't need to care about intermediate progress updates from the specialized agents). All on the same channel, all staying in sync.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc1cl9ov6pxdjh2pgvkq4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc1cl9ov6pxdjh2pgvkq4.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agent searches for flights and hotels based on the user's criteria&lt;/p&gt;

&lt;h2&gt;
  
  
  State that reflects reality, not just conversation
&lt;/h2&gt;

&lt;p&gt;Chat history creates a limited view of what's actually happening. If a user changes their mind, deletes a selection, or modifies something outside the conversation thread, the agent needs to know about it.&lt;/p&gt;

&lt;p&gt;We use Ably LiveObjects to maintain the user's current selections (flights, hotels, activities) and agent status. This creates a source of truth that exists independently of the conversation. The orchestrator can query this state directly through a tool call, even if nothing in the chat history explains the change.&lt;/p&gt;

&lt;p&gt;The interesting bit: agents can &lt;em&gt;subscribe&lt;/em&gt; to changes in this data, so they see updates live. While you could store this in a database and have agents query it via tool calls, the ability to subscribe means agents can react to user context in real time (what the user is doing in the app, data they're manipulating, configuration changes they're making).&lt;/p&gt;

&lt;p&gt;When the user asks "what's my current itinerary?", the agent doesn't rely on conversation history. It checks the actual state. If the user deleted their flight selection, the agent sees that immediately.&lt;/p&gt;

&lt;p&gt;This separation matters more as systems get complex. The conversation is one interface to the system. The actual state (what's selected, what's in progress, what's completed) needs to exist independently. Agents, users, and other parts of your system all need reliable access to current state, not a reconstruction from message history.&lt;/p&gt;
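
&lt;p&gt;In sketch form (names are illustrative, not the demo's tool definitions): the itinerary tool reads the live state object, so an out-of-band deletion shows up in the agent's very next answer.&lt;/p&gt;

```typescript
// A "current itinerary" tool that answers from live session state, not from a
// reconstruction of chat history.
type Itinerary = { flight?: string; hotel?: string };

const sessionState: Itinerary = { flight: "LIS-204", hotel: "Hotel Baixa" };

function describeItinerary(state: Itinerary): string {
  const parts: string[] = [];
  if (state.flight) parts.push(`flight ${state.flight}`);
  if (state.hotel) parts.push(`hotel ${state.hotel}`);
  return parts.length > 0 ? parts.join(", ") : "nothing selected yet";
}

// The user deletes their flight selection outside the conversation thread...
delete sessionState.flight;
// ...and the agent's next tool call reflects the change immediately.
const answer = describeItinerary(sessionState);
```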

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F41ec36b05l4v3tg5reuh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F41ec36b05l4v3tg5reuh.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agent offers hotel options while remembering flight choice&lt;/p&gt;

&lt;h2&gt;
  
  
  Synchronising different types of state
&lt;/h2&gt;

&lt;p&gt;Not all state is created equal, and your infrastructure needs to handle different patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured, bounded state&lt;/strong&gt; works well with LiveObjects. Progress indicators (percentage complete, items processed), agent status (online, processing, completed), user selections, and configuration settings all have predictable size limits. Clients can subscribe to changes and re-render UI efficiently. Agents can read current state without parsing through message history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unbounded state&lt;/strong&gt; like full conversation history, audit trails, or complete reasoning chains still belongs in messages on a channel. You're appending to a growing log rather than updating bounded data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bidirectional state synchronization&lt;/strong&gt; enables richer interactions. You can sync agent state to users (progress updates, ETAs, task lists), let users configure controls for agents (settings, preferences, constraints), and give agents visibility into user context (where they are in the app, what they're doing, what data they're viewing). Each of these can use structured data patterns for efficient synchronization.&lt;/p&gt;

&lt;p&gt;The key is knowing which pattern fits which data, and having infrastructure that supports both.&lt;/p&gt;
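
&lt;p&gt;The two patterns side by side, as a minimal sketch (the data shapes are assumptions): bounded state is updated in place, with the last write per key winning, while unbounded history is only ever appended to.&lt;/p&gt;

```typescript
// Bounded: one current value per key, so size stays fixed and re-renders are cheap.
const agentStatus = new Map<string, string>();
agentStatus.set("flights", "processing");
agentStatus.set("flights", "completed"); // overwrite in place; still one entry

// Unbounded: a growing log you append to and replay, never update in place.
const conversation: { role: string; text: string }[] = [];
conversation.push({ role: "user", text: "Plan a weekend trip" });
conversation.push({ role: "assistant", text: "Here are some options..." });
```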

&lt;h2&gt;
  
  
  Decoupling internal coordination from user-facing updates
&lt;/h2&gt;

&lt;p&gt;The agents in our demo communicate with each other over HTTP using agent-to-agent protocols. That's appropriate for internal coordination. It's synchronous, it's request-response, it follows established patterns.&lt;/p&gt;

&lt;p&gt;The user-facing updates go over Ably AI Transport. That's where you need state synchronization and the ability for multiple publishers to update different parts of the UI concurrently.&lt;/p&gt;

&lt;p&gt;This decoupling matters. Each agent can independently decide how to surface its progress updates and state to the user, while the user maintains a single shared view over updates from all agents.&lt;/p&gt;

&lt;p&gt;We also let specialist agents write directly to LiveObjects, bypassing the orchestrator. When the flight agent has progress to report, it writes it. The user sees it. The orchestrator never touches that data (it only needs the final result). This avoids additional coordination and keeps the architecture simpler.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling interruptions
&lt;/h2&gt;

&lt;p&gt;Users change their minds. They interrupt. They refine requests mid-task. Your infrastructure needs to support this without rebuilding everything from scratch.&lt;/p&gt;

&lt;p&gt;In the demo, you can barge in and interrupt the agent while it's working. The system detects the new input, cancels the in-flight task, updates the state, and kicks off a new search. The UI shows the cancellation, the new request, and the new progress, all without breaking the conversation.&lt;/p&gt;

&lt;p&gt;This works because state updates are events on a channel. The agents listen for new user input even while they're processing. When they see it, they can decide whether to cancel current work, adapt it, or complete it first. The infrastructure doesn't dictate this logic (it enables it).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo21gdpcgxxxbski836rt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo21gdpcgxxxbski836rt.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agent then helps the user select activities for the trip&lt;/p&gt;

&lt;h2&gt;
  
  
  What presence actually tells you
&lt;/h2&gt;

&lt;p&gt;Before any interaction starts, the UI shows which agents are online. This comes from Presence. Each agent enters presence when it starts up and updates it as its status changes.&lt;/p&gt;

&lt;p&gt;Presence serves multiple purposes. Agents can see the online status of users and take action if a user goes offline (canceling tasks or queuing notifications – essential from a cost optimization perspective). In multi-user applications, users can see who else is online in the conversation. And for your operations team, it's observability built into the architecture. This answers a basic question for users: is this system actually working right now?&lt;/p&gt;

&lt;h2&gt;
  
  
  The enterprise patterns that emerge
&lt;/h2&gt;

&lt;p&gt;This travel demo is deliberately simple, but the patterns map directly to enterprise use cases:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Research workflows&lt;/strong&gt; where multiple agents pull from different data sources (financial databases, customer records, market data) and coordinate findings. Users need to see progress across all of them, not wait for a final answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Document generation&lt;/strong&gt; where one agent structures the outline, others fill in sections, another handles compliance checks. The state (which sections are complete, which are being reviewed, what's been approved) needs to stay synchronized as different agents work in parallel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Customer support routing&lt;/strong&gt; where classification agents determine issue type, specialist agents handle resolution, and orchestration agents manage escalation. Status updates need to flow to support reps, customers, and dashboards in real time.&lt;/p&gt;

&lt;p&gt;The common thread: multiple agents, concurrent work, shared state, and humans who need visibility and control. The infrastructure that makes a travel planner responsive and reliable is the same infrastructure that makes these systems work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1qbrx7ghoy0ycph3dj0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1qbrx7ghoy0ycph3dj0.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Labelled screenshot of AI Travel Agent's moving parts&lt;/p&gt;

&lt;h2&gt;
  
  
  What this requires from infrastructure
&lt;/h2&gt;

&lt;p&gt;You need a reliable transport layer that allows concurrent agents and clients to communicate in realtime. This isn't just about pub/sub – it's about robust infrastructure, high availability, and &lt;a href="https://ably.com/topic/pubsub-delivery-guarantees" rel="noopener noreferrer"&gt;guaranteed delivery&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You need state synchronisation that works for both structured data and message logs. Having access to both patterns depending on your needs is critical: bounded state objects for UI updates and configuration, and unbounded message streams for conversation history and audit trails.&lt;/p&gt;

&lt;p&gt;You need presence so you know what's actually online and available. You need &lt;a href="https://ably.com/docs/platform/architecture/connection-recovery" rel="noopener noreferrer"&gt;connection recovery&lt;/a&gt; so users don't lose context when networks flicker.&lt;/p&gt;

&lt;p&gt;Most importantly, you need this to work at the edge – in browsers and mobile apps, not just between backend services. That's where your users are. That's where responsiveness matters. The transport layer needs to be &lt;a href="https://ably.com/blog/token-streaming-for-ai-ux" rel="noopener noreferrer"&gt;robust enough to handle the reality of client connectivity&lt;/a&gt;: spotty networks, mobile handoffs, browser tabs backgrounded and resumed.&lt;/p&gt;

&lt;p&gt;The hard part of building multi-agent systems isn't the LLMs. The models are getting better every month. The hard part is the coordination, the state management, the visibility, and the reliability as these systems get more complex.&lt;/p&gt;

&lt;p&gt;This is why we built AI Transport. We saw teams struggling with these exact problems: cobbling together WebSocket libraries, building their own state synchronization, dealing with reconnection logic, and watching their systems break under the messiness of real client connectivity. &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;AI Transport gives you the infrastructure layer these systems need&lt;/a&gt;, built on Ably's proven reliability at scale, so you can focus on your agents instead of your transport layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building agentic AI experiences? You can ship it now
&lt;/h2&gt;

&lt;p&gt;This demo was built with &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt;. It's achievable today. You don't need to rebuild your stack to make it happen.&lt;/p&gt;

&lt;p&gt;Ably AI Transport provides all you need to support persistent, identity-aware, streaming AI experiences across multiple clients. If you're working on agentic products and want to get the AI UX right, we'd love to talk.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Anticipatory customer experience: How realtime infrastructure transforms CX</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Wed, 28 Jan 2026 10:27:17 +0000</pubDate>
      <link>https://forem.com/ablyblog/anticipatory-customer-experience-how-realtime-infrastructure-transforms-cx-3pn4</link>
      <guid>https://forem.com/ablyblog/anticipatory-customer-experience-how-realtime-infrastructure-transforms-cx-3pn4</guid>
      <description>&lt;p&gt;We're entering a new era of &lt;strong&gt;anticipatory customer experience&lt;/strong&gt; – one that's not just reactive, not just responsive, but truly predictive. In this new model, systems don't wait for friction to appear; they recognise signals early and step in before the user ever feels a slowdown or moment of uncertainty. The bar has shifted: customers now expect brands to predict their needs and act before friction even surfaces. It's a fundamental rewiring of the relationship between companies and the people they serve.&lt;/p&gt;

&lt;p&gt;This shift toward &lt;strong&gt;predictive customer experiences&lt;/strong&gt; isn't hypothetical. Anticipatory experiences are happening now, powered by &lt;strong&gt;realtime data infrastructure&lt;/strong&gt; that moves companies from playing catch-up to staying ahead. Think of it as the Age of Anticipation – where realtime signals, reliability, and adaptability form the core of modern CX design.&lt;/p&gt;

&lt;p&gt;Anticipatory CX isn't magic, it's just realtime infrastructure done right.&lt;/p&gt;

&lt;p&gt;So, if you're building next-generation CX or AI-powered agentic systems, this article outlines the architectural groundwork required to make anticipation real.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is anticipatory customer experience?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Anticipatory customer experience&lt;/strong&gt; uses realtime data infrastructure to predict and address customer needs before friction occurs. Unlike reactive support that waits for problems, anticipatory CX leverages continuous data streams, event-driven patterns, and predictive signals to intervene proactively, turning unknowns into reassurance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why realtime infrastructure matters for CX:&lt;/strong&gt; Realtime infrastructure enables the continuous flow of customer signals needed for prediction. Without it, systems rely on stale, batch-processed data that kills foresight. Companies like Doxy.me and HubSpot use &lt;strong&gt;realtime platforms&lt;/strong&gt; to anticipate confusion, delays, and churn risk before customers experience frustration.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;From reactive to anticipatory: Why realtime data infrastructure powers predictive CX&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Anticipation starts with having the right information at the right moment. But prediction requires fresh, &lt;strong&gt;realtime signals&lt;/strong&gt; flowing continuously through your systems.&lt;/p&gt;

&lt;p&gt;The healthcare sector illustrates this shift perfectly. &lt;a href="https://ably.com/case-studies/doxyme" rel="noopener noreferrer"&gt;Doxy.me&lt;/a&gt;, a telehealth platform trusted by hundreds of thousands of providers, faced a critical challenge: how do you anticipate patient confusion before it derails a virtual appointment? Their answer was "teleconsent" – a feature where healthcare providers walk patients through consent forms collaboratively, in real time.&lt;/p&gt;

&lt;p&gt;As the patient reads, fills in fields, and types responses, the provider sees every change as it happens. No refresh required. No lag. No wondering if the patient is stuck on question three. The system detects hesitation patterns and enables providers to intervene before confusion becomes abandonment. This is anticipatory CX in action – predicting friction points and addressing them before they escalate.&lt;/p&gt;

&lt;p&gt;But building this required infrastructure that could handle the continuous flow of patient interactions without introducing the very friction it was meant to eliminate. "The more that I can get my team to focus on healthcare business logic and less to focus on infrastructural data synchronisation, the better," explains Heath Morrison from Doxy.me. "Anything that provides higher level APIs to get us more in that space – and not be specialised in the stuff you guys should specialise in – is appealing and valuable to us."&lt;/p&gt;

&lt;p&gt;By rebuilding their realtime stack on reliable infrastructure, Doxy.me achieved a 65% cost reduction while transforming their system from a liability into a core strength. &lt;strong&gt;&lt;a href="https://ably.com/case-studies/doxyme" rel="noopener noreferrer"&gt;Read the full Doxy.me case study →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Retailers are doing similar work, spotting churn risk in realtime and intervening with targeted offers or support before the customer clicks away. Financial services companies are shifting from asking "what happened?" to "what's about to happen?" These aren't reactive fixes. They're &lt;strong&gt;anticipatory moves&lt;/strong&gt; that change outcomes – but only when the underlying data infrastructure can keep pace.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Realtime infrastructure&lt;/strong&gt; like &lt;a href="https://ably.com/pubsub" rel="noopener noreferrer"&gt;Ably's&lt;/a&gt; makes this possible – it's the unseen layer that ensures systems receive the continuous stream of signals they need to predict accurately, without lag or data loss.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Industries using anticipatory CX&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Healthcare:&lt;/strong&gt; &lt;a href="https://ably.com/health-tech" rel="noopener noreferrer"&gt;Telehealth platforms&lt;/a&gt; use realtime infrastructure to anticipate patient needs, showing "doctor joining now" before patients wonder if something's wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Financial Services:&lt;/strong&gt; Banks predict fraud patterns and alert customers to unusual activity before money moves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retail:&lt;/strong&gt; E-commerce platforms spot abandonment signals and intervene with targeted offers before checkout failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logistics:&lt;/strong&gt; Delivery services flag delays and update ETAs before customers start refreshing tracking pages.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Building trust through realtime customer engagement: The infrastructure foundation&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Trust is built in moments of uncertainty. And anticipation? It turns unknowns into reassurance.&lt;/p&gt;

&lt;p&gt;Think about the last time you booked a rideshare or waited for a delivery. The difference between a company that leaves you guessing and one that proactively updates you – "your driver is two minutes away," "slight delay, new ETA: 3:47pm" – is the difference between anxiety and confidence. &lt;strong&gt;Realtime anticipation&lt;/strong&gt; doesn't just inform, it reassures.&lt;/p&gt;

&lt;p&gt;Telehealth platforms have figured this out. When patients see "doctor joining now" before they've even begun to wonder if something's wrong, it changes the entire experience. Logistics companies that flag delays before customers start refreshing tracking pages are doing the same thing: reducing friction before it becomes frustration.&lt;/p&gt;

&lt;p&gt;But there's a flip side: when realtime systems fail, trust erodes faster than it built up. A phantom notification, a delayed update, an inaccurate prediction – these aren't just technical hiccups. They're credibility problems. Reliability isn't a nice-to-have, it's the foundation. When customers cite Ably's five-plus years without a global outage, they're not celebrating uptime for its own sake. They're describing the baseline that makes anticipation possible at scale. &lt;a href="https://status.ably.io/" rel="noopener noreferrer"&gt;View Ably's live uptime status&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Ably exists to be that foundation. The reason trust can scale across millions of interactions, without companies needing to worry about the underlying infrastructure failing at the worst moment.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Core technologies behind anticipatory CX&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Core technology&lt;/th&gt;
&lt;th&gt;Benefit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Realtime pub/sub messaging&lt;/td&gt;
&lt;td&gt;WebSocket-based event distribution for instant signal propagation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event-driven architecture&lt;/td&gt;
&lt;td&gt;Composable, adaptive systems that respond to customer signals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Predictive analytics&lt;/td&gt;
&lt;td&gt;AI-powered interpretation of continuous data streams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continuous data streams&lt;/td&gt;
&lt;td&gt;Sub-6.5ms message delivery latency without polling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fault-tolerant infrastructure&lt;/td&gt;
&lt;td&gt;99.999% uptime requirements for maintaining trust&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Future-proofing customer experience: Event-driven architecture for anticipatory CX&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To anticipate effectively, your CX stack needs to evolve as fast as your customers' expectations do. Rigid, monolithic architectures can't keep up with new signals, emerging channels, or changing customer behaviors. The future belongs to composable, &lt;strong&gt;event-driven systems&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Doxy.me's evolution illustrates this perfectly. They built their realtime features organically – using PubNub to handle presence detection and state synchronisation, all ephemeral data that disappeared after each session. But as they planned their next phase, they hit a wall: they needed persistence. The ability to decouple patient workflows from video calls, support richer collaboration, maintain state across sessions, and plug in new capabilities without rebuilding their entire stack. They prototyped with Convex and loved the developer experience, but needed production-grade infrastructure that could slot into their Node/TypeScript/Postgres/AWS environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/pubsub" rel="noopener noreferrer"&gt;&lt;strong&gt;Event-driven architectures&lt;/strong&gt;&lt;/a&gt; make this kind of evolution possible. You can layer in predictive capabilities, plug in new communication channels, or add analytics tools – all without tearing everything down and starting over. One enterprise CX leader described it this way: "We used to dread adding new functionality. Now we think in terms of what events we need to listen for and what actions we want to trigger. It has completely changed our velocity."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/solutions/customer-experience-tech" rel="noopener noreferrer"&gt;Ably&lt;/a&gt; enables this kind of interoperability – CRMs, chat systems, analytics tools, customer-facing applications all publishing and subscribing to customer events in real time. WebSockets and pub/sub patterns ensure consistent, low-latency communication across every channel, without developers having to reinvent transport logic for each integration. It's the connective tissue that makes anticipatory systems work at scale.&lt;/p&gt;

&lt;p&gt;But more moving parts do mean more complexity. Companies need governance frameworks and resilience planning to ensure their adaptive architectures don't become fragile ones. The ones succeeding here aren't necessarily the ones with the newest tech – they're the ones who've built systems that can absorb change without breaking.&lt;/p&gt;

&lt;p&gt;The Age of Anticipation is composable. Adaptive, event-driven architecture is what makes foresight scalable.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How to implement anticipatory CX&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Establish realtime data infrastructure&lt;/strong&gt; – Replace polling with streaming architecture for continuous signal flow&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Implement event-driven pub/sub patterns&lt;/strong&gt; – Enable loosely coupled systems that respond to customer signals&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Build predictive models using continuous data&lt;/strong&gt; – Layer AI/ML on top of realtime streams for pattern recognition&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Create proactive intervention workflows&lt;/strong&gt; – Design automated responses to predictive signals (offers, alerts, support)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Monitor reliability metrics rigorously&lt;/strong&gt; – Track latency, uptime, message integrity to maintain trust at scale&lt;/p&gt;
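
&lt;p&gt;Steps 1 to 4 can be sketched end to end (every name, threshold, and signal shape here is an assumption for illustration, not a recommended model): signals flow continuously through a predictor, and risky ones trigger a proactive intervention rather than waiting for the customer to complain.&lt;/p&gt;

```typescript
// A toy anticipatory pipeline: streaming signals -> prediction -> intervention.
type Signal = { customer: string; event: string; idleMs: number };

const interventions: string[] = [];

// Step 3: a trivial stand-in for a predictive model — long idle time on the
// checkout page predicts abandonment.
function predictsAbandonment(s: Signal): boolean {
  return s.event === "checkout_viewed" && s.idleMs > 30_000;
}

// Step 4: the automated response to a predictive signal.
function intervene(s: Signal): void {
  interventions.push(`offer sent to ${s.customer}`);
}

// Steps 1-2: signals arrive on an event stream and are evaluated as they
// happen, not in a nightly batch.
const stream: Signal[] = [
  { customer: "c1", event: "checkout_viewed", idleMs: 45_000 },
  { customer: "c2", event: "browsing", idleMs: 60_000 },
  { customer: "c3", event: "checkout_viewed", idleMs: 2_000 },
];
for (const signal of stream) {
  if (predictsAbandonment(signal)) intervene(signal);
}
```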

&lt;h2&gt;
  
  
  &lt;strong&gt;What makes this different&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most CX discussions focus on speed (faster responses, quicker resolutions). But anticipation goes deeper. It's about infrastructure that doesn't just move data quickly, but does so reliably enough to build trust and flexibly enough to adapt as expectations evolve. &lt;a href="https://ably.com/four-pillars-of-dependability" rel="noopener noreferrer"&gt;Explore Ably's four pillars of dependability&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Realtime infrastructure&lt;/strong&gt; is the hidden enabler. It's what makes customer care feel effortless, predictive, and ultimately, more human. Not because it replaces human judgment, but because it removes the friction that gets in the way of delivering exceptional care.&lt;/p&gt;

&lt;p&gt;The companies winning in the Age of Anticipation aren't the ones with the flashiest technology demos. They're the ones who've built the unglamorous, reliable, adaptive infrastructure that makes anticipation possible at scale. They've realised that foresight isn't magic – it's architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Business impact of anticipatory customer experience&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.pwc.com/us/en/services/consulting/business-transformation/library/2025-customer-experience-survey.html" rel="noopener noreferrer"&gt;&lt;strong&gt;52% of consumers&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;stopped using brands after bad experiences&lt;/strong&gt; – making proactive, anticipatory CX non-negotiable (PwC 2025 Customer Experience Survey)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ably.com/case-studies/doxyme" rel="noopener noreferrer"&gt;&lt;strong&gt;65% cost reduction&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;achieved by Doxy.me&lt;/strong&gt; through realtime infrastructure that prevents issues versus fixing them reactively&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://broadbandbreakfast.com/four-predictions-for-customer-experience-in-2025/" rel="noopener noreferrer"&gt;&lt;strong&gt;61% of CX leaders deliver proactive communications&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;using AI&lt;/strong&gt;, while only 6% of laggards do, creating a significant competitive gap (Cisco research)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://status.ably.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;5+ years without a global outage&lt;/strong&gt;&lt;/a&gt; – Ably's proven track record demonstrates the reliability required for maintaining trust at enterprise scale&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.nextiva.com/blog/customer-experience-insights.html" rel="noopener noreferrer"&gt;&lt;strong&gt;40% of companies plan to increase investment&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;in predictive instant experiences&lt;/strong&gt; in 2025, signalling industry-wide shift to anticipatory models&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Three pillars of anticipatory CX&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Realtime data streams&lt;/strong&gt; – Fresh, continuous signals flowing through your systems without latency or data loss&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Reliability at scale&lt;/strong&gt; – Infrastructure trusted to maintain consistency across millions of interactions, measured in years of uptime&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Adaptive architecture&lt;/strong&gt; – Event-driven systems that evolve with customer expectations without requiring rebuilds&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Ready to build anticipatory experiences?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Ably's realtime platform delivers the continuous data streams and event-driven patterns your systems need to anticipate customer needs, with the reliability required to maintain trust at scale.&lt;/p&gt;

&lt;p&gt;Six-plus years of 100% uptime. Sub-6.5ms message delivery latency. Built-in message integrity guarantees.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ably.com/cx-tech" rel="noopener noreferrer"&gt;See how Ably powers anticipatory CX&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ably.com/blog/data-integrity-in-ably-pub-sub" rel="noopener noreferrer"&gt;Read more about the technicalities&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ably.com/support" rel="noopener noreferrer"&gt;Start building free or talk to our team about your use case&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>How we built an AI-first culture at Ably</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Tue, 27 Jan 2026 10:44:50 +0000</pubDate>
      <link>https://forem.com/ablyblog/how-we-built-an-ai-first-culture-at-ably-3aid</link>
      <guid>https://forem.com/ablyblog/how-we-built-an-ai-first-culture-at-ably-3aid</guid>
      <description>&lt;p&gt;Most companies talk about being "AI-first." At Ably, we decided to actually become one. We build realtime infrastructure for AI applications. To do that credibly, we need to live and breathe AI ourselves – not just in our product, but in how we work every day.&lt;/p&gt;

&lt;p&gt;A year ago, we began a company-wide push for AI adoption. This post breaks down how we did it: the pillars, the tooling, the MCP advantage, the early mistakes, the wins across engineering, marketing, sales, and finance, and the cultural momentum that turned a mandate into a mindset.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2hw90fipip8r3sz621w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2hw90fipip8r3sz621w.png" alt=" " width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Building an AI-first company culture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When Jamie Newcomb, Ably's Head of Product Operations, began championing internal AI adoption, the approach was straightforward: everyone at Ably should explore how AI could make them more effective. No exceptions.&lt;/p&gt;

&lt;p&gt;"We might want to tone down the language," Jamie admits with a laugh, "but it really is mandated. Everyone at Ably should be using AI to see how they can make themselves more effective. But it's not just about doing things faster. It's about doing things you couldn't do before. The goal is to shift the mindset, where people stop asking 'can AI help with this?' and start assuming it can, then push further: what's now possible that wasn't?"&lt;/p&gt;

&lt;p&gt;Today, that mandate has evolved into something far more organic. A company-wide culture where AI isn't just accepted, it's expected.&lt;/p&gt;

&lt;p&gt;For a company processing &lt;a href="https://ably.com/docs/platform/architecture/platform-scalability" rel="noopener noreferrer"&gt;2 trillion operations monthly&lt;/a&gt;, this isn't about following trends, it's about credibility. It's about walking the walk. To build &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;AI Transport&lt;/a&gt; that developers can trust for agentic workloads, we need firsthand experience of how AI performs in real operational environments, both the advantages and the pitfalls.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs7he4g7vo9qxz4xk49s0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs7he4g7vo9qxz4xk49s0.png" alt=" " width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Three pillars of successful AI adoption&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Ably's approach to AI rests on three interconnected pillars:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Internal AI adoption and enablement&lt;/strong&gt;: Integrating AI into workflows and processes across every team to enhance capabilities and drive productivity improvements. The goal isn't just providing tools, it's automating repetitive, time-consuming tasks so people can focus on strategic thinking and creative problem-solving.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI developer experience&lt;/strong&gt;: Using AI to make Ably's platform more discoverable and easier to use for developers. This means AI-enhanced documentation, intelligent tooling, and optimized SDK experiences, empowering developers to build real-time products faster with the help of LLMs. The goal is to position Ably as essential infrastructure for real-time user experiences powered by AI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI product enhancement&lt;/strong&gt;: Making proactive, explicit efforts to understand AI use cases where Ably delivers value, determining what we need to enable those use cases, and ensuring those capabilities are part of our roadmap. This pillar is about building infrastructure informed by real customer needs, both known and yet to be discovered.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;"My main role is about process efficiency in product engineering," Jamie explains. "And that naturally extended to AI adoption. We believe there are significant productivity improvements we can make if everyone adopts AI thoughtfully across the company."&lt;/p&gt;

&lt;p&gt;These pillars aren't separate initiatives, they're a unified strategy. Internal productivity adoption teaches us what works in practice. Developer experience ensures we're making Ably discoverable and easy to use for the growing number of developers building with AI. And AI product enhancement ensures we're building infrastructure informed by real customer needs, not just theory. This article focuses primarily on the first pillar, but the three are deeply connected. What we learn from using AI internally shapes how we build for developers using AI externally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9t3d5qb2dlg5ovt3h1t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9t3d5qb2dlg5ovt3h1t.png" alt=" " width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The MCP advantage&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Perhaps the most significant internal development has been Ably's adoption of the Model Context Protocol (MCP), built over the summer of 2025.&lt;/p&gt;

&lt;p&gt;"The Ably MCP connects all our internal tools together," Jamie explains. "It lets people access data across systems via AI assistants. Building this and seeing it genuinely change how people work has been incredibly rewarding."&lt;/p&gt;

&lt;p&gt;What started as an experiment to see what was possible has grown into a company-wide platform that's now critical to daily workflows, integrating 15+ services through over 140 tools. Engineers can check CI build status and debug workflow failures without leaving their conversation. Product managers search across Jira issues, GitHub PRs, and Slack threads in a single query. Sales teams pull Gong call transcripts and HubSpot contact history to prepare for customer meetings. The breadth is significant: GitHub, Jira, Confluence, Slack, HubSpot, Gong, Jellyfish, Metabase, PagerDuty, GSuite, and more, all accessible through natural conversation.&lt;/p&gt;

&lt;p&gt;Before MCP, every AI interaction started from zero: engineers manually explaining Ably's infrastructure, marketers pasting in brand guidelines, constant context-switching that made AI feel like more work rather than less.&lt;/p&gt;

&lt;p&gt;Now when an Ably employee opens Claude, they're not starting from scratch. Through MCP, they have immediate access to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shared context and prompt library&lt;/li&gt;
&lt;li&gt;Company knowledge and documentation&lt;/li&gt;
&lt;li&gt;Ably's tone of voice guidelines and style guides&lt;/li&gt;
&lt;li&gt;Live data from internal tools and systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of "Here's what Ably is, here's our tone of voice, now help me write this email," it becomes simply: "Help me write this customer email about latency improvements." The AI already knows.&lt;/p&gt;

&lt;p&gt;Scaling to 140+ tools created its own challenge: context limits. Ably solved this with a tool registry that lets the AI discover only what it needs for each task, keeping interactions lean and responsive.&lt;/p&gt;

&lt;p&gt;"That context library is really important," Jamie emphasises. "The prompts for critical workflows (like our ICP matching) are all version controlled. When something needs adjusting, it's not about AI being wrong. It's about iterating on what you're asking the AI to do."&lt;/p&gt;

&lt;p&gt;The platform continues to evolve based on team feedback. When engineers noticed they were dropping out of the terminal to check GitHub Actions builds, new workflow tools were shipped within hours. Claude Code is used heavily to maintain and extend the MCP itself, with Claude's Agent SDK integrated throughout the development workflow. Using AI to build AI tooling is a big part of why the velocity is so high. That responsiveness, treating internal AI tooling as a living product rather than a one-off project, reflects how deeply AI has become embedded in Ably's operating culture.&lt;/p&gt;

&lt;p&gt;Jamie spoke at length with Jellyfish on how Ably moved beyond data retrieval to unlock real analysis and insights through MCP, and &lt;a href="https://jellyfish.co/blog/how-ably-makes-magic-with-jellyfishs-mcp/" rel="noopener noreferrer"&gt;you can read the full article here.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdf3nbs977hkjx64e5b8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdf3nbs977hkjx64e5b8x.png" alt=" " width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AI tool selection
&lt;/h2&gt;

&lt;p&gt;When Ably first encouraged company-wide AI adoption, the approach was deliberately open-ended. People experimented with ChatGPT, Claude, and workflow orchestration tools like N8N, Zapier, and Relay.&lt;/p&gt;

&lt;p&gt;"We've settled on Claude for our primary AI, particularly Claude Code for engineers, but people have the freedom to use whatever works best for them," Jamie says. "If someone has a strong case for a different tool, that's fine. We're not prescriptive about it."&lt;/p&gt;

&lt;p&gt;Everyone at Ably has access to Claude for day-to-day work, whether that's drafting documents, thinking through problems, or exploring ideas. For workflow automation, Relay emerged as the orchestration layer, handling the multi-stage pipelines that power lead enrichment, ICP scoring, and sales alerts. The combination of Claude for reasoning and Relay for orchestration has become Ably's default stack, though teams remain free to experiment.&lt;/p&gt;

&lt;p&gt;This flexibility matters, especially given Ably's positioning around AI Transport. "We can't just say 'use Claude' when we're building infrastructure that works with any LLM provider," Jamie notes. "We need to show that our approach works regardless of which AI you're using."&lt;/p&gt;

&lt;h2&gt;
  
  
  Results by team
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Engineering
&lt;/h3&gt;

&lt;p&gt;All engineers now use Claude Code for agentic coding, but the workflows vary based on the task.&lt;/p&gt;

&lt;p&gt;For narrow, well-defined tickets, Claude can often one-shot a solution. Engineers point it at the relevant files, describe what they want, and use test-driven development as a guardrail. Claude writes the test first, sees it fail, writes the implementation, and confirms the test passes. For larger tasks, the approach is more iterative: Claude generates a plan as a markdown file, the engineer reviews and refines it, then kicks off implementation in a fresh context with the plan as input.&lt;/p&gt;

&lt;p&gt;Discovery is another common use case. Engineers ask Claude questions about the codebase ("where does X get used?", "how does a message get from acceptance to being broadcast out to clients?"), using it as a way to navigate complex systems without reading through thousands of lines of code.&lt;/p&gt;

&lt;p&gt;The Ably MCP bridges the gap between documentation and code. Engineers pull context from Confluence docs, have Claude synthesise summaries, and feed those into coding sessions, turning scattered documentation into usable implementation context. Some are experimenting with Claude Code running asynchronously in the browser, queuing up tasks from a phone and reviewing the work later.&lt;/p&gt;

&lt;p&gt;Beyond individual workflows, Claude is integrated into the development pipeline itself. Claude's Agent SDK is connected to GitHub to generate implementation context, review PRs, and fix CI issues before code reaches production. When a PR goes up, AI reviews it for obvious issues first, then engineers review it as they would any other colleague's work.&lt;/p&gt;

&lt;p&gt;One principle remains constant: a single human author owns every PR, regardless of how much was AI-generated. The practice of engineering judgment, knowing what to accept, what to push back on, and what to rewrite, is still the job.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyoqy7fyvs6seh7risg7q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyoqy7fyvs6seh7risg7q.png" alt=" " width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Marketing
&lt;/h3&gt;

&lt;p&gt;The Marketing team wanted to spend more time shaping the narrative and shipping campaigns, not doing repetitive admin. There are always multiple activities in flight, each needing planning, research, execution, reporting, and analysis. That's where AI has been a huge productivity lever: the team has adopted it to streamline the "admin" layer so they can increase both output and quality without adding headcount.&lt;/p&gt;

&lt;p&gt;Today, the team uses a small stack of AI tools across the lifecycle. They analyse Gong calls to accelerate market research and tighten messaging and positioning. They use Claude to pull and synthesise data from multiple sources to scope and validate content opportunities faster. They also automate lead validation and categorisation for sales follow-up, enriching contact and company data so the first human touch starts from context, not guesswork. And they map the customer journey with attribution, using AI to connect what prospects do pre-signup to intent signals, so they can prioritise the right audiences and double down on what's actually working.&lt;/p&gt;

&lt;p&gt;For example, lead qualification that used to take hours is now a multi-stage AI pipeline that runs automatically on every signup. The system researches companies across 6+ data sources (Crunchbase, LinkedIn, SEC filings, PitchBook), extracts structured data, scores against 8 ICP criteria, classifies personas, and routes alerts to Slack with tier assignments and recommended actions – all before anyone on the team sees the lead.&lt;/p&gt;

&lt;p&gt;"Marketing used to spend considerable time on this," Jamie recalls. "Now the first time they see a lead, it already has a confidence-scored ICP assessment, enriched company data, and suggested next steps."&lt;/p&gt;

&lt;h3&gt;
  
  
  Sales
&lt;/h3&gt;

&lt;p&gt;New lead assignment uses multi-signal analysis (employee count, funding raised, revenue for public companies) to automatically route accounts to Commercial, Enterprise, or Strategic segments. For qualified leads, AI generates personalised email sequences based on the ICP analysis, tailoring messaging to the prospect's industry, technical challenges, and relevant customer references.&lt;/p&gt;

&lt;p&gt;For existing customers, AI monitors self-service accounts against usage limits, surfacing expansion opportunities when customers approach thresholds and flagging critical capacity alerts that need immediate outreach. Relay handles the orchestration across all workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmfzrl8c8o8wg6q9cr4r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmfzrl8c8o8wg6q9cr4r.png" alt=" " width="800" height="276"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Finance
&lt;/h3&gt;

&lt;p&gt;Finance operations at Ably are treated like a tech product, using AI as a force multiplier to engineer away repetitive work.&lt;/p&gt;

&lt;p&gt;The team systematically verifies contracts, builds smarter revenue models, and automates reconciliation work. A recent hackathon project eliminates thousands of monthly clicks in the Stripe-to-Xero process, the kind of repetitive work that most finance teams wouldn't know where to start automating.&lt;/p&gt;

&lt;p&gt;They use Ably's MCP to retrieve data from Xero, then create and update sheets directly through Claude, turning what would be manual exports and data entry into conversational requests. It's a small example of how the platform extends beyond engineering and into every corner of the business.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3pih5iy7kv07zh1bgf7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3pih5iy7kv07zh1bgf7.png" alt=" " width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The cultural shift
&lt;/h2&gt;

&lt;p&gt;Creating an AI-first culture isn't just about providing tools – it's about enablement, support, and honest assessment of where you are versus where you're headed.&lt;/p&gt;

&lt;p&gt;We run AI drop-in sessions every Friday where team members can bring questions, share what they've built, or explore new ideas. An internal Slack channel serves as a continuous stream of AI experiments, wins, and collaborative problem-solving.&lt;/p&gt;

&lt;p&gt;"When Charlotte [Delivery Manager] and I approach teams, we don't even talk about AI initially," Jamie reveals. "We ask: what are your repetitive processes? Once teams understand their processes, then you can start the AI conversation."&lt;/p&gt;

&lt;p&gt;"Anyone can build something now," Jamie says. "The barrier to solving a problem has basically been removed because people can use AI to build the solution themselves."&lt;/p&gt;

&lt;p&gt;The result is what Jamie calls the "wow moment": when someone successfully builds their first AI-powered solution, a ceiling lifts. "Once people have that moment, they just keep building."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filers6qw4hdbfdx9vnhf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filers6qw4hdbfdx9vnhf.png" alt=" " width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;But Jamie is candid about where Ably still has room to grow. "To be completely honest, we haven't hit our potential yet," he admits. "We've made real progress, but there's still more impact to unlock from AI across how we work, our processes, and how we achieve our product outcomes. And when we think we've got there, there'll still be more room to grow."&lt;/p&gt;

&lt;p&gt;The vision for what's next is clear: continuing to integrate AI deeper into how Ably works. The foundations are in place: agentic coding, AI-assisted PR reviews, automated workflows across teams. But the true potential lies in making these the default across every function, not just the teams that adopted early.&lt;/p&gt;

&lt;p&gt;"The biggest gains come from how people think, not just what tools they use," Jamie explains. "When people stop asking 'can AI help with this?' and start assuming it can, that's where the real impact comes from."&lt;/p&gt;

&lt;p&gt;The most significant outcome isn't any specific tool or workflow, it's that cultural shift in action. "We don't have a problem at Ably where people are on the fence about whether AI can help them," Jamie reflects. "We've shown that it can. Now it's about enablement and encouraging people to identify problems they can solve themselves."&lt;/p&gt;

&lt;p&gt;The same infrastructure philosophy that powers our internal AI adoption powers our AI Transport product. &lt;a href="https://ably.com/blog/evolution-of-realtime-ai" rel="noopener noreferrer"&gt;Read how Ably enables reliable, scalable realtime experiences for conversational AI here.&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>leadership</category>
      <category>mcp</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Gen‑2 AI UX: Conversations that stay in sync across every device</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Tue, 27 Jan 2026 10:15:10 +0000</pubDate>
      <link>https://forem.com/ablyblog/gen-2-ai-ux-conversations-that-stay-in-sync-across-every-device-302m</link>
      <guid>https://forem.com/ablyblog/gen-2-ai-ux-conversations-that-stay-in-sync-across-every-device-302m</guid>
      <description>&lt;p&gt;Start a conversation on your laptop, finish it on your phone. The context just follows you. That's what cross-device AI sync delivers. No reloading history, no reintroducing yourself, just one continuous thread across every screen. It builds trust, reduces friction, and makes the assistant feel like a single, persistent presence. This post unpacks why users expect it, what makes it technically tricky, and what your system needs to get it right.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI conversations must survive the device switch
&lt;/h2&gt;

&lt;p&gt;Modern users have grown to expect realtime, seamless experiences from their apps and AI tools. They want instantaneous responses, continuous interactions, and no interruptions as they move between devices. This expectation extends to AI-powered experiences: if you start a conversation with an AI assistant on your laptop, you should be able to pick it up on your phone or another tab without missing a beat. Equally, if you have initiated a long-running asynchronous task, you want to be notified once it's completed, no matter which device you're using at the time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What users want, and why this enhances the experience
&lt;/h2&gt;

&lt;p&gt;Users want continuous, identity-aware AI conversations that follow them across devices. In practice, this means an AI chat session linked to their identity rather than a single tab or device. The conversation should feel like a single thread they can return to at any time.&lt;/p&gt;

&lt;p&gt;That continuity builds trust. The AI isn't "forgetting" just because you switched devices. A remembered history signals reliability and intent, helping users feel the assistant is genuinely useful. Multi-turn conversations flow naturally, and users avoid repeating themselves or reconstructing context.&lt;/p&gt;

&lt;p&gt;This matters even more once AI systems move beyond simple chat. When an LLM is running long-lived, asynchronous work such as multi-step research, tool calls, or background analysis, users expect to see progress and results wherever they happen to be at the time. You might start a task on your desktop, step away while the model works, and then pick up your tablet to see the output appear as soon as it's ready.&lt;/p&gt;

&lt;p&gt;Real-world usage makes multi-device continuity unavoidable. These moments must be frictionless and reinforce the sense that the AI is persistent, reliable, and working on your behalf rather than being tied to a fragile client session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is proving challenging
&lt;/h2&gt;

&lt;p&gt;HTTP is fundamentally stateless. Each request stands alone, meaning conversation context has to be manually preserved and restored for both the client and the server. This makes cross-device AI session continuity complex.&lt;/p&gt;

&lt;p&gt;Having clients poll for updates is inefficient and adds latency. Long-polling or server-sent events help, but only partially. They don't enable simultaneous bi-directional, low-latency messaging, which is what smooth AI conversations require.&lt;/p&gt;

&lt;p&gt;Handling reconnections, preserving message order, and managing updates across multiple active clients requires considerable infrastructure. Doing this reliably, at scale, across networks and devices, is beyond what the typical product team can or should build from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why you need a drop-in AI transport
&lt;/h2&gt;

&lt;p&gt;Given the challenges described, building a robust system for realtime synchronisation from scratch can significantly drain engineering resources and slow product velocity. This is where a drop-in AI transport layer becomes essential.&lt;/p&gt;

&lt;h3&gt;
  
  
  Persistent, bi-directional messaging
&lt;/h3&gt;

&lt;p&gt;To support conversations that stay in sync across devices, such a layer must offer persistent, bi-directional messaging using protocols like WebSockets for AI streaming. This allows for continuous, low-latency communication in both directions, enabling the AI to push updates and the client to send input without waiting for discrete request/response cycles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z8y6lh6wbk9z2x5ks33.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z8y6lh6wbk9z2x5ks33.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Identity aware fan-out
&lt;/h3&gt;

&lt;p&gt;Equally important is identity-aware fan-out. The transport system needs to recognise all active sessions associated with a single user and ensure that every message or state update is sent to all of those endpoints. That means when a user sends a message on one device, every other device they're signed in on should immediately reflect the change.&lt;/p&gt;
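&lt;p&gt;As a rough sketch (illustrative names only, not a real transport API), identity-aware fan-out boils down to keying live connections by user rather than by device:&lt;/p&gt;

```javascript
// Track every live connection per logical user, then deliver each update
// to all of that user's devices. Illustrative only; a managed transport
// layer handles this via per-user channels.
const connectionsByUser = new Map();

function connect(userId, deviceId, deliver) {
  if (!connectionsByUser.has(userId)) connectionsByUser.set(userId, new Map());
  connectionsByUser.get(userId).set(deviceId, deliver);
}

function fanOut(userId, message) {
  const devices = connectionsByUser.get(userId) || new Map();
  for (const deliver of devices.values()) deliver(message);
}

const received = { laptop: [], phone: [] };
connect('user-1', 'laptop', (m) => received.laptop.push(m));
connect('user-1', 'phone', (m) => received.phone.push(m));

// A message sent from the laptop is reflected on every signed-in device.
fanOut('user-1', { role: 'user', text: 'Summarise my meeting notes' });
```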

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01i4piouiymino6cca9l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01i4piouiymino6cca9l.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Ordering and session recovery
&lt;/h3&gt;

&lt;p&gt;The system also needs to preserve message ordering and support reliable session recovery. If the connection drops momentarily, say from a device switch or network disruption, the user shouldn't lose messages or see them out of order. A well-designed transport layer offers mechanisms to replay missed events and keep message sequences intact, ensuring consistency in the conversation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xvknzom35uh6g1ar9k6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xvknzom35uh6g1ar9k6.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Presence tracking
&lt;/h3&gt;

&lt;p&gt;Presence tracking enables the backend to know which devices are currently online and active. It helps coordinate updates, prevents redundant notifications, and can be used to power features like realtime indicators for typing or collaborative editing across devices.&lt;/p&gt;
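&lt;p&gt;Conceptually, presence is a shared membership set with enter and leave events. A toy Python sketch with hypothetical names (not Ably's API):&lt;/p&gt;

```python
class PresenceSet:
    """Toy presence set: tracks which devices are currently online,
    recording enter/leave events that other members can observe."""
    def __init__(self):
        self.members = {}
        self.events = []

    def enter(self, member, data=None):
        self.members[member] = data
        self.events.append(("enter", member))

    def leave(self, member):
        self.members.pop(member, None)
        self.events.append(("leave", member))

    def get(self):
        return sorted(self.members)

presence = PresenceSet()
presence.enter("alice-laptop", {"typing": False})
presence.enter("alice-phone", {"typing": True})
presence.leave("alice-laptop")
print(presence.get())  # ['alice-phone']
```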

&lt;h3&gt;
  
  
  Streaming support
&lt;/h3&gt;

&lt;p&gt;To maintain a high-quality conversational UX, the transport layer must support LLM token streaming. This includes delivering partial, realtime updates from the AI model as it generates responses. That stream must arrive quickly, in order, and appear simultaneously on any active device.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmmr5enl0lul9dxvt0uy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmmr5enl0lul9dxvt0uy.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ably's infrastructure supports all of these capabilities as part of its realtime platform. It eliminates the need to custom-build low-level transport solutions, allowing engineering teams to focus on building intelligent, agentic features instead of protocol logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the desired experience maps to the transport layer
&lt;/h2&gt;

&lt;p&gt;The table below breaks down what users expect from cross-device AI conversations, what your transport layer must support to deliver those experiences, and the technical mechanics that make it all work.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;User Experience Desired&lt;/th&gt;
&lt;th&gt;Required Transport Layer Features&lt;/th&gt;
&lt;th&gt;Underlying Technical Requirements&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td rowspan="2"&gt;Identity-aware sync across devices&lt;/td&gt;
&lt;td&gt;Identity-aware fan-out&lt;/td&gt;
&lt;td&gt;Map user identity to all active sessions and ensure message fan-out across them.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State replication across sessions&lt;/td&gt;
&lt;td&gt;Maintain consistent shared state for conversation history and updates across devices.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Realtime continuation of streaming outputs and input states&lt;/td&gt;
&lt;td&gt;Durable stream relay&lt;/td&gt;
&lt;td&gt;Emit model outputs as streaming tokens; buffer streams server-side to continue on reconnect or switch.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td rowspan="2"&gt;No loss of flow during context switches&lt;/td&gt;
&lt;td&gt;Reliable message ordering&lt;/td&gt;
&lt;td&gt;Guarantee delivery order of messages across devices, preserving conversational context.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session recovery on reconnect&lt;/td&gt;
&lt;td&gt;Rehydrate sessions with missed messages after disconnects or page refreshes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td rowspan="4"&gt;A persistent mental model of the AI assistant&lt;/td&gt;
&lt;td&gt;Live session multiplexing&lt;/td&gt;
&lt;td&gt;Allow multiple client connections per user and route interactions through a unified session view.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Realtime delta propagation&lt;/td&gt;
&lt;td&gt;Transmit message edits or UI state updates as granular deltas to all active endpoints.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-device sync&lt;/td&gt;
&lt;td&gt;Mirror updates across devices in real time, including UI elements and message scroll state.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session handoff and presence awareness&lt;/td&gt;
&lt;td&gt;Detect presence state and manage smooth transition of active sessions between devices.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Cross-device AI? You can ship it now.
&lt;/h2&gt;

&lt;p&gt;Seamless cross-device conversations aren't futuristic - they're achievable today. You don't need to rebuild your stack to make it happen.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; provides all you need to support persistent, identity-aware, streaming AI experiences across multiple clients. If you're working on Gen‑2 AI products and want to get this right, we'd love to talk.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>mobile</category>
      <category>ux</category>
    </item>
    <item>
      <title>The new Ably dashboard: realtime visibility in your hands</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Thu, 22 Jan 2026 17:54:35 +0000</pubDate>
      <link>https://forem.com/ablyblog/the-new-ably-dashboard-realtime-visibility-in-your-hands-470a</link>
      <guid>https://forem.com/ablyblog/the-new-ably-dashboard-realtime-visibility-in-your-hands-470a</guid>
      <description>&lt;p&gt;We've rebuilt the Ably dashboard to give developers clear, realtime visibility into how their applications behave.&lt;/p&gt;

&lt;p&gt;This isn't just a cosmetic refresh. It's a shift from a configuration-first dashboard to a live observability surface. One that lets you see channels, connections, messages, and errors as they happen, debug issues instantly, and understand usage without stitching together logs and tooling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why we rebuilt the dashboard
&lt;/h2&gt;

&lt;p&gt;The previous dashboard did a solid job of helping you manage your apps, API keys, and configurations. But it didn't show what was actually happening inside your realtime system. When something broke or behaved unexpectedly, you were left piecing together clues from SDK logs, APIs, and external tools. There wasn't a single place to answer operational questions like who's connected right now, what's happening on a particular channel, or whether a pattern of errors is brand new or recurring.&lt;/p&gt;

&lt;p&gt;The new dashboard brings realtime observability directly into the browser. No setup, no extra tooling, no context switching - just a live window into Ably.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqoi9nei1gbr6vawivb18.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqoi9nei1gbr6vawivb18.png" alt=" " width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's new (and why it matters)
&lt;/h2&gt;

&lt;p&gt;At the heart of this release are four capability upgrades that change how your team operates realtime systems on Ably. Each one is useful on its own; together, they make it far easier to understand behavior in your apps, debug faster, and understand what's driving usage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Channel and connection inspectors
&lt;/h3&gt;

&lt;p&gt;The new inspectors provide realtime visibility into how your system behaves as data flows through Ably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The channel inspector&lt;/strong&gt; shows you who's attached to the channel, which messages are being sent, and what's happening in the presence set. You can also see which rules and integrations are active on that channel. Alongside that live activity, it surfaces realtime metrics like message rates, occupancy, and connection counts, so you can see performance as it changes - not after the fact.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gspiop09ziamtayszsz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gspiop09ziamtayszsz.png" alt=" " width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The connection inspector&lt;/strong&gt; enables you to see who's connected to your app. You can choose a specific connection, see which channels it's attached to, and view live statistics such as the rate of messages being published by that connection. You can also see information such as the geographical location of the connection and the SDK it's using to connect to Ably.&lt;/p&gt;

&lt;p&gt;Combined, the inspectors give you realtime visibility into who's interacting with your app and which channels they're interacting with, so you can debug issues like 'why is this channel still active?' far more easily.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll2ehp05lpfaf31k6npy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll2ehp05lpfaf31k6npy.png" alt=" " width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Log search
&lt;/h3&gt;

&lt;p&gt;Live streams are great for what's happening right now, but many investigations start with a timestamp. Log search lets you query historical platform events so you can trace what happened and understand why, then compare today's traffic with last week's. It's ideal for debugging anomalies and spotting patterns, especially when you're trying to answer whether a problem is new or recurring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2tidfyw3rv03r1hz5lp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2tidfyw3rv03r1hz5lp.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Reports and analytics
&lt;/h3&gt;

&lt;p&gt;The new reports section gives you aggregated visibility into how your apps are used over time - message volumes, connection and channel durations - so you can understand where consumption is coming from and what's driving traffic. This is particularly helpful when needing to explain usage internally, plan scaling work, or map realtime costs back to product features.&lt;/p&gt;

&lt;p&gt;This is the foundation for deeper analytics arriving in future releases, including more granular breakdowns by product, SDK and device, plus finer‑grained views by app, channel, and namespace.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgbj9w0bdkazot79iq0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgbj9w0bdkazot79iq0s.png" alt=" " width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Web CLI
&lt;/h3&gt;

&lt;p&gt;We've also introduced a new web CLI: a browser-based command line that lets you run Ably commands instantly. You can publish and subscribe to messages, enter presence, and manage your app configuration without any local setup. It complements the redesigned dashboard to give you a fast way to interact with Ably from anywhere. The web CLI is invaluable for exploring Ably features without writing any code, and it is especially useful during support calls where you need to quickly reproduce a certain behavior or send a specific set of messages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fci61jl0be932cs8xlmw8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fci61jl0be932cs8xlmw8.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Logs tail and live telemetry
&lt;/h3&gt;

&lt;p&gt;Each inspector includes a live, realtime log stream scoped to the resource you're viewing. If you're inspecting a channel, you see the events relevant to that channel; if you're inspecting a connection, you see the events relevant to that connection. This means you can trace behavior as it happens, correlate spikes in live metrics with specific platform events, and debug instantly rather than collecting evidence after the incident.&lt;/p&gt;

&lt;h3&gt;
  
  
  A more modern, product-first dashboard
&lt;/h3&gt;

&lt;p&gt;Alongside these new capabilities, we've modernized the dashboard itself. Navigation is cleaner and faster, with dedicated sections for each Ably product. The result is a more intuitive experience that helps teams get to the right tools quicker, whether they're debugging, trying out a new product, testing new features, or monitoring live traffic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9t68c8m7k0o6nljn2zbm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9t68c8m7k0o6nljn2zbm.png" alt=" " width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's coming next
&lt;/h2&gt;

&lt;p&gt;This release is a major step toward making the dashboard a full observability layer for Ably. We're working towards live logs: a global realtime stream of platform events across all resources to complement the existing log search functionality - so you can see what &lt;em&gt;is&lt;/em&gt; happening live, and what &lt;em&gt;has&lt;/em&gt; happened previously. We're also continuing to expand the reports section to provide richer visualization of your usage, performance, and reliability, across all your apps.&lt;/p&gt;

&lt;p&gt;In parallel, we're continuing to modernize the remaining areas of the dashboard so that all your resources benefit from enhanced observability and analytics.&lt;/p&gt;

</description>
      <category>dashboard</category>
      <category>observability</category>
      <category>realtime</category>
      <category>devtools</category>
    </item>
    <item>
      <title>The evolution of realtime AI: The transport layer needed for stateful, steerable AI UX</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Wed, 21 Jan 2026 17:16:19 +0000</pubDate>
      <link>https://forem.com/ablyblog/the-evolution-of-realtime-ai-the-transport-layer-needed-for-stateful-steerable-ai-ux-2kl4</link>
      <guid>https://forem.com/ablyblog/the-evolution-of-realtime-ai-the-transport-layer-needed-for-stateful-steerable-ai-ux-2kl4</guid>
      <description>&lt;p&gt;When we launched Ably in 2016, we set out to solve a fundamental problem: delivering reliable, low-latency real-time experiences at scale. So we set out to build a globally distributed system that didn't force developers to choose between latency, integrity, and reliability – trade-offs that had defined the realtime infrastructure space for years.&lt;/p&gt;

&lt;p&gt;Fast forward to today, and we're reaching 2 billion devices monthly, processing 2 trillion operations for customers who demand rock-solid infrastructure for their mission-critical features. But over the past year, as AI has transformed from a backend optimisation tool into a front-and-centre user experience, we've been asking ourselves a critical question: &lt;strong&gt;What's Ably's role in the AI ecosystem?&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  From "nice-to-have" to essential infrastructure
&lt;/h3&gt;

&lt;p&gt;A year ago, if you'd asked us about Ably's AI story, we would have told you that yes, customers were using us. Companies like HubSpot and Intercom were leveraging Ably for token streaming and realtime AI features. But honestly? The value proposition felt incremental. Traditional LLM interactions followed a simple request-response pattern: send a query, stream back tokens, done. HTTP streaming handled this reasonably well, and while Ably offered benefits, there wasn't a smoking gun reason to use us specifically for AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's changed dramatically.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The shift to Gen 2 AI experiences
&lt;/h3&gt;

&lt;p&gt;What we're calling "Gen 2" AI experiences are fundamentally different from what came before. Instead of simply querying a model's training data, today's AI agents reason, search the web, call APIs, interact with tools via MCP (Model Context Protocol), and orchestrate complex multi-step workflows. Just look at how Perplexity searches, or how ChatGPT now breaks down complex requests into observable reasoning steps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3q46rytcsnebh5srj14j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3q46rytcsnebh5srj14j.png" alt=" " width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This shift introduces an entirely new set of challenges:&lt;/p&gt;

&lt;h3&gt;
  
  
  Modern AI UX problems
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Async by default&lt;/strong&gt;: When an AI agent needs 30 seconds or a minute to complete a task (not 3 seconds), user behaviour changes. They switch tabs, check their phone, or start other work. A simple HTTP request suddenly needs to handle disconnections, reconnections, and state recovery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Continuous feedback is mandatory&lt;/strong&gt;: Users need to know what's happening. "Searching the web... Analysing documents... Calling your CRM..." This isn't a nice-to-have anymore. Without feedback, users assume the system has failed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-threading conversations&lt;/strong&gt;: Imagine asking a support agent about your order status. While they're checking, you ask another question. Now you have two parallel operations that need coordination. The agent needs to know what else is in flight and potentially prioritise or sequence responses intelligently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-device continuity&lt;/strong&gt;: Users want to be able to set tasks running and then pick up later from where they left off. They may start a deep research query on their laptop, close it, and then want to check progress on their phone an hour later. The entire conversation state needs to seamlessly transfer.&lt;/p&gt;

&lt;h3&gt;
  
  
  The transport layer modern AI needs
&lt;/h3&gt;

&lt;p&gt;Our vision for addressing these challenges centres on what we're calling the &lt;strong&gt;&lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt;&lt;/strong&gt; – a drop-in solution that handles the complexity of making the AI UX resilient and multi-device, so developers can focus on building great agent experiences, not wrestling with networking problems.&lt;/p&gt;

&lt;p&gt;We focus on everything between your AI agents and end-user devices, leaving orchestration, LLM selection, and business logic where they belong – in your control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: The foundation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At the base level, the AI Transport provides what you'd expect from Ably: &lt;a href="https://ably.com/platform" rel="noopener noreferrer"&gt;&lt;strong&gt;bulletproof reliability&lt;/strong&gt;&lt;/a&gt;, multi-device synchronisation, and automatic resume capabilities. But the real shift is architectural. Instead of your agent responding directly to requests, it returns a conversation ID. Devices subscribe to that conversation, and from that point forward, the agent pushes updates through Ably.&lt;/p&gt;

&lt;p&gt;This simple change unlocks powerful capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Decoupling&lt;/strong&gt;: Agents and devices can disconnect and reconnect independently without losing continuity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bidirectional control&lt;/strong&gt;: Need to stop an agent mid-task or ask a follow-up question? There's a direct communication channel that doesn't require complex routing infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State recovery&lt;/strong&gt;: &lt;a href="https://ably.com/blog/token-streaming-for-ai-ux" rel="noopener noreferrer"&gt;Reconnecting devices don't replay every token&lt;/a&gt;. They get current state and resume from there&lt;/li&gt;
&lt;/ul&gt;
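&lt;p&gt;The architectural shift described above can be sketched in a few lines of toy Python (hypothetical names, not the actual SDK): the agent hands back a conversation ID immediately, pushes updates to it, and any device can attach or re-attach using only that ID.&lt;/p&gt;

```python
import uuid

class ConversationBroker:
    """Toy version of the pattern above: the agent returns a
    conversation id immediately, then pushes updates to it; any
    device can subscribe (or resubscribe) using only that id."""
    def __init__(self):
        self.conversations = {}

    def start(self):
        conv_id = str(uuid.uuid4())
        self.conversations[conv_id] = []
        return conv_id  # handed to the client straight away

    def agent_push(self, conv_id, update):
        self.conversations[conv_id].append(update)

    def subscribe(self, conv_id):
        # A late-joining or reconnecting device gets current state,
        # independent of when the agent produced it.
        return list(self.conversations[conv_id])

broker = ConversationBroker()
conv = broker.start()
broker.agent_push(conv, "Searching the web...")
broker.agent_push(conv, "Found 3 sources")
# A device reconnecting later still sees the full progress:
print(broker.subscribe(conv))
```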

&lt;p&gt;&lt;strong&gt;Layer 2: Richer orchestration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The next layer introduces live shared state on channels. This enables sophisticated coordination:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent presence and status&lt;/strong&gt;: Devices know if agents are active, thinking, or have crashed. Agents can broadcast their current focus ("Analyzing Q4 data...") as state rather than events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent coordination&lt;/strong&gt;: When multiple agents work simultaneously - say, one handling a technical query while another processes a billing question - they can see each other's state and coordinate without stepping on each other's work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context-aware prioritisation&lt;/strong&gt;: Agents can see if a user is actively waiting versus having backgrounded their session, enabling smarter resource allocation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client-side tool calls&lt;/strong&gt;: In co-pilot scenarios, agents can query the client directly about user context ("Is the user currently editing this field?") without roundtripping through backend systems.&lt;/li&gt;
&lt;/ul&gt;
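&lt;p&gt;The "state rather than events" idea above boils down to a last-value register per agent: a device that joins late reads each agent's latest status instead of replaying every intermediate event. A toy Python sketch (hypothetical names, not the actual SDK):&lt;/p&gt;

```python
class AgentStatus:
    """Toy 'state, not events' broadcast: the agent's current focus
    is a last-value register, so a late-joining device reads the
    latest status rather than replaying intermediate events."""
    def __init__(self):
        self.state = {}

    def set(self, agent_id, status):
        self.state[agent_id] = status  # overwrite: only the latest matters

    def snapshot(self):
        return dict(self.state)

status = AgentStatus()
status.set("billing-agent", "idle")
status.set("research-agent", "Analyzing Q4 data...")
status.set("billing-agent", "Fetching invoice details")
# A device attaching now sees one current status per agent:
print(status.snapshot())
```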

&lt;p&gt;&lt;strong&gt;Layer 3: Enterprise-grade observability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because everything flows through Ably, you gain comprehensive visibility into the last mile of your AI experience. Stream observability into your existing systems, integrate with Kafka for audit trails, and leverage enterprise features like SSO and SOC 2 compliance that come standard with Ably's infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F668a91lqa6q7llsxl01x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F668a91lqa6q7llsxl01x.png" alt=" " width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  A new way of building
&lt;/h3&gt;

&lt;p&gt;What excites us most is the idea of providing a stateful conversation layer that removes infrastructure concerns from the developer's plate. Think of it as abstract storage for conversation state, combined with the realtime capabilities developers need for modern AI UX.&lt;/p&gt;

&lt;p&gt;The developers building these experiences don't want to solve networking problems. They want to focus on prompts, orchestration, RAG pipelines, and agent logic. The transport layer shouldn't be where they spend their time – but it will become critical as user expectations evolve to match what ChatGPT, Perplexity, and Claude are demonstrating daily.&lt;/p&gt;

&lt;h3&gt;
  
  
  Framework-agnostic by design
&lt;/h3&gt;

&lt;p&gt;One pattern we've noticed: as engineering teams mature in their AI journey, they tend to move away from monolithic frameworks and build custom orchestration logic. This makes sense – these systems become core to their business differentiation.&lt;/p&gt;

&lt;p&gt;That's why the Ably AI Transport is deliberately framework-agnostic. Yes, we're building drop-in integrations with OpenAI's agent framework, LangChain, LangGraph, Vercel AI SDK, ag-ui and others to make getting started trivially easy. But the architecture doesn't lock you in. Swap out your orchestration layer, change LLM providers, rebuild your agent logic – your transport layer and device communication remain consistent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The road ahead
&lt;/h2&gt;

&lt;p&gt;If you're building AI experiences and wrestling with questions like "How do I handle interruptions?", "What happens when users switch devices mid-conversation?", or "How do I coordinate multiple parallel agent tasks?" – we'd love to talk. We're convinced there's a better way to build these experiences, and it starts with not having to rebuild the realtime infrastructure layer from scratch.&lt;/p&gt;

&lt;p&gt;The plumbing shouldn't be your problem. Building great AI experiences should be.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>realtime</category>
      <category>infrastructure</category>
      <category>transport</category>
    </item>
    <item>
      <title>AI UX: Reliable, resumable token streaming</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Tue, 20 Jan 2026 17:54:26 +0000</pubDate>
      <link>https://forem.com/ablyblog/ai-ux-reliable-resumable-token-streaming-5a63</link>
      <guid>https://forem.com/ablyblog/ai-ux-reliable-resumable-token-streaming-5a63</guid>
      <description>&lt;p&gt;Refresh the page, lose signal, switch tabs - the AI conversation just keeps going. That's what reliable, resumable token streaming makes possible. No restarts, no lost context, just the same response picking up right where it left off. It keeps users in flow and builds trust, making conversations feel seamless. Even better, it unlocks things like switching devices mid-stream without missing a beat. This post explains why users expect it, why it's hard to build, and what your infrastructure needs to make it work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The new expectation for seamless AI interactions
&lt;/h2&gt;

&lt;p&gt;As AI becomes woven into everyday apps, users have rising expectations for seamless interactions. An emerging baseline is that an AI conversation or generated answer should not be fragile. Reliable token streaming that survives crashes or reloads is quickly becoming expected behaviour. People now assume an ongoing AI response will continue uninterrupted despite a temporary failure. If your browser tab crashes or you hit refresh accidentally, you'd expect the AI to keep going in the background and resume when you return - just like a video resumes where you left off.&lt;/p&gt;

&lt;p&gt;This expectation isn't theoretical; it's showing up as a real demand from users and developers. Many have noticed the annoyance of a chatty AI that goes silent after a network blip and forces you to retry the prompt from scratch. Forward-looking teams are already experimenting with ways to make AI streams more resilient. In short, the bar for a smooth AI experience is rising: reliable, resumable streaming is moving from a nice-to-have to a must-have.&lt;/p&gt;

&lt;h2&gt;
  
  
  What users want, and why this enhances the experience
&lt;/h2&gt;

&lt;p&gt;Users want AI conversations that continue uninterrupted despite failures or reloads. They don't want to babysit the AI or repeat themselves due to technical faults. Consider this scenario: You're halfway through a detailed AI response when the page reloads or the network drops. When things come back, you expect the conversation to pick up right where it left off. The same question, same response, no rewind. That's the baseline now: the AI should just handle it. In practice, this means users now expect a few key behaviours:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Streamed responses resume instantly after a reload. The AI's answer picks up exactly where it stopped when you refreshed.&lt;/li&gt;
&lt;li&gt;Incomplete prompts persist across failures. If you submit a question and the app crashes or you go offline, the AI still finishes the answer. You don't lose your query or its partial response.&lt;/li&gt;
&lt;li&gt;Reconnection doesn't trigger full re-generation. Coming back online or reopening the app doesn't make the AI start the answer over from scratch; it continues as if nothing happened.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Under the hood, delivering these behaviours requires that the AI's generation process not be tied to a single fragile connection. Even if the user disconnects, the system must carry on generating tokens so it can seamlessly resume later. In other words, the conversation's state should survive independently of the user's browser tab or device session. This greatly enhances the user experience by ensuring the AI is always "in sync" with the user, no matter the hiccups along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is proving challenging
&lt;/h2&gt;

&lt;p&gt;Building seamless, resumable streaming sounds simple on the surface: keep the tokens flowing, even when something goes wrong. But under the hood, it's anything but. Most web stacks were never designed for this kind of continuity, and the gaps show quickly when you try to implement it. There are a few core failure points that make reliable streaming difficult to get right:&lt;/p&gt;

&lt;h3&gt;
  
  
  Stateless protocols like HTTP drop the stream on failure
&lt;/h3&gt;

&lt;p&gt;Most web interactions (think HTTP requests, REST APIs) are stateless and short-lived. If you're streaming an AI response over a standard HTTP connection and it drops, that request is gone. HTTP has no native concept of resuming a half-finished response. It wasn't designed for long-lived, continuous streams. This makes it fundamentally ill-suited to delivering multi-turn, token-based output with realtime guarantees.&lt;/p&gt;

&lt;h3&gt;
  
  
  Streaming logic often lives only in the browser
&lt;/h3&gt;

&lt;p&gt;Many apps place the responsibility for handling AI output in the client - usually the browser tab. If that tab crashes or is closed, any awareness of the current conversation state disappears. Unless the server is explicitly maintaining progress (e.g. buffering the partial response), the result is a hard reset. Even a minor network blip or page reload can cause the entire generation to be lost, forcing the user to re-issue the prompt and wait again. From the developer's side, this means wasted tokens and potentially double the LLM costs for the same request.&lt;/p&gt;

&lt;h3&gt;
  
  
  Server infrastructure rarely stores stream state by default
&lt;/h3&gt;

&lt;p&gt;Even when &lt;a href="https://ably.com/topic/websockets" rel="noopener noreferrer"&gt;WebSockets&lt;/a&gt; or similar protocols are used, many backends treat streaming as fire-and-forget. Once tokens are emitted, they're not stored. If a client reconnects and asks "what did I miss?", the server has no answer unless a stateful resume mechanism is in place. That means tracking client progress, buffering streamed output, handling retries, and ensuring correct ordering. None of which are trivial to bolt on after the fact. Building this kind of infrastructure requires careful design, and is one reason robust streaming support remains rare despite user demand.&lt;/p&gt;

&lt;h3&gt;
  
  
  These three limitations add up to a brittle default behaviour
&lt;/h3&gt;

&lt;p&gt;A typical implementation ends up tying AI generation directly to a single client connection. If that connection drops due to a refresh, crash, or network issue, the stream is lost. Undelivered tokens vanish, and the user is left with an incomplete response and no way to recover. They have to restart the prompt and wait all over again. The result is a jarring, inefficient experience that breaks user trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why you need a drop-in AI transport layer
&lt;/h2&gt;

&lt;p&gt;Making streaming resilient isn't something you can bolt on later. It needs to be handled at the transport layer, by infrastructure that treats continuity as the default. This isn't one feature, it's a set of behaviours working together to keep streams intact even when clients disconnect, reload, or crash. At a minimum, the transport should handle:&lt;/p&gt;

&lt;h3&gt;
  
  
  Persistent streaming connections
&lt;/h3&gt;

&lt;p&gt;Instead of one-request-per-response, the client should use a persistent connection (e.g. WebSocket) that stays open for streaming. A persistent channel enables realtime, bi-directional delivery and helps guarantee in-order arrival of tokens.&lt;/p&gt;

&lt;p&gt;Unlike stateless HTTP calls, a WebSocket connection can continuously push data to the client without needing to re-establish a new HTTP request for each chunk. This drastically reduces the chance of interruption. And if the connection does break, the protocol can attempt automatic reconnection quickly, avoiding the overhead of starting a whole new session from scratch. In short, a long-lived connection is the foundation for uninterrupted token streams.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc0vjfyrjxoc7vkw38mfd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc0vjfyrjxoc7vkw38mfd.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reliable, resumable token streaming: Your AI keeps streaming even after reloads, tab crashes, or network drops. No restarts. No lost context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Server-side output buffering and replay
&lt;/h3&gt;

&lt;p&gt;A resilient transport layer buffers the AI's output on the server side as it's being generated. Every token or chunk is stored (in memory or a fast store) at least until it's safely delivered. Why? Because if the client disconnects momentarily, those tokens must still be available to send later. &lt;a href="https://ably.com/platform" rel="noopener noreferrer"&gt;Ably's messaging platform&lt;/a&gt;, for instance, persists messages while the client is reconnecting, guaranteeing no data is lost in transit. This buffered backlog enables catch-up: when a client reconnects, the transport can replay any missed tokens from the buffer before returning to live streaming.&lt;/p&gt;

&lt;p&gt;The user doesn't see gaps in the text, because the system fills in everything that was generated while they were away. Without server-side buffering, there's simply no way to recover what was produced during a disconnect.&lt;/p&gt;
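&lt;p&gt;The buffer-and-catch-up idea fits in a few lines. In this illustrative sketch (class and method names are made up, not Ably's API), each chunk gets a monotonically increasing serial, and a reconnecting client asks for everything after the last serial it saw:&lt;/p&gt;

```javascript
// Illustrative server-side buffer: every generated chunk is kept,
// tagged with a serial, at least until delivery is confirmed.
class StreamBuffer {
  constructor() { this.chunks = []; this.nextSerial = 1; }
  push(text) {
    this.chunks.push({ serial: this.nextSerial++, text });
  }
  // On reconnect, replay only what was produced after `lastSeenSerial`.
  replayAfter(lastSeenSerial) {
    return this.chunks.filter(c => c.serial > lastSeenSerial);
  }
}

const buf = new StreamBuffer();
['The ', 'quick ', 'brown ', 'fox'].forEach(t => buf.push(t));

// Client disconnected after receiving serial 2, then reconnects:
const missed = buf.replayAfter(2);
console.log(missed.map(c => c.text).join('')); // "brown fox"
```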

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6hxyr6ms853f8bmehah.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6hxyr6ms853f8bmehah.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Rejoin &amp;amp; instantly hydrate state: When someone comes back, they instantly see the live state of the conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Session tracking across client restarts
&lt;/h3&gt;

&lt;p&gt;To resume a stream, the system needs to know who's reconnecting and where to pick things up. That means tracking session state across connections, typically via a stable session or conversation ID that stays the same even if the page reloads or the device changes.&lt;/p&gt;

&lt;p&gt;When the client reconnects, it should tell the server what it last received (for example, "I got up to token 123"). The server then uses that information to send only the tokens the client missed. This handshake (where the client shares its last-seen message ID) is what lets the stream continue cleanly, without starting over.&lt;/p&gt;

&lt;p&gt;Platforms like Ably support this by using resume tokens or last-event IDs. The client includes that token on reconnect, and &lt;a href="https://ably.com/docs/platform/architecture/connection-recovery#:~:text=When%20network%20connectivity%20is%20reestablished%2C,missed%20during%20the%20disconnection%20period" rel="noopener noreferrer"&gt;the stream resumes from exactly the right point&lt;/a&gt;. As a result, even after a crash, refresh, or switching devices, the AI response carries on with no manual sync needed.&lt;/p&gt;
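&lt;p&gt;The client half of that handshake is small. Here's an illustrative sketch (names are hypothetical): the client records the last serial it processed and presents it when reattaching. In a real app you'd persist &lt;code&gt;lastSerial&lt;/code&gt; somewhere that survives a reload, such as local storage.&lt;/p&gt;

```javascript
// Illustrative client-side resume logic: remember the last serial
// processed, and present it when (re)attaching to the stream.
class ResumableClient {
  constructor() { this.lastSerial = 0; this.text = ''; }
  receive(chunk) {
    this.text += chunk.text;
    this.lastSerial = chunk.serial; // persist this to survive reloads
  }
  // The handshake payload sent on reconnect ("I got up to token N").
  resumeRequest() {
    return { lastSeenSerial: this.lastSerial };
  }
}

const client = new ResumableClient();
client.receive({ serial: 1, text: 'Hel' });
client.receive({ serial: 2, text: 'lo' });

// After a refresh, the client asks the server for serials > 2:
console.log(client.resumeRequest()); // { lastSeenSerial: 2 }
```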

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4zgnvztryubbe101j3o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4zgnvztryubbe101j3o.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Multi-device &amp;amp; multi-tab continuity: Start a chat on your laptop, continue on your phone, open three tabs: it all stays in perfect sync.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ordered delivery guarantees and reconnection state
&lt;/h3&gt;

&lt;p&gt;Maintaining the correct order of tokens is critical. We can't have jumbled or duplicated text when a stream resumes. The transport layer must guarantee that messages (tokens) are &lt;a href="https://ably.com/blog/data-integrity-in-ably-pub-sub" rel="noopener noreferrer"&gt;delivered in order, exactly once.&lt;/a&gt; This involves assigning each chunk a unique sequence identifier and ensuring the client never processes the same chunk twice, even in the face of retries.&lt;/p&gt;

&lt;p&gt;Upon reconnection, the system should replay missed tokens in the original order, with none missing and none repeated. Achieving this reliably often means the server needs to temporarily queue messages and only release them in sequence. Ably's approach, for example, replays any messages the client missed during a disconnection and ensures no gaps or duplicates in the data. In practice, that means when your AI resumes its answer after a drop, the user sees a continuous, correctly ordered completion as if no disconnect ever happened.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufk5w03d678djij3349a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufk5w03d678djij3349a.png" alt=" " width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Exactly-once, in-order delivery: Every message, token, event, or state update arrives, arrives only once, and arrives in the correct sequence.&lt;/p&gt;
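&lt;p&gt;A receiver can enforce both properties locally from the sequence identifiers alone. This illustrative sketch (not Ably's implementation) drops duplicates and holds back chunks that arrive early, applying them only once the gap is filled:&lt;/p&gt;

```javascript
// Illustrative exactly-once, in-order receiver: chunks carry serials;
// duplicates are dropped and out-of-order chunks are held back.
class OrderedReceiver {
  constructor() { this.expected = 1; this.pending = new Map(); this.text = ''; }
  receive(chunk) {
    if (chunk.serial < this.expected) return;   // already applied: ignore
    this.pending.set(chunk.serial, chunk.text); // stash (possibly early)
    while (this.pending.has(this.expected)) {   // apply in sequence
      this.text += this.pending.get(this.expected);
      this.pending.delete(this.expected++);
    }
  }
}

const rx = new OrderedReceiver();
rx.receive({ serial: 1, text: 'A' });
rx.receive({ serial: 3, text: 'C' }); // early: held back
rx.receive({ serial: 2, text: 'B' }); // fills the gap, releases serial 3
rx.receive({ serial: 2, text: 'B' }); // retried duplicate: ignored
console.log(rx.text); // "ABC"
```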

&lt;p&gt;All of this points to a deeper architectural issue. To make streaming reliable, you need to rethink where state lives and how streams are managed across connections:&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting it all together
&lt;/h2&gt;

&lt;p&gt;A drop-in transport layer for AI needs to manage these concerns transparently. It keeps a persistent pipe open, buffers the token stream, tracks session offsets, and enforces ordering and exactly-once semantics. For developers, this means you don't have to build custom state management for every AI session – the transport layer provides the assurance that "your AI will keep streaming, no matter what." Essentially, it's infrastructure that transforms the unreliable web into a dependable conduit for AI data.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the experience maps to the transport layer
&lt;/h2&gt;

&lt;p&gt;To better illustrate, here's how specific user expectations translate into transport-layer requirements and technical implementations:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;User experience desired&lt;/th&gt;
&lt;th&gt;Required transport layer feature&lt;/th&gt;
&lt;th&gt;Underlying technical implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Answer continues after page refresh: The AI's response resumes exactly where it left off when the user reloads the page, with no repetition.&lt;/td&gt;
&lt;td&gt;Stream resumption on reconnect&lt;/td&gt;
&lt;td&gt;Connection recovery using a resume token or last message ID (e.g. sending an SSE Last-Event-ID or a WebSocket reconnection handshake) so the server knows what data to replay.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No lost content on brief disconnection: A short network drop doesn't cause missing chunks of the answer. The user never sees a gap in the generated text.&lt;/td&gt;
&lt;td&gt;Server-side message backlog for catch-up&lt;/td&gt;
&lt;td&gt;The transport layer buffers outgoing tokens in a queue or stream. On reconnection, it delivers any tokens that were generated while the client was offline, before resuming live streaming.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No restart of AI generation: The AI doesn't reset or start a new answer when the user comes back. It continues the same completion that was in progress.&lt;/td&gt;
&lt;td&gt;Decoupling of generation from client connection&lt;/td&gt;
&lt;td&gt;The generation runs in its own process or service (e.g. an API worker or background job) that isn't directly tethered to the client's connection. The client connects to a stream of results, but the generation logic doesn't depend on that connection being alive.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No duplicate or jumbled text after reconnection: The user doesn't see the AI repeat itself or skip ahead when recovering from a drop.&lt;/td&gt;
&lt;td&gt;Ordered, exactly-once delivery&lt;/td&gt;
&lt;td&gt;Each token is tagged with a unique sequence identifier. On reconnect, the server uses these IDs to send only the missing tokens in order. Mechanisms like Ably's unique message IDs and sequenced delivery ensure continuity with no overlaps.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt state preserved: If the user submits a prompt and the app reloads, they don't need to re-enter it; the AI still responds.&lt;/td&gt;
&lt;td&gt;Session context persistence&lt;/td&gt;
&lt;td&gt;The prompt and conversation state are tied to a session ID stored server-side. The transport layer (or application logic) ensures that the pending prompt is still processed and its output is stored, even if the client isn't connected. When the user reconnects with the same session ID, the response is delivered as normal.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Delivering reliable, resumable streaming today
&lt;/h2&gt;

&lt;p&gt;Reliable, resumable token streaming isn't theoretical anymore. You can ship it now. You don't need to redesign your whole architecture or stitch together a fragile set of custom reconnection hacks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;Ably AI Transport&lt;/a&gt; gives you the infrastructure required to keep AI responses flowing, even when the client drops. Persistent connections, ordered replay, automatic resume, and delivery guarantees come as part of the platform. Your generation process keeps running, and users see the response continue exactly where it left off when they return.&lt;/p&gt;

&lt;p&gt;If you're building Gen-2 AI experiences and want streaming that survives reloads, outages, and device switches, we'd be happy to help.&lt;/p&gt;

&lt;p&gt;Reach out for &lt;a href="https://share.hsforms.com/2vFBJVWNOQzKfBXkaRLGawg44qpp" rel="noopener noreferrer"&gt;early access&lt;/a&gt; or to learn how we can support reliable, resumable streaming in your AI products.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ux</category>
      <category>streaming</category>
      <category>websockets</category>
    </item>
    <item>
      <title>Ably AI Transport is now available</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Tue, 20 Jan 2026 17:49:29 +0000</pubDate>
      <link>https://forem.com/ably/ably-ai-transport-is-now-available-482p</link>
      <guid>https://forem.com/ably/ably-ai-transport-is-now-available-482p</guid>
      <description>&lt;p&gt;Today we’re launching &lt;a href="https://ably.com/ai-transport" rel="noopener noreferrer"&gt;&lt;strong&gt;Ably AI Transport&lt;/strong&gt;&lt;/a&gt;: a drop-in realtime delivery and session layer that sits between agents and devices, so AI experiences stay continuous across refreshes, reconnects, and device switches — without an architecture rewrite.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gap: HTTP streaming breaks down for stateful AI UX
&lt;/h2&gt;

&lt;p&gt;AI has moved from “type and wait” requests to experiences that are long-running and stateful: responses stream, users steer mid-flight, and work needs to carry across tabs and devices. That shift changes what “working” means in production. It’s not just whether the model can generate tokens, it’s whether the experience stays continuous when real users behave like real users do.&lt;/p&gt;

&lt;p&gt;Most AI apps still start with a connection-oriented setup: the client opens a streaming connection (SSE, fetch streaming, sometimes WebSockets), the agent generates tokens, and the UI renders them as they arrive. It’s low friction and demos well.&lt;/p&gt;

&lt;p&gt;But HTTP streaming really solves only the first part of the problem, and it’s not a good place to end.&lt;/p&gt;

&lt;p&gt;First: &lt;strong&gt;continuity&lt;/strong&gt;. When output is tied to a specific connection, the experience becomes fragile by default. Refreshes, network changes, backgrounding, multiple tabs, device switches, agent handovers (even agent crashes) are normal behaviour. And they’re exactly where teams see partial output, missing tokens, duplicated messages, drifting state, and “start again” recovery paths. That’s where user trust gets lost.&lt;/p&gt;

&lt;p&gt;Second: &lt;strong&gt;capability&lt;/strong&gt;. A connection-first transport layer doesn’t just make UX fragile. It limits what you can build. Once you want true collaborative patterns like barge-in, live steering, copilot-style bidirectional exchange, multi-agent coordination, or a seamless human takeover with full context, you need more than “a stream.” You need a stateful conversation layer that can support multiple participants, resumable delivery, and shared session state.&lt;/p&gt;

&lt;p&gt;So teams patch it: buffering, replay, offsets, reconnection logic, session IDs, routing rules for interrupts and tool results, multi-subscriber consistency, and observability once production incidents start. It’s critical work — but it’s not differentiation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Ably AI Transport does
&lt;/h2&gt;

&lt;p&gt;AI Transport gives each AI conversation a durable bi-directional session that isn’t tied to one tab, connection or agent. Agents publish output into a session channel, clients subscribe from any device, and Ably handles the delivery guarantees you’d otherwise rebuild yourself: ordered delivery, recovery after reconnects, and fan-out to multiple subscribers.&lt;/p&gt;

&lt;p&gt;It’s deliberately model and framework-agnostic. You keep your agent runtime and orchestration. AI Transport handles the delivery and session layer underneath.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqaf20vuuz65xgvr49g7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqaf20vuuz65xgvr49g7.png" alt="AI Transport examples grid" width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The key shift: sessions become channels
&lt;/h2&gt;

&lt;p&gt;In a connection-oriented setup, the “session” effectively lives inside the streaming pipe. When the pipe breaks, continuity becomes a headache.&lt;/p&gt;

&lt;p&gt;With AI Transport, the session is created once and represented as a durable channel. Agents and clients can join independently. Refresh becomes reattach and hydrate. Device switching becomes another subscriber joining the same session. Multi-device behaviour becomes fan-out rather than custom routing. Agents and humans become truly connected over a transport designed for AI bi-directional low latency conversations.&lt;/p&gt;
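&lt;p&gt;The "reattach and hydrate" idea can be sketched with an in-memory stand-in for a durable channel (illustrative only; Ably's real channels add persistence, ordering, and delivery guarantees on top). The channel outlives any single subscriber, so a refreshed tab or a second device just attaches, replays history, and then receives live messages:&lt;/p&gt;

```javascript
// Illustrative in-memory analogue of "session as a durable channel".
class SessionChannel {
  constructor() { this.history = []; this.subscribers = new Set(); }
  publish(msg) {
    this.history.push(msg);                // durable record of the session
    this.subscribers.forEach(fn => fn(msg)); // fan-out to live subscribers
  }
  attach(onMessage) {
    this.history.forEach(onMessage); // hydrate: replay everything so far
    this.subscribers.add(onMessage); // then go live
  }
}

const session = new SessionChannel();
session.publish('token-1');
session.publish('token-2');

// A tab refresh is just a new attach: it sees the full history.
const seen = [];
session.attach(m => seen.push(m));
session.publish('token-3');
console.log(seen); // ['token-1', 'token-2', 'token-3']
```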

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkryrw054d5r2s8iie20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkryrw054d5r2s8iie20.png" alt="Before and after: HTTP streaming vs AI Transport" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How Ably AI Transport ensures a resilient, stateful AI UX
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Resumable, ordered token streaming:&lt;/strong&gt; A great AI UX depends on durable streaming. Output is treated as session data, so clients can catch up cleanly after refreshes, brief dropouts, and network handoffs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-device continuity:&lt;/strong&gt; Conversations are user-scoped, not tab-scoped. Multiple clients can join the same session without split threads, duplication, or drifting state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live steering and interruption:&lt;/strong&gt; Modern AI UX needs control, not just output. Interrupts, redirects, and approvals route through the same bi-directional session fabric as the response stream, so steering works even across reconnects and devices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presence-aware sessions:&lt;/strong&gt; Once agents do real work, wasted compute becomes a serious cost problem. Presence provides a reliable signal for whether the user is currently connected (or fully offline across devices), so you can throttle, defer, or resume work accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents that collaborate and act with awareness:&lt;/strong&gt; As soon as you have more than one agent (or an agent plus tools/workers), coordination becomes the product. Shared session state and routing prevent clashing replies, duplicated context, and “two brains answering at once,” so multiple agents can communicate directly with users coherently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Seamless human takeover when it really matters:&lt;/strong&gt; When an agent hits a boundary (risk, uncertainty, or policy) a human should be able to step in with full context and continue the session immediately. The handoff keeps the same session history and controls, so there’s no repeated questions, no “start again,” and no losing track of what happened mid-flight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Identity and access control:&lt;/strong&gt; Beyond toy demos, you need to know who can read, write, steer, or approve actions. Verified identity plus fine-grained permissions let multi-party sessions stay secure without inventing a bespoke access model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability and governance:&lt;/strong&gt; When AI UX breaks in production, it’s rarely obvious where. Built-in visibility into session delivery and continuity makes failures diagnosable and auditable instead of “black box streaming incidents.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqaf20vuuz65xgvr49g7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqaf20vuuz65xgvr49g7.png" alt="AI Transport capabilities" width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Concrete examples
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Multi-device copilots:&lt;/strong&gt; A user starts a long-running answer on desktop, switches to mobile mid-response, and the session continues without restarting. Steering and approvals apply to the same session regardless of device.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cr0ci7i6jmhpiacn57r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cr0ci7i6jmhpiacn57r.png" alt="Multi-device copilots architecture" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-running agents:&lt;/strong&gt; A research agent runs multi-step tool work for minutes. If the user disconnects, the work continues; when the user returns, the client hydrates from session history instead of resetting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8dar8zk01rlw93zv3xk7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8dar8zk01rlw93zv3xk7.png" alt="Long-running agents architecture" width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started (low friction)
&lt;/h2&gt;

&lt;p&gt;You can get a basic session running in minutes:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
import Ably from 'ably';

// Initialize Ably Realtime client
const realtime = new Ably.Realtime({ key: 'API_KEY' });

// Create a channel for publishing streamed AI responses
const channel = realtime.channels.get('my-channel');

// Publish initial message and capture the serial for appending tokens
const { serials: [msgSerial] } = await channel.publish('response', { data: '' });

// Example: stream returns events like { type: 'token', text: 'Hello' }
for await (const event of stream) {
  // Append each token as it arrives
  if (event.type === 'token') {
    channel.appendMessage(msgSerial, event.text);
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>realtime</category>
      <category>devtools</category>
      <category>architecture</category>
    </item>
    <item>
      <title>AWS us-east-1 outage: How Ably’s multi-region architecture held up</title>
      <dc:creator>Ably Blog</dc:creator>
      <pubDate>Fri, 24 Oct 2025 12:24:17 +0000</pubDate>
      <link>https://forem.com/ably/aws-us-east-1-outage-how-ablys-multi-region-architecture-held-up-15mk</link>
      <guid>https://forem.com/ably/aws-us-east-1-outage-how-ablys-multi-region-architecture-held-up-15mk</guid>
      <description>&lt;h2&gt;
  
  
  Resilience in action: zero service disruption
&lt;/h2&gt;

&lt;p&gt;During this week’s AWS us-east-1 outage, &lt;a href="https://ably.com/" rel="noopener noreferrer"&gt;Ably&lt;/a&gt; maintained full service continuity with no customer impact. This was our multi-region architecture working exactly as designed; error rates were negligibly low and unchanged throughout. Any additional round trip latency was limited to 12ms, which is below the typical variance in any client-to-endpoint connection, and well below our 40–50ms global median; this is imperceptible to users and below monitoring thresholds. There were no user reports of issues. Taken together this means there was zero service disruption.&lt;/p&gt;

&lt;h2&gt;
  
  
  The technical sequence
&lt;/h2&gt;

&lt;p&gt;Ably provides a globally-distributed system hosted on AWS with services provisioned in multiple regions. Each region scales independently in response to the level of traffic it serves, and us-east-1 is normally the busiest region.&lt;/p&gt;

&lt;p&gt;From the onset of the AWS incident what we saw was that the infrastructure already existing in that region continued to provide error-free service. However, issues with various ancillary AWS services meant that our control plane in the region was disrupted, and it was clear that we would not be able to add capacity in the region as traffic levels increased during the day.&lt;/p&gt;

&lt;p&gt;As a result, at around 1200 UTC we made DNS changes so that new connections were not routed to us-east-1; traffic that would ordinarily have been routed there (based on latency) was instead handled in us-east-2. This is a routine intervention that we make in response to disruption in a region. Pre-existing connections in us-east-1 remained untouched, continuing to serve traffic without errors and with normal latency throughout the incident. Our monitoring systems, via connections established before the failover, confirmed this directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Latency impact: negligible
&lt;/h2&gt;

&lt;p&gt;We continuously test real-world performance in multiple ways. Monitors operated by Ably, in proximity to regional datacenter endpoints, indicated that the worst-case impact on latency - for clients directly adjacent to the us-east-1 datacenter that then had to connect to us-east-2 - was 12ms at p50. We also have real browser &lt;a href="https://ably.com/docs/platform/architecture/latency#round-trip-latency-measurement" rel="noopener noreferrer"&gt;round-trip latency measurements&lt;/a&gt; using Uptrends, which more closely simulate real users, with actual browser instances publishing and receiving messages between various global monitoring locations.&lt;/p&gt;

&lt;p&gt;These measurements taken during the incident are shown below; real-world clients experienced even lower latency impact, since from each of the cities tested, there is negligible difference in distance, and latency, between that location and us-east-2 versus us-east-1. Taken across all US cities that are monitoring locations, the measured latency difference averaged 3ms. That actual difference is substantially lower than normal variance in client connection latencies, and is therefore imperceptible to users and well below monitoring thresholds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvptk7f2a8ecv4w4nopag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvptk7f2a8ecv4w4nopag.png" alt="Ably publish error rates for us-east-1" width="800" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvwoyevnqqwx9o2b37o8p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvwoyevnqqwx9o2b37o8p.png" alt="Ably US browser latencies" width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We restored us-east-1 routing on 21 October following validation from AWS and our own internal testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture at work
&lt;/h2&gt;

&lt;p&gt;This incident validated our multi-region architecture in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each region operates independently, isolating failures&lt;/li&gt;
&lt;li&gt;Latency-based DNS adapts routing to regional availability&lt;/li&gt;
&lt;li&gt;Existing persistent connections are unaffected if the only change is to the routing of new connections&lt;/li&gt;
&lt;li&gt;A further layer of defense, not used in this case, provides automatic client-side failover to up to five globally-distributed endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That final layer matters. Even if us-east-1 infrastructure had failed entirely (it didn’t), client SDKs would have automatically failed over to alternative regions, maintaining connectivity at the cost of increased latency. It didn’t activate this time, since regional operations continued normally, but it’s a core part of our defense-in-depth strategy.&lt;/p&gt;
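&lt;p&gt;As an illustration of that last layer, here's a deliberately simplified, synchronous sketch of client-side endpoint failover (the endpoint names are made up, and real SDK connection logic is asynchronous with backoff and health checks): try the preferred endpoint first, then walk an ordered list of fallbacks until one connects.&lt;/p&gt;

```javascript
// Illustrative client-side failover across an ordered endpoint list.
function connectWithFailover(endpoints, tryConnect) {
  const errors = [];
  for (const endpoint of endpoints) {
    try {
      return { endpoint, conn: tryConnect(endpoint) }; // first success wins
    } catch (err) {
      errors.push(endpoint + ': ' + err.message);      // record and move on
    }
  }
  throw new Error('all endpoints failed: ' + errors.join('; '));
}

// Simulated outage: the first two regions are unreachable.
const down = new Set(['primary.example', 'fallback-a.example']);
const tryConnect = (ep) => {
  if (down.has(ep)) throw new Error('unreachable');
  return { ok: true };
};

const result = connectWithFailover(
  ['primary.example', 'fallback-a.example', 'fallback-b.example'],
  tryConnect
);
console.log(result.endpoint); // "fallback-b.example"
```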

&lt;h2&gt;
  
  
  Lessons reinforced
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The key takeaways for us from this incident:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A genuinely distributed system spanning multiple regions, not just availability zones, is essential for ultimate continuity of service&lt;/li&gt;
&lt;li&gt;Planning for, and drilling, responses to this type of event is critical to ensuring that your resilience is real and not just theoretical&lt;/li&gt;
&lt;li&gt;A multi-layered approach, with mitigations both in the infrastructure and SDKs, ensures redundancy and continuity even without active intervention. AWS continues to be an outstandingly good global service, but occasional regional failures must be expected. Well-architected systems on AWS infrastructure are capable of supporting the most critical business needs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep your realtime apps running smoothly, even when the internet breaks. Try &lt;a href="https://ably.com/" rel="noopener noreferrer"&gt;Ably&lt;/a&gt; for free today!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>uptime</category>
      <category>outage</category>
    </item>
  </channel>
</rss>
