Forem: Ably Blog

LiveObjects now available: shared state without the infrastructure overhead

Ably Blog — Tue, 05 May 2026 08:48:23 +0000

Shared state is a hard problem. Not hard in the abstract, computer-science sense (the concepts are well understood). Hard in the someone has to actually build this sense, where every team that wants a live leaderboard, a shared config panel, or a poll that updates in real time ends up reinventing the same wheels: conflict resolution, reconnection handling, state recovery.

Most teams do not want to spend their time building and maintaining that layer. They want to ship the feature that depends on it.

That is what LiveObjects is for.

From experimental to production-ready

When we first shipped LiveObjects, the API was explicitly experimental. We had the primitives (LiveMap for synchronized key-value state, LiveCounter for distributed counting) but the ergonomics needed work. Early adopters were clear: working directly with object instances felt brittle, especially when objects were replaced. Subscriptions broke. Navigating nested structures was cumbersome. The mental model didn't fit how people actually wanted to build.

So we rebuilt the API from the ground up. The result shipped in the JavaScript SDK before the end of last year, moving LiveObjects into Public Preview. It's centered on path-based operations. Instead of binding to specific object instances, you work with PathObjects that resolve at runtime against whatever exists at that location. Replace the object underneath, and your subscriptions follow automatically.

That feedback loop, from experimental signal to a redesigned API, is what today's release reflects. The API is stable and ready for production.

What changed?

A new API designed around how you think about state

The old approach required holding references to specific object instances, which meant reasoning about object identity rather than the shape of your data. The new PathObject API flips this: you describe a path, and the SDK handles the rest.

import * as Ably from 'ably';
import { LiveObjects, LiveMap, LiveCounter } from 'ably/liveobjects';

const client = new Ably.Realtime({ 
  key: 'your-api-key',
  plugins: { LiveObjects }
});

const channel = client.channels.get('game:room-42', {
  modes: ['OBJECT_SUBSCRIBE', 'OBJECT_PUBLISH']
});

const root = await channel.object.get();

await root.batch((ctx) => {
  ctx.set('leaderboard', LiveMap.create({
    alice: LiveCounter.create(0),
    bob: LiveCounter.create(0),
  }));
  ctx.set('round', LiveCounter.create(1));
});

root.get('leaderboard').subscribe(({ object }) => {
  renderLeaderboard(object.compact());
});

await root.get('leaderboard').get('alice').increment(10);

Subscriptions now observe paths rather than instances. If the underlying object is replaced, your subscription keeps working. No rewiring.

Object resets

One of the most requested features: reset an object to a clean state without tearing down and recreating the channel. Previously, the workaround was destroying the channel entirely, which forced clients to reconnect, reattach subscribers, re-establish presence, and race the teardown. Object resets remove all of that. Useful for a new game round, a cleared poll, or a reset config without losing connection state.

Reliable data expiry

State now expires reliably after 90 days by default. Previously this was best-effort. If you're building anything with ephemeral sessions or time-bounded content, you can depend on this rather than writing your own cleanup logic.

Revised object limits

The 100-object-per-channel limit has been revised to apply sensibly to top-level objects. Applications with nested structures can model data naturally without counting objects or designing workarounds to stay under the limit.

Easier map handling

.compact() and .compactJson() convert any LiveMap tree to a plain JavaScript object in one call, useful for rendering, serialization, or passing state to code that doesn't know about LiveObjects.

const state = root.get('leaderboard').compact();
// { alice: 120, bob: 95 }

What you can build

Live polls, leaderboards, collaborative forms, shared dashboards: any feature where multiple clients write to the same state and see each other's changes immediately. LiveObjects handles these well. But the use case we're focused on most right now is AI.

State sync for AI sessions

Agents need to share context. Not respond to a single request and forget it, but maintain a live picture of what's happening: what the user is working on, what tasks are in progress, what the session looks like.

The naive approach is polling or rebuilding session context on every request. That works until it doesn't. Agents diverge, state drifts, and the coordination layer becomes the thing your team maintains instead of the product.

LiveObjects is a cleaner mechanism. Multiple clients and agents read and write shared state simultaneously, conflicts are resolved automatically, and every subscriber sees updates the moment they land.

await root.get('session').batch((ctx) => {
  ctx.set('current_task', 'Summarizing document');
  ctx.set('progress', LiveCounter.create(0));
  ctx.set('context', LiveMap.create({
    page_title: 'Q3 Report',
    selected_text: 'Revenue grew 24% YoY...'
  }));
});

root.get('session').subscribe(({ object }) => {
  updateAgentStatusPanel(object.compact());
});

If you're building AI applications and using Ably for token streaming, LiveObjects handles the state layer: what the model is working with, what it's doing, and what users can steer in real time.

Multiple SDKs, production-ready

LiveObjects is available in JavaScript today, with Swift and Java coming in the weeks that follow. Other SDKs are available now via inband objects and the REST API for platforms without a native client yet.

We're making a stability commitment for each SDK when it reaches the bar, not flipping a global flag while only one runtime is actually ready.

Get started

The LiveObjects plugin ships as part of the standard SDK.

npm install ably

The LiveObjects docs have quick-start guides and a migration guide if you're upgrading from the experimental API. If you're building AI applications, the AI Transport docs cover how LiveObjects fits into the state sync layer.

Multi-device AI session continuity: how cross-device conversation sync works

Ably Blog — Tue, 05 May 2026 08:29:44 +0000

Written by Amber Dawson

You start a research task on your laptop, the network drops during a meeting, and when you open your phone to continue, the conversation is gone – you re-prompt, get partial duplicate results, and lose 30 minutes of work. The delivery layer dropped it. That's one of the most consistent problems teams hit when building AI applications.

It's particularly acute in customer support, where a session belongs to the conversation - not to any single device, connection, or participant. An AI agent handles a query, the user switches from desktop to mobile mid-interaction, a human needs to step in. Every one of those transitions is a point where the session can silently break.

Why this breaks

HTTP streaming is stateless. Each connection is independent, tied to a specific device and browser session, so when the user switches devices, refreshes, or loses connectivity, the new device has no position in the stream. It doesn't know which tokens the previous device received, it can't resume mid-response, and it starts over.

There's no shared state across connections. Device B has no visibility into what Device A received, and without session tracking built into the architecture, the server treats each connection as a new actor. A stateless delivery layer wasn't designed for conversations that span sessions, devices, or time.

What breaks in production

Teams building multi-device AI experiences without dedicated infrastructure hit the same set of edge cases.

Lost responses. The model finished generating while the user was offline or mid-switch. Nobody saw the output. The compute was wasted.

Duplicate effort. The user doesn't know if the previous session completed, so they re-prompt. You pay for the same response twice.

State conflicts. A new prompt arrives on the phone while the laptop tab still shows an incomplete response. Which version is canonical? The server doesn't know.

Mobile-specific failures. iOS and Android background apps aggressively drop connections. WiFi-to-cellular handoffs are frequent. A conversation that works fine on desktop will fall apart on mobile without explicit reconnection and resume handling.

These failures don't show up in demos. They appear in production, under real network conditions, with real users – and they erode trust quickly because AI conversations often carry context the user spent time building.

What most teams build first

The standard workaround is a Redis buffer between the AI backend and the client. It handles full page reloads reasonably well. It doesn't handle tab switches. It breaks on mobile backgrounding. And it has no path for multi-device delivery – the session state is scoped to one client, not to the user.

Every serious production team discovers this wall independently and ends up engineering some version of the same architecture. Vercel's own lead maintainer acknowledged the gap directly: "to solve this we would need to have a channel to the server that allows transporting that information. WebSockets are one option." That's the right diagnosis. The Redis buffer is an approximation of the real fix.

The architectural shift: state lives in the channel, not the connection

The underlying problem is that session state is coupled to the connection. The fix is decoupling them.

Instead of streaming directly over an HTTP connection, the server publishes messages to a channel. Any device subscribing to that channel receives the same messages. The state is in the channel. The connection is the transport, nothing more.

This is the foundation of what's increasingly called a durable session – a persistent, addressable session between agents and users that outlives any single connection, device, or participant. Durable execution makes the backend crash-proof; durable sessions makes the experience crash-proof. They sit on opposite sides of the agent and complement each other.

In practice this changes the behavior fundamentally. Any device can join – same browser tab, phone, or tablet. Subscribing to the channel gives that device access to the conversation. Reconnection becomes catch-up rather than restart: channels persist message history, and when a device reconnects, it replays what it missed and transitions to live delivery. From the user's perspective, they pick up where they left off.

Conflicts route through the server. User actions – sending prompts, interrupting, deleting messages – go to the server, which publishes the authoritative result to the channel. All devices receive the same update. There's no client-side state to reconcile.

What the transport layer has to handle

Identity-aware fan-out. The system needs to recognize all active sessions associated with a single user and propagate updates across all of them. When a user sends a message on one device, every other active device should reflect the change immediately. This requires mapping user identity to active connections at the infrastructure level, not the application layer.

Ordering and session recovery. If the connection drops – from a device switch, a network blip, or a page refresh – the user shouldn't lose messages or see them out of sequence. A well-designed transport layer replays missed events and keeps message sequences intact. History loads first, then the live stream resumes. The client doesn't need to manage the transition.

Token stream compaction. Replaying thousands of individual tokens to a reconnecting device is wasteful. A better pattern compacts token streams into complete responses in channel history: one message per AI response, not hundreds of tokens. New devices load the complete response instantly, then receive new tokens for any in-progress generation.

Presence tracking. The backend needs to know which devices are currently active. This matters for more than UX. Should the model keep streaming if the user closed the tab? Should a background task escalate if all devices have disconnected? Presence answers these questions from a live membership set rather than polling or timeout heuristics. Without it, systems rely on assumptions that produce missed interactions, wasted compute, and handoffs that arrive too late.

Presence-aware cost controls. AI agents can quietly generate output that delivers no value but incurs real cost – streaming to an empty room, running tool calls after the user navigates away. Tying agent activity to presence means the infrastructure pauses or deprioritizes automatically when no devices are engaged and resumes when they return. Costs scale with actual usage, not connection count.

Mobile is the hardest case

Mobile devices are the toughest environment for connection continuity.

Network instability is constant – WiFi-to-cellular handoffs, tunnel blackouts, dead zones. Resume capability isn't optional. Apps get backgrounded aggressively, so the model might finish generating while the app is suspended, and when the user returns they should see the completed response, not an empty screen.

Push notifications bridge the gap. When significant events occur while the app is backgrounded – task complete, human takeover required – notifications alert the user and deep-link directly to the conversation. The payload should carry enough context for the app to restore state without a full reload. Push notification infrastructure (FCM, APNs, Web Push) ships as a supported capability; AI-specific end-to-end delivery patterns are still being documented, so implementation details vary by platform.

Battery is also a real constraint. Holding open WebSocket connections when the app is backgrounded drains battery, so intelligent reconnection strategies close connections when backgrounded, reconnect on foreground, and use push notifications to trigger reconnects for important updates.

Gen-1 AI vs Gen-2 AI: the real decision

Not every AI application needs cross-device support. HTTP streaming works well for Gen-1 AI products – a user sends a prompt, the model returns a response, the interaction is complete. Single session, single device, seconds to complete. For that use case, HTTP streaming is the right call.

Gen-2 AI products look structurally different. Sessions last minutes or hours. Agents make tool calls mid-conversation, coordinate with other agents, and run tasks in the background while the user is elsewhere. Humans need to step in – approving actions, taking over from an agent that has reached its limits, handing control back. Users move between devices and expect the conversation to follow them.

The question isn't whether your architecture is complex. It's which generation of product you're building. If sessions outlive a single connection, if users will move between devices, if a human might need to join a running conversation – channel-based architecture is the right call. 32 of 37 vendors evaluated have no multi-device fan-out capability at all, which means most teams building Gen-2 products are either rebuilding this layer from scratch or shipping without it.

What this makes possible

Channel-based sessions change what teams can build. A user starts a complex analysis on their phone during a commute, continues on their laptop at the office, and receives a push notification when the background task completes. In a customer support workflow, an AI agent handles a query, the conversation follows the user from desktop to mobile mid-interaction, and a human operator can step in on any device with full session context intact – then hand control back to the agent when they're done.

Users already expect this from messaging applications. AI conversations are next.

The infrastructure decision is whether to build session synchronization yourself or use systems designed for it. Building it means pub/sub channels, message persistence with configurable retention, client SDKs that handle subscription and history replay, presence tracking, mobile SDKs with background handling, push notification support, and identity-scoped authorization. That's weeks to months of engineering, and the edge cases don't appear until production.

Ably AI Transport implements this model – the docs on channel history and connection state recovery cover what the infrastructure layer needs to handle in detail.

Why we're betting on Durable Sessions

Ably Blog — Mon, 27 Apr 2026 11:45:08 +0000

Written by Matthew O'Riordan

Over the past year, I've spoken to more than 40 engineering teams building production AI agents. Different companies, different frameworks, different use cases. The same conversation kept happening.

"Our streams break when users switch tabs." "We can't tell if the agent crashed or is still thinking." "We built a custom reconnection layer and it took three months." "Our users can't switch from laptop to phone mid-conversation." Every team described it differently, but they were all describing the same gap. Between the agent and the user, there's no dedicated infrastructure for the session itself.

We backed this up with research across 37 AI infrastructure platforms, hundreds of GitHub issues and community threads, and 40+ customer discovery calls. 35 of those 37 platforms have no stream resumption after a disconnect. 33 have no way to detect an agent crash. The gap is universal, and the framework maintainers know it. Vercel built a pluggable ChatTransport in AI SDK 5 so developers can bring their own transport. TanStack AI shipped a ConnectionAdapter for third-party providers. They've diagnosed the problem and built the plugin points. They're waiting for specialist infrastructure to show up.

Nobody did anything wrong. Everyone focused on the right thing first: the intelligence, the orchestration, the models. But as AI experiences have gotten more sophisticated, the transport layer between the agent and the user has become the constraint.

Agents are becoming human-like, and they need human infrastructure

The insight that changed how we think about this came from an unexpected direction. As agents get more sophisticated, they start behaving like human participants in a conversation. They think for a while before responding. They work on tasks in the background. They hand off to a human colleague when they hit their limits. Users walk away, come back later, and expect to pick up exactly where things were.

These are the exact communication challenges we've been solving for human-to-human interaction for 10 years. Presence, reliable delivery, session continuity across devices, bidirectional control. Every messaging app since WhatsApp has solved these problems for humans, and the moment agents become participants in conversations, they need the same infrastructure.

We've been building this for a decade

I'll be honest. We almost dismissed the AI space entirely. When every company suddenly needed an "AI strategy," my instinct was skepticism. We're an infrastructure company. We process trillions of transactions across billions of devices. Why would we need an AI-specific product?

I was wrong. Companies like Intercom and HubSpot were already building AI agent experiences on top of our Pub/Sub messaging infrastructure, the realtime layer that handles reliable delivery between servers, devices, and services. They needed ordered delivery, presence, session state, multi-device support. They were using the infrastructure we'd already built, without waiting for us to package it as an AI product.

Ably has been a durable session layer for 10 years. We never called it that because the term didn't exist. We called it realtime infrastructure, messaging, pub/sub. But the capabilities are the same. Persistent sessions that survive disconnects. Ordered delivery with automatic catch-up. Multi-device fan-out, presence, bidirectional communication. We built all of this for human communication at scale, and it turns out it's exactly what AI-to-human communication needs too.

A category is forming

We're not inventing this term. We're recognizing something that's already happening.

ElectricSQL published a "Durable Sessions" blog post earlier this year defining it as a pattern for collaborative AI. EMQX has used "Durable Sessions" as a named feature in their MQTT broker for years. Convex is building agent components with persistent threads and durable workflows. Vercel is building a DurableAgent class. At least 12 companies are converging on the same problem space from different angles.

The pattern mirrors Durable Execution. Temporal existed before AI agents needed it, then suddenly every team building production agents needed backend workflows that couldn't fail. Temporal went from niche to a $5 billion valuation. AWS adopted the term for Lambda Durable Functions. The category debate was over.

Durable Execution made the backend crash-proof. Durable Sessions makes the experience crash-proof. They're complementary layers on opposite sides of the agent.

This needs to be bigger than Ably

A category with one company in it isn't a category. It's a product pitch. We want other companies in this space. We want developers to recognize "durable sessions" as an infrastructure layer they need, regardless of who provides it.

We've published durablesessions.ai as a community resource that defines the concept, documents vendor convergence, and tracks how the ecosystem is forming. I'm personally committed to pushing this forward. Not because it helps Ably specifically, but because I believe it will improve how we all build and experience AI. I've been doing this for a long time and I've never been more energized about what's ahead.

If you're at AI Engineer Europe next week, our tech lead will be presenting on durable sessions and why this layer matters. I'll be there too. Come find me and the team. If you're building in this space, whether as a competitor, a complement, or a fellow traveler, I want to talk. Getting the people working on this in the same room, having honest conversations about what developers actually need, is worth more than any blog post.

This is the first in a series. Over the coming weeks, we'll go deeper into the evidence, the ecosystem, and the practical framework for evaluating what your AI sessions actually need. Follow along here or connect with me on LinkedIn.

Why AI agents need a transport layer: Solving the realtime sync problem

Ably Blog — Mon, 27 Apr 2026 11:37:29 +0000

Building AI agents that work reliably in production requires solving problems that have nothing to do with AI. While teams focus on prompt engineering, model selection, and agent orchestration, a different class of challenges emerges at deployment. These have little to do with LLMs and everything to do with keeping agents and clients synchronized in realtime.

Over the past few months, we've spoken with engineers at over 40 companies building AI assistants, copilots, and agentic workflows. The same infrastructure problems surfaced repeatedly – problems with distributed systems, not models.

The infrastructure gap in AI applications

When you're building an AI agent that streams responses to users, you're not just building an AI system. You're building a distributed realtime application where state needs to stay synchronized across components that connect, disconnect, and reconnect unpredictably.

These are the technical challenges that came up consistently:

Connection management at scale. Managing WebSocket or SSE connections between agents and clients becomes complex quickly. Connections drop during mobile network handoffs, page refreshes, and tab switches. Each disconnection requires handling buffering, replay logic, and state reconciliation.

Client-specific state tracking. Agents need to track what each individual client has received, across multiple devices and multiple users. When a client reconnects, the agent must determine exactly which messages they missed and replay only those, without gaps or duplicates.

Distributed agent routing. In distributed deployments, reconnecting clients need to reach the correct agent instance. This gets harder still with durable execution patterns, where agent state persists but the instance handling it may change.

Continuity between historical and live data. Clients loading a conversation need continuity between historical messages and live streaming responses. Gaps in this transition break the user experience.

What teams actually wanted wasn't complicated: token streams that survive network interruptions, conversations that work across device switches, multi-user sessions that stay synchronized, and long-running agent work that continues when users go offline.

These requirements describe a transport layer problem – the infrastructure between agents and clients that handles delivery, synchronization, and state management.

Technical patterns for AI workloads

Several technical patterns emerged from observing how teams build AI applications on top of pub/sub infrastructure.

Token streaming with message appends. LLMs stream tokens individually, but storing thousands of separate token messages per response creates inefficient channel history. Loading a conversation would require replaying thousands of individual tokens.

The solution is a message append operation: publish an initial message, then append subsequent tokens to it by referencing the message serial. Clients joining mid-stream receive the complete response so far in a single update, then receive subsequent appends. Channel history contains one compacted message per AI response rather than thousands of token fragments.

Server-side rollups batch appends within a configurable time window (default 40ms) to stay within rate limits while maintaining smooth streaming UX. This handles the impedance mismatch between token-by-token streaming from models and efficient message storage.

Annotations for citations. AI responses that reference external sources need citation metadata attached without modifying the response content itself. Publishing citations as annotations – metadata referencing a message serial – keeps the response clean while enabling rich client-side rendering.

Annotations include a type (e.g., citations:multiple.v1) and arbitrary data: URLs, titles, character offsets for inline citation markers. The transport aggregates annotations automatically – clients receive a summary ("3 citations from wikipedia.org, 2 from nasa.gov") rather than processing every individual event.

Messaging patterns for agentic workflows. The bi-directional nature of channels enables several agent interaction patterns:

Tool calls: Agents publish tool invocations with a toolCallId for correlation. Clients can render generative UI (displaying a weather card when get_weather is invoked) or execute client-side tools (agent requests GPS location, client executes locally and publishes the result back).

Human-in-the-loop: Agents publish approval requests. Authorized users review and respond over the same channel. The agent verifies the approver's clientId or userClaim before executing sensitive operations. The request-response pattern fits naturally into bi-directional channels.

Chain-of-thought streaming: Streaming reasoning alongside output can happen inline (single channel, distinguished by message name) or threaded (separate reasoning channel per response, subscribed to on demand to reduce bandwidth).

Why this matters for production AI

The gap between prototype and production AI isn't primarily about model capabilities. It's about infrastructure that handles the messy realities of distributed systems: network interruptions, device switches, concurrent users, and agent failures.

When agents and clients communicate through a proper transport layer rather than direct connections, entire classes of complexity disappear. Agents don't track connection state. Reconnection logic isn't custom code in every agent. Multi-device support isn't a feature you build, it's a property of the architecture.

The interesting problems in AI infrastructure aren't always where you expect them. Sometimes the hard part isn't the AI – it's keeping everything synchronized.

Ready to build resilient AI applications? Explore the AI Transport documentation for implementation patterns, code examples, and architectural guidance.

The missing transport layer in user-facing AI applications

Ably Blog — Mon, 13 Apr 2026 14:17:39 +0000

Most AI applications start the same way: wire up an LLM, stream tokens to the browser, ship. That works for simple request-response. It breaks when sessions outlast a connection, when users switch devices, or when an agent needs to hand off to a human.

The cracks appear in the delivery layer, not the model. Every serious production team discovers this independently and builds their own workaround. Those workarounds don't hold once users start hitting them in production.

Here's what breaks, and what the transport layer needs to handle.

The shift that creates the problem

Simple AI applications are synchronous. User sends a message, model returns a response, done. A dropped connection restarts cleanly.

Agentic applications aren't like that. They run in a loop: perceive the user's intent, reason with the model, act by calling tools or sub-agents, and observe the result. Then they go around again until the task is done.

A research agent might loop a dozen times over several minutes, calling APIs and querying databases. The user is present throughout, watching, waiting, potentially needing to redirect. The connection might drop mid-loop, the user might switch devices, or they realize mid-stream the agent is heading the wrong way.

That's a different problem, and one HTTP streaming wasn't designed to solve. The backend surviving and the session surviving are two different things. What's missing is a layer that treats the conversation as durable state: persisting across connections, devices, and participants.

Durable execution makes the backend crash-proof. Durable sessions makes what the user actually sees crash-proof. Most teams building agentic products need both.

What breaks in production

Tokens disappear and reconnects corrupt state. HTTP streaming delivers tokens once. A dropped connection loses them. Most workarounds handle full page reloads but not tab switches or mobile backgrounding.

Worse, naive reconnect implementations replay the same output and produce duplicates: fragments, repeated tokens, or an interface in an indeterminate state. The Vercel AI SDK makes the tradeoff explicit: its resume and stop features are incompatible. You can resume a dropped stream or cancel it, but not both. A full breakdown of what resumable streaming requires at the infrastructure level is here.

Users can't see what the agent is doing. The agent is running tool calls, checking backend systems, orchestrating sub-agents. From the user's perspective it's a spinner and silence. Users abandon tasks they can't see progressing.

There's no standard mechanism for surfacing intermediate results as first-class events on the session channel.

There's no way to interrupt. Once generation starts, the user is locked out. Interruption requires bi-directional communication on the same channel simultaneously, user input arriving while agent output is still streaming, without breaking state. One company disabled user input entirely during agent responses because the backend couldn't distinguish an intentional cancel from a dropped connection.

The agent keeps working after the user has left. No signal tells the agent the user closed the tab. Compute and token costs accumulate.

Presence is a live membership set showing who is active in the session. Agents use it to pause expensive operations when nobody is there and resume when they return.

Multiple agents collide. When two specialist agents are working on the same request, every intermediate update routes through the orchestrator. The orchestrator becomes a bottleneck: when it's relaying progress it doesn't care about, the architecture starts to fight itself. The multi-agent coordination post goes deeper on how this plays out with concurrent specialist agents.

Agents fail silently. Most infrastructure has no agent health mechanism at the transport level. When an agent crashes, a presence disconnect fires immediately, rather than waiting for a timeout inferred from a dead stream. Build on the wrong signal and recovery logic breaks under real failure conditions.

Human handovers lose context. When an agent escalates, most implementations open a different interface, summarize what happened, and hope the transfer works. The user explains their problem again. A unified channel where agents and humans can both participate addresses this: the human arrives with full history and picks up mid-thread.

There are no transport-level diagnostics. Model-level tooling shows what the model decided to do. Nothing shows what happened between the agent and the user's screen: whether a message arrived, whether a reconnection worked, whether delivery stalled. Debugging a failed session means stitching together server logs that rarely reconstruct what actually happened.

What the transport layer needs to handle

Resumable streaming. Output persists in the channel, not the connection. When a client reconnects, it rejoins from its last received position with no gaps and no duplicates. Mutable messages handle retry corruption: republish to the same message ID and the client sees clean updated state, not a second copy. Vercel built a pluggable ChatTransport interface specifically to support this pattern; TanStack AI shipped a ConnectionAdapter for the same reason. The ecosystem has diagnosed the problem and built the plug-in points.

Multi-device continuity. Session state lives on the channel, not any individual client. Any device subscribing gets the same history and live updates. The session follows the user, not the connection.

23 of 26 AI platforms evaluated in recent market research have no multi-device session continuity, including ChatGPT.

Bi-directional communication on a shared channel. User input and agent output flow on the same channel simultaneously. A redirect from the user arrives as an explicit signal while the agent is mid-stream, not as an ambiguous TCP side effect. The backend can now distinguish an intentional cancel from a dropped connection.

Progress as structured events. Agent reasoning steps, tool call progress, and intermediate results should be first-class events on the channel, subscribable independently of the main response stream. Specialized agents publish progress directly. The orchestrator stops relaying events it doesn't care about.

Presence. A live membership set for users, agents, and human operators. Agents make real decisions based on it: pause when the user is gone, resume when they return. Crash detection is a presence event: when an agent disconnects, the event fires immediately.

Session-level diagnostics. Channel history serves as both the live diagnostic feed and the persistent audit record: structured, timestamped, and identity-attributed. This covers the delivery layer between agent and user, separate from model-level observability, and both surfaces matter in production.

The underlying principle

Each of these problems is tractable in isolation. Solving all of them together, without a dedicated infrastructure layer, is where engineering budget quietly disappears. None of it has anything to do with the AI product itself.

The workaround that seemed to hold breaks as soon as teams need cancellation, multi-device continuity, or human handover without a context break. The result is a growing layer of glue code that keeps teams away from the features they're actually trying to ship.

The category forming around this problem, durable sessions, is the session-layer equivalent of what durable execution did for backend workflows. The infrastructure requirement is the same: a layer built for the failure modes that actually occur, not workarounds patched onto infrastructure designed for something else.

Where Ably AI Transport fits

Ably AI Transport is a drop-in durable session layer that absorbs this complexity. Developers publish to a session. The infrastructure handles resumable streaming, multi-device continuity, presence, shared state, and bi-directional communication. No changes required to your model calls or agent orchestration.

Docs go deeper →

Why your AI response restarts on page refresh (and what it takes to prevent it)

Ably Blog — Mon, 13 Apr 2026 13:58:52 +0000

Your AI assistant is mid-sentence explaining a complex debugging strategy. The user refreshes the page. The response starts over from the beginning, or worse, vanishes entirely.

This isn't a model problem. It's a delivery problem.

What breaks

Most AI applications stream LLM responses over HTTP using Server-Sent Events or fetch streams. The connection delivers tokens in order until the response completes. If the user refreshes, closes the tab, or loses network connectivity, the stream ends. When they reconnect, there's no mechanism to resume from where they left off.

The application has two options: start the entire response over (wasting tokens and user time) or lose everything that was streamed before the disconnection (losing context the user already read).

Neither option works in production. Users refresh pages. Networks drop. Browsers crash. Mobile apps background. These aren't edge cases.

Why naive approaches fail

Client-side buffering: You can cache tokens in memory or localStorage, but this only handles intentional refreshes on the same device. It doesn't help with network interruptions, crashes, or users switching devices mid-conversation.

Response regeneration: Re-requesting the full response from the LLM costs tokens, adds latency, and often produces different output. The user sees the response change on reload, breaking continuity.

Stateless HTTP streaming: Standard SSE and fetch streams have no concept of session recovery. When the connection closes, the client has no way to tell the server "resume from token 847."

How resumable streaming actually works

The system needs three components:

Session identity: Each AI response gets a unique session ID that persists across connections. When the client reconnects, it presents this ID to resume the same logical response.

Offset tracking: The server tracks which tokens have been delivered. The client tracks which tokens it has received and rendered. On reconnect, the client requests "start from token N."

Ordered delivery with history: The transport layer guarantees token ordering and maintains a replayable history. When a client reconnects with an offset, the server resumes delivery from that point without re-invoking the LLM.

Tradeoffs

Building this yourself means managing session state, handling offset synchronisation across multiple connections, and ensuring tokens arrive in order even if network packets don't. You'll need persistent storage for token history and logic to handle race conditions when users reconnect from multiple tabs.

A concrete example

User asks an AI assistant to explain a codebase. The LLM streams 2,000 tokens over 30 seconds. At token 1,247, the user's network drops for eight seconds. Without resumability, the user sees a frozen response, then either loses everything or watches it restart.

With resumable streaming:

The client detects the disconnection and stores offset 1,247
Network recovers, client reconnects with session ID and offset
Server resumes delivery from token 1,248
User sees the response continue exactly where it stopped

The user never knows there was an interruption.

Multi-device continuity

Resumable streaming also enables conversation continuity across devices. The user starts a question on their laptop, switches to their phone, and sees the AI response pick up mid-stream. Same session ID, same offset tracking, different client.

This matters for AI workflows that span locations: research started at a desk, continued on a commute, finished in a meeting room. Without transport-level session management, each device restart loses context.

Why this matters for AI reliability

Unreliable delivery creates unreliable AI experiences. Users learn not to trust that responses will complete. They avoid asking complex questions because they might lose the answer. They stop using AI features on mobile networks.

Fixing this isn't about better models or smarter prompts. It's about ensuring delivery is as dependable as the intelligence behind it.

Next steps

If you're building AI features where responses take more than a few seconds, or where users might switch devices or encounter network issues, you need resumable streaming. You can build session management and offset tracking yourself, or use infrastructure like Ably AI Transport that handles it for you.

Either way, design for reconnection from day one. Your users will refresh. Your network will drop. Production isn't a stable connection.

Resume tokens and last-event IDs for LLM streaming: How they work & what they cost to build

Ably Blog — Mon, 13 Apr 2026 13:32:33 +0000

When an AI response reaches token 150 and the connection drops, most implementations have one answer: start over. The user re-prompts, you pay for the same tokens twice, and the experience breaks.

Resume tokens and last-event IDs are the mechanism that prevents this. They make streams addressable – every message gets an identifier, clients track their position, and reconnections pick up from exactly where they left off. The concept is straightforward. The production scope is not: storage design, deduplication, gap detection, distributed routing, and multi-device continuity all follow from the same first decision.

How resume tokens actually work

Resumable streaming has four moving parts.

Message identifiers. Every token or message gets a sequential ID when published – monotonically increasing, so each new message has a higher ID than the previous one.

Client state. The client tracks the ID of the last message it successfully received. In a browser, that's typically held in memory or local storage. On mobile, it needs to survive app backgrounding.

Reconnection protocol. When the connection drops, the client presents the last ID it saw. The server responds with everything that arrived after that ID, then transitions to live streaming.

Catchup delivery. The client receives missed messages in order before live tokens resume. The seam should be invisible.

The stream itself becomes the source of truth. The client doesn't reconstruct what it missed – the stream delivers it.

What SSE's Last-Event-ID header gives you

Server-Sent Events implements this natively. When an SSE connection drops, the browser automatically includes a Last-Event-ID header on reconnection. The server sees which event the client last received and resumes from there.

The browser handles reconnect logic. Application code doesn't change between initial connection and reconnection. For the happy path – stable connection, single device, short responses – SSE with Last-Event-ID works well.

The problems start at the boundary of what SSE can do.

SSE is unidirectional and HTTP-only. It has no native history beyond what you implement server-side. It doesn't handle bidirectional messaging, so live steering – users redirecting the AI mid-response – requires a separate channel. On distributed infrastructure, a reconnecting client may reach a different server instance that has no record of the original session. SSE handles the reconnect handshake. Everything else – distributed state, per-instance routing, multi-device history – is still your problem. For use cases that need bidirectional messaging, WebSockets vs SSE covers the tradeoffs in detail.

Building resume into WebSockets

WebSockets don't include resume semantics. When a WebSocket closes, the connection is gone. Reconnecting creates a new socket with no knowledge of the previous one.

Building resume on WebSockets means building all of it yourself:

Session IDs generated at stream start, stored server-side, presented by the client on reconnection. Message IDs assigned sequentially. Server logic to look up a session, find the position, replay history, then transition to live. Buffer management to decide how long to keep messages for sessions that haven't reconnected yet. Cleanup logic to expire stale sessions without cutting off legitimate reconnects.

Each piece is straightforward in isolation. The edge cases are where the weeks go.

The storage problem teams underestimate

Token-level storage is where most implementations hit an unexpected wall.

A 500-word response generates roughly 625 tokens. If you store each token as a separate record, loading one response means retrieving 625 records. A conversation with 20 exchanges is 12,500 records. Multiply across thousands of concurrent users and history retrieval becomes the performance bottleneck.

This matters because history retrieval is on the critical path for multi-device continuity. When a user switches from laptop to phone, the speed of catchup determines whether the experience feels continuous or broken.

The more practical model is to treat each AI response as a single logical message and append tokens to it rather than publishing them individually. Clients joining mid-stream receive the full message so far, then get new tokens as they arrive. One record per response instead of hundreds.

Duplicates and gaps: the two failure modes that break trust

Duplicates happen when the connection drops after the client receives a message but before the acknowledgement reaches the server. On reconnect, the server doesn't know whether to replay that message. Without deduplication logic, the client renders the same token twice.

The fix is using message IDs as deduplication keys on the client – straightforward in principle, but it needs to survive page reloads and work across tabs.

Gaps happen when sequential IDs arrive out of order or not at all. If a client receives message 153 after 150, messages 151 and 152 are missing. Without gap detection, the client silently renders an incomplete response. With it, you need logic to request missing messages, decide what to do if they can't be retrieved, and handle the state when the client gives up waiting.

Both failure modes are rare enough to be invisible in testing. Both surface under real network conditions: mobile handoffs, flaky WiFi, corporate proxy timeouts. The first time you see them is usually a support ticket.

What distributed deployment adds

A single-server implementation can tie session state to process memory and mostly work. As soon as you run multiple instances – which you will, for reliability and scale – a routing problem appears.

A client that connected to instance A reconnects to instance B. Instance B has no record of the session. Your options: route all reconnections back to the originating instance (a pinning strategy that creates hotspots and defeats the purpose of multiple instances), or store session state in shared infrastructure that all instances can read.

Shared session storage means Redis or equivalent: network round-trips on reconnect, cache invalidation logic, and failure handling when the cache is unavailable. This is solvable. It's also not in the first implementation.

The multi-device gap

Multi-device continuity is where connection-oriented design hits a wall.

When state lives in the connection – or in server memory tied to that connection – device switching loses context. The phone doesn't know what the laptop received. Without a shared source of truth for message history that any device can query, each reconnect from a new device is a new session.

True multi-device continuity requires decoupling state from connections entirely. The conversation lives in a channel or persistent store. Devices subscribe and catch up rather than resuming a connection.

This is a different architectural model than resuming an HTTP stream. For most teams, that realisation arrives after the first implementation is already in production.

When resumable streaming matters most

Not every streaming application needs this. For short-lived, single-session interactions on stable connections, standard HTTP streaming is fine.

Resume becomes critical under specific conditions:

Mobile clients handle network handoffs between WiFi and cellular constantly. Each one is a potential disconnection.

Long responses – anything over 30 seconds – have a high probability of encountering a transient failure.

Multi-device usage means the conversation needs to live in a channel, not a connection.

Multi-agent systems, where several agents publish updates to a shared channel. A reconnecting client needs to catch up on everything all agents published, not just the primary response thread.

The alternative is forcing users to restart on every interruption. That breaks trust fast, and the cost compounds on longer or more complex tasks where restarting is most painful.

What you're actually signing up for to build this

Teams that have shipped resumable streaming in production describe a consistent arc: the first implementation takes a week, the edge cases take a month, and cross-device reliability is still not fully solved six months later.

The full scope of a production-grade build: session management, message storage with efficient retrieval by ID range, client-side deduplication, gap detection, distributed routing, cache invalidation, buffer expiry, and monitoring to surface issues you can't reproduce locally.

Good transport infrastructure handles duplicates and gaps automatically. Application logic shouldn't need to check for either – that's the infrastructure's job.

Build vs infrastructure

Building resumable streaming yourself is a reasonable choice if you have a stable team, time to maintain it, and no multi-device or distributed requirements.

It's a harder choice than the SSE documentation makes it look. One team described spending several weeks on custom session management and still not fully solving cross-device reliability. The problems weren't obvious in the design phase – they appeared under mobile network conditions, under load, and when users did things the system wasn't built to handle.

The alternative is transport infrastructure that implements resume as part of the platform. You keep control of your LLM, prompts, and application logic. Session continuity, offset management, ordered delivery, and multi-device state become infrastructure concerns rather than application concerns.

Both paths are defensible. The costs of building are real and most of them are invisible until the first deploy.

Streaming responses between AI agents and clients? Ably AI Transport includes resumable token streaming, automatic replay, and channel-based delivery with guaranteed ordering. Docs go deeper.

Appends for AI apps: Stream into a single message with Ably AI Transport

Ably Blog — Thu, 26 Feb 2026 12:05:30 +0000

Streaming tokens is easy. Resuming cleanly is not. A user refreshes mid-response, another client joins late, a mobile connection drops for 10 seconds, and suddenly your "one answer" is 600 tiny messages that your UI has to stitch back together. Message history turns into fragments. You start building a side store just to reconstruct "the response so far".

This is not a model problem. It's a delivery problem

That's why we developed message appends for Ably AI Transport. Appends let you stream AI output tokens into a single message as they are produced, so you get progressive rendering for live subscribers and a clean, compact response in history.

The failure mode we're fixing

The usual implementation is to stream each token as a single message, which is simple and works perfectly on a stable connection. In production, clients disconnect and resume mid-stream: refreshes, mobile dropouts, backgrounded tabs, and late joins.

Once you have real reconnects and refreshes, you inherit work you did not plan for: ordering, dedupe, buffering, "latest wins" logic, and replay rules that make history and realtime agree. You can build it, but it is the kind of work that quietly eats weeks of engineering time.

With appends you can avoid that by changing the shape of the data. Instead of hundreds of token messages, you have one response message whose content grows over time.

The pattern: create once, append many

In Ably AI Transport, you publish an initial response message and capture its server-assigned serial. That serial is what you append to.

It's a small detail that ends up doing a lot of work for you:

const result = await channel.publish({ name: 'response', data: '' });
const { serials: [msgSerial] } = result;

Now, as your model yields tokens, you append each fragment to that same message:

if (event.type === 'token') {
  channel.appendMessage({ serial: msgSerial, data: event.text });
}

What changes for clients

Subscribers still see progressive output, but they see it as actions on the same message serial. A response starts with a create, tokens arrive as appends, and occasionally clients may receive a full-state update to resynchronise (for example after a reconnection).

Most UIs end up implementing this shape anyway. With appends, it becomes boring and predictable:

switch (message.action) {
case 'message.append': renderAppend(message.serial, message.data); break;
case 'message.update': renderReplace(message.serial, message.data); break;
}

The important difference is that history and realtime stop disagreeing, without your client code doing any extra work. You render progressively for live users, and you still treat the response as one message for storage, retrieval, and rewind.

Reconnects and refresh stop being special cases

Short disconnects are one thing. Refresh is the painful case, because local state is gone and to stream each token as a single message forces you into replaying fragments and hoping the client reconstructs the same response.

With message-per-response, hydration is straightforward because there is always a current accumulated version of the response message. Clients joining late or reloading can fetch the latest state as a single message and continue.

Rewind and history become useful again because you are rewinding meaningful messages, not token confetti:

const channel = realtime.channels.get('ai:chat', {
  params: { rewind: '2m' }
});

Token rates without token-rate pain

Models can emit tokens far faster than most realtime setups want to publish. If you publish a message per token, rate limits become your problem and your agent has to handle batching in your code.

Appends are designed for high-frequency workloads and include automatic rollups. Subscribers still receive progressive updates, but Ably can roll up rapid appends under the hood so you do not have to build your own throttling layer.

If you need to tune the tradeoff between smoothness and message rate, you can adjust appendRollupWindow. Smaller windows feel more responsive but consume more message-rate capacity. Larger windows batch more aggressively but arrive in bigger chunks.

Enabling appends

Appends require the "Message annotations, updates, appends, and deletes" channel rule for the namespace you're using. Enabling it also means messages are persisted, which affects usage and billing.

Why this is a better default for AI output

If you are shipping agentic AI apps, you eventually need three things at the same time:

streaming UX
history that's usable
recovery that does not depend on luck

Appends are how you get there without building your own "message reconstruction" subsystem. If you want the deeper mechanics (including the message-per-response pattern and rollup tuning), the AI Transport docs are the best place to start.

Realtime steering: interrupt, barge-in, redirect, and guide the AI

Ably Blog — Mon, 09 Feb 2026 09:58:06 +0000

Start typing, change your mind, redirect the AI mid-response. It just works. That is the promise of realtime steering. Users expect to interrupt an answer, correct its direction, or inject new instructions on the fly without losing context or restarting the session. It feels simple, but delivering it requires low-latency control signals, reliable cancellation, and shared conversational state that survives disconnects and device switches. This post explores why expectations have shifted, why today's stacks struggle with these patterns, and what your infrastructure needs to support proper realtime steering.

What's changing

AI tools are moving beyond static, one-turn interactions. Users expect to interact dynamically, especially in chat. But most AI systems today force users to wait while the assistant responds in full, even if it's off-track or no longer relevant. That's not how human conversations work.

Expectations are shifting toward something more natural. Users want to jump in mid-stream, adjust the AI's course, or stop it altogether. These patterns (barge-in, redirect, steer) are becoming table stakes for responsive, agentic assistants.

What users want, and why this enhances the experience

Users want to stay in control of the conversation. If the AI starts drifting, they want to say "stop" or "try a different angle" and get an immediate course correction. They want to guide the assistant's direction without breaking the flow or starting over.

This improves trust, keeps sessions on-topic, and avoids wasted time. It also brings AI interactions closer to how real collaboration works: iterative, reactive, fast.

Users now expect a few technical behaviours as part of that experience:

Responses can be interrupted in real time
New instructions are applied mid-stream without reset
The AI keeps context and adjusts without losing the thread

Why realtime steering is proving hard to build

Most AI systems treat generation as a one-way stream. Once the model starts producing tokens, the system just plays them out to the client. If the user wants to interrupt or change direction, the only real option is to cancel and resend a new prompt - often from scratch. That's because most systems today cannot support mid-stream redirection because their underlying communication model does not allow it.

Stateless HTTP cannot carry steering signals

Traditional request–response models push output in one direction only. Once a long-running generation begins, there is no reliable way to send control signals back to the server. Cancelling or redirecting usually means tearing down the stream and starting again.

Browser-held state breaks immediately

Most apps keep the state of an active generation in the browser. If the user refreshes or switches device, the in-flight response loses continuity. Any client-side steering logic tied to that state vanishes too, which forces a full reset.

Backend models often run without shared conversational state

If the orchestration layer is not tracking what the AI is currently doing, it cannot apply corrections cleanly. The model receives a brand-new prompt instead of a context-preserving instruction layered onto an active task.

The default stack was never designed for low-latency control loops

Steering requires coordinated signalling between UI, transport, orchestration, and model inference. That means ordering guarantees, durable state, and fast propagation of control messages. Without these, the AI continues generating tokens after a user says stop, causing confusion and wasted compute.

Steering mid-stream looks like a simple UX gesture. It is not. It is a distributed-systems problem sitting under a conversational interface.

Why you need a drop-in AI transport layer for steering

Delivering realtime control requires more than token streaming. It requires a transport layer that keeps context alive, supports low-latency bidirectional messaging, and ensures that user instructions and model output remain synchronised.

Bi-directional, low-latency messaging

Client-side signals such as "stop" or "try this instead" must reach the backend quickly and reliably. WebSockets or similar long-lived connections make this possible by enabling client-to-server control while the AI continues to stream output.

Reliable interrupt and cancellation primitives

Stopping generation must be instant and clean. The transport must carry cancellation events with ordering guarantees so the backend halts inference exactly where intended, without corrupting state.

Session continuity

The system needs persistent session identity so instructions and outputs are tied to the same conversational thread. Redirection should extend the session, not rebuild it.

Presence and focus tracking

If users have multiple tabs or devices open, the system needs to know where instructions are coming from. Steering messages must route to the correct active session without collisions.

Realtime steering relies on a transport layer designed for conversational control, not just message delivery.

How the experience maps to the transport layer

User experience desired	Required transport layer features	Underlying technical implementation
Interrupt and redirect responses in real time	Bi-directional messaging	WebSocket-based channels enabling client-to-server signals during output
Cancel generation cleanly	Interrupt primitives	Server-side control hooks to stop model inference and close stream pipelines
Preserve continuity after steering	Session continuity	Persistent session or conversation IDs with context caching
Update response direction on the fly	Dynamic state sync	Shared state model where new input is merged into active conversational context
Steer across devices	Identity-aware multiplexing	Fan-out model updates across all user sessions in sync

Realtime steering for AI you can ship today

You don't need a new architecture to support real-time steering, cancellation, or recovery. You need a transport layer that can keep the session alive, deliver messages in order, and preserve state across disconnects. Ably AI Transport provides those foundations out of the box, so you can build controllable, resilient AI interactions without rebuilding your entire stack.

Sign-up for a free account and try today.

Why orchestrators become a bottleneck in multi-agent AI published

Ably Blog — Tue, 03 Feb 2026 12:42:23 +0000

Complex user tasks often need multiple AI agents working together, not just a single assistant. That's what agent collaboration enables. Each agent has its own specialism - planning, fetching, checking, summarising - and they work in tandem to get the job done. The experience feels intelligent and joined-up, not monolithic or linear. But making that work means more than prompt chaining or orchestration logic. It requires shared state, reliable coordination, and user-visible progress as agents branch out and converge again. This post explores what users now expect, why traditional infrastructure falls short, and how to support truly collaborative AI systems.

What's changing?

The shift from simple question-response to collaborative AI experiences goes beyond continuity or conversation. It's about delegation. Users are starting to expect AI systems that can take a complex request and break it down behind the scenes. That means not one big model doing everything, but a network of agents, each focused on a part of the task, coordinating to deliver a coherent outcome. We've seen this in tools like travel planners, research assistants, and document generators. You don't just want answers, you want progress, structure, and coordination you can see. The AI system shouldn't just feel like a chat thread, it should feel like a team quietly getting on with things while keeping you informed.

What users want, and why this enhances the experience

When users interact with a system powered by multiple agents, they want to feel the benefits of parallelism without the overhead of managing complexity. If one agent is fetching flight data, another handling hotel options, and a third reviewing visa requirements, the user doesn't care about the internal plumbing. They care that their travel plan is evolving visibly and coherently. They want to see that agents are working, understand what's happening in realtime, and be able to intervene or revise things if needed.

Crucially, users expect the state of their task to reflect reality, not just the conversation. If they change a hotel selection manually, the system should adapt. If an agent crashes or stalls, the UI should show it. The value isn't just in faster results, it's in reliability, transparency, and the sense that multiple agents are genuinely collaborating, with each other and with the user - toward a shared goal.

To deliver this, agent systems need to stay in sync. State needs to be shared across agents and user sessions. Progress needs to be surfaced incrementally, not hidden behind a final answer. And context must be preserved so agents don't overwrite or duplicate each other's work. That's what turns a bunch of isolated model calls into a coordinated assistant.

Why this is proving challenging

Multi-agent systems can work today, but the default pattern most tools push you toward is an orchestration-first user experience. Even when multiple agents are running behind the scenes, their activity is typically funnelled through a single orchestrator that becomes the only "voice" the user can see. That hides useful progress, creates a single bottleneck for updates, and limits how fluid the experience can feel.

That's because traditional LLM interfaces assume a single stream of input and a single stream of output. Orchestration frameworks may invoke multiple agents in parallel, but the UI still tends to expose a linear, synchronous workflow: the orchestrator collects results, then reports back. If the user changes direction mid-process, or if an agent needs to react immediately to something in shared state, you're often forced back into "wait for the orchestrator" loops.

The underlying infrastructure assumptions reinforce this. HTTP request/response cycles work well when one component is responsible for coordinating everything, but they make it awkward for multiple agents to maintain an ongoing, direct connection to the user and to shared context. Token streaming helps, but it usually represents one agent's output to one user - not concurrent updates from a group of agents reacting in real time to a changing state.

Ultimately, the challenge isn't that orchestration fails. It's that it constrains app developers. Most systems don't give you fine-grained control over which agent communicates what, when, and how, or an easy way to reflect multi-agent activity directly in the user experience. To build confidence and responsiveness, clients need to know which agents are active, what they're doing, and how that activity relates to the shared, realtime session context - without everything having to be mediated by a heavyweight orchestrator.

Why you need a drop-in AI transport layer

To make multi-agent collaboration work in practice, you need infrastructure that handles concurrency, coordination, and visibility - not just messaging.

The transport layer must support persistent, multiplexed communication where multiple agents can publish updates independently while still participating in the same user session. That gives app developers fine-grained control over the user experience: which agents speak to the user, when they speak, and how progress is presented. Orchestrators can still exist, but they don't have to mediate every user-facing update.

State synchronisation is non-negotiable

Structured data, like a list of selected hotels or the current trip itinerary, should live in a realtime session store that agents and UIs can both read from and write to. This creates a single source of truth, even when updates happen asynchronously, across devices, or outside the chat interface

Presence adds another layer of confidence

When users see which agents are online and working, it sets expectations and builds trust. If an agent goes offline, the system should detect it, not leave the user guessing. This becomes even more important as these systems scale up in production environments where reliability is critical.

Interruption handling rounds it out

Users will change their minds mid-task. Your system needs to respond without the orchestrator agent tearing down and restarting everything. That means listening for user input while processing, canceling or rerouting tasks, and updating the shared state cleanly so individual agents can pick up where they left off or switch strategies on the fly.

How the experience maps to the transport layer

User experience desired	Required transport layer features	Underlying technical implementation
Visible, concurrent agent progress	Multiplexed pub/sub channels	Multiple agents publish progress updates to a shared realtime channel the UI subscribes to
Shared, up-to-date task state	Structured state synchronisation	Use of shared shared session state with clear schemas to reflect selections, status, and choices
Seamless agent-to-agent coordination	Out-of-band messaging support	Internal HTTP APIs or RPC protocols between agents, decoupled from user-facing updates
Awareness of system activity and health	Presence tracking	Agents register presence on connection and broadcast availability or error states
Graceful handling of mid-task changes	Event-driven state updates and recovery	Listen to user changes in shared state and cancel or adjust in-flight work accordingly

Making it work today

Multi-agent collaboration is already happening in planning tools, research systems, and internal automation workflows. The models are not the limiting factor. The hard part is the infrastructure that keeps agents in sync, shares state reliably, and exposes progress to users in real time.

Ably AI Transport gives you the infrastructure needed to support this pattern. Realtime channels, shared state objects, presence, and resilient connections provide the foundations for agents that coordinate reliably and surface their work as it happens. No rebuilds, no custom multiplexing, no home-grown state machinery.

Sign-up for a free developer account and try it out.

Multi-agent AI systems need infrastructure that can keep up

Ably Blog — Fri, 30 Jan 2026 10:49:09 +0000

An Ably AI Transport demo

When you're building agentic AI applications with multiple agents working together, the infrastructure challenges show up fast. Agents need to coordinate, users need visibility into what's happening, and the whole system needs to stay responsive even as tasks branch out across specialised workers.

We built a multi-agent travel planning system to understand these problems better. What we learned applies well beyond holiday booking.

The coordination problem

The demo uses four agents: one orchestrator and three specialists (flights, hotels, activities). When a user asks to plan a trip, the orchestrator delegates sub-tasks to the specialists. Each specialist queries data sources, evaluates options, and reports back. The orchestrator synthesises everything and presents choices to the user.

This mirrors how most teams are actually building agentic systems. You don't build one massive agent that tries to do everything. You build focused agents, give them specific tools, and coordinate between them.

The infrastructure question is: how do you keep everyone (the agents and the user) synchronized as work happens?

Why streaming alone isn't enough

Token streaming solves part of this. The orchestrator can stream its responses back to the user so they're not waiting for complete answers. That's table stakes now for any AI interface.

But streaming tokens from the orchestrator is only part of the problem. Users want visibility into the behaviour of each specialised agent – through their own token streams, structured updates like pagination progress, or the current reasoning of an agent working through a task.

Prompt: Plan a weekend trip to a nearby city

In our AI Transport demo, we also use Ably LiveObjects to publish progress updates from each specialist agent. The user sees which agent is active (tracked via presence), what it's querying, and how much data it's processing. These aren't logs or debug output. They're structured state updates that drive the UI. The agent even decides how to represent its progress to the user, taking raw database query parameters and turning them into natural language descriptions through a separate model call.

This requires infrastructure that can handle multiple publishers updating different parts of the shared state concurrently. The flight agent publishes its progress. The hotel agent publishes its progress. The orchestrator streams tokens (and it doesn't need to care about intermediate progress updates from the specialized agents). All on the same channel, all staying in sync.

Agent searches for flights and hotels based of user's criteria

State that reflects reality, not just conversation

Chat history creates a limited view of what's actually happening. If a user changes their mind, deletes a selection, or modifies something outside the conversation thread, the agent needs to know about it.

We use Ably LiveObjects to maintain the user's current selections (flights, hotels, activities) and agent status. This creates a source of truth that exists independently of the conversation. The orchestrator can query this state directly through a tool call, even if nothing in the chat history explains the change.

The interesting bit: agents can subscribe to changes in this data, so they see updates live. While you could store this in a database and have agents query it via tool calls, the ability to subscribe means agents can react to user context in real time (what the user is doing in the app, data they're manipulating, configuration changes they're making).

When the user asks "what's my current itinerary?", the agent doesn't rely on conversation history. It checks the actual state. If the user deleted their flight selection, the agent sees that immediately.

This separation matters more as systems get complex. The conversation is one interface to the system. The actual state (what's selected, what's in progress, what's completed) needs to exist independently. Agents, users, and other parts of your system all need reliable access to current state, not a reconstruction from message history.

Agent offers hotel options while remembering flight choice

Synchronising different types of state

Not all state is created equal, and your infrastructure needs to handle different patterns:

Structured, bounded state works well with LiveObjects. Progress indicators (percentage complete, items processed), agent status (online, processing, completed), user selections, and configuration settings all have predictable size limits. Clients can subscribe to changes and re-render UI efficiently. Agents can read current state without parsing through message history.

Unbounded state like full conversation history, audit trails, or complete reasoning chains still belongs in messages on a channel. You're appending to a growing log rather than updating bounded data.

Bidirectional state synchronization enables richer interactions. You can sync agent state to users (progress updates, ETAs, task lists), let users configure controls for agents (settings, preferences, constraints), and give agents visibility into user context (where they are in the app, what they're doing, what data they're viewing). Each of these can use structured data patterns for efficient synchronization.

The key is knowing which pattern fits which data, and having infrastructure that supports both.

Decoupling internal coordination from user-facing updates

The agents in our demo communicate with each other over HTTP using agent-to-agent protocols. That's appropriate for internal coordination. It's synchronous, it's request-response, it follows established patterns.

The user-facing updates go over Ably AI Transport. That's where you need state synchronization and the ability for multiple publishers to update different parts of the UI concurrently.

This decoupling matters. Each agent can independently decide how to surface its progress updates and state to the user, while the user maintains a single shared view over updates from all agents.

We also let specialist agents write directly to LiveObjects, bypassing the orchestrator. When the flight agent has progress to report, it writes it. The user sees it. The orchestrator never touches that data (it only needs the final result). This avoids additional coordination and keeps the architecture simpler.

Handling interruptions

Users change their minds. They interrupt. They refine requests mid-task. Your infrastructure needs to support this without rebuilding everything from scratch.

In the demo, you can barge in and interrupt the agent while it's working. The system detects the new input, cancels the in-flight task, updates the state, and kicks off a new search. The UI shows the cancellation, the new request, and the new progress, all without breaking the conversation.

This works because state updates are events on a channel. The agents listen for new user input even while they're processing. When they see it, they can decide whether to cancel current work, adapt it, or complete it first. The infrastructure doesn't dictate this logic (it enables it).

Agent then helps user select activities to do on trip

What presence actually tells you

Before any interaction starts, the UI shows which agents are online. This comes from Presence. Each agent enters presence when it starts up and updates it as its status changes.

Presence serves multiple purposes. Agents can see the online status of users and take action if a user goes offline (canceling tasks or queuing notifications – essential from a cost optimization perspective). In multi-user applications, users can see who else is online in the conversation. And for your operations team, it's observability built into the architecture. This answers a basic question for users: is this system actually working right now?

The enterprise patterns that emerge

This travel demo is deliberately simple, but the patterns map directly to enterprise use cases:

Research workflows where multiple agents pull from different data sources (financial databases, customer records, market data) and coordinate findings. Users need to see progress across all of them, not wait for a final answer.

Document generation where one agent structures the outline, others fill in sections, another handles compliance checks. The state (which sections are complete, which are being reviewed, what's been approved) needs to stay synchronized as different agents work in parallel.

Customer support routing where classification agents determine issue type, specialist agents handle resolution, and orchestration agents manage escalation. Status updates need to flow to support reps, customers, and dashboards in real time.

The common thread: multiple agents, concurrent work, shared state, and humans who need visibility and control. The infrastructure that makes a travel planner responsive and reliable is the same infrastructure that makes these systems work.

Labelled screenshot of AI Travel Agent's moving parts

What this requires from infrastructure

You need a reliable transport layer that allows concurrent agents and clients to communicate in realtime. This isn't just about pub/sub – it's about robust infrastructure, high availability, and guaranteed delivery.

You need state synchronisation that works for both structured data and message logs. Having access to both patterns depending on your needs is critical–bounded state objects for UI updates and configuration, unbounded message streams for conversation history and audit trails.

You need presence so you know what's actually online and available. You need connection recovery so users don't lose context when networks flicker.

Most importantly, you need this to work at the edge – in browsers and mobile apps, not just between backend services. That's where your users are. That's where responsiveness matters. The transport layer needs to be robust enough to handle the reality of client connectivity: spotty networks, mobile handoffs, browser tabs backgrounded and resumed.

The hard part of building multi-agent systems isn't the LLMs. The models are getting better every month. The hard part is the coordination, the state management, the visibility, and the reliability as these systems get more complex.

This is why we built AI Transport. We saw teams struggling with these exact problems: cobbling together WebSocket libraries, building their own state synchronization, dealing with reconnection logic, and watching their systems break under the messiness of real client connectivity. AI Transport gives you the infrastructure layer these systems need, built on Ably's proven reliability at scale, so you can focus on your agents instead of your transport layer.

Building agentic AI experiences? You can ship it now

This demo was built with Ably AI Transport. It's achievable today. You don't need to rebuild your stack to make it happen.

Ably AI Transport provides all you need to support persistent, identity-aware, streaming AI experiences across multiple clients. If you're working on agentic products and want to get this right, improve the AI UX, we'd love to talk.

Anticipatory customer experience: How realtime infrastructure transforms CX

Ably Blog — Wed, 28 Jan 2026 10:27:17 +0000

We're entering a new era of anticipatory customer experience – one that's not just reactive, not just responsive, but truly predictive. In this new model, systems don't wait for friction to appear; they recognise signals early and step in before the user ever feels a slowdown or moment of uncertainty. The bar has shifted: customers now expect brands to predict their needs and act before friction even surfaces. It's a fundamental rewiring of the relationship between companies and the people they serve.

This shift toward predictive customer experiences isn't hypothetical. Anticipatory experiences are happening now, powered by realtime data infrastructure that moves companies from playing catch-up to staying ahead. Think of it as the Age of Anticipation – where realtime signals, reliability, and adaptability form the core of modern CX design.

Anticipatory CX isn't magic, it's just realtime infrastructure done right.

So, if you're building next-generation CX or AI-powered agentic systems, this article outlines the architectural groundwork required to make anticipation real.

What is anticipatory customer experience?

Anticipatory customer experience uses realtime data infrastructure to predict and address customer needs before friction occurs. Unlike reactive support that waits for problems, anticipatory CX leverages continuous data streams, event-driven patterns, and predictive signals to intervene proactively, turning unknowns into reassurance.

Why realtime infrastructure matters for CX: Realtime infrastructure enables the continuous flow of customer signals needed for prediction. Without it, systems rely on stale, batch-processed data that kills foresight. Companies like Doxy.me and HubSpot use realtime platforms to anticipate confusion, delays, and churn risk before customers experience frustration.

From reactive to anticipatory: Why realtime data infrastructure powers predictive CX

Anticipation starts with having the right information at the right moment. But prediction requires fresh, realtime signals flowing continuously through your systems.

The healthcare sector illustrates this shift perfectly. Doxy.me, a telehealth platform trusted by hundreds of thousands of providers, faced a critical challenge: how do you anticipate patient confusion before it derails a virtual appointment? Their answer was "teleconsent" – a feature where healthcare providers walk patients through consent forms collaboratively, in real time.

As the patient reads, fills in fields, and types responses, the provider sees every change as it happens. No refresh required. No lag. No wondering if the patient is stuck on question three. The system detects hesitation patterns and enables providers to intervene before confusion becomes abandonment. This is anticipatory CX in action – predicting friction points and addressing them before they escalate.

But building this required infrastructure that could handle the continuous flow of patient interactions without introducing the very friction it was meant to eliminate. "The more that I can get my team to focus on healthcare business logic and less to focus on infrastructural data synchronisation, the better," explains Heath Morrison from Doxy.me. "Anything that provides higher level APIs to get us more in that space – and not be specialised in the stuff you guys should specialise in – is appealing and valuable to us."

By rebuilding their realtime stack on reliable infrastructure, Doxy.me achieved a 65% cost reduction while transforming their system from a liability into a core strength. Read the full Doxy.me case study →

Retailers are doing similar work, spotting churn risk in realtime and intervening with targeted offers or support before the customer clicks away. Financial services companies are shifting from asking "what happened?" to "what's about to happen?" These aren't reactive fixes. They're anticipatory moves that change outcomes – but only when the underlying data infrastructure can keep pace.

Realtime infrastructure like Ably's makes this possible – it's the unseen layer that ensures systems receive the continuous stream of signals they need to predict accurately, without lag or data loss.

Industries using anticipatory CX

Healthcare: Telehealth platforms use realtime infrastructure to anticipate patient needs, showing "doctor joining now" before patients wonder if something's wrong.

Financial Services: Banks predict fraud patterns and alert customers to unusual activity before money moves.

Retail: E-commerce platforms spot abandonment signals and intervene with targeted offers before checkout failure.

Logistics: Delivery services flag delays and update ETAs before customers start refreshing tracking pages.

Building trust through realtime customer engagement: The infrastructure foundation

Trust is built in moments of uncertainty. And anticipation? It turns unknowns into reassurance.

Think about the last time you booked a rideshare or waited for a delivery. The difference between a company that leaves you guessing and one that proactively updates you – "your driver is two minutes away," "slight delay, new ETA: 3:47pm" – is the difference between anxiety and confidence. Realtime anticipation doesn't just inform, it reassures.

Telehealth platforms have figured this out. When patients see "doctor joining now" before they've even begun to wonder if something's wrong, it changes the entire experience. Logistics companies that flag delays before customers start refreshing tracking pages are doing the same thing: reducing friction before it becomes frustration.

But there's a flip side: when realtime systems fail, trust erodes faster than it built up. A phantom notification, a delayed update, an inaccurate prediction – these aren't just technical hiccups. They're credibility problems. Reliability isn't a nice-to-have, it's the foundation. When customers cite Ably's five-plus years without a global outage, they're not celebrating uptime for its own sake. They're describing the baseline that makes anticipation possible at scale. View Ably's live uptime status.

Ably exists to be that foundation. The reason trust can scale across millions of interactions, without companies needing to worry about the underlying infrastructure failing at the worst moment.

Core technologies behind anticipatory CX

Core technology	Benefit
Realtime pub/sub messaging	WebSocket-based event distribution for instant signal propagation
Event-driven architecture	Composable, adaptive systems that respond to customer signals
Predictive analytics	AI-powered interpretation of continuous data streams
Continuous data streams	Sub-6.5ms message delivery latency without polling
Fault-tolerant infrastructure	99.999% uptime requirements for maintaining trust

Future-proofing customer experience: Event-driven architecture for anticipatory CX

To anticipate effectively, your CX stack needs to evolve as fast as your customers' expectations do. Rigid, monolithic architectures can't keep up with new signals, emerging channels, or changing customer behaviors. The future belongs to composable, event-driven systems.

Doxy.me's evolution illustrates this perfectly. They built their realtime features organically – using PubNub to handle presence detection and state synchronisation, all ephemeral data that disappeared after each session. But as they planned their next phase, they hit a wall: they needed persistence. The ability to decouple patient workflows from video calls, support richer collaboration, maintain state across sessions, and plug in new capabilities without rebuilding their entire stack. They prototyped with Convex and loved the developer experience, but needed production-grade infrastructure that could slot into their Node/TypeScript/Postgres/AWS environment.

Event-driven architectures make this kind of evolution possible. You can layer in predictive capabilities, plug in new communication channels, or add analytics tools – all without tearing everything down and starting over. One enterprise CX leader described it this way: "We used to dread adding new functionality. Now we think in terms of what events we need to listen for and what actions we want to trigger. It has completely changed our velocity."

Ably enables this kind of interoperability – CRMs, chat systems, analytics tools, customer-facing applications all publishing and subscribing to customer events in real time. WebSockets and pub/sub patterns ensure consistent, low-latency communication across every channel, without developers having to reinvent transport logic for each integration. It's the connective tissue that makes anticipatory systems work at scale.

But more moving parts do mean more complexity. Companies need governance frameworks and resilience planning to ensure their adaptive architectures don't become fragile ones. The ones succeeding here aren't necessarily the ones with the newest tech – they're the ones who've built systems that can absorb change without breaking.

The Age of Anticipation is composable. Adaptive, event-driven architecture is what makes foresight scalable.

How to implement anticipatory CX

1. Establish realtime data infrastructure – Replace polling with streaming architecture for continuous signal flow

2. Implement event-driven pub/sub patterns – Enable loosely coupled systems that respond to customer signals

3. Build predictive models using continuous data – Layer AI/ML on top of realtime streams for pattern recognition

4. Create proactive intervention workflows – Design automated responses to predictive signals (offers, alerts, support)

5. Monitor reliability metrics rigorously – Track latency, uptime, message integrity to maintain trust at scale

What makes this different

Most CX discussions focus on speed (faster responses, quicker resolutions.) But anticipation goes deeper. It's about infrastructure that doesn't just move data quickly, but does so reliably enough to build trust and flexibly enough to adapt as expectations evolve. Explore Ably's four pillars of dependability.

Realtime infrastructure is the hidden enabler. It's what makes customer care feel effortless, predictive, and ultimately, more human. Not because it replaces human judgment, but because it removes the friction that gets in the way of delivering exceptional care.

The companies winning in the Age of Anticipation aren't the ones with the flashiest technology demos. They're the ones who've built the unglamorous, reliable, adaptive infrastructure that makes anticipation possible at scale. They've realised that foresight isn't magic – it's architecture.

Business impact of anticipatory customer experience

52% of consumers stopped using brands after bad experiences – making proactive, anticipatory CX non-negotiable (PwC 2025 Customer Experience Survey)
65% cost reduction achieved by Doxy.me through realtime infrastructure that prevents issues versus fixing them reactively
61% of CX leaders deliver proactive communications using AI, while only 6% of laggards do, creating a significant competitive gap (Cisco research)
5+ years without a global outage – Ably's proven track record demonstrates the reliability required for maintaining trust at enterprise scale
40% of companies plan to increase investment in predictive instant experiences in 2025, signalling industry-wide shift to anticipatory models

Three pillars of anticipatory CX

1. Realtime data streams – Fresh, continuous signals flowing through your systems without latency or data loss

2. Reliability at scale – Infrastructure trusted to maintain consistency across millions of interactions, measured in years of uptime

3. Adaptive architecture – Event-driven systems that evolve with customer expectations without requiring rebuilds

Ready to build anticipatory experiences?

Ably's realtime platform delivers the continuous data streams and event-driven patterns your systems need to anticipate customer needs, with the reliability required to maintain trust at scale.

Six-plus years of 100% uptime. Sub-6.5ms message delivery latency. Built-in message integrity guarantees.