<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Carolina Carriazo</title>
    <description>The latest articles on Forem by Carolina Carriazo (@ccarriazo).</description>
    <link>https://forem.com/ccarriazo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1819567%2F94b587bb-970a-476f-8ae5-3252172af42e.png</url>
      <title>Forem: Carolina Carriazo</title>
      <link>https://forem.com/ccarriazo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ccarriazo"/>
    <language>en</language>
    <item>
      <title>How to use Ably LiveSync’s MongoDB Connector for realtime and offline data sync</title>
      <dc:creator>Carolina Carriazo</dc:creator>
      <pubDate>Thu, 16 Jan 2025 13:01:19 +0000</pubDate>
      <link>https://forem.com/ably/how-to-use-ably-livesyncs-mongodb-connector-for-realtime-and-offline-data-sync-4d5o</link>
      <guid>https://forem.com/ably/how-to-use-ably-livesyncs-mongodb-connector-for-realtime-and-offline-data-sync-4d5o</guid>
      <description>&lt;p&gt;In light of the recent deprecation of MongoDB Atlas Device Sync (ADS), developers are seeking alternative solutions to synchronize on-device data with cloud databases. Ably LiveSync offers a potential alternative and can replace some of ADS’s functionality, enabling realtime synchronization of database changes to devices at scale. LiveSync allows for a large number of changes to MongoDB to be propagated to end user devices in realtime and store the changes in any number of local storage options - from an embedded database to in-memory storage.&lt;/p&gt;

&lt;p&gt;For instance, imagine an inventory app that needs to broadcast stock updates to multiple devices in realtime. Ably LiveSync allows you to automatically subscribe to inventory changes in your database and broadcast this data to millions of clients at scale, allowing them to remain synchronized with the state of your inventory in realtime.&lt;/p&gt;

&lt;p&gt;This article explains why on-device storage is critical, explores existing solutions, and demonstrates how Ably LiveSync’s MongoDB connector can help with a brief code tutorial.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why keep information on-device?
&lt;/h2&gt;

&lt;p&gt;Local storage is a must for apps that need offline access or fast performance, like e-commerce inventory apps or news apps downloading content for offline browsing. But not every app needs it. If your app is always online or only streams read-only data, you can skip the complexity of a local database. Thankfully, with Ably, you can adapt to your use case, whether you need offline support or just realtime updates. Some of the benefits of on-device storage are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Offline access&lt;/strong&gt;: Storing data directly on the device ensures users can seamlessly access and interact with information even when they have no internet connection or are in areas with poor connectivity. This is particularly crucial for users who frequently work in offline environments or travel to locations with unreliable network coverage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Applications demonstrate significantly improved response times and reduced latency when accessing data stored locally, as opposed to making time-consuming server calls across the network. This local data access eliminates network-related delays and provides instantaneous data retrieval for critical operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost efficiency&lt;/strong&gt;: Users experience substantial savings on their data usage and associated costs since the application doesn't need to repeatedly download information from remote servers. This is especially beneficial for users with limited data plans or in regions where mobile data is expensive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User experience&lt;/strong&gt;: Users benefit from a consistently smooth and reliable application experience, maintaining uninterrupted access to their data regardless of their network status or connection quality. This reliability helps build user trust and satisfaction with the application.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Options for storing information on device
&lt;/h2&gt;

&lt;p&gt;Modern mobile operating systems provide a variety of ways to store information on device:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;iOS&lt;/strong&gt;: Includes UserDefaults, Core Data, and SQLite, with flexibility for additional solutions based on specific needs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Android&lt;/strong&gt;: Provides shared preferences, Room database, and file storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-platform frameworks&lt;/strong&gt;: With React Native, react-native-async-storage is a popular starting library for simple needs. However, for advanced use cases requiring NoSQL-like abilities, good choices include Realm (which, unfortunately, is also being deprecated), UnQLite, LevelDB, or Couchbase.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Regardless of your choice of on-device database and storage methodology, you can use Ably LiveSync to synchronize data from your managed or on-premises database to mobile devices in realtime. &lt;strong&gt;&lt;em&gt;This includes MongoDB, as well as Atlas.&lt;/em&gt;&lt;/strong&gt; While we currently support only MongoDB and PostgreSQL, we are working on adding support for other database engines.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Ably LiveSync?
&lt;/h2&gt;

&lt;p&gt;Ably LiveSync lets you monitor database changes and reliably broadcast them to millions of frontend clients, keeping them up-to-date in realtime.&lt;/p&gt;

&lt;p&gt;LiveSync works with any tech stack and prevents data inconsistencies from dual-writes while avoiding scaling issues from "thundering herds" — sudden surges of traffic that can overwhelm your database.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to persist data locally with Ably
&lt;/h2&gt;

&lt;p&gt;Let’s explore how to build a simple in-store management app that tracks product inventory, using React Native and SQLite for local storage and MongoDB Atlas as our cloud database. Although MongoDB is a document store and SQLite is a relational database, the two can be used in combination. We are going to use the Ably SDK callback methods to store documents and changes inside our local SQLite database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up Ably
&lt;/h3&gt;

&lt;p&gt;For simplicity, we’ll stick to TypeScript. Before anything else, create a new React Native project using the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npx @react-native-community/cli@latest init AwesomeStore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Creating a MongoDB integration rule with Ably
&lt;/h3&gt;

&lt;p&gt;Now we need to create a new channel that streams database changes to your clients. This ensures realtime updates whenever your MongoDB data changes. To create an integration rule that will sync your MongoDB database with Ably, you’ll first have to sign up for an Ably account.&lt;/p&gt;

&lt;p&gt;Once that’s done, you should have access to your Ably dashboard. Create an app or select the app you wish to use. Navigate to the &lt;strong&gt;Integrations&lt;/strong&gt; tab &amp;gt; &lt;strong&gt;Create a new integration rule&lt;/strong&gt; &amp;gt; &lt;strong&gt;MongoDB&lt;/strong&gt;. Fill out the &lt;strong&gt;Connection URL&lt;/strong&gt; with your MongoDB connection URL; &lt;strong&gt;Database name&lt;/strong&gt; with your db name (for this example, &lt;code&gt;SQLiteDatabase&lt;/code&gt;); and &lt;strong&gt;Collection&lt;/strong&gt; with your collection name (for this example, &lt;code&gt;products&lt;/code&gt;). For more information on this process and the parameters involved, check out &lt;a href="https://ably.com/docs/livesync/mongodb#integration-rule" rel="noopener noreferrer"&gt;our docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53jc4ffecmn7161n7b06.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53jc4ffecmn7161n7b06.gif" alt="Navigating to the MongoDB integration rule" width="960" height="599"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This sets up a new channel, built on top of our core Ably Pub/Sub product, which streams changes (through MongoDB change streams) from your database to your clients. This ensures that any change that occurs in your database is delivered to every device subscribed to the channel.&lt;/p&gt;
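&lt;p&gt;For orientation, here is an illustrative example of the payload a subscriber might receive in &lt;code&gt;message.data&lt;/code&gt;. The field values below are invented for the example, but the structure follows MongoDB’s change event format (&lt;code&gt;operationType&lt;/code&gt;, &lt;code&gt;ns&lt;/code&gt;, &lt;code&gt;fullDocument&lt;/code&gt;):&lt;/p&gt;

```javascript
// Illustrative shape of a MongoDB change stream event as it arrives over
// the channel. Values are made up; the structure mirrors MongoDB's
// change event documents.
const changeEvent = {
  operationType: 'update', // 'insert', 'update', 'replace', 'delete', ...
  ns: { db: 'SQLiteDatabase', coll: 'products' }, // namespace: database and collection
  documentKey: { _id: '6762f1a2c0ffee0000000001' },
  fullDocument: { // the post-change document, when the rule is configured to include it
    id: 42,
    name: 'Espresso beans 1kg',
    description: 'Dark roast',
    quantity: 17,
  },
};

// A subscriber can route on the collection name:
console.log(changeEvent.ns.coll); // 'products'
```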

&lt;h3&gt;
  
  
  Creating the local datastore
&lt;/h3&gt;

&lt;p&gt;We’ll create a new file in our project called &lt;code&gt;datastore.js&lt;/code&gt; and initialize SQLite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;createTables&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SQLiteDatabase&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`CREATE TABLE IF NOT EXISTS products(
        id INT32
        name TEXT NOT NULL
        description TEXT
        quantity INT32
    );`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executeSql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the tables are created, we need a way to retrieve store products and update their stock:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;getProducts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SQLiteDatabase&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;ToDoItem&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;products&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StoreProduct&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executeSql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`SELECT id, name, description, quantity FROM &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;products&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Failed to get products!&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;saveOrUpdateProducts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SQLiteDatabase&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;products&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StoreProduct&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;insertQuery&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="s2"&gt;`INSERT OR REPLACE INTO &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;(id, name, description, quantity) values`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="nx"&gt;products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`(&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, '&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;', '&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;', '&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;')`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executeSql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;insertQuery&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
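&lt;p&gt;As a sanity check, the query-building step above can be sketched as a standalone function and run in plain Node. The &lt;code&gt;tableName&lt;/code&gt; value and the sample products are invented for the example:&lt;/p&gt;

```javascript
// Standalone sketch of the INSERT OR REPLACE query built by
// saveOrUpdateProducts above; tableName is hardcoded for the demo.
const tableName = 'products';

const buildUpsertQuery = (products) =>
  `INSERT OR REPLACE INTO ${tableName}(id, name, description, quantity) values` +
  products
    .map(p => `(${p.id}, '${p.name}', '${p.description}', ${p.quantity})`)
    .join(',');

const query = buildUpsertQuery([
  { id: 1, name: 'Espresso beans', description: 'Dark roast', quantity: 17 },
  { id: 2, name: 'Filter papers', description: 'Size 4', quantity: 0 },
]);

console.log(query);
// Note: string interpolation is fine for a demo, but real code should use
// parameterized queries (executeSql placeholders) to avoid SQL injection.
```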



&lt;h3&gt;
  
  
  Receiving database changes from MongoDB over Ably
&lt;/h3&gt;

&lt;p&gt;Let’s take a look at how we can receive changes from the configured Ably channel. More information can be found in our documentation, but this is the important snippet we need - setting up the Ably Realtime SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;Ably&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ably&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Instantiate the Realtime SDK&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ably&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;Ably&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Realtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[your API key]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Get the channel to subscribe to&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ably&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;store:1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Subscribe to messages on the 'store:1' channel&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Print every change detected in the channel&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Received a change event in realtime: &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We need to write a new function that will take the payload of &lt;code&gt;message.data&lt;/code&gt; and store it in our database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;addOrUpdateProduct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;product&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StoreProduct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fullDocument&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fullDocument&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fullDocument&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fullDocument&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getDBConnection&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveOrUpdateProducts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
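&lt;p&gt;The mapping step inside &lt;code&gt;addOrUpdateProduct&lt;/code&gt; is a pure transformation, so it can be sketched and exercised in isolation. The database call is left out and the sample payload is invented:&lt;/p&gt;

```javascript
// Pure mapping from a change event payload to the row we persist locally.
// Mirrors the field access in addOrUpdateProduct above.
const toStoreProduct = (data) => ({
  id: data.fullDocument.id,
  name: data.fullDocument.name,
  description: data.fullDocument.description,
  quantity: data.fullDocument.quantity,
});

const product = toStoreProduct({
  operationType: 'insert',
  ns: { db: 'SQLiteDatabase', coll: 'products' },
  fullDocument: { id: 7, name: 'Oat milk', description: '1L carton', quantity: 30 },
});

console.log(product); // { id: 7, name: 'Oat milk', description: '1L carton', quantity: 30 }
```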



&lt;p&gt;We can call our new function in our message subscription:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Subscribe to messages on the 'store:1' channel&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Print every change detected in the channel&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Received a change event in realtime: &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;coll&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;products&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;addOrUpdateProduct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;coll&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;store&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Another function which updates changes for the `store` collection&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Unknown collection&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The full workflow
&lt;/h3&gt;

&lt;p&gt;With this setup, the app listens for realtime updates from your MongoDB collection and persists changes locally, ensuring an up-to-date inventory system even when offline.&lt;/p&gt;
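&lt;p&gt;As an illustration of that flow, here is a self-contained sketch that swaps SQLite for an in-memory Map and the Ably SDK for a stub channel; only the routing and upsert logic mirror the code above, and all names and values are invented for the demo:&lt;/p&gt;

```javascript
// End-to-end sketch with stand-ins: a Map instead of SQLite and a stub
// channel instead of the Ably SDK. The routing mirrors the subscription
// handler in the tutorial.
const localStore = new Map(); // id -> product row

const saveOrUpdateProducts = (products) => {
  for (const p of products) localStore.set(p.id, p);
};

const handleMessage = (message) => {
  const data = message.data;
  if (data.ns.coll === 'products') {
    saveOrUpdateProducts([data.fullDocument]);
  }
};

// Stub channel that immediately "delivers" one change event.
const channel = {
  subscribe(cb) {
    cb({
      data: {
        ns: { db: 'SQLiteDatabase', coll: 'products' },
        fullDocument: { id: 3, name: 'Green tea', description: 'Loose leaf', quantity: 12 },
      },
    });
  },
};

channel.subscribe(handleMessage);
console.log(localStore.get(3).quantity); // 12
```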

&lt;h2&gt;
  
  
  Final thoughts: What makes Ably different
&lt;/h2&gt;

&lt;p&gt;I hope that gives you a good overview of what Ably LiveSync’s MongoDB Connector can do! Besides providing a potential alternative to Atlas Device Sync, Ably, as a realtime communications platform, is built for scalability and reliability. Here are some features of Ably Pub/Sub, the backbone upon which LiveSync’s database connector is built:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Predictable performance&lt;/strong&gt;: A low-latency and high-throughput global edge network, with median latencies of &amp;lt;50ms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guaranteed ordering &amp;amp; delivery&lt;/strong&gt;: Messages are delivered in order and exactly once, even after disconnections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fault-tolerant infrastructure&lt;/strong&gt;: Redundancy at regional and global levels with 99.999% uptime SLAs. 99.999999% (8x9s) message availability and survivability, even with datacenter failures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High scalability &amp;amp; availability&lt;/strong&gt;: Built and battle-tested to handle millions of concurrent connections at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized build times and costs&lt;/strong&gt;: Deployments typically see a 21x lower cost and upwards of $1M saved in the first year.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Try Ably today and explore our MongoDB connector.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>mongodb</category>
      <category>database</category>
      <category>realtime</category>
    </item>
    <item>
      <title>Chat API pricing: Comparing MAU and per-minute consumption models</title>
      <dc:creator>Carolina Carriazo</dc:creator>
      <pubDate>Tue, 10 Dec 2024 15:48:47 +0000</pubDate>
      <link>https://forem.com/ably/chat-api-pricing-comparing-mau-and-per-minute-consumption-models-3nd5</link>
      <guid>https://forem.com/ably/chat-api-pricing-comparing-mau-and-per-minute-consumption-models-3nd5</guid>
      <description>&lt;p&gt;Pricing is critical to deciding which chat API you will use - however, it can often feel like there are limited options. Whether you are looking to gradually scale a chat app or anticipate large and sudden spikes in traffic, pricing models can make or break the bank depending on your usage - and most vendors will expect you to accept the one or two industry standards.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/blog/best-chat-api" rel="noopener noreferrer"&gt;Chat API&lt;/a&gt; providers predictably fall into a handful of pricing model categories. We’ll explore them in this article by explaining them, comparing them, and ultimately concluding which is best for each particular use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chat APIs: Common pricing models
&lt;/h2&gt;

&lt;p&gt;Chat API pricing models are designed to align with different usage patterns - like a steady user base and usage, or periodic spikes - but they also introduce trade-offs depending on an application’s scale and messaging demands. These models are generally categorized as forms of consumption-based pricing, where costs are tied to how the service is used. Let’s look at the most common pricing models in use today:&lt;/p&gt;

&lt;h3&gt;
  
  
  Monthly active users (MAU)
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;Monthly Active Users (MAU) model&lt;/strong&gt; is one of the most widely used pricing models in the industry. Providers like &lt;a href="https://www.cometchat.com/" rel="noopener noreferrer"&gt;CometChat&lt;/a&gt;, &lt;a href="https://sendbird.com/" rel="noopener noreferrer"&gt;Sendbird&lt;/a&gt;, &lt;a href="https://www.twilio.com/" rel="noopener noreferrer"&gt;Twilio&lt;/a&gt;, and &lt;a href="https://getstream.io/" rel="noopener noreferrer"&gt;Stream&lt;/a&gt; charge based on the number of unique active users per month.&lt;/p&gt;

&lt;p&gt;You pay for each user who interacts with the chat API within a given month, regardless of the number of messages they send or receive. While this can simplify billing, it comes with the trade-off of assuming the “typical usage” of a monthly active user. For example, an individual MAU may actually use much less active connection time or send far fewer messages than what is assumed for an average MAU. Simply put, this method is not granular.&lt;/p&gt;

&lt;p&gt;This model is predictable for applications with small, steady user bases: if you’re not expecting much user volatility, it’s easy to estimate costs. But any volatility in workloads - like a brief viral spike that dips back down - can result in paying peak-MAU prices for the whole monthly period.&lt;/p&gt;

&lt;p&gt;For chat services operating at scale, the monthly amount spent on peak MAUs often grossly exceeds the bill for actual usage; it wastes allocated resources and money.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzx5ayuhmplg3xd43tchi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzx5ayuhmplg3xd43tchi.png" alt="peak usage" width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is, however, a pricing model designed to tackle these issues at scale - and we use it at Ably.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per-minute consumption
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;per-minute consumption model&lt;/strong&gt; goes beyond traditional consumption-based pricing by billing customers based on their actual usage of service resources—connection time, channels, and messages. This approach directly addresses the inefficiencies inherent in MAU pricing models. This isn’t a common model in the industry, but we’ve adopted it here at &lt;a href="https://ably.com/" rel="noopener noreferrer"&gt;Ably&lt;/a&gt; to meet the usage needs of our customers at scale.&lt;/p&gt;

&lt;p&gt;Per-minute consumption measures actual usage in fine-grained units, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Connection minutes&lt;/strong&gt;: The total time devices are connected.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Channel minutes&lt;/strong&gt;: The time channels remain active.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Message events&lt;/strong&gt;: Each message sent or received by users.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tracking usage at this granular level ensures customers only pay for what they consume, without overpaying for resources they don’t use. Traffic spikes don’t necessarily lead to hugely increased costs either - the pricing is distributed across these dimensions, smoothing the overall impact. For example, livestreaming events, which may have a huge number of messages at their peak but a low number of channels, would see a more modest increase in cost than if they were billed by user count. Instead of penalizing a single metric, this approach provides greater predictability and reflects resource utilization more holistically.&lt;/p&gt;

&lt;p&gt;Per-minute consumption also incentivizes resource optimization, such as reducing idle connections or batching messages, which can further mitigate cost surges during spikes. (Batching comes in handy when many-to-many chat interactions lead to an exponential increase in delivered messages, which &lt;a href="https://ably.com/blog/making-fan-experiences-economically-viable" rel="noopener noreferrer"&gt;we're implementing soon at Ably on the server side&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Popular pricing models compared
&lt;/h2&gt;

&lt;p&gt;Deciding whether an MAU, message throughput, or per-minute consumption pricing model works for you depends on your use case - but if you are looking to scale a chat application to any considerable degree, as a general rule, &lt;em&gt;per-minute consumption will be the best option&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;MAU pricing assumes a “typical user” for billing purposes. This involves bundling resources such as connection time, message throughput, and storage into a fixed monthly fee per active user, which doesn’t accurately reflect the actual usage of the user.&lt;/p&gt;

&lt;p&gt;Now imagine a customer operating a live event platform. They’re running a live event for &lt;strong&gt;two hours in the month&lt;/strong&gt; that peaks at &lt;strong&gt;50,000 users&lt;/strong&gt;. What would the monthly prices look like between an MAU model and a per-minute consumption model?&lt;/p&gt;

&lt;p&gt;Let’s say that the MAU model assumes each “average” user will send 1k messages per month. While the bill is based on user count only, built into the cost per user is an assumption about how much each one will use (in this case, a total of 50 million messages for 50k users). The MAU model then bills the whole month &lt;strong&gt;based on the peak of 50,000 users&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2k0o1fj3xe1nsdr71xj4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2k0o1fj3xe1nsdr71xj4.png" alt="mau utilization" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With per-minute consumption, costs reflect the actual connection time and messages used - we’ll estimate generously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Connection time&lt;/strong&gt;: 50,000 users × 240 minutes (accounting for pre- and post-event activity) = 12 million connection minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Message volume&lt;/strong&gt;: 50,000 users sending an average of 200 messages = 10 million messages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Channels and channel time&lt;/strong&gt;: let’s say 5 channels x 240 minutes = 1,200 channel minutes.&lt;/li&gt;
&lt;/ul&gt;
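
&lt;p&gt;A quick arithmetic sketch of the example above (no real unit prices, since none are given here - just the usage quantities each model would bill for):&lt;/p&gt;

```python
# Worked example from the text: a two-hour live event peaking at 50,000 users.
users = 50_000

# MAU model: the price per user bakes in an assumed "typical" usage,
# here 1,000 messages per user per month, billed at the monthly peak user count.
assumed_messages_per_user = 1_000
assumed_monthly_messages = users * assumed_messages_per_user   # = 50,000,000

# Per-minute consumption model: the bill reflects what was actually consumed.
connection_minutes = users * 240   # pre- and post-event activity included = 12,000,000
messages_sent = users * 200        # actual average of 200 messages per user = 10,000,000
channel_minutes = 5 * 240          # 5 channels active for 240 minutes = 1,200
```

&lt;p&gt;The MAU price assumes five times the messages actually sent, and bills that assumption for the whole month rather than for roughly four hours of actual activity.&lt;/p&gt;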

&lt;p&gt;Even without specific prices to hand, we can see that billing for a typical user is inefficient in this scenario. Per-minute billing focuses on ensuring fairness and transparency for highly volatile traffic situations like these (for more information on this, Matt O’Riordan, Ably’s CEO, talks about pricing model issues in &lt;a href="https://ably.com/blog/consumption-based-pricing" rel="noopener noreferrer"&gt;his blog post&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  What does this mean in practice?
&lt;/h3&gt;

&lt;p&gt;This table breaks it down:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Examples&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Best suited to&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Challenges&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Monthly Active Users (MAU)&lt;/td&gt;
&lt;td&gt;Stream, Sendbird, Twilio&lt;/td&gt;
&lt;td&gt;Apps with steady or low user activity&lt;/td&gt;
&lt;td&gt;Paying for peak costs during volatile usage periods&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-minute consumption&lt;/td&gt;
&lt;td&gt;Ably&lt;/td&gt;
&lt;td&gt;Apps with scalable, high-volume messaging&lt;/td&gt;
&lt;td&gt;Requires tracking of usage metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Ably’s per-minute consumption model
&lt;/h2&gt;

&lt;p&gt;If the per-minute consumption model we discussed above sounds promising to you, here’s some more information on how this works specifically with Ably.&lt;/p&gt;

&lt;p&gt;At Ably, we’ve developed a pricing model designed to align more closely with the needs of realtime chat applications. Unlike traditional MAU or throughput-based models, Ably offers &lt;a href="https://ably.com/pricing" rel="noopener noreferrer"&gt;per-minute pricing&lt;/a&gt; that scales predictably and transparently with your application.&lt;/p&gt;

&lt;p&gt;Here’s how Ably stands out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt;: Pay only for what you use, with no penalties for growing user bases or unexpected spikes in message throughput.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Ably’s infrastructure supports billions of messages daily, with costs optimized for applications of any scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparency&lt;/strong&gt;: Ably’s pricing eliminates the hidden costs often associated with rigid MAU or throughput models, giving you full visibility into your expenses.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ably’s platform is built on a globally-distributed infrastructure designed for high-performance, scalable, and dependable messaging. With support for exactly-once delivery, message ordering, and &amp;lt;50ms global average latency, Ably ensures a seamless chat experience for users anywhere in the world.&lt;/p&gt;

&lt;p&gt;Our &lt;a href="https://ably.com/blog/best-chat-api#what-should-you-look-for-in-a-live-chat-solution" rel="noopener noreferrer"&gt;Chat SDK&lt;/a&gt;, currently in private beta, offers fully-fledged chat features like chat rooms at any scale, typing indicators, read receipts, presence tracking, and more. And of course, our per-minute pricing means that your consumption is as cost-effective as possible.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forms.gle/UmvASDpVtzg6yncCA" rel="noopener noreferrer"&gt;Sign up for private beta&lt;/a&gt; today to try out Ably Chat.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>realtime</category>
    </item>
    <item>
      <title>Data integrity in Ably Pub/Sub</title>
      <dc:creator>Carolina Carriazo</dc:creator>
      <pubDate>Thu, 21 Nov 2024 14:40:26 +0000</pubDate>
      <link>https://forem.com/ably/data-integrity-in-ably-pubsub-1nol</link>
      <guid>https://forem.com/ably/data-integrity-in-ably-pubsub-1nol</guid>
      <description>&lt;p&gt;When you publish a message to &lt;a href="https://ably.com/pubsub" rel="noopener noreferrer"&gt;Ably Pub/Sub&lt;/a&gt;, you can be confident that the message will be delivered to subscribing clients, wherever they are in the world.&lt;/p&gt;

&lt;p&gt;Ably is fast: we have a 99th percentile &lt;strong&gt;transmit latency of &amp;lt;50ms&lt;/strong&gt; from any of our 635 global PoPs that receive at least 1% of our global traffic. But being fast isn’t enough; Ably is also dependable and scalable. Ably doesn’t sacrifice data integrity for speed or scale; it’s fast &lt;em&gt;and&lt;/em&gt; safe.&lt;/p&gt;

&lt;p&gt;This post describes the Ably Pub/Sub architecture and features that guarantee your Pub/Sub message is delivered, in order, exactly once, to clients globally, while protecting against regional data center failures, individual instance failures, and cross-region network partitions.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Ably’s regional Pub/Sub architecture and persistence work
&lt;/h2&gt;

&lt;p&gt;Each region in Ably is capable of operating entirely independently, but regions also coordinate with each other to share and replicate messages globally.&lt;/p&gt;

&lt;p&gt;In each region, a single Pub/Sub channel has exactly one primary location across a fleet of Ably servers. When a message is published by a client attached to that region, the message is processed and stored by that single Pub/Sub channel location before the message is ACKed. Once the publishing client receives the &lt;code&gt;ACK&lt;/code&gt;, it can be confident that the message will not be lost, and will be delivered to all subscribing clients.&lt;/p&gt;

&lt;p&gt;As well as exactly one &lt;em&gt;primary&lt;/em&gt; location, an Ably Pub/Sub channel also has exactly one &lt;em&gt;secondary&lt;/em&gt; location. Both the primary and secondary location durably store a copy of the message before the &lt;code&gt;ACK&lt;/code&gt; is sent to the client.&lt;/p&gt;
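
&lt;p&gt;The write path described above can be modelled as a toy sketch (an illustrative model only, not Ably’s implementation): both locations persist the message before the &lt;code&gt;ACK&lt;/code&gt; is returned, so a primary failure after the &lt;code&gt;ACK&lt;/code&gt; loses nothing:&lt;/p&gt;

```python
class ChannelLocation:
    """One location of a Pub/Sub channel; the list stands in for durable storage."""
    def __init__(self, name):
        self.name = name
        self.store = []

    def persist(self, message):
        self.store.append(message)

class RegionalChannel:
    def __init__(self):
        self.primary = ChannelLocation("primary")
        self.secondary = ChannelLocation("secondary")

    def publish(self, message):
        # Both copies are durably written before the ACK is returned.
        self.primary.persist(message)
        self.secondary.persist(message)
        return "ACK"

    def fail_primary(self):
        # The secondary already holds every acknowledged message, so it is
        # promoted; a fresh secondary is created and backfilled from it.
        self.primary = self.secondary
        self.secondary = ChannelLocation("secondary")
        self.secondary.store = list(self.primary.store)

channel = RegionalChannel()
assert channel.publish("stock-update-1") == "ACK"
channel.fail_primary()
assert "stock-update-1" in channel.primary.store   # no acknowledged message lost
```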

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpre8kbf2ywb4yho1ne8r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpre8kbf2ywb4yho1ne8r.png" alt="How Ably's global pub/sub architecture persists messages across primary and secondary locations in each region for durability." width="800" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the primary location of a Pub/Sub channel fails, the secondary location is ready to take over. The secondary location already has a copy of all the required message data and becomes the primary. A new secondary location is created and the message data is replicated to that new location. This means that clients are isolated from individual instance failure.&lt;/p&gt;

&lt;p&gt;The message is immediately replicated to the primary and secondary locations in other regions globally that have subscribing clients. Ably will store up to 14 copies of each message, globally, to guard against the failure of entire regions of the Ably service.&lt;/p&gt;

&lt;p&gt;All messages are persisted durably for two minutes, but Pub/Sub channels can be configured to persist messages for longer periods of time using the &lt;a href="https://ably.com/docs/storage-history/storage#all-message-persistence" rel="noopener noreferrer"&gt;persisted messages feature&lt;/a&gt;. Persisted messages are additionally written to &lt;a href="https://cassandra.apache.org/_/index.html" rel="noopener noreferrer"&gt;Cassandra&lt;/a&gt;. Multiple copies of the message are stored in a quorum of globally-distributed Cassandra nodes.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Ably SDK clients interact with the Pub/Sub architecture
&lt;/h2&gt;

&lt;p&gt;Each Ably region is capable of operating independently, so Ably SDK clients can connect to any region. By default clients connect to the region providing the lowest latency, but if an issue with that region is detected (perhaps the region is erroring or is slow to respond), the SDKs will connect to a fallback region and continue operating as normal.&lt;/p&gt;

&lt;p&gt;Clients are isolated from region failures. There is no single point of failure in the regional Ably architecture. Regional failures will not affect the global availability of Ably, because clients will fall back to another region and continue operating.&lt;/p&gt;
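
&lt;p&gt;This fallback behaviour can be sketched as follows (region names and the connection function are hypothetical, not Ably’s real endpoints): the client prefers its lowest-latency region and tries the next candidate if that region errors.&lt;/p&gt;

```python
def connect(regions, try_connect):
    """Attempt each region in preference order; return the first that accepts."""
    last_error = None
    for region in regions:
        try:
            return try_connect(region)
        except ConnectionError as err:
            last_error = err       # region is erroring or slow: try the next one
    raise last_error

def flaky(region):
    if region == "us-east":        # simulate a regional outage
        raise ConnectionError("region unavailable")
    return f"connected:{region}"

# The preferred region is down, so the client transparently falls back.
assert connect(["us-east", "eu-west", "ap-southeast"], flaky) == "connected:eu-west"
```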

&lt;h3&gt;
  
  
  Exactly-once delivery
&lt;/h3&gt;

&lt;p&gt;Ably Pub/Sub messages support &lt;a href="https://ably.com/blog/achieving-exactly-once-message-processing-with-ably" rel="noopener noreferrer"&gt;exactly-once delivery&lt;/a&gt;, which is achieved through two mechanisms: idempotent publishing, and message delivery on the SDK.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Idempotent publishing&lt;/strong&gt;: Pub/Sub messages have a unique ID field which is used to deduplicate messages. When a message arrives in a region – either because it was published by an Ably SDK connected to that region, or replicated from another region – the primary Pub/Sub channel location verifies the message’s uniqueness by checking its ID. This idempotency check is performed against 2 minutes of message history, and protects against a client accidentally publishing a message twice. The message is persisted at the primary location and checked for uniqueness in a single atomic operation, which guards against a race between checking uniqueness and durably storing the message.&lt;/p&gt;

&lt;p&gt;The primary location in each region performs the idempotency check both for messages that are published to it directly and for messages replicated from another region, so that an SDK can retry a publish against another region and that message will still be checked for uniqueness rather than delivered to subscribing clients twice.&lt;/p&gt;
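
&lt;p&gt;A minimal sketch of this deduplication logic (a toy model, not Ably’s internals - in the real system the uniqueness check and the durable write are one atomic operation):&lt;/p&gt;

```python
import time

WINDOW_SECONDS = 120   # the 2-minute idempotency window described above

class PrimaryLocation:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.seen = {}       # message ID -> time first stored
        self.stored = []

    def accept(self, msg_id, payload):
        now = self.clock()
        # Forget IDs older than the dedup window.
        self.seen = {i: t for i, t in self.seen.items() if now - t < WINDOW_SECONDS}
        if msg_id in self.seen:
            return False     # duplicate: already persisted, not delivered again
        self.seen[msg_id] = now
        self.stored.append(payload)
        return True

primary = PrimaryLocation()
assert primary.accept("msg-1", "hello") is True
assert primary.accept("msg-1", "hello") is False   # retried publish deduplicated
assert len(primary.stored) == 1
```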

&lt;p&gt;&lt;strong&gt;Message delivery on the SDK&lt;/strong&gt;: On the subscribing client, messages are delivered with a series ID. If the client disconnects, it can provide the last-seen series ID when reconnecting with the &lt;code&gt;resume&lt;/code&gt; operation. This resume operation allows the client to pick back up from the exact point it had reached in the stream of Pub/Sub messages, ensuring no duplicates or gaps in the message stream. By default, the SDKs will retry failed message publishes.&lt;/p&gt;
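
&lt;p&gt;The resume behaviour can be sketched like this (an illustrative model of series IDs, not the actual Ably protocol): after a disconnect, the client asks for everything after the serial it last processed.&lt;/p&gt;

```python
stream = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]   # (series ID, message)

def resume(stream, last_seen_serial):
    """Return only the messages the client has not yet processed."""
    return [msg for serial, msg in stream if serial > last_seen_serial]

# Client processed up to serial 2, then disconnected: no gaps, no duplicates.
assert resume(stream, 2) == ["c", "d"]
# A fresh client resumes from 0 and receives the whole stream.
assert resume(stream, 0) == ["a", "b", "c", "d"]
```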

&lt;h3&gt;
  
  
  Message ordering
&lt;/h3&gt;

&lt;p&gt;Messages published by a Pub/Sub WebSocket client on a single WebSocket connection remain fully ordered. The regional location of the Pub/Sub channel will share those messages, in order, with all the other regional locations. Those Pub/Sub messages will be delivered in the same relative order to all subscribing clients, regardless of the region each client is connected to.&lt;/p&gt;

&lt;p&gt;To make sure that regions in Ably Pub/Sub can operate independently, there’s no guaranteed order between clients connected and publishing to different regions. This allows two clients, connected to different regions, to publish concurrently at high throughput and low latency without needing to coordinate globally with each other. The messages published by each client in their respective WebSocket connections will always retain the same order relative to other messages &lt;em&gt;on that same WebSocket connection&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In classic distributed system parlance, each client retains their own causal consistency so that &lt;code&gt;Hello&lt;/code&gt; -&amp;gt; &lt;code&gt;World&lt;/code&gt; never becomes &lt;code&gt;World&lt;/code&gt; -&amp;gt; &lt;code&gt;Hello&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4w8ycol8yvnnfjmqpd2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4w8ycol8yvnnfjmqpd2.png" alt="How Ably ensures ordered message delivery across and within regions. Messages retain their order relative to other messages published to the same region." width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, based on what we know about regional Ably architecture, the message ordering that each region sees is defined by when those messages arrive in that region: either by being replicated from another region, or by a client connected to that region performing a message publish. Once the order of messages is established in a single region, all the clients connected to that region will see those messages in the exact same order, but clients in a different region might see a slightly different regional ordering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ably's data integrity: a summary
&lt;/h2&gt;

&lt;p&gt;When using a Pub/Sub product, you want it to “just work”. You don’t want to have to worry about whether your subscribing clients are going to receive the message you published, whether the messages are going to keep their published order, or whether there’s a failure in some data center that Ably runs in. You don’t want to have to care. We’ve spent a long time thinking about failure cases, and designing a system that’s super fast but also retains the integrity of the data published on Pub/Sub channels.&lt;/p&gt;

&lt;p&gt;This post has described the internals behind some of the features we use to ensure that Ably Pub/Sub “just works”.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>architecture</category>
      <category>data</category>
      <category>database</category>
    </item>
    <item>
      <title>Scaling Kafka with WebSockets</title>
      <dc:creator>Carolina Carriazo</dc:creator>
      <pubDate>Tue, 12 Nov 2024 15:43:29 +0000</pubDate>
      <link>https://forem.com/ably/scaling-kafka-with-websockets-2m1f</link>
      <guid>https://forem.com/ably/scaling-kafka-with-websockets-2m1f</guid>
      <description>&lt;p&gt;Kafka is a highly popular realtime data streaming platform, renowned for handling massive volumes of data with minimal latency. Typical use cases include user activity tracking, log aggregation and IoT telemetry. Kafka’s architecture, based on distributed partitions, allows it to scale horizontally across multiple brokers. But although Kafka excels at high data throughput, scaling it to manage thousands of client connections can be costly and complex. Kafka’s architecture is designed for efficient data processing, not for handling high volumes of connections. When Kafka is scaled primarily to support more connections, the result is over-provisioned infrastructure and higher costs from inefficient resource usage.&lt;/p&gt;

&lt;p&gt;Many organizations choose to offload client connections to a connection management service to optimize scaling for Kafka. These services work particularly well for managing high numbers of consumer and producer connections, while allowing Kafka infrastructure to be optimized for high data throughput.&lt;/p&gt;

&lt;p&gt;This article explores how combining Kafka with WebSockets allows developers to scale Kafka efficiently for throughput by using a WebSocket layer to handle client connections.&lt;/p&gt;

&lt;p&gt;We’ll dive deeper into how Kafka and WebSockets complement each other in the next section.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kafka and WebSockets
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Kafka's architectural limitations
&lt;/h3&gt;

&lt;p&gt;Kafka was originally designed for closed internal enterprise environments. This leads to architectural constraints when interfacing with high numbers of external consumers or producers. The architecture becomes especially challenging in scenarios where thousands—or even millions—of concurrent connections are required. Some of those limitations include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Broker connection limits&lt;/strong&gt;: Each Kafka broker has a connection cap, typically handling between 2,000 and 5,000 concurrent connections. When demand exceeds this limit, new brokers must be added, which increases complexity and cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lack of elasticity&lt;/strong&gt;: Kafka's architecture isn’t dynamically elastic, meaning it doesn’t automatically scale up or down in response to fluctuating connection loads. Any scaling adjustments require manual intervention, which can lead to delays and additional operational overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource under-utilization&lt;/strong&gt;: To meet capacity demands when &lt;em&gt;connection limits&lt;/em&gt; are reached, new brokers have to be added, even if additional &lt;em&gt;data throughput&lt;/em&gt; isn’t needed. This can lead to resource inefficiencies, as brokers are added for capacity rather than throughput.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational complexity and latency spikes&lt;/strong&gt;: Frequently scaling Kafka brokers in or out introduces operational risk, since scaling requires data rebalancing each time a new broker is added or removed. This rebalancing process can temporarily cause latency spikes, impacting the performance of realtime data streams.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff45oyvsnurskklgvum13.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff45oyvsnurskklgvum13.png" alt="Image description" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Using WebSockets to stream Kafka messages
&lt;/h2&gt;

&lt;p&gt;To make Kafka work in a public-facing context and address some of its limitations, you can use one of many open-source solutions for streaming messages to internet-facing clients, offloading the burden of managing client connections.&lt;/p&gt;

&lt;p&gt;Many of these solutions are WebSocket-based. WebSockets are well-suited for streaming Kafka messages in realtime, thanks to several properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Realtime bi-directional communication&lt;/strong&gt;: WebSockets support full-duplex connections, enabling data to flow between client and server in realtime. This is ideal for applications like live chat, gaming, dashboards, and collaborative tools, where realtime data flow is critical.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single, long-lived connections&lt;/strong&gt;: Unlike protocols that rely on multiple requests, WebSockets use a single long-lived connection. This approach minimizes the setup and teardown costs associated with high volumes of connections, saving resources and reducing latency, particularly in applications with large numbers of concurrent users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Independent scaling for data ingestion and delivery&lt;/strong&gt;: By introducing a tier of WebSocket servers, you decouple the data processing layer (Kafka) from the connectivity layer (WebSocket servers). This means you can scale each layer independently—Kafka can handle data processing, while WebSocket servers focus on efficiently delivering messages to clients.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex subscription support&lt;/strong&gt;: Kafka’s pub/sub model aligns well with WebSocket-based subscription handling. By replicating the pub/sub pattern in the WebSocket tier, you can efficiently route data between Kafka and WebSocket clients, ensuring data is only sent to the right consumers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While WebSockets provide a great choice for integrating with Kafka, and enable you to take advantage of these properties, you then have the problem of how to scale your WebSocket application tier efficiently. This is potentially far more complex than it first seems, as we still have to consider elasticity, persistence, resilience, availability and latency in our infrastructure. For more information on scaling WebSockets specifically, check out our engineering &lt;a href="https://ably.com/topic/the-challenge-of-scaling-websockets" rel="noopener noreferrer"&gt;blog post&lt;/a&gt; and &lt;a href="https://www.youtube.com/watch?v=vXJsJ52vwAA" rel="noopener noreferrer"&gt;video&lt;/a&gt; on the topic.&lt;/p&gt;

&lt;h3&gt;
  
  
  An example
&lt;/h3&gt;

&lt;p&gt;Imagine we have a simple gaming application where player scores are updated on a shared leaderboard. The leaderboard displays the current top ten players. I’ll leave the nature of the game to your imagination, but let’s assume that we have thousands of players, and the competition is strong.&lt;/p&gt;

&lt;p&gt;In terms of the architecture required, we’ll need a central service that aggregates user scores and maintains the leaderboard, posting updates at regular intervals.&lt;/p&gt;

&lt;p&gt;We’ll use Kafka to process incoming scores from players, and to push leaderboard updates to players. So, the aggregation service will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;consume score messages from the &lt;code&gt;score-ingestion&lt;/code&gt; topic,&lt;/li&gt;
&lt;li&gt;update player scores,&lt;/li&gt;
&lt;li&gt;store the scores in a database, and&lt;/li&gt;
&lt;li&gt;calculate a new leaderboard at regular intervals and push to the &lt;code&gt;leaderboard-update&lt;/code&gt; topic.&lt;/li&gt;
&lt;/ul&gt;
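
&lt;p&gt;The aggregation step itself can be sketched as a pure function (Kafka plumbing omitted; in production this logic would sit between a consumer of &lt;code&gt;score-ingestion&lt;/code&gt; and a producer to &lt;code&gt;leaderboard-update&lt;/code&gt; - the message shapes here are assumptions):&lt;/p&gt;

```python
from collections import defaultdict

def aggregate(score_events, totals=None, top_n=10):
    """Fold score messages into running totals and compute the top-N leaderboard."""
    totals = defaultdict(int, totals or {})
    for event in score_events:                 # e.g. {"player": "ana", "points": 30}
        totals[event["player"]] += event["points"]
    leaderboard = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return dict(totals), leaderboard

events = [
    {"player": "ana", "points": 30},
    {"player": "bo", "points": 50},
    {"player": "ana", "points": 40},
]
totals, board = aggregate(events)
assert totals == {"ana": 70, "bo": 50}
assert board == [("ana", 70), ("bo", 50)]
```

&lt;p&gt;At each interval, the new leaderboard would be published to the &lt;code&gt;leaderboard-update&lt;/code&gt; topic and the running totals written to the database.&lt;/p&gt;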

&lt;p&gt;In this example, you could use a scalable WebSocket application tier to distribute leaderboard updates to players and push score updates for aggregation. This WebSocket layer allows players to publish their score messages directly to the &lt;code&gt;score-ingestion&lt;/code&gt; topic, and receive leaderboard updates by consuming messages from the &lt;code&gt;leaderboard-update&lt;/code&gt; topic. Here’s what that looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qtm6p1fvozixhwelasf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qtm6p1fvozixhwelasf.png" alt="Image description" width="800" height="544"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let’s say our gaming app needs to handle 50,000+ concurrent client connections to stream realtime data globally. If we used Kafka alone to handle both the connections and data throughput, we’d need to scale Kafka brokers well beyond what is required for data processing.&lt;/p&gt;

&lt;p&gt;For example, a Kafka cluster might need 25 or more brokers (assuming a limit of 2,000 concurrent connections per broker) just to accommodate the client connections, even though just 5 brokers are sufficient to handle the actual data ingestion and streaming workloads. By offloading the connection management to a WebSocket layer, we can deploy a Kafka cluster with just those 5 brokers to handle the data throughput, while our WebSockets application tier manages the 50k+ connections. This approach would result in significant cost savings, as the infrastructure for Kafka is scaled only for the data it processes, not for the volume of connections.&lt;/p&gt;
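
&lt;p&gt;The broker arithmetic above, made explicit:&lt;/p&gt;

```python
import math

connections = 50_000
connections_per_broker = 2_000           # assumed per-broker connection cap
brokers_for_connections = math.ceil(connections / connections_per_broker)

brokers_for_throughput = 5               # what the data workload actually needs

assert brokers_for_connections == 25
# Offloading connections to a WebSocket tier avoids 20 brokers of pure
# connection-handling capacity.
assert brokers_for_connections - brokers_for_throughput == 20
```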

&lt;h2&gt;
  
  
  Scaling with a connection management service
&lt;/h2&gt;

&lt;p&gt;As we discovered earlier, while scaling Kafka is complex, scaling a WebSocket service is itself also a challenge. Managing connection limits, handling reconnections, and maintaining global low-latency service requires significant engineering expertise and operational overhead. You could instead turn to connection management services that specialize in scaling WebSocket infrastructure. By offloading WebSocket connections to a dedicated service, you can focus on optimizing core data processing (e.g., within Kafka), while the connection management service handles the challenges of scalability, elasticity, and fault tolerance. This approach often proves more cost-effective and efficient than building and maintaining a custom infrastructure.&lt;/p&gt;

&lt;p&gt;Here at Ably, we have Kafka integrations that enable you to manage WebSocket connections while optimizing your Kafka clusters for data throughput at the same time. Let’s explore them below.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Kafka and Ably work together
&lt;/h3&gt;

&lt;p&gt;Ably has two features that enable connectivity between Kafka and clients: the &lt;a href="https://ably.com/docs/general/kafka-connector" rel="noopener noreferrer"&gt;Ably Kafka Connector&lt;/a&gt; and the &lt;a href="https://ably.com/docs/general/firehose/kafka-rule" rel="noopener noreferrer"&gt;Ably Firehose Kafka rule&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Ably Kafka Connector&lt;/strong&gt; provides a ready-made integration between Kafka and Ably. It allows for realtime event distribution from Kafka to web, mobile, and IoT clients over Ably’s feature-rich, multi-protocol pub/sub channels. You can use the Ably Kafka Connector to send data from one or more Kafka topics into Ably Channels.&lt;/p&gt;

&lt;p&gt;The connector is built on top of Apache Kafka Connect, and can be run locally with Docker, installed into an instance of Confluent Platform, or attached to an AWS MSK cluster through MSK Connect.&lt;/p&gt;

&lt;p&gt;If instead you need to send data from Ably to Kafka, use an &lt;strong&gt;Ably Firehose Kafka rule&lt;/strong&gt;. You can use a Kafka rule to send data such as messages, occupancy, lifecycle and presence events from Ably to Kafka. Setting up a Firehose rule is as simple as completing a form in your Ably dashboard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugnld9lrawmikaln4kdo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugnld9lrawmikaln4kdo.png" alt="firehose" width="800" height="760"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Kafka-Ably architecture
&lt;/h3&gt;

&lt;p&gt;The architecture of integrating Kafka and Ably leverages each platform’s strengths: Kafka manages data stream processing and storage, while Ably connects with Kafka by acting as either a publisher or consumer of Kafka topics. This setup allows each system to focus on what it does best, creating a more efficient and scalable solution for handling high volumes of realtime data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6k3gs4s4latqxjkgzzy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6k3gs4s4latqxjkgzzy.png" alt="Image description" width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As new messages are produced in Kafka, Ably consumes them through the Ably Kafka Connector, converting them into realtime updates that are then relayed to end-users through WebSockets. Alternatively, as new messages are produced in Ably, Ably can send them to the Kafka topics configured in a Kafka Firehose rule.&lt;/p&gt;

&lt;p&gt;This allows Ably to push realtime data to clients—whether they are mobile apps, web browsers, or IoT devices—without Kafka needing to handle direct client connections.&lt;/p&gt;
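&lt;p&gt;The fan-out pattern above can be sketched with a small in-memory simulation: a stand-in consumer drains a batch of "Kafka" records and broadcasts each one to every subscribed client, much as the connector republishes records onto a pub/sub channel. All names here are hypothetical - a real deployment would use the Kafka consumer API and Ably channels rather than Python lists:&lt;/p&gt;

```python
from collections import defaultdict

class RelaySketch:
    """Toy stand-in for the connector: drains a 'topic' and fans records
    out to per-client inboxes, as a pub/sub channel would."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # client_id -> received messages

    def subscribe(self, client_id):
        self.subscribers[client_id]  # register an empty inbox

    def relay(self, topic_records):
        # Each consumed record is broadcast to every subscribed client.
        for record in topic_records:
            for inbox in self.subscribers.values():
                inbox.append(record)

relay = RelaySketch()
relay.subscribe("mobile-app")
relay.subscribe("web-dashboard")
relay.relay([{"sku": "A1", "stock": 4}, {"sku": "B2", "stock": 0}])
```

&lt;p&gt;The point of the separation is visible even in this toy: producing records and fanning them out to clients are independent concerns, so each side can scale on its own.&lt;/p&gt;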

&lt;h3&gt;
  
  
  A note on security
&lt;/h3&gt;

&lt;p&gt;One of the core advantages of Kafka is its ability to be deployed securely in on-premise or private cloud environments, where Kafka brokers are kept behind firewalls and protected from external threats. In these setups, Kafka brokers are not exposed directly to the internet, which enhances security, particularly for enterprises handling sensitive data. However, in scenarios where Kafka needs to serve a global audience, directly exposing brokers to the internet can introduce security risks and complexity.&lt;/p&gt;

&lt;p&gt;With Ably acting as an intermediary, Kafka brokers do not need to be exposed to the public internet. Ably's secure edge network handles the global client connections, while Kafka remains securely deployed within the organization’s on-premise infrastructure or private cloud. Ably relays data between Kafka and external clients in realtime, ensuring that Kafka is never exposed to potential cyber threats from public networks. For example, Ably Kafka Firehose rules can connect to Kafka clusters over AWS PrivateLink to achieve private connectivity within AWS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started with Ably
&lt;/h2&gt;

&lt;p&gt;As we’ve seen, scaling Kafka with WebSockets, while avoiding the pitfalls of under-utilizing Kafka resources, can be a complex task. Each layer requires considerable engineering effort to build out, so if you’re considering offloading to a connection management service, Ably's Kafka solutions are designed to handle precisely these use cases.&lt;/p&gt;

&lt;p&gt;Explore the Kafka Connector by reading our &lt;a href="https://ably.com/docs/general/kafka-connector" rel="noopener noreferrer"&gt;docs&lt;/a&gt; or by following our Kafka Connector setup &lt;a href="https://ably.com/tutorials/ably-kafka-connector" rel="noopener noreferrer"&gt;tutorial&lt;/a&gt;. To get started with Ably more generally, check out our &lt;a href="https://ably.com/docs/getting-started/quickstart" rel="noopener noreferrer"&gt;quickstart guide&lt;/a&gt;. With the Connector in place, connecting WebSocket clients to Kafka becomes as simple as connecting to Ably.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ably.com/sign-up" rel="noopener noreferrer"&gt;Sign up&lt;/a&gt; for a free account to try it for yourself.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>architecture</category>
      <category>realtime</category>
      <category>websocket</category>
    </item>
    <item>
      <title>When and how to load balance WebSockets at scale</title>
      <dc:creator>Carolina Carriazo</dc:creator>
      <pubDate>Tue, 29 Oct 2024 14:56:31 +0000</pubDate>
      <link>https://forem.com/ably/when-and-how-to-load-balance-websockets-at-scale-5cnk</link>
      <guid>https://forem.com/ably/when-and-how-to-load-balance-websockets-at-scale-5cnk</guid>
      <description>&lt;p&gt;&lt;a href="https://ably.com/topic/websockets" rel="noopener noreferrer"&gt;WebSockets&lt;/a&gt; increasingly power realtime applications for use cases like chat, live data streaming, and collaborative platforms. However, as your application grows, scaling your infrastructure to support more connections becomes crucial for maintaining a smooth and reliable user experience - and for realtime, a globally consistent user experience is especially important. One way you can do this is by implementing efficient load balancing.&lt;/p&gt;

&lt;p&gt;In this article, we explore when you might need to use a load balancer in WebSocket environments, the possible challenges in implementation, and best practices for balancing WebSocket traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  When do you need load balancing for WebSocket infrastructure?
&lt;/h2&gt;

&lt;p&gt;Not every WebSocket implementation requires a load balancer. In fact, for smaller applications with minimal traffic and simple, localized use cases, a load balancer might be overkill. For instance, if you’re operating a low-traffic WebSocket server that only handles a handful of concurrent connections or is serving users within a single region, manually scaling your infrastructure may be sufficient.&lt;/p&gt;

&lt;p&gt;But if you anticipate significant global traffic, load balancing will likely be crucial to scaling your app successfully. Let’s qualify this: there’s more than one way to scale an app - couldn’t we scale vertically instead, simply increasing the resources on a single server without introducing a load balancer?&lt;/p&gt;

&lt;p&gt;Unfortunately, that’s probably not going to be a viable approach for WebSockets in particular. Some relational databases can fail over to read-only if the primary machine goes offline, but WebSocket, as a stateful protocol, needs persistent bi-directional communication. So if that single server fails - and it might, especially with a spontaneous spike in traffic - there is no backup (you can read more about this in our article about &lt;a href="https://ably.com/blog/websockets-horizontal-vs-vertical-scaling" rel="noopener noreferrer"&gt;vertical vs horizontal scaling for WebSockets&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;With this in mind, here’s when you should consider load balancing WebSockets over multiple servers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You have a high volume of concurrent connections&lt;/strong&gt;: When your application begins to handle thousands or millions of simultaneous WebSocket connections, a single server can quickly become overwhelmed. A load balancer is needed to distribute traffic across multiple servers, preventing performance degradation or downtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You are serving geographically distributed users&lt;/strong&gt;: If your application has users spread across different regions, latency can become a major issue. In this case, load balancing helps route users to the closest server or datacenter, reducing response times and improving the overall user experience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need redundancy and high availability&lt;/strong&gt;: If uninterrupted service is critical, load balancing is necessary to provide redundancy and failover. By distributing traffic across multiple servers or regions, your infrastructure becomes more resilient to server or regional failures, ensuring continued service availability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You are scaling up&lt;/strong&gt;: As your infrastructure scales horizontally (adding more servers) or globally (expanding across regions), a load balancer is crucial for managing traffic distribution, preventing any single server from becoming a bottleneck.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, imagine you're running a live sports streaming platform that broadcasts realtime updates to millions of fans worldwide. During peak events, like the final match of a major tournament, your platform experiences a surge in WebSocket connections from users across multiple regions. Without load balancing, a single server might struggle to handle the increased traffic, resulting in dropped connections and delayed updates for users. By using a load balancer, you can distribute these connections across multiple globally-distributed servers, ensuring low latency and a seamless experience for all of your users, regardless of where they are.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2a9sm2zk3iayphxklikl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2a9sm2zk3iayphxklikl.png" alt="websocket load balancing" width="800" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The challenges of load balancing WebSocket connections
&lt;/h2&gt;

&lt;p&gt;Because WebSocket connections are stateful and long-lived, load balancing them presents a few unique challenges that are &lt;a href="https://ably.com/topic/websockets-vs-http" rel="noopener noreferrer"&gt;not shared with traditional HTTP connections&lt;/a&gt;. They include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Connection state and data synchronization&lt;/strong&gt;: Once a WebSocket connection is established, traffic between the client and server needs to remain on the same server throughout the session. What happens if one user in a chat goes offline - how do we keep online status and messages current across the network in realtime? In the context of load balancing, persistent open connections to clients, or sticky sessions, are one way of resolving this issue. But sticky sessions have to be handled carefully: because they tie a client to a specific server, they can in some cases lead to uneven load distribution and create hotspots in your infrastructure (algorithms like &lt;a href="https://ably.com/blog/implementing-efficient-consistent-hashing" rel="noopener noreferrer"&gt;consistent hashing&lt;/a&gt; can help with this).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handling failover and redundancy&lt;/strong&gt;: In stateless HTTP-based systems, requests can easily be routed to other servers. However, with WebSockets, the connection needs to be re-established with a new server, which can disrupt the communication flow. In the event of a server or regional failure, we of course want traffic to seamlessly failover to a backup server or region without disrupting the user experience. As we saw earlier, implementing that failover requires multiple servers, and &lt;a href="https://ably.com/topic/the-challenge-of-scaling-websockets#load-balancing-algorithms" rel="noopener noreferrer"&gt;load balancing&lt;/a&gt; to ensure low latency and redundancy is a complex task. You may need to consider different load balancing algorithms before landing on the best approach for your use case. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintaining connection limits&lt;/strong&gt;: Each WebSocket connection consumes resources on the server, which limits the number of connections a single server can handle. This makes load balancing trickier because it’s not just about evenly distributing traffic but also making sure servers don’t exceed their connection limits. Load balancers need to monitor and intelligently distribute connections while preventing servers from being overloaded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keeping global latency low&lt;/strong&gt;: If you have users across the globe, your load balancer should ensure clients are connected to the server nearest to them, while also managing fallback options when servers or entire regions become unavailable. This becomes increasingly difficult as global traffic grows, potentially leading to high latencies for users far from the datacenter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity of horizontal scaling&lt;/strong&gt;: If you continue to scale your WebSocket infrastructure horizontally, each new server you add increases the complexity of maintaining sticky sessions, managing state, and avoiding connection drops. Without the right balancing strategies, you can quickly run into bottlenecks or imbalances across your infrastructure.&lt;/li&gt;
&lt;/ul&gt;
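&lt;p&gt;To make the consistent hashing idea mentioned above concrete, here is a minimal hash-ring sketch (server names and parameters are hypothetical): each server owns many virtual points on a ring, a client is routed to the next point clockwise, and removing a failed server only remaps the clients it was serving - everyone else keeps their sticky assignment:&lt;/p&gt;

```python
import hashlib
from bisect import bisect

class HashRing:
    """Minimal consistent-hash ring: each server owns many virtual points,
    so removing one server only remaps the clients it was serving."""

    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (point, server)
        for server in servers:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def route(self, client_id):
        # Walk clockwise from the client's point to the next server point.
        points = [p for p, _ in self.ring]
        idx = bisect(points, self._hash(client_id)) % len(self.ring)
        return self.ring[idx][1]

clients = ("alice", "bob", "carol")
ring = HashRing(["ws-1", "ws-2", "ws-3"])
before = {c: ring.route(c) for c in clients}
ring = HashRing(["ws-1", "ws-3"])  # ws-2 fails and is removed
after = {c: ring.route(c) for c in clients}
```

&lt;p&gt;A naive &lt;code&gt;hash(client) % num_servers&lt;/code&gt; scheme would instead reshuffle almost every client whenever the server count changes, breaking stickiness across the board.&lt;/p&gt;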

&lt;h2&gt;
  
  
  Best practices for load balancing WebSocket traffic
&lt;/h2&gt;

&lt;p&gt;Load balancing best practices are intrinsically tied to horizontal &lt;a href="https://ably.com/topic/the-challenge-of-scaling-websockets#best-practices-for-using-web-sockets-at-scale" rel="noopener noreferrer"&gt;scaling best practices&lt;/a&gt;, since load balancers only become an infrastructure consideration when multiple servers are introduced.&lt;/p&gt;

&lt;p&gt;Between the two, there is bound to be some overlap. That said, here are some load balancer-specific best practices:&lt;/p&gt;

&lt;h3&gt;
  
  
  Use sticky sessions strategically
&lt;/h3&gt;

&lt;p&gt;As we mentioned earlier, sticky sessions are often essential for WebSocket connections, as they ensure that clients remain connected to the same server throughout their session. However, solely relying on stickiness can lead to uneven load distribution. Consider using a load balancer that prioritizes reconnecting clients to the same server but allows for redistribution if a server becomes overloaded or fails.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use load balancers globally
&lt;/h3&gt;

&lt;p&gt;If you’re operating globally and at scale, consider routing traffic based on proximity to a datacenter or server. Regional load balancers will help to distribute the load while minimizing latency as much as possible. Make sure they have a fallback mechanism to reroute traffic, in case one region fails.&lt;/p&gt;
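&lt;p&gt;A minimal sketch of proximity-based routing with failover, using hypothetical region names and latency figures: route each client to the lowest-latency healthy region, and fall back to the next best when the nearest one is unavailable:&lt;/p&gt;

```python
def pick_region(client_latencies, healthy):
    """Route to the lowest-latency healthy region; fall back to the next
    best if the nearest one is down. Latencies (ms) are illustrative."""
    candidates = [(ms, region) for region, ms in client_latencies.items()
                  if region in healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates)[1]  # min() picks the lowest-latency pair

latencies = {"us-east": 20, "eu-west": 95, "ap-south": 210}
primary = pick_region(latencies, healthy={"us-east", "eu-west", "ap-south"})
fallback = pick_region(latencies, healthy={"eu-west", "ap-south"})  # us-east down
```

&lt;p&gt;Real geo-routing is usually done with DNS or anycast rather than application code, but the fallback ordering is the same idea.&lt;/p&gt;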

&lt;h3&gt;
  
  
  Implement automatic reconnection logic
&lt;/h3&gt;

&lt;p&gt;Automatic reconnection logic mitigates disruptions when a stateful WebSocket connection drops due to server failure. The reconnection logic should also detect when a load balancer has rerouted a session to a new server - this ensures data consistency.&lt;/p&gt;
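&lt;p&gt;As a sketch of one common approach (not the only one), reconnection attempts are usually spaced with exponential backoff plus jitter, so that thousands of clients dropped by the same failure don't all reconnect at the same instant:&lt;/p&gt;

```python
import random

def backoff_delays(attempts, base=0.5, cap=30.0, seed=None):
    """Exponential backoff with 'full jitter': each retry waits a random
    amount up to min(cap, base * 2**attempt), spreading reconnect storms.
    After a successful reconnect, the client should also resync any state
    missed while offline."""
    rng = random.Random(seed)
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays

delays = backoff_delays(6, seed=42)  # seed fixed only to make output repeatable
```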

&lt;h3&gt;
  
  
  Make sure your load balancers have automated failover
&lt;/h3&gt;

&lt;p&gt;If a server goes down, automated failover on your load balancers means that traffic will be re-routed with minimal, if any, downtime. This is crucial for maintaining a seamless experience over WebSocket connections.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understand your load limits and have a fallback strategy
&lt;/h3&gt;

&lt;p&gt;Run load and stress testing to understand how your system behaves under peak load, and enforce hard limits (for example, maximum number of concurrent WebSocket connections) to have some predictability. And in the event that a WebSocket connection breaks, have a fallback mechanism (for example, Socket.IO has HTTP long polling as a fallback) and reconnection strategy (like the automatic reconnection logic suggested above).&lt;/p&gt;

&lt;h3&gt;
  
  
  Have a load shedding strategy
&lt;/h3&gt;

&lt;p&gt;When servers are reaching capacity, a load shedding strategy allows you to gracefully degrade service by limiting or dropping low-priority connections to maintain overall performance and prevent system overload. This approach can help avoid sudden, large-scale disruptions, even under high load.&lt;/p&gt;
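&lt;p&gt;A load shedding policy can be as simple as ranking connections by priority and dropping the lowest-priority ones once capacity is reached. The sketch below is purely illustrative - real policies might weigh connection age, tenant tier, or message rate instead:&lt;/p&gt;

```python
def admit(connections, capacity):
    """Keep the highest-priority connections up to capacity; shed the rest.
    Priority 0 is highest. A deliberately simple, illustrative policy."""
    ranked = sorted(connections, key=lambda c: c["priority"])
    kept = ranked[:capacity]
    shed = ranked[capacity:]  # these clients get a graceful close + retry hint
    return kept, shed

conns = [
    {"id": "dashboard", "priority": 0},
    {"id": "analytics", "priority": 2},
    {"id": "chat", "priority": 1},
]
kept, shed = admit(conns, capacity=2)
```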

&lt;h2&gt;
  
  
  Solutions for load balancing strategy
&lt;/h2&gt;

&lt;p&gt;WebSockets have successfully powered rich realtime experiences across apps, browsers, and devices. Scaling WebSocket infrastructure horizontally brings load balancing with it, which, as we’ve seen, is a complex and delicate undertaking.&lt;/p&gt;

&lt;p&gt;Ultimately, horizontal scaling is the only sustainable way to support WebSocket connections at scale. But the additional workload of maintaining such infrastructure yourself can pull focus away from your core product.&lt;/p&gt;

&lt;p&gt;If your organization has the resources and expertise, you can attempt a &lt;a href="https://ably.com/blog/building-realtime-experiences-reduce-cost" rel="noopener noreferrer"&gt;self-build&lt;/a&gt;; but we generally don’t recommend this. Managed WebSocket solutions will allow you to offload the complexities of load balancing and horizontal scaling, as they often have globally-distributed networks with scaled infrastructure already taken into account.&lt;/p&gt;

&lt;p&gt;Ably takes care of the load balancing for you by offering a globally-distributed platform designed for scaling realtime applications. With Ably, you can avoid the complexities of managing WebSocket infrastructure in-house and ensure reliable, fault-tolerant communication at scale. Some of our features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Predictable performance&lt;/strong&gt;: A low-latency and high-throughput &lt;a href="https://ably.com/network" rel="noopener noreferrer"&gt;global edge network&lt;/a&gt;, with median latencies of &amp;lt;50ms.  &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guaranteed ordering &amp;amp; delivery&lt;/strong&gt;: Messages are delivered in order and exactly once, with automatic reconnections. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fault-tolerant infrastructure&lt;/strong&gt;: Redundancy at regional and global levels with 99.999% uptime SLAs. 99.999999% (8x9s) message availability and survivability, even with datacenter failures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High scalability &amp;amp; availability&lt;/strong&gt;: Built and battle-tested to handle millions of concurrent connections at scale. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized build times and costs&lt;/strong&gt;: Deployments typically see a 21x lower cost and upwards of $1M saved in the first year.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://ably.com/sign-up" rel="noopener noreferrer"&gt;Sign up&lt;/a&gt; for a free account to try it for yourself.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>pubsub</category>
      <category>architecture</category>
    </item>
    <item>
      <title>WebSocket reliability in realtime infrastructure</title>
      <dc:creator>Carolina Carriazo</dc:creator>
      <pubDate>Thu, 17 Oct 2024 14:44:07 +0000</pubDate>
      <link>https://forem.com/ably/websocket-reliability-in-realtime-infrastructure-3c31</link>
      <guid>https://forem.com/ably/websocket-reliability-in-realtime-infrastructure-3c31</guid>
      <description>&lt;p&gt;&lt;a href="https://ably.com/topic/websockets" rel="noopener noreferrer"&gt;WebSocket&lt;/a&gt;, as a low-latency, bidirectional communications protocol, has become a mainstay of the modern realtime landscape. Developers turn to WebSocket to power chat, live experiences, fan engagement, and countless other realtime use cases at scale. But how reliable is WebSocket in supporting those experiences?&lt;/p&gt;

&lt;p&gt;In this article, we explore what we mean when we refer to reliability within a realtime WebSocket infrastructure, how to ensure WebSocket reliability at scale, and what you need to build yourself (vs. what you get out-of-the-box with a realtime platform provider).&lt;/p&gt;

&lt;p&gt;With all of this in mind, we’ll help you determine the best WebSocket implementation for your use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  What do we mean by reliability?
&lt;/h2&gt;

&lt;p&gt;When we talk about reliability in the context of WebSockets, we're referring to a system's ability to deliver data &lt;em&gt;consistently over time&lt;/em&gt;, as expected, and without interruptions. This should be true even if an individual component in the infrastructure fails. &lt;/p&gt;

&lt;p&gt;Availability - uptime - is naturally tied to reliability, but reliability also requires additional mechanisms. More specifically, in the context of WebSocket-powered realtime experiences, there are a few distinct concepts around reliability to unpack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Failure recovery&lt;/strong&gt;: A realtime system needs to be able to continue operating should a single component fail. In practice, strategies like message acknowledgments, retries, and message persistence can overcome such failures. For example, a chat user should be able to receive messages even if the server in their local region unexpectedly goes offline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redundancy and global distribution&lt;/strong&gt;: Redundancy refers to provisioning more capacity than is strictly required, so that service can continue when components fail - for example, multiple databases that retain the same data set. A globally-distributed network is often an important element of this: if a local database region goes down, the rest of the system keeps running regardless.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data integrity&lt;/strong&gt;: Integrity refers to the actual accuracy of the data delivered. &lt;a href="https://ably.com/topic/pubsub-delivery-guarantees" rel="noopener noreferrer"&gt;Guaranteed message delivery&lt;/a&gt; (and to the right recipients), correct message ordering, and delivery semantics (for example, &lt;a href="https://ably.com/blog/achieving-exactly-once-message-processing-with-ably" rel="noopener noreferrer"&gt;exactly-once&lt;/a&gt; or at-least-once delivery) are aspects of data integrity. In fact, it’s a benchmark of a dependable realtime service &lt;a href="https://ably.com/four-pillars-of-dependability#integrity" rel="noopener noreferrer"&gt;unto itself&lt;/a&gt;, but is deeply connected to reliability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a realtime system can remain available and reliable even when multiple components fail, it is considered to be fault-tolerant. For a deeper dive into fault-tolerant architecture in distributed systems, we recommend &lt;a href="https://ably.com/blog/engineering-dependability-and-fault-tolerance-in-a-distributed-system" rel="noopener noreferrer"&gt;this article&lt;/a&gt; from our CTO, Paddy Byers.&lt;/p&gt;

&lt;h2&gt;
  
  
  What reliability do WebSockets provide on their own?
&lt;/h2&gt;

&lt;p&gt;While WebSockets provide a good foundation for bidirectional communication, they don’t inherently come with any functionality to ensure reliable service:&lt;/p&gt;

&lt;h3&gt;
  
  
  No delivery guarantees or message ordering 
&lt;/h3&gt;

&lt;p&gt;WebSockets themselves don’t guarantee message delivery, or that a message will be delivered exactly once. Messages can get lost if the connection drops unexpectedly, or if your application doesn’t have built-in mechanisms to handle failed deliveries. Without these guarantees, you may experience incomplete or misordered message delivery. To avoid message loss, you will have to build retries, acknowledgments, and message persistence yourself, or use a library or platform-as-a-service (PaaS) that adds these guarantees.&lt;/p&gt;
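&lt;p&gt;To illustrate what building these mechanisms yourself involves, here is a toy sketch of at-least-once delivery layered on top of a lossy transport: the sender retries until it sees an acknowledgment, and the receiver deduplicates by message id so retries don’t cause double-processing. Everything here is hypothetical and in-memory - a real system would persist unacknowledged messages and run retries asynchronously:&lt;/p&gt;

```python
class AtLeastOnceSender:
    """Sketch of app-level delivery guarantees over a lossy socket:
    resend until acknowledged, tracking anything still unacked."""

    def __init__(self, transport):
        self.transport = transport  # callable: returns True once acked
        self.unacked = {}           # msg_id -> payload, for retry/persistence

    def send(self, msg_id, payload, max_retries=5):
        self.unacked[msg_id] = payload
        for _ in range(max_retries):
            if self.transport(msg_id, payload):  # ack received
                del self.unacked[msg_id]
                return True
        return False  # still pending; persist and retry later

class DedupReceiver:
    def __init__(self):
        self.seen = set()
        self.messages = []

    def on_message(self, msg_id, payload):
        if msg_id not in self.seen:  # drop duplicates caused by retries
            self.seen.add(msg_id)
            self.messages.append(payload)
        return True  # ack

receiver = DedupReceiver()
calls = {"n": 0}

def flaky_transport(msg_id, payload):
    calls["n"] += 1
    if calls["n"] == 1:
        return False  # simulate the first ack getting lost
    return receiver.on_message(msg_id, payload)

sender = AtLeastOnceSender(flaky_transport)
ok = sender.send("m1", "hello")
```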

&lt;h3&gt;
  
  
  No automatic reconnections
&lt;/h3&gt;

&lt;p&gt;If a WebSocket connection drops, it doesn’t automatically reconnect. To keep your connection alive, you need to build custom reconnection logic to detect dropped connections and re-establish them. Again, this requires custom handling, or a PaaS.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complex to scale 
&lt;/h3&gt;

&lt;p&gt;WebSocket is stateful, so &lt;a href="https://ably.com/topic/the-challenge-of-scaling-websockets" rel="noopener noreferrer"&gt;horizontally scaling WebSocket&lt;/a&gt; to multiple servers means adding to the complexity of your infrastructure. This can compromise global distribution and redundancy if built out incorrectly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2a9sm2zk3iayphxklikl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2a9sm2zk3iayphxklikl.png" alt="horizontal scaling" width="800" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a reliable WebSocket infrastructure yourself
&lt;/h2&gt;

&lt;p&gt;Given that reliability involves failure recovery, data integrity and global redundancy, building a reliable WebSocket infrastructure essentially means supplying the reliability functionality that WebSockets can’t provide on their own. We’ll now cover the details of what building these components entails, and the time and cost implications of undertaking this yourself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Making the infrastructure reliable
&lt;/h3&gt;

&lt;p&gt;In the last section, we mentioned delivery guarantees, message ordering, automatic reconnections, and flexible horizontal scaling. These are all things you need to build yourself if you’re creating infrastructure from scratch. Here are more specifics on what this involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Message delivery guarantees&lt;/strong&gt;: You need to build custom mechanisms for message acknowledgments, retries, and persistence. This prevents message loss if a connection drops or an unexpected failure occurs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reconnection logic&lt;/strong&gt;: Reliable infrastructure requires reconnection logic that can detect and manage dropped connections, so that users can resume sessions without interruption. You’ll need to build out state management to keep track of connection status and sync lost messages once reconnected.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horizontal scaling with sticky sessions&lt;/strong&gt;: To scale WebSocket connections horizontally, you’ll need to configure load balancers and enable sticky sessions so that each client is always routed to the same server. It’s important to properly manage this, since failures could lead to increased latencies and poor user experiences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data replication and redundancy&lt;/strong&gt;: Reliable WebSocket infrastructure requires data replication across multiple servers and regions. This goes hand in hand with scaling - achieving this requires a globally-distributed network with redundant systems that can automatically route traffic to an available server if one fails.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition to all of this, you’ll need to consider building out auxiliary systems to support these major reliability functions, like monitoring and alerting in case something goes wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  The cost of a self build
&lt;/h3&gt;

&lt;p&gt;With all that in mind, the financial and time cost of actually building and maintaining reliable WebSocket infrastructure at scale is significant, to the point where &lt;a href="https://ably.com/blog/building-realtime-experiences-reduce-cost#the-cost-of-building-realtime-experiences" rel="noopener noreferrer"&gt;we’ve written a whole research report&lt;/a&gt; on this. Engineers planning a self build often underestimate the resources required to deliver WebSocket infrastructure, leading to project delays and cost overruns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time and resources&lt;/strong&gt;: Over 70% of companies reported that building WebSocket infrastructure required more than 3 months of engineering time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High infrastructure costs&lt;/strong&gt;: The operating cost of building globally distributed infrastructure can be unpredictable and substantial. Those surveyed in our report quoted between $100k-200k in annual costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global scaling issues&lt;/strong&gt;: As mentioned, WebSockets (and their libraries) are hard to scale. Socket.IO, for instance, is limited to a single datacenter or region. Without a fallback mechanism, this means that if the region goes offline, the entire messaging system goes down. High latencies for users located further away from the datacenter can also degrade the user experience.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwxxfjr2jshuuilrytzn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwxxfjr2jshuuilrytzn.png" alt="annual cost to scale" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Femdy0jd4x6yfv6evb0bo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Femdy0jd4x6yfv6evb0bo.png" alt="time required to build" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even after considerable time and resources have been invested, self-built infrastructure often fails to meet evolving business needs or user expectations over time. A custom-built solution is typically designed for a specific use case, making it difficult to adapt or innovate as requirements shift. In contrast, managed platforms offer composable realtime, allowing you to easily scale and extend your use case without being constrained by your self build’s parameters.&lt;/p&gt;

&lt;h2&gt;
  
  
  An alternative to building it yourself: WebSocket services
&lt;/h2&gt;

&lt;p&gt;If you are looking to build out realtime infrastructure, reliability and integrity are non-negotiable to the quality of your service. However, as we’ve covered, building this level of dependability in-house is resource-intensive and risky. &lt;/p&gt;

&lt;p&gt;Reliability at scale is especially fraught with challenges, since building out global replication increases infrastructural complexity exponentially, and has high cost implications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo0qt9wifoqv7cv25shvn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo0qt9wifoqv7cv25shvn.png" alt="build vs buy equation" width="800" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By using a realtime PaaS, you can avoid the challenges of managing WebSocket infrastructure yourself, and allow your team to focus on what matters most - a great user experience, and product innovation.&lt;/p&gt;

&lt;p&gt;These &lt;a href="https://ably.com/topic/best-pubsub-services" rel="noopener noreferrer"&gt;managed platforms&lt;/a&gt; eliminate the need to invest heavily in building and maintaining WebSocket infrastructure in-house. There are plenty available, but naturally, we are most familiar with Ably. Our &lt;a href="https://ably.com/four-pillars-of-dependability" rel="noopener noreferrer"&gt;Four Pillars of Dependability&lt;/a&gt; (performance, integrity, reliability and availability) guide everything we do, and have enabled us to provide a WebSocket platform with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Predictable performance&lt;/strong&gt;: A low-latency and high-throughput global edge network, with median latencies of &amp;lt;50ms.  &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guaranteed ordering &amp;amp; delivery&lt;/strong&gt;: Messages are delivered in order and exactly once, even after disconnections. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fault-tolerant infrastructure&lt;/strong&gt;: Redundancy at regional and global levels with 99.999% uptime SLAs. 99.999999% (8x9s) message availability and survivability, even with datacenter failures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High scalability &amp;amp; availability&lt;/strong&gt;: Built and battle-tested to handle millions of concurrent connections at scale. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized build times and costs&lt;/strong&gt;: Deployments typically see a 21x lower cost and upwards of $1M saved in the first year.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://ably.com/sign-up" rel="noopener noreferrer"&gt;Sign up&lt;/a&gt; for a free account to try it for yourself.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>pubsub</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
