DEV Community

Cover image for Scalability in Data-Intensive applications - Fan-Out, Throughput, Twitter problem, Percentile
Oleksandr Kashytskyi
Oleksandr Kashytskyi

Posted on

1 1 1

Scalability in Data-Intensive applications - Fan-Out, Throughput, Twitter problem, Percentile

Introduction

As applications grow, they need to handle more users, more data, and more requests efficiently. Scalability is a term used to describe a system's ability to cope with an increasing load. But how do we ensure that a system scales well? Let's explore some key concepts.

Identifying Bottlenecks

To scale a system effectively, it's essential to analyze its load parameters. Different systems have different constraints, and finding bottlenecks helps in optimizing performance. Here are several key factors to consider:

  • Fan-out: It's a term which describes the number of requests a service or endpoint makes to other services in order to serve a single incoming call. A high fan-out can lead to increased latency and system overload.

  • Throughput: In batch processing systems like Hadoop, the focus is on records processed per second rather than individual response times.

  • Response Time Distribution: Measuring response time is not just about average values but understanding the distribution of values.

The Twitter Problem

A classic example of scalability challenges is Twitter's timeline system. There are two endpoints, one to create a new post and another to fetch newest 20 posts.

A naive approach would be to query the database every time a user requests their home timeline. This results in expensive read operations and high latency.

Instead, Twitter solves this problem by maintaining a cache for each user's homepage. As cache memory is very expensive, there can't be stored all data needed for response, but it's possible to easily store IDs of last 20 posts in the timeline. This approach increases write complexity (It will be necessary to maintain both posts in a database and cache storage), but significantly improves GET request performance.

Measuring Response Time Effectively

Response time can vary significantly depending on system load. One of the best ways to analyze it is through percentiles, rather than averages.

99.9th percentile is often used to track performance (Some companies, like AWS AWS for example use 99.99th percentile). The reasoning behind this is that the top 0.1% of users are usually the most valuable customers, often transferring the most data or making the most critical requests.

Example: Sentry Monitoring: In observability tools like Sentry, response time percentiles help identify slowest transactions affecting real users, allowing engineers to optimize performance accordingly.

Conclusion

Scaling a system is not just about handling more traffic but ensuring efficiency, and optimal resource allocation.

Scalability is essential for handling growing demands in data-intensive applications. Identifying bottlenecks like fan-out, throughput, and response time distribution helps optimize performance. Finally, using percentiles instead of averages ensures a more accurate measure of system performance, helping engineers focus on critical optimizations.

Image of Datadog

Create and maintain end-to-end frontend tests

Learn best practices on creating frontend tests, testing on-premise apps, integrating tests into your CI/CD pipeline, and using Datadog’s testing tunnel.

Download The Guide

Top comments (0)

Image of Timescale

PostgreSQL for Agentic AI — Build Autonomous Apps on One Stack 1️⃣

pgai turns PostgreSQL into an AI-native database for building RAG pipelines and intelligent agents. Run vector search, embeddings, and LLMs—all in SQL

Build Today

👋 Kindness is contagious

Explore a trove of insights in this engaging article, celebrated within our welcoming DEV Community. Developers from every background are invited to join and enhance our shared wisdom.

A genuine "thank you" can truly uplift someone’s day. Feel free to express your gratitude in the comments below!

On DEV, our collective exchange of knowledge lightens the road ahead and strengthens our community bonds. Found something valuable here? A small thank you to the author can make a big difference.

Okay