Designing a Centralized Rate Limiter for Java Microservices — The Why, The How, and The Lessons

Krithika Subramaniyan — Fri, 17 Oct 2025 20:20:07 +0000

When you work with distributed systems long enough, you start to realize that the hardest problems aren’t just about scaling up — they’re about staying consistent while scaling.

A few months ago, I realized that off-the-shelf API rate limiting packages wouldn’t meet our needs.

Most packages enforce limits per service, which means if you want to allow 100 requests per second across the entire system and you have 4 services, you’d have to split the limit arbitrarily — 25 requests per second per service. This doesn’t account for uneven traffic or dynamic usage, and it quickly becomes hard to maintain as the system scales.
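
To make the cost of a static split concrete, here is a small arithmetic sketch. The class and method names are hypothetical, and the demand figures are invented for illustration; it only compares how many requests get through when each service enforces a fixed share versus when all services draw from one shared budget.

```java
import java.util.Arrays;

// Hypothetical illustration: a system-wide budget split evenly across services.
final class StaticSplitExample {
    private StaticSplitExample() {}

    /** Requests admitted in one second when each service enforces its own fixed share. */
    static int admittedWithStaticSplit(int[] demandPerService, int globalLimit) {
        int share = globalLimit / demandPerService.length;
        int admitted = 0;
        for (int demand : demandPerService) {
            // A busy service is capped at its share even if others are idle.
            admitted += Math.min(demand, share);
        }
        return admitted;
    }

    /** Requests admitted in one second when all services draw from one shared budget. */
    static int admittedWithSharedBudget(int[] demandPerService, int globalLimit) {
        int totalDemand = Arrays.stream(demandPerService).sum();
        return Math.min(totalDemand, globalLimit);
    }
}
```

With a demand of 70, 10, 10, and 10 requests against a 100 req/sec budget, the static split admits 55 requests, while the shared budget admits all 100, because the total never exceeds the global limit.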

To address this, I designed a centralized rate limiting mechanism that allows each service to enforce limits locally while still respecting a global threshold, ensuring fairness and predictable behavior across the distributed system.

The Problem

In a distributed environment, rate limiting isn’t just a performance safeguard — it’s a fairness mechanism.
It prevents one service from monopolizing shared resources, protects downstream APIs, and helps maintain predictable latency.

But implementing it independently in every microservice led to:

Different algorithms (fixed window, token bucket, sliding window)
Hardcoded thresholds scattered across code
No consistent behavior across services

Our goal was to build a reliable, consistent mechanism that works across all services for a team, while allowing teams to maintain independent limits.

The Design Journey

The first question was whether to implement a central throttling service or a distributed enforcement mechanism.

Option 1: Central Service

A separate rate-limiting service (e.g., a REST endpoint) seemed appealing at first. It could manage rules dynamically and give real-time control.

But after some design exploration, I realized:

Every microservice would have to make a network call to this service.
Under heavy traffic, that meant hundreds of thousands of extra calls per second — creating new latency and potential points of failure.
Ironically, the rate limiter itself could become the bottleneck.

Option 2: Distributed Enforcement with Shared Logic

Instead, I packaged the logic as a shared library that every service embeds. All services for a given team point at a shared Redis cluster that holds the global limit and its counters.

Each service reads the Redis configuration from its local service config.
The library updates counters atomically in Redis to enforce the global team limit.
Different teams can maintain separate limits and separate Redis clusters, so one team’s traffic doesn’t affect another team.

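This approach ensures consistent throttling within a team while keeping the system distributed and resilient. Each check still costs one Redis round trip, but there is no extra HTTP hop and no separate throttling service to deploy and scale. The shared library makes integration straightforward, but the primary goal remains enforcing global limits reliably in a distributed system.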
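
The idea of local enforcement against a shared counter can be sketched without Redis at all. In the example below, SharedCounterStore is an in-memory stand-in for a team's Redis cluster and EmbeddedLimiter plays the role of the library inside each service; both names, and the explicit windowId parameter, are assumptions made for illustration rather than the real library's API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for a team's shared Redis cluster: one atomic counter per key. */
final class SharedCounterStore {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    long increment(String key) {
        return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }
}

/** What each service would embed: a local check against the shared counter. */
final class EmbeddedLimiter {
    private final SharedCounterStore store;
    private final long globalLimit;

    EmbeddedLimiter(SharedCounterStore store, long globalLimit) {
        this.store = store;
        this.globalLimit = globalLimit;
    }

    /** windowId identifies the current time window, e.g. the epoch second. */
    boolean isAllowed(String team, long windowId) {
        long count = store.increment("rate:" + team + ":" + windowId);
        return count <= globalLimit;
    }
}
```

Two EmbeddedLimiter instances built on the same store share one budget, while a second store (another team's cluster) keeps its own counters, so one team's traffic cannot consume another team's limit.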

Architecture Overview

At a high level, the design looks like this:

Team A Redis Cluster (100 req/sec)
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│ Service A1    │   │ Service A2    │   │ Service A3    │
│ RateLimiter   │   │ RateLimiter   │   │ RateLimiter   │
└───────────────┘   └───────────────┘   └───────────────┘

Team B Redis Cluster (200 req/sec)
┌───────────────┐   ┌───────────────┐
│ Service B1    │   │ Service B2    │
│ RateLimiter   │   │ RateLimiter   │
└───────────────┘   └───────────────┘

Each service embeds the rate limiter library, which:

Connects to Redis to store and retrieve counters.
Applies per-client or per-API rate limits.
Throws a TooManyRequestsException when thresholds are exceeded.

This keeps throttling logic centralized in behavior, but distributed in execution.
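
Per-client and per-API limits mostly come down to which Redis key a request is counted under. The helper below is a hypothetical sketch: the "rate:" prefix follows the key used later in this post, while the "client" and "api" scopes and the method names are assumptions.

```java
/** Hypothetical key builder; only the "rate:" prefix comes from the article. */
final class RateLimitKeys {
    private RateLimitKeys() {}

    /** One counter per calling client, shared by every API that client hits. */
    static String perClient(String clientId) {
        return "rate:client:" + clientId;
    }

    /** One counter per API route, shared by every caller of that route. */
    static String perApi(String method, String path) {
        return "rate:api:" + method + ":" + path;
    }
}
```

Keeping the scope in the key means a client limit and an API limit never collide, even when a client ID happens to look like a route.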

Implementation Snapshot

Tech stack

Java 17
Spring Boot 3
Redis for distributed counter management

1️⃣ Configuration

Each microservice adds the library in its pom.xml:

<dependency>
    <groupId>com.example</groupId>
    <artifactId>rate-limiter-library</artifactId>
    <version>1.0.0</version>
</dependency>

And configures limits in application.yml:

rate-limiter:
  redis:
    host: redis-host
    port: 6379
  limit:
    requests-per-second: 100
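
On the Java side, these keys could map onto a small typed settings object. The sketch below uses plain Java 17 records with basic validation; the type names are hypothetical. In a Spring Boot 3 service, a record like this could be annotated with @ConfigurationProperties("rate-limiter") so that the YAML is bound to it at startup, but the annotation is left out here to keep the example free of dependencies.

```java
/** Mirrors the rate-limiter.* keys from application.yml; type names are illustrative. */
record RateLimiterSettings(Redis redis, Limit limit) {

    /** Connection settings for the team's Redis cluster. */
    record Redis(String host, int port) {
        Redis {
            if (host == null || host.isBlank()) {
                throw new IllegalArgumentException("rate-limiter.redis.host is required");
            }
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException("rate-limiter.redis.port is out of range: " + port);
            }
        }
    }

    /** The global limit enforced across all services of a team. */
    record Limit(int requestsPerSecond) {
        Limit {
            if (requestsPerSecond <= 0) {
                throw new IllegalArgumentException("rate-limiter.limit.requests-per-second must be positive");
            }
        }
    }
}
```

Validating in the compact constructors means a typo such as a zero limit fails at startup instead of silently rejecting every request.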

2️⃣ Usage

if (!rateLimiter.isAllowed("serviceA-client")) {
    throw new TooManyRequestsException("Rate limit exceeded");
}
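
The same check can be wrapped once so that callers do not repeat the if statement. This is a minimal sketch: ClientRateLimiter and RateLimitGuard are hypothetical names, and the TooManyRequestsException defined here is a stand-in for the one the library throws.

```java
import java.util.function.Supplier;

/** Assumed shape of the limiter, based on the isAllowed call shown above. */
interface ClientRateLimiter {
    boolean isAllowed(String clientId);
}

/** Stand-in for the library's exception type. */
class TooManyRequestsException extends RuntimeException {
    TooManyRequestsException(String message) {
        super(message);
    }
}

/** Runs an action only when the rate limiter admits the caller. */
final class RateLimitGuard {
    private final ClientRateLimiter rateLimiter;

    RateLimitGuard(ClientRateLimiter rateLimiter) {
        this.rateLimiter = rateLimiter;
    }

    <T> T call(String clientId, Supplier<T> action) {
        if (!rateLimiter.isAllowed(clientId)) {
            // The protected action is never invoked for a rejected caller.
            throw new TooManyRequestsException("Rate limit exceeded");
        }
        return action.get();
    }
}
```

In a Spring Boot service, the exception would typically be translated into an HTTP 429 response, for example by annotating it with @ResponseStatus(HttpStatus.TOO_MANY_REQUESTS).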

3️⃣ Core Logic (Simplified)

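public boolean isAllowed(String clientId) {
    String key = "rate:" + clientId;
    // INCR is atomic in Redis, so concurrent services never lose an update.
    Long count = redisTemplate.opsForValue().increment(key);
    if (count == null) {
        // increment() returns null when used inside a pipeline or transaction.
        throw new IllegalStateException("No counter value returned for " + key);
    }

    if (count == 1) {
        // First request in the window: start the one-second expiry.
        // This is a separate command from INCR, so it is not atomic with it.
        redisTemplate.expire(key, 1, TimeUnit.SECONDS);
    }

    return count <= limitPerSecond;
}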

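Redis executes INCR atomically, so the distributed counter stays correct even when many services increment it at once. In this simplified version, INCR and EXPIRE are still two separate commands: if a service fails between them, the key is left without a TTL, so a production version should run both steps as one unit, for example in a Lua script.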
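
One common way to make the increment and the expiry a single atomic step is a small Lua script, since Redis runs each script without interleaving other commands. The sketch below holds such a script as a Java constant and adds an in-memory model of its behavior so the semantics can be exercised without a Redis server. Both class names are hypothetical, and this is not necessarily how the library described here does it. With Spring Data Redis, a script like this could be run through RedisTemplate.execute with a DefaultRedisScript.

```java
import java.util.HashMap;
import java.util.Map;

/** Lua script: increment the counter and set its TTL only on the first hit. */
final class AtomicWindowScript {
    private AtomicWindowScript() {}

    // KEYS[1] = counter key, ARGV[1] = window length in seconds.
    static final String INCR_WITH_EXPIRE_LUA = """
            local count = redis.call('INCR', KEYS[1])
            if count == 1 then
                redis.call('EXPIRE', KEYS[1], ARGV[1])
            end
            return count
            """;
}

/** In-memory model of the script's semantics, with the clock passed in for testing. */
final class InMemoryWindowCounter {
    private record Entry(long count, long expiresAtMillis) {}

    private final Map<String, Entry> entries = new HashMap<>();

    synchronized long incrementWithExpiry(String key, long ttlMillis, long nowMillis) {
        Entry current = entries.get(key);
        if (current == null || nowMillis >= current.expiresAtMillis()) {
            // Missing or expired key: the next increment starts a new window.
            current = new Entry(0, nowMillis + ttlMillis);
        }
        Entry updated = new Entry(current.count() + 1, current.expiresAtMillis());
        entries.put(key, updated);
        return updated.count();
    }
}
```

Because the TTL is set in the same step as the first increment, a counter can never be left behind without an expiry.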

The Impact

Once packaged and documented, the library was adopted quickly across our microservices.

The results:

Consistent throttling behavior across all services
Zero downtime during configuration changes
Reduced rate-related downstream errors by ~70%

Most importantly, developers no longer had to think about rate limiting — it just worked.

Lessons Learned

1. Make adoption effortless.
Developers love tools that “just work.” Providing a simple interface and YAML-based configuration made integration straightforward and low-effort.

2. Centralization ≠ Single Point of Failure.
True centralization is about shared logic, not shared runtime. The library model balanced both autonomy and consistency.
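
A shared Redis cluster is still a runtime dependency, so it helps to decide up front what happens when it is unreachable. The decorator below is a sketch of one possible policy, failing open, where requests are admitted if the limiter itself errors. The policy and the names Limiter and FailOpenLimiter are assumptions for illustration, not a description of the library's behavior.

```java
/** Assumed shape of the limiter, based on the isAllowed call used in this post. */
interface Limiter {
    boolean isAllowed(String clientId);
}

/** Decorator that admits traffic when the underlying limiter fails. */
final class FailOpenLimiter implements Limiter {
    private final Limiter delegate;

    FailOpenLimiter(Limiter delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean isAllowed(String clientId) {
        try {
            return delegate.isAllowed(clientId);
        } catch (RuntimeException e) {
            // Assumption: availability matters more than strict limits during
            // an outage. Returning false here instead would fail closed.
            return true;
        }
    }
}
```

Failing open keeps services available during a Redis outage at the cost of temporarily unenforced limits; a team protecting a fragile downstream API might prefer to fail closed.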

3. Design for change.
Rate limits evolve as usage grows. Externalizing configuration meant teams could adapt without code changes or redeployments.

4. Reliability matters.
Even small inconsistencies can cascade in distributed systems; focus on atomic operations and predictable behavior.

Final Thoughts

Building this rate limiter taught me that good architecture is as much about empathy as it is about engineering.
The goal wasn’t just to stop traffic — it was to help dozens of teams build faster, safer systems without worrying about the plumbing.

Rate limiting is more than an implementation detail.
It’s a quiet, invisible layer of reliability — one that keeps the whole system graceful under pressure.
