Forem: Donia Shaban

Rate Limiting in ASP.NET Core

Donia Shaban — Sat, 23 May 2026 05:59:50 +0000

What is Rate Limiting?

Rate limiting controls how many requests a client can make to your API within a specific time window. ASP.NET Core 7+ ships with built-in rate limiting middleware, so you don't need any third-party packages. It protects your API from abuse (DoS attacks), ensures fair usage among clients, controls infrastructure costs, and keeps the service stable under load.

Setup

using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

These two namespaces are essential. System.Threading.RateLimiting contains the core algorithms and options classes. Microsoft.AspNetCore.RateLimiting contains the middleware and the [EnableRateLimiting] / [DisableRateLimiting] attributes.

Registering the Middleware

builder.Services.AddRateLimiter(options => { ... });

This registers the rate limiter service with the DI container. Inside the lambda you define one or more policies — each policy is a named rule that you can apply to endpoints later.

app.UseRateLimiter();

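This plugs the middleware into the pipeline. Place it before app.MapControllers() so requests are checked before they reach your controllers. If you call app.UseRouting() explicitly, UseRateLimiter() must come after it, because endpoint-specific policies depend on routing having already selected the endpoint.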
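Putting the two calls together, a minimal Program.cs might look like the sketch below. The policy name "DefaultPolicy" and its numbers are borrowed from the next section; the explicit UseRouting() call is optional in the minimal hosting model and appears here only to make the ordering visible.

```csharp
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();

// Register the rate limiter and define named policies inside the lambda.
builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("DefaultPolicy", limiterOptions =>
    {
        limiterOptions.Window = TimeSpan.FromMinutes(1);
        limiterOptions.PermitLimit = 100;
    });
});

var app = builder.Build();

app.UseRouting();      // routing selects the endpoint first
app.UseRateLimiter();  // then the limiter can look up that endpoint's policy
app.MapControllers();

app.Run();
```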

Policy 1 — Fixed Window Limiter (`"DefaultPolicy"`)

options.AddFixedWindowLimiter("DefaultPolicy", limiterOptions =>
{
    limiterOptions.Window = TimeSpan.FromMinutes(1);
    limiterOptions.PermitLimit = 100;
    limiterOptions.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    limiterOptions.QueueLimit = 10;
});

The idea: Time is divided into fixed, non-overlapping windows (e.g. 0:00–1:00, 1:00–2:00, ...). Each window gets a fresh counter. The moment the window ends, the counter resets completely regardless of when requests arrived inside it.

Line by line:

Window = TimeSpan.FromMinutes(1): each time window lasts 1 minute. When the minute ends, the counter resets to zero.
PermitLimit = 100: at most 100 requests are admitted within each 1-minute window. Request #101 is not served right away: it is queued if the queue has room (see QueueLimit) and rejected otherwise. Note that the default rejection status code is 503 Service Unavailable; set options.RejectionStatusCode = StatusCodes.Status429TooManyRequests if you want HTTP 429 Too Many Requests.
QueueProcessingOrder = QueueProcessingOrder.OldestFirst: when the limit is hit, excess requests can be queued. This says: serve the oldest waiting request first (FIFO). The alternative is NewestFirst (LIFO).
QueueLimit = 10: at most 10 requests can wait in the queue for the next window. If the queue is also full, the request is rejected immediately without waiting.
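The interplay of permit limit and queue can be observed without a web server by using the limiter class from System.Threading.RateLimiting directly. This is a small sketch with deliberately tiny numbers (1 permit, 1 queue slot), not the configuration from the article; it assumes a project that references the System.Threading.RateLimiting package (ASP.NET Core projects already do) and has implicit usings enabled.

```csharp
using System.Threading.RateLimiting;

// Tiny numbers so the behaviour is easy to see: 1 permit per window, 1 queue slot.
using var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    Window = TimeSpan.FromHours(1),   // long window so nothing replenishes during the demo
    PermitLimit = 1,
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
    QueueLimit = 1
});

// Request 1: a permit is available, so it is admitted immediately.
using RateLimitLease first = limiter.AttemptAcquire();

// Request 2: no permit left, but the queue has room, so it waits for the next window.
ValueTask<RateLimitLease> second = limiter.AcquireAsync();

// Request 3: no permit and the queue is full, so it is rejected without waiting.
using RateLimitLease third = await limiter.AcquireAsync();

Console.WriteLine($"first admitted: {first.IsAcquired}");
Console.WriteLine($"second still waiting: {!second.IsCompleted}");
Console.WriteLine($"third admitted: {third.IsAcquired}");
```

In the middleware the same thing happens per request: a queued request is held until a permit frees up, and only a request that finds the queue full is rejected.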

The problem with Fixed Window: If 100 requests arrive in the last 5 seconds of minute 1, and another 100 arrive in the first 5 seconds of minute 2, you get 200 requests in a 10-second span — a burst — because both windows reset independently. This is why Sliding Window exists.
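The boundary burst is easy to reproduce with a toy counter. The sketch below is a simplified model written for this illustration (it is not the library's implementation): time is passed in as whole seconds, and the names TryAcquire, used, and currentWindow are made up for the example.

```csharp
// Toy model of a fixed-window counter (not the library implementation).
const int permitLimit = 100;
const int windowSeconds = 60;

long currentWindow = -1;   // index of the window the counter belongs to
int used = 0;              // requests admitted in the current window

bool TryAcquire(int second)
{
    long window = second / windowSeconds;
    if (window != currentWindow)
    {
        // A new window started: the counter resets completely.
        currentWindow = window;
        used = 0;
    }
    if (used >= permitLimit) return false;
    used++;
    return true;
}

int admitted = 0;

// 20 requests per second during seconds 55..59 (end of minute 1),
// then 20 per second during seconds 60..64 (start of minute 2).
for (int second = 55; second <= 64; second++)
{
    for (int i = 0; i < 20; i++)
    {
        if (TryAcquire(second)) admitted++;
    }
}

Console.WriteLine($"Admitted in a 10-second span: {admitted}");   // prints 200
```

Neither window ever exceeds its limit of 100, yet the service still absorbs 200 requests in ten seconds.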

Policy 2 — Sliding Window Limiter (`"SlidingWindow"`)

options.AddSlidingWindowLimiter("SlidingWindow", limiterOptions =>
{
    limiterOptions.Window = TimeSpan.FromMinutes(1);
    limiterOptions.PermitLimit = 100;
    limiterOptions.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    limiterOptions.QueueLimit = 10;
    limiterOptions.SegmentsPerWindow = 6;
    limiterOptions.AutoReplenishment = true;
});

The idea: Instead of one big window, the window is split into smaller segments. As each segment expires, the requests that were counted in that segment become available again. The window "slides" forward segment by segment, so limits are enforced more smoothly and bursts at window boundaries are prevented.

Line by line:

Window = TimeSpan.FromMinutes(1) — the total window is still 1 minute.
PermitLimit = 100 — still 100 requests per window overall.
SegmentsPerWindow = 6 — the 1-minute window is split into 6 segments of 10 seconds each. Every 10 seconds, the oldest segment "falls off" and its request count is freed up.
AutoReplenishment = true — the replenishment (freeing up expired segments) happens automatically in the background. If you set this to false, you'd have to call TryReplenish() manually, which is rare.
QueueProcessingOrder and QueueLimit — same meaning as Fixed Window above.

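Concrete example: Suppose 60 requests arrive during seconds 0 to 10 (segment 1) and 40 more during seconds 10 to 20 (segment 2). The limit of 100 is now used up. Nothing is freed at second 20, 30, 40, or 50, because the segments that fall off at those steps are empty. At second 60, segment 1 leaves the window and its 60 permits become available again; at second 70, segment 2 returns the remaining 40. Permits come back segment by segment, matching when they were used, rather than all at once at a fixed boundary. Note that capacity is counted per segment, not split evenly as 100/6: if all 100 requests had arrived at second 0, all 100 permits would return together at second 60.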
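The segment bookkeeping can be modelled in a few lines. The following is a simplified sketch of how a segmented sliding window can track permits (an assumption-level model for illustration, not the library source); Advance() stands in for the timer tick that fires once per segment, every 10 seconds in this configuration.

```csharp
// Toy model of sliding-window bookkeeping (a simplification, not the library code).
const int permitLimit = 100;
const int segmentsPerWindow = 6;

int[] usedPerSegment = new int[segmentsPerWindow]; // permits taken in each segment
int currentSegment = 0;
int available = permitLimit;

bool TryAcquire()
{
    if (available == 0) return false;
    available--;
    usedPerSegment[currentSegment]++;
    return true;
}

// Called once per segment (every 10 seconds for a 1-minute window with 6 segments).
void Advance()
{
    currentSegment = (currentSegment + 1) % segmentsPerWindow;
    // The slot being reused held the segment that just left the window:
    // hand its permits back and start counting afresh.
    available += usedPerSegment[currentSegment];
    usedPerSegment[currentSegment] = 0;
}

var availableAt = new Dictionary<int, int>();

for (int i = 0; i < 60; i++) TryAcquire();   // seconds 0-10: 60 requests
Advance();                                    // second 10
for (int i = 0; i < 40; i++) TryAcquire();   // seconds 10-20: 40 requests
bool extraAdmitted = TryAcquire();            // request #101 finds no permit

for (int second = 20; second <= 70; second += 10)
{
    Advance();
    availableAt[second] = available;
    Console.WriteLine($"t={second}s available={available}");
}
```

In this model the available count stays at 0 through second 50, rises to 60 at second 60, and reaches 100 again at second 70.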

Policy 3 — Concurrency Limiter (`"Concurrency"`)

options.AddConcurrencyLimiter("Concurrency", limiterOptions =>
{
    limiterOptions.PermitLimit = 50;
    limiterOptions.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    limiterOptions.QueueLimit = 100;
});

The idea: This doesn't care about time windows at all. It limits how many requests are being processed simultaneously at any given moment. Think of it like a semaphore — a permit is acquired when a request enters and released when it finishes.

Line by line:

PermitLimit = 50 — at most 50 requests can be actively running at the same time. If a 51st request comes in while all 50 slots are busy, it either queues or gets rejected.
QueueLimit = 100 — up to 100 requests can wait in line for a slot to free up.
QueueProcessingOrder = QueueProcessingOrder.OldestFirst — the oldest queued request gets the next freed slot.

When to use it: Ideal for protecting expensive operations like DB-heavy endpoints or file processing, where you care about CPU/connection pool exhaustion rather than request rate over time.
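The acquire-and-release behaviour can be seen by driving ConcurrencyLimiter by hand. This sketch uses a limit of 2 and no queue instead of the article's 50/100 so that the effect shows up in a few lines; disposing a lease plays the role of a request finishing.

```csharp
using System.Threading.RateLimiting;

using var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 2,   // two requests may run at once
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
    QueueLimit = 0     // no queue: a third concurrent request is rejected immediately
});

RateLimitLease a = limiter.AttemptAcquire();   // slot 1 taken
RateLimitLease b = limiter.AttemptAcquire();   // slot 2 taken
RateLimitLease c = limiter.AttemptAcquire();   // both slots busy

bool thirdAdmittedWhileBusy = c.IsAcquired;

a.Dispose();                                   // request "a" finished: its permit is released

RateLimitLease d = limiter.AttemptAcquire();   // a slot is free again
bool admittedAfterRelease = d.IsAcquired;

Console.WriteLine($"third while busy: {thirdAdmittedWhileBusy}, after release: {admittedAfterRelease}");

b.Dispose();
c.Dispose();
d.Dispose();
```

No timer is involved anywhere: capacity comes back only when running work completes, which is why this limiter suits slow, resource-heavy endpoints.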

Policy 4 — Per-User Policy (`"ApiUserPolicy"`)

options.AddPolicy("ApiUserPolicy", httpContext =>
    RateLimitPartition.GetFixedWindowLimiter(
        partitionKey: httpContext.User.Identity?.Name ?? "anonymous",
        factory: _ => new FixedWindowRateLimiterOptions
        {
            Window = TimeSpan.FromMinutes(1),
            PermitLimit = 1000,
            AutoReplenishment = true
        }));

The idea: This is a partitioned policy — meaning each user gets their own independent rate limit counter. User A's requests don't affect User B's quota. This is how real-world APIs (like GitHub's API) work: each authenticated user has their own limit.

Line by line:

RateLimitPartition.GetFixedWindowLimiter(...): creates a Fixed Window limiter, but partitioned per key rather than global.
partitionKey: httpContext.User.Identity?.Name ?? "anonymous": the partition key is the authenticated username. All unauthenticated requests fall back to the single key "anonymous", so they share one counter between them, which gives anonymous callers an incentive to authenticate. For the username to be available, app.UseAuthentication() must run before app.UseRateLimiter().
PermitLimit = 1000: each partition gets 1000 requests/minute. For an authenticated user that is a generous individual limit, suitable for API consumers with tokens; for the shared "anonymous" partition the same 1000 is divided among all unauthenticated callers.
AutoReplenishment = true: the window resets automatically without manual intervention.
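Partitioning can be tried outside the middleware with PartitionedRateLimiter.Create. In this sketch the "resource" is just a user name string (in the middleware it is the HttpContext), and the limit is lowered to 2 so the separate counters are visible; the names alice and bob are placeholders.

```csharp
using System.Threading.RateLimiting;

// The "resource" here is just a user name; in the middleware it is the HttpContext.
using PartitionedRateLimiter<string> limiter = PartitionedRateLimiter.Create<string, string>(
    userName => RateLimitPartition.GetFixedWindowLimiter(
        partitionKey: userName,
        factory: _ => new FixedWindowRateLimiterOptions
        {
            Window = TimeSpan.FromMinutes(1),
            PermitLimit = 2,          // tiny limit so the effect is visible
            AutoReplenishment = true
        }));

bool alice1 = limiter.AttemptAcquire("alice").IsAcquired;
bool alice2 = limiter.AttemptAcquire("alice").IsAcquired;
bool alice3 = limiter.AttemptAcquire("alice").IsAcquired; // alice is over her limit
bool bob1   = limiter.AttemptAcquire("bob").IsAcquired;   // bob has his own counter

Console.WriteLine($"alice: {alice1}, {alice2}, {alice3}; bob: {bob1}");
```

Alice's third request is refused while Bob's first still succeeds, because each key gets its own limiter instance created by the factory.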

Policy 5 — Per-IP Policy (`"IpPolicy"`)

options.AddPolicy("IpPolicy", httpContext =>
    RateLimitPartition.GetSlidingWindowLimiter(
        partitionKey: httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown",
        factory: _ => new SlidingWindowRateLimiterOptions
        {
            Window = TimeSpan.FromMinutes(1),
            PermitLimit = 100,
            SegmentsPerWindow = 6,
            AutoReplenishment = true
        }));

The idea: Same partitioned concept as above, but the partition key is the client's IP address instead of the username. This is useful for public endpoints where users aren't authenticated — you limit by IP to prevent one machine from hammering your API.

Line by line:

partitionKey: httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown": extracts the caller's IP address as a string (e.g. "192.168.1.5"). If the IP can't be determined, it falls back to "unknown". Behind a reverse proxy or load balancer, RemoteIpAddress is the proxy's address unless forwarded headers are processed, so every client would land in the same partition.
Uses a Sliding Window internally (same as Policy 2), which gives smoother enforcement and avoids the burst problem at window boundaries.
PermitLimit = 100 with SegmentsPerWindow = 6: each IP gets 100 requests/minute, tracked in 10-second segments.
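If the app sits behind a reverse proxy, one way to make RemoteIpAddress reflect the real client is the Forwarded Headers middleware. The fragment below is a sketch that assumes a single trusted proxy; the address 10.0.0.1 is a placeholder, and builder and app are the usual Program.cs variables.

```csharp
using System.Net;
using Microsoft.AspNetCore.HttpOverrides;

builder.Services.Configure<ForwardedHeadersOptions>(options =>
{
    // Read the client address from the X-Forwarded-For header.
    options.ForwardedHeaders = ForwardedHeaders.XForwardedFor;

    // Assumption: replace with the address of your own proxy; 10.0.0.1 is a placeholder.
    options.KnownProxies.Add(IPAddress.Parse("10.0.0.1"));
});

// ... after var app = builder.Build():
app.UseForwardedHeaders();   // run first so RemoteIpAddress is corrected
app.UseRateLimiter();        // the "IpPolicy" partition key now sees the client IP
```

Only trust forwarded headers from proxies you control; otherwise a client could spoof the header and escape its own partition.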

Applying Policies to Endpoints

There are two ways:

Via attribute on a Controller action:

[HttpGet]
[EnableRateLimiting(policyName: "DefaultPolicy")]
public async Task<IActionResult> Get(...) { ... }

The [EnableRateLimiting] attribute on the Get action tells the middleware: apply "DefaultPolicy" to this specific endpoint. Other actions in the same controller (like GetById, Post, Put, Delete) have no attribute, so they are not rate-limited by default.

Via Minimal API:

app.MapGet("/api/products-mn", async (...) =>
{
    ...
}).RequireRateLimiting("DefaultPolicy");

For Minimal APIs, you chain .RequireRateLimiting("policyName") on the endpoint definition. The second Minimal API endpoint (/api/products-mn/{productId:int}) has no .RequireRateLimiting() call, so it's unrestricted.
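The attribute can also be placed on the controller class, with individual actions overriding or opting out. The controller below is a hypothetical sketch (its action bodies are stubs) that reuses the policy names defined earlier and the [DisableRateLimiting] attribute mentioned in the Setup section.

```csharp
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.RateLimiting;

[ApiController]
[Route("api/[controller]")]
[EnableRateLimiting("DefaultPolicy")]          // applies to every action in this controller
public class ProductsController : ControllerBase
{
    [HttpGet]
    public IActionResult Get() => Ok();        // limited by "DefaultPolicy"

    [HttpGet("{id:int}")]
    [EnableRateLimiting("SlidingWindow")]      // the action-level policy takes precedence
    public IActionResult GetById(int id) => Ok(id);

    [HttpGet("health")]
    [DisableRateLimiting]                      // opts this action out entirely
    public IActionResult Health() => Ok();
}
```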

What Happens When the Limit is Exceeded?

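When a request is rejected (limit hit + queue full), the middleware responds with the status code configured in options.RejectionStatusCode. The default is 503 Service Unavailable, so to return HTTP 429 Too Many Requests you either set RejectionStatusCode or set the status code yourself in options.OnRejected, which also lets you customize the rejection response globally: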

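options.OnRejected = async (context, cancellationToken) =>
{
    // Override the default 503 with 429 Too Many Requests.
    context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;
    await context.HttpContext.Response.WriteAsync(
        "Too many requests. Please slow down.", cancellationToken);
};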
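A slightly fuller rejection handler can tell the client when to retry. The sketch below sets the status code through RejectionStatusCode and adds a Retry-After header when the limiter supplies that metadata (not every limiter does); it is a Program.cs fragment in which builder is the usual WebApplicationBuilder.

```csharp
using System.Globalization;
using System.Threading.RateLimiting;

builder.Services.AddRateLimiter(options =>
{
    // Without this line, rejected requests receive 503 Service Unavailable.
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.OnRejected = async (context, cancellationToken) =>
    {
        // Some limiters report how long the client should wait; not all of them do.
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString(NumberFormatInfo.InvariantInfo);
        }

        await context.HttpContext.Response.WriteAsync(
            "Too many requests. Please slow down.", cancellationToken);
    };
});
```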

Summary of All Policies

| Policy | Algorithm | Limit | Partition By |
| --- | --- | --- | --- |
| DefaultPolicy | Fixed Window | 100 req/min | Global |
| SlidingWindow | Sliding Window | 100 req/min | Global |
| Concurrency | Concurrency | 50 simultaneous | Global |
| ApiUserPolicy | Fixed Window | 1000 req/min | Per username |
| IpPolicy | Sliding Window | 100 req/min | Per IP address |

Caching in ASP.NET Core

Donia Shaban — Wed, 20 May 2026 03:35:57 +0000

Caching in ASP.NET Core — A Complete Guide

Caching is often the single most impactful performance optimization you can apply to a web app. The idea is simple: instead of recomputing or re-fetching the same data on every request, you store it somewhere fast and serve it from there. ASP.NET Core gives you three families of caching mechanisms (in-memory, distributed, and HTTP-level output/response caching), each operating at a different layer of your app. Let's break each one down.

Why do we actually use caching?

Commonly cited industry figures put the cost of a 100 ms delay at around 7% fewer conversions: if your checkout page takes even one tenth of a second longer, you statistically lose customers. Large retailers such as Amazon reported effects of this kind years ago. Another widely quoted figure is that a 3-second load time loses about 40% of users, who leave before the page finishes loading.

The reason caching is the go-to fix is the bottleneck breakdown the PDF shows: 60% of slowness comes from the database, 25% from slow API calls, and only 15% from memory/other issues. Caching directly attacks the biggest problem — the DB. Instead of hitting SQL Server 1000 times for the same product data, you hit it once and serve the cached result 999 times. That's where the "80–90% DB load reduction" figure comes from.

The Netflix example from the same source (a 70% startup time reduction) illustrates the approach: user profiles, recommendation lists, and content metadata are cached aggressively, so that when you open the app almost nothing needs a live DB query.

Where do the actual problems come from?

Database (60% of problems) — This is the classic N+1 query problem, missing indexes, fetching entire tables when you need 3 rows, and hitting the DB on every single request for data that barely changes (like a list of product categories). Caching fixes this directly.

APIs (25% of problems) — Calling external services (payment gateways, weather APIs, third-party data) on every request. If you call an exchange rate API on every page load, you're adding 200–500ms of network latency every time. Cache that response for 5 minutes and the latency disappears.

Memory (15% of problems) — This is actually caused by bad caching, not a lack of it. When you cache without expiration policies or cache huge objects carelessly, you put pressure on the GC (Garbage Collector). The server starts spending CPU time collecting memory instead of serving requests. This is why the "set expiration policies" rule matters so much — cache is not free RAM, it's borrowed RAM.

The performance metrics from the PDF are also worth internalizing for your own apps: target average response time under 200ms, keep CPU below 70%, memory below 80%, and HTTP 5xx errors below 0.1%. Those are the numbers you'd put on a production monitoring dashboard.

1. In-Memory Cache (`IMemoryCache`)

What it is: Data is stored directly in the server process's RAM. It's the fastest cache available — a dictionary lookup with no network round-trip.

Where it lives: Inside your application process. If you restart the server, the cache is gone. If you have multiple servers behind a load balancer, each server has its own isolated cache.

Best for: Single-server apps, frequently read but rarely changed data (e.g., lookup tables, config values, user roles).

How to use it:

// Program.cs
builder.Services.AddMemoryCache();

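// In your service (needs: using Microsoft.Extensions.Caching.Memory;)
public class ProductService(IMemoryCache cache, AppDbContext db)
{
    public async Task<Product?> GetProductAsync(int id)
    {
        var key = $"product:{id}";

        // Cache hit: return without touching the database.
        if (cache.TryGetValue(key, out Product? cached))
            return cached;

        var product = await db.Products.FindAsync(id);

        // Do not cache misses; otherwise a null would be served for up to 10 minutes.
        if (product is null)
            return null;

        var options = new MemoryCacheEntryOptions()
            .SetAbsoluteExpiration(TimeSpan.FromMinutes(10)) // hard upper bound
            .SetSlidingExpiration(TimeSpan.FromMinutes(2));  // evicted earlier if idle

        cache.Set(key, product, options);
        return product;
    }
}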

Key concepts to know:

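Absolute expiration: the item is always removed after X time, no matter how much it's accessed.
Sliding expiration: the timer resets every time the item is accessed; it's evicted only if nobody touches it for X time.
Eviction policies: the memory cache does not shrink on its own just because RAM gets tight. If you set MemoryCacheOptions.SizeLimit (and a Size on each entry), the cache compacts when the limit is exceeded, removing expired entries first, then entries by CacheItemPriority (lowest first), then the least recently used. Use CacheItemPriority.High or NeverRemove to protect critical entries.
Cache Stampede: if 100 requests arrive simultaneously on a cache miss, they all hit the DB at once. GetOrCreateAsync is convenient but does not guarantee that the factory runs only once under concurrency, so guard the miss path with a lock such as SemaphoreSlim.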
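A per-key SemaphoreSlim with a second cache check inside the lock is one common way to handle the stampede case. The sketch below is a stand-alone console-style illustration: LoadFromDatabaseAsync is a made-up stand-in for a slow query, and the code assumes the Microsoft.Extensions.Caching.Memory package and implicit usings.

```csharp
using System.Collections.Concurrent;
using Microsoft.Extensions.Caching.Memory;

var cache = new MemoryCache(new MemoryCacheOptions());
var locks = new ConcurrentDictionary<string, SemaphoreSlim>();
int loads = 0;   // counts how often the "database" is hit

// Stand-in for a slow database query (hypothetical).
async Task<string> LoadFromDatabaseAsync(int id)
{
    Interlocked.Increment(ref loads);
    await Task.Delay(50);
    return $"product-{id}";
}

async Task<string> GetProductAsync(int id)
{
    var key = $"product:{id}";
    if (cache.TryGetValue(key, out string? cached) && cached is not null)
        return cached;

    // One lock per key, so different products do not block each other.
    var gate = locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
    await gate.WaitAsync();
    try
    {
        // Re-check: another caller may have filled the cache while we waited.
        if (cache.TryGetValue(key, out cached) && cached is not null)
            return cached;

        var value = await LoadFromDatabaseAsync(id);
        cache.Set(key, value, TimeSpan.FromMinutes(10));
        return value;
    }
    finally
    {
        gate.Release();
    }
}

// 100 simultaneous requests for the same product on a cold cache.
string[] results = await Task.WhenAll(
    Enumerable.Range(0, 100).Select(_ => GetProductAsync(42)));

Console.WriteLine($"requests: {results.Length}, database hits: {loads}");
```

Only the first caller runs the query; the other 99 wait on the gate and then read the cached value. The lock dictionary in this sketch grows with the number of distinct keys, so a real service would want to bound or clean it up.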

2. Distributed Cache (`IDistributedCache`)

What it is: Data is stored in an external shared store (Redis being the most common choice) that all your servers can reach. When you scale out to 3 servers, they all read from and write to the same cache.

Where it lives: Outside your app process, usually Redis or SQL Server.

Best for: Multi-server deployments, session data, anything that must be consistent across servers, large-scale apps.

How to use it (with Redis):

// Program.cs
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = "localhost:6379";
    options.InstanceName = "MyApp:";
});

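// In your service (needs: using System.Text.Json; using Microsoft.Extensions.Caching.Distributed;)
public class ProductService(IDistributedCache cache, AppDbContext db)
{
    public async Task<Product?> GetProductAsync(int id)
    {
        var key = $"product:{id}";

        // Cache hit: the value comes back as a JSON string and must be deserialized.
        var cached = await cache.GetStringAsync(key);
        if (cached is not null)
            return JsonSerializer.Deserialize<Product>(cached);

        var product = await db.Products.FindAsync(id);

        // Do not cache misses; otherwise the string "null" would be stored for 10 minutes.
        if (product is null)
            return null;

        var options = new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
        };

        await cache.SetStringAsync(key, JsonSerializer.Serialize(product), options);
        return product;
    }
}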

Notice that unlike IMemoryCache, the distributed cache stores bytes/strings, so you serialize/deserialize yourself. This is the price of network storage — but the payoff is consistency across all your servers.
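Since every call site repeats the same serialize/deserialize steps, they are often wrapped in a small read-through helper. The sketch below is one possible shape: GetOrSetAsync is a hypothetical helper (not part of the framework), and MemoryDistributedCache stands in for Redis so the example runs without a server.

```csharp
using System.Text.Json;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;

// In-process stand-in for Redis so the sketch runs without a server.
IDistributedCache cache = new MemoryDistributedCache(
    Options.Create(new MemoryDistributedCacheOptions()));

int loads = 0;   // counts how often the underlying source is queried

// Hypothetical helper: read-through cache with JSON serialization.
async Task<T?> GetOrSetAsync<T>(string key, Func<Task<T?>> load, TimeSpan ttl) where T : class
{
    var json = await cache.GetStringAsync(key);
    if (json is not null)
        return JsonSerializer.Deserialize<T>(json);

    var value = await load();
    if (value is null)
        return null;              // do not cache misses

    await cache.SetStringAsync(key, JsonSerializer.Serialize(value),
        new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = ttl });
    return value;
}

// Stand-in for a database query returning product categories (made-up data).
Task<string[]?> LoadCategories()
{
    loads++;
    return Task.FromResult<string[]?>(new[] { "Books", "Games" });
}

var first = await GetOrSetAsync<string[]>("categories", LoadCategories, TimeSpan.FromMinutes(10));
var second = await GetOrSetAsync<string[]>("categories", LoadCategories, TimeSpan.FromMinutes(10));

Console.WriteLine($"{string.Join(", ", second ?? [])} (source queried {loads} time(s))");
```

The second call is answered from the cache, so the loader runs only once. Swapping the stand-in for the Redis registration shown above requires no change to the helper, because both implement IDistributedCache.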

3. Output Caching & Response Caching

These two are often confused. They both cache the HTTP response, but they work at different layers.

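Response Caching: based on HTTP cache headers (Cache-Control, Expires) that tell the client or proxy to cache the response. The [ResponseCache] attribute on its own only sets those headers, and the server stores nothing. If you also add the Response Caching middleware, the server keeps a copy too, but it follows HTTP caching rules, so a client can bypass it with a request header such as Cache-Control: no-cache.

Output Caching (ASP.NET Core 7+): the server caches the full HTTP response and serves it directly on repeat requests, before your controller action even executes. No business logic, no DB query. Unlike response caching, the server decides what is cached, regardless of request headers.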

Output Caching setup:

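// Program.cs
builder.Services.AddOutputCache(options =>
{
    // Base policy: applied to every endpoint in the app.
    options.AddBasePolicy(b => b.Cache());

    // Named policy: 5-minute lifetime, tagged so it can be evicted as a group.
    options.AddPolicy("Products", b => b.Cache()
        .Expire(TimeSpan.FromMinutes(5))
        .Tag("products-tag"));
});

// Place after UseRouting (and after UseCors, if you use it).
app.UseOutputCache();

// In your controller
[HttpGet("products")]
[OutputCache(PolicyName = "Products")]
public async Task<IActionResult> GetProducts() { ... }

// Invalidation: purge a specific tag (_cache is an injected IOutputCacheStore)
await _cache.EvictByTagAsync("products-tag", token);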
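Tag eviction is typically triggered from a write path. As a sketch, a hypothetical Minimal API endpoint (the route and the omitted save logic are placeholders) could receive the IOutputCacheStore through parameter injection and evict the tag after a change:

```csharp
using Microsoft.AspNetCore.OutputCaching;

// Hypothetical write endpoint: after changing products, evict every cached
// response tagged "products-tag" so the next GET is rebuilt from fresh data.
app.MapPost("/products", async (IOutputCacheStore store, CancellationToken token) =>
{
    // ... save the new product here ...
    await store.EvictByTagAsync("products-tag", token);
    return Results.NoContent();
});
```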

Response Caching setup:

// Program.cs
builder.Services.AddResponseCaching();
app.UseResponseCaching();

// In your controller
[HttpGet("products")]
[ResponseCache(Duration = 300, VaryByQueryKeys = new[] { "category" })]
public async Task<IActionResult> GetProducts(string category) { ... }

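The ResponseCache attribute tells the framework to emit a Cache-Control: public, max-age=300 header, which browsers and CDNs honor. Because UseResponseCaching() is also registered here, the middleware additionally keeps a server-side copy and serves it according to the same HTTP caching rules. VaryByQueryKeys only works when that middleware is enabled.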

Quick Comparison

| Feature | In-Memory | Distributed | Output Cache | Response Cache |
| --- | --- | --- | --- | --- |
| Where stored | Server RAM | Redis / SQL Server | Server (in memory by default) | Client / proxy (plus server memory if the middleware is enabled) |
| Survives restart? | No | Yes | No (unless backed by Redis) | Yes (in browser) |
| Multi-server safe? | No | Yes | Only with a shared store such as Redis | N/A |
| Caches | Any object | Bytes / strings | Full HTTP response | Full HTTP response |
| Granularity | Fine (per key) | Fine (per key) | Per endpoint/route | Per URL |
| Best for | Fast lookups, small data | Sessions, scaled apps | Read-heavy API endpoints | Public, static-ish content |
