<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ilya Masliev</title>
    <description>The latest articles on Forem by Ilya Masliev (@ilya_masliev).</description>
    <link>https://forem.com/ilya_masliev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3458741%2Ffd5993f8-adf5-4e74-889e-4c9252f5447d.jpeg</url>
      <title>Forem: Ilya Masliev</title>
      <link>https://forem.com/ilya_masliev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ilya_masliev"/>
    <language>en</language>
    <item>
      <title>Building a Resilience Engine in Python: Internals of LimitPal (Part 2)</title>
      <dc:creator>Ilya Masliev</dc:creator>
      <pubDate>Fri, 06 Feb 2026 11:26:06 +0000</pubDate>
      <link>https://forem.com/ilya_masliev/building-a-resilience-engine-in-python-internals-of-limitpal-part-2-hm1</link>
      <guid>https://forem.com/ilya_masliev/building-a-resilience-engine-in-python-internals-of-limitpal-part-2-hm1</guid>
      <description>&lt;p&gt;&lt;em&gt;How the executor pipeline, clock abstraction, and circuit breaker architecture actually work.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you haven’t read Part 1, the short version:&lt;/p&gt;

&lt;p&gt;Resilience shouldn’t be a pile of decorators.&lt;br&gt;
It should be a system.&lt;/p&gt;

&lt;p&gt;Part 1 explained the motivation.&lt;/p&gt;

&lt;p&gt;This post is about &lt;strong&gt;how the system is built&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  The core design constraint
&lt;/h2&gt;

&lt;p&gt;I started with one rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Every resilience feature must compose cleanly with others.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most libraries solve &lt;em&gt;a single concern&lt;/em&gt; well.&lt;/p&gt;

&lt;p&gt;But composition is where systems break.&lt;/p&gt;

&lt;p&gt;Retry + rate limiting + circuit breaker is not additive.&lt;br&gt;
It’s architectural.&lt;/p&gt;

&lt;p&gt;So LimitPal is built around one idea:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;A single execution pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Everything plugs into it.&lt;/p&gt;


&lt;h2&gt;
  
  
  The executor pipeline
&lt;/h2&gt;

&lt;p&gt;Every call flows through the same stages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Circuit breaker → Rate limiter → Retry loop → Result recording
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ordering is deliberate, not arbitrary.&lt;/p&gt;
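&lt;p&gt;In sketch form (illustrative only; names like &lt;code&gt;allows()&lt;/code&gt; and &lt;code&gt;try_acquire()&lt;/code&gt; are assumptions, not LimitPal’s actual API), the pipeline composes roughly like this:&lt;/p&gt;

```python
class PipelineRejected(Exception):
    """Raised when the breaker or limiter rejects the call up front."""
    pass

def run_pipeline(breaker, limiter, max_attempts, fn):
    # Stage 1: circuit breaker first -- fail fast, consume nothing.
    if not breaker.allows():
        raise PipelineRejected("circuit open")
    last_err = None
    for _attempt in range(max_attempts):
        # Stage 2: rate limiter -- every attempt, retries included, pays.
        if not limiter.try_acquire():
            raise PipelineRejected("rate limited")
        try:
            result = fn()              # Stage 3: the actual call
            breaker.record_success()   # Stage 4: close the feedback loop
            return result
        except Exception as err:
            breaker.record_failure()
            last_err = err
    raise last_err
```

&lt;p&gt;The point is structural: there is exactly one place where the breaker, the limiter, and the retry loop meet.&lt;/p&gt;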

&lt;h3&gt;
  
  
  Step 1: Circuit breaker first
&lt;/h3&gt;

&lt;p&gt;Fail fast.&lt;/p&gt;

&lt;p&gt;If the upstream service is already down,&lt;br&gt;
don’t waste tokens,&lt;br&gt;
don’t trigger retries,&lt;br&gt;
don’t create load.&lt;/p&gt;

&lt;p&gt;This protects &lt;em&gt;your own&lt;/em&gt; system.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2: Rate limiter
&lt;/h3&gt;

&lt;p&gt;Only after we know execution is allowed&lt;br&gt;
do we consume capacity.&lt;/p&gt;

&lt;p&gt;This ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;breaker failures don’t eat quota&lt;/li&gt;
&lt;li&gt;retries still respect rate limits&lt;/li&gt;
&lt;li&gt;burst behavior stays predictable&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 3: Retry loop
&lt;/h3&gt;

&lt;p&gt;Retry lives &lt;strong&gt;inside&lt;/strong&gt; the limiter window.&lt;/p&gt;

&lt;p&gt;Not outside.&lt;/p&gt;

&lt;p&gt;This is important.&lt;/p&gt;

&lt;p&gt;If retry lived outside,&lt;br&gt;
one logical call could consume infinite capacity.&lt;/p&gt;

&lt;p&gt;Inside the window:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A call is a budgeted operation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That constraint keeps systems stable under stress.&lt;/p&gt;
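&lt;p&gt;A toy demonstration of the budget (hypothetical names; not LimitPal’s classes): with a fixed token budget, a failing call stops retrying once the budget is spent, no matter how many attempts it was allowed.&lt;/p&gt;

```python
class CountingLimiter:
    """Toy limiter with a fixed token budget (illustrative only)."""
    def __init__(self, tokens: int):
        self.tokens = tokens

    def try_acquire(self) -> bool:
        if self.tokens == 0:
            return False
        self.tokens -= 1
        return True

def call_with_budget(limiter, max_attempts: int, fn):
    """Retry inside the limiter window: every attempt must acquire a
    token first, so one logical call can never exceed its budget."""
    for _ in range(max_attempts):
        if not limiter.try_acquire():
            return None  # budget exhausted: stop, don't retry forever
        try:
            return fn()
        except Exception:
            continue
    return None
```

&lt;p&gt;Five attempts allowed, two tokens available: the function runs exactly twice.&lt;/p&gt;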
&lt;h3&gt;
  
  
  Step 4: Result recording
&lt;/h3&gt;

&lt;p&gt;Success/failure feedback feeds the breaker.&lt;/p&gt;

&lt;p&gt;This closes the loop.&lt;/p&gt;

&lt;p&gt;The executor isn’t just running code —&lt;br&gt;
it’s adapting to system health.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why decorators fail here
&lt;/h2&gt;

&lt;p&gt;Decorators look composable.&lt;/p&gt;

&lt;p&gt;They aren’t.&lt;/p&gt;

&lt;p&gt;Each decorator:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;owns its own time model&lt;/li&gt;
&lt;li&gt;owns its own retry logic&lt;/li&gt;
&lt;li&gt;owns its own failure semantics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stack them and you get:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;emergent behavior you didn’t design&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The executor forces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a shared clock&lt;/li&gt;
&lt;li&gt;a shared failure model&lt;/li&gt;
&lt;li&gt;a shared execution lifecycle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s what makes the system predictable.&lt;/p&gt;


&lt;h2&gt;
  
  
  The clock abstraction (the hidden hero)
&lt;/h2&gt;

&lt;p&gt;Time is the hardest dependency in resilience systems.&lt;/p&gt;

&lt;p&gt;Retries depend on time.&lt;br&gt;
Rate limiting depends on time.&lt;br&gt;
Circuit breakers depend on time.&lt;/p&gt;

&lt;p&gt;If every component calls &lt;code&gt;time.time()&lt;/code&gt; directly:&lt;/p&gt;

&lt;p&gt;You lose control.&lt;/p&gt;

&lt;p&gt;LimitPal introduces a pluggable clock:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Clock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Protocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sleep_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything uses this.&lt;/p&gt;

&lt;p&gt;Not system time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Production clock
&lt;/h3&gt;

&lt;p&gt;Uses monotonic time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;immune to system clock jumps&lt;/li&gt;
&lt;li&gt;safe under NTP sync&lt;/li&gt;
&lt;li&gt;stable under container migrations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  MockClock
&lt;/h3&gt;

&lt;p&gt;Tests become deterministic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;advance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No waiting.&lt;br&gt;
No flakiness.&lt;br&gt;
No race conditions.&lt;/p&gt;

&lt;p&gt;You can simulate minutes of retry behavior instantly.&lt;/p&gt;

&lt;p&gt;This isn’t a testing trick.&lt;/p&gt;

&lt;p&gt;It’s architectural control over time.&lt;/p&gt;
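&lt;p&gt;A minimal sketch of both clocks satisfying the protocol above (the sync half only; the real protocol also has &lt;code&gt;sleep_async&lt;/code&gt;, and the &lt;code&gt;advance&lt;/code&gt; semantics here are an assumption):&lt;/p&gt;

```python
import time

class MonotonicClock:
    """Production clock: monotonic, immune to wall-clock jumps."""
    def now(self) -> float:
        return time.monotonic()

    def sleep(self, seconds: float) -> None:
        time.sleep(seconds)

class MockClock:
    """Test clock: time only moves when you tell it to."""
    def __init__(self) -> None:
        self._now = 0.0

    def now(self) -> float:
        return self._now

    def sleep(self, seconds: float) -> None:
        # "Sleeping" just moves virtual time forward -- instant in real time.
        self._now += seconds

    def advance(self, seconds: float) -> None:
        self._now += seconds
```

&lt;p&gt;Because every component asks the clock instead of the OS, swapping &lt;code&gt;MonotonicClock&lt;/code&gt; for &lt;code&gt;MockClock&lt;/code&gt; changes nothing about the logic, only about who owns time.&lt;/p&gt;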


&lt;h2&gt;
  
  
  Circuit breaker architecture
&lt;/h2&gt;

&lt;p&gt;The breaker is a state machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CLOSED → OPEN → HALF_OPEN → CLOSED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the tricky part isn’t the states.&lt;/p&gt;

&lt;p&gt;It’s &lt;strong&gt;transition discipline&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  CLOSED
&lt;/h3&gt;

&lt;p&gt;Normal operation.&lt;/p&gt;

&lt;p&gt;Failures increment a counter.&lt;br&gt;
Success resets it.&lt;/p&gt;

&lt;p&gt;When the threshold is reached → OPEN.&lt;/p&gt;
&lt;h3&gt;
  
  
  OPEN
&lt;/h3&gt;

&lt;p&gt;All calls fail immediately.&lt;/p&gt;

&lt;p&gt;No retry.&lt;br&gt;
No limiter usage.&lt;/p&gt;

&lt;p&gt;Just fast rejection.&lt;/p&gt;

&lt;p&gt;After recovery timeout → HALF_OPEN.&lt;/p&gt;
&lt;h3&gt;
  
  
  HALF_OPEN
&lt;/h3&gt;

&lt;p&gt;Limited probing phase.&lt;/p&gt;

&lt;p&gt;We allow a small number of calls.&lt;/p&gt;

&lt;p&gt;If they succeed → CLOSED.&lt;br&gt;
If they fail → back to OPEN.&lt;/p&gt;

&lt;p&gt;This prevents retry storms after recovery.&lt;/p&gt;

&lt;p&gt;The breaker is not just protection.&lt;/p&gt;

&lt;p&gt;It’s a &lt;em&gt;stability regulator&lt;/em&gt;.&lt;/p&gt;
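&lt;p&gt;The transition discipline fits in one small class. This is an illustrative sketch, not LimitPal’s internals: parameter names like &lt;code&gt;failure_threshold&lt;/code&gt; and &lt;code&gt;recovery_timeout&lt;/code&gt; follow the post, but &lt;code&gt;probes&lt;/code&gt; and the method names are assumptions. Note it takes a clock, not &lt;code&gt;time.time()&lt;/code&gt;.&lt;/p&gt;

```python
CLOSED, OPEN, HALF_OPEN = "CLOSED", "OPEN", "HALF_OPEN"

class Breaker:
    def __init__(self, clock, failure_threshold=5,
                 recovery_timeout=30.0, probes=1):
        self.clock = clock
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.probes = probes
        self.state = CLOSED
        self.failures = 0
        self.opened_at = 0.0
        self.probe_successes = 0

    def allows(self) -> bool:
        if self.state == OPEN:
            # After the recovery timeout, allow limited probing.
            if self.clock.now() - self.opened_at >= self.recovery_timeout:
                self.state = HALF_OPEN
                self.probe_successes = 0
                return True
            return False  # fast rejection while OPEN
        return True

    def record_success(self) -> None:
        if self.state == HALF_OPEN:
            self.probe_successes += 1
            if self.probe_successes >= self.probes:
                self.state = CLOSED
                self.failures = 0
        else:
            self.failures = 0  # success resets the counter

    def record_failure(self) -> None:
        if self.state == HALF_OPEN:
            self._trip()  # a failed probe goes straight back to OPEN
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self._trip()

    def _trip(self) -> None:
        self.state = OPEN
        self.opened_at = self.clock.now()
```

&lt;p&gt;Every transition is driven by recorded results plus the clock, which is why the executor’s Step 4 matters.&lt;/p&gt;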


&lt;h2&gt;
  
  
  Why retry must be jittered
&lt;/h2&gt;

&lt;p&gt;Exponential backoff without jitter is dangerous.&lt;/p&gt;

&lt;p&gt;If 1,000 clients retry at the same time:&lt;/p&gt;

&lt;p&gt;You get a synchronized spike.&lt;/p&gt;

&lt;p&gt;You kill the service again.&lt;/p&gt;

&lt;p&gt;Jitter spreads retries across time.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;all retry at t=1s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;retry in [0.9s, 1.1s]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Small randomness → large stability gain.&lt;/p&gt;

&lt;p&gt;This is one of those details that separates toy resilience&lt;br&gt;
from production resilience.&lt;/p&gt;
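&lt;p&gt;The math is tiny. A sketch of exponential backoff with multiplicative jitter (parameter names are illustrative, not LimitPal’s &lt;code&gt;RetryPolicy&lt;/code&gt; signature):&lt;/p&gt;

```python
import random

def backoff_delay(attempt: int, base: float = 1.0,
                  jitter: float = 0.1) -> float:
    """Exponential backoff with multiplicative jitter.

    attempt 0 -> ~base, attempt 1 -> ~2*base, attempt 2 -> ~4*base, ...
    Jitter spreads each delay over [d*(1-jitter), d*(1+jitter)] so a
    fleet of clients doesn't retry in lockstep.
    """
    delay = base * (2 ** attempt)
    return delay * random.uniform(1.0 - jitter, 1.0 + jitter)
```

&lt;p&gt;With &lt;code&gt;base=1.0&lt;/code&gt; and &lt;code&gt;jitter=0.1&lt;/code&gt;, the first retry lands somewhere in [0.9s, 1.1s] rather than exactly at t=1s for everyone.&lt;/p&gt;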


&lt;h2&gt;
  
  
  Key-based isolation
&lt;/h2&gt;

&lt;p&gt;Limiters operate per key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user:123
tenant:acme
ip:10.0.0.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each key gets its own bucket.&lt;/p&gt;

&lt;p&gt;This prevents one bad actor&lt;br&gt;
from starving everyone else.&lt;/p&gt;

&lt;p&gt;Internally this means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dynamic bucket allocation&lt;/li&gt;
&lt;li&gt;TTL eviction&lt;/li&gt;
&lt;li&gt;bounded memory&lt;/li&gt;
&lt;li&gt;optional LRU trimming&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this,&lt;br&gt;
rate limiting becomes a memory leak.&lt;/p&gt;
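&lt;p&gt;A sketch of the bookkeeping (hypothetical class and method names; LimitPal’s actual eviction strategy may differ): buckets are created on demand and evicted once a key has been idle past its TTL, so memory stays bounded.&lt;/p&gt;

```python
class KeyedLimiter:
    """Per-key buckets with TTL eviction (illustrative sketch)."""
    def __init__(self, clock, make_bucket, ttl: float = 300.0):
        self.clock = clock
        self.make_bucket = make_bucket  # factory for a fresh bucket
        self.ttl = ttl
        self._buckets = {}              # key -> [bucket, last_used]

    def bucket_for(self, key):
        self._evict()
        now = self.clock.now()
        if key not in self._buckets:
            self._buckets[key] = [self.make_bucket(), now]  # dynamic allocation
        entry = self._buckets[key]
        entry[1] = now                  # refresh the TTL on every use
        return entry[0]

    def _evict(self):
        now = self.clock.now()
        stale = [k for k, (_, last) in self._buckets.items()
                 if now - last >= self.ttl]
        for k in stale:
            del self._buckets[k]
```

&lt;p&gt;An optional LRU cap would trim the oldest entries on top of this, but TTL alone already keeps idle keys from accumulating.&lt;/p&gt;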


&lt;h2&gt;
  
  
  Sync + async parity
&lt;/h2&gt;

&lt;p&gt;Most Python libraries choose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sync OR async&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LimitPal enforces parity.&lt;/p&gt;

&lt;p&gt;Same API.&lt;br&gt;
Different executor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No hidden behavior differences.&lt;/p&gt;

&lt;p&gt;This matters when codebases mix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;background workers&lt;/li&gt;
&lt;li&gt;HTTP servers&lt;/li&gt;
&lt;li&gt;CLI tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One mental model everywhere.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real goal
&lt;/h2&gt;

&lt;p&gt;LimitPal isn’t about rate limiting.&lt;/p&gt;

&lt;p&gt;Or retry.&lt;/p&gt;

&lt;p&gt;Or circuit breakers.&lt;/p&gt;

&lt;p&gt;It’s about:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;making failure behavior explicit and composable&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Resilience stops being ad-hoc glue&lt;br&gt;
and becomes architecture.&lt;/p&gt;

&lt;p&gt;That’s the difference between:&lt;/p&gt;

&lt;p&gt;“I added retry”&lt;/p&gt;

&lt;p&gt;and&lt;/p&gt;

&lt;p&gt;“I designed a failure strategy.”&lt;/p&gt;




&lt;h2&gt;
  
  
  What’s next
&lt;/h2&gt;

&lt;p&gt;Planned work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;observability hooks&lt;/li&gt;
&lt;li&gt;adaptive rate limiting&lt;/li&gt;
&lt;li&gt;Redis backend&lt;/li&gt;
&lt;li&gt;bulkhead pattern&lt;/li&gt;
&lt;li&gt;framework integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because resilience doesn’t end at execution.&lt;br&gt;
It extends into operations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;Distributed systems fail.&lt;/p&gt;

&lt;p&gt;That’s not optional.&lt;/p&gt;

&lt;p&gt;What’s optional is whether failure behavior is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;accidental&lt;/li&gt;
&lt;li&gt;or engineered&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LimitPal is an attempt to engineer it.&lt;/p&gt;

&lt;p&gt;Docs:&lt;br&gt;
&lt;a href="https://limitpal.readthedocs.io/" rel="noopener noreferrer"&gt;https://limitpal.readthedocs.io/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repo:&lt;br&gt;
&lt;a href="https://github.com/Guli-vali/limitpal" rel="noopener noreferrer"&gt;https://github.com/Guli-vali/limitpal&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you like deep infrastructure tools — feedback welcome.&lt;/p&gt;

</description>
      <category>python</category>
      <category>internals</category>
      <category>performance</category>
      <category>microservices</category>
    </item>
    <item>
      <title>I Felt Like a Clown Wiring 5 Libraries Just to Build a Resilient API Client</title>
      <dc:creator>Ilya Masliev</dc:creator>
      <pubDate>Thu, 05 Feb 2026 13:11:28 +0000</pubDate>
      <link>https://forem.com/ilya_masliev/i-felt-like-a-clown-wiring-5-libraries-just-to-build-a-resilient-api-client-270k</link>
      <guid>https://forem.com/ilya_masliev/i-felt-like-a-clown-wiring-5-libraries-just-to-build-a-resilient-api-client-270k</guid>
      <description>&lt;p&gt;&lt;em&gt;So I wrote one that unifies everything.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The moment it broke me
&lt;/h2&gt;

&lt;p&gt;I just wanted a simple API client.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.example.com/users/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That lasted about 5 minutes.&lt;/p&gt;

&lt;p&gt;Because real APIs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rate limit you&lt;/li&gt;
&lt;li&gt;timeout&lt;/li&gt;
&lt;li&gt;return 503&lt;/li&gt;
&lt;li&gt;sometimes completely die&lt;/li&gt;
&lt;li&gt;and retries can DDoS your own service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I did what every Python dev does.&lt;/p&gt;

&lt;p&gt;I started stacking libraries.&lt;/p&gt;




&lt;h2&gt;
  
  
  The decorator tower of doom
&lt;/h2&gt;

&lt;p&gt;First: rate limiting.&lt;/p&gt;

&lt;p&gt;Then: retry.&lt;/p&gt;

&lt;p&gt;Then: circuit breaker.&lt;/p&gt;

&lt;p&gt;And suddenly my function looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@breaker&lt;/span&gt;
&lt;span class="nd"&gt;@retry&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;span class="nd"&gt;@sleep_and_retry&lt;/span&gt;
&lt;span class="nd"&gt;@limits&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_user&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And I hated it.&lt;/p&gt;

&lt;p&gt;Not because it didn’t work — but because it didn’t scale.&lt;/p&gt;

&lt;p&gt;Problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3+ libraries&lt;/li&gt;
&lt;li&gt;fragile decorator ordering&lt;/li&gt;
&lt;li&gt;conflicting abstractions&lt;/li&gt;
&lt;li&gt;async quirks&lt;/li&gt;
&lt;li&gt;painful testing&lt;/li&gt;
&lt;li&gt;scattered observability&lt;/li&gt;
&lt;li&gt;dependency sprawl&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And this was for &lt;strong&gt;one function&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now imagine 10 APIs. Per-user limits. Background jobs. Webhooks.&lt;/p&gt;

&lt;p&gt;You’re no longer writing business logic.&lt;/p&gt;

&lt;p&gt;You’re babysitting resilience glue code.&lt;/p&gt;




&lt;h2&gt;
  
  
  The idea: resilience as a pipeline
&lt;/h2&gt;

&lt;p&gt;What if resilience wasn’t decorator soup?&lt;/p&gt;

&lt;p&gt;What if every call flowed through a &lt;strong&gt;single orchestrator&lt;/strong&gt;?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;limitpal&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;AsyncResilientExecutor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;AsyncTokenBucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;RetryPolicy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CircuitBreaker&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncResilientExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;AsyncTokenBucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;capacity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;refill_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;retry_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;RetryPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;circuit_breaker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;CircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;failure_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No decorators.&lt;br&gt;
No stacking libraries.&lt;br&gt;
No fragile glue.&lt;/p&gt;

&lt;p&gt;One execution pipeline.&lt;/p&gt;

&lt;p&gt;That’s what &lt;strong&gt;LimitPal&lt;/strong&gt; is.&lt;/p&gt;


&lt;h2&gt;
  
  
  What LimitPal actually gives you
&lt;/h2&gt;

&lt;p&gt;LimitPal is a toolkit for building resilient Python clients and services.&lt;/p&gt;

&lt;p&gt;It combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rate limiting (Token / Leaky bucket)&lt;/li&gt;
&lt;li&gt;retry with exponential backoff + jitter&lt;/li&gt;
&lt;li&gt;circuit breaker&lt;/li&gt;
&lt;li&gt;composable limiters&lt;/li&gt;
&lt;li&gt;a resilience executor that orchestrates everything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;full async + sync support&lt;/li&gt;
&lt;li&gt;zero dependencies&lt;/li&gt;
&lt;li&gt;thread-safe by default&lt;/li&gt;
&lt;li&gt;deterministic time control for tests&lt;/li&gt;
&lt;li&gt;key-based isolation (per-user / per-IP / per-tenant)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal isn’t more features.&lt;/p&gt;

&lt;p&gt;The goal is &lt;strong&gt;fewer moving parts&lt;/strong&gt;.&lt;/p&gt;
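&lt;p&gt;To make the token-bucket numbers in the snippets concrete: &lt;code&gt;capacity=10, refill_rate=100/60&lt;/code&gt; means bursts of up to 10 calls, with a sustained budget of roughly 100 calls per minute. A minimal sketch of the algorithm (illustrative only, not LimitPal’s actual class; note it takes a clock):&lt;/p&gt;

```python
class TokenBucket:
    """Token bucket sketch: continuous refill, hard burst cap."""
    def __init__(self, clock, capacity: float, refill_rate: float):
        self.clock = clock
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.tokens = capacity          # start full: burst allowed
        self.last = clock.now()

    def try_acquire(self, n: float = 1.0) -> bool:
        now = self.clock.now()
        elapsed = now - self.last
        self.last = now
        # Refill continuously, but never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_rate)
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

&lt;p&gt;After a full burst of 10, the 11th call is rejected until refill catches up.&lt;/p&gt;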


&lt;h2&gt;
  
  
  The resilience pipeline (this is the key idea)
&lt;/h2&gt;

&lt;p&gt;Every call goes through:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Circuit breaker check
→ Rate limiter
→ Execute + retry loop
→ Record result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ordering matters.&lt;/p&gt;

&lt;p&gt;You’re not just “adding retry”.&lt;/p&gt;

&lt;p&gt;You’re designing failure behavior as a system.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;breaker stops cascading failures&lt;/li&gt;
&lt;li&gt;limiter protects infrastructure&lt;/li&gt;
&lt;li&gt;retry handles temporary issues&lt;/li&gt;
&lt;li&gt;executor keeps it coherent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One mental model instead of five.&lt;/p&gt;




&lt;h2&gt;
  
  
  The testing problem nobody talks about
&lt;/h2&gt;

&lt;p&gt;Time-based logic is brutal to test.&lt;/p&gt;

&lt;p&gt;Traditional approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Slow. Flaky. Non-deterministic.&lt;/p&gt;

&lt;p&gt;LimitPal uses a pluggable clock.&lt;/p&gt;

&lt;p&gt;So tests become:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;advance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instant. Deterministic.&lt;/p&gt;

&lt;p&gt;You can simulate minutes of retries in milliseconds.&lt;/p&gt;

&lt;p&gt;For teams that care about reliability, this is a game changer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real example
&lt;/h2&gt;

&lt;p&gt;A resilient HTTP client in ~10 lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncResilientExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;AsyncTokenBucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;capacity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;refill_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;retry_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;RetryPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base_delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;circuit_breaker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;CircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;failure_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You automatically get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;burst control&lt;/li&gt;
&lt;li&gt;exponential retry&lt;/li&gt;
&lt;li&gt;cascading failure protection&lt;/li&gt;
&lt;li&gt;clean async semantics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No decorator tower.&lt;/p&gt;




&lt;h2&gt;
  
  
  When should you use this?
&lt;/h2&gt;

&lt;p&gt;Use LimitPal if you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;build API clients&lt;/li&gt;
&lt;li&gt;call flaky third-party services&lt;/li&gt;
&lt;li&gt;run background jobs&lt;/li&gt;
&lt;li&gt;need per-user limits&lt;/li&gt;
&lt;li&gt;care about deterministic tests&lt;/li&gt;
&lt;li&gt;want clean async support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you only need retry — smaller libs are fine.&lt;/p&gt;

&lt;p&gt;If you need &lt;strong&gt;composition&lt;/strong&gt;, that’s the niche.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 2: internals
&lt;/h2&gt;

&lt;p&gt;This post is about the idea.&lt;/p&gt;

&lt;p&gt;In Part 2 I’ll go deep into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how the executor pipeline works&lt;/li&gt;
&lt;li&gt;circuit breaker state machine&lt;/li&gt;
&lt;li&gt;clock abstraction design&lt;/li&gt;
&lt;li&gt;composite limiter architecture&lt;/li&gt;
&lt;li&gt;failure modeling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because resilience isn’t magic.&lt;/p&gt;

&lt;p&gt;It’s architecture.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install limitpal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docs:&lt;br&gt;
&lt;a href="https://limitpal.readthedocs.io/" rel="noopener noreferrer"&gt;https://limitpal.readthedocs.io/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repo:&lt;br&gt;
&lt;a href="https://github.com/Guli-vali/limitpal" rel="noopener noreferrer"&gt;https://github.com/Guli-vali/limitpal&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If it saves you pain — stars are welcome ⭐&lt;/p&gt;

</description>
      <category>python</category>
      <category>performance</category>
      <category>architecture</category>
      <category>microservices</category>
    </item>
  </channel>
</rss>
