<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Nehemiah</title>
    <description>The latest articles on Forem by Nehemiah (@nehemiah_dev).</description>
    <link>https://forem.com/nehemiah_dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3901837%2F28f453f3-d8ed-4826-a0cc-683de758ccc4.jpg</url>
      <title>Forem: Nehemiah</title>
      <link>https://forem.com/nehemiah_dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/nehemiah_dev"/>
    <language>en</language>
    <item>
      <title>Building swiftdeploy: A Policy-Gated Deployment CLI</title>
      <dc:creator>Nehemiah</dc:creator>
      <pubDate>Wed, 06 May 2026 20:41:57 +0000</pubDate>
      <link>https://forem.com/nehemiah_dev/i-built-my-own-deployment-engine-2gon</link>
      <guid>https://forem.com/nehemiah_dev/i-built-my-own-deployment-engine-2gon</guid>
      <description>&lt;p&gt;In a world of "one-click" managed solutions, it’s easy to let the underlying mechanics of infrastructure become a mystery. But for those of us who want production-grade control and a deep understanding of the "why" behind the "how", a managed platform can feel like a black box: magical until something breaks at 2am and you have no mental model of what's actually happening under the hood.&lt;br&gt;
So I built swiftdeploy, a deployment CLI that does the same job as those platforms, but entirely in code I wrote, understand, and can reason about at any layer.&lt;br&gt;
This is the story of how it works, why I made the choices I did, and the one technical problem that turned out to be far more interesting than I expected.&lt;br&gt;
&lt;strong&gt;The Philosophy: Own Your Abstractions&lt;/strong&gt;&lt;br&gt;
There's a version of this project where I reach for an existing solution. Argo CD and Flux are both excellent tools. But using them at this stage would have given me a working deployment pipeline and almost no understanding of what a deployment pipeline actually is.&lt;br&gt;
The constraint I set myself was simple: if I can't explain exactly what happens between typing a command and traffic reaching my app, I don't get to use that tool.&lt;br&gt;
This meant writing my own template engine around Jinja2, my own nginx config generator, my own health-check loop, and eventually my own policy engine integration. Every layer I owned became a layer I understood. That understanding compounds.&lt;br&gt;
The engineering philosophy here isn't "reinvent everything." It's "reinvent the things that teach you something." Deployment orchestration teaches you an enormous amount about networking, process lifecycle, and operational trust. That's worth the friction.&lt;br&gt;
&lt;strong&gt;What swiftdeploy Actually Does&lt;/strong&gt;&lt;br&gt;
At its core, swiftdeploy is a CLI that manages a Docker Compose stack with a twist: nothing happens without a policy check first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;swiftdeploy init  &lt;span class="c"&gt;#generate nginx.conf + docker-compose.yaml from your manifest &lt;/span&gt;
swiftdeploy deploy &lt;span class="c"&gt;# policy check → start stack → health check loop &lt;/span&gt;
swiftdeploy promote &lt;span class="c"&gt;# scrape metrics → policy check → switch canary/stable mode &lt;/span&gt;
swiftdeploy status &lt;span class="c"&gt;# live terminal dashboard with real-time policy compliance &lt;/span&gt;
swiftdeploy audit &lt;span class="c"&gt;# parse history.jsonl → generate audit_report.md &lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
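
&lt;p&gt;For a concrete feel of what init does under the hood, here is a minimal sketch of the templating pass. The manifest fields and template filename are illustrative assumptions, not swiftdeploy's actual schema.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal sketch of the init templating pass (manifest fields and template
# name are assumptions for illustration, not the project's real schema).
import yaml
from jinja2 import Environment, FileSystemLoader

def render_nginx_conf(manifest_path="deploy.yaml"):
    with open(manifest_path) as f:
        manifest = yaml.safe_load(f)   # e.g. {"domain": "example.com", "app_port": 8000}

    env = Environment(loader=FileSystemLoader("templates"))
    template = env.get_template("nginx.conf.j2")
    return template.render(domain=manifest["domain"], app_port=manifest["app_port"])

if __name__ == "__main__":
    with open("nginx.conf", "w") as out:
        out.write(render_nginx_conf())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;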



&lt;p&gt;Every deploy is gated. Every promotion is evidence-based. Every decision is logged.&lt;br&gt;
&lt;strong&gt;The Policy Sidecar: OPA as the Brain&lt;/strong&gt;&lt;br&gt;
The most deliberate architectural choice in this project is that the CLI never makes allow/deny decisions itself. All decision logic lives in Open Policy Agent, running as a sidecar container.&lt;br&gt;
The CLI's job is to collect facts and ask questions. OPA's job is to answer them.&lt;br&gt;
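&lt;/p&gt;

&lt;p&gt;To make that concrete, here is roughly what the question-asking looks like against OPA's standard REST data API. The policy package path and the input fields are illustrative assumptions, not the project's actual schema.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: the CLI asks OPA a yes/no question over its data API.
# The package path and input fields are assumptions for illustration.
import requests

OPA_URL = "http://127.0.0.1:8181/v1/data/swiftdeploy/deploy/allow"  # loopback-bound host port

def policy_allows(facts):
    # facts: whatever the CLI just collected, e.g. {"image_tag": "v1.4.2", "replicas": 2}
    resp = requests.post(OPA_URL, json={"input": facts}, timeout=5)
    resp.raise_for_status()
    # OPA wraps the rule's value in {"result": ...}; an undefined rule means deny.
    return resp.json().get("result") is True

if not policy_allows({"image_tag": "v1.4.2", "replicas": 2}):
    raise SystemExit("Policy check failed: deployment blocked")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;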
&lt;strong&gt;The Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffzpqqhp2c03jfytmac2n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffzpqqhp2c03jfytmac2n.png" alt=" "&gt;&lt;/a&gt;&lt;br&gt;
Here's something that initially seems like a minor detail but is actually load-bearing: nginx must never be able to reach OPA.&lt;br&gt;
If nginx could reach the policy engine, a malformed request from the internet could theoretically influence policy evaluation. The trust boundary would be blurred. So OPA runs on an internal-only Docker network.&lt;br&gt;
The CLI reaches OPA via a loopback-bound host port. Only processes running directly on the host can touch it. A container can't reach a loopback port on the host by default.&lt;br&gt;
This isn't theoretical hardening. It's a concrete, verifiable guarantee that public traffic can never trigger a policy evaluation.&lt;br&gt;
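&lt;/p&gt;

&lt;p&gt;As a rough Compose-level sketch of those two properties (service names, images, and ports are illustrative, and the real project additionally marks the policy network as internal-only):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Illustrative sketch, not the project's actual compose file.
# Two properties matter: nginx and OPA share no network, and OPA's API is
# published only on the host's loopback interface, so only host-local
# processes (the CLI) can reach it.
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    networks:
      - edge

  opa:
    image: openpolicyagent/opa:latest
    command: ["run", "--server", "--addr", "0.0.0.0:8181", "/policies"]
    ports:
      - "127.0.0.1:8181:8181"   # loopback-bound: unreachable from outside the host
    networks:
      - policy                  # nginx is not attached to this network

networks:
  edge: {}
  policy: {}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;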
&lt;strong&gt;The Most Interesting Technical Challenge: P99 From a Histogram&lt;/strong&gt;&lt;br&gt;
The canary promotion gate blocks you from promoting if P99 latency exceeds 500ms. Simple requirement. Surprisingly interesting implementation.&lt;br&gt;
Prometheus doesn't give you a P99 directly. It gives you a histogram — a set of cumulative bucket counts with upper bounds. To get a percentile, you have to interpolate across those buckets.&lt;br&gt;
The naive approach — just read the bucket that contains the 99th percentile — gives you a ceiling, not a value. If your 99th-percentile observation falls somewhere inside the 0.25 bucket, you'd report 250ms regardless of whether the true value was 80ms or 249ms.&lt;br&gt;
The correct approach is linear interpolation within the containing bucket. Find where the target rank (99% × total count) falls, identify which bucket it lands in, then interpolate based on how far through that bucket's count range you are.&lt;br&gt;
But there's a second problem: you can't just read a snapshot. Prometheus counters are cumulative and monotonically increasing. If you scrape once and see 1000 requests in the 0.25 bucket, that's every request since the process started — not the last 30 seconds.&lt;br&gt;
The solution is to take two scrapes separated by a time window and compute deltas. The bucket counts in the second scrape minus the first give you the distribution for only that window. Then you run the interpolation on those deltas.&lt;br&gt;
This is what makes the promote gate actually meaningful: it's not checking the lifetime health of the service, it's checking the last 30 seconds of traffic before you ask it to carry more.&lt;br&gt;
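&lt;/p&gt;

&lt;p&gt;Here is a condensed sketch of that computation. It assumes the two scrapes have already been parsed into dicts of cumulative bucket counts keyed by upper bound; the Prometheus text-format parsing itself is omitted.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: P99 from two Prometheus histogram scrapes taken ~30s apart.
# before/after: {upper_bound_seconds: cumulative_count}, including float("inf").
def p99_from_scrapes(before, after, quantile=0.99):
    bounds = sorted(after)                      # ascending upper bounds; inf sorts last
    # Counters are monotonic, so subtracting scrapes isolates just this window.
    window = [after[b] - before[b] for b in bounds]
    total = window[-1]                          # the +Inf bucket carries the total
    if total == 0:
        return 0.0
    rank = quantile * total                     # which observation we are hunting for
    lower_bound, prev_cum = 0.0, 0.0
    for b, cum in zip(bounds, window):
        if cum &amp;gt;= rank:
            if b == float("inf"):               # no finite edge to interpolate toward
                return lower_bound
            in_bucket = cum - prev_cum
            if in_bucket == 0:
                return b
            # Linear interpolation: how far through this bucket's count range rank falls.
            fraction = (rank - prev_cum) / in_bucket
            return lower_bound + (b - lower_bound) * fraction
        lower_bound, prev_cum = b, cum

# The promote gate then compares the result against the 500ms threshold:
# p99_from_scrapes(before, after) &amp;lt;= 0.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;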
&lt;strong&gt;The Audit Trail&lt;/strong&gt;&lt;br&gt;
Every status scrape appends a JSON record to history.jsonl. Running swiftdeploy audit parses that file and produces a markdown report with metrics trends and a dedicated violations section.&lt;br&gt;
The design principle here is that observability is not optional. The history file survives stack restarts. The audit report gives you a narrative of exactly what your system was doing and what the policy engine was saying about it at every point.&lt;br&gt;
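&lt;/p&gt;

&lt;p&gt;In sketch form, the audit step is little more than a fold over that file. The record field names here are assumptions for illustration, not the actual schema.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the audit step; record field names are illustrative assumptions.
import json

def write_audit_report(history_path="history.jsonl", out_path="audit_report.md"):
    with open(history_path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    violations = [r for r in records if not r.get("policy_compliant", True)]

    with open(out_path, "w") as out:
        out.write("# Audit Report\n\n")
        out.write(f"Scrapes analysed: {len(records)}\n\n")
        out.write("## Violations\n\n")
        for r in violations:
            out.write(f"- {r.get('timestamp')}: {r.get('violation_reason')}\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;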
&lt;strong&gt;In Conclusion&lt;/strong&gt; &lt;br&gt;
This project taught me a lot of things that are usually hidden under abstractions. The metrics interpolation was the most satisfying problem. It looks like a small utility function. It's actually the difference between a gate that measures something real and one that just performs measurement.&lt;br&gt;
And the network isolation detail — the one that's easy to skip — is the one that determines whether your security model is real or decorative.&lt;br&gt;
That's the thing about owning your abstractions. The interesting problems are hiding inside the details that pre-built platforms quietly handle for you.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>devops</category>
      <category>infrastructure</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Building a Real-Time Attack Detection Daemon</title>
      <dc:creator>Nehemiah</dc:creator>
      <pubDate>Wed, 29 Apr 2026 22:50:21 +0000</pubDate>
      <link>https://forem.com/nehemiah_dev/building-a-real-time-anomaly-detection-daemon-for-a-live-cloud-storage-platform-3fp8</link>
      <guid>https://forem.com/nehemiah_dev/building-a-real-time-anomaly-detection-daemon-for-a-live-cloud-storage-platform-3fp8</guid>
      <description>&lt;p&gt;Imagine you're running a busy coffee shop. On a normal day, about 30 customers walk in per hour. You know your regulars, you know the rhythm. Then one afternoon, 300 people rush in through the door in two minutes — and they're not ordering coffee, they're just slamming every cabinet open and closed.&lt;br&gt;
You'd notice. You'd react.&lt;br&gt;
That's exactly what this project does — but for an online service, instead of a coffee shop. It watches every single HTTP request coming into a server, learns what "normal" looks like, and automatically sounds the alarm (and slams the door shut) when something looks wrong.&lt;br&gt;
Let's walk through how it works, piece by piece.&lt;br&gt;
&lt;strong&gt;Step 1: Reading the Logs — The Monitor&lt;/strong&gt;&lt;br&gt;
The first thing the detector needs to do is read traffic data from the access log. The reverse proxy (Nginx in this case) is configured to write logs in JSON format, which makes them easy to parse programmatically. Each line looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1705318496.123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source_ip"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.2.3.4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/login"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"200"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"response_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4821"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every field tells you something: who sent the request, when, what they asked for, and whether the server responded OK (status 200) or with an error (status 404, 500, etc.).&lt;br&gt;
The monitor's job is to tail this file: it reads each new line as it appears, parses it, and drops it into a queue for the detector to process.&lt;br&gt;
This runs as a daemon — a background process that never stops. Not a cron job, not a script you run once. Always on, always watching.&lt;br&gt;
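&lt;/p&gt;

&lt;p&gt;A stripped-down version of that loop might look like the following; the log path and the queue wiring are simplified for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Simplified sketch of the monitor: follow the access log, queue parsed entries.
import json
import queue
import time

events = queue.Queue()   # handed to the detector thread

def follow(path):
    """Yield new lines appended to the file, like tail -f."""
    with open(path) as f:
        f.seek(0, 2)                      # jump to the end: only new traffic matters
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.1)           # nothing new yet, wait a beat
                continue
            yield line

def monitor(path="/var/log/nginx/access.json"):
    for line in follow(path):
        try:
            events.put(json.loads(line))  # one dict per request, as shown above
        except json.JSONDecodeError:
            continue                      # skip partial or garbled lines
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;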
&lt;strong&gt;Step 2: The Sliding Window — Counting Requests Over Time&lt;/strong&gt;&lt;br&gt;
Now the detector has a stream of incoming log entries. The first question it needs to answer is:&lt;br&gt;
How fast is this IP sending requests right now?&lt;br&gt;
You might think — just count all their requests! But that doesn't work. If an IP sent 10,000 requests six hours ago and 2 requests in the last minute, they're not attacking right now. You need to know the recent rate, not the all-time total.&lt;br&gt;
This is where a sliding window comes in.&lt;br&gt;
Think of it like a 60-second sliding ruler on a timeline:&lt;br&gt;
As time moves forward, the window moves with it. Requests older than 60 seconds slide out of the left side. New requests come in on the right. At any moment, you can count how many requests are inside the window to get the current rate.&lt;br&gt;
In code, we use a deque (a double-ended queue) — a list that's cheap to add to on the right and remove from on the left. We keep one window per IP address, plus one global window that counts all requests from all IPs combined.&lt;br&gt;
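&lt;/p&gt;

&lt;p&gt;In sketch form (the 60-second window length comes from the description above; everything else is simplified):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: per-IP sliding 60-second windows backed by deques.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
windows = defaultdict(deque)   # ip -&amp;gt; deque of request timestamps
global_window = deque()        # every request from every IP, for the global rate

def record(ip, ts):
    windows[ip].append(ts)
    global_window.append(ts)

def current_rate(window, now=None):
    now = now or time.time()
    # Drop entries that have slid out of the left edge of the window.
    while window and window[0] &amp;lt; now - WINDOW_SECONDS:
        window.popleft()
    return len(window) / WINDOW_SECONDS   # requests per second over the last minute
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;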
&lt;strong&gt;Step 3: The Rolling Baseline — Learning What "Normal" Looks Like&lt;/strong&gt;&lt;br&gt;
Here's the thing about anomaly detection: you can't hardcode a threshold like "block anyone over 10 req/s." Why? Because traffic patterns are different at 3am versus 3pm. A small company might have 0.1 req/s average; a big one might have 50 req/s average. A threshold that's too low creates false alarms. Too high and you miss real attacks.&lt;br&gt;
The solution is to let the system learn what normal looks like from real traffic, and update that knowledge continuously.&lt;br&gt;
We do this with a rolling 30-minute baseline.&lt;br&gt;
Every second, we count how many requests came in and store that number. From the most recent 30 minutes of those counts we calculate two things:&lt;br&gt;
&lt;strong&gt;Mean (average)&lt;/strong&gt; — the typical number of requests per second.&lt;br&gt;
&lt;strong&gt;Standard deviation&lt;/strong&gt; — how much the traffic normally varies around that average. Low stddev means traffic is very steady. High stddev means it's naturally spiky.&lt;br&gt;
Anything beyond 3 standard deviations (3σ) from the mean is statistically very unlikely under normal traffic — it only happens by chance about 0.3% of the time. So if we see it, something unusual is probably happening. This runs every 60 seconds so the baseline is always fresh. If traffic naturally grows over the day, the baseline grows with it. It's self-adapting.&lt;br&gt;
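&lt;/p&gt;

&lt;p&gt;Concretely (the 30-minute window and 60-second refresh come from the description above; names are illustrative), the baseline is just a bounded buffer of per-second counts and two numbers derived from it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: rolling 30-minute baseline of per-second request counts.
import statistics
from collections import deque

per_second_counts = deque(maxlen=30 * 60)   # only the most recent 30 minutes survive

def record_second(count):
    per_second_counts.append(count)          # called once per second by the detector

def baseline():
    """Recomputed every 60 seconds so the idea of 'normal' tracks traffic as it drifts."""
    if len(per_second_counts) &amp;lt; 2:
        return None, None                     # not enough history yet
    mean = statistics.mean(per_second_counts)
    stddev = statistics.pstdev(per_second_counts)
    return mean, stddev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;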
&lt;strong&gt;Step 4: Making a Decision — The Anomaly Detection Logic&lt;/strong&gt;&lt;br&gt;
Now we have everything we need to answer the key question:&lt;br&gt;
Is this traffic suspicious?&lt;br&gt;
We use two rate tests, either of which can trigger an alert, plus a third signal that tightens them:&lt;br&gt;
&lt;strong&gt;Test 1: The Z-Score&lt;/strong&gt;&lt;br&gt;
The z-score measures how many standard deviations the current rate is from the mean.&lt;br&gt;
&lt;strong&gt;Test 2: The Multiplier&lt;/strong&gt;&lt;br&gt;
Z-scores can be misleading when traffic is very low (e.g., at 3am when stddev is near zero). So we also check: is the current rate more than 5 times the mean?&lt;br&gt;
&lt;strong&gt;Test 3: Error Surge Detection&lt;/strong&gt;&lt;br&gt;
If an IP is generating lots of 4xx/5xx errors (like hammering /login and failing), that's a signal too. We check whether their error rate is 3× the normal error rate, and if so, we tighten the thresholds — making detection more sensitive for that IP.&lt;br&gt;
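&lt;/p&gt;

&lt;p&gt;Put together, the whole decision reduces to a few lines. The 3.0 z-score limit and the 5x multiplier come from the description above; the tightened values used after an error surge are illustrative assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the decision: z-score test, multiplier test, error-surge tightening.
def is_suspicious(rate, mean, stddev, ip_error_rate, baseline_error_rate):
    z_limit, multiplier = 3.0, 5.0

    # Test 3: an IP producing 3x the normal error rate gets stricter thresholds.
    if baseline_error_rate &amp;gt; 0 and ip_error_rate &amp;gt; 3 * baseline_error_rate:
        z_limit, multiplier = 2.0, 3.0        # illustrative tightened values

    # Test 1: z-score against the rolling baseline.
    if stddev &amp;gt; 0 and (rate - mean) / stddev &amp;gt; z_limit:
        return True

    # Test 2: multiplier guard for quiet periods where stddev is near zero.
    return mean &amp;gt; 0 and rate &amp;gt; multiplier * mean
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;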
&lt;strong&gt;What Happens When Something Is Flagged?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Step 5: Blocking With iptables&lt;/strong&gt;&lt;br&gt;
Once an IP is flagged, we need to actually stop its traffic. We use iptables, Linux's built-in firewall, enforced directly in the kernel.&lt;br&gt;
Think of iptables as a bouncer standing at the network door. You give it a list of rules, and it checks every packet against that list before letting it through.&lt;br&gt;
After this runs, packets from that IP are dropped at the kernel level — they never even reach Nginx.&lt;br&gt;
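&lt;/p&gt;

&lt;p&gt;The blocker itself is a thin wrapper around two iptables invocations (a sketch, with error handling trimmed):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the blocker: insert DROP rules for the offending IP in both chains.
# INPUT covers traffic addressed to the host itself; FORWARD covers traffic
# routed into Docker-published containers (see the chain-selection lesson below).
import subprocess

def block(ip):
    for chain in ("INPUT", "FORWARD"):
        subprocess.run(["iptables", "-I", chain, "-s", ip, "-j", "DROP"], check=True)

def unblock(ip):
    for chain in ("INPUT", "FORWARD"):
        subprocess.run(["iptables", "-D", chain, "-s", ip, "-j", "DROP"], check=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;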
&lt;strong&gt;Step 6: Auto-Unban With Backoff&lt;/strong&gt;&lt;br&gt;
We don't ban forever on the first offence (well, almost never). The unbanner follows a tiered schedule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;**Offence    Ban Duration**
 1st           10 minutes
 2nd           30 minutes
 3rd           2 hours
 4th           Permanent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every 30 seconds, the unbanner checks whether any bans have expired. When it unbans an IP, it remembers the tier — so if that same IP attacks again, the next ban is longer. Repeat offenders escalate toward permanent.&lt;br&gt;
&lt;/p&gt;
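
&lt;p&gt;In sketch form, the schedule and the expiry check are a dictionary and a scan. The durations come from the table above; the data structures are illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the unbanner's bookkeeping; the thread wakes every 30 seconds.
import time

BAN_SCHEDULE = {1: 10 * 60, 2: 30 * 60, 3: 2 * 60 * 60}   # seconds; 4th offence is permanent

offence_counts = {}   # ip -&amp;gt; how many times this IP has been banned
active_bans = {}      # ip -&amp;gt; unix timestamp when the ban expires (None = permanent)

def ban(ip):
    offence_counts[ip] = offence_counts.get(ip, 0) + 1
    duration = BAN_SCHEDULE.get(offence_counts[ip])         # None from the 4th offence on
    active_bans[ip] = time.time() + duration if duration else None

def expired_bans(now=None):
    now = now or time.time()
    return [ip for ip, until in active_bans.items() if until is not None and until &amp;lt;= now]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;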

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Nginx writes a log line
        │
        ▼
monitor.py reads the new line, parses JSON
        │
        ▼
detector.py adds timestamp to IP's sliding window deque
        │
        ▼
detector.py calculates current rate (len(window) / 60)
        │
        ▼
detector.py computes z-score against rolling baseline
        │
     z &amp;gt; 3.0 or rate &amp;gt; 5x mean?
        │
       YES
        │
        ▼
blocker.py runs: iptables -I INPUT -s &amp;lt;ip&amp;gt; -j DROP &amp;amp;&amp;amp; \
iptables -I FORWARD -s &amp;lt;ip&amp;gt; -j DROP
        │
        ▼
notifier.py fires Slack message + writes audit log entry
        │
        ▼
(10 minutes later) unbanner.py removes the iptables rule
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All of these steps run concurrently inside a single Python process, each piece on its own thread.&lt;br&gt;
&lt;strong&gt;What I Learned Building This&lt;/strong&gt;&lt;br&gt;
A few things that surprised me along the way:&lt;br&gt;
&lt;strong&gt;1. Hardcoded thresholds are fragile.&lt;/strong&gt; The very first version I sketched out used if rate &amp;gt; 20: ban(). That would have been a disaster — blocking legitimate traffic during a busy period, missing attacks at quiet times. The rolling baseline was the most important design decision.&lt;br&gt;
&lt;strong&gt;2. iptables chain selection matters enormously.&lt;/strong&gt; A rule only in INPUT dropped the attacker's SSH packets but left their HTTP flowing, because Docker publishes container ports with DNAT and that traffic traverses the FORWARD chain, never INPUT. Understanding how Docker reroutes packets before they reach the chain you expect is something a lot of guides skip over.&lt;br&gt;
&lt;strong&gt;3. Standard deviation is surprisingly useful.&lt;/strong&gt; Before this project, stddev felt like a statistics-class abstraction. Using it here to define "normal variance" made it concrete — it's just a measure of how wiggly your traffic normally is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security tooling&lt;/strong&gt; doesn't have to be mysterious. At its core, this project is just: read data, count things, compare to normal, act when something's off. The same pattern underlies intrusion detection systems, fraud detection, network monitoring, and a lot of other "scary" security tools.&lt;br&gt;
Once you understand the pieces — sliding windows, rolling baselines, z-scores, iptables — you can compose them into something genuinely useful.&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>security</category>
    </item>
  </channel>
</rss>
