Forem: patrickbloem-it

Goodbye Fail2Ban: Hardening Netbird & Caddy with CrowdSec

patrickbloem-it — Wed, 31 Dec 2025 07:15:07 +0000

Published: December 31, 2025 | Reading Time: 12 min

TL;DR

We migrated our Netbird VPN Management Server from Fail2Ban to CrowdSec, reducing SSH/HTTP attack noise by 99% and shifting from reactive (ban after 5 failed attempts) to preventive (block IPs from community threat intelligence before they touch our server). This post dives into why we made the leap and how you can too—with step-by-step code.

The Problem: Fail2Ban in 2025

For a decade, Fail2Ban was the gold standard for simple server hardening. You set up a few regex rules, pointed it at /var/log/auth.log, and called it a day. But here's the thing: Fail2Ban is architecturally reactive.

Why Fail2Ban Falls Short

1. Reactivity is a Liability

Fail2Ban works like a smoke detector that only triggers after the fire has already spread. An attacker needs to hit your SSH port 5+ times before the rule kicks in. In a world of distributed botnets with 10,000+ IP addresses, that's 50,000 free attempts to probe your system before you even block a single one.

Our logs showed the same pattern: every night, 500+ bogus SSH handshakes from different IPs, each one landing in auth.log and consuming CPU cycles for regex matching. The attacker's goal isn't to brute-force your password (they know that's futile)—it's to map your infrastructure, test for open ports, and document your responses for later weaponization.

2. The Silo Problem: You're Alone

Fail2Ban is completely blind to the outside world. It works in isolation.

Real-world scenario:

An IP (let's say 203.0.113.42) is aggressively scanning 500 servers across Europe simultaneously.
With Fail2Ban, your server knows nothing about the activity on those other servers.
You wait passively until 203.0.113.42 hits your SSH port 5 times.
In the meantime, it may already have fingerprinted the other 499 servers.

With CrowdSec + CAPI (Community API):

The same IP probes a server in France (CrowdSec instance #1).
It scans a server in Germany (CrowdSec instance #2).
It touches your server in the Netherlands (instance #3).
Within seconds, the community reaches consensus: this IP is malicious.
All 3 servers (+ 8,000+ others running CrowdSec) block it preventively.

You're no longer fighting alone. You're part of a "Waze for Cyber-Security" where threat signals are shared globally.

3. Regex Hell in the Age of JSON

Modern web servers like Caddy output structured JSON logs, not plain text. Fail2Ban's strength—regex-based parsing—becomes a liability.

A realistic Fail2Ban filter for Caddy:

[Definition]
failregex = ^(?P<host>\S+) - (?P<user>\S+) \[(?P<time>\d{2}/\w+/\d{4}:\d{2}:\d{2}:\d{2}) (?P<tz>[\+\-]\d{4})\] "(?P<method>\S+) (?P<uri>\S+) (?P<proto>\S+)" (?P<status>\d+) (?P<size>\S+) "(?P<referer>\S+)" "(?P<user_agent>\S+)" (?P<response_time>\d+)$

This is fragile. The moment Caddy's log format changes (which happens with updates), your filter breaks. You're maintaining a hairball of escape sequences when CrowdSec just parses JSON natively.

4. CPU Overhead at Scale

When a DDoS hits or a botnet wakes up, Fail2Ban's Python daemon becomes a bottleneck. Log parsing + regex matching + decision making = CPU spikes. Meanwhile, Go-based CrowdSec handles the same load with a fraction of the resources.

The Solution: CrowdSec (Philosophy & Architecture)

CrowdSec is a complete rethinking of intrusion prevention. It decouples detection from response and introduces collaborative threat intelligence.

Core Principles

1. Collaborative Intelligence (CAPI)

CrowdSec works like this:

Your server's CrowdSec Security Engine analyzes logs and detects suspicious patterns.
When a local scenario triggers, the engine takes a decision locally and sends a signal about the offending IP to the Community API (CAPI).
Once enough independent instances report the same IP, it lands on the Community Blocklist.
Your firewall bouncer pulls this list and drops traffic from those IPs before it reaches your services.

The beauty: You benefit from the collective intelligence of 10,000+ admins. You don't have to wait for your server to be attacked 5 times—you get early warning from the network effect.

2. Decoupled Architecture

Unlike Fail2Ban's monolithic design, CrowdSec separates concerns:

┌──────────────────────────────────────────┐
│   CrowdSec Security Engine (Go)          │
│   - Parses logs                          │
│   - Matches scenarios                    │
│   - Makes decisions                      │
└──────────┬───────────────────────────────┘
           │ (Local API)
      ┌────┴──────────────────────────────┐
      │                                   │
┌─────▼───────────────┐      ┌────────────▼───────────┐
│  Firewall Bouncer   │      │  HTTP Bouncer (WAF)    │
│ (nftables/iptables) │      │  (Layer 7 blocking)    │
└─────────────────────┘      └────────────────────────┘

You decide where to block:

Firewall level (nftables): Fastest, most efficient. Drop packets before they consume resources.
HTTP level (Layer 7): Apply business logic. Block based on request headers, paths, etc.
Application level: Custom responses, logging, rate limiting.

We chose firewall-level blocking (nftables) because it's most efficient for a hardened VPN management server.

3. Scenario-Based Detection (Not Just Counting)

Fail2Ban counts failures. CrowdSec understands context.

Example scenario: HTTP Crawling

A simplified sketch in CrowdSec's scenario syntax (the version shipped on the Hub differs in detail):

type: leaky
name: crowdsecurity/http-crawl-non_statics
description: "Detect aggressive crawl on non static resources"
# Only HTTP access-log events for non-static resources
filter: "evt.Meta.log_type == 'http_access-log' && evt.Parsed.static_ressource == 'false'"
# Count each requested path once
distinct: evt.Parsed.file_name
# One bucket per source IP
groupby: evt.Meta.source_ip
# Overflow once more than 40 events pile up while the bucket leaks one every 0.5s
capacity: 40
leakspeed: 0.5s
# After an overflow, ignore this IP's bucket for one minute
blackhole: 1m
labels:
  remediation: true

The difference:

Fail2Ban: "5 failed SSH attempts = ban"
CrowdSec: "More than 40 distinct non-static URLs requested by one IP faster than the bucket drains = likely crawler. Ban locally, then report the IP to CAPI so other instances benefit."
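
CrowdSec's detection scenarios are built on leaky buckets rather than plain counters. As a rough illustration of that idea (not CrowdSec's actual implementation), the sketch below replays a list of event timestamps through a single bucket. The function name, its arguments, and the output strings are made up for this example.

```shell
#!/usr/bin/env bash
# leaky_bucket CAPACITY LEAK_SECONDS
# Reads ascending event timestamps (seconds, one per line) on stdin.
# Every event pours one unit into the bucket; the bucket drains one unit
# per LEAK_SECONDS. Reports the first event that pushes the level above
# CAPACITY, or "no overflow" if that never happens.
leaky_bucket() {
  awk -v cap="$1" -v leak="$2" '
    NR == 1 { level = 0; last = $1 }
    {
      level -= ($1 - last) / leak      # drain for the time elapsed
      if (level < 0) level = 0
      level += 1                       # pour the current event
      last = $1
      if (level > cap) { print "overflow at event " NR; found = 1; exit }
    }
    END { if (!found) print "no overflow" }
  '
}

# Six events within five seconds against a bucket of capacity 5
# that leaks one event per minute:
printf '%s\n' 0 1 2 3 4 5 | leaky_bucket 5 60
```

The same six events spread two minutes apart never fill the bucket, which is the practical difference from a fixed "N failures" counter: slow, legitimate retries drain away while bursts overflow.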

Our Infrastructure: Netbird + Caddy + CrowdSec

System Overview

Internet Traffic
       ↓
┌──────────────────────────────────────┐
│  nftables (Firewall)                 │
│  ├─ CrowdSec Rules (DROP malicious)  │
│  └─ SSH (Port 2222)                  │
└──────────────────────────────────────┘
       ↓
┌──────────────────────────────────────┐
│  Caddy Reverse Proxy                 │
│  ├─ TLS Termination                  │
│  ├─ JSON Access Logs → CrowdSec      │
│  └─ Reverse Proxy to Netbird (8080)  │
└──────────────────────────────────────┘
       ↓
Netbird VPN Management API

OS & Versions

OS: Ubuntu 24.04 LTS (Noble Numbat)
CrowdSec: v1.6+
Caddy: Latest (built from source or package)
Firewall: nftables (Ubuntu 24.04 default)
Bouncer: crowdsec-firewall-bouncer-nftables

Implementation: The Code

Step 1: Install CrowdSec

# Add repository
curl -s https://install.crowdsec.net | sudo sh
sudo apt update

# Install security engine
sudo apt install -y crowdsec

# Install collections (Linux/SSH base, plus the Caddy parser and HTTP scenarios)
sudo cscli collections install crowdsecurity/linux
sudo cscli collections install crowdsecurity/caddy
sudo systemctl reload crowdsec

Step 2: Configure Caddy for JSON Logging

CrowdSec's Caddy parser expects JSON access logs. In Caddy, access logging is enabled per site with the log directive (the global log option only configures Caddy's own runtime logs), so configure it inside your site block:

netbird.example.com {
    log {
        output file /var/log/caddy/access.log {
            roll_size 100mb
            roll_keep 5
            roll_keep_for 720h
        }
        format json
        level info
    }

    encode gzip
    reverse_proxy localhost:8080 {
        # Caddy already passes Host, X-Forwarded-For and X-Forwarded-Proto upstream
        header_up X-Real-IP {remote_host}
    }
}

Restart Caddy:

sudo systemctl restart caddy

Verify JSON output:

sudo tail -n 5 /var/log/caddy/access.log | jq '.'

Step 3: Configure CrowdSec to Parse Caddy Logs

Create /etc/crowdsec/acquis.d/caddy.yaml:

filenames:
  - /var/log/caddy/access.log
labels:
  type: caddy

Reload CrowdSec:

sudo systemctl reload crowdsec

Verify parsing:

sudo cscli metrics show acquisition

# Expected: a row for file:/var/log/caddy/access.log whose "Lines read" and
# "Lines parsed" counters increase as requests come in

Step 4: Install Firewall Bouncer (nftables)

sudo apt install -y crowdsec-firewall-bouncer-nftables
sudo systemctl enable crowdsec-firewall-bouncer
sudo systemctl start crowdsec-firewall-bouncer

Verify bouncer is registered:

sudo cscli bouncers list

# Expected: one row for the firewall bouncer (the package registers it under an
# auto-generated name) with a valid API key and a recent "Last API pull" time

Step 5: Customize Ban Duration

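By default, CrowdSec bans for 4 hours. We extended it to 48 hours for persistent botnets.

Create /etc/crowdsec/profiles.yaml.local. A profile needs a filter and a list of decisions, and on_success: break keeps the stock 4-hour profile from also applying to the same alert:

name: extended_ip_remediation
filters:
  - Alert.Remediation == true && Alert.GetScope() == "Ip"
decisions:
  - type: ban
    duration: 48h
on_success: break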

Reload:

sudo systemctl reload crowdsec

Results & Metrics

After the migration, here's what we observed:

Metrics

sudo cscli metrics

Output (snapshot):

Acquisition (Logs being read):
  crowdsecurity/caddy-logs:     12,450 lines | 0 parse errors
  crowdsecurity/sshd-logs:       5,230 lines | 0 parse errors

Scenarios (Detection rules):
  crowdsecurity/http-crawl-non_statics:    142 decisions | 28 IPs banned
  crowdsecurity/ssh-bf:                    89 decisions | 15 IPs banned
  crowdsecurity/web-application-attacks:   34 decisions | 8 IPs banned

Bouncers:
  crowdsec-firewall-bouncer-nftables:     112 active bans

Key Findings

99% Reduction in Log Noise: Before CrowdSec, /var/log/auth.log filled 2GB per day (SSH probes). Now: 20MB per day. Why? IPs are blocked at the firewall level—the packets never reach sshd.
Community Blocklist Efficiency: Of 112 active bans, 95+ were from the community blocklist. We never saw the initial attack; CrowdSec's CAPI blocked it preemptively.
Caddy JSON Parsing: Zero failed parses. CrowdSec handled log format updates seamlessly (JSON is self-describing).
CPU Impact: CrowdSec Security Engine consistently ~2-5% CPU. Caddy logs parsed in real-time without overhead.
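
The 99% figure follows directly from the two log volumes quoted above. A minimal sketch of that arithmetic, assuming decimal units (2 GB = 2000 MB); the helper name is hypothetical:

```shell
#!/usr/bin/env bash
# pct_reduction BEFORE AFTER -> percentage reduction with one decimal place
pct_reduction() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

# auth.log volume in MB per day, before and after the migration
pct_reduction 2000 20
```

The same helper can be pointed at any before/after pair, for example daily line counts from auth.log, to check whether a later change moved the number.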

Operational Insights

Monitoring & Debugging

Check active bans:

sudo cscli decisions list

# Output:
# Duration │ Scope │ Value           │ Decision │ Reason
# 48h      │ ip    │ 192.0.2.100     │ ban      │ crowdsecurity/http-crawl-non_statics
# 48h      │ ip    │ 198.51.100.42   │ ban      │ crowdsecurity/ssh-bf

View alerts (why decisions were made):

sudo cscli alerts list --ip 192.0.2.100

# Output:
# Alert ID: 4521
# Start Time: 2025-12-31T10:15:30Z
# End Time: 2025-12-31T10:20:45Z
# Scenario: crowdsecurity/http-crawl-non_statics
# Events Count: 145
# Remediation: ban for 48h

Live nftables monitoring:

# Watch ruleset and set updates as the bouncer adds and removes IPs
sudo nft monitor

# Or inspect the rules and sets the bouncer created
sudo nft list ruleset | grep -i -A 10 "crowdsec"

# Expect a set of banned addresses plus a chain whose drop rule references that set.

Lessons Learned

Community Blocklist is worth its weight in gold. In our snapshot, roughly 85% of active bans (95 of 112) came from the community blocklist, so those IPs were blocked before they ever touched our infrastructure.
JSON logging is non-negotiable. If you're using a modern web server (Caddy, Nginx with JSON output, etc.), do yourself a favor and enable it. Regex-based parsing is yesterday's technology.
Go > Python for performance. CrowdSec's Go engine is fast enough that you can parse 10,000+ log lines per second on a modest server. Fail2Ban would choke.
Bouncers are flexible. We chose nftables, but CrowdSec supports HTTP bouncers (Layer 7), Nginx modules, cloud API integrations (Cloudflare, AWS), and more. Pick what fits your architecture.

Potential Pitfalls & Solutions

Issue: Bouncer Not Authenticating

Symptom: systemctl status crowdsec-firewall-bouncer shows the service as failed, or its log reports authentication errors against the Local API.

Solution:

# Regenerate credentials
sudo apt reinstall -y crowdsec-firewall-bouncer-nftables

# Restart both
sudo systemctl restart crowdsec
sudo systemctl restart crowdsec-firewall-bouncer

# Verify
sudo cscli bouncers list

Issue: No Decisions Being Made

Symptom: cscli decisions list returns empty.

Solution:

Verify logs are being read:

   sudo cscli metrics show acquisition

If counts are flat, CrowdSec isn't reading logs.

Check file permissions:

   ls -la /var/log/caddy/access.log
   # the user CrowdSec runs as (root with the default package install) must be able to read this file

Reload CrowdSec:

   sudo systemctl reload crowdsec

Issue: False Positives (Legitimate Traffic Blocked)

Symptom: Users report access denied, but they're legitimate.

Solution:

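Remove the active ban first:

   sudo cscli decisions delete --ip 203.0.113.99

Then whitelist the address so it is not banned again. Create /etc/crowdsec/parsers/s02-enrich/local-whitelist.yaml (the file name and the name field are our choice) and reload CrowdSec:

   name: local/trusted-ips
   description: "Trusted addresses that must never be banned"
   whitelist:
     reason: "known legitimate client"
     ip:
       - "203.0.113.99"

Or put a specific scenario into simulation mode temporarily, so it still raises alerts but no longer bans:

   sudo cscli simulation enable crowdsecurity/http-crawl-non_statics
   sudo systemctl reload crowdsec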

Conclusions & Recommendations

Why We Recommend CrowdSec for Production

Security Posture: Preventive > reactive. You're protected by the collective intelligence of 10,000+ instances.
Operational Simplicity: JSON parsing, decoupled bouncers, rich dashboards.
Performance: Go-based engine, minimal CPU overhead, scales to 10,000+ rules.
Transparency: Open-source, community-driven, audit-friendly.

Next Steps

Automate backups of /etc/crowdsec/ for disaster recovery.
Set up dashboards at console.crowdsec.net to visualize threats across your fleet.
Enable notifications (Slack, email) for critical alerts.
Fine-tune scenarios by adjusting thresholds and ban durations for your use case.
Integrate with your SIEM (ELK, Splunk, etc.) for centralized logging.

Beyond `apt upgrade`: Automating Linux Hardening for Public Sector Workloads

patrickbloem-it — Wed, 31 Dec 2025 06:28:31 +0000

The Myth of the "Secure Default"

There is a prevalent misconception in public sector IT that deploying an LTS release of Ubuntu or Debian implies a baseline of security. It does not. It implies stability, not hardening.

A standard cloud image is designed for compatibility and onboarding friction reduction. It is engineered to ensure that ssh root@<ip> works immediately. Conversely, a BSI-compliant or CIS-hardened system is designed for isolation and auditability. These two design philosophies are mutually exclusive.

In regulated environments—specifically under BSI IT-Grundschutz (SYS.1.3) or GDPR Art. 32 requirements—manual hardening is an anti-pattern. If you are editing /etc/ssh/sshd_config by hand in 2025, you have already failed the audit. You cannot prove consistency across 50 nodes if your configuration method relies on human memory.

This article outlines an architectural approach to automated, idempotent server hardening, moving beyond simple package updates to systemic attack surface reduction.

The Compliance Gap

When we deploy a fresh Debian 12 or Ubuntu 24.04 image, we inherit technical debt immediately. Let's look at the delta between a "Fresh Install" and a "Compliance-Ready" state:

Component	Default State	Required State (CIS/BSI)	The Risk
SSH	Port 22, Password Auth	Port 2222 (obscurity), Key-Only, Crypto Policies	Brute-force botnets, Credential Stuffing
Kernel	IPv4 Forwarding disabled (mostly), ICMP Redirects enabled	`accept_redirects=0`, `dmesg_restrict=1`, `bpf_jit_harden=2`	MITM, Kernel Pointer Leaks, eBPF exploits
Audit	`auditd` package often missing	Rules for `execve`, `passwd`, `sudo`	No forensic trail for privilege escalation
FS	`/tmp` executable	`noexec`, `nosuid`, `nodev` on tmpfs	Malware execution in world-writable dirs

Architecture of an Automated Hardening Pipeline

We do not write "scripts". We write state enforcement modules. Whether you use Ansible, Salt, or a bootstrap shell framework, the logic remains identical.

The repository hardened-vps-bootstrap (linked below) implements this logic in pure Bash to remain dependency-free on air-gapped systems.

1. SSH: Crypto Policy and Obscurity

Changing the SSH port is controversial. Purists argue it is "Security by Obscurity". In practice, moving SSH to port 2222 (or higher) reduces log noise by approximately 99%. This is not about hiding from a targeted attacker; it is about reducing the signal-to-noise ratio so your SIEM can actually detect the targeted attacker.

The Implementation:

# Force post-quantum key exchange and high-security ciphers.
# Note: sshd uses the first value it finds for each keyword, so make sure
# none of these are already set earlier in the file.
echo "Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com" >> /etc/ssh/sshd_config
echo "KexAlgorithms sntrup761x25519-sha512@openssh.com,curve25519-sha256" >> /etc/ssh/sshd_config
echo "MACs hmac-sha2-512-etm@openssh.com" >> /etc/ssh/sshd_config

# Disable legacy auth (-E makes "#?" match an optional leading comment sign)
sed -i -E 's/^#?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sed -i -E 's/^#?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config

We explicitly disable PasswordAuthentication. Relying on weak passwords in an era of GPU-accelerated cracking clusters is negligence.
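
Appending with echo adds another copy of the line on every run, which works against the idempotency this article argues for. One way to express "set this option exactly once" is sketched below. The function name is hypothetical, it relies on GNU sed (-i, -E), and it is an illustration rather than the code from the repository.

```shell
#!/usr/bin/env bash
# set_option FILE KEY VALUE
# If KEY is already present (commented out or not), rewrite that line in
# place; otherwise append it. Running it again leaves the file unchanged.
set_option() {
  local file="$1" key="$2" value="$3"
  if grep -qE "^[#[:space:]]*${key}[[:space:]]" "$file"; then
    sed -i -E "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
  else
    printf '%s %s\n' "$key" "$value" >> "$file"
  fi
}

# Usage (as root), followed by a syntax check before reloading sshd:
#   set_option /etc/ssh/sshd_config PasswordAuthentication no
#   set_option /etc/ssh/sshd_config PermitRootLogin no
#   sshd -t
```

Running sshd -t before the reload is worth the extra line: a typo in a cipher name otherwise only shows up when the daemon refuses to start.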

2. Kernel Hardening: The Silent Layer

The kernel network stack is permissive by default. We need to lock down ICMP handling and memory access.

Key Sysctl Parameters:

net.ipv4.tcp_syncookies = 1: Essential protection against SYN flood DoS attacks.
net.ipv4.conf.all.accept_redirects = 0: Prevents a rogue router on the same subnet from manipulating routing tables.
kernel.dmesg_restrict = 1: Prevents unprivileged users from viewing the kernel ring buffer (dmesg), which can leak memory addresses useful for exploit development (ASLR bypass).
kernel.unprivileged_bpf_disabled = 1: Disables unprivileged eBPF usage. Recent kernel vulnerabilities often leverage eBPF; if your web app doesn't need it, disable it.
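
These four parameters can be enforced as state rather than typed by hand. The sketch below writes them to a drop-in file only when the content differs, so repeated runs report "unchanged". The function names and the drop-in file name are assumptions for illustration, not taken from the repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Desired state: the four parameters listed above.
render_sysctl_dropin() {
  cat <<'EOF'
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.accept_redirects = 0
kernel.dmesg_restrict = 1
kernel.unprivileged_bpf_disabled = 1
EOF
}

# apply_dropin TARGET
# Writes the drop-in only if TARGET is missing or differs from the desired
# state. Prints "changed" or "unchanged" so callers can log drift.
apply_dropin() {
  local target="$1" tmp
  tmp="$(mktemp)"
  render_sysctl_dropin > "$tmp"
  if [ -f "$target" ] && cmp -s "$tmp" "$target"; then
    rm -f "$tmp"
    echo "unchanged"
  else
    install -m 0644 "$tmp" "$target"
    rm -f "$tmp"
    echo "changed"
  fi
}

# Usage (as root; the file name is a hypothetical choice):
#   apply_dropin /etc/sysctl.d/99-hardening.conf && sysctl --system
```

Logging the "changed" result on a node that was supposedly already hardened is a cheap drift signal for an audit trail.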

3. Audit Trails: The "Flight Recorder"

Installing auditd is useless without rules. Standard rulesets often miss the critical vector: Execution.

We need to know what commands were run, not just who logged in.

# /etc/audit/rules.d/exec.rules
# Capture all command executions (execve) by real login users.
# auid is the login UID and survives sudo/su; 4294967295 means "unset" (daemons).
-a always,exit -F arch=b64 -S execve -F auid>=1000 -F auid!=4294967295 -k audit_cmd

This ensures that if an attacker manages to run ./exploit.sh, the execution event—including arguments—is logged to /var/log/audit/audit.log.

Automation vs. Documentation

A runbook is dead the moment it is written. Code is alive.

By encapsulating these hardening steps into a repository, we achieve:

Idempotency: Re-running the script enforces the state again (correcting drift).
Version Control: We can trace when we decided to disable UsePAM via Git commit history.
Speed: Mean Time To Recover (MTTR) drops significantly when server provisioning is automated.

The "Hardened VPS Bootstrap" Repository

I have open-sourced the internal framework I use for public sector infrastructure projects. It is designed to be:

Minimal: No Python/Ruby dependencies.
Modular: Enable/Disable features via flags.
Audit-Ready: Logs every change it makes.

It covers SSH, Sysctl, Fail2Ban/CrowdSec, UFW, and Auto-Updates.

👉 GitHub Repository: patrick-bloem/hardened-vps-bootstrap

Final Thoughts

Security is not a product; it is a configuration state. Standard Linux distributions prioritize the "Out of the Box" experience. As infrastructure engineers, our job is to pivot that priority towards "Secure by Design".

Stop trusting the defaults. Verify your sysctls. Automate your hardening.

About the Author
Patrick Bloem is a Senior Infrastructure Engineer specializing in BSI-compliant Linux environments, ZFS storage solutions, and network segregation in the public sector.

Self-Hosting Netbird: A Privacy-First Alternative to Managed Overlay Networks

patrickbloem-it — Tue, 30 Dec 2025 08:10:11 +0000

As infrastructure engineers in regulated environments, we often face a dilemma: modern overlay network solutions like Tailscale and Twingate offer excellent user experience, but their centralized control planes raise compliance concerns. For organizations operating under strict data governance frameworks (GDPR, BSI IT-Grundschutz, sector-specific regulations), self-hosted alternatives become a necessity rather than a preference.

This article documents a production-ready deployment of Netbird, an open-source WireGuard-based overlay network with a self-hosted management server, hardened using CrowdSec for threat detection.

Why Self-Hosted Overlay Networks?

The Managed Service Trade-Off

Managed solutions like Tailscale and Twingate provide:

Zero-configuration deployment
Automatic NAT traversal
Centralized access control
Enterprise SSO integration

However, they introduce architectural dependencies:

Aspect	Managed (Tailscale/Twingate)	Self-Hosted (Netbird)
Control Plane Location	US/Cloud Provider	On-premises/Private VPS
Metadata Exposure	Connection logs, peer IPs visible to provider	Fully isolated
Data Sovereignty	Dependent on provider's infrastructure	Complete control
Vendor Lock-In	Proprietary coordination protocol	Open protocol (WireGuard)
Audit Trail	Provider-controlled	Self-managed

For public sector entities or organizations handling sensitive data, the control plane location becomes a compliance blocker.

Netbird Architecture Overview

Netbird decouples the control plane from the data plane:

┌─────────────────────────────────────────┐
│ Management Server (Self-Hosted)         │
│ ├─ Peer Registration & Authentication   │
│ ├─ Network Policy Distribution          │
│ └─ STUN/TURN Coordination (Coturn)      │
└─────────────────────────────────────────┘
                    ↓ (Metadata only)
┌─────────────────────────────────────────┐
│ Peer-to-Peer WireGuard Tunnels          │
│ (Direct connections, no relay)          │
└─────────────────────────────────────────┘

Key Properties:

No traffic routing through management server: After initial coordination, peers establish direct WireGuard tunnels.
STUN/TURN fallback: Only used when direct connections fail (corporate firewalls, symmetric NAT).
Identity Provider integration: Uses OIDC (OpenID Connect) for authentication—works with Zitadel, Keycloak, Authentik.

Security Hardening with CrowdSec

Standard Netbird deployments expose management APIs and STUN/TURN services to the internet. To mitigate brute-force attacks and resource exhaustion, we integrate CrowdSec, a collaborative intrusion prevention system.

CrowdSec Integration Benefits

Custom Log Parsing: Netbird's Go-based logging requires custom Grok patterns to detect authentication failures.
Behavioral Analysis: Leaky bucket scenarios identify repeated failed peer login attempts.
Firewall Enforcement: Direct iptables/nftables integration blocks malicious IPs before they reach application logic.
Community Intelligence: Shares threat data with CrowdSec's global blocklist (opt-in).

Example Scenario: Detect 5+ Netbird peer authentication failures within 30 minutes → ban source IP for 48 hours.

Deployment Architecture

The stack I maintain consists of:

Caddy: Reverse proxy with automatic TLS (Let's Encrypt).
Netbird Management: Peer coordination, policy enforcement.
Zitadel: Self-hosted OIDC identity provider (can be replaced with Authentik/Keycloak).
Coturn: STUN/TURN server for NAT traversal (network-isolated via explicit port bindings).
CrowdSec + Firewall Bouncer: Real-time threat blocking.

Configuration Philosophy:

Avoid network_mode: host for service isolation.
Use explicit IPv4/IPv6 port bindings instead of wildcard listeners.
Log rotation limits (100MB per container) to prevent disk exhaustion.

The full Docker Compose configuration is available in my repository:

→ Netbird-self-hosted-stack on GitHub

Coturn Hardening: A Critical Detail

Many Netbird guides recommend deploying Coturn with network_mode: host for simplicity. This bypasses Docker's network isolation and exposes the host directly.

Our approach: Explicit port binding to public IP addresses only.

coturn:
  image: coturn/coturn:latest
  networks: [netbird]
  ports:
    - '${PUBLIC_IP}:3478:3478/udp'
    - '${PUBLIC_IP}:3478:3478/tcp'
    - '${PUBLIC_IP}:5349:5349/tcp'
    - '${PUBLIC_IP}:49152-65535:49152-65535/udp'
  volumes:
    - ./config/turnserver.conf:/etc/turnserver.conf:ro

Impact:

Container remains within Docker's bridge network.
CrowdSec firewall rules apply uniformly across all services.
No accidental exposure of host services on ephemeral ports.

CrowdSec Custom Parsers

Netbird's log format isn't covered by the default CrowdSec parsers, so custom Grok patterns are required.

Example: Netbird Management Authentication Failure Parser

# /etc/crowdsec/parsers/s01-parse/netbird-auth.yaml
# Pass successfully parsed events on to the enrichment stage
onsuccess: next_stage
filter: "evt.Parsed.program == 'netbird-management'"
name: patrickbloem/netbird-auth-parser
nodes:
  - grok:
      pattern: '%{TIMESTAMP_ISO8601:timestamp}.*failed logging in peer %{DATA:peer_id}.*: %{GREEDYDATA:failure_reason}'
      apply_on: message
statics:
  # Tag the event so the scenario below can select it
  - meta: log_type
    value: netbird_auth_failure

Corresponding Scenario:

# /etc/crowdsec/scenarios/netbird-brute-force.yaml
type: leaky
name: patrickbloem/netbird-auth-brute-force
filter: "evt.Meta.log_type == 'netbird_auth_failure'"
leakspeed: "30m"
capacity: 5
labels:
  remediation: true

Full parser configurations are included in the repository.

Operational Results

After deploying this stack on a Hetzner VPS (Ubuntu 24.04 LTS):

Resource Usage: 0.12 load average, ~12% RAM utilization.
Threat Mitigation: ~28,000 IPs blocked via CrowdSec (CAPI + local decisions).
SSH Attack Reduction: Changing SSH to port 2222 + CrowdSec reduced attacks from ~40/day to <1/day.
Zero False Positives: No legitimate Netbird clients blocked after 3 months of operation.

Comparison: Netbird vs. Tailscale vs. Twingate

Feature	Netbird (Self-Hosted)	Tailscale	Twingate
Data Plane	Direct WireGuard	Direct WireGuard	Relay-based (no P2P)
Control Plane Location	Self-hosted	Tailscale Inc. (US)	Twingate Inc. (US)
Authentication	Self-hosted OIDC	Tailscale SSO	Twingate SSO
Metadata Visibility	Zero (internal only)	Provider has access	Provider has access
Cost (10 users)	VPS cost (~€5/month)	Free tier	Starts at $5/user/month
Audit Compliance	Full control	Trust-based	Trust-based
Custom Policies	ACL rules (JSON)	ACL tags	Application policies

When to Choose Self-Hosting

Self-hosting Netbird makes sense when:

Regulatory Compliance: You operate under GDPR, HIPAA, or government-specific frameworks requiring data sovereignty.
Zero Trust Architecture: You need proof that no third party can access connection metadata.
Audit Requirements: Internal security audits demand full control over authentication logs.
Cost Optimization: You already maintain infrastructure (VPS/on-prem servers) and can absorb operational overhead.

When managed services are better:

Small teams (<20 users) without dedicated DevOps resources.
Organizations comfortable with US-based SaaS providers.
Environments requiring enterprise support SLAs.

Getting Started

The deployment process is documented in the repository README:

Clone the repository:

git clone https://github.com/patrickbloem-it/Netbird-self-hosted-stack.git
cd Netbird-self-hosted-stack

Initialize directory structure:

chmod +x init.sh
./init.sh

Configure environment variables:

cp .env.example .env
nano .env # Set DOMAIN, PUBLIC_IP, secrets

Deploy the stack:

docker compose up -d

Generate CrowdSec bouncer key:

docker compose exec crowdsec cscli bouncers add firewall-bouncer

# Add key to .env as CROWDSEC_BOUNCER_KEY, then recreate the bouncer
docker compose up -d --force-recreate cs-firewall-bouncer
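
Before deploying, it can help to confirm that .env actually defines the values the steps above mention. A small sketch, assuming the keys are named DOMAIN, PUBLIC_IP and CROWDSEC_BOUNCER_KEY as in the text. The function name is hypothetical, and the real .env.example may contain more keys than these three.

```shell
#!/usr/bin/env bash
# missing_env_keys FILE KEY...
# Prints every KEY that is absent from FILE or set to an empty value.
missing_env_keys() {
  local file="$1" key
  shift
  for key in "$@"; do
    grep -qE "^${key}=.+" "$file" || echo "$key"
  done
}

# Usage:
#   missing_env_keys .env DOMAIN PUBLIC_IP CROWDSEC_BOUNCER_KEY
# No output means all three keys have a value.
```

An empty CROWDSEC_BOUNCER_KEY is the most likely miss, since that key only exists after the bouncer has been registered.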

Conclusion

Self-hosting Netbird provides a viable path to compliant overlay networking without sacrificing usability. The integration with CrowdSec demonstrates that security hardening can be achieved without custom application code—by leveraging log-based threat detection at the infrastructure layer.

For organizations where data sovereignty is non-negotiable, this stack offers a production-ready alternative to managed services.

Repository: github.com/patrickbloem-it/Netbird-self-hosted-stack

This article reflects lessons learned from deploying overlay networks in public sector environments. Configurations are provided as-is for educational purposes and should be reviewed against your organization's specific security policies before production deployment.

Cost-Effective Disaster Recovery: Managing ZFS Snapshots on Proxmox VE

patrickbloem-it — Tue, 30 Dec 2025 07:31:13 +0000

Why Simple is Better for Public Sector IT

In my daily work as an Infrastructure Engineer in the public sector, I often face a common dilemma: We need enterprise-grade data integrity and auditability, but we don't always have the budget for high-end backup appliances.

Complexity is the enemy of security. That's why I prefer leveraging the native capabilities of ZFS directly on our Proxmox VE hosts, rather than adding layers of third-party software that might introduce new vulnerabilities or compliance issues.

The Challenge: Automated Snapshots without Bloat

I needed a way to:

Create recurring snapshots of critical VMs and containers
Rotate them automatically (GFS: hourly, daily, weekly retention)
Have zero external dependencies (just a shell script)
Be fully auditable via syslog integration
Fail gracefully if a dataset is unavailable

The Solution

I wrote a lightweight wrapper script around the native zfs command. It's designed to be run as a cron job on any Debian-based Proxmox node.

Core Features

Here's the production-ready version with proper error handling:

#!/bin/bash
#
# ZFS Snapshot Manager for Proxmox VE
# Retention: 24 hourly, 7 daily, 4 weekly
# Author: Patrick Bloem

set -euo pipefail # Exit on error, undefined vars, pipe failures

# Configuration
DATASET="${1:-rpool/data}"
LOG_FACILITY="local0"
TIMESTAMP=$(date +"%Y%m%d-%H%M%S")
RETENTION_HOURLY=24
RETENTION_DAILY=7   # reserved for the daily variant; not used by this hourly script

# Logging function
log() {
    logger -t "zfs-snapshot" -p "${LOG_FACILITY}.info" "$1"
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $1"
}

error_exit() {
    logger -t "zfs-snapshot" -p "${LOG_FACILITY}.err" "ERROR: $1"
    echo "ERROR: $1" >&2
    exit 1
}

# Verify dataset exists
if ! zfs list -H -o name "$DATASET" &>/dev/null; then
    error_exit "Dataset $DATASET not found"
fi

# Create snapshot (recursive: child datasets get a snapshot with the same name)
SNAPSHOT_NAME="${DATASET}@auto-hourly-${TIMESTAMP}"
log "Creating snapshot: $SNAPSHOT_NAME"

if ! zfs snapshot -r "$SNAPSHOT_NAME" 2>&1 | logger -t "zfs-snapshot"; then
    error_exit "Failed to create snapshot $SNAPSHOT_NAME"
fi

# Prune old hourly snapshots (keep last N)
log "Pruning old snapshots (keeping last $RETENTION_HOURLY hourly)"

# The list is sorted oldest first, so "head -n -N" (GNU coreutils) drops the
# newest N names and leaves only the snapshots to destroy.
zfs list -H -t snapshot -o name -s creation \
    | grep "^${DATASET}@auto-hourly-" \
    | head -n -"$RETENTION_HOURLY" \
    | while read -r old_snap; do
        log "Destroying old snapshot: $old_snap"
        # -r also removes the same-named snapshots created on child datasets
        zfs destroy -r "$old_snap" || log "Warning: Could not destroy $old_snap"
    done

log "Snapshot rotation completed successfully"

Deployment

Add this to your crontab for hourly execution:

# Run every hour at minute 5
5 * * * * /usr/local/bin/zfs-snapshot-manager.sh rpool/data 2>&1 | logger -t zfs-snapshot

For daily/weekly snapshots, create separate scripts with adjusted retention policies or use a tag-based approach.
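
One way to approach the tag-based variant is to separate "which snapshots fall outside the retention window" from the zfs calls, so the selection can be tested without a pool. A minimal sketch, assuming snapshot names follow the dataset@auto-TAG-timestamp convention used above. The function name is hypothetical and head -n -N requires GNU coreutils.

```shell
#!/usr/bin/env bash
# prunable DATASET TAG KEEP
# Reads snapshot names (oldest first) on stdin and prints those that fall
# outside the retention window for TAG. Snapshots of child datasets and of
# other tags are ignored because only an exact prefix match is accepted.
prunable() {
  local dataset="$1" tag="$2" keep="$3"
  awk -v prefix="${dataset}@auto-${tag}-" 'index($0, prefix) == 1' \
    | head -n "-${keep}"
}

# Usage on a Proxmox node (destroys snapshots, so try it without the
# while loop first):
#   zfs list -H -t snapshot -o name -s creation \
#     | prunable rpool/data daily 7 \
#     | while read -r snap; do zfs destroy -r "$snap"; done
```

With this split, a single script can take the tag and retention count as arguments instead of being copied once per schedule.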

Why This Works for Compliance

Auditability: All actions are logged to syslog, which can be forwarded to a central SIEM
Atomicity: ZFS snapshots are atomic and crash-consistent
Transparency: No proprietary tools; every action is traceable via zfs list -t snapshot
Idempotency: Safe to run multiple times (won't create duplicates due to timestamp)

Advanced: Integration with Offsite Replication

In production, I combine this with zfs send/recv for offsite replication:

# Example: Replicate to remote NAS
# Anchor the match so a child dataset's snapshot is never picked as "latest"
LATEST_SNAP=$(zfs list -H -t snapshot -o name -s creation | grep "^rpool/data@auto-hourly-" | tail -1)
zfs send -R "$LATEST_SNAP" | ssh backup-host "zfs recv -F backup/proxmox"

This gives us:

Recovery Point Objective (RPO): 1 hour
Recovery Time Objective (RTO): Minutes (just mount the dataset)

Get the Full Script

I've published the full, hardened version of this tool on GitHub with additional features:

Multi-dataset support
Configurable retention policies
Integration with Prometheus for monitoring

👉 Check it out here: proxmox-zfs-snapshot-manager on GitHub

Feel free to fork it or suggest improvements. In the public sector, sharing reliable, open-source tooling is the best way to ensure we all build more resilient infrastructure.

About me: I'm Patrick, a Senior Infrastructure Engineer focusing on Linux hardening and virtualization in the public sector. Connect with me on LinkedIn or check out my other projects on GitHub.
