Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Sentrix: An AI SRE Copilot That Debates Its Own Scaling Decisions

1
Comments
2 min read
Why Your Chaos Experiments Are Probably Wasting Time (and How to Fix It)

3
Comments 2
3 min read
Why AI SRE tools don't work (and what we're doing differently)

4
Comments 2
4 min read
How a 2% Latency Spike Collapses a 20-Service System and How to Prevent It

1
Comments
3 min read
Your Retry Config is Wrong (And So Was Mine)

1
Comments
8 min read
Linux Privileges: Peeling Back the Curtain Of How Linux Really Handles Users, Privileges, and Processes

4
Comments
5 min read
Observability and Failure Recovery in Distributed Financial Systems: When Correct Systems Still Break

1
Comments
5 min read
Throw a Prompt at your IDE and see it get done!

3
Comments
1 min read
Docker Monitoring Without a Platform: docker stats + cgroups (DevOps)

Comments
3 min read
Introducing the Zen of DevOps

16
Comments
7 min read
Systems That Don’t Gaslight You: Engineering for Clarity Under Failure

1
Comments
5 min read
Complexity Is a Liability (Until It Isn't)

1
Comments
12 min read
Automation Scales Decisions, Not Understanding

1
Comments
9 min read
Documentation That Works When Everything Breaks

1
Comments
5 min read
The Architecture Drift Nobody Measures

2
Comments 2
9 min read