Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

The Future of SRE: Why AI is the "Force Multiplier" Your Infrastructure Needs

Comments
3 min read
CPU Limits in Kubernetes: Mostly Harmful, Occasionally Essential

Comments
3 min read
Stop Guessing: Using Error Budgets to Drive Engineering Decisions

Comments
1 min read
The Hidden Failure Pattern Behind the AWS, Azure and Cloudflare Outages of 2025

Comments
3 min read
Fixing Prometheus namespace monitoring

Comments 1
2 min read
I Reverse-Engineered the Google SRE "NALS" Interview (Here is the Flowchart)

Comments
4 min read
Vibe Coding: From Hell to Heaven in One Insight

1
Comments 1
3 min read
Beyond Scheduling: How Kubernetes Uses QoS, Priority, and Scoring to Keep Your Cluster Balanced

Comments
4 min read
When AI Writes Your Code, DevOps Becomes the Last Line of Defense

4
Comments
4 min read
AWS SRE's First Day with GCP: 7 Surprising Differences

Comments 3
6 min read
After the Google SRE Interview: Deconstructing the 'Hire' vs. 'No Hire' Debrief

Comments
3 min read
The Hidden Cost of Adding Just One More Feature

1
Comments
5 min read
Introduction to System Design: A Beginner's Guide

2
Comments
4 min read
Embracing AIOps: The Intelligent Evolution of DevOps in December 2025

5
Comments
2 min read
From 400 Alerts/Night to 8: The SRE Playbook That Saved My Team's Sanity

Comments
3 min read