Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Vibe Coding: From Hell to Heaven in One Insight

1
Comments 1
3 min read
Beyond Scheduling: How Kubernetes Uses QoS, Priority, and Scoring to Keep Your Cluster Balanced

Comments
4 min read
When AI Writes Your Code, DevOps Becomes the Last Line of Defense

4
Comments
4 min read
Map a Kubernetes cluster with one command

Comments
1 min read
AWS SRE's First Day with GCP: 7 Surprising Differences

Comments 3
6 min read
After the Google SRE Interview: Deconstructing the 'Hire' vs. 'No Hire' Debrief

Comments
3 min read
The Hidden Cost of Adding Just One More Feature

1
Comments
5 min read
Embracing AIOps: The Intelligent Evolution of DevOps in December 2025

5
Comments
2 min read
From 400 Alerts/Night to 8: The SRE Playbook That Saved My Team’s Sanity

Comments
3 min read
USRE: Unifying DevOps, SRE, Security & Compliance for the Next Generation of SaaS

Comments
7 min read
A Complete Production-Ready Checklist for Smooth, Safe Deployments

1
Comments
1 min read
Utility Sector Outage Prep with Load Tests

Comments
8 min read
Bash Scripting for Non-Coders

Comments
37 min read
Celery + SQS: Stop Broken Workers from Monopolizing Your Queue with Circuit Breakers

Comments
2 min read
From Signals to Reliability: SLOs, Runbooks and Post-Mortems

Comments
13 min read