Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Rightsizing Kubernetes Requests with the In-Place Vertical Pod Autoscaler

2
Comments
3 min read
A Complete Production-Ready Checklist for Smooth, Safe Deployments

1
Comments
1 min read
USRE: Unifying DevOps, SRE, Security & Compliance for the Next Generation of SaaS

Comments
7 min read
Utility Sector Outage Prep with Load Tests

Comments
8 min read
Bash Scripting for Non-Coders

Comments
37 min read
Celery + SQS: Stop Broken Workers from Monopolizing Your Queue with Circuit Breakers

Comments
2 min read
From Signals to Reliability: SLOs, Runbooks and Post-Mortems

Comments
13 min read
The Lie of the Global Average: Why Taming Complex SLIs Requires Bucketing

2
Comments
6 min read
A practical guide to observability TCO and cost reduction

11
Comments
13 min read
EKS Standard vs. EKS Auto Mode: The Evolutionary Leap in Kubernetes Operations

8
Comments
6 min read
🏗️ Building the Platform That Empowers Reliability by Design

Comments
3 min read
Modern CTO Podcast: The AI SRE Hype and How to Get it Right

Comments
1 min read
How to reduce on-call friction using AI Voice Agent

1
Comments
4 min read
Tools for Working with Hundreds of Servers

Comments
1 min read
The Samurai Server: Why "Heroic" Systems Always Die

Comments
4 min read