Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

The Golden Signals: A Practical Implementation Guide

Comments
2 min read
Beyond Logs: Implementing Tracing and Golden Signals for Distributed Systems

5
Comments
2 min read
⚔️ Kubernetes Civil War: When VPA Fights the Scheduler (And Your Pods Pay the Price)

Comments
6 min read
The Only Prometheus Metrics I Actually Alert On

Comments
7 min read
🧠 The Hidden Brain of Kubernetes: How Pod Scheduling Really Works (And Why It's Smarter Than You Think)

Comments
4 min read
Transitioning to SRE at FAANG: Strategic Interview Prep and Skill Alignment for Experienced Software Engineers

Comments
15 min read
Your AI workload is not your infrastructure’s problem. Until it is.

Comments
4 min read
Agent SRE — SLOs, Error Budgets, and Circuit Breakers for AI Agents

Comments
5 min read
Disk Has Space But Can't Create Files? (Linux Inode Exhaustion)

1
Comments
3 min read
Incident response / On-call: hardening & best practices for secret rotation (symptoms, causes, fixes)

Comments
3 min read
Why AI and Automation Are Not Always the Right Answer in DevOps

Comments
3 min read
Your on-call engineer just got paged. Here's what happens to the postmortem.

Comments
2 min read
Why On-Call Burnout Is an Onboarding Problem (and You Probably Don't See It)

Comments
1 min read
How Architecture Leaves Fingerprints in Latency Data

Comments
2 min read
Incident Management: Building Effective On-Call Rotations and Runbooks

Comments
2 min read