Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

SREday SF 2025: Human Centered SRE In An AI World

Comments
7 min read
Why S3, NFS, and EFS Are Not Block Storage

Comments
2 min read
⚙️ 7 AI-Powered Prompts That Supercharge Your Terraform Workflow

1
Comments
3 min read
Your Observability Bill Just Hit $1M—Here's Why Telemetry Pipelines Aren't Optional Anymore

3
Comments
2 min read
Crash Dumps in Linux Kernel & Application Deep Dive

2
Comments
3 min read
Building a Modern Network Observability Stack: Combining Prometheus, Grafana, and Loki for Deep Insight

Comments
6 min read
The Silent Co-Pilot: How AI is redefining the Network and the Network Engineer

Comments
5 min read
VMware Snapshots Explained: Internals, Pitfalls, and Deep Dive into Base + Delta Mechanics

6
Comments
4 min read
The Real State of Helm Chart Reliability (2025): Hidden Risks in 100+ Open‑Source Charts

Comments
23 min read
Thoughts on SLA

3
Comments
3 min read
Our Status Page Lied to Us: 7 Steps to Building a Communication Platform Customers Actually Trust

2
Comments
9 min read
Stop Losing Launches to “Tiny Bugs”: 7 Engineering Principles Every PM Should Know

Comments
2 min read
How to Become an SRE Engineer

Comments
9 min read
The Cost of Confusing SRE, DevOps, and Platform Engineering

Comments
4 min read
Constraints and creativity: Partial rollout feature without a server component

Comments
3 min read
Implementing Graceful Shutdown in Go

3
Comments
2 min read
The 3 Commands That Turn Chaos into Clarity in DevOps

2
Comments
4 min read
How We Built AI That Prevents Cloud Incidents Before They Happen

Comments
2 min read
Mastering LVM: From Basics to Advanced Migration, Backup & Recovery

1
Comments
6 min read
Microservices and the Myth of Fault Isolation

Comments
3 min read
Importance of Graceful Shutdown in Kubernetes

Comments
7 min read
The Hidden Cost of AI in SRE: Why Automation Hasn’t Fixed Burnout

1
Comments
2 min read
The Merge Queue Scaling Problem Every Growing Team Hits

Comments
1 min read
Breaking Things on Purpose: What I Learned from Netflix’s Chaos Monkey

8
Comments 4
2 min read
🔐Raise your hand if you use SSH every day without actually knowing what it does. Yeah, me too😁 you’re definitely not alone.

9
Comments 3
4 min read