Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Did OpenTelemetry deliver on its promise in 2023?

9 min read
Telemetry Debt Is Not “Missing Logs” — It’s Missing Proof

6 min read
The Old Guard vs. The New Way: Traditional Infrastructure Management vs. Modern DevOps

5 min read
Kubernetes rollouts: promote on SLOs, not on "pods are Ready"
2 min read
Terraform isn't Dying. But Platform Teams Are Done With It.
9 min read
Epilogue — Toward Engineering with a Worldview

3 min read
Your AI Agent Is Available, Fast, and Making Terrible Decisions

6 min read
Hosted control plane: when it simplifies operations and when it adds complexity
11 min read
Terraform Provisioners: The Most Misunderstood Feature in IaC

3 min read
The Most Expensive Kubernetes Mistake: Memory Limits
3 min read
How much torment can my little homelab take? Part 1.
10 min read
Chaos by Design: Production Maintenance Drills on Kubernetes

5 min read
Chapter 6 — Sagas & Compensating Transactions: Building “Retryable Conversations”

7 min read
Trust Is an Engineering Output: How Teams Earn Credibility When Systems Break

5 min read
OpenTelemetry vs. Telegraf - Choosing the Right Monitoring Tool

11 min read