Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Canary Deployments: The Pattern That Cut Our Rollback Rate by 80%
Comments 1
2 min read
How We Handle SSL Certificate Expiration Alerts at Scale
Comments
6 min read
Platform Engineering: Building an Internal Developer Platform That Teams Actually Use
Comments
2 min read
This is what separates teams that scale from teams that survive:
1
Comments
1 min read
Sentinel Diary #4: From Dashboard to Incident Response — The deterministic path to reliable SRE
Comments
5 min read
Chaos Engineering for Teams That Aren't Netflix
Comments
3 min read
When should you use canary deployments?
1
Comments
5 min read
Your AI Agent Doesn't Have a Feature Problem. It Has an On-Call Rotation Problem.
1
Comments
5 min read
SFMC API Rate Limits: The Cascading Failure Pattern
Comments
6 min read
Status pages, trust, and the limits of a green dashboard
1
Comments
3 min read
Backpressure in document pipelines is an architecture problem first
Comments
2 min read
Designing Alerts That Matters using Amazon CloudWatch
Comments
4 min read
Lab: next lab sre
Comments
6 min read
Why Your Kubernetes Pod Keeps Getting Killed — And It's Not an OOMKill
1
Comments
10 min read
How to Choose a European Dedicated Server: Tier III vs Tier II Data Centers Explained
Comments
4 min read