Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

SRE Fundamentals: Defining SLOs, SLIs, and Error Budgets That Actually Work

Comments
2 min read
SFMC Monitoring Alert Fatigue: Signal vs Noise

Comments
4 min read
ComunicaOps Part 3: Feedback Loops

Comments
3 min read
Why uptime and synthetic monitors still matter in the age of APM

2
Comments
4 min read
I built "sysview" — a beautiful terminal system monitor for developers

Comments
3 min read
The Midnight Incident: When Being On-Call Means Losing Sleep

Comments
2 min read
I built an AI that remembers every production incident. Here's what changed.

Comments 1
3 min read
S3 Is Starting to Feel Like a File System — But Not Quite

1
Comments
2 min read
SRE vs DevOps: the sequencing mistake that burns most startups.

Comments 1
3 min read
My First dev.to Post — And a 1-Evening SRE System That Changed Our On-Call

Comments
2 min read
Your Kubernetes backups are lying to you

Comments
4 min read
80% of GitHub Repos Still Use Static AWS Credentials in 2026

Comments
4 min read
How to Fix a Kubernetes CrashLoopBackOff in Production

1
Comments
2 min read
Incident response / On-call: timeouts — operational runbook (hands-on playbook)

Comments
3 min read
From MVP to Production: Scaling a Speech AI Service

Comments
3 min read