Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Postmortem Framework

Comments
4 min read
Platform Developer Portal

Comments
3 min read
Runbook Template Library

Comments
3 min read
Chaos Engineering Toolkit

Comments
4 min read
The AI Incident Report Template I Actually Use for Wrong Answers and Tool Failures

5
Comments
3 min read
3am Incident Response: What I Learned from 200+ Pages

Comments
2 min read
Runbook Automation: From 45-Minute Fixes to 90-Second Recoveries

Comments
2 min read
Error Budgets in Practice: A No-BS Guide

Comments
2 min read
12 DevOps Tools You Should Be Using in 2026 (SREs Included)

3
Comments
5 min read
The SRE's Guide to Surviving Tool Sprawl

Comments
2 min read
Noisy Alerts Are Burning Out On-Call: Designing SLO-Based Alerts (Fewer but Better) (A Battle-Tested Playbook)

Comments
3 min read
Semantic Kernel for Enterprise AI: Architecting Production-Grade LLM Integration in .NET

Comments
15 min read
FinOps for SREs: Cutting Costs Without Breaking Things

1
Comments
3 min read
Downsizing Without Downtime: An SRE's Guide to Safe Cost Optimization

Comments
13 min read