Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

The Pre-Flight Checklist: 9 Things to Analyze Before Cutting Any AWS Cost
14 min read
Downsizing Without Downtime: An SRE's Guide to Safe Cost Optimization
13 min read
AI-Powered Code Generation and Testing in .NET:
15 min read
The 2026 "Google SRE" Interview: Why Senior Software Engineers Fail the NALSD Round
2 min read
How to Audit Your Monitoring Stack (Before the Next Incident Does It for You)
5 min read
I Reduced Our Alert Volume by 90%. Here's the Playbook
2 min read
Noisy Alerts Burn Out On-Call: Designing Alerts Around SLOs (Fewer but Better)
3 min read
🚀 Python for SRE/DevOps: Building SDKs + Jenkins Automations
3 min read
What 99.9% Uptime Actually Means: 8.7 Hours of Downtime Per Year
4 min read
Silent Failures: The Bug That Won't Page You
3 min read
Infrastructure dilemma
2 min read
Incident Debugging in Production Systems (Part 2)
3 min read
The DevOps Journey: 12 Lessons for a More Stable System (and Fewer Nights On Call)
5 min read
Semantic Kernel for Enterprise AI: Architecting Production-Grade LLM Integration in .NET
15 min read
Our Production System Went Down at 2:13AM: Here's Exactly What Happened
1 min read