Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

What is performance engineering: A Gatling take

Comments
8 min read
A practical guide to observability TCO and cost reduction

6
Comments
13 min read
The Lie of the Global Average: Why Taming Complex SLIs Requires Bucketing

Comments
6 min read
How AI-Powered Observability Actually Changes Life For CIOs

Comments
5 min read
Reverse Proxy in Docker with Nginx and Automatic SSL

Comments
7 min read
How to reduce on-call friction using AI Voice Agent

Comments
4 min read
The Hidden Currency of Tech Leadership: The Resilience Loop

Comments
1 min read
Building an Air-gapped Hardened Kubernetes Cluster with Kubespray

Comments
3 min read
The Samurai Server: Why "Heroic" Systems Always Die

Comments
4 min read
End-to-End DevSecOps Project (Movies Finder)

Comments
2 min read
AWS Multi-Account Guardrails: A Complete Blueprint for Secure, Automated Cloud Governance

Comments
9 min read
What Engineers Can Learn From the Cloudflare Outage (November 2025)

Comments
4 min read
EKS Standard vs. EKS Auto Mode: The Evolutionary Leap in Kubernetes Operations

8
Comments
6 min read
Vendor Tools & Reliability — Lessons from the 2025 Cloud Outages

Comments
3 min read
HTTP/1.1 vs HTTP/2 vs HTTP/3 – Which One Are You Still Using in 2025?

1
Comments 1
3 min read
How to Cut AWS Costs and Maintain Reliability Without a FinOps Team

Comments
3 min read
The Hidden Failure Pattern Behind the AWS, Azure and Cloudflare Outages of 2025

Comments
3 min read
Domain controller decommission SOP

Comments
4 min read
Beyond Scheduling: How Kubernetes Uses QoS, Priority, and Scoring to Keep Your Cluster Balanced

Comments
4 min read
Map a Kubernetes cluster with one command

Comments
1 min read
After the Google SRE Interview: Deconstructing the 'Hire' vs. 'No Hire' Debrief

Comments
3 min read
Building AI SRE: Our journey

Comments
4 min read
The Hidden Cost of Adding Just One More Feature

1
Comments
5 min read
StatusGator Alternative in 2025: Why IT Managers Pick IsDown

Comments
14 min read
Celery + SQS: Stop Broken Workers from Monopolizing Your Queue with Circuit Breakers

Comments
2 min read