Forem

Site Reliability Engineering

Posts

Migrating Applications from VMs to K8s

9
Comments
3 min read
How to continue running a Jenkins build when a stage fails

6
Comments
4 min read
Having On-call Nightmares? Runbooks can Help you Wake Up.

7
Comments
5 min read
How to track your product's SLO/ErrorBudget: A simple tool to keep track of things!

7
Comments
3 min read
Episode 3: To Boldly Debug

3
Comments
1 min read
So you Want an SRE Tool. Do you Build, Buy, or Open Source?

3
Comments
6 min read
Kubernetes Health Checks - 2 Ways to Improve Stability in Your Production Applications

9
Comments
10 min read
Infracost diff - "git diff" but for cloud costs

7
Comments
2 min read
How to: Pingdom super powered status page

2
Comments
3 min read
Performance Engineering - The Reliability Edition

3
Comments
5 min read
It's all Chaos! And it Makes for Resilience at Scale

4
Comments
4 min read
How to Build an SRE Team with a Growth Mindset

4
Comments
6 min read
How We Built and Use Runbook Documentation at Blameless

16
Comments 2
5 min read
SigNoz : Open-source alternative to DataDog

24
Comments 2
3 min read
Lessons from Slack, GCP and Snowflake outages

4
Comments
3 min read
Deep Dive into Docker Internals - Union Filesystem

30
Comments
10 min read
SRE2AUX: How Flight Controllers were the first SREs

3
Comments
20 min read
Overview of Incident Lifecycle in SRE

1
Comments
11 min read
My DevOps learning path

3
Comments
5 min read
How do you wrap your head around observability?

49
Comments 13
1 min read
Introduce Chaos Platform 2.0 for Azure

7
Comments
2 min read
What Is Nix and Why You Should Use It

9
Comments
7 min read
Top Reliability and Scaling Practices from Experts at Citrix, Greenlight Financial, and Incognia

2
Comments
14 min read
Reliability as an Inseparable Part of Software Engineering

3
Comments
5 min read
Getting Started as an SRE? Here are 3 Things You Need to Know.

5
Comments
5 min read