Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Towards More Effective Incident Postmortems

2
Comments
10 min read
Site Reliability Engineering: Facing risk and disasters

17
Comments
12 min read
Prometheus blackbox_exporter; Unconventional Way

6
Comments
2 min read
Chaos Engineering  — How to safely inject failure?

4
Comments
6 min read
Feelings during incident response

23
Comments
3 min read
A Reading List & Repo List 📚 for Learning DevOps, SRE, and Automation(w/Python)

16
Comments 1
2 min read
Talking about SRE - Part 01 - A brief introduction

8
Comments
7 min read
Chaos Engineering — What and who is a chaos engineer?

16
Comments 2
4 min read
Why You Need A Microservice Catalog

5
Comments
9 min read
Have there been more reliability incidents lately?

16
Comments 14
1 min read
6 Responsibilities of a Devops Engineer

7
Comments
2 min read
Single Sign-On SSH: User Story

3
Comments
2 min read
Retrying groups of tightly coupled tasks in Ansible

13
Comments 2
3 min read
Cleaning up Zookeeper Logs and Snapshots

8
Comments
1 min read
How does deployment work at your organization?

72
Comments 73
1 min read
Visualize Google Cloud Billing data in Grafana with BigQuery

5
Comments 2
2 min read
go apps + jaeger tracing

9
Comments 2
1 min read
April Fools and the Broken Promises of One-off Hacks

129
Comments 8
4 min read
DevOps Engineer vs. SRE?

10
Comments 6
1 min read
Ask DEV: LightWeight APM for Kubernetes using OpenTelemetry?

5
Comments
2 min read
Dreams and Nightmares of Ops

34
Comments 2
10 min read
Have you considered Site Reliability Engineering as a path?

66
Comments 12
1 min read
Towards Operational Excellence — Part 3

7
Comments
11 min read
Towards Operational Excellence — Part 2

7
Comments
11 min read
SRE in layman’s terms (4 core concepts)

6
Comments
4 min read