Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

How to avoid "Zabbix poller processes more than 75% busy" problems

1
Comments
2 min read
AWS Cost Optimization: Periodic Deletion of ECR Container Images

10
Comments
5 min read
How to transfer forked repository which original is private in GitHub

Comments
2 min read
On-Call Cookbook

1
Comments 1
3 min read
One Year of DevOps at Idus: Reflections and Learnings

Comments
4 min read
The Basics of Istio Mirroring

2
Comments 1
5 min read
AWS Cert Manager integration with Prometheus with Domain Name

3
Comments
3 min read
Terraform Dynamic Blocks: Advanced Use Cases and Examples

5
Comments
9 min read
How to Release a Service

Comments
2 min read
How to easily start Backstage

2
Comments
3 min read
Demystifying Service Level acronyms and Error Budgets

Comments
9 min read
“Automating VPC Peering in AWS with Terraform”

Comments
3 min read
What are SLI, SLO and SLA, and Why are they important in SRE?

Comments
3 min read
Kubernetes (on-prem) master node and worker node associations.

Comments
1 min read
SQLServer service status monitoring on Windows with Prometheus.

Comments
1 min read
Amazon Forecast: Best Practices and Anti-Patterns implementing AIOps

6
Comments
4 min read
How to delete all AWS resources using aws-nuke

6
Comments
2 min read
Defining SLOs - "Let Go!"

2
Comments
2 min read
Executing bash script commands in a sub-shell to manage status code and output

1
Comments
2 min read
Networking 101: Back to School

4
Comments 1
6 min read
SRE vs DevOps vs SysAdmin

2
Comments 1
3 min read
LLMs in Amazon Bedrock: Observability Maturity Model

14
Comments
7 min read
On The Importance of End-to-End Monitoring for IoT

2
Comments
2 min read
DevOps and SRE: A Collaborative Journey Towards Reliable Software Delivery

Comments
4 min read
Roles and Responsibilities Matrix

2
Comments
5 min read