Forem

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Build C Projects Like a Pro: A Guide to Idiomatic Makefiles

1
Comments 2
7 min read
Amazon API Gateway Observability Best Practices with Datadog

1
Comments
4 min read
Chaos Engineering in Production: Building Resilient Systems with Chaos Mesh

Comments
1 min read
HashiCorp Nomad vs. Kubernetes: Understanding the Workload Orchestrator with Practical Examples

Comments
1 min read
When APIs Fail: A Developer's Journey with Retries, Back Off, and Jitter

2
Comments
11 min read
OpenTofu CI/CD Guide: How to Automate Infrastructure Changes with Confidence

3
Comments
3 min read
Cost-Tracking and Model-Spend Monitoring with LiteLLM

1
Comments
2 min read
Unleashing Resilience: 15+ Essential Chaos Engineering Tools for Robust Systems

Comments
6 min read
AI-Powered Kubernetes Debugging with Python and Ollama

1
Comments
6 min read
Understanding `kube-system` in Kubernetes: A City Analogy You’ll Never Forget

6
Comments
2 min read
Top 15 Must-Have CI/CD Tools for DevOps & SRE Success

Comments
6 min read
Why Was My Localhost SSH Taking 3 Seconds? A Deep Dive.

Comments
4 min read
🚀 The Ultimate DevOps Emoji Glossary

1
Comments
2 min read
10 Essential Tips for Setting Up Monitoring for Your SaaS

Comments
5 min read
Kubernetes Node Management - Drain, Cordon and Uncordon

6
Comments
2 min read