Samson Tanimawo

Building the first Agentic SRE Platform. 100 AI agents that detect, investigate, and resolve incidents autonomously.

Location: Houston
Personal website: https://novaaiops.com

Pronouns: He/Him/His

Kubernetes Network Policies: Lessons from Production Incidents
4 min read
Reducing Toil: The Google SRE Book Applied to Startups
4 min read
Incident Severity Levels: SEV-1 to SEV-5 Calibration
4 min read
Memory Leak Detection in Long-Running Services
3 min read
CI/CD Reliability: When Your Deploy Pipeline is Your SPOF
3 min read
Multi-Region Failover: Lessons from Running It Hot
3 min read
Disaster Recovery Drills That Actually Work
3 min read
Feature Flags as a Reliability Tool, Not Just an A/B Platform
3 min read
eBPF for SREs: Observability Without Agents
3 min read
Observability as Code: Managing Dashboards and Alerts with Terraform
2 min read
Service Level Objectives for Complex Microservices
3 min read
Building a Culture of Reliability: Beyond the SRE Handbook
3 min read
Debugging Kubernetes OOMKilled: A Step-by-Step Guide
3 min read
Deployment Frequency: How We Went From Weekly to 20x/Day
3 min read
Cost-Effective Observability: The 80/20 Stack for Startups
3 min read
Incident Communication: The Status Page That Builds Trust
3 min read
Load Testing in Production: How We Do It Safely
3 min read
Effective On-Call Rotations: Lessons From Building Fair Schedules
3 min read
GitOps for Infrastructure: How We Deploy With Zero SSH
2 min read
Prometheus at Scale: Surviving the Cardinality Cliff
2 min read
Database Reliability: The SRE Approach to Keeping Data Safe
3 min read
Container Security for SREs: The Practical Checklist
3 min read
The Incident Commander Role: Running Incidents Without Chaos
2 min read
Terraform at Scale: Lessons from Managing 500+ Resources
2 min read
Why Your Microservices Need Circuit Breakers (And How to Add Them)
2 min read
The On-Call Handoff That Prevents Dropped Incidents
2 min read
SLOs That Product Managers Actually Understand
2 min read
MTTR Optimization: The 7 Levers That Actually Move the Needle
3 min read
Service Maps: The Architectural Clarity Your Team Is Missing
2 min read
AI in Incident Response: Hype vs. Reality in 2024
3 min read
Monitoring Costs Are Out of Control — Here's How to Fix It
2 min read
Hiring SREs: What I Look For After Interviewing 100+ Candidates
3 min read
Log Management at Scale: How We Cut Costs 70% Without Losing Signal
2 min read
Canary Deployments: The Pattern That Cut Our Rollback Rate by 80%
2 min read
Platform Engineering: Building an Internal Developer Platform That Teams Actually Use
2 min read
Chaos Engineering for Teams That Aren't Netflix
3 min read
Distributed Tracing: The Missing Piece of Your Observability Stack
3 min read
The Golden Signals: A Practical Implementation Guide
2 min read
Kubernetes Observability: What to Monitor and Why
2 min read
On-Call Wellness: Protecting Your Engineers from Burnout
2 min read
Post-Mortem Best Practices That Actually Drive Change
2 min read
Runbook Automation: From 45-Minute Fixes to 90-Second Recoveries
2 min read
Error Budgets in Practice: A No-BS Guide
2 min read
3am Incident Response: What I Learned from 200+ Pages
2 min read
The SRE's Guide to Surviving Tool Sprawl
2 min read
I Reduced Our Alert Volume by 90%. Here's the Playbook
2 min read
Introducing Nova AI Ops: The AI-Native Operating System for SRE Teams
3 min read