<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Osomudeya Zudonu</title>
    <description>The latest articles on Forem by Osomudeya Zudonu (@osomudeya_zudonu_7add6ca6).</description>
    <link>https://forem.com/osomudeya_zudonu_7add6ca6</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3858016%2F05393818-5666-4014-bb29-0ad1aadc99f5.jpg</url>
      <title>Forem: Osomudeya Zudonu</title>
      <link>https://forem.com/osomudeya_zudonu_7add6ca6</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/osomudeya_zudonu_7add6ca6"/>
    <language>en</language>
    <item>
      <title>6 Real Debugging Failures I Hit in My Homelab (And What They Taught Me)</title>
      <dc:creator>Osomudeya Zudonu</dc:creator>
      <pubDate>Thu, 02 Apr 2026 17:04:31 +0000</pubDate>
      <link>https://forem.com/osomudeya_zudonu_7add6ca6/6-real-debugging-failures-i-hit-in-my-homelab-and-what-they-taught-me-b02</link>
      <guid>https://forem.com/osomudeya_zudonu_7add6ca6/6-real-debugging-failures-i-hit-in-my-homelab-and-what-they-taught-me-b02</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fywdwe8zyhhebkz79di43.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fywdwe8zyhhebkz79di43.jpeg" alt="A developer working at a multi-monitor desk setup with code on screen"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first time a pod crashed in production, I ran &lt;code&gt;kubectl logs&lt;/code&gt; and got nothing. The output was empty: clean, no errors.&lt;/p&gt;

&lt;p&gt;I didn't know the container had already restarted, nor did I know about &lt;code&gt;--previous&lt;/code&gt;. I was staring at a blank screen while the app was down, and I had no idea why.&lt;/p&gt;
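&lt;p&gt;For reference, here is the pattern I was missing. A minimal sketch, using the example pod name from later in this article (substitute your own):&lt;/p&gt;

```shell
# RESTARTS > 0 means the container you're inspecting is not the one that crashed
kubectl get pod backend-7d9f-xk4m

# Logs from the previous instance of the container - where the crash output lives
kubectl logs backend-7d9f-xk4m --previous
```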

&lt;p&gt;The command exists, but I just didn't know to reach for it when it mattered. So I kept digging in the wrong place: restarting pods, re-running requests, watching nothing change.&lt;/p&gt;

&lt;p&gt;Meanwhile, the actual error had already disappeared with the previous container.&lt;/p&gt;

&lt;p&gt;That's the part no tutorial really prepares you for. Not the command, but knowing when to use it.&lt;/p&gt;

&lt;p&gt;That only comes from things breaking in your own lab. From hitting errors, misreading them, and going back until it clicks.&lt;/p&gt;

&lt;p&gt;The more you build, the more it breaks. The more it breaks, the more you learn, if you slow down to understand what actually happened.&lt;/p&gt;

&lt;p&gt;This article walks through six of those failures. What they look like, how to debug them, and how to document them so the lesson sticks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Errors You'll Hit in Your Home Lab, and What to Do When You Do
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A backend that dies on startup with connection refused, because the database wasn't ready yet&lt;/li&gt;
&lt;li&gt;Pods stuck in Pending, because the scheduler can't place them anywhere&lt;/li&gt;
&lt;li&gt;ImagePullBackOff, because the cluster can't see your locally built image&lt;/li&gt;
&lt;li&gt;An Ingress that returns 404, because the service behind it has no endpoints&lt;/li&gt;
&lt;li&gt;Grafana dashboards showing "No data", because the metrics chain is broken somewhere&lt;/li&gt;
&lt;li&gt;ERR_TOO_MANY_REDIRECTS, because two layers are both trying to enforce HTTPS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each one comes with the exact commands, what you're looking for, and a five-line incident report template, so every error becomes an experience you can talk about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Building in a Home Lab Gives You Real Experience
&lt;/h2&gt;

&lt;p&gt;Tutorials show you what to do when everything works. Your home lab shows you what to do when it doesn't.&lt;/p&gt;

&lt;p&gt;When you build a real app in your lab, deploy it, wire up monitoring, and connect it to a database, things break. Containers crash, pipelines fail, services stop talking to each other, and Terraform and your actual infra fall out of sync.&lt;/p&gt;

&lt;p&gt;Most people hit these errors, Google the fix, copy-paste it, and move on. The error is gone. Nothing was learned.&lt;/p&gt;

&lt;p&gt;The engineers who get hired and trusted are the ones who stop when something breaks, read what the system was telling them, fix it based on what they found, and write down what happened.&lt;/p&gt;

&lt;p&gt;That's the difference between using a home lab and learning from one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Need to Set This Up
&lt;/h2&gt;

&lt;p&gt;A laptop with 8GB RAM (16GB recommended) and 50GB free disk space. No cloud account. No Raspberry Pi.&lt;/p&gt;

&lt;p&gt;Never built a DevOps lab before? Start here first; it walks you through Docker, Kubernetes, Vagrant, Terraform, and Ansible from scratch on your laptop:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/@osomudeyazudonu/how-to-set-up-a-devops-lab-on-your-laptop-without-spending-a-dime-7139349e0fb4?sk=4c61aad1961efd5a38b27dffa9fce0ea" rel="noopener noreferrer"&gt;→ How to Set Up a DevOps Lab on Your Laptop at Zero Cost.&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;Come back when your lab is running.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; Install local Kubernetes. Pick one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;kind: lightest, runs on any OS&lt;/li&gt;
&lt;li&gt;MicroK8s: closer to production behavior, best on Ubuntu&lt;/li&gt;
&lt;li&gt;K3s: good inside VMs&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# kind - fastest start&lt;/span&gt;
curl &lt;span class="nt"&gt;-Lo&lt;/span&gt; ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
&lt;span class="nb"&gt;chmod&lt;/span&gt; +x ./kind &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo mv&lt;/span&gt; ./kind /usr/local/bin/kind
kind create cluster &lt;span class="nt"&gt;--name&lt;/span&gt; devops-lab

kubectl get nodes
&lt;span class="c"&gt;# NAME                      STATUS   ROLES           AGE&lt;/span&gt;
&lt;span class="c"&gt;# devops-lab-control-plane  Ready    control-plane   30s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; Deploy a real app. You need something with actual moving parts: an API, a database, and services talking to each other. That's where real errors come from. A single hello-world container won't teach you anything.&lt;/p&gt;

&lt;p&gt;Clone this free repo and follow its setup steps to get it running in your cluster: → 🔗 &lt;a href="https://github.com/Osomudeya/DevOps-Home-Lab-2026-2027" rel="noopener noreferrer"&gt;DevOps Home-Lab 2026&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It builds a full multi-service app (API, Postgres, and Redis), taking it from Docker Compose through to Kubernetes.&lt;/p&gt;

&lt;p&gt;Follow the setup steps in the repo.&lt;/p&gt;

&lt;p&gt;Don't move to Step 3 until your pods are healthy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-A&lt;/span&gt;
&lt;span class="c"&gt;# Your app's pods should show STATUS: Running&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3:&lt;/strong&gt; Wire up observability. Install Prometheus and Grafana now that your app is deployed and there's something real to monitor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm &lt;span class="nb"&gt;install &lt;/span&gt;kube-prometheus-stack prometheus-community/kube-prometheus-stack &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; monitoring &lt;span class="nt"&gt;--create-namespace&lt;/span&gt;

&lt;span class="c"&gt;# Access Grafana&lt;/span&gt;
kubectl port-forward &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring svc/kube-prometheus-stack-grafana 3000:80
&lt;span class="c"&gt;# Login: admin / prom-operator&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open Grafana → "Kubernetes / Compute Resources / Pod" dashboard → confirm your app's pods are visible.&lt;/p&gt;

&lt;p&gt;That's your baseline. When something breaks as you build, this is where the evidence shows up.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Turns Errors Into Experience
&lt;/h2&gt;

&lt;p&gt;Most people who build home labs end up doing the same thing: installing tools, following a tutorial, and deleting the cluster. No experience gained.&lt;/p&gt;

&lt;p&gt;The ones who come out of it with real skills do something different every time something breaks. They follow this loop. Especially the last step.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Spot it:&lt;/strong&gt; something isn't working, let's say a pod won't start or a deploy failed, or two services can't reach each other. This is the beginning of a lesson.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read it:&lt;/strong&gt; before you Google anything, read what the system is telling you. Logs, events, metrics. The answer is almost always there. Your job is to learn how to see it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix it:&lt;/strong&gt; solve it from what you observed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document it:&lt;/strong&gt; write the incident report immediately, while it's fresh.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The incident report template. Copy this. Use it every time something breaks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INCIDENT: [error type] - [service name]
ROOT CAUSE: [one sentence - what actually caused it]
DETECTION: [which command or metric showed you the problem]
FIX: [exactly what you did to resolve it]
LESSON: [what you now know that you didn't before]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every error you hit and document becomes an experience you can speak about. Not "I've read about CrashLoopBackOff." But: "I hit this at 11 pm, building my lab, here's what the logs showed, here's what fixed it."&lt;/p&gt;

&lt;p&gt;That's what interviewers are actually asking for.&lt;/p&gt;

&lt;h2&gt;
  
  
  6 Errors You'll Hit as You Build, and How to Debug Each One
&lt;/h2&gt;

&lt;p&gt;These are real errors from the DevOps Home-Lab 2026 repo - pulled from actual lab sessions, not invented for a tutorial. You will hit most of them.&lt;/p&gt;

&lt;p&gt;Throughout this section, replace &lt;code&gt;&amp;lt;your-deployment&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;pod-name&amp;gt;&lt;/code&gt; with your actual names. Run &lt;code&gt;kubectl get pods -A&lt;/code&gt; to see what's running.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Database Connection Refused on Startup
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Docker Compose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt; &lt;code&gt;depends_on&lt;/code&gt; doesn't mean "wait until ready." It means "start after." Those are not the same thing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it happens:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You run &lt;code&gt;docker compose up&lt;/code&gt;. Both containers start. But the backend throws an error immediately and dies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: connect ECONNREFUSED 127.0.0.1:5432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; your backend container started before Postgres finished its initialization sequence. Docker Compose started them roughly in parallel; &lt;code&gt;depends_on&lt;/code&gt; only waits for the container to exist, not for the database inside it to be ready to accept connections.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to read it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# What did the backend actually see?&lt;/span&gt;
docker logs &amp;lt;your-backend-container&amp;gt;

&lt;span class="c"&gt;# Did Postgres finish starting?&lt;/span&gt;
docker logs postgres

&lt;span class="c"&gt;# Look for: "database system is ready to accept connections"&lt;/span&gt;
&lt;span class="c"&gt;# Are both containers up?&lt;/span&gt;
docker ps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How to fix it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Quick fix: restart the backend after Postgres is ready&lt;/span&gt;
docker restart &amp;lt;your-backend-container&amp;gt;

&lt;span class="c"&gt;# Permanent fix: add a healthcheck-based dependency in docker-compose.yml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add a healthcheck block to your Postgres service that runs &lt;code&gt;pg_isready&lt;/code&gt;. Docker Compose will wait for the health check to pass before starting dependent containers.&lt;/p&gt;
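&lt;p&gt;A minimal sketch of what that block looks like in &lt;code&gt;docker-compose.yml&lt;/code&gt; (the image tag, user, and timings here are illustrative; adjust to your own service):&lt;/p&gt;

```yaml
services:
  postgres:
    image: postgres:16
    healthcheck:
      # pg_isready exits 0 once the server is accepting connections
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5
```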

&lt;p&gt;&lt;strong&gt;Write your incident report:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INCIDENT: Backend container crashed on startup - ECONNREFUSED
ROOT CAUSE: Backend started before Postgres finished initializing.
  depends_on controls start order, not service readiness.
DETECTION: docker logs backend showed ECONNREFUSED to port 5432.
  docker logs postgres confirmed it hadn't finished booting yet.
FIX: Added healthcheck to postgres service. Set depends_on condition
  to service_healthy. Backend now waits until Postgres is ready.
LESSON: Always use healthchecks for stateful services.
  This same pattern applies in Kubernetes via readiness probes.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Pods Stuck in Pending: Nothing Is Scheduling
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Kubernetes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt; Pending means the scheduler wants to place the pod, but can't. &lt;code&gt;kubectl describe&lt;/code&gt; tells you exactly why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it happens:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You deploy your app to Kubernetes. The pods sit in Pending forever. No errors in the logs, because the container never started. This happens when your cluster doesn't have enough memory to satisfy the pod's resource requests. Running a full stack (Postgres, Redis, backend, frontend, Prometheus, Grafana) on a single-node cluster with Docker Desktop memory limited to 4GB will hit this fast.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-A&lt;/span&gt;
&lt;span class="c"&gt;# NAME                    READY   STATUS    RESTARTS&lt;/span&gt;
&lt;span class="c"&gt;# backend-7d9f-xk4m       0/1     Pending   0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How to read it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe pod &amp;lt;pod-name&amp;gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;namespace&amp;gt;
&lt;span class="c"&gt;# Read the Events section at the bottom:&lt;/span&gt;
&lt;span class="c"&gt;# "0/1 nodes are available: 1 Insufficient memory."&lt;/span&gt;

&lt;span class="c"&gt;# What's the cluster actually using right now?&lt;/span&gt;
kubectl top nodes

&lt;span class="c"&gt;# Do nodes exist at all?&lt;/span&gt;
kubectl get nodes &lt;span class="nt"&gt;-o&lt;/span&gt; wide
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Events block in &lt;code&gt;kubectl describe&lt;/code&gt; is the most useful thing in Kubernetes for this error. It tells you exactly what the scheduler is thinking.&lt;/p&gt;
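&lt;p&gt;If you'd rather see every recent event in one place instead of per pod, this view works too:&lt;/p&gt;

```shell
# All events across all namespaces, oldest first - scheduler complaints included
kubectl get events -A --sort-by=.metadata.creationTimestamp
```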

&lt;p&gt;&lt;strong&gt;How to fix it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Option A: recreate the cluster with more capacity&lt;/span&gt;
k3d cluster delete devops-lab
k3d cluster create devops-lab &lt;span class="nt"&gt;--agents&lt;/span&gt; 2

&lt;span class="c"&gt;# Option B: lower resource requests in your deployment YAML&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;128Mi"&lt;/span&gt;   &lt;span class="c1"&gt;# reduce from whatever you had&lt;/span&gt;
    &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50m"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Write your incident report:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INCIDENT: Pods stuck in Pending - backend and frontend
ROOT CAUSE: Cluster had insufficient memory to schedule pods.
  Resource requests exceeded available node capacity.
DETECTION: kubectl describe pod showed "0/1 nodes available: Insufficient memory"
  in Events. kubectl top nodes confirmed node was at capacity.
FIX: Reduced memory requests in deployment YAML. Pods scheduled immediately.
LESSON: Kubernetes won't silently lower your resource requests.
  Pending forever = scheduler can't fit the pod. Describe it first.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. ImagePullBackOff: The Cluster Can't Find Your Local Image
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Kubernetes / k3d.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt; Building an image locally and deploying it to a cluster are two separate steps. The cluster has its own image context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it happens:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You build your Docker image locally, write a deployment YAML that references it, and apply it to your k3d cluster. The pod fails immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;namespace&amp;gt;
&lt;span class="c"&gt;# NAME              READY   STATUS             RESTARTS&lt;/span&gt;
&lt;span class="c"&gt;# backend-abc123    0/1     ImagePullBackOff   0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; k3d runs inside Docker but has its own separate image store. When you ran &lt;code&gt;docker build -t my-backend:latest .&lt;/code&gt;, the image landed in Docker's local daemon; k3d nodes can't see it. The cluster tries to pull from Docker Hub instead, fails, and gives up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to read it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe pod &amp;lt;pod-name&amp;gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;namespace&amp;gt;
&lt;span class="c"&gt;# Events will say:&lt;/span&gt;
&lt;span class="c"&gt;# "Failed to pull image: rpc error...&lt;/span&gt;
&lt;span class="c"&gt;#  repository does not exist or may require authentication"&lt;/span&gt;

&lt;span class="c"&gt;# The image IS here:&lt;/span&gt;
docker images | &lt;span class="nb"&gt;grep &lt;/span&gt;my-backend
&lt;span class="c"&gt;# But NOT here:&lt;/span&gt;
k3d image list devops-lab
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How to fix it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Import your local image into the k3d cluster&lt;/span&gt;
k3d image import my-backend:latest &lt;span class="nt"&gt;-c&lt;/span&gt; devops-lab

&lt;span class="c"&gt;# Also set imagePullPolicy: Never in your deployment YAML&lt;/span&gt;
&lt;span class="c"&gt;# so Kubernetes doesn't attempt to pull from Docker Hub&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-backend:latest&lt;/span&gt;
      &lt;span class="na"&gt;imagePullPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every time you rebuild the image, you need to re-import it.&lt;/p&gt;
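&lt;p&gt;The full loop after every code change looks roughly like this, assuming the image and cluster names used above (the deployment name &lt;code&gt;backend&lt;/code&gt; is an assumption; substitute your own):&lt;/p&gt;

```shell
# 1. Rebuild the image in the local Docker daemon
docker build -t my-backend:latest .

# 2. Push it into the k3d cluster's own image store
k3d image import my-backend:latest -c devops-lab

# 3. Restart pods so they pick up the new image
kubectl rollout restart deployment/backend
```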

&lt;p&gt;&lt;strong&gt;Write your incident report:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INCIDENT: ImagePullBackOff - backend deployment
ROOT CAUSE: Local Docker image not imported into k3d cluster.
  k3d has its own image context - it can't see Docker daemon images.
DETECTION: kubectl describe pod showed "repository does not exist" in Events.
  docker images confirmed image existed locally. k3d image list confirmed
  it was absent from the cluster.
FIX: Ran k3d image import. Set imagePullPolicy: Never in deployment YAML.
LESSON: Build → import → deploy. Every rebuild needs a re-import.
  Or set up a local registry to automate this.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Ingress Returns 404: Service Selector Mismatch
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Kubernetes / Ingress.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt; 404 from an Ingress rarely means the Ingress is broken. It means the service behind it has no endpoints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it happens:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You set up an Ingress. You visit the URL. NGINX returns a clean 404. The Ingress looks fine. The service exists. But something in the chain is broken, usually a label mismatch between the service selector and the pod labels, or a wrong port number.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://yourapp.local
&lt;span class="c"&gt;# &amp;lt;html&amp;gt;404 Not Found&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How to read it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check if the service actually has any endpoints&lt;/span&gt;
kubectl get endpoints &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;namespace&amp;gt;
&lt;span class="c"&gt;# NAME       ENDPOINTS   AGE&lt;/span&gt;
&lt;span class="c"&gt;# frontend   &amp;lt;none&amp;gt;      5m   ← this is the problem&lt;/span&gt;

&lt;span class="c"&gt;# What labels are your pods actually using?&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;namespace&amp;gt; &lt;span class="nt"&gt;--show-labels&lt;/span&gt;

&lt;span class="c"&gt;# What is the service selecting for?&lt;/span&gt;
kubectl describe svc &amp;lt;service-name&amp;gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;namespace&amp;gt;
&lt;span class="c"&gt;# Look at the Selector field&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Zero endpoints means Kubernetes never connected the service to any pod. That's always a label mismatch or port mismatch, not an Ingress problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to fix it:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The service selector must exactly match the pod labels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In your service:&lt;/span&gt;
&lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-frontend&lt;/span&gt;   &lt;span class="c1"&gt;# this must match exactly&lt;/span&gt;

&lt;span class="c1"&gt;# In your deployment pod template:&lt;/span&gt;
&lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-frontend&lt;/span&gt;   &lt;span class="c1"&gt;# must be identical - case, spelling, everything&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also verify &lt;code&gt;targetPort&lt;/code&gt; in the service matches the &lt;code&gt;containerPort&lt;/code&gt; in the deployment.&lt;/p&gt;
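&lt;p&gt;As a sketch, these are the numbers that have to line up (the port values here are examples, not prescriptions):&lt;/p&gt;

```yaml
# Service
spec:
  selector:
    app: my-frontend
  ports:
    - port: 80          # what the Ingress and other services talk to
      targetPort: 3000  # must equal containerPort below

# Deployment pod template
spec:
  containers:
    - name: frontend
      ports:
        - containerPort: 3000  # what the app actually listens on
```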

&lt;p&gt;&lt;strong&gt;Write your incident report:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INCIDENT: Ingress returning 404 - frontend unreachable
ROOT CAUSE: Service selector label didn't match pod labels.
  Service had zero endpoints - never connected to any pod.
DETECTION: kubectl get endpoints showed &amp;lt;none&amp;gt; for frontend service.
  kubectl get pods --show-labels revealed the label mismatch.
FIX: Updated service selector to match actual pod labels. Endpoints
  populated immediately. Ingress routed correctly.
LESSON: Check endpoints first, not the Ingress YAML. Zero endpoints
  = label selector or port mismatch, not an Ingress problem.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Grafana Shows "No Data": Metrics Not Reaching Prometheus
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Observability / Prometheus / Grafana&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt; "No data" in Grafana means something in the chain between your app and Grafana is broken. Walk it backwards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it happens:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You open Grafana. Your dashboards show "No data" on every panel. You didn't change anything recently. This happens when Prometheus isn't scraping your app, either because the app never exposed a &lt;code&gt;/metrics&lt;/code&gt; endpoint, or the ServiceMonitor is missing or misconfigured, or the label selectors don't match what Prometheus is watching for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to read it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: Is Prometheus even trying to scrape your app?&lt;/span&gt;
kubectl port-forward svc/kube-prometheus-stack-prometheus &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring 9090:9090
&lt;span class="c"&gt;# Open: http://localhost:9090/targets&lt;/span&gt;
&lt;span class="c"&gt;# Check the Status column - is your app listed? Is it UP or DOWN?&lt;/span&gt;

&lt;span class="c"&gt;# Step 2: Does your app actually expose metrics?&lt;/span&gt;
kubectl port-forward svc/&amp;lt;your-backend&amp;gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;namespace&amp;gt; 3001:3001
curl http://localhost:3001/metrics
&lt;span class="c"&gt;# Should return Prometheus text format. If 404 - the endpoint isn't wired up.&lt;/span&gt;

&lt;span class="c"&gt;# Step 3: Does the ServiceMonitor exist?&lt;/span&gt;
kubectl get servicemonitor &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Walk the chain: Grafana → Prometheus data source → Prometheus targets → app &lt;code&gt;/metrics&lt;/code&gt; endpoint. The failure is always at one of those four links.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to fix it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Fix A: expose a /metrics endpoint in your backend (Node.js example)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;prom-client&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collectDefaultMetrics&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/metrics&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;register&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;contentType&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;register&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Fix B: ensure ServiceMonitor label selector matches your app&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-backend&lt;/span&gt;   &lt;span class="c1"&gt;# must match your pod labels&lt;/span&gt;
  &lt;span class="na"&gt;namespaceSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchNames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;your-namespace&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
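&lt;p&gt;Before staring at dashboards, confirm the selector actually matches something. A ServiceMonitor selects &lt;em&gt;Service&lt;/em&gt; labels, so one hedged check, using the placeholder values from the snippet above:&lt;/p&gt;

```shell
# List Services carrying the label the ServiceMonitor selects.
# "your-namespace" and "app=my-backend" are the placeholders from Fix B.
kubectl get svc -n your-namespace -l app=my-backend
# "No resources found" means the selector matches nothing,
# and Prometheus will never discover the target.
```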



&lt;p&gt;&lt;strong&gt;Write your incident report:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INCIDENT: Grafana showing "No data" - all panels blank
ROOT CAUSE: Backend had no /metrics endpoint. Prometheus had nothing to scrape.
DETECTION: Prometheus targets page showed app as absent. curl to /metrics
  returned 404 - endpoint was never implemented.
FIX: Added prom-client middleware to Express app. Exposed /metrics route.
  Prometheus began scraping within one scrape interval. Grafana populated.
LESSON: Grafana "No data" is not a Grafana problem. Walk the chain backwards.
  Prometheus targets page tells you exactly where the chain breaks.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6. ERR_TOO_MANY_REDIRECTS: When Two Systems Both Try to Enforce HTTPS
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Ingress / Cloudflare / ArgoCD&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt; Redirect loops happen when two layers both try to handle HTTPS. You fix it by deciding which layer owns TLS termination and disabling it everywhere else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it happens:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You expose your app through Cloudflare Tunnel. It works. Then you add ArgoCD or enable &lt;code&gt;ssl-redirect&lt;/code&gt; on your NGINX Ingress. Suddenly, your browser returns ERR_TOO_MANY_REDIRECTS on every request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's happening:&lt;/strong&gt; Cloudflare Tunnel terminates HTTPS at the edge and forwards plain HTTP into your cluster. NGINX Ingress sees HTTP traffic and immediately 301-redirects to HTTPS. Cloudflare sends that HTTPS back through the tunnel, where it gets redirected again. Infinite loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to read it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Is ssl-redirect enabled on the Ingress?&lt;/span&gt;
kubectl get ingress &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;namespace&amp;gt; &lt;span class="nt"&gt;-o&lt;/span&gt; yaml | &lt;span class="nb"&gt;grep &lt;/span&gt;ssl

&lt;span class="c"&gt;# Is ArgoCD running with HTTPS enforcement?&lt;/span&gt;
kubectl get deployment argocd-server &lt;span class="nt"&gt;-n&lt;/span&gt; argocd &lt;span class="nt"&gt;-o&lt;/span&gt; yaml | &lt;span class="nb"&gt;grep &lt;/span&gt;insecure

&lt;span class="c"&gt;# Watch the redirect chain&lt;/span&gt;
curl &lt;span class="nt"&gt;-I&lt;/span&gt; http://yourapp.domain.com
&lt;span class="c"&gt;# You'll see 301 → 301 → 301 repeating&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How to fix it:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Rule: terminate TLS at ONE layer only. If Cloudflare handles TLS at the edge, everything inside the cluster runs plain HTTP.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Fix A: disable ssl-redirect on the Ingress&lt;/span&gt;
&lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;nginx.ingress.kubernetes.io/ssl-redirect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;false"&lt;/span&gt;
  &lt;span class="na"&gt;nginx.ingress.kubernetes.io/force-ssl-redirect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;false"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Fix B: run ArgoCD in insecure mode (behind Cloudflare, it's fine)&lt;/span&gt;
&lt;span class="c1"&gt;# In argocd-server deployment args:&lt;/span&gt;
&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--insecure"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
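&lt;p&gt;If you manage ArgoCD declaratively, the same setting can live in its parameters ConfigMap instead of patched Deployment args (a sketch; &lt;code&gt;argocd-server&lt;/code&gt; picks it up after a restart):&lt;/p&gt;

```yaml
# Equivalent to passing --insecure on the argocd-server command line.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  server.insecure: "true"
```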



&lt;p&gt;&lt;strong&gt;Write your incident report:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INCIDENT: ERR_TOO_MANY_REDIRECTS - app unreachable after adding Cloudflare Tunnel
ROOT CAUSE: Cloudflare terminates HTTPS at edge, forwards HTTP to cluster.
  NGINX Ingress had ssl-redirect enabled - redirected HTTP back to HTTPS.
  Cloudflare re-sent HTTPS, NGINX redirected again. Infinite loop.
DETECTION: curl -I showed 301 → 301 chain. kubectl get ingress yaml showed
  ssl-redirect: true. Cloudflare logs showed HTTP being forwarded.
FIX: Set ssl-redirect: false on Ingress. Cloudflare owns TLS. Cluster
  runs HTTP internally.
LESSON: Two systems enforcing HTTPS termination = redirect war.
  Decide one layer owns TLS. Disable it everywhere else.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Keep Going: Free Labs to Build On
&lt;/h2&gt;

&lt;p&gt;The six errors above are the ones you'll hit most often when you're starting. But there's a lot more to encounter. These free resources give you more environments to build in, and more things to break, debug, and learn from.&lt;/p&gt;

&lt;p&gt;The full structured path from foundations to cloud: 🔗 &lt;a href="https://github.com/Osomudeya/List-Of-DevOps-Projects" rel="noopener noreferrer"&gt;List of DevOps Projects&lt;/a&gt; - five phases, all free. Pick a specific problem or follow the whole thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Reference: When Something Breaks, Start Here
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Backend crashes on startup&lt;/strong&gt; → &lt;code&gt;docker logs&lt;/code&gt; → look for ECONNREFUSED: a dependency isn't ready yet&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pod stuck in Pending&lt;/strong&gt; → &lt;code&gt;kubectl describe pod&lt;/code&gt; → Events show "Insufficient memory" or "Insufficient CPU"&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ImagePullBackOff&lt;/strong&gt; → &lt;code&gt;kubectl describe pod&lt;/code&gt; → Events show "repository does not exist": the image was never imported&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ingress returning 404&lt;/strong&gt; → &lt;code&gt;kubectl get endpoints -n &amp;lt;namespace&amp;gt;&lt;/code&gt; → empty endpoints = label selector mismatch&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Grafana showing "No data"&lt;/strong&gt; → &lt;code&gt;curl http://localhost:&amp;lt;port&amp;gt;/metrics&lt;/code&gt; → 404 = the app never exposed a /metrics endpoint&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ERR_TOO_MANY_REDIRECTS&lt;/strong&gt; → &lt;code&gt;kubectl get ingress -o yaml | grep ssl&lt;/code&gt; → &lt;code&gt;ssl-redirect: true&lt;/code&gt; + Cloudflare = redirect loop&lt;/li&gt;
&lt;/ul&gt;
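&lt;p&gt;The checklist above can be collapsed into a tiny triage helper; a sketch, with my own shorthand keys for each symptom (none of the key names come from the article):&lt;/p&gt;

```shell
# Map each symptom from the checklist to the first command to run.
# CONTAINER, POD, NAMESPACE, and PORT are placeholders to fill in.
first_command() {
  case "$1" in
    startup-crash) echo "docker logs CONTAINER" ;;
    pending)       echo "kubectl describe pod POD" ;;
    imagepull)     echo "kubectl describe pod POD" ;;
    ingress-404)   echo "kubectl get endpoints -n NAMESPACE" ;;
    no-data)       echo "curl http://localhost:PORT/metrics" ;;
    redirects)     echo "kubectl get ingress -o yaml | grep ssl" ;;
    *)             echo "kubectl get events --sort-by=.lastTimestamp" ;;
  esac
}

first_command ingress-404   # prints: kubectl get endpoints -n NAMESPACE
```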

&lt;h2&gt;
  
  
  What You Have After All Six
&lt;/h2&gt;

&lt;p&gt;Most people will read this and move on, but please, build something in your lab. Hit one of these errors. Come back to the right section, read what the system is telling you, fix it, and write the incident report.&lt;/p&gt;

&lt;p&gt;Do that enough times, and the next time someone asks you to walk through a debugging incident, you won't be narrating something you read. You'll be recalling something you fixed.&lt;/p&gt;

&lt;p&gt;That's the difference.&lt;/p&gt;

&lt;p&gt;The six errors above are a starting point. If you want a structured system for working through failures you haven't seen before (the STOP framework, intentional break scenarios, and production checklists), that's what &lt;a href="https://osomudeya.gumroad.com/l/jabzk" rel="noopener noreferrer"&gt;The Kubernetes Detective&lt;/a&gt; is built around.&lt;/p&gt;

&lt;p&gt;For building the full lab environment from scratch: &lt;a href="https://osomudeya.gumroad.com/l/BuildYourOwnDevOpsLab" rel="noopener noreferrer"&gt;Build Your Own DevOps Lab (V3.5)&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Let's connect on &lt;a href="https://www.linkedin.com/in/osomudeya-zudonu-17290b124/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every week, I share what I learned in my newsletter: case studies from real companies, the tactics that saved money, and the honest moments where everything broke.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://osomudeya.gumroad.com/subscribe" rel="noopener noreferrer"&gt;Subscribe&lt;/a&gt; if that sounds useful.&lt;/p&gt;

&lt;p&gt;Job hunting? Grab my &lt;a href="https://osomudeya.gumroad.com/l/free-resume-template" rel="noopener noreferrer"&gt;Free DevOps resume template&lt;/a&gt; that's helped 300+ people land interviews.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>homelab</category>
      <category>kubernetes</category>
      <category>docker</category>
    </item>
  </channel>
</rss>
