<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ilia Gusev</title>
    <description>The latest articles on Forem by Ilia Gusev (@persikbl).</description>
    <link>https://forem.com/persikbl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3723238%2F023f195c-a081-472f-9b8f-b68096ab1fe6.png</url>
      <title>Forem: Ilia Gusev</title>
      <link>https://forem.com/persikbl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/persikbl"/>
    <language>en</language>
    <item>
      <title>Signed Images, Runtime Watchtowers, and Why Docker Pull Is an Act of Faith</title>
      <dc:creator>Ilia Gusev</dc:creator>
      <pubDate>Thu, 19 Feb 2026 20:27:10 +0000</pubDate>
      <link>https://forem.com/persikbl/signed-images-runtime-watchtowers-and-why-docker-pull-is-an-act-of-faith-13il</link>
      <guid>https://forem.com/persikbl/signed-images-runtime-watchtowers-and-why-docker-pull-is-an-act-of-faith-13il</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://podostack.com/p/signed-images-runtime-watchtowers-docker-pull-act-of-faith" rel="noopener noreferrer"&gt;Podo Stack&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every time you run &lt;code&gt;docker pull&lt;/code&gt;, you're trusting that nobody tampered with that image between the build and your cluster. npm has signatures. Go modules have checksums. Docker images? Most of us just... hope for the best.&lt;/p&gt;

&lt;p&gt;This week: supply chain security. The trust chain from build to runtime, and how to stop flying blind.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Pattern: Supply Chain Trust
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The problem is invisible
&lt;/h3&gt;

&lt;p&gt;SolarWinds. Codecov. ua-parser-js. The pattern is always the same: attackers compromise the build or distribution pipeline, inject malicious code, and it flows downstream into production. Nobody notices because the artifact &lt;em&gt;looks&lt;/em&gt; legitimate.&lt;/p&gt;

&lt;p&gt;Container images have the same blind spot. You pull &lt;code&gt;nginx:1.25&lt;/code&gt;, but how do you know it wasn't modified after the maintainer pushed it? You don't. Not unless you verify.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three layers of defense
&lt;/h3&gt;

&lt;p&gt;Good supply chain security works in layers - multiple checks, each catching what the previous one missed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9ntkrklkp9oufpijs5q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9ntkrklkp9oufpijs5q.png" alt="Three layers of defense" width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Build time - scan in CI.&lt;/strong&gt; Tools like Trivy or Grype scan your images for known CVEs before they leave the pipeline. If something has a critical vulnerability, the build fails. You hear about it before it reaches a registry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: Registry - sign with cosign.&lt;/strong&gt; After building, sign the image with &lt;a href="https://docs.sigstore.dev/cosign/overview/" rel="noopener noreferrer"&gt;cosign&lt;/a&gt; from the Sigstore project. The signature proves who built it and that the content hasn't changed. Think of it like a wax seal on a letter - break the seal, and everyone knows.&lt;/p&gt;
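Mechanically, keyless signing is one command in CI and verification is one more. A sketch, assuming a hypothetical image on GHCR signed from GitHub Actions (the digest is a placeholder):

```shell
# Sign by digest, not tag - tags are mutable, digests are not.
# In CI, cosign picks up the ambient OIDC token; --yes skips the prompt.
cosign sign --yes ghcr.io/your-org/app@sha256:<digest>

# Anyone can then verify both the signature and the identity behind it:
cosign verify \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  --certificate-identity-regexp 'https://github.com/your-org/.*' \
  ghcr.io/your-org/app@sha256:<digest>
```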

&lt;p&gt;&lt;strong&gt;Layer 3: Admission - verify at the gate.&lt;/strong&gt; Kyverno's &lt;code&gt;verifyImages&lt;/code&gt; rule checks that every image entering your cluster has a valid signature. No signature? Rejected. This is the last line of defense.&lt;/p&gt;

&lt;p&gt;Each layer alone has gaps. Together, they're solid.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.sigstore.dev/" rel="noopener noreferrer"&gt;Sigstore&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://slsa.dev/" rel="noopener noreferrer"&gt;SLSA Framework&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Hidden Gem: Falco
&lt;/h2&gt;

&lt;p&gt;Your IDS watches network traffic. Falco watches syscalls. Different universe.&lt;/p&gt;

&lt;p&gt;Falco is a CNCF Graduated project - the highest maturity level - that does runtime threat detection. Not "scan and report later." Real-time, at the syscall level, while your containers are running.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;Falco hooks into Linux syscalls via eBPF. Every file open, every network connection, every process spawn - Falco sees it. Then it runs your rules against that stream. A rule says "if a shell is spawned inside a container, that's suspicious." Falco fires an alert within milliseconds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;rule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Terminal shell in container&lt;/span&gt;
  &lt;span class="na"&gt;desc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Detect a shell spawned in a container&lt;/span&gt;
  &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;spawned_process and container&lt;/span&gt;
    &lt;span class="s"&gt;and proc.name in (bash, sh, zsh)&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;Shell spawned in container&lt;/span&gt;
    &lt;span class="s"&gt;(user=%user.name container=%container.name&lt;/span&gt;
     &lt;span class="s"&gt;shell=%proc.name parent=%proc.pname)&lt;/span&gt;
  &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WARNING&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This catches things that scanning never will. A clean image can still be exploited at runtime. A zero-day doesn't show up in CVE databases. But someone opening a reverse shell inside your nginx container? Falco catches that.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgzgqwxxkpfwp56209bpn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgzgqwxxkpfwp56209bpn.png" alt="Falco" width="800" height="581"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why eBPF matters here
&lt;/h3&gt;

&lt;p&gt;eBPF lets Falco collect syscall events inside the kernel without modifying the kernel itself. No kernel modules to maintain, no recompilation. It hooks into syscall entry/exit points and streams events to userspace, where the rules engine evaluates them.&lt;/p&gt;

&lt;p&gt;The performance overhead is minimal - you're adding a few microseconds to syscall paths. For a security tool that watches everything in real time, that's a remarkable trade-off.&lt;/p&gt;
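Trying this yourself takes minutes with the official Helm chart. A sketch under stated assumptions - the `falco` namespace, the `modern_ebpf` driver value, and an existing nginx Deployment are all illustrative:

```shell
# Install Falco with the modern eBPF driver (no kernel module to build)
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm install falco falcosecurity/falco \
  --namespace falco --create-namespace \
  --set driver.kind=modern_ebpf

# Trigger the shell-in-container rule from any running pod...
kubectl exec -it deploy/nginx -- sh

# ...then watch the alert land in Falco's logs
kubectl logs -n falco -l app.kubernetes.io/name=falco
```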

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://falco.org/" rel="noopener noreferrer"&gt;falco.org&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cncf.io/projects/falco/" rel="noopener noreferrer"&gt;CNCF Falco&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Showdown: Distroless vs Alpine
&lt;/h2&gt;

&lt;p&gt;Two approaches to minimal images. Very different trade-offs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alpine (the small one)
&lt;/h3&gt;

&lt;p&gt;5MB base. Uses musl libc instead of glibc. Ships with the &lt;code&gt;apk&lt;/code&gt; package manager. You can &lt;code&gt;sh&lt;/code&gt; into it, install debugging tools, poke around. About 260 packages in the base, which means roughly 150 CVEs per year to track. Small, but not empty.&lt;/p&gt;

&lt;h3&gt;
  
  
  Distroless (the empty one)
&lt;/h3&gt;

&lt;p&gt;No package manager. No shell. No &lt;code&gt;ls&lt;/code&gt;, no &lt;code&gt;cat&lt;/code&gt;, no nothing. Just your binary and the runtime it needs. Google maintains the base images. Result: about 5 CVEs per year. There's almost nothing to exploit because there's almost nothing there.&lt;/p&gt;
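The usual way to get there is a multi-stage build: compile in a full image, ship only the binary. A minimal sketch for a hypothetical Go service (paths and tags are illustrative):

```dockerfile
# Build stage: full toolchain, never shipped
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

# Runtime stage: the static binary plus CA certs and tzdata, nothing else
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```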

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fji5icfff6dbp48rckch4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fji5icfff6dbp48rckch4.png" alt="Distroless vs Alpine" width="800" height="646"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  When to choose what
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Alpine&lt;/strong&gt; - you need a shell for debugging, your app depends on C libraries that assume glibc (watch for musl compatibility issues), or you're in early development and need to iterate fast. It's the pragmatic choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Distroless&lt;/strong&gt; - production workloads where security matters. Your Go or Rust binary is statically compiled anyway. You don't need a shell in production - that's what &lt;code&gt;kubectl debug&lt;/code&gt; with ephemeral containers is for.&lt;/p&gt;
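For the rare case where you do need to look inside a distroless pod, an ephemeral debug container attaches a toolbox without touching the image. A sketch - pod and container names are placeholders:

```shell
# Attach a busybox ephemeral container that shares the target container's
# process namespace, so the app's filesystem is reachable via /proc
kubectl debug -it my-pod --image=busybox:1.36 --target=app -- sh
ls /proc/1/root/
```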

&lt;p&gt;Worth mentioning: &lt;a href="https://www.chainguard.dev/chainguard-images" rel="noopener noreferrer"&gt;Chainguard Images&lt;/a&gt; offer a middle ground. Distroless-style images with better CVE tracking and daily rebuilds. If you haven't checked them out, they're worth a look.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/GoogleContainerTools/distroless" rel="noopener noreferrer"&gt;GoogleContainerTools/distroless&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hub.docker.com/_/alpine" rel="noopener noreferrer"&gt;Alpine Docker Hub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Policy: Verify Image Signatures (Kyverno + cosign)
&lt;/h2&gt;

&lt;p&gt;Unsigned image gets deployed. Maybe it's fine. Maybe someone swapped the layers in your registry. You'd never know.&lt;/p&gt;

&lt;p&gt;This Kyverno policy verifies cosign signatures before admitting any image. No valid signature, no admission.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kyverno.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;verify-image-signatures&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Verify Image Signatures&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Supply Chain Security&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;high&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;validationFailureAction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Enforce&lt;/span&gt;
  &lt;span class="na"&gt;webhookTimeoutSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;verify-cosign-signature&lt;/span&gt;
    &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;any&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;kinds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
    &lt;span class="na"&gt;verifyImages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;imageReferences&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ghcr.io/your-org/*"&lt;/span&gt;
      &lt;span class="na"&gt;attestors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;entries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;keyless&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;issuer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://token.actions.githubusercontent.com"&lt;/span&gt;
            &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://github.com/your-org/*"&lt;/span&gt;
            &lt;span class="na"&gt;rekor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://rekor.sigstore.dev&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to note:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;verifyImages&lt;/code&gt; is a dedicated Kyverno rule type - not a generic &lt;code&gt;validate&lt;/code&gt; block. It understands OCI signatures natively.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;keyless&lt;/code&gt; configuration works with GitHub Actions' OIDC tokens. Your CI signs the image automatically, no private keys to manage.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rekor&lt;/code&gt; is Sigstore's transparency log. It provides an audit trail of every signature - who signed what and when.&lt;/li&gt;
&lt;li&gt;Start with &lt;code&gt;validationFailureAction: Audit&lt;/code&gt;. Roll out to &lt;code&gt;Enforce&lt;/code&gt; once your signing pipeline is solid.&lt;/li&gt;
&lt;/ul&gt;
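On the CI side, keyless signing needs only an OIDC token and the cosign installer. A sketch of the relevant GitHub Actions fragment - the `build` step id and image name are assumptions:

```yaml
permissions:
  id-token: write   # lets cosign request an OIDC identity token
  packages: write

steps:
  - uses: sigstore/cosign-installer@v3
  - name: Sign the pushed image by digest
    run: cosign sign --yes "ghcr.io/your-org/app@${{ steps.build.outputs.digest }}"
```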

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kyverno.io/docs/writing-policies/verify-images/" rel="noopener noreferrer"&gt;Kyverno: Verify Images&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.sigstore.dev/cosign/overview/" rel="noopener noreferrer"&gt;cosign Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The One-Liner: Trivy Image Scan
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;trivy image &lt;span class="nt"&gt;--severity&lt;/span&gt; CRITICAL nginx:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scans &lt;code&gt;nginx:latest&lt;/code&gt; for critical CVEs. No daemon, no config - Trivy is a single binary that downloads the vulnerability database on first run.&lt;/p&gt;

&lt;p&gt;This is layer 1 of the trust pattern above. Put it in your CI pipeline: &lt;code&gt;trivy image --exit-code 1 --severity CRITICAL your-image:tag&lt;/code&gt;. Build fails if anything critical shows up. Five minutes to set up, catches problems before they leave your laptop.&lt;/p&gt;

&lt;p&gt;Bookmark it. You'll use it more than you think.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/aquasecurity/trivy" rel="noopener noreferrer"&gt;Trivy GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;How does your team handle image signing? Are you using cosign, Notary, or something else? I'd love to hear what's working - drop a comment below.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;For weekly Cloud Native tools that actually work in production, subscribe to &lt;a href="https://podostack.com" rel="noopener noreferrer"&gt;Podo Stack&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>security</category>
      <category>devops</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>Golden Paths, Guardrails, and Why Every Platform Needs a Catalog</title>
      <dc:creator>Ilia Gusev</dc:creator>
      <pubDate>Wed, 11 Feb 2026 10:57:58 +0000</pubDate>
      <link>https://forem.com/persikbl/golden-paths-guardrails-and-why-every-platform-needs-a-catalog-48g7</link>
      <guid>https://forem.com/persikbl/golden-paths-guardrails-and-why-every-platform-needs-a-catalog-48g7</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://podostack.com/p/guardrails-backstage-crossplane" rel="noopener noreferrer"&gt;Podo Stack&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The last few issues of this newsletter covered individual tools -- image pulling, autoscaling, eBPF networking. All useful on their own. But tools don't help much if your engineers can't find them, use them safely, or provision infrastructure without filing a ticket and waiting three days.&lt;/p&gt;

&lt;p&gt;This week we zoom out to the platform layer. The boring stuff that makes everything else work.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Pattern: Platform Engineering Guardrails
&lt;/h2&gt;

&lt;p&gt;Here's something I see a lot. A team builds a shiny Internal Developer Platform. Self-service. Kubernetes. The works. Then they write a 50-page "Platform Usage Guide" and email it to all engineers.&lt;/p&gt;

&lt;p&gt;Nobody reads it. Someone deploys a public S3 bucket. Chaos.&lt;/p&gt;

&lt;p&gt;Documentation is not a guardrail. A guardrail is code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gates vs Guardrails
&lt;/h3&gt;

&lt;p&gt;Think of a highway. The old model is a tollbooth -- you stop, show your papers, wait for approval. That's a Change Advisory Board. It works, but it kills velocity.&lt;/p&gt;

&lt;p&gt;Guardrails are the barriers on the sides of the road. You drive at full speed. If you try to go off the edge, something stops you. No human in the loop.&lt;/p&gt;

&lt;p&gt;In practice, this means automated policies that either warn or block -- but never require manual approval when rules are followed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three Layers of Defense
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6nsrmd7wm3t4kd8awsw7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6nsrmd7wm3t4kd8awsw7.png" alt="Three Layers of Defense" width="800" height="1392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Good guardrails exist at every stage of the delivery pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design time&lt;/strong&gt; -- your IDE flags that you're using a banned instance type. Fix it before it even hits Git.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deploy time&lt;/strong&gt; -- OPA or Conftest checks your manifests in CI. No memory limits? Pipeline fails with a clear message. You don't find out in production at 2 AM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Runtime&lt;/strong&gt; -- Kyverno or Gatekeeper intercepts the API call. Pod running as root? Rejected. The cluster itself says no.&lt;/p&gt;

&lt;p&gt;Each layer catches what the previous one missed. Defense in depth, but for platform safety.&lt;/p&gt;

&lt;h3&gt;
  
  
  Start Soft
&lt;/h3&gt;

&lt;p&gt;One mistake I've made (and seen others repeat): going full enforcement on day one. Engineers feel like a robot is slapping their hands every time they push code. Morale drops. People start looking for workarounds.&lt;/p&gt;

&lt;p&gt;Better approach: start with 80% of guardrails in &lt;code&gt;Audit&lt;/code&gt; mode. Let people see the warnings, understand the rules, ask questions. Give them a couple of weeks. Then gradually flip to &lt;code&gt;Enforce&lt;/code&gt; -- starting with the policies that matter most (security, cost).&lt;/p&gt;
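An Audit-mode guardrail is just a regular policy with the blocking switched off. A sketch in Kyverno requiring memory limits (names and the exact pattern are illustrative):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-memory-limits
spec:
  validationFailureAction: Audit   # report violations, admit anyway; flip to Enforce later
  rules:
  - name: check-memory-limits
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "All containers must set a memory limit."
      pattern:
        spec:
          containers:
          - resources:
              limits:
                memory: "?*"
```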

&lt;p&gt;You'll get buy-in instead of resentment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tag-app-delivery.cncf.io/whitepapers/platform-eng-maturity-model/" rel="noopener noreferrer"&gt;CNCF Platform Engineering Maturity Model&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Unsexy Tool: Backstage Software Catalog
&lt;/h2&gt;

&lt;p&gt;Nobody gets excited about a catalog. There's no demo that makes the crowd gasp. But here's what happens without one: engineers Slack each other "who owns the payment service?" and nobody knows where the API docs live. Someone built a wiki page six months ago. It's already outdated.&lt;/p&gt;

&lt;p&gt;Backstage is a CNCF Incubating project, originally built at Spotify. It's been around since 2020. Not new, not flashy. But it solves the "where is everything?" problem better than anything else I've seen.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxs005gtyep28ebnv7k8n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxs005gtyep28ebnv7k8n.png" alt="Backstage Software Catalog" width="800" height="921"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The catalog-info.yaml Trick
&lt;/h3&gt;

&lt;p&gt;The key idea is &lt;code&gt;catalog-info.yaml&lt;/code&gt; -- a small file that lives next to your code. Developers own it. Backstage auto-discovers it from your Git repos. Here's what it looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backstage.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Component&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payment-service&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;github.com/project-slug&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;acme/payment-service&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service&lt;/span&gt;
  &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;team-alpha&lt;/span&gt;
  &lt;span class="na"&gt;lifecycle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
  &lt;span class="na"&gt;providesApis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;payments-api&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Now Backstage knows this service exists, who owns it, what APIs it exposes, and what it depends on. No separate documentation to maintain. The catalog stays accurate because it lives with the code.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Entity Model
&lt;/h3&gt;

&lt;p&gt;Backstage organizes everything into entities: Components (services, libraries), APIs, Resources (databases, queues), Groups (teams), and Users. They connect to each other through ownership and dependency relationships.&lt;/p&gt;

&lt;p&gt;A team owns a component. That component provides an API. It depends on a database resource. You can trace the full graph in the UI. When something breaks at 3 AM, you know exactly who to page.&lt;/p&gt;
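The other side of those relationships is just more YAML in the same catalog. A sketch of the Group and API entities a Component like the one above would point at - field values are illustrative, the kinds follow the Backstage descriptor format:

```yaml
apiVersion: backstage.io/v1alpha1
kind: Group
metadata:
  name: team-alpha
spec:
  type: team
  children: []
---
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: payments-api
spec:
  type: openapi
  lifecycle: production
  owner: team-alpha
  definition:
    $text: ./openapi.yaml
```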

&lt;h3&gt;
  
  
  Golden Paths via the Scaffolder
&lt;/h3&gt;

&lt;p&gt;Here's where it gets really useful. Backstage's Scaffolder lets you define templates for new services. Need a new microservice? Click a button, fill out a form, and get a repo with CI/CD pipeline, Dockerfile, monitoring dashboards, and &lt;code&gt;catalog-info.yaml&lt;/code&gt; -- all pre-configured. Three minutes instead of three days.&lt;/p&gt;

&lt;p&gt;The platform team controls the templates, not individual developers. Want to enforce a new security standard? Update the template. Every new service created from that point forward gets it automatically.&lt;/p&gt;

&lt;p&gt;That's a golden path. You're not blocking engineers from doing things their own way. You're just making the right way the easiest way.&lt;/p&gt;
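A golden path is itself declared in YAML. A trimmed sketch of a Scaffolder template - the parameters, skeleton path, and repo target are assumptions; see the Backstage docs for the full schema:

```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: go-microservice
  title: New Go Microservice
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service details
      required: [name]
      properties:
        name:
          type: string
  steps:
    - id: fetch
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
    - id: publish
      action: publish:github
      input:
        repoUrl: github.com?owner=acme&repo=${{ parameters.name }}
```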

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://backstage.io/" rel="noopener noreferrer"&gt;Backstage.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cncf.io/projects/backstage/" rel="noopener noreferrer"&gt;CNCF Backstage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Showdown: Crossplane vs Terraform
&lt;/h2&gt;

&lt;p&gt;Both manage your cloud infrastructure. Completely different philosophies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fma2frafip2v93e8d7mku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fma2frafip2v93e8d7mku.png" alt="Crossplane vs Terraform" width="800" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Terraform: The Standard
&lt;/h3&gt;

&lt;p&gt;You write HCL files. You run &lt;code&gt;terraform plan&lt;/code&gt;. You review the diff. You run &lt;code&gt;terraform apply&lt;/code&gt;. Done.&lt;/p&gt;

&lt;p&gt;It's simple, well-understood, and has providers for everything. But it's a one-shot operation. Between applies, nothing watches your infrastructure. Someone deletes a resource manually? Terraform doesn't know until your next &lt;code&gt;plan&lt;/code&gt;. That could be days. Or weeks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Crossplane: The K8s-Native Approach
&lt;/h3&gt;

&lt;p&gt;Crossplane runs inside your cluster. You define a custom resource -- say, &lt;code&gt;PostgreSQLCluster&lt;/code&gt; -- and Crossplane's controllers continuously reconcile it against reality. Just like how Kubernetes reconciles Deployments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;platform.acme.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostgreSQLCluster&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;orders-db&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15"&lt;/span&gt;
  &lt;span class="na"&gt;storageGB&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
  &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The developer doesn't know (or care) whether this creates an RDS instance, a Cloud SQL database, or something else. The platform team defines that mapping in a Composition. Developers get a simple API. Platform engineers keep control.&lt;/p&gt;

&lt;p&gt;And if someone manually deletes the RDS instance? Crossplane notices and recreates it. Automatically.&lt;/p&gt;
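That mapping lives in a Composition owned by the platform team. A heavily trimmed sketch targeting AWS RDS - the composite kind, provider API group, and field names are assumptions drawn from the Upbound AWS provider, so check the provider docs before relying on them:

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: postgres-aws
spec:
  compositeTypeRef:
    apiVersion: platform.acme.com/v1alpha1
    kind: XPostgreSQLCluster   # the composite behind the PostgreSQLCluster claim
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            engine: postgres
            region: eu-central-1
      patches:
        - fromFieldPath: spec.storageGB
          toFieldPath: spec.forProvider.allocatedStorage
```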

&lt;h3&gt;
  
  
  When to Choose What
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Terraform&lt;/strong&gt; -- you have a small team, simple infrastructure, or you're early in your platform journey. It's proven and everyone knows it. Don't overcomplicate things if you don't need to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Crossplane&lt;/strong&gt; -- you're building a self-service platform. You want developers to request infrastructure through Kubernetes APIs without filing tickets. You need continuous reconciliation, not just plan-apply.&lt;/p&gt;

&lt;p&gt;They're not competitors at the same maturity level. They're tools for different stages of the platform engineering journey. Plenty of teams use both -- Terraform for the foundational stuff, Crossplane for the self-service layer on top.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.crossplane.io/" rel="noopener noreferrer"&gt;Crossplane Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://registry.terraform.io/" rel="noopener noreferrer"&gt;Terraform Registry&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Policy: Require PodDisruptionBudget
&lt;/h2&gt;

&lt;p&gt;Node drain. Three replicas. No PDB. All pods evicted at once. Service down.&lt;/p&gt;

&lt;p&gt;I've seen this happen in production more times than I'd like to admit. It's one of those things that doesn't matter until it really, really matters.&lt;/p&gt;

&lt;p&gt;This Kyverno policy prevents exactly that. If your Deployment has more than one replica, it must have a matching PodDisruptionBudget. Otherwise, the API server rejects it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kyverno.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;require-pdb&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;validationFailureAction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Enforce&lt;/span&gt;
  &lt;span class="na"&gt;background&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;check-for-pdb&lt;/span&gt;
    &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;any&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;kinds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;StatefulSet&lt;/span&gt;
    &lt;span class="na"&gt;preconditions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;all&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;request.object.spec.replicas&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;
        &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GreaterThan&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;matchingPDBs&lt;/span&gt;
      &lt;span class="na"&gt;apiCall&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;urlPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/apis/policy/v1/namespaces/{{request.object.metadata.namespace}}/poddisruptionbudgets"&lt;/span&gt;
        &lt;span class="na"&gt;jmesPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items[].spec.selector.matchLabels"&lt;/span&gt;
    &lt;span class="na"&gt;validate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;-&lt;/span&gt;
        &lt;span class="s"&gt;Deployment with {{ request.object.spec.replicas }} replicas&lt;/span&gt;
        &lt;span class="s"&gt;requires a matching PodDisruptionBudget.&lt;/span&gt;
      &lt;span class="na"&gt;deny&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;conditions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;all&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;request.object.spec.template.metadata.labels&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;
            &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NotIn&lt;/span&gt;
            &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;matchingPDBs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;preconditions&lt;/code&gt; block skips single-replica deployments. You don't need a PDB for a singleton -- there's nothing to disrupt gracefully.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;apiCall&lt;/code&gt; context actually queries the cluster for existing PDBs in the namespace, then checks whether any of them match the deployment's labels.&lt;/li&gt;
&lt;li&gt;This is a runtime guardrail -- exactly what the first section of this article describes. No documentation needed. The cluster enforces it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're starting with Kyverno, set &lt;code&gt;validationFailureAction: Audit&lt;/code&gt; first. Let it report violations for a week. Then flip to &lt;code&gt;Enforce&lt;/code&gt; once you've helped teams add their PDBs.&lt;/p&gt;
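&lt;p&gt;For reference, a minimal PDB that would satisfy this policy. The &lt;code&gt;app: payment-api&lt;/code&gt; label is a placeholder -- the selector must match your Deployment's pod template labels:&lt;/p&gt;

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-api-pdb        # placeholder name
spec:
  minAvailable: 1              # keep at least one pod up during voluntary disruptions
  selector:
    matchLabels:
      app: payment-api         # must match the Deployment's template labels
```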

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kyverno.io/docs/" rel="noopener noreferrer"&gt;Kyverno Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/tasks/run-application/configure-pdb/" rel="noopener noreferrer"&gt;K8s PDB Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The One-Liner: Check Kubernetes EOL
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://endoflife.date/api/kubernetes.json | jq &lt;span class="s1"&gt;'.[0]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Platform teams have to track version support. &lt;a href="https://endoflife.date" rel="noopener noreferrer"&gt;endoflife.date&lt;/a&gt; aggregates EOL data for hundreds of products -- it's a free API, no authentication needed. This command shows the latest Kubernetes release with its support dates, end-of-life timeline, and whether it's still getting patches.&lt;/p&gt;

&lt;p&gt;Useful for audits, upgrade planning, or just settling the "should we upgrade yet?" debate in Slack.&lt;/p&gt;

&lt;p&gt;Bookmark the API. It covers &lt;a href="https://endoflife.date/" rel="noopener noreferrer"&gt;everything&lt;/a&gt; -- Node.js, PostgreSQL, Ubuntu, Go, Python, you name it.&lt;/p&gt;
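&lt;p&gt;The per-cycle data also lets you check the exact minor version your clusters run, not just the newest release. A sketch (assumes &lt;code&gt;jq&lt;/code&gt; is installed; the &lt;code&gt;1.31&lt;/code&gt; cycle is an example -- substitute your own):&lt;/p&gt;

```shell
# Pull the EOL date for one specific Kubernetes minor version.
curl -s https://endoflife.date/api/kubernetes.json \
  | jq -r '.[] | select(.cycle == "1.31") | "\(.cycle) reaches EOL on \(.eol)"'
```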




&lt;p&gt;&lt;strong&gt;What does your platform layer look like? Are you using Backstage, Crossplane, or something else entirely? I'd love to hear what's working (and what isn't) -- drop a comment below.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;If you found this useful, consider subscribing to &lt;a href="https://podostack.com" rel="noopener noreferrer"&gt;Podo Stack&lt;/a&gt; - weekly curation of Cloud Native tools ripe for production.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>platformengineering</category>
      <category>devops</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>Lazy Pull, Smart Scale, eBPF Network</title>
      <dc:creator>Ilia Gusev</dc:creator>
      <pubDate>Thu, 05 Feb 2026 11:38:14 +0000</pubDate>
      <link>https://forem.com/persikbl/lazy-pull-smart-scale-ebpf-network-33kl</link>
      <guid>https://forem.com/persikbl/lazy-pull-smart-scale-ebpf-network-33kl</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post was originally published on &lt;a href="https://podostack.substack.com" rel="noopener noreferrer"&gt;Podo Stack&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Welcome back to Podo Stack. This week: how infrastructure deals with scale. Three layers of optimization — images, nodes, network. Each one solves a problem you've probably hit.&lt;/p&gt;

&lt;p&gt;Here's what's good this week.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ The Pattern: Lazy Image Pulling with Stargz
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The problem
&lt;/h3&gt;

&lt;p&gt;You're scaling up. 50 new pods need to start. Every node pulls the same 2GB image. At the same time. Your registry groans. Your NAT gateway bill spikes. Containers sit there waiting instead of running.&lt;/p&gt;

&lt;p&gt;Here's the kicker: research (the Slacker paper from FAST '16) found that a container reads only about 6% of the image data at startup. The other 94%? Downloaded "just in case."&lt;/p&gt;

&lt;h3&gt;
  
  
  The solution
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frj8b0h0sw5suwlf4phja.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frj8b0h0sw5suwlf4phja.png" alt="Stargz" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Stargz Snapshotter flips the model. Instead of "download everything, then run" — it's "run now, download what you need."&lt;/p&gt;

&lt;p&gt;The trick is a format called eStargz (extended seekable tar.gz). Normal tar.gz archives are sequential — to read a file at the end, you unpack the whole thing. eStargz adds a TOC (Table of Contents) at the start. Now you can jump directly to any file.&lt;/p&gt;

&lt;p&gt;When your container starts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Snapshotter fetches the TOC (kilobytes, not gigabytes)&lt;/li&gt;
&lt;li&gt;Mounts the image via FUSE&lt;/li&gt;
&lt;li&gt;Container starts immediately&lt;/li&gt;
&lt;li&gt;Files get fetched on-demand via HTTP Range requests&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The container is running while the image is still "downloading." Wild, right?&lt;/p&gt;
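&lt;p&gt;Trying it means publishing images in the eStargz format first. A sketch using nerdctl (assumes nerdctl with the stargz snapshotter installed; the image references are placeholders):&lt;/p&gt;

```shell
# Convert an existing image to eStargz and push it.
# eStargz stays backward-compatible: runtimes without the snapshotter
# can still pull and run it as a plain tar.gz image.
nerdctl image convert --estargz --oci \
  registry.example.com/my-app:v1 registry.example.com/my-app:v1-esgz
nerdctl push registry.example.com/my-app:v1-esgz
```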

&lt;h3&gt;
  
  
  How this connects to Spegel
&lt;/h3&gt;

&lt;p&gt;In &lt;a href="https://podostack.substack.com/p/spegel-pixie-and-why-latest-is-evil" rel="noopener noreferrer"&gt;Issue #1&lt;/a&gt;, we covered Spegel — P2P caching that shares images across nodes. Stargz takes a different approach: instead of optimizing &lt;em&gt;distribution&lt;/em&gt;, it optimizes &lt;em&gt;what gets downloaded&lt;/em&gt; in the first place.&lt;/p&gt;

&lt;p&gt;They're complementary. Spegel says "pull once, share everywhere." Stargz says "only pull what you need." Use both and your image pull times will thank you.&lt;/p&gt;

&lt;h3&gt;
  
  
  The catch
&lt;/h3&gt;

&lt;p&gt;FUSE runs in userspace, so there's some overhead for I/O-heavy workloads. Databases probably shouldn't use this. But for your typical microservice that loads a few MB at startup? Perfect fit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/containerd/stargz-snapshotter" rel="noopener noreferrer"&gt;GitHub: containerd/stargz-snapshotter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/containerd/stargz-snapshotter/blob/main/docs/estargz.md" rel="noopener noreferrer"&gt;eStargz Design Doc&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ⚔️ The Showdown: Karpenter vs Cluster Autoscaler
&lt;/h2&gt;

&lt;p&gt;Two autoscalers. Same job. Very different approaches.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cluster Autoscaler (the veteran)
&lt;/h3&gt;

&lt;p&gt;CA has been around forever. It works through Node Groups (ASGs in AWS, MIGs in GCP). When pods are pending:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;CA checks which Node Group could fit them&lt;/li&gt;
&lt;li&gt;Bumps the desired count on that ASG&lt;/li&gt;
&lt;li&gt;Cloud provider spins up a new node from the template&lt;/li&gt;
&lt;li&gt;Node joins cluster, scheduler places pods&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Time from pending to running: &lt;strong&gt;minutes&lt;/strong&gt;. And you're stuck with whatever instance types you pre-defined in your node groups.&lt;/p&gt;

&lt;h3&gt;
  
  
  Karpenter (the new approach)
&lt;/h3&gt;

&lt;p&gt;Karpenter skips node groups entirely. When pods are pending:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Karpenter reads their requirements — CPU, memory, affinity, tolerations&lt;/li&gt;
&lt;li&gt;Calls the cloud API directly (EC2 Fleet in AWS)&lt;/li&gt;
&lt;li&gt;Provisions a node that &lt;em&gt;exactly&lt;/em&gt; fits what's waiting&lt;/li&gt;
&lt;li&gt;Node joins, pods run&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Time from pending to running: &lt;strong&gt;seconds&lt;/strong&gt;. And it picks the cheapest instance type that works.&lt;/p&gt;

&lt;h3&gt;
  
  
  The comparison
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cluster Autoscaler:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model: Node Groups (ASG)&lt;/li&gt;
&lt;li&gt;Speed: Minutes&lt;/li&gt;
&lt;li&gt;Sizing: Fixed templates&lt;/li&gt;
&lt;li&gt;Cost: Often over-provisioned&lt;/li&gt;
&lt;li&gt;Consolidation: Basic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Karpenter:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model: Group-less&lt;/li&gt;
&lt;li&gt;Speed: Seconds&lt;/li&gt;
&lt;li&gt;Sizing: Right-sized&lt;/li&gt;
&lt;li&gt;Cost: Optimized&lt;/li&gt;
&lt;li&gt;Consolidation: Active&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Karpenter also does active consolidation. It constantly checks: "Can I replace these three half-empty nodes with one smaller node?" If yes, it does.&lt;/p&gt;

&lt;h3&gt;
  
  
  The verdict
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;New cluster?&lt;/strong&gt; Go Karpenter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Already running CA successfully?&lt;/strong&gt; Maybe keep it. Migration has costs. If it's not broken, weigh carefully.&lt;/p&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://karpenter.sh/" rel="noopener noreferrer"&gt;Karpenter Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md" rel="noopener noreferrer"&gt;Cluster Autoscaler FAQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🔬 The eBPF Trace: Cilium Replaces kube-proxy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The problem
&lt;/h3&gt;

&lt;p&gt;kube-proxy uses iptables. Every service creates rules. Every rule gets checked sequentially.&lt;/p&gt;

&lt;p&gt;1000 services = thousands of iptables rules. Every packet walks the chain. O(n) lookup. In 2025. In your kernel.&lt;/p&gt;

&lt;p&gt;At scale, this hurts. CPU spikes during rule updates. Latency creeps up. Source IPs get lost in the NAT maze.&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;

&lt;p&gt;Cilium replaces all of this with eBPF. Instead of iptables chains, it uses hash map lookups — O(1), regardless of how many services you have.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm &lt;span class="nb"&gt;install &lt;/span&gt;cilium cilium/cilium &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;kubeProxyReplacement&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One flag. That's it.&lt;/p&gt;

&lt;p&gt;eBPF programs intercept packets before they hit the iptables stack, do a single hash lookup, and route directly. Faster, simpler, and source IPs stay intact.&lt;/p&gt;
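&lt;p&gt;You can confirm the takeover from a Cilium agent (a sketch; assumes the standard &lt;code&gt;cilium&lt;/code&gt; DaemonSet in &lt;code&gt;kube-system&lt;/code&gt;; newer releases name the in-pod binary &lt;code&gt;cilium-dbg&lt;/code&gt;):&lt;/p&gt;

```shell
# Ask a Cilium agent whether it has replaced kube-proxy.
kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement
```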

&lt;h3&gt;
  
  
  Real numbers
&lt;/h3&gt;

&lt;p&gt;I've seen clusters go from 2-3ms service latency to sub-millisecond after switching. CPU usage during endpoint updates dropped significantly. The larger your cluster, the bigger the difference.&lt;/p&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/" rel="noopener noreferrer"&gt;Cilium Docs: kube-proxy Replacement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ebpf.io/" rel="noopener noreferrer"&gt;eBPF.io&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🔥 The Hot Take: eBPF is Eating Kubernetes
&lt;/h2&gt;

&lt;p&gt;Look around. The data plane is being rewritten:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;kube-proxy&lt;/strong&gt; → Cilium eBPF&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service mesh sidecars&lt;/strong&gt; → Cilium, Istio Ambient (ztunnel)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt; → Pixie, Tetragon&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt; → Falco, Tracee&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern is clear. Userspace proxies are getting replaced by kernel-level programs.&lt;/p&gt;

&lt;p&gt;My hot take: In 3 years, half of the Kubernetes data plane will run on eBPF. The kernel is the new platform.&lt;/p&gt;

&lt;p&gt;Agree? Disagree? Reply and tell me I'm wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ The One-Liner: Karpenter Drift Detection
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get nodeclaims &lt;span class="nt"&gt;-o&lt;/span&gt; custom-columns&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'NAME:.metadata.name,DRIFT:.status.conditions[?(@.type=="Drifted")].status'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Karpenter tracks "drift" — when a node no longer matches its NodePool spec. Maybe the AMI updated. Maybe requirements changed.&lt;/p&gt;

&lt;p&gt;This command shows which nodes are marked as drifted and will be replaced during the next consolidation cycle.&lt;/p&gt;

&lt;p&gt;Pairs nicely with the Showdown section above. Once you're on Karpenter, this becomes part of your daily toolkit.&lt;/p&gt;

&lt;h3&gt;
  
  
  If you're not on Karpenter yet
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl top nodes &lt;span class="nt"&gt;--sort-by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cpu | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Shows your most loaded nodes. Useful for spotting where scaling is needed.&lt;/p&gt;




&lt;p&gt;Questions? Feedback? Reply to this email. I read every one.&lt;/p&gt;




&lt;p&gt;🍇 &lt;strong&gt;Podo Stack&lt;/strong&gt; — Ripe for Prod.&lt;/p&gt;

</description>
      <category>karpenter</category>
      <category>kubernetes</category>
      <category>cilium</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Sidecar-Free Mesh, SLO from YAML, and Labels as Contracts</title>
      <dc:creator>Ilia Gusev</dc:creator>
      <pubDate>Wed, 28 Jan 2026 09:20:00 +0000</pubDate>
      <link>https://forem.com/persikbl/sidecar-free-mesh-slo-from-yaml-and-labels-as-contracts-4lck</link>
      <guid>https://forem.com/persikbl/sidecar-free-mesh-slo-from-yaml-and-labels-as-contracts-4lck</guid>
      <description>&lt;p&gt;Welcome back to Podo Stack. This week we're looking at how Istio finally killed the sidecar tax, a tool that turns SLO monitoring into a one-liner, and a policy that'll save your platform team from label chaos.&lt;/p&gt;

&lt;p&gt;Here's what's good this week.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This post was originally published on &lt;a href="https://podostack.substack.com" rel="noopener noreferrer"&gt;Podo Stack&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🚀 Sandbox Watch: Istio Ambient Mesh
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4xtgkoa8gshus93ntt2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4xtgkoa8gshus93ntt2.png" alt="Istio Ambient Mesh flow" width="800" height="129"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What it is
&lt;/h3&gt;

&lt;p&gt;Service mesh without sidecars. Istio Ambient hit GA in version 1.24, and it's not just a minor tweak — it's a completely different architecture.&lt;/p&gt;

&lt;p&gt;Here's the problem with sidecars. Every pod gets an Envoy proxy injected. Run 100 pods, you're running 100 Envoys. Each one eats 50-100MB of RAM. Each one adds startup latency — your app waits for the sidecar to be ready before it can receive traffic. Scale to thousands of pods and you're burning serious resources on proxy overhead.&lt;/p&gt;

&lt;p&gt;Ambient flips this model.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;Two layers instead of one:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ztunnel&lt;/strong&gt; — A lightweight L4 proxy that runs as a DaemonSet, one per node. It handles mTLS, basic routing, and telemetry. Most traffic never needs more than this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Waypoint proxy&lt;/strong&gt; — An L7 proxy that only spins up when you need HTTP-level features like header routing, retries, or traffic mirroring. It's on-demand. Don't need L7? Don't pay for it.&lt;/p&gt;

&lt;p&gt;Think of it as "service mesh à la carte." You get the security baseline everywhere (ztunnel), and you add the fancy features only where they matter.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why I like it
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Memory overhead drops from ~100MB per pod to ~20MB per node&lt;/li&gt;
&lt;li&gt;No more sidecar injection drama — pods start faster&lt;/li&gt;
&lt;li&gt;Incremental migration: some namespaces on sidecar, some on ambient, same control plane&lt;/li&gt;
&lt;li&gt;mTLS everywhere by default, no config needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to use it
&lt;/h3&gt;

&lt;p&gt;You're running a large cluster. You're tired of the sidecar tax. You want mesh security without mesh complexity. You're okay with a newer (but now GA) approach.&lt;/p&gt;
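&lt;p&gt;Getting started takes two commands (a sketch; the &lt;code&gt;default&lt;/code&gt; namespace is a placeholder for whatever namespace you want in the mesh):&lt;/p&gt;

```shell
# Install Istio with the ambient profile, then opt a namespace into the mesh.
istioctl install --set profile=ambient
kubectl label namespace default istio.io/dataplane-mode=ambient
```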

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://istio.io/latest/docs/ambient/" rel="noopener noreferrer"&gt;Ambient Mode Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://istio.io/latest/news/releases/1.24.x/" rel="noopener noreferrer"&gt;Istio 1.24 Release Notes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ⚔️ The Showdown: Ambient vs Sidecar
&lt;/h2&gt;

&lt;p&gt;When should you stick with sidecars? When should you go ambient?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sidecar mode:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory: ~50-100MB per pod&lt;/li&gt;
&lt;li&gt;L7 features always available&lt;/li&gt;
&lt;li&gt;Sidecar must init before your app starts&lt;/li&gt;
&lt;li&gt;All-or-nothing migration per namespace&lt;/li&gt;
&lt;li&gt;5+ years in production, familiar debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ambient mode:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory: ~20MB per node (not per pod!)&lt;/li&gt;
&lt;li&gt;L7 features on-demand via waypoint&lt;/li&gt;
&lt;li&gt;No injection delay — pods start faster&lt;/li&gt;
&lt;li&gt;Gradual migration, per-workload&lt;/li&gt;
&lt;li&gt;GA since late 2024, newer tooling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The verdict:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose sidecar&lt;/strong&gt; when you need fine-grained L7 control on every pod, you're already running it successfully, or your team knows the debugging patterns cold.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose ambient&lt;/strong&gt; when memory is tight, you want mesh security without the overhead, you're starting fresh, or you want to migrate gradually without downtime.&lt;/p&gt;

&lt;p&gt;Honestly? For new deployments in 2025+, ambient is the default choice. The sidecar tax was always the biggest complaint about service mesh — and now it's optional.&lt;/p&gt;




&lt;h2&gt;
  
  
  💎 The Hidden Gem: sloth
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What it is
&lt;/h3&gt;

&lt;p&gt;SLO monitoring without the PromQL PhD.&lt;/p&gt;

&lt;p&gt;You want error budgets. You want burn rate alerts. You want dashboards that show if you're meeting your 99.9% availability target. The standard approach: spend three days writing Prometheus recording rules, debug the math, hope you got the multi-window burn rate calculation right.&lt;/p&gt;

&lt;p&gt;sloth: write a YAML file, run one command, get everything.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payment-api&lt;/span&gt;
&lt;span class="na"&gt;slos&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;requests-availability&lt;/span&gt;
    &lt;span class="na"&gt;objective&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;99.9&lt;/span&gt;
    &lt;span class="na"&gt;sli&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;error_query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum(rate(http_requests_total{status=~"5.."}[{{.window}}]))&lt;/span&gt;
        &lt;span class="na"&gt;total_query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum(rate(http_requests_total[{{.window}}]))&lt;/span&gt;
    &lt;span class="na"&gt;alerting&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;page_alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
      &lt;span class="na"&gt;ticket_alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;sloth generate&lt;/code&gt; and you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus recording rules&lt;/li&gt;
&lt;li&gt;Prometheus alert rules (multi-window burn rates)&lt;/li&gt;
&lt;li&gt;Grafana dashboard JSON&lt;/li&gt;
&lt;li&gt;Proper error budget calculation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The math is correct. The windows are correct. You focus on "what's my SLO" instead of "how do I calculate burn rates."&lt;/p&gt;
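&lt;p&gt;Generation itself is a single command (file names here are placeholders):&lt;/p&gt;

```shell
# Turn the SLO spec into Prometheus recording and alerting rules.
sloth generate -i slo-spec.yaml -o prometheus-rules.yaml
```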

&lt;h3&gt;
  
  
  Why I like it
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;One YAML → complete SLO monitoring stack&lt;/li&gt;
&lt;li&gt;Follows Google SRE book patterns exactly&lt;/li&gt;
&lt;li&gt;Works with any Prometheus setup&lt;/li&gt;
&lt;li&gt;The generated rules are readable — you can audit them&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/slok/sloth" rel="noopener noreferrer"&gt;GitHub: slok/sloth&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://sloth.dev/" rel="noopener noreferrer"&gt;sloth Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  👮 The Policy: Require Labels
&lt;/h2&gt;

&lt;p&gt;Copy this, apply it, and watch your platform governance improve overnight.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this matters
&lt;/h3&gt;

&lt;p&gt;Labels aren't documentation — they're contracts. Without enforced labels, you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost allocation that's impossible ("which team owns this $50K/month workload?")&lt;/li&gt;
&lt;li&gt;Access control that's broken (RBAC by label doesn't work if labels are missing)&lt;/li&gt;
&lt;li&gt;Incident response that's slow ("who do I page for this failing deployment?")&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The policy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kyverno.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;require-labels&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Require Labels&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Best Practices&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;medium&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;validationFailureAction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Enforce&lt;/span&gt;
  &lt;span class="na"&gt;background&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;require-team-label&lt;/span&gt;
      &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;any&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;kinds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
      &lt;span class="na"&gt;validate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Labels&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'team',&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'cost-center',&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'environment'&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;are&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;required."&lt;/span&gt;
        &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;team&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;?*"&lt;/span&gt;
              &lt;span class="na"&gt;cost-center&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;?*"&lt;/span&gt;
              &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;?*"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How to roll it out
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Start with &lt;code&gt;validationFailureAction: Audit&lt;/code&gt; — see what would be blocked&lt;/li&gt;
&lt;li&gt;Fix your existing deployments&lt;/li&gt;
&lt;li&gt;Switch to &lt;code&gt;Enforce&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Watch your platform team breathe easier&lt;/li&gt;
&lt;/ol&gt;
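&lt;p&gt;Step 1 can look like this; a minimal sketch, assuming the policy above is saved as &lt;code&gt;require-labels.yaml&lt;/code&gt; (the file name is illustrative). In Audit mode Kyverno records violations as PolicyReports instead of blocking:&lt;/p&gt;

```shell
# Apply the policy in Audit mode, then see what it would have blocked.
kubectl apply -f require-labels.yaml
kubectl get policyreports -A
```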

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kyverno.io/policies/" rel="noopener noreferrer"&gt;Kyverno Policy Library&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🛠️ The One-Liner: kubectl debug
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl debug &lt;span class="nt"&gt;-it&lt;/span&gt; my-pod &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;busybox &lt;span class="nt"&gt;--target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your pod runs a distroless image. No shell. No curl. No nothing. How do you debug it?&lt;/p&gt;

&lt;p&gt;This command injects an ephemeral container into the running pod. Same network namespace, and (with process namespace sharing) the target's filesystem is reachable under &lt;code&gt;/proc/&amp;lt;pid&amp;gt;/root&lt;/code&gt;. Full debugging power.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Check DNS resolution from inside the pod&lt;/li&gt;
&lt;li&gt;Inspect files in a distroless image&lt;/li&gt;
&lt;li&gt;Run tcpdump without rebuilding&lt;/li&gt;
&lt;li&gt;Test connectivity to other services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Works on Kubernetes 1.25+. No pod restart required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pro tip
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;--share-processes&lt;/code&gt; pairs with &lt;code&gt;--copy-to&lt;/code&gt;: debug a copy of the pod with a shared process namespace so you can see the target container's process tree. Great for debugging stuck applications. (With &lt;code&gt;--target&lt;/code&gt;, as above, the debug container already joins the target container's process namespace when the runtime supports it.)&lt;/p&gt;
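&lt;p&gt;A sketch of that variant (pod and image names reuse the example above; &lt;code&gt;my-pod-debug&lt;/code&gt; is a hypothetical name for the copy):&lt;/p&gt;

```shell
# Debug a copy of the pod with a shared process namespace, so `ps` inside
# the busybox container also shows the application's processes.
kubectl debug my-pod -it --image=busybox --copy-to=my-pod-debug --share-processes
```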

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/" rel="noopener noreferrer"&gt;Ephemeral Containers Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;🍇 &lt;strong&gt;Podo Stack&lt;/strong&gt; — Ripe for Prod.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>platformengineering</category>
      <category>kyverno</category>
    </item>
    <item>
      <title>Spegel, Pixie, and Why :latest Is Evil</title>
      <dc:creator>Ilia Gusev</dc:creator>
      <pubDate>Wed, 21 Jan 2026 09:48:37 +0000</pubDate>
      <link>https://forem.com/persikbl/spegel-pixie-and-why-latest-is-evil-1bk6</link>
      <guid>https://forem.com/persikbl/spegel-pixie-and-why-latest-is-evil-1bk6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post was originally published on &lt;a href="https://podostack.com" rel="noopener noreferrer"&gt;Podo Stack&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Welcome to the first issue of Podo Stack. No fluff, no hype — just tools that are actually ripe for prod.&lt;/p&gt;

&lt;p&gt;This week: a CNCF sandbox project that makes your cluster smarter about pulling images, an eBPF tool that sees your decrypted traffic (yes, really), a Kyverno policy you can deploy in 30 seconds, and a one-liner that gives you "terraform plan" for Kubernetes.&lt;/p&gt;

&lt;p&gt;Let's get into it.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 Sandbox Watch: Spegel
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What it is
&lt;/h3&gt;

&lt;p&gt;P2P container image caching for Kubernetes. Nodes share images directly with each other, no registry involved.&lt;/p&gt;

&lt;p&gt;Here's the problem. You're scaling up your deployment — maybe 50 new pods need to start. Every single node goes to your registry and pulls the same image. At the same time. Your NAT gateway chokes. Docker Hub rate-limits you. Your cloud bill spikes from egress traffic.&lt;/p&gt;

&lt;p&gt;Spegel fixes this with a dead-simple idea: what if nodes just shared images they already have?&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;Spegel runs as a DaemonSet. When a node pulls an image, Spegel indexes its layers and announces to the cluster: "Hey, I've got this one." When another node needs that image, it asks Spegel first. If someone in the cluster has it — boom, a local transfer at node-to-node network speed instead of crawling through the internet.&lt;/p&gt;

&lt;p&gt;If nobody has it? Falls back to the external registry like normal.&lt;/p&gt;

&lt;p&gt;The best part? It's stateless. No database, no PVC, no separate storage to manage. It piggybacks on containerd's existing cache. Install and forget.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why I like it
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;One Helm chart. That's the setup.&lt;/li&gt;
&lt;li&gt;No changes to your pod specs. Spegel works at the containerd level — it's transparent to your workloads.&lt;/li&gt;
&lt;li&gt;Handles Docker Hub rate limits gracefully (the image only gets pulled once per cluster).&lt;/li&gt;
&lt;li&gt;More nodes = more cache = faster pulls. It gets better as you scale.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to use it
&lt;/h3&gt;

&lt;p&gt;You're on containerd (most clusters are). You scale workloads frequently. You're tired of paying egress fees or hitting registry limits. You don't want to operate Harbor or Dragonfly.&lt;/p&gt;
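&lt;p&gt;Getting started really is one chart; a sketch per the project README (double-check the OCI chart location and pin a version there, as it may change between releases):&lt;/p&gt;

```shell
# Install Spegel as a DaemonSet from its OCI Helm chart.
helm upgrade --install spegel oci://ghcr.io/spegel-org/helm-charts/spegel \
  --namespace spegel --create-namespace
```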

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/spegel-org/spegel" rel="noopener noreferrer"&gt;GitHub: spegel-org/spegel&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cncf.io/projects/spegel/" rel="noopener noreferrer"&gt;CNCF Sandbox&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  💎 The Hidden Gem: Pixie
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What it is
&lt;/h3&gt;

&lt;p&gt;eBPF-powered observability that requires zero code changes. Install it, and two minutes later you have a service map of your entire cluster.&lt;/p&gt;

&lt;p&gt;Most observability follows the same painful pattern: add a library, redeploy, wait for data. Pixie skips all of that.&lt;/p&gt;

&lt;p&gt;It uses eBPF to intercept data at the kernel level. HTTP requests, SQL queries, DNS lookups — Pixie sees them all. Automatically. Without touching your code.&lt;/p&gt;

&lt;p&gt;But here's the killer feature: &lt;strong&gt;it sees decrypted TLS traffic&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you've ever tried to debug mTLS traffic in a service mesh, you know the pain. Wireshark shows garbage. Logs are empty because the developer forgot error handling. You're blind.&lt;/p&gt;

&lt;p&gt;Pixie intercepts data &lt;em&gt;before&lt;/em&gt; it hits the SSL library. You get the actual request body, in plain text, even when it's encrypted on the wire.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-world use case
&lt;/h3&gt;

&lt;p&gt;Your payment service throws 500 errors. Logs say nothing. Metrics show increased latency but no obvious cause.&lt;/p&gt;

&lt;p&gt;Old way: Add logging, rebuild, redeploy, wait for the error to happen again. Hope your logging catches it.&lt;/p&gt;

&lt;p&gt;Pixie way: Open the console, run &lt;code&gt;px/http_data&lt;/code&gt;, filter by status code 500. You see the exact request body and the SQL query that was running when it failed. Time to resolution: 2 minutes.&lt;/p&gt;
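&lt;p&gt;From a terminal, the same data is one &lt;code&gt;px&lt;/code&gt; CLI call away (a sketch; filtering by status code is then done in the Live UI or with standard shell tools):&lt;/p&gt;

```shell
# Dump recent HTTP requests/responses captured by Pixie across the cluster.
px run px/http_data
```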

&lt;h3&gt;
  
  
  Other tricks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Continuous profiling without recompiling. Your Go/Rust/Java service is burning CPU? Pixie builds a flamegraph in real time. You'll see the exact function causing trouble.&lt;/li&gt;
&lt;li&gt;PxL scripting. It's like Python with Pandas, but for your cluster telemetry. Query anything.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quick start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;px deploy                                                
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Seriously.&lt;/p&gt;

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://px.dev/" rel="noopener noreferrer"&gt;px.dev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pixie-io/pixie" rel="noopener noreferrer"&gt;GitHub: pixie-io/pixie&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  👮 The Policy: Disallow :latest Tags
&lt;/h2&gt;

&lt;p&gt;Copy this, apply it, and save yourself from future headaches.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this matters
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;:latest&lt;/code&gt; is a lie. It doesn't mean "latest" — it means "whatever happened to be built last time someone didn't specify a tag." The image changes without the tag changing. Your Tuesday deployment works fine. Your Wednesday deployment breaks. Same manifest, different image.&lt;/p&gt;

&lt;p&gt;Three nodes in your cluster might have three different versions of &lt;code&gt;nginx:latest&lt;/code&gt; cached. You'll spend hours debugging why "the same pod" behaves differently on different nodes.&lt;/p&gt;

&lt;h3&gt;
  
  
  The policy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kyverno.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;disallow-latest-tag&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Disallow Latest Tag&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Best Practices&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;medium&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;validationFailureAction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Enforce&lt;/span&gt;
  &lt;span class="na"&gt;background&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;require-specific-image-tag&lt;/span&gt;
      &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;any&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;kinds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
      &lt;span class="na"&gt;validate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;':latest'&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tag&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;allowed.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;specific&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;version&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tag."&lt;/span&gt;
        &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;=(initContainers)&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;!*:latest"&lt;/span&gt;
            &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;!*:latest"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;validationFailureAction: Enforce&lt;/code&gt; — blocks pods that violate the rule. Use &lt;code&gt;Audit&lt;/code&gt; if you just want warnings.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;image: "!*:latest"&lt;/code&gt; — the &lt;code&gt;!&lt;/code&gt; means "NOT". Any image except &lt;code&gt;:latest&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Kyverno auto-applies this to Deployments, StatefulSets, Jobs — anything that creates pods.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Test before enforcing
&lt;/h3&gt;

&lt;p&gt;Start with &lt;code&gt;Audit&lt;/code&gt; mode. Check what would be blocked. Then switch to &lt;code&gt;Enforce&lt;/code&gt; when you're confident.&lt;/p&gt;
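&lt;p&gt;You can also dry-run the policy with the Kyverno CLI before it ever touches the cluster; a sketch, assuming the policy and a sample workload are saved locally under these illustrative file names:&lt;/p&gt;

```shell
# Evaluate the policy against a local manifest -- no cluster access needed.
kyverno apply disallow-latest-tag.yaml --resource deploy.yaml
```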

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kyverno.io/docs/" rel="noopener noreferrer"&gt;Kyverno Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kyverno.io/policies/" rel="noopener noreferrer"&gt;Policy Library&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🛠️ The One-Liner: flux diff
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;flux diff kustomization my-app &lt;span class="nt"&gt;--path&lt;/span&gt; ./clusters/prod/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is "terraform plan" for Kubernetes.&lt;/p&gt;

&lt;p&gt;You're about to merge a PR that changes your deployment. What actually changes in the cluster? With this command, you know before it happens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it's better than kubectl diff
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Understands Flux Kustomization CRDs&lt;/li&gt;
&lt;li&gt;Handles SOPS-encrypted secrets (masks values in output)&lt;/li&gt;
&lt;li&gt;Filters out noisy fields like &lt;code&gt;status&lt;/code&gt; that change constantly&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Bonus — diff everything
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;flux diff kustomization &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Shows what would change across all kustomizations in your cluster.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pro tip
&lt;/h3&gt;

&lt;p&gt;Add this to your CI pipeline. Post the diff as a PR comment. Reviewers see the actual Kubernetes changes, not just YAML line diffs.&lt;/p&gt;
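&lt;p&gt;A minimal sketch of that CI step, assuming GitHub and the &lt;code&gt;gh&lt;/code&gt; CLI (&lt;code&gt;PR_NUMBER&lt;/code&gt; is a hypothetical pipeline variable; note that &lt;code&gt;flux diff&lt;/code&gt; exits non-zero when it finds drift):&lt;/p&gt;

```shell
# Render the diff; exit code 1 just means "changes found", so don't fail on it.
flux diff kustomization my-app --path ./clusters/prod/ > diff.txt || true

# Post the diff as a PR comment when it is non-empty.
if [ -s diff.txt ]; then
  gh pr comment "$PR_NUMBER" --body-file diff.txt
fi
```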

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://fluxcd.io/flux/cmd/flux_diff_kustomization/" rel="noopener noreferrer"&gt;Flux CLI Reference&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>sre</category>
      <category>kyverno</category>
    </item>
  </channel>
</rss>
