<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Manuchim Oliver</title>
    <description>The latest articles on Forem by Manuchim Oliver (@mxnuchim).</description>
    <link>https://forem.com/mxnuchim</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1056340%2F07ef8bbd-e574-492b-99b2-a5b351d1dd42.jpeg</url>
      <title>Forem: Manuchim Oliver</title>
      <link>https://forem.com/mxnuchim</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mxnuchim"/>
    <language>en</language>
    <item>
      <title>Terraform Provisioners: The Most Misunderstood Feature in IaC</title>
      <dc:creator>Manuchim Oliver</dc:creator>
      <pubDate>Sun, 22 Mar 2026 11:22:51 +0000</pubDate>
      <link>https://forem.com/mxnuchim/terraform-provisioners-the-most-misunderstood-feature-in-iac-po6</link>
      <guid>https://forem.com/mxnuchim/terraform-provisioners-the-most-misunderstood-feature-in-iac-po6</guid>
      <description>&lt;p&gt;Most engineers don’t &lt;em&gt;start&lt;/em&gt; with Terraform provisioners.&lt;/p&gt;

&lt;p&gt;They arrive there naturally.&lt;/p&gt;

&lt;p&gt;You provision an EC2 instance.&lt;br&gt;&lt;br&gt;
You SSH into it.&lt;br&gt;&lt;br&gt;
You install what you need.&lt;/p&gt;

&lt;p&gt;Then you think:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Why not automate this part too?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So you reach for provisioners.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;remote-exec&lt;/code&gt; to run commands
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;file&lt;/code&gt; to copy scripts
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;local-exec&lt;/code&gt; to glue workflows together
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And for a moment — everything feels clean and automated.&lt;/p&gt;

&lt;p&gt;Until it doesn’t.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Moment Things Break
&lt;/h2&gt;

&lt;p&gt;You update your script.&lt;/p&gt;

&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;And… nothing happens.&lt;/p&gt;

&lt;p&gt;No commands run.&lt;br&gt;
No changes applied.&lt;br&gt;
No errors.&lt;/p&gt;

&lt;p&gt;Just a calm “No changes. Your infrastructure matches the configuration.”&lt;/p&gt;




&lt;p&gt;This is the moment most people think something is broken.&lt;/p&gt;

&lt;p&gt;But nothing is broken.&lt;/p&gt;

&lt;p&gt;Terraform is doing exactly what it was designed to do.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Misunderstanding
&lt;/h2&gt;

&lt;p&gt;Terraform is &lt;strong&gt;declarative&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It cares about the &lt;em&gt;state of infrastructure&lt;/em&gt; — not the steps to configure it.&lt;/p&gt;

&lt;p&gt;Provisioners, on the other hand, are &lt;strong&gt;imperative&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;They introduce instructions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run this command&lt;/li&gt;
&lt;li&gt;Copy this file&lt;/li&gt;
&lt;li&gt;Execute this script&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a completely different model.&lt;/p&gt;
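&lt;p&gt;The gap is easy to feel in plain shell. In this small sketch (both function names are made up for illustration), the first function describes an action and repeats it on every run. The second describes an end state: it checks the current state first and only acts when reality differs. Terraform resources behave like the second. Provisioner commands are closer to the first.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Imperative: describes an action. Every run appends another copy.
add_line_imperative() {
  echo "$2" >> "$1"
}

# Declarative-style: describes an end state ("this exact line exists in the file").
# It inspects the current state and only changes what is missing, so running
# it once or ten times converges on the same result.
ensure_line() {
  touch "$1"
  # -q: quiet, -x: match the whole line, -F: treat the text literally
  if ! grep -qxF -- "$2" "$1"; then
    echo "$2" >> "$1"
  fi
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Call each one three times against an empty file: the imperative version leaves three copies of the line, while &lt;code&gt;ensure_line&lt;/code&gt; leaves exactly one.&lt;/p&gt;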




&lt;h2&gt;
  
  
  What Provisioners Actually Are
&lt;/h2&gt;

&lt;p&gt;Provisioners are not part of your normal workflow.&lt;/p&gt;

&lt;p&gt;They are:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Lifecycle hooks that run &lt;strong&gt;once&lt;/strong&gt;, during resource creation (or destruction).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s it.&lt;/p&gt;

&lt;p&gt;They are not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Continuous configuration tools&lt;/li&gt;
&lt;li&gt;Script runners&lt;/li&gt;
&lt;li&gt;Update mechanisms&lt;/li&gt;
&lt;/ul&gt;
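&lt;p&gt;A shell analogy makes the opening “nothing happens” scenario concrete. This is not how Terraform is implemented: the state file, the setup script, and the &lt;code&gt;create_instance&lt;/code&gt; function are hypothetical stand-ins. The setup script runs only at the moment the resource is first recorded, so editing it afterwards changes nothing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Analogy only: STATE_FILE stands in for Terraform state, SETUP_SCRIPT for the
# script a creation-time provisioner would run. Both paths are illustrative.
STATE_FILE="${STATE_FILE:-./fake.tfstate}"
SETUP_SCRIPT="${SETUP_SCRIPT:-./setup.sh}"

create_instance() {
  local name="$1"
  if grep -qxF -- "$name" "$STATE_FILE" 2>/dev/null; then
    # The resource already exists: nothing to create, so nothing to provision.
    echo "$name: no changes"
    return 0
  fi
  echo "$name: creating"
  echo "$name" >> "$STATE_FILE"
  # Creation-time hook: runs once, right after the resource is recorded.
  bash "$SETUP_SCRIPT"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run &lt;code&gt;create_instance web&lt;/code&gt;, edit &lt;code&gt;setup.sh&lt;/code&gt;, and run it again: the second run prints &lt;code&gt;web: no changes&lt;/code&gt; and the edited script never executes.&lt;/p&gt;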




&lt;h2&gt;
  
  
  The Three Types (Quick Context)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. local-exec
&lt;/h3&gt;

&lt;p&gt;Runs on your local machine.&lt;/p&gt;

&lt;p&gt;Useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logging&lt;/li&gt;
&lt;li&gt;Triggering external systems&lt;/li&gt;
&lt;li&gt;Quick integrations&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2. remote-exec
&lt;/h3&gt;

&lt;p&gt;Runs on the remote instance over SSH (or WinRM for Windows hosts).&lt;/p&gt;

&lt;p&gt;Useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bootstrapping&lt;/li&gt;
&lt;li&gt;Installing packages&lt;/li&gt;
&lt;li&gt;Initial setup&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3. file
&lt;/h3&gt;

&lt;p&gt;Copies files to the instance.&lt;/p&gt;

&lt;p&gt;Usually paired with &lt;code&gt;remote-exec&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;None of these are inherently bad.&lt;/p&gt;

&lt;p&gt;The problem is how they’re used.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Senior Engineers Avoid Overusing Provisioners
&lt;/h2&gt;

&lt;p&gt;It’s not about rules. It’s about experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. They Don’t Rerun
&lt;/h3&gt;

&lt;p&gt;Provisioners run only when the resource is created (or destroyed, for destroy-time provisioners).&lt;/p&gt;

&lt;p&gt;If you change the script, Terraform won’t care.&lt;/p&gt;

&lt;p&gt;To rerun them, you have to:&lt;/p&gt;

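&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform taint aws_instance.example
terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On Terraform v0.15.2 and later, &lt;code&gt;taint&lt;/code&gt; is deprecated in favor of doing this in one step: &lt;code&gt;terraform apply -replace="aws_instance.example"&lt;/code&gt;.&lt;/p&gt;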

&lt;p&gt;You’re now destroying infrastructure just to rerun a script.&lt;/p&gt;

&lt;p&gt;That’s friction — and a signal.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. They Depend on SSH
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;remote-exec&lt;/code&gt; and &lt;code&gt;file&lt;/code&gt; require connectivity.&lt;/p&gt;

&lt;p&gt;That introduces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Network dependencies&lt;/li&gt;
&lt;li&gt;Timing issues&lt;/li&gt;
&lt;li&gt;Authentication complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At scale, this becomes fragile.&lt;/p&gt;
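&lt;p&gt;Timing is the subtle one. A freshly created instance often isn't accepting SSH yet, so anything that connects to it has to wait and retry. Terraform's provisioner connections retry on their own up to a configurable timeout; outside Terraform you end up writing glue like the hypothetical &lt;code&gt;retry&lt;/code&gt; helper sketched below, and every such helper is one more moving part.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs out, sleeping between tries.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for i in $(seq 1 "$attempts"); do
    if "$@"; then
      return 0
    fi
    printf 'attempt %s/%s failed: %s\n' "$i" "$attempts" "$*" >/dev/stderr
    # No point sleeping after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;retry 30 10 ssh -o ConnectTimeout=5 ec2-user@203.0.113.10 true&lt;/code&gt; keeps probing for several minutes until SSH comes up (the user and the documentation-range IP are placeholders). Multiply that by every instance in every environment and the fragility adds up.&lt;/p&gt;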




&lt;h3&gt;
  
  
  3. They Break the Declarative Model
&lt;/h3&gt;

&lt;p&gt;Terraform is designed to describe &lt;em&gt;what should exist&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Provisioners introduce &lt;em&gt;how things should happen&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;That shift seems small — but it compounds quickly.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. They Don’t Scale Cleanly
&lt;/h3&gt;

&lt;p&gt;What works for one instance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Doesn’t work the same for 10&lt;/li&gt;
&lt;li&gt;Or 100&lt;/li&gt;
&lt;li&gt;Or across environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Provisioners don’t give you consistency guarantees.&lt;/p&gt;




&lt;h2&gt;
  
  
  So When Should You Use Them?
&lt;/h2&gt;

&lt;p&gt;Provisioners are still useful — when used intentionally.&lt;/p&gt;

&lt;p&gt;Good use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick bootstrapping in prototypes&lt;/li&gt;
&lt;li&gt;Small automation gaps&lt;/li&gt;
&lt;li&gt;One-time setup tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full configuration management&lt;/li&gt;
&lt;li&gt;Ongoing system changes&lt;/li&gt;
&lt;li&gt;Production-critical workflows&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Better Alternatives
&lt;/h2&gt;

&lt;p&gt;Instead of pushing everything into Terraform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;user_data / cloud-init&lt;/strong&gt; for instance initialization&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Packer&lt;/strong&gt; to bake images&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;configuration management tools&lt;/strong&gt; such as Ansible, Chef, or Puppet for system setup&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;AWS Systems Manager (SSM)&lt;/strong&gt; Run Command for remote execution without SSH&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each tool has a clear responsibility.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Shift
&lt;/h2&gt;

&lt;p&gt;The biggest lesson isn’t about provisioners.&lt;/p&gt;

&lt;p&gt;It’s about thinking in layers.&lt;/p&gt;

&lt;p&gt;Instead of asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How do I make this run in Terraform?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Where does this responsibility belong?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Infrastructure → Terraform&lt;/li&gt;
&lt;li&gt;Instance setup → cloud-init / images&lt;/li&gt;
&lt;li&gt;Configuration → dedicated tools&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Provisioners are not the problem.&lt;/p&gt;

&lt;p&gt;Misusing them is.&lt;/p&gt;

&lt;p&gt;A senior engineer doesn’t avoid tools blindly —&lt;br&gt;
they understand the boundaries where each tool is strongest.&lt;/p&gt;

&lt;p&gt;And design systems that respect those boundaries.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>sre</category>
      <category>terraform</category>
      <category>cloudcomputing</category>
    </item>
    <item>
      <title>Blue-Green Deployment on AWS: Step-by-Step Guide to Zero-Downtime Releases (2026 guide)</title>
      <dc:creator>Manuchim Oliver</dc:creator>
      <pubDate>Wed, 04 Mar 2026 20:13:35 +0000</pubDate>
      <link>https://forem.com/mxnuchim/blue-green-deployment-on-aws-step-by-step-guide-to-zero-downtime-releases-2026-guide-56oa</link>
      <guid>https://forem.com/mxnuchim/blue-green-deployment-on-aws-step-by-step-guide-to-zero-downtime-releases-2026-guide-56oa</guid>
      <description>&lt;p&gt;&lt;em&gt;Your Deployment Just Took Down Production. Again. Here's How to Never Let That Happen.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;It was a Thursday afternoon. The kind where you're mentally halfway out the door, maybe already thinking about the weekend.&lt;/p&gt;

&lt;p&gt;Then Slack lights up.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Hey… the app is down."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You deployed thirty minutes ago. A "small" hotfix. Two lines. "It'll be fine."&lt;/p&gt;

&lt;p&gt;If you've been in production engineering long enough, you've lived this story. If you haven't yet — you will. The question isn't &lt;em&gt;whether&lt;/em&gt; a bad deployment will happen. The question is: &lt;strong&gt;when it does, how fast can you recover?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's the problem Blue-Green deployment solves. And today I'm walking you through exactly how to implement it on AWS using Elastic Beanstalk and Terraform — zero downtime, instant rollbacks, infrastructure-as-code from day one.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Blue-Green Actually Means (And Why Most Explanations Miss the Point)
&lt;/h2&gt;

&lt;p&gt;Most articles define it like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"You have two identical environments. Blue is live. Green is staging. You swap them."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Technically correct. Completely useless without context.&lt;/p&gt;

&lt;p&gt;Here's the mental model that actually sticks:&lt;/p&gt;

&lt;p&gt;Imagine your production environment is a patient on an operating table — heart beating, users connected, traffic flowing. Every deployment you push to that live environment is open-heart surgery &lt;em&gt;while the heart is still running&lt;/em&gt;. One wrong cut and the patient flatlines. 3am pages. Slack on fire. The works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blue-Green says: stop operating on the live patient.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead, spin up an &lt;em&gt;identical&lt;/em&gt; second patient — your green environment. Do all your surgery there. Test it. Benchmark it. Validate every edge case. When you're 100% confident, flip a switch. One DNS record change. Traffic moves from blue to green. The old patient sits warm and healthy as your fallback.&lt;/p&gt;

&lt;p&gt;Something goes wrong in production with the new version? Flip the switch back. &lt;strong&gt;Your previous version was never touched.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's the real power — not just "two environments." It's the ability to deploy with confidence because your escape hatch is always one click away.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: What We're Building
&lt;/h2&gt;

&lt;p&gt;Two fully independent Elastic Beanstalk environments, each with its own ALB, Auto Scaling group, health monitoring, and application version stored in S3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────────────────────────────┐
│                 Elastic Beanstalk Application                │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────────────┐    ┌──────────────────────┐       │
│  │  Blue Environment    │    │  Green Environment   │       │
│  │  (Live Production)   │    │  (Staging / Next)    │       │
│  │  Version 1.0         │    │  Version 2.0         │       │
│  │  ALB + Auto Scaling  │    │  ALB + Auto Scaling  │       │
│  │  Health Checks       │    │  Health Checks       │       │
│  └──────────────────────┘    └──────────────────────┘       │
│             │                           │                   │
│             └─────────────┬─────────────┘                   │
│                           ▼                                 │
│               CNAME Swap ← this is the magic                │
└──────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "swap" is literally swapping two DNS CNAME records. Elastic Beanstalk handles this natively — one API call, no custom load balancer gymnastics needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Terraform Setup
&lt;/h2&gt;

&lt;p&gt;If it's not in code, it doesn't exist. Let's walk through the meaningful parts.&lt;/p&gt;

&lt;h3&gt;
  
  
  IAM: The Foundation Nobody Talks About
&lt;/h3&gt;

&lt;p&gt;Before a single instance spins up, Beanstalk needs two distinct IAM roles — and confusing them is the #1 reason I see environments fail to provision.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Role for EC2 instances (so Beanstalk can manage them)&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"eb_ec2_role"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.app_name}-eb-ec2-role"&lt;/span&gt;
  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ec2.amazonaws.com"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Attach the three managed policies Beanstalk needs&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy_attachment"&lt;/span&gt; &lt;span class="s2"&gt;"eb_web_tier"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;eb_ec2_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;policy_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:iam::aws:policy/AWSElasticBeanstalkWebTier"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



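&lt;p&gt;The &lt;code&gt;eb_ec2_role&lt;/code&gt; lets instances do their job. The &lt;em&gt;service role&lt;/em&gt; (separate) lets Beanstalk itself make AWS API calls on your behalf: health reporting, managed updates, scaling events. Both are required. Most tutorials only mention one. One more detail: EC2 instances don't take a role directly. The role is wrapped in an &lt;code&gt;aws_iam_instance_profile&lt;/code&gt;, which the environment references through the &lt;code&gt;IamInstanceProfile&lt;/code&gt; setting in the &lt;code&gt;aws:autoscaling:launchconfiguration&lt;/code&gt; namespace.&lt;/p&gt;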

&lt;h3&gt;
  
  
  S3: Your App's Artifact Store
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket"&lt;/span&gt; &lt;span class="s2"&gt;"app_versions"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;# Account ID in the name = globally unique without hardcoding&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.app_name}-versions-${data.aws_caller_identity.current.account_id}"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_public_access_block"&lt;/span&gt; &lt;span class="s2"&gt;"app_versions"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;app_versions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;block_public_acls&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;block_public_policy&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;ignore_public_acls&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;restrict_public_buckets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your deployment artifacts are not public content. Lock the bucket down from day one. The &lt;code&gt;aws_caller_identity&lt;/code&gt; data source (declared as &lt;code&gt;data "aws_caller_identity" "current" {}&lt;/code&gt;) ties the bucket name to your account ID, so there's no manual uniqueness wrangling.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Blue Environment (Production)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_elastic_beanstalk_environment"&lt;/span&gt; &lt;span class="s2"&gt;"blue"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.app_name}-blue"&lt;/span&gt;
  &lt;span class="nx"&gt;application&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_elastic_beanstalk_application&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;version_label&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_elastic_beanstalk_application_version&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;tier&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"WebServer"&lt;/span&gt;

  &lt;span class="c1"&gt;# Rolling deploys: only redeploy 50% of instances at a time&lt;/span&gt;
  &lt;span class="nx"&gt;setting&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;namespace&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"aws:elasticbeanstalk:command"&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"DeploymentPolicy"&lt;/span&gt;
    &lt;span class="nx"&gt;value&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Rolling"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;setting&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;namespace&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"aws:elasticbeanstalk:command"&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"BatchSize"&lt;/span&gt;
    &lt;span class="nx"&gt;value&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"50"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;# Never fly blind&lt;/span&gt;
  &lt;span class="nx"&gt;setting&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;namespace&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"aws:elasticbeanstalk:healthreporting:system"&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"SystemType"&lt;/span&gt;
    &lt;span class="nx"&gt;value&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"enhanced"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"blue"&lt;/span&gt;
    &lt;span class="nx"&gt;Role&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"production"&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The green environment is structurally identical (same ALB configuration, same scaling config, same health checks) with one difference: it points to &lt;code&gt;v2&lt;/code&gt; of the application. That's the whole point. &lt;strong&gt;Production parity is not optional.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Swap: One Command, Zero Downtime
&lt;/h2&gt;

&lt;p&gt;You've validated green. Smoke tests pass. Load tests pass. You've slept on it. Time to ship.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws elasticbeanstalk swap-environment-cnames &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source-environment-name&lt;/span&gt; my-app-blue &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--destination-environment-name&lt;/span&gt; my-app-green &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Console:&lt;/strong&gt; Elastic Beanstalk → App → Blue Environment → Actions → &lt;em&gt;Swap Environment URLs&lt;/em&gt; → Select Green → Swap.&lt;/p&gt;

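&lt;p&gt;Beanstalk swaps the CNAME records behind the two environment URLs. Once DNS caches catch up, typically within a minute or two, traffic that was hitting your blue URL is served by your green environment. The environment &lt;em&gt;names&lt;/em&gt; stay the same. The &lt;em&gt;URLs&lt;/em&gt; swap. Because both environments keep serving throughout the switch, users shouldn't see error pages, dropped connections, or 502s. Clients still holding a cached DNS answer simply keep reaching blue until it expires, which is one more reason not to tear blue down right after the swap.&lt;/p&gt;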
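&lt;p&gt;It's worth letting a script refuse to flip traffic onto an environment that isn't ready. The sketch below reuses the environment names from the CLI example above and checks the &lt;code&gt;Status&lt;/code&gt; and &lt;code&gt;Health&lt;/code&gt; fields that &lt;code&gt;aws elasticbeanstalk describe-environments&lt;/code&gt; returns. The function names are illustrative, and the decision logic sits in its own function so it can be tested without AWS credentials. Treat it as a starting point, not a full release pipeline.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# ready_to_swap STATUS HEALTH
# Only an environment that is "Ready" (not Launching/Updating) and whose
# health color is "Green" should receive production traffic.
ready_to_swap() {
  [ "$1" = "Ready" ] || return 1
  [ "$2" = "Green" ] || return 1
}

# safe_swap LIVE_ENV TARGET_ENV REGION
# Reads the target environment's state and swaps CNAMEs only if it passes.
safe_swap() {
  local live="$1" target="$2" region="$3" status health
  status=$(aws elasticbeanstalk describe-environments \
    --environment-names "$target" --region "$region" \
    --query 'Environments[0].Status' --output text)
  health=$(aws elasticbeanstalk describe-environments \
    --environment-names "$target" --region "$region" \
    --query 'Environments[0].Health' --output text)
  if ! ready_to_swap "$status" "$health"; then
    echo "refusing to swap: $target is $status/$health"
    return 1
  fi
  aws elasticbeanstalk swap-environment-cnames \
    --source-environment-name "$live" \
    --destination-environment-name "$target" \
    --region "$region"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;safe_swap my-app-blue my-app-green us-east-1&lt;/code&gt;. Since the swap is symmetric, rolling back is the same call with the names reversed, which gates on the old environment's health instead.&lt;/p&gt;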

&lt;h3&gt;
  
  
  The Rollback That Doesn't Require a Hero
&lt;/h3&gt;

&lt;p&gt;Here's what makes this strategy genuinely production-grade: your rollback is identical to your deployment.&lt;/p&gt;

&lt;p&gt;Green is now production. Something's wrong: a memory leak that only appears under real user load, a third-party integration that behaves differently, anything. Run the swap again. Your v1 environment is still running, still healthy, still warm. That's a &lt;strong&gt;DNS-speed rollback&lt;/strong&gt;, about as fast as the original swap, with zero redeployment.&lt;/p&gt;

&lt;p&gt;No &lt;code&gt;terraform apply&lt;/code&gt;. No container rebuild. Just a DNS flip.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Use Blue-Green (And When Not To)
&lt;/h2&gt;

&lt;p&gt;Blue-Green is not a universal answer. Part of being a senior engineer is knowing which tool fits the job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reach for Blue-Green when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero downtime is a hard requirement&lt;/li&gt;
&lt;li&gt;You need instant rollback capability (regulated industries, payment systems, healthcare)&lt;/li&gt;
&lt;li&gt;Your app is stateful or tightly coupled to a DB schema — gradual rollouts get complicated fast&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consider Canary Deployments when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want to validate with 5-10% of real traffic before full rollout&lt;/li&gt;
&lt;li&gt;You're doing ML model deployments or high-risk feature releases&lt;/li&gt;
&lt;li&gt;You have enough traffic volume to get statistically meaningful signal from a subset&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consider Rolling when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost is a hard constraint — Blue-Green effectively doubles your infrastructure spend during deployment windows&lt;/li&gt;
&lt;li&gt;Your background jobs make "two live versions simultaneously" operationally complex&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Cleanup Reminder (Seriously, Don't Skip This)
&lt;/h2&gt;

&lt;p&gt;Two full Elastic Beanstalk environments with load balancers can run roughly $50-100/month, depending on instance sizes and region. For a learning exercise, spin it up, validate the swap, tear it down.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Terraform code &lt;em&gt;is&lt;/em&gt; your infrastructure. You can recreate the whole thing in under 20 minutes. That's the point of infrastructure-as-code — your environment is disposable. Your &lt;em&gt;knowledge&lt;/em&gt; of it isn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Takeaway
&lt;/h2&gt;

&lt;p&gt;Blue-Green deployments aren't about Elastic Beanstalk. Or ECS. Or Kubernetes. The platform changes. The principle doesn't.&lt;/p&gt;

&lt;p&gt;The real takeaway is this: &lt;strong&gt;production deployments should be boring.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most dangerous deployment is the one that "should be fine." The hotfix at 4pm on a Thursday. The one-liner that touches the payments table. The change a developer calls "trivial."&lt;/p&gt;

&lt;p&gt;Boring means predictable. Boring means you have a plan when things go wrong — not &lt;em&gt;if&lt;/em&gt;. Boring means your on-call engineer isn't doing open-heart surgery on a live patient at 2am.&lt;/p&gt;

&lt;p&gt;Blue-Green gives you boring deployments. In production, boring is the highest compliment you can receive.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Full Terraform source code in the repo linked below. Questions? Drop them in the comments — I read everything.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mxnuchim/aws-terraform-blue-green-deployment" rel="noopener noreferrer"&gt;Code Repository&lt;/a&gt; &lt;/p&gt;




</description>
      <category>aws</category>
      <category>devops</category>
      <category>terraform</category>
      <category>cloudengineering</category>
    </item>
    <item>
      <title>Your Silent Superpower: Why Bash is Still the Most Dangerous Tool in Your Arsenal</title>
      <dc:creator>Manuchim Oliver</dc:creator>
      <pubDate>Thu, 29 Jan 2026 20:56:24 +0000</pubDate>
      <link>https://forem.com/mxnuchim/your-silent-superpower-why-bash-is-still-the-most-dangerous-tool-in-your-arsenal-728</link>
      <guid>https://forem.com/mxnuchim/your-silent-superpower-why-bash-is-still-the-most-dangerous-tool-in-your-arsenal-728</guid>
      <description>&lt;p&gt;I didn’t “learn Bash” this week.&lt;br&gt;
I remembered it.&lt;/p&gt;

&lt;p&gt;Back when I was doing my first "real" DevOps work, I caught myself manually typing the same commands for the third time in one week. &lt;code&gt;grep "ERROR" application.log&lt;/code&gt;. Then I'd count the errors with &lt;code&gt;grep -c "ERROR" application.log&lt;/code&gt;. Switch to system.log. Repeat. A senior engineer walked by, watched me for about 12 seconds, and said: "You know you can script that, right?"&lt;/p&gt;

&lt;p&gt;The conversation we had after that changed everything.&lt;/p&gt;

&lt;p&gt;Here's what nobody tells you about Bash scripting: It's not about being a programming wizard. It's about recognizing that &lt;strong&gt;if you're doing something more than twice, you're doing it wrong&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Real Power Isn't in the Code. It's in the Mindset Shift&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let me show you what I mean. My daily log analysis used to look like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check which log files changed in the last 24 hours (manual)&lt;/li&gt;
&lt;li&gt;Scan application.log for errors, fatal issues, critical alerts (manual)&lt;/li&gt;
&lt;li&gt;Repeat for system.log (manual)&lt;/li&gt;
&lt;li&gt;Mentally track everything (exhausting)&lt;/li&gt;
&lt;li&gt;Hope I don't get interrupted and lose my place (and I always did)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Time investment: 30-45 minutes&lt;/p&gt;

&lt;p&gt;Error rate: High (because humans aren't designed for repetitive tasks)&lt;/p&gt;

&lt;p&gt;Job satisfaction: Approaching zero&lt;/p&gt;

&lt;p&gt;But now? One command. Three seconds. A clean report that tells me if anything needs my immediate attention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Journey from Commands to Intelligence&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What started as a simple script—just a few grep commands saved in a file—evolved into something genuinely intelligent. Here's what that progression looked like:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 1:&lt;/strong&gt; Basic Automation&lt;br&gt;
Save the commands. Make them executable. Run once instead of ten times.&lt;br&gt;
&lt;strong&gt;Stage 2:&lt;/strong&gt; Smart Variables&lt;br&gt;
Stop hardcoding everything. Use variables for directories, file names, error patterns. Change one line instead of rewriting everything.&lt;br&gt;
&lt;strong&gt;Stage 3:&lt;/strong&gt; Dynamic Loops&lt;br&gt;
Why analyze two files when your script can detect and analyze every relevant file automatically? Loops transform rigid code into flexible automation.&lt;br&gt;
&lt;strong&gt;Stage 4:&lt;/strong&gt; Conditional Intelligence&lt;br&gt;
This is where it gets interesting. My script doesn't just dump data—it evaluates. More than 10 critical errors? It flags me immediately. Otherwise? Save the report and move on.&lt;/p&gt;
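&lt;p&gt;Stitched together, the four stages fit in a couple dozen lines. Here is a minimal sketch of what such a script can look like, not the exact one described above: the log directory, the three patterns, and the threshold of 10 are assumptions to adapt, and "critical" is taken to mean lines containing &lt;code&gt;CRITICAL&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Stage 2: variables instead of hardcoded values (defaults are illustrative).
LOG_DIR="${LOG_DIR:-/var/log/myapp}"
PATTERNS="ERROR FATAL CRITICAL"
CRITICAL_THRESHOLD="${CRITICAL_THRESHOLD:-10}"

analyze_logs() {
  # Stage 3: loop over every .log file changed in the last 24 hours.
  find "$LOG_DIR" -name '*.log' -mtime -1 -print | sort | while IFS= read -r file; do
    name=$(basename "$file")
    echo "== $name"
    for pattern in $PATTERNS; do
      # grep -c prints 0 (and exits 1) when nothing matches, so count is always set.
      count=$(grep -c -- "$pattern" "$file")
      echo "$pattern: $count"
      # Stage 4: evaluate instead of just dumping numbers.
      if [ "$pattern" = "CRITICAL" ]; then
        if [ "$count" -gt "$CRITICAL_THRESHOLD" ]; then
          echo "ALERT: $count critical errors in $name"
        fi
      fi
    done
  done
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Call &lt;code&gt;analyze_logs&lt;/code&gt; at the bottom of the file and make it executable with &lt;code&gt;chmod +x&lt;/code&gt; (Stage 1), and the daily routine becomes something like &lt;code&gt;LOG_DIR=/srv/app/logs ./analyze_logs.sh > report.txt&lt;/code&gt;, where the path is a placeholder.&lt;/p&gt;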

&lt;p&gt;&lt;strong&gt;The Real Lesson: Bash Scripts Are Living Documentation&lt;/strong&gt;&lt;br&gt;
Here's something I didn't expect: my automation scripts became the best documentation our team ever had. New engineer joins? They read the backup script and immediately understand our backup strategy. Someone asks about our deployment process? The script tells the story better than any wiki ever could.&lt;/p&gt;

&lt;p&gt;This is what DevOps pioneers meant by "everything as code." It's not just about version control—it's about making your processes tangible, shareable, and improvable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What You Can Automate Today (No, Seriously—Today)&lt;/strong&gt;&lt;br&gt;
If you're thinking "this sounds great but I'm not a programmer," stop right there. Neither was I when I started. Here's what you can automate with basic Bash scripting:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Environment setup:&lt;/strong&gt; New laptop? One script installs everything, configures your tools, clones your repos, and sets up your databases. From zero to productive in minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disk space management:&lt;/strong&gt; Automatically compress old logs, delete ancient ones, and email you when space runs low. Set it and forget it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment checks:&lt;/strong&gt; Pre-deployment validation that runs every time and catches issues before they hit production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backup verification:&lt;/strong&gt; Don't just create backups—verify them. Automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Business Case (Just in Case Your Manager Asks)&lt;/strong&gt;&lt;br&gt;
Let's do the math:&lt;/p&gt;

&lt;p&gt;45 minutes daily on manual tasks × 20 work days = 900 minutes, or 15 hours per month&lt;br&gt;
15 hours × 12 months = 180 hours per year&lt;br&gt;
At 40 hours per week, that's 4.5 weeks of work time spent on tasks a script can do in seconds.&lt;/p&gt;

&lt;p&gt;And that's just one workflow. Multiply that across your team, across multiple repetitive tasks, and you're looking at hundreds of recovered hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start Small, Think Big&lt;/strong&gt;&lt;br&gt;
You don't need to automate everything tomorrow. Start with the task that annoys you most. That thing you groan about every time you have to do it? That's your first script.&lt;/p&gt;

&lt;p&gt;Mine was log analysis. Yours might be environment setup, or deployment, or backup validation, or test data generation. It doesn't matter what it is—what matters is that you start.&lt;/p&gt;

&lt;p&gt;Because here's the truth: in 2026, manual repetitive work isn't just inefficient. It's a waste of human potential. We have brains capable of solving complex problems, designing systems, and creating value. Using those brains to repeatedly type the same commands is like using a Ferrari to go get the mail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Bottom Line&lt;/strong&gt;&lt;br&gt;
Bash isn't just a tool—it's a mindset. It's the difference between being a human task-runner and being an engineer who builds systems that run tasks. It's the difference between spending your day in the weeds and spending your day solving actual problems.&lt;/p&gt;

&lt;p&gt;That senior engineer who showed me my first script? They gave me more than automation. They gave me time back. They gave me the mental space to think strategically instead of tactically. They gave me a superpower.&lt;/p&gt;

&lt;p&gt;And now I'm passing it on to you.&lt;br&gt;
Start scripting. Your future self will thank you.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>bash</category>
      <category>cloud</category>
      <category>linux</category>
    </item>
    <item>
      <title>From cronjobs to controllers: Building a production-grade Kubernetes Backup &amp; Restore Operator</title>
      <dc:creator>Manuchim Oliver</dc:creator>
      <pubDate>Sun, 25 Jan 2026 08:17:19 +0000</pubDate>
      <link>https://forem.com/mxnuchim/from-cronjobs-to-controllers-building-a-production-grade-kubernetes-backup-restore-operator-4g8h</link>
      <guid>https://forem.com/mxnuchim/from-cronjobs-to-controllers-building-a-production-grade-kubernetes-backup-restore-operator-4g8h</guid>
      <description>&lt;p&gt;There’s a moment every infrastructure engineer remembers.&lt;/p&gt;

&lt;p&gt;You’re calm. Confident. Someone asks, “Can we restore from last night’s backup?”&lt;br&gt;
You nod. Of course you can.&lt;/p&gt;

&lt;p&gt;Then you test the restore.&lt;/p&gt;

&lt;p&gt;The archive is incomplete. The job logs are gone. You’re not even sure when the backup last ran — only that a CronJob exists and no one has touched it in months.&lt;/p&gt;

&lt;p&gt;In that moment, “we run nightly backups” stops being a reassurance. It becomes a liability.&lt;/p&gt;

&lt;p&gt;This project started there — with the realization that backups are not a task. They’re a system, and systems demand design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why CronJobs Fail in Production (and Why We Pretend They Don’t)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CronJobs are Kubernetes’ sharpest double-edged sword. They’re easy to create and hard to operate.&lt;/p&gt;

&lt;p&gt;In real clusters, they introduce quiet failure modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Opacity:&lt;/strong&gt; &lt;code&gt;kubectl get cronjob&lt;/code&gt; tells you when a job was last scheduled, not whether it actually succeeded&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silent drift:&lt;/strong&gt; retention logic lives in shell scripts no one audits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Restore anxiety:&lt;/strong&gt; partial writes, permission mismatches, and irreversible state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No lifecycle semantics:&lt;/strong&gt; success, failure, retries, and ownership are all implied, none enforced&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams discover these problems during an incident. By then, it’s too late.&lt;/p&gt;

&lt;p&gt;I wanted to turn that uncertainty into confidence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design Goal: Make Backups a First-Class Kubernetes API&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I’m a senior full-stack engineer who’s been intentionally ramping into SRE and platform engineering. One thing becomes obvious as you move closer to production systems:&lt;/p&gt;

&lt;p&gt;Reliability doesn’t come from tools.&lt;br&gt;
It comes from interfaces.&lt;/p&gt;

&lt;p&gt;So I set three non-negotiable principles for this operator:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Safety over convenience&lt;/li&gt;
&lt;li&gt;Observability over assumptions&lt;/li&gt;
&lt;li&gt;Automation over tribal knowledge&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result is a Kubernetes Backup &amp;amp; Restore Operator built with controller-runtime best practices and designed for real-world clusters — not demos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Core Insight: Backups Should Be Resources, Not Side Effects&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of scripts and schedules, this operator models backups as Kubernetes-native APIs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BackupPolicy — intent&lt;/li&gt;
&lt;li&gt;Backup — execution&lt;/li&gt;
&lt;li&gt;Restore — recovery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This single decision unlocks everything else.&lt;/p&gt;

&lt;p&gt;When backups are resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can kubectl get them&lt;/li&gt;
&lt;li&gt;You can kubectl describe them&lt;/li&gt;
&lt;li&gt;You can watch their status, conditions, and events&lt;/li&gt;
&lt;li&gt;You can reason about lifecycle, ownership, and safety&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Backups stop being something that happens.&lt;br&gt;
They become something you can operate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Small Example with Big Implications&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: platform.example.com/v1
kind: BackupPolicy
metadata:
  name: daily-backups
spec:
  schedule: "0 2 * * *"   # daily at 02:00
  retention:
    keepLast: 3
  target:
    pvcSelector:
      matchLabels:
        app: postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn’t configuration glue. It’s an API contract.&lt;/p&gt;

&lt;p&gt;From this policy, the controller:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calculates the next run using cron parsing&lt;/li&gt;
&lt;li&gt;Schedules reconciliation using &lt;code&gt;RequeueAfter&lt;/code&gt; (no polling)&lt;/li&gt;
&lt;li&gt;Spawns concrete Backup resources&lt;/li&gt;
&lt;li&gt;Enforces retention only after success&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system does exactly what the user asked, and nothing more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Execution Model: Jobs, But with Guardrails&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each Backup creates a Kubernetes Job with strict safety constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Source PVCs mounted read-only&lt;/li&gt;
&lt;li&gt;Backup artifacts written as tar.gz to shared storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Explicit phase transitions:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pending → Running → Completed | Failed&lt;/p&gt;

&lt;p&gt;No hidden state. No implicit success.&lt;/p&gt;
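&lt;p&gt;One way to keep transitions explicit is a small transition table that rejects anything it does not list. This Go sketch is illustrative, not the operator's code: &lt;code&gt;Phase&lt;/code&gt;, &lt;code&gt;allowed&lt;/code&gt;, &lt;code&gt;transition&lt;/code&gt; and &lt;code&gt;isTerminal&lt;/code&gt; are hypothetical names, and a real controller would persist the result to &lt;code&gt;.status.phase&lt;/code&gt; through the API server.&lt;/p&gt;

```go
package main

import "fmt"

// Phase mirrors the Backup lifecycle: Pending, then Running, then Completed or Failed.
type Phase string

const (
	Pending   Phase = "Pending"
	Running   Phase = "Running"
	Completed Phase = "Completed"
	Failed    Phase = "Failed"
)

// allowed lists the legal next phases. Completed and Failed have no entry,
// so nothing can move a Backup out of them.
var allowed = map[Phase][]Phase{
	Pending: {Running},
	Running: {Completed, Failed},
}

// isTerminal reports whether a Backup in phase p is finished for good.
func isTerminal(p Phase) bool {
	return p == Completed || p == Failed
}

// transition returns the new phase, or the old phase plus an error if the
// move is not in the table.
func transition(from, to Phase) (Phase, error) {
	for _, next := range allowed[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("illegal phase transition from %s to %s", from, to)
}

func main() {
	p := Pending
	for _, to := range []Phase{Running, Completed} {
		var err error
		if p, err = transition(p, to); err != nil {
			panic(err)
		}
		fmt.Println("phase:", p)
	}
	// A stray reconcile cannot revive a finished Backup.
	if _, err := transition(p, Running); err != nil {
		fmt.Println("rejected:", err)
	}
}
```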

&lt;p&gt;Every transition is surfaced via:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;.status.phase&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Kubernetes Events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Humans and automation see the same truth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability Isn’t Optional: It’s the Interface&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want operators to trust a system, it must explain itself.&lt;/p&gt;

&lt;p&gt;A completed backup tells a story:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Events:
  Normal  BackupStarted     Backup execution started
  Normal  JobCreated        Created backup job my-backup-job
  Normal  BackupCompleted   Backup completed successfully in 6s
  Normal  CleanupTriggered  Deleted 2 old backups (keepLast=3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is deliberate.&lt;/p&gt;

&lt;p&gt;No one should have to dig through Pod logs during a restore.&lt;br&gt;
The control plane should already know what happened.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retention as Policy, Not a Script&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Retention is where many systems quietly corrupt themselves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This operator treats retention as a post-success policy:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only Completed backups are eligible&lt;/li&gt;
&lt;li&gt;Running or failed backups are never touched&lt;/li&gt;
&lt;li&gt;Cleanup happens immediately after success&lt;/li&gt;
&lt;li&gt;Deletion is deterministic and auditable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Retention stops being a best effort and becomes a guarantee.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Restore Is a First-Class Concern (Not an Afterthought)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Backups without restores are just storage costs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Restores in this system:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can only reference completed backups&lt;/li&gt;
&lt;li&gt;Are validated before execution&lt;/li&gt;
&lt;li&gt;Run as tracked Jobs with explicit status&lt;/li&gt;
&lt;li&gt;Refuse unsafe operations by default&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This flips the mental model:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A restore is not an emergency script — it’s a rehearsed operation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Back to SRE Principles (On Purpose)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Google’s SRE discipline emphasizes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reducing toil&lt;/li&gt;
&lt;li&gt;Making failure visible&lt;/li&gt;
&lt;li&gt;Designing systems that are safe by default&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Backups are a classic source of hidden toil.&lt;br&gt;
They only demand attention when they fail — usually during an incident.&lt;/p&gt;

&lt;p&gt;By modeling backups as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Observable&lt;/li&gt;
&lt;li&gt;Automated&lt;/li&gt;
&lt;li&gt;Policy-driven&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…you remove ambiguity and human error — exactly what SRE systems are meant to do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production Engineering Patterns Used&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project intentionally applies patterns you’d expect in mature controllers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Idempotent reconciliation — safe requeues and restarts&lt;/li&gt;
&lt;li&gt;OwnerReferences — automatic garbage collection&lt;/li&gt;
&lt;li&gt;Least-privilege RBAC — nothing more, nothing less&lt;/li&gt;
&lt;li&gt;Race-safe Job creation — no duplicate execution&lt;/li&gt;
&lt;li&gt;Terminal state enforcement — no half-finished resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren’t academic choices. They’re scars from running systems at scale.&lt;/p&gt;
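&lt;p&gt;One common way to get both idempotent reconciliation and race-safe Job creation is to derive the Job's name from the Backup, so the API server rejects a duplicate on name alone. In this Go sketch, &lt;code&gt;fakeAPI&lt;/code&gt; stands in for the API server. The real API server answers a duplicate name with an AlreadyExists error, which client-go code typically detects with &lt;code&gt;apierrors.IsAlreadyExists&lt;/code&gt;. &lt;code&gt;jobNameFor&lt;/code&gt; and its &lt;code&gt;-job&lt;/code&gt; suffix are a hypothetical naming scheme, not necessarily the operator's.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrAlreadyExists plays the role of the API server's AlreadyExists error.
var ErrAlreadyExists = errors.New("already exists")

// fakeAPI is an in-memory stand-in for the API server. Like the real one,
// it refuses to create a second object with a name that is already taken.
type fakeAPI struct {
	mu   sync.Mutex
	jobs map[string]string // maps Job name to the Backup that owns it
}

func (a *fakeAPI) Create(name, owner string) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if _, taken := a.jobs[name]; taken {
		return ErrAlreadyExists
	}
	a.jobs[name] = owner
	return nil
}

// jobNameFor derives the Job name from the Backup name. The "-job" suffix is
// a hypothetical convention; what matters is that it is deterministic.
func jobNameFor(backup string) string { return backup + "-job" }

// reconcile can run any number of times, concurrently or after a restart.
// A duplicate Create means "already done", never "retry under a new name".
func reconcile(api *fakeAPI, backup string) (bool, error) {
	err := api.Create(jobNameFor(backup), backup)
	if errors.Is(err, ErrAlreadyExists) {
		return false, nil
	}
	return err == nil, err
}

func main() {
	api := new(fakeAPI)
	api.jobs = map[string]string{}

	// Three reconciles race for the same Backup.
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		winners int
	)
	for range []int{1, 2, 3} {
		wg.Add(1)
		go func() {
			defer wg.Done()
			created, err := reconcile(api, "my-backup")
			if err != nil {
				panic(err)
			}
			if created {
				mu.Lock()
				winners++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("jobs:", len(api.jobs), "created by reconciles:", winners) // jobs: 1 created by reconciles: 1
}
```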

&lt;p&gt;&lt;strong&gt;Current Limitations (and Why They’re Explicit)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Production systems earn trust by admitting what they don’t do yet.&lt;/p&gt;

&lt;p&gt;Planned improvements include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backup integrity verification (checksums)&lt;/li&gt;
&lt;li&gt;Restore guards for non-empty PVCs&lt;/li&gt;
&lt;li&gt;Prometheus metrics and SLO-driven alerts&lt;/li&gt;
&lt;li&gt;Automated restore drills and canarying&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each item is tracked intentionally — because reliability is a roadmap, not a checkbox.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Bigger Lesson&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project isn’t really about backups.&lt;br&gt;
It’s about treating operational workflows as products:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With APIs&lt;/li&gt;
&lt;li&gt;With UX&lt;/li&gt;
&lt;li&gt;With safety guarantees&lt;/li&gt;
&lt;li&gt;With observability as a feature&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can check it out yourself here: &lt;a href="https://github.com/mxnuchim/k8s-backup-dr-operator" rel="noopener noreferrer"&gt;Code Repository&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’re building platforms, enabling SRE teams, or tired of backups being a leap of faith — this is the shift that matters.&lt;/p&gt;

&lt;p&gt;Don’t ask whether backups run.&lt;br&gt;
Design systems that can prove they did.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>sre</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Kubernetes Is Not a Container Platform (And That Changes Everything)</title>
      <dc:creator>Manuchim Oliver</dc:creator>
      <pubDate>Sat, 10 Jan 2026 14:11:38 +0000</pubDate>
      <link>https://forem.com/mxnuchim/kubernetes-is-not-a-container-platform-and-that-changes-everything-4bpa</link>
      <guid>https://forem.com/mxnuchim/kubernetes-is-not-a-container-platform-and-that-changes-everything-4bpa</guid>
      <description>&lt;p&gt;&lt;strong&gt;Most people learn Kubernetes backwards.&lt;/strong&gt;&lt;br&gt;
We start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pods&lt;/li&gt;
&lt;li&gt;Deployments&lt;/li&gt;
&lt;li&gt;Helm charts&lt;/li&gt;
&lt;li&gt;Copy-pasting YAML&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But Kubernetes was never designed to be “a container orchestrator”.&lt;/p&gt;

&lt;p&gt;It was designed as:&lt;/p&gt;

&lt;p&gt;An extensible, declarative API backed by control loops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Core Idea&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes works like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You declare desired state (YAML / JSON)&lt;/li&gt;
&lt;li&gt;The API server stores it&lt;/li&gt;
&lt;li&gt;Controllers continuously reconcile reality to match it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing “runs” just because YAML exists. Controllers do the work.&lt;/p&gt;
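&lt;p&gt;The reconcile idea fits in a few lines. This is a toy, in-process Go sketch: &lt;code&gt;State&lt;/code&gt; and &lt;code&gt;reconcile&lt;/code&gt; are made-up names, and real controllers react to watch events from the API server and act on real objects instead of spinning in a loop. What carries over is the shape: compare desired with observed, take one step, and repeat until they match.&lt;/p&gt;

```go
package main

import "fmt"

// State is a toy stand-in for "what exists": here, just a replica count.
type State struct{ Replicas int }

// reconcile compares desired with observed state and takes one step toward
// desired. It never assumes an earlier step worked; it starts from whatever
// it is told is currently observed.
func reconcile(desired, observed State) (State, string) {
	switch {
	case observed.Replicas == desired.Replicas:
		return observed, "in sync, nothing to do"
	case desired.Replicas > observed.Replicas:
		observed.Replicas++
		return observed, "created one replica"
	default:
		observed.Replicas--
		return observed, "deleted one replica"
	}
}

func main() {
	desired := State{Replicas: 3}  // what you declared and the API server stored
	observed := State{Replicas: 1} // what is actually running
	for {
		next, action := reconcile(desired, observed)
		fmt.Printf("observed=%d desired=%d: %s\n", observed.Replicas, desired.Replicas, action)
		if next == observed {
			break // converged: reality matches the declaration
		}
		observed = next
	}
}
```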

&lt;p&gt;&lt;strong&gt;Why CRDs Exist&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes only knows built-in types:&lt;br&gt;
Pods, Services, Nodes, etc.&lt;/p&gt;

&lt;p&gt;CRDs let you say:&lt;br&gt;
&lt;strong&gt;“Here is a new type of thing Kubernetes should understand.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kind: Backup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But CRDs alone do nothing.&lt;/p&gt;

&lt;p&gt;They’re nouns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Operators Exist&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Operators are controllers that understand your CRDs.&lt;/p&gt;

&lt;p&gt;They turn:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kind: Backup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jobs&lt;/li&gt;
&lt;li&gt;Snapshots&lt;/li&gt;
&lt;li&gt;S3 uploads&lt;/li&gt;
&lt;li&gt;Retention logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They are verbs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Helm Isn’t Special&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Helm doesn’t “deploy apps”.&lt;/p&gt;

&lt;p&gt;It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Renders templates&lt;/li&gt;
&lt;li&gt;Outputs YAML&lt;/li&gt;
&lt;li&gt;Sends it to the Kubernetes API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s it.&lt;/p&gt;

&lt;p&gt;The intelligence lives inside controllers, not tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Mental Model That Changed Everything for Me&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An API&lt;/li&gt;
&lt;li&gt;A database (etcd)&lt;/li&gt;
&lt;li&gt;A set of controllers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Containers are just one workload type.&lt;/p&gt;

&lt;p&gt;Once I understood this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Operators stopped feeling scary&lt;/li&gt;
&lt;li&gt;Kubernetes felt simple (not easy, but simple)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And once you see it this way, you stop fighting YAML and start designing operators, CI/CD flows, and observability that actually work at scale.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>sre</category>
      <category>cloud</category>
    </item>
    <item>
      <title>A Day in the Life of a Lead Software Engineer</title>
      <dc:creator>Manuchim Oliver</dc:creator>
      <pubDate>Wed, 06 Sep 2023 14:39:33 +0000</pubDate>
      <link>https://forem.com/mxnuchim/a-day-in-the-life-of-a-lead-software-engineer-4o26</link>
      <guid>https://forem.com/mxnuchim/a-day-in-the-life-of-a-lead-software-engineer-4o26</guid>
      <description>&lt;p&gt;With a projected 24 percent growth by 2026, the software engineering field boasts stunning job prospects. If you’re interested in coding, software engineering is an industry you should consider in 2023, but what does an actual day in the life of a software engineer look like?&lt;/p&gt;

&lt;p&gt;Before we dive in, we should add two disclaimers: Obviously, the job varies day to day. Also, every company has its own culture and quirks.&lt;/p&gt;

&lt;p&gt;I started off as a Civil Engineer, but as time went by and I began exploring my passions, I realized I was doing the wrong line of work.&lt;/p&gt;

&lt;p&gt;Pursuing a career in software engineering was daunting at the time, as I was fully engaged at my engineering job and later in my country’s National Youth Service Corps (NYSC) program. My approach was highly unconventional, too. I never learned any programming language in a structured way. All I had was the determination to pick up projects I loved, read documentation, fail more times than I could count, and learn on the go. And somehow, it worked out.&lt;/p&gt;

&lt;p&gt;All I can say today is, it requires perseverance, curiosity, and a genuine desire to be a good software engineer.&lt;/p&gt;

&lt;p&gt;What quickly became apparent to me was that a background in civil engineering was extremely useful for a career in tech: I was already (somewhat) good at math, and I’d picked up interpersonal communication skills at my job. I’d say that the extra responsibility of being a lead comes with an extra requirement: being a good communicator.&lt;/p&gt;

&lt;p&gt;Anyway, on to the day-to-day stuff.&lt;/p&gt;

&lt;p&gt;Now this honestly varies for every company and is something I think about a lot, but this is my experience at my company.&lt;/p&gt;

&lt;p&gt;There are two distinct aspects of my day.&lt;/p&gt;

&lt;p&gt;The first is the time I spend actually writing code, building features, and fixing bugs. That time is incredibly fun, something I cherish during my day, and it lets me have stimulating conversations with coworkers and gain more experience writing code.&lt;/p&gt;

&lt;p&gt;It also tends to be more of a solo operation. The difficulty of professional software engineering is considering the architecture and side-effects of any code you’re going to write. Once you have a design and implementation ready to go, going off to write the code can be a relaxing and stimulating experience.&lt;/p&gt;

&lt;p&gt;The second, which comes with my responsibilities as a team lead (less so as a regular engineer, but still relevant), is what I call gathering requirements and defending decisions.&lt;/p&gt;

&lt;p&gt;My job as a team lead is to provide insight into what my team can and can’t do when it comes to pushing a product forward. That might mean a few ad hoc meetings a day, or conversations with other team leads about the system as a whole.&lt;/p&gt;

&lt;p&gt;I think that software engineering has a reputation as a non-interactive profession, but that couldn’t be further from the truth. I’m more successful at my job because I had already learned to work with people in an engineering setting. Ultimately, a developer’s main strength is understanding business constraints in order to help the company succeed.&lt;/p&gt;

&lt;p&gt;I’ll give you a schedule that I usually follow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;9 AM – 11:30 AM:&lt;/strong&gt; I try to pick up where I left off the day before, usually by looking at a failing test or a note I left for myself. It’s difficult to get back all the context you lost, so leaving these little breadcrumbs is a satisfying way to jump right back in. This usually involves finishing a feature, fixing a bug, writing a test, or checking the day’s priorities to decide what has to happen next.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;11:30 AM – 2 PM:&lt;/strong&gt; This is when meetings with other team leads to set priorities begin. Since we’re a startup, it’s hard to define what will be the most important thing from week to week, although we use the Atlassian stack to help with some of this. When that’s working, it’s easy to know what to work on next, but there are times when collaboration within my department is needed.&lt;/p&gt;

&lt;p&gt;I usually find some time for lunch in there somewhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2 PM – 4 PM:&lt;/strong&gt; I like to pair program, which means working on a single feature or bug with another engineer. We usually jump on a Slack huddle to do this. The general idea: one engineer thinks about the implementation, and the other writes the code. It’s a nice way to learn another engineer’s perspective and pick up new knowledge!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4 PM – 5 PM:&lt;/strong&gt; I usually buckle down for the rest of the day and work on features and bugs from my product roadmap or the priorities set earlier in the day. I get a lot done here; the pressure of the day closing is a nice motivator for tying up loose ends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5 PM – 5:30 PM:&lt;/strong&gt; This is where I look for a good stopping point to wrap up the day, and where I leave myself the starting point for the next day that I mentioned earlier.&lt;/p&gt;

&lt;p&gt;With all this said, something unexpected always comes up. I try to stick to a schedule, but the main thing I feel every day is that software development is a rewarding field that lets the people in it contribute directly to a company’s success or failure. It can be full of pressure sometimes, but that also leads to an immense amount of satisfaction.&lt;/p&gt;

&lt;p&gt;I’m happy to answer any questions you might have about my routine or time at Kiko!&lt;/p&gt;

&lt;p&gt;Questions and comments are always welcome. You can read more about me and some of my other articles &lt;a href="https://manuchimoliver.vercel.app" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>softwaredevelopment</category>
      <category>javascript</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
