<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Chikara Inohara</title>
    <description>The latest articles on Forem by Chikara Inohara (@chikarainohara).</description>
    <link>https://forem.com/chikarainohara</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3469032%2Fab2a9ebe-4bf1-4b2b-b125-80d53c688865.jpeg</url>
      <title>Forem: Chikara Inohara</title>
      <link>https://forem.com/chikarainohara</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/chikarainohara"/>
    <language>en</language>
    <item>
      <title>Deep Dive: How Proxmox Actually Keeps Your Cluster in Sync (Corosync &amp; pmxcfs Internals)</title>
      <dc:creator>Chikara Inohara</dc:creator>
      <pubDate>Sat, 07 Mar 2026 21:18:49 +0000</pubDate>
      <link>https://forem.com/chikarainohara/deep-dive-how-proxmox-actually-keeps-your-cluster-in-sync-corosync-pmxcfs-internals-5f7g</link>
      <guid>https://forem.com/chikarainohara/deep-dive-how-proxmox-actually-keeps-your-cluster-in-sync-corosync-pmxcfs-internals-5f7g</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;em&gt;Fair warning: I'm still learning this stuff, so some details might not be 100% perfect. Take it as a fellow homelab explorer's notes, not official docs!&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In my last post, we talked about the &lt;strong&gt;outside view&lt;/strong&gt; of a Proxmox cluster: quorum, split-brain, and how Corosync's strict timeouts decide when a node is declared dead. We looked at token-passing and fencing from a bird's-eye view.&lt;/p&gt;

&lt;p&gt;This time, let's crack open the hood and look &lt;strong&gt;inside&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Proxmox VE's cluster features are incredibly powerful, but for a lot of us, it feels like a black box. How does it &lt;em&gt;actually&lt;/em&gt; stay in sync? What happens byte-by-byte when you change a VM config?&lt;/p&gt;

&lt;p&gt;I went down a research rabbit hole diving into the source code of &lt;strong&gt;Corosync&lt;/strong&gt; and &lt;strong&gt;pmxcfs&lt;/strong&gt;, and here's what I found.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ Architecture Overview: Two Key Components
&lt;/h2&gt;

&lt;p&gt;Everything in a Proxmox cluster boils down to two layers working in tight coordination:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3z6y3ytfucbdu3jca2lu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3z6y3ytfucbdu3jca2lu.png" alt="Architecture diagram showing two Proxmox nodes. Each has a UI layer (Pveproxy + Pvedaemon), a cluster management layer (pmxcfs in RAM + Corosync), a VM layer, and a disk layer storing config.db. An arrow between nodes shows pmxcfs syncing via Corosync, with FUSE mounting the in-memory DB to the filesystem."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;📝 Note: This diagram is reused from &lt;a href="https://qiita.com/chikara_inohara/items/191fcebe191dfe5280fb" rel="noopener noreferrer"&gt;my original Japanese article on Qiita&lt;/a&gt; — too lazy to redraw it in English, sorry! The key things to spot: pmxcfs lives in RAM on each node, syncs between nodes via Corosync, and the SQLite DB is persisted to disk (shown as USB here — more on why that matters later!).&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Corosync (Totem Protocol)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Job&lt;/strong&gt;: Cluster membership management + message ordering guarantee&lt;/p&gt;

&lt;p&gt;It provides something called &lt;strong&gt;Virtual Synchrony&lt;/strong&gt; — every node receives messages in the exact same order. This is achieved through the token-passing mechanism we covered last time.&lt;/p&gt;

&lt;h3&gt;
  
  
  pmxcfs (Proxmox Cluster File System)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Job&lt;/strong&gt;: Manages all the config files you see under &lt;code&gt;/etc/pve&lt;/code&gt;&lt;/p&gt;

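&lt;p&gt;Here's the fun part: &lt;strong&gt;it's actually a database-backed filesystem&lt;/strong&gt;. Each node keeps a working copy of the data in RAM and persists it to a SQLite file on disk (&lt;code&gt;/var/lib/pve-cluster/config.db&lt;/code&gt;). It just &lt;em&gt;looks&lt;/em&gt; like a regular filesystem thanks to FUSE mounting. Wild, right?&lt;/p&gt;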




&lt;h2&gt;
  
  
  🔄 Corosync / Totem Protocol: The Details
&lt;/h2&gt;

&lt;p&gt;The heart of Corosync is the &lt;strong&gt;Totem Single-Ring Protocol&lt;/strong&gt;. Regardless of your physical network topology, it creates a &lt;em&gt;logical ring&lt;/em&gt; of nodes and circulates a special packet called a &lt;strong&gt;token&lt;/strong&gt; around that ring to control who can send messages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Token Passing
&lt;/h3&gt;

&lt;p&gt;Only the node currently holding the token is allowed to broadcast (multicast) a message. This elegantly prevents write conflicts — no two nodes can write simultaneously. Everything is serialized.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Node A → [Token] → Node B → [Token] → Node C → [Token] → back to Node A
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsdi8xqzxj8tt14aw3lb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsdi8xqzxj8tt14aw3lb.png" alt="Two diagrams of a 4-node ring. Left: node 4 holds the TOKEN and multicasts Message1 out to nodes 1, 2, and 3 simultaneously. Right: the token has moved to node 1, which now multicasts Message2 to all other nodes."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;📝 Another one from &lt;a href="https://qiita.com/chikara_inohara/items/191fcebe191dfe5280fb" rel="noopener noreferrer"&gt;the Japanese version&lt;/a&gt;! Left: node 4 holds the token and multicasts Message1 to all nodes. Right: the token has passed to node 1, which now multicasts Message2. Only the token holder gets to send — everyone else just listens.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ARU (All Received Up to)
&lt;/h3&gt;

&lt;p&gt;The token carries a sequence number called &lt;strong&gt;aru&lt;/strong&gt; — short for "All Received Up to." Think of it as a receipt: &lt;em&gt;"Everyone in the ring has confirmed they got messages up to this point."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When the token completes a full loop and comes back with an updated ARU, the original sender knows with certainty: &lt;strong&gt;everyone got it.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What Actually Happens When a Token Arrives (&lt;code&gt;totemsrp.c&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;Based on the Corosync source code (&lt;code&gt;exec/totemsrp.c&lt;/code&gt;), here's the processing order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Receive token&lt;/strong&gt; from the previous node&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retransmit check&lt;/strong&gt; — did I miss any messages? If so, request retransmission&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multicast send&lt;/strong&gt; — flush any pending messages (like pmxcfs config changes) out to the network&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update &amp;amp; pass&lt;/strong&gt; — increment the sequence number, hand the token to the next node&lt;/li&gt;
&lt;/ol&gt;
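
&lt;p&gt;To make the token idea concrete, here's a tiny Python toy model of the steps above. It's only a sketch under big assumptions: delivery is lossless, there are no retransmits or membership changes, and the names &lt;code&gt;Node&lt;/code&gt; and &lt;code&gt;circulate&lt;/code&gt; are mine, not anything from &lt;code&gt;totemsrp.c&lt;/code&gt;.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    outbox: list = field(default_factory=list)     # messages waiting for the token
    delivered: list = field(default_factory=list)  # messages in delivery order


def circulate(ring, rounds=1):
    """Pass the token around the ring; only the current holder may multicast."""
    seq = 0  # sequence number carried by the (imaginary) token
    for _ in range(rounds):
        for holder in ring:
            # The holder flushes everything it has queued while it owns the token.
            while holder.outbox:
                seq += 1
                message = (seq, holder.name, holder.outbox.pop(0))
                for node in ring:  # multicast: every node, sender included
                    node.delivered.append(message)
    return seq


a = Node("A", outbox=["edit 100.conf"])
b = Node("B")
c = Node("C", outbox=["edit 101.conf", "edit storage.cfg"])
last_seq = circulate([a, b, c])

print(a.delivered == b.delivered == c.delivered)  # True: same order everywhere
print(last_seq)  # 3
```

&lt;p&gt;Because sequence numbers are only handed out by whoever holds the token, every node ends up with the same list in the same order. That is the property the rest of this post relies on.&lt;/p&gt;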




&lt;h2&gt;
  
  
  💾 How pmxcfs Syncs Data: The Journey of a Write
&lt;/h2&gt;

&lt;p&gt;Okay, here's where it gets &lt;em&gt;really&lt;/em&gt; interesting. What actually happens when you edit a VM config in the Proxmox UI?&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Write Request from Application
&lt;/h3&gt;

&lt;p&gt;A process like &lt;code&gt;pvedaemon&lt;/code&gt; writes to &lt;code&gt;/etc/pve/qemu-server/100.conf&lt;/code&gt;. This gets intercepted by FUSE and handed off to the &lt;code&gt;pmxcfs&lt;/code&gt; process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: CPG Broadcast via Corosync
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;pmxcfs&lt;/code&gt; bundles the change as a transaction and sends it through Corosync's &lt;strong&gt;CPG (Closed Process Group) API&lt;/strong&gt; — essentially asking Corosync to deliver this to every node in the cluster.&lt;/p&gt;

&lt;p&gt;The data sits in Corosync's send buffer, waiting for the token to come around.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Receive and &lt;strong&gt;Immediately Persist&lt;/strong&gt; ← This is the critical part
&lt;/h3&gt;

&lt;p&gt;When &lt;code&gt;pmxcfs&lt;/code&gt; on each node (including the original sender!) receives the transaction, it does two things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Update in-memory SQLite DB&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The change is applied to the node's in-memory database instantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. &lt;code&gt;fsync()&lt;/code&gt; to disk&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the big one. pmxcfs immediately calls &lt;code&gt;fsync()&lt;/code&gt; on the backing SQLite file to flush it to the physical disk.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;fsync()&lt;/code&gt; &lt;strong&gt;blocks&lt;/strong&gt; until the OS confirms the data has been physically written to storage. No faking it.&lt;/p&gt;
&lt;/blockquote&gt;
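
&lt;p&gt;Here's a rough Python sketch of that "update memory, then persist" pattern. To be clear about the assumptions: pmxcfs itself is written in C, and the &lt;code&gt;files&lt;/code&gt; table, the &lt;code&gt;apply_update&lt;/code&gt; helper, and the temp-file path below are all made up for illustration. The only point is that the function does not return until SQLite has committed the change to its backing file.&lt;/p&gt;

```python
import os
import sqlite3
import tempfile


def apply_update(memdb, conn, path, data):
    """Apply one replicated change: in-memory copy first, then the on-disk DB."""
    memdb[path] = data  # 1. update the in-memory copy
    conn.execute(
        "INSERT OR REPLACE INTO files (path, data) VALUES (?, ?)", (path, data)
    )
    conn.commit()  # 2. returns only after SQLite has written (and synced) the change


# Hypothetical stand-in for the real database file.
db_file = os.path.join(tempfile.mkdtemp(), "config-sketch.db")
conn = sqlite3.connect(db_file)
conn.execute("PRAGMA synchronous = FULL")  # ask SQLite to sync on every commit
conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, data BLOB)")

memdb = {}
apply_update(memdb, conn, "qemu-server/100.conf", b"memory: 4096\n")
print(sorted(memdb))  # ['qemu-server/100.conf']
```

&lt;p&gt;On a fast SSD the &lt;code&gt;commit()&lt;/code&gt; call is barely noticeable. On slow media it is where the time goes.&lt;/p&gt;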

&lt;h3&gt;
  
  
  Step 4: Transaction Committed
&lt;/h3&gt;

&lt;p&gt;Once every node's &lt;code&gt;fsync()&lt;/code&gt; completes and the token comes back with an updated ARU, that transaction is officially &lt;strong&gt;committed cluster-wide&lt;/strong&gt;. Consistency guaranteed. ✅&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚡ Why Your System Disk I/O Matters More Than You Think
&lt;/h2&gt;

&lt;p&gt;Now the architectural picture should make the consequences clear: &lt;strong&gt;Proxmox's config sync waits for every node to finish writing to disk.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Domino Effect of Slow I/O
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Slow fsync on Node C
  → pmxcfs on Node C is blocked
    → Corosync process stalls
      → Token circulation delayed
        → Timeout triggered
          → Node declared dead 💀
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All because of a slow disk write. That's how tightly coupled these components are.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚠️ Homelab Warning: Watch Your System Disk!
&lt;/h3&gt;

&lt;p&gt;This is particularly nasty for common homelab setups:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cheap USB sticks&lt;/strong&gt; as boot media&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Old spinning HDDs&lt;/strong&gt; for the OS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network-attached storage&lt;/strong&gt; running the system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;VM/container storage being slow? Usually fine. But &lt;strong&gt;Proxmox's own system disk&lt;/strong&gt; being slow? That can destabilize your entire cluster.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 You can measure this yourself! Proxmox ships with a built-in benchmark tool called &lt;code&gt;pveperf&lt;/code&gt;. Run it and check the &lt;strong&gt;FSYNCS/SECOND&lt;/strong&gt; line. In my own testing: a USB stick scored &lt;strong&gt;30–50 fsyncs/s&lt;/strong&gt;, while an SSD hit &lt;strong&gt;3,000+&lt;/strong&gt;. That's a &lt;strong&gt;60–100x difference&lt;/strong&gt;!&lt;/p&gt;
&lt;/blockquote&gt;
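
&lt;p&gt;If you're curious what a number like that actually measures, here's a minimal Python sketch that counts write-plus-&lt;code&gt;fsync()&lt;/code&gt; cycles per second. It is not how &lt;code&gt;pveperf&lt;/code&gt; is implemented, and the function name and 512-byte write size are my own choices, so treat the result as a ballpark only.&lt;/p&gt;

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, duration=1.0):
    """Count how many small write + fsync cycles finish within `duration` seconds."""
    fd, path = tempfile.mkstemp(dir=directory)
    count = 0
    try:
        deadline = time.perf_counter() + duration
        while deadline > time.perf_counter():
            os.write(fd, b"x" * 512)
            os.fsync(fd)  # block until the kernel reports the data as flushed
            count += 1
    finally:
        os.close(fd)
        os.remove(path)
    return count / duration


# Point `directory` at a path on the disk you care about (the system disk here).
print(f"{fsyncs_per_second(tempfile.gettempdir(), 0.2):.0f} fsyncs/s")
```

&lt;p&gt;One caveat: if the directory you test lives on a RAM-backed filesystem such as tmpfs, the number will look fantastic and tell you nothing about your real disk.&lt;/p&gt;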

&lt;h3&gt;
  
  
  What to Actually Use
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Storage Type&lt;/th&gt;
&lt;th&gt;Homelab OK?&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;NVMe / SATA SSD&lt;/td&gt;
&lt;td&gt;✅ Great&lt;/td&gt;
&lt;td&gt;Ideal for system disk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise SSD (with PLP)&lt;/td&gt;
&lt;td&gt;✅ Best&lt;/td&gt;
&lt;td&gt;Power-loss protection = extra safety&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.5" HDD&lt;/td&gt;
&lt;td&gt;⚠️ Okay-ish&lt;/td&gt;
&lt;td&gt;Watch for latency spikes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;USB stick&lt;/td&gt;
&lt;td&gt;❌ Avoid&lt;/td&gt;
&lt;td&gt;Way too slow for fsync&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SD card&lt;/td&gt;
&lt;td&gt;❌ Avoid&lt;/td&gt;
&lt;td&gt;Same problem, often worse&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  📋 Summary
&lt;/h2&gt;

&lt;p&gt;Here's the full picture of what we covered:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Corosync / Totem&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Token-passing ring, message ordering, membership&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ARU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Confirms all nodes received each message&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;pmxcfs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;In-memory SQLite DB, FUSE-mounted as &lt;code&gt;/etc/pve&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;fsync()&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Blocks until data hits physical disk on every node&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;System disk I/O&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Directly impacts cluster stability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Practical Takeaway
&lt;/h3&gt;

&lt;p&gt;When choosing hardware for a Proxmox cluster, most people think: CPU → RAM → Network → Storage. But for &lt;strong&gt;cluster stability&lt;/strong&gt;, you should actually be thinking about &lt;strong&gt;fsync latency&lt;/strong&gt; early in your planning.&lt;/p&gt;

&lt;p&gt;Even in a homelab, using a fast SSD for the system disk (not just VM storage) will make your cluster dramatically more stable.&lt;/p&gt;

&lt;p&gt;Pair this knowledge with the timeout tuning from the last post, and you'll have a much more resilient setup!&lt;/p&gt;




&lt;h2&gt;
  
  
  📚 References
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://corosync.github.io/corosync/doc/tocssrp95.pdf" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;The Totem Single-Ring Ordering and Membership Protocol (paper)&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/corosync/corosync" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Corosync Source Code (exec/totemsrp.c, lib/cpg.c)&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_cluster_network" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Proxmox VE Docs — Cluster Network&lt;/a&gt;
&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you found this useful, drop a ❤️! And if you spot anything I got wrong, please call it out in the comments — I'm still learning and corrections are very welcome 🙏&lt;/em&gt;&lt;/p&gt;

</description>
      <category>proxmox</category>
      <category>homelab</category>
      <category>devops</category>
      <category>learning</category>
    </item>
    <item>
      <title>🎯 The Heart of a Proxmox Cluster: Understanding Corosync for a Stable Homelab</title>
      <dc:creator>Chikara Inohara</dc:creator>
      <pubDate>Tue, 16 Sep 2025 13:31:33 +0000</pubDate>
      <link>https://forem.com/chikarainohara/the-heart-of-a-proxmox-cluster-understanding-corosync-for-a-stable-homelab-1h2k</link>
      <guid>https://forem.com/chikarainohara/the-heart-of-a-proxmox-cluster-understanding-corosync-for-a-stable-homelab-1h2k</guid>
      <description>&lt;h2&gt;
  
  
  📝 Introduction
&lt;/h2&gt;

&lt;p&gt;Setting up a Proxmox cluster feels like unlocking a new superpower, doesn't it? You get to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manage multiple servers from a single interface &lt;/li&gt;
&lt;li&gt;Live-migrate VMs like you're in The Matrix&lt;/li&gt;
&lt;li&gt;Feel like a proper sysadmin (even if you're just wearing pajamas)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fief31eezm9ei6122pybc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fief31eezm9ei6122pybc.png" alt="Cluster screenshot"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Please don't judge my messy cluster... pve1 decided to take a vacation and these VMs are just test dummies!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But here's the thing - I never really stopped to think about &lt;strong&gt;what's actually happening under the hood&lt;/strong&gt; to make all this magic work. It just... worked, you know?&lt;/p&gt;

&lt;p&gt;
  What changed my mind?
  &lt;br&gt;
Recently at work, I had to do some research on cluster technologies, and I fell down the rabbit hole of learning about &lt;strong&gt;Corosync&lt;/strong&gt; - the critical component that keeps Proxmox clusters from falling apart. It was one of those "aha!" moments where everything suddenly clicked!&lt;br&gt;


&lt;/p&gt;

&lt;p&gt;So today, let's dive into what I learned about Corosync, why it matters, and answer the big question for us homelabbers: &lt;strong&gt;"Should I actually care about this stuff?"&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🤝 What Exactly is Corosync?
&lt;/h2&gt;

&lt;p&gt;Think of Corosync as the &lt;strong&gt;nervous system of your cluster&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;It's the open-source software that lets all your Proxmox servers gossip with each other, constantly checking if everyone's still alive and sharing important updates. Without it, your cluster would be like a group chat where nobody knows if anyone else is online.&lt;/p&gt;

&lt;h3&gt;
  
  
  Corosync's Main Jobs:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;📋 Membership Management&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keeps track of who's in the club&lt;/li&gt;
&lt;li&gt;Knows exactly which nodes are active right now&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;💬 Messaging&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Makes sure commands reach all nodes&lt;/li&gt;
&lt;li&gt;"Hey everyone, we're starting VM 101 on node 3!"&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;⚖️ Quorum Management&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The "majority rules" system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;This is the big one!&lt;/strong&gt; (More on this in a sec)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  ⚖️ Understanding "Quorum" - The Cluster's Democracy
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_quorum" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Deep dive into Proxmox Quorum docs&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;If you remember just one thing from this post, make it &lt;strong&gt;Quorum&lt;/strong&gt;. It's basically democracy for servers - decisions only happen when the majority agrees.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧠 The Dreaded "Split-Brain" Problem
&lt;/h3&gt;

&lt;p&gt;Let me paint you a picture of what could go wrong without quorum:&lt;/p&gt;

&lt;p&gt;Imagine you have a 4-node cluster, and suddenly your network has a bad day. The cluster splits into two groups of two nodes each.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdjtxl04ro07gc0jjyamm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdjtxl04ro07gc0jjyamm.png" alt="Split brain diagram"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Without quorum rules, both groups would think:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"The other guys must have crashed!"&lt;/li&gt;
&lt;li&gt;"We're the real cluster now!"&lt;/li&gt;
&lt;li&gt;"Let's start all those VMs that were on the other nodes!"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result?&lt;/strong&gt; Both sides try to run the same VMs, write to the same storage, and basically create digital chaos. This nightmare scenario is called a &lt;strong&gt;split-brain&lt;/strong&gt;, and yes, it's as scary as it sounds! 😱&lt;/p&gt;

&lt;h3&gt;
  
  
  How Quorum Saves the Day
&lt;/h3&gt;

&lt;p&gt;The solution is elegantly simple:&lt;/p&gt;

&lt;p&gt;
  The Majority Rules
  &lt;br&gt;
&lt;strong&gt;Only the group with MORE than half the total votes can keep operating.&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Got 3 nodes and 2 are talking? ✅ You have quorum (2 &amp;gt; 1.5)&lt;/li&gt;
&lt;li&gt;Got 4 nodes and only 2 are talking? ❌ No quorum (2 = 2, not greater)&lt;/li&gt;
&lt;li&gt;Got 5 nodes and 3 are talking? ✅ You have quorum (3 &amp;gt; 2.5)
&lt;/li&gt;
&lt;/ul&gt;




&lt;/p&gt;
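&lt;p&gt;Any group without a majority goes into "safe mode": &lt;code&gt;/etc/pve&lt;/code&gt; becomes read-only and cluster-wide changes are blocked until quorum returns. It might seem harsh, but it's way better than data corruption! (&lt;strong&gt;Fencing&lt;/strong&gt; is the related but stronger step of forcibly resetting an isolated node, which comes into play when HA is enabled.)&lt;/p&gt;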

&lt;p&gt;When you see this scary red X in Proxmox:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwjjpmfexlevhc7enmffy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwjjpmfexlevhc7enmffy.png" alt="No Quorum screenshot"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your node is basically saying: &lt;em&gt;"I'm in the minority, so I'm sitting this one out to avoid causing problems!"&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  💥 When Things Get Aggressive
&lt;/h3&gt;

&lt;p&gt;Nodes take "safety first" to the extreme. If a node running HA-managed guests loses contact with the rest of the cluster for too long (usually after a few tens of seconds), it might literally &lt;strong&gt;reboot itself&lt;/strong&gt; as a precaution!&lt;/p&gt;

&lt;p&gt;I learned this the hard way when a brief network hiccup caused one of my nodes to panic and restart. Not fun when you have important VMs running! &lt;/p&gt;

&lt;p&gt;You can watch the drama unfold in real-time in your system logs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmoa222dzu9m70w2xo9ia.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmoa222dzu9m70w2xo9ia.png" alt="System log screenshot"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  🔢 Why Odd Numbers are Your Friend
&lt;/h2&gt;

&lt;p&gt;Here's why everyone recommends an &lt;strong&gt;odd number of nodes&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;\text{Quorum} = \left\lfloor \frac{\text{Total Nodes}}{2} \right\rfloor + 1&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;Let me break it down with real examples:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Nodes&lt;/th&gt;
&lt;th&gt;Can Survive&lt;/th&gt;
&lt;th&gt;Why?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3 nodes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 failure&lt;/td&gt;
&lt;td&gt;2 remaining &amp;gt; 1.5 ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4 nodes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 failure&lt;/td&gt;
&lt;td&gt;3 remaining &amp;gt; 2 ✅, but a second failure (or a 2v2 split) leaves 2 = 2 ❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;5 nodes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2 failures&lt;/td&gt;
&lt;td&gt;3 remaining &amp;gt; 2.5 ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
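
&lt;p&gt;If you want to check other cluster sizes, the formula is easy to play with in Python. This is just the arithmetic from above, assuming one vote per node; the function names are mine.&lt;/p&gt;

```python
def quorum(total_nodes):
    """Votes needed for a majority: floor(n / 2) + 1."""
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes):
    """How many nodes can fail while the rest still hold a majority."""
    return total_nodes - quorum(total_nodes)


for n in (3, 4, 5, 6, 7):
    print(f"{n} nodes: quorum={quorum(n)}, survives {tolerated_failures(n)} failure(s)")
# 3 nodes: quorum=2, survives 1 failure(s)
# 4 nodes: quorum=3, survives 1 failure(s)
# 5 nodes: quorum=3, survives 2 failure(s)
# 6 nodes: quorum=4, survives 2 failure(s)
# 7 nodes: quorum=4, survives 3 failure(s)
```

&lt;p&gt;Notice that going from 3 to 4 nodes (or 5 to 6) raises the quorum threshold without buying you any extra failure tolerance.&lt;/p&gt;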

&lt;p&gt;
  The takeaway?
  &lt;br&gt;
Even numbers = potential 50/50 splits = bad times

&lt;p&gt;Stick with 3, 5, or 7 nodes for a happier cluster life!&lt;br&gt;
&lt;/p&gt;

&lt;br&gt;
&lt;/p&gt;




&lt;h2&gt;
  
  
  🏢 The "Enterprise-Grade" Setup (aka Overkill for Most of Us)
&lt;/h2&gt;

&lt;p&gt;If you're running mission-critical stuff, here's what the pros recommend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Redundant dedicated networks&lt;/strong&gt; for Corosync&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Separate physical switches&lt;/strong&gt; just for cluster traffic
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple NICs&lt;/strong&gt; on each node&lt;/li&gt;
&lt;li&gt;Basically, treat Corosync traffic like it's made of gold&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a homelab? Yeah... probably not happening. But it's good to know what "best practice" looks like!&lt;/p&gt;




&lt;h2&gt;
  
  
  🏡 The Realistic Homelab Approach
&lt;/h2&gt;

&lt;p&gt;Here's what I'm actually running (and it works fine!):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4ve44cusz1kvhiyyn4b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4ve44cusz1kvhiyyn4b.png" alt="Homelab network diagram"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everything goes through a single NIC per node - management, VM traffic, Corosync, the works. Is it perfect? Nope. Does it work? Absolutely!&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚠️ Watch Out For These Gotchas:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Network Saturation&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't try to migrate VMs while uploading ISOs while backing up while... you get it&lt;/li&gt;
&lt;li&gt;I've definitely made my cluster unhappy by being too ambitious with simultaneous transfers&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cheap Switches&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;That $20 switch might save money but could cause random cluster hiccups&lt;/li&gt;
&lt;li&gt;Invest in something decent if you're having stability issues&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;My advice?&lt;/strong&gt; Start simple with single NICs. Only add complexity when you actually hit problems!&lt;/p&gt;




&lt;h2&gt;
  
  
  🤔 "But I Only Have 2 Nodes!"
&lt;/h2&gt;

&lt;p&gt;A 2-node cluster isn't great for High Availability (since losing one = losing quorum), but it's totally fine if you just want easier management!&lt;/p&gt;

&lt;h3&gt;
  
  
  The Emergency Recovery Trick
&lt;/h3&gt;

&lt;p&gt;When one node dies in a 2-node cluster, here's your lifeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check if you've lost quorum&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;pvecm status
&lt;span class="c"&gt;# Look for "Quorate: No" in the output 😱&lt;/span&gt;

&lt;span class="c"&gt;# Temporarily lower the expected votes to 1 so the surviving node regains quorum&lt;/span&gt;
&lt;span class="c"&gt;# (only do this when the other node is really down, not just unreachable)&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;pvecm expected 1

&lt;span class="c"&gt;# Check again&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;pvecm status
&lt;span class="c"&gt;# Now it should show "Quorate: Yes" 🎉&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;
  Pro tip: QDevice to the rescue!
  &lt;br&gt;
You can also add a &lt;strong&gt;QDevice&lt;/strong&gt; - basically a tiny third voter (like a Raspberry Pi) that breaks ties in 2-node clusters. It's a bit more complex to set up, but worth investigating if you're stuck with 2 nodes long-term.

&lt;p&gt;Check out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://forum.proxmox.com/threads/2-node-ha-with-external-qdevice.135429/" rel="noopener noreferrer"&gt;Proxmox Forum Discussion on QDevice&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support" rel="noopener noreferrer"&gt;Official Proxmox Wiki on QDevice&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;br&gt;
&lt;br&gt;
&lt;/p&gt;
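
&lt;p&gt;Both tricks come down to the same vote arithmetic, which is easy to sketch in Python. This assumes one vote per node and one vote for the QDevice; &lt;code&gt;is_quorate&lt;/code&gt; is a name I made up for illustration, not a Proxmox or Corosync function.&lt;/p&gt;

```python
def is_quorate(votes_present, expected_votes):
    """A partition is quorate when it holds a strict majority of the expected votes."""
    return votes_present >= expected_votes // 2 + 1


# Plain 2-node cluster, one node down: 1 of 2 votes is not a majority.
print(is_quorate(1, 2))  # False

# After lowering the expected votes to 1, the survivor is a majority on its own.
print(is_quorate(1, 1))  # True

# 2 nodes plus a QDevice: one node down still leaves 2 of 3 votes.
print(is_quorate(2, 3))  # True
```

&lt;p&gt;The difference is that lowering the expected votes is a manual, temporary override, while the QDevice changes the math permanently.&lt;/p&gt;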




&lt;h2&gt;
  
  
  💭 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;So that's what I've learned about Corosync - the unsung hero keeping our Proxmox clusters from descending into chaos!&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;TL;DR&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand &lt;strong&gt;Quorum&lt;/strong&gt; (majority rules!)&lt;/li&gt;
&lt;li&gt;Keep your &lt;strong&gt;network stable&lt;/strong&gt; (especially latency)&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;odd numbers&lt;/strong&gt; of nodes when possible&lt;/li&gt;
&lt;li&gt;Don't overthink it for a homelab&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The beauty of homelabbing is learning enterprise concepts and then figuring out what actually matters for your setup. You don't need redundant 10Gb networks and enterprise switches - you just need to understand the principles and adapt them to your reality (and budget)!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's your cluster setup like?&lt;/strong&gt; Are you running the recommended odd number of nodes, or living dangerously with an even number? Let me know in the comments!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this helpful? Drop a ❤️ and follow for more homelab and DevOps learning adventures! I'm always breaking things and (usually) fixing them, so there's plenty more to come!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>proxmox</category>
      <category>beginners</category>
      <category>learning</category>
      <category>virtualmachine</category>
    </item>
    <item>
      <title>My 2-Year Journey to Becoming a DevOps Engineer - The Roadmap</title>
      <dc:creator>Chikara Inohara</dc:creator>
      <pubDate>Sun, 31 Aug 2025 09:07:58 +0000</pubDate>
      <link>https://forem.com/chikarainohara/my-2-year-journey-to-becoming-a-devops-engineer-the-roadmap-5k0</link>
      <guid>https://forem.com/chikarainohara/my-2-year-journey-to-becoming-a-devops-engineer-the-roadmap-5k0</guid>
      <description>&lt;p&gt;Hello everyone! 👋&lt;/p&gt;

&lt;p&gt;I usually write about my homelab setup on Qiita (a Japanese blog site), but today, I want to share something different—a major professional goal I'm setting for myself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'm officially starting a two-year journey to become a DevOps Engineer.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This isn't just about learning new tech. It's a public commitment. I plan to document my progress, my struggles, and my victories right here. Think of it as a captain's log for my career voyage. This first post is about sharing the map I'll be using to navigate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why DevOps?
&lt;/h2&gt;

&lt;p&gt;What draws me to DevOps is the holistic approach to the software lifecycle. I'm fascinated by the idea of using tools like Infrastructure as Code (IaC) to automate everything, building resilient and reliable systems from the ground up. It's about bridging the gap between development and operations, and I want to be that bridge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Inspiration Behind This Journey
&lt;/h2&gt;

&lt;p&gt;Before diving into my roadmap, I want to share what sparked this ambitious plan. I came across several YouTube videos that not only inspired me but also helped crystallize my approach to this career transition:&lt;/p&gt;

&lt;h3&gt;
  
  
  🎯 "How to become a DevOps Engineer in 2025"
&lt;/h3&gt;

&lt;p&gt;This video is a comprehensive roadmap that provides a structured approach to becoming a DevOps Engineer. It covers everything from setting up a home lab and mastering Linux fundamentals to diving deep into containers, programming, cloud technologies, and Kubernetes. The creator also emphasizes the importance of soft skills and even touches on the role of AI in the future of DevOps, making this an invaluable guide for my journey.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/8s0DWeHuEaw"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  💪 "My Self-Taught Coding Story"
&lt;/h3&gt;

&lt;p&gt;This personal and inspiring story of a career change from a non-technical background (hospital worker) to a software engineer, and eventually a Developer Relations Engineer, was a huge motivation. It's a powerful reminder that with dedication and a willingness to learn, a career transformation like this is entirely possible.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/eFJGyT3C-Y0"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;
  Why these videos matter
  &lt;br&gt;
These resources didn't just give me technical knowledge—they gave me the confidence that with a structured plan and consistent effort, this career transition is absolutely achievable. Each video addressed a different aspect of my journey: the technical roadmap, an understanding of the process, and the human element of career transformation.&lt;br&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  The 2-Year Goal: A Four-Phase Roadmap
&lt;/h2&gt;

&lt;p&gt;My journey is broken down into four distinct phases. This roadmap will be my guide and my promise to myself.&lt;/p&gt;

&lt;h3&gt;
  
  
  📚 Phase 1: Building the Foundation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Timeline: First 6 Months&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;
  Goal &amp;amp; Strategy
  &lt;br&gt;
&lt;strong&gt;Goal:&lt;/strong&gt; Master the fundamentals of Linux, Networking, and AWS. Make publishing my work on GitHub a natural, daily habit.

&lt;p&gt;&lt;strong&gt;Strategy:&lt;/strong&gt; Start with the basics that every DevOps engineer needs. Focus on understanding rather than memorization.&lt;br&gt;
&lt;/p&gt;

&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Deliverables:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Earn the LPIC-1 certification&lt;/li&gt;
&lt;li&gt;✅ Earn the CCNA certification&lt;/li&gt;
&lt;li&gt;✅ Earn the AWS Certified Cloud Practitioner certification&lt;/li&gt;
&lt;li&gt;✅ Maintain a well-organized GitHub profile with study scripts and notes&lt;/li&gt;
&lt;li&gt;✅ Document learning journey through weekly blog posts&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ☁️ Phase 2: Cloud, IaC, and Containers in Practice
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Timeline: Months 7-12&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;
  Goal &amp;amp; Strategy
  &lt;br&gt;
&lt;strong&gt;Goal:&lt;/strong&gt; Move beyond theory to practical application. Leave manual infrastructure setup behind and learn to containerize applications.

&lt;p&gt;&lt;strong&gt;Strategy:&lt;/strong&gt; Every piece of infrastructure should be code. Every application should be containerized. No exceptions.&lt;br&gt;
&lt;/p&gt;

&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Deliverables:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Earn the AWS Certified Solutions Architect - Associate (SAA) certification&lt;/li&gt;
&lt;li&gt;✅ Manage AWS infrastructure entirely with Terraform code&lt;/li&gt;
&lt;li&gt;✅ Write custom Dockerfiles for at least 5 different application types&lt;/li&gt;
&lt;li&gt;✅ Create a multi-container application with Docker Compose&lt;/li&gt;
&lt;li&gt;✅ Implement GitOps practices in personal projects&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🔄 Phase 3: Building an Automated Pipeline
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Timeline: Months 13-18&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;
  Goal &amp;amp; Strategy
  &lt;br&gt;
&lt;strong&gt;Goal:&lt;/strong&gt; Understand Kubernetes and build a complete CI/CD pipeline that automates everything from code commit to deployment.

&lt;p&gt;&lt;strong&gt;Strategy:&lt;/strong&gt; Build real pipelines for real projects. Learn by breaking things and fixing them.&lt;br&gt;
&lt;/p&gt;

&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Deliverables:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ A fully functional CI/CD pipeline (GitHub Actions/Jenkins)&lt;/li&gt;
&lt;li&gt;✅ Deploy applications to a Kubernetes cluster&lt;/li&gt;
&lt;li&gt;✅ Implement blue-green and canary deployments&lt;/li&gt;
&lt;li&gt;✅ Create a detailed guide on setting up a Kubernetes cluster at home&lt;/li&gt;
&lt;li&gt;✅ Contribute to open-source DevOps projects&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  📊 Phase 4: SRE Practices and Job Hunting
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Timeline: Months 19-24&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;
  Goal &amp;amp; Strategy
  &lt;br&gt;
&lt;strong&gt;Goal:&lt;/strong&gt; Learn to monitor and improve the reliability of the systems I've built. Polish my portfolio and begin the job search.

&lt;p&gt;&lt;strong&gt;Strategy:&lt;/strong&gt; Think like an SRE. Measure everything. Automate everything. Document everything.&lt;br&gt;
&lt;/p&gt;

&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Deliverables:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Build a monitoring dashboard using Prometheus and Grafana&lt;/li&gt;
&lt;li&gt;✅ Implement alerting with PagerDuty/Opsgenie&lt;/li&gt;
&lt;li&gt;✅ Create chaos engineering experiments&lt;/li&gt;
&lt;li&gt;✅ Develop SLIs, SLOs, and error budgets for personal projects&lt;/li&gt;
&lt;li&gt;✅ Polish GitHub portfolio with 10+ production-ready projects&lt;/li&gt;
&lt;li&gt;✅ Start applying for DevOps positions&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎯 The First 90 Days: Breaking It Down
&lt;/h2&gt;

&lt;p&gt;
  Week 1-2: Environment Setup
  &lt;ul&gt;
&lt;li&gt;Set up home lab with virtualization → done&lt;/li&gt;
&lt;li&gt;Configure Git and GitHub → done&lt;/li&gt;
&lt;li&gt;Start a daily commit habit
&lt;/li&gt;
&lt;/ul&gt;




&lt;/p&gt;
&lt;p&gt;
  Week 3-4: Linux Deep Dive
  &lt;ul&gt;
&lt;li&gt;Master basic commands and file system → already learned at work, but worth a recap&lt;/li&gt;
&lt;li&gt;Learn shell scripting basics&lt;/li&gt;
&lt;li&gt;Understand permissions and processes&lt;/li&gt;
&lt;li&gt;Practice with systemd and services
&lt;/li&gt;
&lt;/ul&gt;




&lt;/p&gt;
&lt;p&gt;
  Week 5-8: Networking Fundamentals
  &lt;ul&gt;
&lt;li&gt;OSI model and TCP/IP stack → already learned at work, but worth a recap&lt;/li&gt;
&lt;li&gt;Subnetting and VLANs → already learned at work, but worth a recap&lt;/li&gt;
&lt;li&gt;DNS, DHCP, and routing → already learned at work, but worth a recap&lt;/li&gt;
&lt;li&gt;Hands-on with virtual networks
&lt;/li&gt;
&lt;/ul&gt;




&lt;/p&gt;
&lt;p&gt;
  Week 9-12: AWS Foundations
  &lt;ul&gt;
&lt;li&gt;Core services (EC2, S3, VPC)&lt;/li&gt;
&lt;li&gt;IAM and security best practices&lt;/li&gt;
&lt;li&gt;Cost optimization strategies&lt;/li&gt;
&lt;li&gt;Prepare for Cloud Practitioner exam
&lt;/li&gt;
&lt;/ul&gt;




&lt;/p&gt;

&lt;h2&gt;
  
  
  📊 Success Metrics
&lt;/h2&gt;

&lt;p&gt;I'm setting clear, measurable goals to track my progress:&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;$$\text{Success Rate} = \frac{\text{Completed Milestones}}{\text{Total Planned Milestones}} \times 100\%$$&lt;/span&gt;
&lt;/div&gt;
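
&lt;p&gt;As a quick sanity check, the formula above can be computed with a few lines of Python. This is just a sketch: the function name and the sample numbers are hypothetical, not real progress data.&lt;/p&gt;

```python
def success_rate(completed: int, planned: int) -> float:
    """Percentage of planned milestones that have been completed."""
    if planned == 0:
        raise ValueError("planned must be non-zero")
    return 100 * completed / planned


# Hypothetical numbers, purely for illustration: 3 of 5 deliverables done.
rate = success_rate(completed=3, planned=5)
print(f"Success rate: {rate:.1f}%")
```

&lt;p&gt;With 3 of 5 milestones completed, this prints a success rate of 60.0%.&lt;/p&gt;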


&lt;p&gt;&lt;strong&gt;Monthly Targets:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📝 4 technical blog posts&lt;/li&gt;
&lt;li&gt;💻 20+ GitHub commits&lt;/li&gt;
&lt;li&gt;📚 40 hours of structured learning&lt;/li&gt;
&lt;li&gt;🔨 2 practical projects completed&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🚀 Tools &amp;amp; Resources I'll Be Using
&lt;/h2&gt;

&lt;p&gt;
  My Tech Stack
  &lt;br&gt;
&lt;strong&gt;Learning Platforms:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Homelab&lt;/li&gt;
&lt;li&gt;Udemy&lt;/li&gt;
&lt;li&gt;YouTube&lt;/li&gt;
&lt;li&gt;Official documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Hands-On Labs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS Free Tier&lt;/li&gt;
&lt;li&gt;Home Lab (Proxmox/Kubernetes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Core Tools to Master:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Version Control:&lt;/strong&gt; Git, GitHub&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IaC:&lt;/strong&gt; Terraform, Ansible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Containers:&lt;/strong&gt; Docker, Kubernetes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD:&lt;/strong&gt; GitHub Actions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring:&lt;/strong&gt; Prometheus, Grafana, ELK Stack&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud:&lt;/strong&gt; AWS (primary), Azure, GCP (basics)
&lt;/li&gt;
&lt;/ul&gt;




&lt;/p&gt;

&lt;h2&gt;
  
  
  💡 What Makes This Different?
&lt;/h2&gt;

&lt;p&gt;This isn't just another "learn DevOps" post. Here's what I'm committed to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Public Accountability&lt;/strong&gt;: Weekly progress updates right here&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real Projects&lt;/strong&gt;: Everything I learn gets applied to actual projects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open Source&lt;/strong&gt;: All my learning materials and projects will be open source&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community First&lt;/strong&gt;: I'll help others who are starting their journey&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://github.com/ChikaraInohara" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Follow my journey on GitHub - All resources will be open source!&lt;/a&gt;
&lt;/p&gt;




&lt;h2&gt;
  
  
  🤝 Join Me on This Journey
&lt;/h2&gt;

&lt;p&gt;
  Want to follow along?
  &lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Star&lt;/strong&gt; my GitHub repository for updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Follow&lt;/strong&gt; me here on DEV and on X for weekly progress posts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connect&lt;/strong&gt; on LinkedIn for professional updates
&lt;/li&gt;
&lt;/ul&gt;




&lt;/p&gt;
&lt;p&gt;I'm sharing this roadmap to hold myself accountable and to connect with others who might be on a similar path. If you have advice, encouragement, or just want to follow along, I'd love to hear from you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let the journey begin!&lt;/strong&gt; 🚀&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Drop a comment below with your thoughts or advice! Are you on a similar journey? What resources have helped you the most?&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Next Post Preview:&lt;/strong&gt; &lt;em&gt;"Week 1: Setting Up My DevOps Home Lab - A Complete Guide"&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow me to get notified when it drops!&lt;/em&gt; 🔔&lt;/p&gt;

</description>
      <category>devops</category>
      <category>careerdevelopment</category>
      <category>aws</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
